Search | arXiv e-print repository

LSM-2: Learning from Incomplete Wearable Sensor Data

Authors: Maxwell A. Xu, Girish Narayanswamy, Kumar Ayush, Dimitris Spathis, Shun Liao, Shyam A. Tailor, Ahmed Metwally, A. Ali Heydari, Yuwei Zhang, Jake Garrison, Samy Abdel-Ghaffar, Xuhai Xu, Ken Gu, Jacob Sunshine, Ming-Zher Poh, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Mark Malhotra, Shwetak Patel, Yuzhe Yang, James M. Rehg, Xin Liu, Daniel McDuff

Abstract: Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challenge for self-supervised learning (SSL) models that typically assume complete data inputs. This paper introduces the second generation of Large Sensor Model (LSM-… ▽ More Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challenge for self-supervised learning (SSL) models that typically assume complete data inputs. This paper introduces the second generation of Large Sensor Model (LSM-2) with Adaptive and Inherited Masking (AIM), a novel SSL approach that learns robust representations directly from incomplete data without requiring explicit imputation. AIM's core novelty lies in its use of learnable mask tokens to model both existing ("inherited") and artificially introduced missingness, enabling it to robustly handle fragmented real-world data during inference. Pre-trained on an extensive dataset of 40M hours of day-long multimodal sensor data, our LSM-2 with AIM achieves the best performance across a diverse range of tasks, including classification, regression and generative modeling. Furthermore, LSM-2 with AIM exhibits superior scaling performance, and critically, maintains high performance even under targeted missingness scenarios, reflecting clinically coherent patterns, such as the diagnostic value of nighttime biosignals for hypertension prediction. This makes AIM a more reliable choice for real-world wearable data applications. △ Less

Submitted 5 June, 2025; originally announced June 2025.

Comments: Xu and Narayanswamy are co-first authors. McDuff and Liu are co-last authors

arXiv:2504.21242 [pdf]

Passive Measurement of Autonomic Arousal in Real-World Settings

Authors: Samy Abdel-Ghaffar, Isaac Galatzer-Levy, Conor Heneghan, Xin Liu, Sarah Kernasovskiy, Brennan Garrett, Andrew Barakat, Daniel McDuff

Abstract: The autonomic nervous system (ANS) is activated during stress, which can have negative effects on cardiovascular health, sleep, the immune system, and mental health. While there are ways to quantify ANS activity in laboratories, there is a paucity of methods that have been validated in real-world contexts. We present the Fitbit Body Response Algorithm, an approach to continuous remote measurement… ▽ More The autonomic nervous system (ANS) is activated during stress, which can have negative effects on cardiovascular health, sleep, the immune system, and mental health. While there are ways to quantify ANS activity in laboratories, there is a paucity of methods that have been validated in real-world contexts. We present the Fitbit Body Response Algorithm, an approach to continuous remote measurement of ANS activation through widely available remote wrist-based sensors. The design was validated via two experiments, a Trier Social Stress Test (n = 45) and ecological momentary assessments (EMA) of perceived stress (n=87), providing both controlled and ecologically valid test data. Model performance predicting perceived stress when using all available sensor modalities was consistent with expectations (accuracy=0.85) and outperformed models with access to only a subset of the signals. We discuss and address challenges to sensing that arise in real world settings that do not present in conventional lab environments. △ Less

Submitted 29 April, 2025; originally announced April 2025.

arXiv:2410.13638 [pdf, other]

Scaling Wearable Foundation Models

Authors: Girish Narayanswamy, Xin Liu, Kumar Ayush, Yuzhe Yang, Xuhai Xu, Shun Liao, Jake Garrison, Shyam Tailor, Jake Sunshine, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Samy Abdel-Ghaffar, Daniel McDuff

Abstract: Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data; however, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful repre… ▽ More Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data; however, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful representations from vast amounts of text, image, video, or audio data, we investigate the scaling properties of sensor foundation models across compute, data, and model size. Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM, a multimodal foundation model built on the largest wearable-signals dataset with the most extensive range of sensor modalities to date. Our results establish the scaling laws of LSM for tasks such as imputation, interpolation and extrapolation, both across time and sensor modalities. Moreover, we highlight how LSM enables sample-efficient downstream learning for tasks like exercise and activity recognition. △ Less

Submitted 17 October, 2024; originally announced October 2024.

arXiv:cs/0607024 [pdf, ps, other]

Results on Parity-Check Matrices with Optimal Stopping and/or Dead-End Set Enumerators

Authors: Jos H. Weber, Khaled A. S. Abdel-Ghaffar

Abstract: The performance of iterative decoding techniques for linear block codes correcting erasures depends very much on the sizes of the stopping sets associated with the underlying Tanner graph, or, equivalently, the parity-check matrix representing the code. In this paper, we introduce the notion of dead-end sets to explicitly demonstrate this dependency. The choice of the parity-check matrix entails… ▽ More The performance of iterative decoding techniques for linear block codes correcting erasures depends very much on the sizes of the stopping sets associated with the underlying Tanner graph, or, equivalently, the parity-check matrix representing the code. In this paper, we introduce the notion of dead-end sets to explicitly demonstrate this dependency. The choice of the parity-check matrix entails a trade-off between performance and complexity. We give bounds on the complexity of iterative decoders achieving optimal performance in terms of the sizes of the underlying parity-check matrices. Further, we fully characterize codes for which the optimal stopping set enumerator equals the weight enumerator. △ Less

Submitted 7 July, 2006; originally announced July 2006.

Comments: 8 pages, submitted to IEEE Transactions on Information Theory

arXiv:cs/0603007 [pdf, ps, other]

Complete Enumeration of Stopping Sets of Full-Rank Parity-Check Matrices of Hamming Codes

Authors: Khaled A. S. Abdel-Ghaffar, Jos H. Weber

Abstract: Stopping sets, and in particular their numbers and sizes, play an important role in determining the performance of iterative decoders of linear codes over binary erasure channels. In the 2004 Shannon Lecture, McEliece presented an expression for the number of stopping sets of size three for a full-rank parity-check matrix of the Hamming code. In this correspondence, we derive an expression for t… ▽ More Stopping sets, and in particular their numbers and sizes, play an important role in determining the performance of iterative decoders of linear codes over binary erasure channels. In the 2004 Shannon Lecture, McEliece presented an expression for the number of stopping sets of size three for a full-rank parity-check matrix of the Hamming code. In this correspondence, we derive an expression for the number of stopping sets of any given size for the same parity-check matrix. △ Less

Submitted 2 March, 2006; originally announced March 2006.

Comments: 7 pages, submitted to the IEEE Transactions on Information Theory

Showing 1–5 of 5 results for author: Abdel-Ghaffar, S