How a New U.S. Health Study Is Tackling Bias in Wearable Data Research
A new nationwide research initiative is changing how we think about wearable health data—and who gets represented in it. The American Life in Realtime (ALiR) study, recently published in PNAS Nexus, is setting a new benchmark for equity in digital health research. The project’s mission is straightforward: make sure that the kind of data driving our future health technologies and AI models actually represents everyone, not just the most tech-savvy or well-off.
The Problem: Bias in Wearable and Person-Generated Health Data
Wearables like smartwatches and fitness trackers have become an integral part of health monitoring. They collect person-generated health data (PGHD)—information about our physical activity, sleep, heart rate, and even stress levels. This continuous stream of data is invaluable for precision health, an approach that tailors healthcare to each individual’s biology, lifestyle, and environment.
But there’s a catch. Most large-scale datasets used in AI-driven health research rely on data from people who already own these devices. These participants are typically younger, wealthier, and more educated, and they tend to be early adopters of technology. Groups such as older adults, lower-income individuals, Black and Indigenous people, and those in rural areas are often underrepresented. This lack of diversity leads to biased models that don’t generalize well and can even reinforce health inequities.
For example, during the COVID-19 pandemic, wearable-based studies trying to detect early infection struggled to include people with limited internet access or technology literacy. Many of these studies used convenience samples—participants who volunteered because they already had the necessary devices and digital access—leaving out those who couldn’t afford or didn’t trust the technology.
The ALiR Study: Fixing the Representation Gap
The American Life in Realtime (ALiR) study was designed to close these gaps. Led by researchers at the University of Southern California’s Dornsife Center and partners, the project set out to create a longitudinal, nationally representative dataset using best practices in probability-based sampling and FAIR data standards (Findable, Accessible, Interoperable, Reusable).
What makes ALiR different is that it doesn’t rely on participants bringing their own devices. Instead, the study provides wearable devices and internet access to participants who need them. This ensures that data reflects the diversity of the entire U.S. population—not just those who already have digital tools.
Study Design and Recruitment
To build this representative sample, researchers drew participants from the Understanding America Study (UAS), a large, nationally representative panel of U.S. adults. Between August 2021 and March 2022, the ALiR team invited 2,468 UAS members to participate. They intentionally oversampled racial and ethnic minorities and individuals with lower education levels to make sure that underrepresented groups were fully included.
Out of those invited, 1,386 people (56%) consented, and 1,038 (75% of consenters) completed enrollment. The study provided each participant with a wearable device (Fitbit Inspire 2) and, for those without reliable internet access, a Samsung Galaxy tablet with 4G connectivity. This hardware distribution strategy was key—it directly addressed one of the biggest barriers in health tech research: unequal access to digital devices and broadband.
Researchers also found notable patterns in who took part. Statistical analyses (logistic regression and random forest modeling) showed that older adults were less likely to consent, while lower education was linked to lower enrollment. These findings highlight that even when devices are provided, social and psychological barriers, such as mistrust or disinterest, can still affect who takes part in research.
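To make the analysis concrete, here is a minimal sketch of the kind of logistic model described above, fit on synthetic data. The variable names, coefficients, and data are illustrative assumptions, not ALiR's actual covariates or estimates.

```python
# Hypothetical sketch of modeling consent from demographics, in the
# spirit of ALiR's logistic analysis. All data here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2468  # number of invited panel members

# Synthetic covariates: age (years) and education (years of schooling)
age = rng.integers(18, 90, size=n)
education = rng.integers(8, 21, size=n)

# Synthetic outcome: older age lowers, education raises, consent odds
logit = 1.5 - 0.03 * (age - 50) + 0.05 * (education - 12)
consent = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([age, education])
model = LogisticRegression().fit(X, consent)

# A negative age coefficient mirrors the reported pattern:
# older adults were less likely to consent.
print(dict(zip(["age", "education"], model.coef_[0])))
```

In a real analysis, the fitted coefficients (and their confidence intervals) would quantify how strongly each demographic factor predicts consent.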
Continuous, Rich, and Linked Data Collection
Once enrolled, participants used a custom mobile app that synced with their wearables to continuously collect biometric data—things like heart rate, step count, and sleep patterns. But ALiR didn’t stop there.
Every one to three days, participants also filled out short surveys capturing details about their mental and physical health, daily behaviors, demographics, and social and environmental exposures. These included sensitive but crucial factors like income, housing conditions, and discrimination.
To deepen the analysis, the study linked participants’ responses with contextual datasets such as weather patterns, air quality indexes, healthcare access, and local crime data. This allowed researchers to examine not only individual health metrics but also the environmental and structural factors shaping people’s wellbeing.
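The linkage step above is essentially a join between self-report records and external context on location and date. The following sketch illustrates the idea with pandas; the column names, values, and air-quality table are hypothetical, not ALiR's actual schema.

```python
# Illustrative record linkage: joining daily self-reports to an
# external air-quality table on ZIP code and date. Hypothetical data.
import pandas as pd

surveys = pd.DataFrame({
    "participant_id": [101, 101, 102],
    "date": pd.to_datetime(["2021-09-01", "2021-09-03", "2021-09-01"]),
    "zip_code": ["90007", "90007", "60614"],
    "stress_score": [3, 5, 2],
})

air_quality = pd.DataFrame({
    "zip_code": ["90007", "90007", "60614"],
    "date": pd.to_datetime(["2021-09-01", "2021-09-03", "2021-09-01"]),
    "aqi": [72, 110, 41],
})

# A left join keeps every survey row, even when context is missing
linked = surveys.merge(air_quality, on=["zip_code", "date"], how="left")
print(linked)
```

A left join is the natural choice here: no participant response is dropped just because a contextual measurement is unavailable for that place and day.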
This combination of wearable data, frequent self-reports, and linked environmental data gives ALiR an unprecedented level of depth and contextual richness, setting it apart from most digital health datasets in use today.
Who Took Part: Diversity and Representation
The results speak for themselves. ALiR achieved remarkable diversity and representativeness compared to traditional wearable-based studies.
- Racial and ethnic minorities made up 54% of participants (compared to 38% in the general population).
- White participants accounted for 46%, lower than their 62% share of the U.S. population—reflecting intentional oversampling for inclusivity.
- 77% of participants had never used a wearable device before.
- 2% had no internet access prior to the study.
Weighted statistical adjustments helped correct for minor imbalances, though retirees and people with hypertension were still slightly underrepresented.
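The weighting idea is simple to illustrate: each respondent is weighted by their group's population share divided by its sample share, so intentionally over-sampled groups count proportionally less in population-level estimates. This sketch uses the shares reported above; ALiR's actual weighting scheme is more elaborate than this two-group example.

```python
# Minimal post-stratification sketch using the reported shares:
# weight = population share / sample share for each group.
population_share = {"minority": 0.38, "white": 0.62}
sample_share = {"minority": 0.54, "white": 0.46}

weights = {g: population_share[g] / sample_share[g]
           for g in population_share}

# Over-sampled minority respondents get weights below 1;
# under-sampled white respondents get weights above 1.
print(weights)
```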
This level of inclusion shows that with thoughtful design—providing hardware, internet access, and targeted outreach—researchers can overcome the long-standing digital divide in health data.
Testing for Fairness: How ALiR Performed
To see how this diverse dataset could improve fairness in AI models, researchers used ALiR data to train a COVID-19 infection classification model and compared it to one trained on the All of Us Fitbit dataset, a large but “bring-your-own-device” (BYOD) dataset maintained by the National Institutes of Health.
Here’s what they found:
- The ALiR-based model achieved an Area Under the Curve (AUC) of 0.84, both in-sample and out-of-sample—showing stable, reliable performance across all demographic groups.
- The All of Us model, meanwhile, scored 0.93 in-sample but dropped to 0.68 out-of-sample—a relative decline of about 27%.
- The largest performance declines (between 22% and 40%) occurred among older women and non-white participants.
In other words, the ALiR data helped produce a model that was less biased and more generalizable, even though it came from a smaller sample. This challenges the common assumption that “bigger datasets are always better.” What matters more is who’s included and how the data are collected.
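For readers unfamiliar with the metric, the comparison above can be reproduced in a few lines: ROC AUC measures how well a model's scores rank positives above negatives, and the generalization gap is the relative decline from in-sample to out-of-sample AUC. The labels and scores below are synthetic placeholders.

```python
# Sketch of the evaluation used above: ROC AUC plus the relative
# out-of-sample decline. Labels and scores here are synthetic.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(roc_auc_score(y_true, scores))  # fraction of correctly ranked pairs

# Relative drop from in-sample to out-of-sample AUC,
# using the All of Us figures reported in the text
auc_in, auc_out = 0.93, 0.68
relative_drop = (auc_in - auc_out) / auc_in
print(f"{relative_drop:.0%}")
```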
Remaining Challenges and Future Directions
Despite its success, ALiR isn’t perfect. The study still saw lower participation among older adults, even with free devices and internet access. This suggests that other factors—like lack of trust in research, privacy concerns, or low interest—play roles that technology alone can’t fix.
Another challenge is long-term engagement. Keeping participants active and responsive over time is crucial for longitudinal studies. The team is currently working on methods to maintain participant involvement beyond the initial phase.
Still, ALiR’s foundation is solid. Its use of probability-based sampling, FAIR data principles, and transparent methodology makes it one of the most rigorous digital health studies to date. Importantly, the researchers plan to make both the dataset and the mobile app code publicly available in late 2025, allowing other scientists to develop and test equitable AI models.
Why This Study Matters for the Future of Digital Health
The ALiR study offers a powerful lesson for the future of AI and digital health: representativeness matters as much as raw data volume. If health algorithms are to guide medical decisions, they must work well across all populations—not just the ones easiest to study.
Wearables and mobile health tools are expanding fast, but without thoughtful design, they risk widening the health equity gap. ALiR shows that it’s possible to build inclusivity into the research process itself—by removing technological barriers, respecting diversity, and using transparent methods.
When the dataset becomes public in 2025, it could serve as a gold standard for evaluating fairness in AI-driven health applications, from disease prediction models to behavioral interventions.
Beyond ALiR: Understanding Person-Generated Health Data
To appreciate the significance of ALiR, it helps to understand what person-generated health data (PGHD) really means. Unlike traditional medical data, which comes from hospitals or lab tests, PGHD is collected directly from individuals through devices they use in everyday life—smartwatches, fitness trackers, apps, or sensors.
PGHD is valuable because it captures real-world behavior: sleep cycles, exercise, stress, mood, diet, and exposure to environmental risks. These factors are estimated to influence over 70% of modifiable health outcomes, from diabetes to cardiovascular disease. However, because these data often come from people with ready access to technology, the resulting datasets skew toward certain demographics.
Studies like ALiR remind us that for AI in healthcare to be truly fair, it must learn from data that reflect the full spectrum of human experience, not just a privileged subset.
Reference
Research Paper: American Life in Realtime: Benchmark, publicly available person-generated health data for equity in precision health. PNAS Nexus, 2025.