Social, structural and environmental determinants of health, such as food or housing insecurity, systemic racism or chronic stress, account for 60–80% of the modifiable risk of disparities in marginalized populations. Such determinants have been difficult to address systematically because of their complexity, multidimensionality and heterogeneity. Emerging precision health methods use large-scale person-generated health data from smartphones and wearables to better characterize and, ultimately, improve health and well-being through strategies customized to individual context and need. Applying artificial intelligence and machine learning to person-generated health data allows unprecedented assessment of recursive, networked and latent associations between everyday life and health, including social, structural and environmental exposures, behaviors, biometrics, and health outcomes. Thus, precision health provides an important opportunity for reducing health disparities among minoritized racial or ethnic groups, or those who are under-resourced.
Despite the potential for improving health equity, the research community lacks benchmark training datasets of person-generated health data, which limits the ability to develop precision health models that are equally effective across diverse populations. Both the validity and the generalizability of an artificial intelligence or machine learning system are intrinsically tied to the underlying training data. The ideal benchmark dataset should feature high-quality, well-characterized data that comprehensively represent the target population, so that model development, validation and evaluation meet the highest standards of scientific transparency and rigor.
Person-generated health data cohorts in the US National Institutes of Health’s All of Us research program, UK Biobank, the Framingham Heart Study and the majority of commercial studies rely on convenience sampling and/or ‘bring your own device’ designs. Consequently, those who lack access to digital technologies (who tend to be older, Black, Latino, Indigenous, poorer and sicker) are systematically under-represented. The National Health and Nutrition Examination Survey is representative, but it uses a cross-sectional design and a 1-week accelerometer measurement period, which limits its ability to assess temporal effects or account for seasonality. The absence of a benchmark dataset risks introducing systemic bias, exacerbating health disparities and causing additional patient harm in already marginalized groups.
To bridge this critical gap, we created American Life in Realtime (ALiR), a publicly available benchmark dataset, cohort and research infrastructure for person-generated health data. ALiR has four primary objectives that advance equitable precision health: promoting inclusive representation; encouraging methodological rigor in artificial intelligence and machine learning; fostering interdisciplinary collaboration and transparency; and facilitating comprehensive exploration of the dynamic interplay between everyday life and health.