Next Generation Patient Engagement & Enriched Insights

May 24, 2022
Thought Leadership


Authors: Luca Foschini, Bray Patrick-Lake, Anusha Narayan and Mikki Nasch


Table of contents


The Value of Person-Generated Health Data (PGHD) to Clinical Decision Making

Challenges of PGHD: Too Much Information for Physicians?

Contextualizing PGHD to Optimize Its Value
1. Direct-to-individual confirmation
2. Personalization
3. Data quality checks
4. Triangulating context through multi-stream PGHD

Contextualization in Action and Outlook for the Future


Person-generated health data (PGHD) is a novel form of health data that is generated by individuals themselves, through the course of their lived experience outside the clinic walls. PGHD typically includes self-reported information on symptoms and quality of life, as well as data from wearable sensors. Such sensors are relatively easy to access and use, which facilitates the collection of PGHD and the creation of a longitudinal, dense representation of a patient's health and experience. This affords us an unprecedented opportunity to measure what matters to individuals in an equitable way.

Over the last decade, the idea of PGHD and its value have become more widely recognized across clinical research and development, public health, and care delivery. In all these settings, the promise of PGHD is extremely enticing — rich insights for more rapid and efficient design of effective medical products, personalized care that mitigates some of the inequities in our current health system, and much-needed visibility for clinicians into their patients’ health and medical product use between visits. However, as stakeholders experiment with the use of PGHD, they have encountered challenges in collecting, analyzing, and incorporating such novel data into care settings. Such challenges are not uncommon during the early adoption and integration phase of any novel data stream, particularly so given the complex and heterogeneous healthcare system in the US. In this article, we frame the value of PGHD in clinical settings in particular, lay out the common challenges that are barriers to its adoption, and describe ways in which these challenges can be mitigated to increase confidence in PGHD-derived insights. In turn, such robust insights can be used to improve clinical practice, implement better population health measures, and improve medical product design.


The Value of PGHD to Clinical Decision Making

Chart: how many Americans use smartwatches or fitness trackers

There is a substantial number of individuals wearing devices that collect health-related data outside of clinical settings1. There is also a substantial body of evidence suggesting that these data can inform medical decision-making2. PGHD can be used for a continuous view of patient health, reducing the impact of recall and recency biases in care, and allowing real-time monitoring for risk and exacerbations.

The ability to acquire more continuous data longitudinally is a significant advantage of PGHD, as opposed to the episodic ‘snapshot’ nature of traditional clinic-based measurements. First, the latter type of data acquisition may be biased due to the so-called ‘white coat syndrome’, which can falsely elevate heart rate, blood pressure, and even blood sugar levels. Measurements taken in environments that are more comfortable to patients are more likely to reflect clinically relevant values. Second, with longitudinal data, reliable trends and trajectories can be robustly inferred even when individual measurements are noisy3, enabling individuals to track their progress from baseline to recovery and providing a more personalized (and personal) definition of ‘better’.

Chart: PGHD data
By leveraging person-generated health data (PGHD), the vast majority of the information from patients' lived experience that had previously been invisible to clinicians can be used to uncover novel indicators of health

Real-time data acquisition and recording also reduce the impact of recall bias and increase accuracy of reporting. For example, a cell phone-based application designed to capture adverse events during cancer treatment was found to enhance the ability of patients to accurately communicate these events to stakeholders4.

PGHD integrated with machine learning algorithms can help identify patients that may be at risk for exacerbations. Subtle changes in measurable parameters may signal the risk of onset or exacerbations of chronic diseases. For example, changes in airflow values measured on a home spirometer may signal the onset of subacute exacerbations of cystic fibrosis or chronic obstructive pulmonary disease. Algorithms may be able to identify more subtle changes that may be imperceptible to the patients themselves, such as changes in sleep behavior or activity patterns5. And recurring patterns between chronic conditions and their triggers can be investigated through “n-of-1” or single-subject statistical techniques traditionally used in clinical medicine and psychology.

Challenges of PGHD: Too Much Information for Physicians?

PGHD undoubtedly has great promise6, but it has also presented health systems with a significant challenge: the huge increase in the volume and diversity of data risks overworking physicians7 and driving burnout8. An analogous situation occurs for wearers of devices that generate alerts; the phenomenon is sometimes referred to as ‘alert fatigue’ or ‘alarm fatigue’. Some users reported receiving so many alerts during the day that they began to ignore them or even disable the alarm. From the perspective of clinicians, the concern is that the number of alerts generated by PGHD may cause fatigue or even physician burnout. However, the issue seems to lie not with the data itself, but with the design of the systems through which the data is presented to the physician. Research has shown that use of PGHD can improve outcomes (quality of life, survival metrics) and lower health system burdens (reduced emergency department usage)9,10. Physicians aided by Patient-Reported Outcomes (PROs) and PGHD reported a greater sense of being able to practice medicine, freed from many constraints that had previously hampered the delivery of quality healthcare. Intelligent filtering of PGHD has contributed to reducing physician burnout: information is prioritized for display to the physician based on relevance, and decision support systems integrate and summarize data as appropriate.

Clinician burden is further exacerbated by the lack of interoperability and the inability to integrate this data into existing workflows and healthcare record systems. Simply put, interoperability refers to the ability of systems to communicate with one another, without requiring time-consuming and expensive copying and loading of data. The banking industry solved this problem years ago, with the result that financial transactions, even withdrawals and deposits, can be performed by anyone with a debit card anywhere in the world. Although no similar solution in the healthcare industry appears imminent, many PGHD systems do integrate with existing electronic health records. Coupled with decision support software, this integration permits seamless access to and interpretation of PGHD11.

Another non-trivial challenge with PGHD is determining the optimal use of this data and creating best practices for making actionable decisions that will be accepted by patients and clinicians. Understanding when and how to use PGHD is critical to building the trust necessary for widespread adoption and incorporation into clinical decision making. When the use of PGHD triggers an alert that is later seen to be unnecessary, it burdens the clinician and causes anxiety for the individual. Such errors are typically a result of optimizing for the sensitivity of a test (how reliably it catches true cases) at the expense of its specificity (how reliably it rules out non-cases; see sidebar). For example, a highly sensitive test lacking in specificity might lead to a false-positive alert on a mammogram or prostate-specific antigen test, resulting in unnecessary biopsies that are expensive and anxiety-generating for a patient. Such false positives are called Type 1 errors (see below).

A Type 1 error is not a property of a given datapoint, e.g., a heart rate measured at 150bpm (a normal resting heart rate for adults ranges from 60 to 100bpm). The classification as a false positive is instead the consequence of a decision made on that datapoint, e.g., send an alert when bpm reaches 150. That decision may take into consideration many other factors: what is normal for a population, what activity was performed during the measurement, how the measurement compares with that of the same individual in the past, whether the sensor can be considered reliable, and so on. For example, 150bpm may warrant concern if it is measured at rest or during sleep, especially over a prolonged period, but it may be perfectly fine during high-intensity activity for an athlete. The factors that go into the decision are the context that allows one to interpret the datapoint before generating an alert. The richer the context, the better the chance of interpreting the datapoint accurately.
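The context-dependent decision described above can be sketched as a simple rule. This is a minimal, illustrative sketch: the function name, the activity labels, and all thresholds are hypothetical, not part of any production alerting system.

```python
def should_alert(bpm, activity, duration_min, resting_bpm=70):
    """Decide whether a heart-rate datapoint warrants an alert.

    The same reading (e.g., 150 bpm) is interpreted differently
    depending on context: the activity state, how long the reading
    was sustained, and the individual's own baseline.
    All thresholds here are illustrative.
    """
    if activity == "exercise":
        # High rates are expected during high-intensity activity.
        return False
    if activity in ("rest", "sleep"):
        # Sustained readings far above the personal baseline at rest
        # are the concerning case described in the text.
        elevated = bpm > resting_bpm + 60
        sustained = duration_min >= 10
        return elevated and sustained
    return False
```

For instance, 150 bpm during exercise produces no alert, while the same 150 bpm sustained for 15 minutes at rest does.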

Yellow divider line


Type 1 and Type 2 errors explained

In medicine, a false-positive finding occurs when a test suggests that an individual has a disease, condition, or event, when in fact they do not. This is called a Type 1 error.

When a test fails to flag as positive patients who actually have a disease, this is a false-negative, or a Type 2 error. A useful analogy here is a music service which recommends songs based on what else you have listened to or liked. A service with high sensitivity, or true positive rate, is one that consistently recommends songs that you do like. A service with high specificity, or true negative rate, will consistently skip over songs that you have said you don’t like. Typically, sensitivity and specificity need to be balanced, e.g. you can have 100% sensitivity by simply recommending everything, but this will result in terrible specificity because all the songs you don’t like will also be recommended.

For example, suppose the music service tries to label 20 songs, half of which you actually like. It labels 8 of them as ones you like. Of the 10 songs you like, it labels only 6 of these correctly, so the service’s sensitivity is 6 out of 10, or 60%. Of the 10 songs you don’t like, it labels 8 of these correctly, so the service’s specificity is 80%. If it labeled all 20 songs as ones you’d like, its sensitivity would be 100% (10 out of 10) but its specificity would be 0% (0 out of 10). Sensitive medical tests are highly likely to identify a patient who has a particular disease. Specific medical tests are highly likely to identify a patient who does not have a particular disease. The price paid for a sensitive test that is not also specific is that it is likely to identify patients that do not in fact have a disease. This is a version of conviction of the innocent, or a Type 1 error. The price paid for a specific test that is not also sensitive is that it is likely to miss patients that do in fact have a disease. In other words, patients who actually have a disease may not be flagged as positive. These are called false-negative findings, or a guilty person walking free. This is called a Type 2 error.

An ideal medical test is one that is highly sensitive and highly specific. Because this is difficult to achieve in practice, in cases in which the cost of the consequences of missing a positive is higher than that of falsely identifying a negative, the test is made as sensitive as possible without sacrificing too much specificity.
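The arithmetic in the music-service example can be written out directly. This is a minimal sketch; the function and variable names are illustrative, not part of any standard library.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute sensitivity (true positive rate) and specificity
    (true negative rate) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# The music-service example: of 10 songs you like, 6 are flagged
# correctly (tp=6, fn=4); of 10 songs you don't like, 8 are skipped
# correctly (tn=8, fp=2).
sens, spec = sensitivity_specificity(tp=6, fn=4, tn=8, fp=2)
# sens = 0.6 (60%), spec = 0.8 (80%)
```

Labeling all 20 songs as liked corresponds to tp=10, fn=0, tn=0, fp=10, giving the 100% sensitivity and 0% specificity mentioned above.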

Yellow divider line

Contextualizing PGHD to Optimize Its Value

Below, we detail four ways that the contextualization of PGHD can help minimize Type 1 errors. These include:

  1. Direct-to-individual confirmation
  2. Personalization
  3. Data quality checks
  4. Triangulating context through multi-stream PGHD

1. Direct-to-individual confirmation

Connected devices offer the simplest method to increase confidence in an observation and reduce Type 1 errors — asking the patient directly. The following example illustrates how such after-the-fact, or ‘post hoc’, contextualization might occur in practice.

Imagine a situation where passively-collected data are analyzed in near-real-time, and a model predicts that an event is occurring or has recently occurred. The individual can be prompted to confirm or deny the event, and provide additional relevant data.

For example, if a smart scale is registering a 5-pound weight increase in a week, it could be a sign of life-threatening water retention (edema)12, but it might also be a different family member of similar weight using the scale. Confirming the weight measurement with the primary scale user is an easy way to disambiguate between the two situations. Similarly, heart rate measurements via older PPG sensors may have been inaccurate during a shower. Confirming that context with the user may prevent generating false alerts of tachycardia.
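The smart-scale example can be sketched as a small decision routine. This is an illustrative sketch only: the function name, return labels, and the 5-pound (~2.3 kg) threshold follow the example in the text but are otherwise hypothetical.

```python
def weight_alert_with_confirmation(prev_kg, curr_kg, confirm):
    """Flag a rapid weight gain, but ask the individual to confirm
    the reading before alerting (direct-to-individual confirmation).

    `confirm` is a callable that prompts the individual and returns
    True if they confirm the measurement is theirs. The ~2.3 kg
    (5 lb) per-week threshold mirrors the example in the text.
    """
    GAIN_THRESHOLD_KG = 2.3
    if curr_kg - prev_kg < GAIN_THRESHOLD_KG:
        return "no_alert"
    # Possible edema -- but it could also be another household member
    # of similar weight stepping on the scale. Confirm before escalating.
    if confirm("We noticed a weight increase this week. Was this reading yours?"):
        return "alert_clinician"
    return "discard_reading"
```

Routing the alert through a confirmation prompt disambiguates the two situations described above before any clinician is involved.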

Evidation app showing push notifications when a positive prediction is cast

While effective, this strategy must be used sparingly, as the confirmation prompts may be burdensome to individuals. Such prompts should be reduced in frequency either manually or automatically, using artificial intelligence techniques such as active learning and reinforcement learning.

2. Personalization

The idea of personalized medicine — that better outcomes can be achieved by getting the right treatment to the right patient at the right time — is now a familiar one. The same idea applies in the contextualization of PGHD to reduce false positives. In this scenario, whether a heart rate of 150 beats per minute is concerning or normal depends on the individual’s resting heart rate, or baseline.

For a given individual, a ‘z-score’ approach can help describe how far from typical (i.e., how unusual) a given observation is. Briefly, using an individual's ongoing data stream for a rolling baseline, and assessing the distributions of signals per individual, can lead to insight into what “normal” looks like for a given person. The signal from the patient can then be matched against this personalized normal range, and an alert can be triggered when the signal is out of range.
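The z-score approach can be sketched in a few lines. This is a minimal illustration, assuming a simple rolling window of recent readings; the function name and the threshold of 3 standard deviations are hypothetical choices, not a recommendation.

```python
import statistics

def z_score_alert(history, new_value, z_threshold=3.0):
    """Flag an observation that is unusually far from an individual's
    own rolling baseline, using a z-score.

    `history` holds that person's recent readings (e.g., resting
    heart rate over the past weeks). The threshold is illustrative.
    """
    baseline = statistics.mean(history)
    spread = statistics.stdev(history)
    z = (new_value - baseline) / spread
    return abs(z) > z_threshold, z

# A resting HR of 95 against a personal baseline of ~60 bpm with
# little day-to-day variation is far out of this person's range,
# even though 95 sits inside the population-level 60-100 bpm norm.
```

The key point is that "out of range" is defined against the individual's own distribution, not a population-wide cutoff.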

Chart that shows days since flu onset
Comparison of two cohorts13: one experiencing flu or influenza-like illness (ILI) with self-reported symptoms starting at day 0 (red), and one not reporting symptoms (light blue). The y axis shows the decrease in mobility (steps lost) for each day (x axis) due to flu infection. The step delta is computed at the individual level, based on the estimated mobility for that individual on a given day given their history, and then aggregated across the population.

In a similar example, data collected prior to surgical events can be used to examine the impact of a surgical event14. These retrospective data can be used to define what “normal” means for that individual and thereby provide personalized estimates of when that patient has recovered.

Such personalized contextualization is only possible with PGHD because the individual is collecting data continuously, including before an event (e.g., before they become ill). Medical devices are typically only supplied once a participant enrolls in a trial or receives a diagnosis; therefore, the data are only available after an event (post hoc), making it impossible to, for example, compare against a pre-event baseline. However, bring-your-own-device (BYOD) models can be facilitated at scale through platforms with direct connections to individuals, in order to capture the daily lived experiences of “patients in the wild” who are using consumer fitness trackers and disease management apps.

3. Data Quality Checks

Type 1 errors may be generated by inaccurate and biased data. For that reason, data quality checks can be a powerful tool to reduce false positives. In the Real World Data (RWD) world, data quality is the foundational premise of any analytics application. Data verification checks are the first step in processing RWD and generating Real World Evidence (RWE). These principles can be extended to cover PGHD, as demonstrated by recent work on the subject by a consortium of life science and digital health institutions convened by the Duke-Margolis Center for Health Policy15.

There are three major aspects of data verification:

Conformance — Is the format of the data as expected?
Understanding the data format is key to interpreting it accurately. An example of this is how to encode missing measurements vs. zero measurements. Some data formats (e.g., Apple Watch data in HealthKit) report only minutes with non-zero steps, implying that non-reported minutes should be interpreted as having zero step counts. In other cases (e.g., the Fitbit intraday API), step counts for every minute of the day are reported; therefore, any missing minute of data must be due to an error in transmission or data processing. Knowing the difference between the two formats is essential in order to interpret the data appropriately.
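The two conventions can be reconciled explicitly before analysis. The sketch below is illustrative: the function name, the "sparse"/"dense" labels, and the dict-of-minutes shape are assumptions, not the actual HealthKit or Fitbit payload formats.

```python
def normalize_minutes(data, fmt, minutes_in_day=1440):
    """Normalize per-minute step counts from two reporting conventions.

    fmt="sparse": only non-zero minutes are reported, so an absent
        minute means zero steps (the HealthKit-style convention).
    fmt="dense": every minute should be present, so an absent minute
        means missing data, likely a transmission or processing error
        (the Fitbit-intraday-style convention).
    `data` maps minute-of-day -> step count; the shape is illustrative.
    """
    out = {}
    for minute in range(minutes_in_day):
        if minute in data:
            out[minute] = data[minute]
        elif fmt == "sparse":
            out[minute] = 0       # absence means zero steps
        else:
            out[minute] = None    # absence means missing data
    return out
```

Conflating the two conventions would silently turn genuinely missing data into zeros, or vice versa, which is exactly the conformance error the paragraph warns about.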

Completeness — Is the missingness level tolerable?
Data completeness checks are very important for PGHD, given its nature of being collected in real-life conditions. Some missing data is to be expected: no individual can be expected to wear a sensor constantly without recharging the device, nor to complete every survey. Missing data is, in fact, often a behavioral feature and not a flaw — i.e., a person takes their device off when they shower each morning. How missing data is dealt with is important; simply discarding non-adherent data is no longer best practice. Instead, it is better either to include “missingness features” in a model (informative missingness)16 and/or to impute values that fill in gaps in the data using typical patterns for that individual17.
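Both recommended strategies, missingness indicators and per-individual imputation, can be combined in one pass. This is a minimal sketch under assumed inputs: `series` uses `None` for gaps and `typical` is a precomputed per-slot baseline for that individual; the names are hypothetical.

```python
def impute_with_missingness(series, typical):
    """Handle gaps in a person's series without discarding them.

    Returns (imputed_values, missing_flags): gaps are filled from
    that individual's typical pattern for the same slot, and a
    parallel indicator records where data was missing, so a model
    can learn from the missingness itself (informative missingness).
    """
    imputed, flags = [], []
    for value, fallback in zip(series, typical):
        if value is None:
            imputed.append(fallback)
            flags.append(1)   # was missing -- a feature, not just noise
        else:
            imputed.append(value)
            flags.append(0)
    return imputed, flags
```

The flags vector lets a downstream model treat "device off every morning" as signal rather than silently dropping those rows.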

Plausibility — Does the data make sense in the specific context of use?
Continuous collection of data provides a large volume of observations and makes it possible to confidently remove raw data points that are likely to derive from technical artifacts rather than actual behavior. An example is examining unprocessed consumer-grade wearable data to identify and remove anomalous patterns18 that are unlikely to represent actual vital signs.

Chart that shows daily steps
A daily step count of 70k that is not implausible at the daily level (think ultramarathon, left) becomes immediately implausible when scrutinized at the minute level (right), as a streak of exactly 200 steps/minute for more than 2 hours is more likely a sensor malfunction.
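A plausibility check for this kind of artifact can be sketched as a streak detector. The function name and the thresholds (200 steps/minute, 120 minutes) are illustrative, chosen to mirror the chart example, not a validated rule.

```python
def implausible_constant_streak(minute_steps, value=200, min_minutes=120):
    """Flag a long run of identical per-minute step counts.

    A 70k-step day is not impossible on its own, but a streak of
    exactly `value` steps every minute for two hours or more is far
    more likely a sensor malfunction than real walking.
    """
    run = 0
    for steps in minute_steps:
        run = run + 1 if steps == value else 0
        if run >= min_minutes:
            return True
    return False
```

This is the sense in which minute-level scrutiny catches what the daily aggregate hides: the check operates on the raw stream, not the day total.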

4. Triangulating context through multi-stream PGHD

Most wearable sensors do not measure a single data stream; most consumer-grade wearables monitor activity, heart rate, and sleep simultaneously. For this reason, a given signal can be contextualized relative to the other data streams collected for that individual. To return to the example of a high heart rate alert: if the sensor measuring the heart rate also has accelerometers, it can provide context about the activity performed during the measurement (for example, rest vs. high-intensity exercise). This can then be used to assess the likelihood of a true ‘high’ heart rate. Integration of multiple data streams to calculate derived metrics has also been done in other scenarios, such as to understand signs of cognitive impairment or chronic pain16.
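The heart-rate-plus-accelerometer example can be sketched as a two-stream rule. This is an illustrative sketch: the function name, the 140 bpm cutoff, and the accelerometer-magnitude threshold (in g, near 1 g at rest) are all hypothetical.

```python
def contextualized_hr_alert(bpm, accel_magnitude, rest_accel=1.1):
    """Use a second sensor stream to contextualize a heart-rate reading.

    If the accelerometer shows the wearer was moving vigorously, a
    high heart rate is expected and no alert is raised; the same
    reading at rest is flagged for review. Units and thresholds are
    illustrative (acceleration magnitude in g).
    """
    at_rest = accel_magnitude <= rest_accel   # near 1 g => little movement
    high_hr = bpm >= 140
    return high_hr and at_rest
```

Triangulating across streams this way suppresses the exercise-induced false positives that a single-stream threshold would generate.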

Contextualization in Action and Outlook for the Future

Increasingly, there are examples where these techniques are used in practice to ensure that the signal from PGHD is relevant, reliable, and consumable. The direct-to-individual approach was used in a two-stage detection model in a recent study of influenza-like infections, where data from a wearable sensor indicated that the individual was exhibiting symptoms of COVID-19 or an influenza-like illness. A survey was then deployed to the individual to inquire about other symptoms. This combination of subjective and objective data is often more informative than either one alone.

In a similar vein, as part of a program Evidation launched in 2021 with the American College of Cardiology (ACC) to support individuals’ heart health journeys19, individualized summary reports chart trajectories of the patient’s self-reported symptoms along with device data summaries. These patient-facing reports were developed with clinician input on utility, and can reduce reliance on qualitative, remembered data and help improve the patient-clinician interaction. Crucially, they can help answer the primary question of “what brings you in today, and how have you been doing since your last visit?”

Heart health daily report


In summary, PGHD from consumer-grade wearables and patient self-reported outcomes can provide a direct connection to patients’ lived experience, and is increasingly a mainstay of RWE. Now more than ever, it is imperative to understand the nuances involved in generating, collecting, and interpreting this novel form of data, and to build checks and safeguards to improve its quality and usability. Such contextualization has the potential to increase confidence in PGHD and popularize its use, which in turn can improve clinical decision making.


1. Vogels EA.
About one-in-five Americans use a smartwatch or fitness tracker
Pew Research Center
Published August 14, 2020
Accessed May 10, 2022

2. Ross C.
AI caught a hidden problem in one patient’s heart. Can it work for others?
Published April 26, 2021
Accessed May 10, 2022

3. Mindell D, Moss F.
How an inventor you’ve probably never heard of shaped the modern world
MIT Technology Review
Published September 5, 2016
Accessed May 10, 2022

4. Absolom K, Warrington L, Hudson E, et al.
Phase III randomized controlled trial of eRAPID: eHealth intervention during chemotherapy.
Journal of Clinical Oncology
2021;39(7):734-747. doi: 10.1200/JCO.20.02015
Epub 2021 Jan 8

5. Shapiro A, Marinsek N, Clay I, et al.
Characterizing COVID-19 and influenza illnesses in the real world via person-generated health data
Patterns
2020;2(1):100188. doi: 10.1016/j.patter.2020.100188

6. Saxon L, Skoll D.
We’ve entered a new era of streaming health care. Now what?
IEEE Spectrum
Published November 5, 2021
Accessed May 10, 2022

7. Jercich K.
How patient-generated data contributes to clinician burnout
Healthcare IT News
Published April 7, 2021
Accessed May 10, 2022

8. Ye J.
The impact of electronic health record-integrated patient-generated health data on clinician burnout
Journal of the American Medical Informatics Association
2021;28(5):1051-1056. doi: 10.1093/jamia/ocab017

9. Basch E, Deal A, Dueck A, et al.
Overall survival results of a trial assessing patient-reported outcomes for symptom monitoring during routine cancer treatment
Journal of the American Medical Association
2017;318(2):197-198. doi:10.1001/jama.2017.7156

10. Helwig A.
The impact of patient-reported outcome measures
RTI Health Advance
Published September 14, 2021
Accessed May 10, 2022

11. Jones J, Gottlieb D, Mandel J, et al.
A landscape survey of planned SMART/HL7 bulk FHIR data access API implementations and tools
Journal of the American Medical Informatics Association
2021;28(6):1284-1287. doi: 10.1093/jamia/ocab028

12. Editorial Team
Are swelling and sudden weight gain symptoms of heart failure?
Published December 9, 2019
Accessed May 10, 2022

13. Mezlini A, Shapiro A, Daza E, et al.
Estimating the burden of influenza on daily activity at population scale using commercial wearable sensors.

14. Ramirez E, Marinsek N, Bradshaw B, Kanard R, Foschini L.
Continuous digital assessment for weight loss surgery patients
Digital Biomarkers
2020;4:13-20. doi: 10.1159/000506417

15. Mahendraratnam N, Silcox C, Mercon K, et al.
Determining Real-World Data’s Fitness for Use and the Role of Reliability
Washington, DC. September 2019

16. Chen R, Jankovic F, Marinsek N, et al.
Developing measures of cognitive impairment in the real world from consumer-grade multimodal sensor streams
Association for Computing Machinery
2019:2145-2155. doi: 10.1145/3292500.3330690


17. Cheng L-F, Stuck D, Quisel TR, Foschini L.
The impact of missing data in user-generated health time series.
Semantic Scholar
Published 2017
Accessed May 10, 2022

18. Dunn J, Kidzinski L, Runge R, et al.
Wearable sensors enable personalized predictions of clinical laboratory measurements
Nature Medicine
2021;27:1105-1112

19. Evidation Health, American College of Cardiology
Announce Achievement for Heart Health, a Program to Help Individuals Monitor and Improve Cardiovascular Health
December 2020
Accessed May 10, 2022
