Mobile wearable devices and apps have created new pathways for individuals to collect health and well-being data outside the point of care and over time. However, data quality issues, such as missing data caused by collection in free-living conditions, may reduce the utility of the data and have severely limited the adoption of consumer-generated data in clinical settings. In this work, we take a first step toward quantifying the impact of data missingness in mHealth time series and propose ways to mitigate it via imputation. First, we compare the performance of different imputation strategies in reconstructing known portions of mHealth time series. Second, we investigate the benefits of performing imputation as a pre-processing step in a classification pipeline that uses a multi-task convolutional neural network (MT-CNN) to predict self-reported chronic conditions from the mHealth time series.
Imputing data can improve the performance of classifiers when dealing with missing data.
We additionally study how classification performance changes as the missing-data rate in the mHealth time series is artificially increased. We find that imputers based on Gaussian Processes (GP) outperform simpler baselines when they include time kernels that allow learning user-specific behavioral patterns. We also observe that the MT-CNN classifiers are highly robust to missingness in the data and may benefit only marginally from imputation as a pre-processing step.
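The reconstruction experiment described above (hide known values, impute them, and score the imputer against the hidden truth at increasing missing rates) can be illustrated with a minimal sketch. It is not the pipeline used in this work: the data are a synthetic daily sinusoid, the two imputers are simple baselines (global mean fill and linear interpolation) rather than GP-based models, and all function names are hypothetical.

```python
import numpy as np

def mask_at_random(series, rate, rng):
    """Hide a random fraction `rate` of entries; return the corrupted copy and the mask."""
    mask = rng.random(series.shape[0]) < rate
    corrupted = series.astype(float).copy()
    corrupted[mask] = np.nan
    return corrupted, mask

def impute_mean(series):
    """Baseline: replace every missing entry with the mean of the observed entries."""
    filled = series.copy()
    filled[np.isnan(filled)] = np.nanmean(series)
    return filled

def impute_linear(series):
    """Baseline: linearly interpolate between the nearest observed neighbors."""
    idx = np.arange(series.shape[0])
    observed = ~np.isnan(series)
    return np.interp(idx, idx[observed], series[observed])

def reconstruction_mae(truth, imputed, mask):
    """Mean absolute error computed only on the artificially hidden entries."""
    return float(np.mean(np.abs(truth[mask] - imputed[mask])))

# Synthetic stand-in for an hourly mHealth signal with a daily rhythm (assumption).
rng = np.random.default_rng(0)
t = np.arange(24 * 14)
truth = 50 + 30 * np.sin(2 * np.pi * t / 24)

# Score each imputer at several artificially induced missing rates.
results = {}
for rate in (0.1, 0.3, 0.5):
    corrupted, mask = mask_at_random(truth, rate, rng)
    results[rate] = {
        "mean": reconstruction_mae(truth, impute_mean(corrupted), mask),
        "linear": reconstruction_mae(truth, impute_linear(corrupted), mask),
    }

for rate, scores in results.items():
    print(rate, scores)
```

Because the error is computed only on entries that were hidden on purpose, the ground truth is always known, which is what makes a comparison across imputers and missing rates possible.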