Suitability of Electronic Health Record Data for Computational Phenotyping of Diabetes Mellitus at Nairobi Hospital, Nairobi City County, Kenya


Amos Otieno Olwendo
George Ochieng
Kenneth Rucha


EHR; Computational Phenotyping; Data Quality; Density-based Clustering; Usability


This research aims to determine the applicability of routine healthcare data in clinical informatics research. One of the key areas of research in precision medicine is computational phenotyping from longitudinal Electronic Health Record (EHR) data. The objective of this research was to determine how the interplay of EHR software design, the use of a data dictionary, the process of data collection, and the training and motivation of the staff who collect and enter data into the EHR affects the quality of EHR data, and thus the suitability of such data for computational phenotyping of diabetes mellitus. This research employed a combined prospective and retrospective study design at the diabetes clinic in Nairobi Hospital. The first source of data was interviews with 32 staff (nurses, doctors, and health records officers) using a referenced, peer-reviewed usability questionnaire. Thereafter, a sample of EHR data collected during routine care between January 2012 and December 2016 was analyzed by examining the quality of clusters identified in the data using a density-based clustering algorithm and Statistical Package for Social Sciences (SPSS) version 21. Regression analysis shows that software design and the use of a data dictionary explained 50.7% and 32.3%, respectively, of the improvement in the suitability of EHR data for computational phenotyping of diabetes mellitus. In addition, the EHR software was rated useful (82%) in accomplishing users’ daily tasks. However, the EHR data were found to be unsuitable for computational phenotyping of diabetes. Although 88% of the EHR data were clustered as noise, the clustering algorithm identified a total of 23 clusters in the diabetes dataset. With improved quality of EHR data, sub-phenotyping tasks would be achievable. This research concludes that the poor quality of EHR data is a result of employees’ unmet intrinsic factors of motivation.
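To illustrate how a density-based clustering algorithm yields both a cluster count and a noise share, the following is a minimal sketch in Python. The abstract does not name the specific algorithm, so DBSCAN is assumed here as a representative; the points, `eps`, and `min_pts` values are synthetic and hypothetical, and do not come from the study's dataset.

```python
from math import dist

NOISE = -1  # label for points that belong to no dense region


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # core point: keep growing the cluster
    return labels


def summarize(labels):
    """Return (number of clusters, fraction of points labeled as noise)."""
    n_clusters = len({label for label in labels if label != NOISE})
    noise_fraction = labels.count(NOISE) / len(labels)
    return n_clusters, noise_fraction


# Synthetic toy data (an assumption): two dense groups plus four stray points.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 5), (20, 0), (0, 20), (-9, 7),
]
labels = dbscan(points, eps=1.5, min_pts=3)
n_clusters, noise_fraction = summarize(labels)
print(n_clusters, round(noise_fraction, 2))
```

A high noise fraction alongside a nonzero cluster count, as reported in the abstract, means that most records fall outside any dense region while the remainder still form distinct groups.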