Big Data Study Reveals Possible Subtypes of Type 2 Diabetes

In recent years, there’s been a lot of talk about how “Big Data” stands to revolutionize biomedical research. Indeed, we’ve already gained many new insights into health and disease thanks to the power of new technologies to generate astonishing amounts of molecular data—DNA sequences, epigenetic marks, and metabolic signatures, to name a few. But what’s often overlooked is the value of combining all that with a more mundane type of Big Data: the vast trove of clinical information contained in electronic health records (EHRs).

In a recent study in Science Translational Medicine [1], NIH-funded researchers demonstrated the tremendous potential of using EHRs, combined with genome-wide analysis, to learn more about a common, chronic disease—type 2 diabetes. Sifting through the EHR and genomic data of more than 11,000 volunteers, the researchers uncovered what appear to be three distinct subtypes of type 2 diabetes. Not only does this work have implications for efforts to reduce this leading cause of death and disability, but it also provides a sneak peek at the kind of discoveries that will be made possible by the new Precision Medicine Initiative’s national research cohort, which will enroll 1 million or more volunteers who agree to share their EHRs and genomic information.

In the latest study, a research team, led by Li Li and Joel Dudley of the Icahn School of Medicine at Mount Sinai, New York, started with EHR data from a racially and socioeconomically diverse cohort of 11,210 hospital outpatients. Of these volunteers, 2,551 had been diagnosed with type 2 diabetes, which is the most common form of diabetes.

Without focusing on any particular disease or condition, the researchers first sought to identify similarities among all participants, based on their lab results, blood pressure readings, height, weight, and other routine clinical information in their EHRs. The approach was similar to building a social network with connections forged not on friendships but on medical information. When the resulting network was color-coded to reveal participants with type 2 diabetes, an interesting pattern emerged. Instead of being located in one large clump on this “map,” the points indicating people with type 2 diabetes were actually grouped into several smaller, distinct clusters, suggesting the disease may have subtypes.
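For readers curious how a "social network" of patients might be built, here is a minimal sketch. It is a simplified stand-in, not the researchers' actual method: it assumes each patient is a list of numeric clinical measurements, standardizes each measurement so that values in different units are comparable, links every patient to their k most similar patients (the parameter `k` and all function names here are hypothetical), and treats each connected group of patients as a cluster. It then counts how many separate clusters contain people with type 2 diabetes.

```python
import math


def zscore_columns(rows):
    """Standardize each feature so lab values in different units are comparable."""
    n_features = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
    stds = []
    for j in range(n_features):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero for constant features
    return [[(r[j] - means[j]) / stds[j] for j in range(n_features)] for r in rows]


def knn_edges(points, k):
    """Connect each patient to the k patients at the smallest Euclidean distance."""
    edges = set()
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def connected_components(n, edges):
    """Label each patient with a cluster id using union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n)]


def diabetes_clusters(records, k=2):
    """records: list of (feature_vector, has_t2d). Returns the number of
    distinct clusters that contain at least one patient with type 2 diabetes."""
    points = zscore_columns([features for features, _ in records])
    labels = connected_components(len(points), knn_edges(points, k))
    return len({labels[i] for i, (_, has_t2d) in enumerate(records) if has_t2d})


# Hypothetical toy data: (glucose mg/dL, BMI) for twelve made-up patients.
records = [
    ((200, 35), True), ((202, 36), True), ((198, 34), True),
    ((150, 22), True), ((152, 23), True), ((148, 21), True),
    ((250, 28), True), ((252, 29), True), ((248, 27), True),
    ((90, 24), False), ((92, 25), False), ((88, 23), False),
]
print(diabetes_clusters(records))  # → 3
```

In this toy example the patients with diabetes fall into three separate groups rather than one clump, mirroring the kind of pattern described above. A real analysis would use many more features, far larger cohorts, and more careful choices of distance measure and network construction.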

To take a closer look, the researchers rebuilt the network to include only participants with type 2 diabetes. They then reanalyzed the EHRs based on 73 clinical characteristics, including gender, glucose levels, and white blood cell counts. That work confirmed that there were three distinct subtypes of type 2 diabetes among study participants.