David Carrell, PhD, is a GHRI clinical data scientist with a special interest in natural language processing.
So many major changes develop imperceptibly over time. Suddenly, one day we find that our children have crept past us in height. Another morning we notice that our city’s daily traffic has deteriorated from slow to gridlock. Life changes are so often an accumulation of tiny alterations rather than a single, large event. A challenge in clinical research is detecting the small steps that lead to major health changes.
Recently, researchers at Arizona State University used a powerful informatics approach to track the gradual cognitive decline of a person with Alzheimer’s disease. To detect changing cognitive ability, the scientists looked for small differences over time in the speech of a single person with Alzheimer’s: President Ronald Reagan. President Reagan was elected in 1980 and diagnosed with Alzheimer’s in 1994, after he left office. While he was president, all his official words, including unscripted exchanges during press conferences, were transcribed. That’s a rich data source and a lot of words to analyze.
To deal with all that text, the ASU scientists used an approach I use in my own work at Group Health Research Institute: natural language processing (NLP). NLP uses computational methods to extract data from free text so that it can be analyzed. In the ASU project, NLP was applied to news conference transcripts. The point of the project was not to confirm that President Reagan had Alzheimer’s. Instead, the researchers showed the power of NLP.
By comparing President Reagan’s unscripted speech with the language of President George H.W. Bush, who does not have Alzheimer’s, the researchers detected subtle changes in word use and grammar over time that indicated gradually declining cognitive function.
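To make the idea of tracking word use over time concrete, here is a minimal Python sketch. It is not the ASU team's method: the two measures (share of distinct words and rate of filler words), the `FILLERS` list, and the two toy transcripts are all assumptions invented for illustration.

```python
import re

# Hypothetical filler list, chosen for illustration only.
FILLERS = {"well", "um", "uh", "so", "basically"}

def lexical_features(text):
    """Return simple word-use measures for one transcript."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"n_words": 0, "unique_ratio": 0.0, "filler_rate": 0.0}
    return {
        "n_words": len(words),
        # Share of words that are distinct: a rough vocabulary-richness proxy.
        "unique_ratio": len(set(words)) / len(words),
        # Share of words that appear in the filler list.
        "filler_rate": sum(w in FILLERS for w in words) / len(words),
    }

def trend(values):
    """Least-squares slope of values against their position (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den

# Toy transcripts in chronological order (invented, not real speech).
transcripts = [
    "The budget proposal reduces federal spending across several agencies.",
    "Well, the budget thing, well, it cuts the spending, the spending on things.",
]
features = [lexical_features(t) for t in transcripts]
unique_slope = trend([f["unique_ratio"] for f in features])
filler_slope = trend([f["filler_rate"] for f in features])
print(features)
print(unique_slope, filler_slope)
```

In a real analysis, many transcripts per speaker would be needed, and the slopes for one speaker would be compared against those of a comparison speaker rather than read in isolation.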
The NLP work that my colleagues and I do is even more complex than working with press transcripts. Our data source is doctors’ notes (after removing information that might identify a patient), which include jargon, brand names, abbreviations, and content such as drug-warning “boilerplate” text that is copied and pasted into notes.
For example, in a project with Group Health’s Actuary and Underwriting department, my colleague David Cronkite (a master’s-level computational linguist) and I looked for suggestions of polyneuropathy (nerve damage that causes pain) in doctors’ notes. Gradually increasing polyneuropathy could indicate declining patient function, possible opioid use, or other medical situations that patients and their physicians want to monitor. We used NLP combined with machine learning, which uses statistical methods to identify when words are associated with something of interest. In the polyneuropathy project, we identified terms that suggested that a patient’s pain was getting more severe. We predicted with very high accuracy when a medical record had evidence of possible undiagnosed polyneuropathy. Although our project was just a test of the methods, we hope that eventually we’ll be able to flag patient records that hold hints of a developing condition so that a physician or medical record specialist can look more closely.
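As a rough illustration of how statistical methods can tie words to something of interest, the sketch below scores each word by a smoothed log-odds ratio: how much more likely it is to appear in notes labeled positive than in notes labeled negative. This is a simple stand-in, not the model used in the polyneuropathy project; the function names, the smoothing value, and the four toy notes are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a note and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def word_association(notes, labels, smoothing=0.5):
    """Smoothed log-odds that a word appears in positive vs. negative notes."""
    pos_docs = Counter()  # number of positive notes containing each word
    neg_docs = Counter()  # number of negative notes containing each word
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    for text, y in zip(notes, labels):
        (pos_docs if y else neg_docs).update(tokenize(text))
    scores = {}
    for word in set(pos_docs) | set(neg_docs):
        # Smoothing keeps both proportions strictly between 0 and 1.
        p = (pos_docs[word] + smoothing) / (n_pos + 2 * smoothing)
        q = (neg_docs[word] + smoothing) / (n_neg + 2 * smoothing)
        scores[word] = math.log(p / (1 - p)) - math.log(q / (1 - q))
    return scores

# Toy notes and labels (invented for illustration, not patient data).
notes = [
    "patient reports numbness and burning pain in both feet",
    "tingling and numbness in feet worse at night",
    "seasonal allergies, no pain reported",
    "follow up for blood pressure check",
]
labels = [True, True, False, False]

scores = word_association(notes, labels)
for word in sorted(scores, key=scores.get, reverse=True)[:5]:
    print(word, round(scores[word], 2))
```

Words with large positive scores, such as "numbness" in this toy set, appear mostly in the positive notes. A word like "pain", which shows up equally often in both groups, scores zero, which hints at why single keywords are not enough and context matters.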
Like the ASU researchers, I aim to find ways to use information we already have, such as electronic health records, in the service of health care. Conditions like Alzheimer’s or polyneuropathy signal their presence with such tiny clues that they can go undetected for years, even by patients and their doctors. NLP and machine learning could help us hear those signals in time to prevent or delay these slow-onset diseases.