Large language models and knowledge graphs: solutions for health data management
May 2024. A proposal that combines large language models with knowledge graphs for clinical data: this is the innovation put forward in the article Augmented non-hallucinating large language models as medical information curators, published in NPJ Digital Medicine by researchers Stephen Gilbert and Jakob Nikolas Kather from the Else Kröner Fresenius Center for Digital Health at Dresden University of Technology, Germany, together with Aidan Hogan, deputy director of the Millennium Institute for Foundational Research on Data and academic in the Department of Computer Science at the University of Chile.
Medical data presents a significant challenge for current data science: it is complex, must necessarily be protected, and its efficient management faces a number of obstacles. All this complexity has driven the search for alternatives that would enable its application and allow this valuable information to be used by healthcare providers for the benefit of their patients.
Large language models (LLMs) —very large deep learning models trained on large amounts of data, the basis for applications such as ChatGPT—can contribute significantly to better structuring, categorization, and interpretation of medical information. However, they have weaknesses that hinder their use in an area as critical as healthcare. The generation of plausible but incorrect information (hallucinations), and the tendency to give different answers to the same query, make their use in medicine very complex. Knowledge graphs, on the other hand, are a way of structuring large amounts of information that may come in various formats or involve many variables, using nodes and the edges that connect the data.
Researchers at the EKFZ for Digital Health at TU Dresden and the University of Chile propose a possible solution to this problem in the article Augmented non-hallucinating large language models as medical information curators: the combination of LLMs with knowledge graphs (KGs). This gives rise to a new form of retrieval-augmented generation (RAG), which would make the models more reliable, more robust, and capable of giving reproducible answers to queries.
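The idea of grounding a model in curated facts can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' implementation: the toy knowledge graph, the `retrieve_facts` and `build_prompt` functions, and the prompt wording are all assumptions made for the example. In a real system, the constructed prompt would be passed to an LLM API.

```python
# Hypothetical sketch of knowledge-graph-grounded retrieval-augmented
# generation (RAG): retrieve curated facts about an entity, then
# constrain the model's answer to those facts via the prompt.
KG = {
    ("COVID-19", "has_symptom", "fever"),
    ("COVID-19", "has_symptom", "cough"),
}

def retrieve_facts(entity):
    """Look up every triple about `entity` in the knowledge graph."""
    return [f"{s} {p.replace('_', ' ')} {o}"
            for s, p, o in sorted(KG) if s == entity]

def build_prompt(question, entity):
    """Ground the prompt in retrieved facts to curb hallucination."""
    facts = "\n".join(retrieve_facts(entity))
    return (f"Answer using ONLY these verified facts:\n{facts}\n\n"
            f"Question: {question}")

print(build_prompt("What are the symptoms of COVID-19?", "COVID-19"))
```

Because the answer must be drawn from retrieved triples rather than generated freely, the same query over the same graph yields the same grounded context every time, which is what makes the output reproducible.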
The reliable recording of medical information and its exchange between different systems (interoperability) is a major challenge in healthcare and is often referred to as the "communication problem" in medicine. Medical ontologies and knowledge graphs (KGs) are approaches to solving this problem. Medical ontologies function as dictionaries of medical terms that help categorize and define medical concepts. However, since terms in human language can have different meanings depending on the context, these ontologies are often ambiguous. The word "cold," for example, can refer to body temperature, environmental conditions, or the common cold. This ambiguity exists in every language, and there are further differences between disciplines within healthcare. The use of acronyms is another major challenge in this field: COLD can also stand for "chronic obstructive lung disease."
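The ambiguity problem described above can be made concrete with a toy disambiguation sketch. The candidate senses and their cue words below are purely illustrative assumptions, not drawn from any real medical ontology; the point is only that context is needed to choose among senses of the same term.

```python
# Toy word-sense disambiguation for an ambiguous term like "cold"/"COLD":
# pick the sense whose cue words overlap most with the surrounding context.
# Senses and cue words are illustrative, not a real ontology.
SENSES = {
    "chronic obstructive lung disease": {"smoking", "dyspnea", "copd", "lungs"},
    "common cold": {"sneezing", "runny", "nose", "virus"},
    "low temperature": {"weather", "winter", "freezing", "room"},
}

def disambiguate(context):
    """Return the sense sharing the most cue words with the context."""
    words = set(context.lower().split())
    return max(SENSES, key=lambda sense: len(SENSES[sense] & words))

print(disambiguate("patient with a history of smoking presents with dyspnea"))
# 'chronic obstructive lung disease'
```

A real ontology-backed system would of course use far richer context than keyword overlap, but the sketch shows why a flat dictionary of terms is not enough on its own.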
Knowledge graphs (KGs) are organized networks that connect different medical concepts and their relationships. For example, the term "COVID-19" in a graph could be connected to "fever" through a link called "has symptom." Graphs facilitate the understanding and processing of medical information, but they face challenges similar to those of medical ontologies.
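The node-and-link structure just described is commonly represented as (subject, predicate, object) triples. The following is a minimal sketch of that representation, using the article's own COVID-19 example; the `objects` helper is a hypothetical name introduced for illustration.

```python
# Minimal knowledge-graph sketch: each fact is a
# (subject, predicate, object) triple, e.g. the "has symptom" link above.
triples = {
    ("COVID-19", "has_symptom", "fever"),
    ("COVID-19", "has_symptom", "cough"),
    ("COVID-19", "is_a", "infectious disease"),
}

def objects(subject, predicate):
    """Return every object linked to `subject` via `predicate`."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

print(objects("COVID-19", "has_symptom"))  # ['cough', 'fever']
```

Production systems use standards such as RDF and query languages such as SPARQL for the same idea, but the underlying data model is this set of labeled links between concepts.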
Combination for structured reasoning
To remedy these shortcomings, the researchers in Dresden and Santiago de Chile propose combining LLMs with KGs, leveraging their respective strengths. This combination provides structured reasoning and could help reduce model bias and deliver more reliable, accurate, and reproducible results. These approaches would also be more compatible with regulatory approval pathways than LLMs alone.
"The combination of large language models and knowledge graphs is a way to link existing medical knowledge with the cognitive capabilities of large language models. We are only at the beginning of a very exciting development, "says Professor Jakob N. Kather, Chair of Clinical Artificial Intelligence at Dresden University of Technology and oncologist at Carl Gustav Carus University Hospital in Dresden.
In the research, the authors discuss different approaches to combining LLMs with KGs. They suggest that this could also facilitate the development of robust "digital twins" of patients, in the form of structured individual medical records that enable personalized diagnosis.
"Although regulatory challenges remain, healthcare professionals graduating today can anticipate access to compatible, advanced clinical information summarization tools that were unimaginable just five years ago. Furthermore, approaches that combine large language models with knowledge graphs are more likely to achieve early approval in conservative regulatory pathways," says Professor Stephen Gilbert, Chair of Regulatory Science for Medical Devices at the Technical University of Dresden.
For Aidan Hogan, "Like LLMs, KGs have applications not only in medicine, but also in many aspects of society that increasingly depend on data capture and processing. To make better decisions today, we must learn from the past, and the digital past is data. But integrating data comes at a higher cost, limiting this virtual perspective of the past. In this context, LLMs help integrate large-scale text data, while KGs help integrate large-scale structured data. Both approaches are complementary, and their combination can have many applications in different areas where data-driven decisions are made."

Interoperability of clinical data
The interoperability of clinical data is a major issue worldwide, and Chile is no exception. There have recently been legislative advances in this regard, aimed at promoting the portability of patient data and medical records between different healthcare institutions: the idea is that when a patient visits any clinic, hospital, or healthcare center, their medical history can be shared in digital format, accurately and completely, in compliance with applicable privacy regulations and ethical standards, and without errors or omissions that could even prove fatal. "Currently, LLMs have a great capacity to integrate text from different sources, but they do not have the necessary accuracy for this use case. So we propose that LLMs should be combined with other methods—in particular KGs—so that they can be used in the context of clinical data interoperability and other medical applications," Hogan points out.
