Datalytics logo

Success Stories

SURA: Generative AI to find new ways of managing health

At Datalytics, we spent six months developing a generative AI solution, built on OpenAI technology, that allows healthcare professionals to use data to diagnose diseases and understand the genetic information of populations.

Client: SURA, a leading insurance provider in LATAM with over 10,000 employees across the region and more than 80 years of history.

Technology: Azure OpenAI, Azure Bot Service, Azure Cosmos DB, Azure Functions, Azure App Service, Storage Account, Azure Container Instances

Time: Six months.

SURA is the leading insurance company in Colombia and one of the most important in LATAM. It has over 80 years of history and currently holds health data on more than 5 million affiliated individuals.

In 2023, SURA inaugurated its Omics Science Center in the city of Medellín. From there, they focus on exploring people’s DNA and analyzing the population’s genomic data to predict, prevent, and diagnose diseases, as well as to discover new ways to manage health. The goal is to develop personalized medicine, empowering patients to take control of their well-being and make data-driven decisions.

Omics sciences, including genomics, generate a massive amount of information about human DNA that must be processed before it can be understood.

 

How are genetic data processed?

DNA is extracted from a blood sample, which holds a great deal of information about the individual. The sample is then run through sequencing equipment, which outputs the DNA as a long string written in four letters (A, C, G and T), one for each type of nucleotide base.

A single human genome contains about three billion of these letters. That is an almost unmanageable amount of data, which must be processed and analyzed to determine, for example, whether a patient has a disease, why they have it, whether there is a genetic explanation for it, and which diseases they may develop in the future.
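For a rough sense of scale, the short calculation below (an illustration, not part of the SURA solution) estimates how much storage the roughly three billion letters of one genome take, assuming either one byte per letter or a compact 2-bit code per letter. Raw sequencer output is considerably larger than either figure, because each position is read many times and stored with quality scores.

```python
# Rough storage estimate for one human genome of ~3 billion letters.
GENOME_LETTERS = 3_000_000_000  # approximate count of A/C/G/T letters


def genome_size_bytes(bits_per_letter: int) -> int:
    """Bytes needed to store one genome at a given encoding density."""
    return GENOME_LETTERS * bits_per_letter // 8


as_text = genome_size_bytes(8)  # one ASCII character per letter
packed = genome_size_bytes(2)   # four letters fit in 2 bits (00, 01, 10, 11)

print(f"As plain text: {as_text / 1e9:.1f} GB")  # As plain text: 3.0 GB
print(f"Packed 2-bit:  {packed / 1e6:.0f} MB")   # Packed 2-bit:  750 MB
```

Even in its most compact form, a single genome is far beyond what a clinician could read, which is why interpretation is left to specialists.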

The interpretation of this data is highly complex and cannot be done by just any healthcare professional. Only specialists in genetics, molecular biology, and related fields have the knowledge to carry out this task.

Once these professionals have analyzed the data, they produce a PDF report that gives a clinical and biological interpretation of the information. These reports are so technical and specific that non-specialist doctors can find them difficult to interpret. Even so, they are very valuable, because they contain curated information about the patient.

Therefore, to begin understanding the population (whether there are relationships between variants, how age and habits influence health, and so on), this information would need to be cross-referenced. In practice, that was very complicated, because the reports came in more than 500 different formats.

Genomic analysis process

Challenges  

“The information in the genomic reports comes in very heterogeneous formats. So, searching for something in them would require a lot of time and dedication. That’s where we found a solution together with our partner Datalytics. We decided to use generative models to extract information from the PDFs easily and transfer it to a standardized database,” explains Catalina Bustamante, head of technology at SURA’s Omics Science Center.

From this situation, the main challenges were:

  • Processing large volumes of data.
  • Interoperability: genetic reports had to be combined with medical history data.
  • A large amount of unstructured but highly relevant information.
  • Difficulty interpreting the reports.

 

Strategy: What did we do?

The goal was to automatically extract information from more than 10,000 PDFs based on the variables we needed and compile it into a standardized report.
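A minimal sketch of this extraction step, assuming the report text has already been pulled out of the PDF and that the model is asked to answer in JSON. The `VariantRecord` fields, the prompt wording and the helper names are hypothetical (the variables SURA actually extracts are not published), and the call to the Azure OpenAI deployment is left out so the code runs offline. What it illustrates is the contract: a fixed set of keys requested from the model, and validated records coming out.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical target schema: field names are illustrative, not SURA's.
@dataclass
class VariantRecord:
    report_id: str
    gene: Optional[str]
    variant: Optional[str]
    classification: Optional[str]
    condition: Optional[str]


FIELD_NAMES = [f.name for f in fields(VariantRecord)]


def build_extraction_prompt(report_text: str) -> str:
    """Ask the model for one JSON object with exactly the schema's keys."""
    keys = ", ".join(FIELD_NAMES)
    return (
        "Extract the following fields from the genetic report below and "
        f"answer with a single JSON object with exactly these keys: {keys}. "
        "Use null for any field the report does not state; do not infer values.\n\n"
        f"REPORT:\n{report_text}"
    )


def parse_model_output(raw: str) -> VariantRecord:
    """Validate the model's JSON answer before it reaches the database."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [k for k in FIELD_NAMES if k not in data]
    extra = [k for k in data if k not in FIELD_NAMES]
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={missing}, extra={extra}")
    return VariantRecord(**data)


# Made-up model answer (no API call is made here).
raw_answer = (
    '{"report_id": "R-001", "gene": "BRCA1", "variant": "c.68_69del", '
    '"classification": "Pathogenic", "condition": null}'
)
record = parse_model_output(raw_answer)
print(record.gene, record.condition)  # BRCA1 None
```

Rejecting answers with missing or extra keys keeps a malformed model response from silently entering the standardized database; such reports could be retried or sent for manual review.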

“Searching these databases requires technical knowledge of genetics, which is uncommon among clinical staff. However, they do have a clear understanding of what kind of questions to ask and the terms to use. In this sense, we designed a bot using generative AI that allows natural language queries to be made to the database,” adds Bustamante.

Together with the SURA team, Datalytics developed a solution based on a private Azure OpenAI instance, fed with information from patients’ genetic reports. Moving forward, we aim to integrate medical history, radiology, and other data.

The result is a private, access-restricted chat within Microsoft Teams that uses generative AI to query the data. Physicians interact with it in natural language, which makes genomic data easier to access and understand.

To achieve this:

  • We standardized the data: we used a generative AI-powered strategy that normalizes the different types of data received so that machines can interpret them. From there, genetic data could be queried just like any medical history.
  • We used GPT to access the extracted, curated information stored in a database. This makes it possible to ask natural language questions such as: “What is the patient’s family medical history?” GPT not only provides easy access to all available information but also summarizes it without adding potentially erroneous interpretations.
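To illustrate what normalizing means in practice, the sketch below maps differently worded variant classifications onto a single vocabulary, the five ACMG/AMP categories. It is a plain lookup table, not the generative AI strategy described above. The synonym list is hypothetical, and a label the table does not recognize returns `None`, which in a pipeline like this one could be routed to the model or to a specialist for review.

```python
import unicodedata
from typing import Optional

# Target vocabulary: the five ACMG/AMP variant classification categories.
CANONICAL = {
    "pathogenic": "Pathogenic",
    "likely pathogenic": "Likely pathogenic",
    "uncertain significance": "Uncertain significance",
    "likely benign": "Likely benign",
    "benign": "Benign",
}

# Hypothetical synonyms (English, Spanish, abbreviations), keyed after folding.
SYNONYMS = {
    "p": "pathogenic",
    "patogenica": "pathogenic",
    "lp": "likely pathogenic",
    "probablemente patogenica": "likely pathogenic",
    "vus": "uncertain significance",
    "variant of uncertain significance": "uncertain significance",
    "variante de significado incierto": "uncertain significance",
    "lb": "likely benign",
    "probablemente benigna": "likely benign",
    "b": "benign",
    "benigna": "benign",
}


def fold(label: str) -> str:
    """Lower-case, collapse whitespace and drop accents ('Patogénica' -> 'patogenica')."""
    text = " ".join(label.lower().split())
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def normalize_classification(label: str) -> Optional[str]:
    """Return the canonical category, or None if the label is not recognized."""
    key = fold(label)
    key = SYNONYMS.get(key, key)
    return CANONICAL.get(key)


for raw in ["Patogénica", "VUS", "Likely  Pathogenic", "see annex"]:
    print(repr(raw), "->", normalize_classification(raw))
```

Once every report uses the same labels, a question such as “how many pathogenic variants” becomes a simple filter instead of a search across hundreds of wordings.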

 

Physicians who are not specialists in genetics can ask questions naturally, such as: “How many patients have variant X and also suffer from pathology Y?” or “How many patients have a parent younger than 40 with disease Z?”

GPT is crucial because it translates these questions into database queries, sparing geneticists from having to write and run them manually. Once the query has run, the AI agent composes a natural language answer for the physician.
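The question-to-query step can be sketched as follows, under stated assumptions: SQLite stands in for the production store (the technology stack lists Azure Cosmos DB), the table and column names are hypothetical, and the SQL that a GPT model would generate from the physician’s question is hard-coded so the example runs offline. The guard rejects anything other than a single `SELECT` and switches the connection to SQLite’s `query_only` mode before running model-written SQL.

```python
import sqlite3

# Hypothetical schema standing in for the standardized database.
SCHEMA = """
CREATE TABLE variants  (patient_id TEXT, gene TEXT, variant TEXT);
CREATE TABLE diagnoses (patient_id TEXT, diagnosis TEXT);
"""


def run_generated_query(conn: sqlite3.Connection, sql: str) -> list:
    """Run model-written SQL only if it is a single read-only SELECT."""
    statement = sql.strip().rstrip(";").strip()
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only a single SELECT statement is allowed")
    conn.execute("PRAGMA query_only = ON")  # SQLite now refuses any write
    return conn.execute(statement).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?)",
    [("p1", "GENE1", "X"), ("p2", "GENE1", "X"), ("p3", "GENE2", "Z")],
)
conn.executemany(
    "INSERT INTO diagnoses VALUES (?, ?)",
    [("p1", "Y"), ("p3", "Y")],
)
conn.commit()

# Question: "How many patients have variant X and also suffer from pathology Y?"
# A GPT model would write this SQL from the question plus the schema;
# it is hard-coded here so the example runs without an API call.
generated_sql = """
SELECT COUNT(DISTINCT v.patient_id)
FROM variants AS v
JOIN diagnoses AS d ON d.patient_id = v.patient_id
WHERE v.variant = 'X' AND d.diagnosis = 'Y';
"""
print(run_generated_query(conn, generated_sql))  # [(1,)]
```

Keeping generated queries read-only is a common precaution when a language model writes SQL against clinical data. The rows returned would then be passed back to the model to phrase as the natural language answer the physician sees.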

GenAI-assisted queries


Results

Through the chat, healthcare staff (geneticists and molecular biology professionals) can ask all kinds of open-ended questions in natural language, in multiple languages. They also receive an integrated natural language answer that summarizes the findings and presents them alongside the associated documents.

Fredy Cuervo, a molecular biology professional at SURA’s Omics Science Center, said: “This agent allows us to analyze the genetic variants we encounter daily and to extract information quickly and accurately. It helps us identify which disease a variant is related to and how pathogenic it is, and it speeds up our analysis, giving us faster access to information for research purposes.”

“This tool supports healthcare professionals and researchers in comparing information between patients and, most importantly, helps consolidate population data to produce research and create new services and products that contribute to people’s well-being,” concludes Carlos Andrés Agudelo, manager of SURA’s Biosciences department.

This article was originally written in Spanish and translated into English with ChatGPT.
