Innovación IMFD collaborates with the Social Security Superintendency on a project that uses advanced NLP techniques

January 2025. One of the most interesting challenges in natural language processing (NLP) is making sense of free text. This is the challenge faced by the IMFD's Innovation and Technology Transfer Department in its collaboration with the Superintendency of Social Security (SUSESO), which focused on optimizing how citizens submit claims. The objective of this collaboration is to develop NLP models and evaluate how well they automatically classify the subjects, sub-subjects, and causes of complaints filed with SUSESO.

"When a person files a claim with the Superintendency of Social Security (SUSESO), they must choose its cause from a range of options. A user could mistakenly select the wrong option, which can lead to an unfavorable outcome at the end of the process," explains Hernán Sarmiento, IMFD Innovation Engineer and project leader. After this first step, the person fills in a free-text field called the "story," where they explain why they are complaining.

"The question is how we can take advantage of this free-text narrative to automatically assign one of the causes that the user should have selected. These narratives contain spelling mistakes, typos, and grammatical errors, so in this project we are experimentally evaluating whether we can train classification models and what percentage of the existing causes they can predict correctly," the engineer points out.

This initiative was one of seven components promoted by SUSESO under the name "Natural Language Processing for the Optimization of Complaint Filing," which seeks not only to streamline the process but also to enable faster, more accurate responses for people seeking solutions to their problems.

SUSESO is the autonomous state agency responsible for enforcing social security regulations and ensuring respect for people's rights, especially those of workers, pensioners, and their families. This was the first time the organization worked on data science and artificial intelligence solutions, in this case natural language processing, as part of its efforts to improve service and modernize its processes.

Key findings

The project was divided into three key stages: exploratory data analysis, in which complaints were reviewed to identify patterns; linguistic characterization of the stories, using NLP tools; and, finally, training and validation of models, in which artificial intelligence models were trained to automate the classification of claims.

In Stage III, once the models had been trained, they were evaluated on a test set of one thousand stories covering 17 causes. Different language models were compared to determine how well each could predict a cause from a story.

The main results indicate that it is possible to build language models that learn from the stories submitted with claims, and that doing so can improve the correct classification of these claims or complaints into a specific cause.

Hernán Sarmiento points out that "the fact that we can take existing language models and adapt them to SUSESO's stories can improve classification more than tenfold compared to selecting a cause at random." He adds that "with this finding, we can say that the narratives do contain very domain-specific language that allows classification to be improved automatically."

Innovation at the service of society 

"This is about making something that sometimes seems distant, such as artificial intelligence, available in a concrete way. Innovation is arriving, and it means improving the experience of the people who come knocking on our door to resolve complaints. We are very happy with the results of the project, because it can have an impact on citizens and is highly technological," said Pamela Gana, Superintendent of SUSESO.

The IMFD team behind this project is led by Hernán Sarmiento, IMFD Innovation Transfer Engineer, with Francisca Cona and Camila Henríquez as data scientists, and Jocelyn Dunstan, an academic in the UC Department of Computer Science who also works at the UC Institute for Mathematical and Computational Engineering and is a researcher at AC3E and IMFD.

"It was a project that addressed a need and a problem common to citizens. Working with SUSESO allows us to address real problems in society through scientific methods and experimentation, all with the aim of improving certain processes," Sarmiento emphasizes.

For the IMFD, this project is part of a series of initiatives that seek to apply advanced data science research developed in Chilean academia to problems that affect our society, in line with the objectives of the institute and its innovation department.