The Nuestra MemorIA project will recover information from human rights archives using artificial intelligence.

June 2024. Harnessing the transformative power of artificial intelligence (AI) to manage, analyze, and interpret historical archives from the Chilean dictatorship (1973-1990): this is the goal of the Nuestra MemorIA project of the Millennium Institute Foundational Research on Data (IMFD), led by Jocelyn Dunstan Escudero, academic at DCC and IMC UC; Juan , academic at DCC and IMC UC and director of IMFD; Camila Díaz, executive director of IMFD; Antonia Fonck, UC History Institute and IMFD research assistant; and new IMFD collaborating researchers Domingo Mery, DCC UC academic, and Hugo Rojas, director of Legal Sciences at U. Alberto Hurtado and expert in transitional justice.

Juan , Jocelyn Dunstan, Antonia Fonck, and Domingo Mery

Driven by the nearly impossible task of manually consolidating and scrutinizing thousands of scanned documents, old photographs, and audio recordings scattered across various archives and collections, Nuestra MemorIA was born as a transdisciplinary project aimed at developing innovative techniques specifically designed for the analysis of historical documents from the dictatorship. This process incorporates the experience and research methods of people working in human rights, the humanities, and the social sciences.

The ultimate goal is to integrate fragmented information that will help reconstruct historical knowledge, thereby supporting the work of historians and social scientists.

Nuestra MemorIA has the support of the Museum of Memory and Human Rights, the National Institute of Human Rights (INDH), the Vicaría de la Solidaridad, the Committee for the Prevention of Torture, and the Undersecretariat of Human Rights. The project reports on its progress in the podcast Nuestra MemorIA, available on all listening platforms, on Instagram at instagram.com/nuestramemoria.cl, and on Twitter/X at x.com/nuestramemo

Nuestra MemorIA Seminar

The initiative was presented on Tuesday, June 25, at a seminar attended by Pedro Bouchon, Vice-Rector for Research at UC; Loreto Valenzuela, Dean of Engineering at UC; Patricio Bernedo, Director of the UC Center for Dialogue and Peace; Felipe Mallea, Head of the Department of Studies at the National Migration Service; the team behind the project; and nearly 100 participants.

The research team faces several challenges. First, although the information exists, it is scattered among the various organizations that have worked to preserve it. Then there is the volume: the Vicaría de la Solidaridad alone holds more than 85,000 records, and the Museo de la Memoria holds almost three times as many. The Judiciary has more than 10 million pages, in addition to the information collected by various commissions, the National Archives, the National Migration Service, universities, and other organizations.

"What do we do with everything that is scattered, stored in so many agencies?" asked Hugo Rojas, who is also a researcher at the Millennium Institute . "This cannot be lost; it must endure for future generations," he added: these archives are essential in transitional justice processes, "to access the truth, so that this does not happen again."

Collaboration will be key to undertaking this task, said Pedro Bouchon, who highlighted that agreements with the Museum of Memory and the Vicaría de la Solidaridad make it possible to put science and engineering at the service of society. For Dean Loreto Valenzuela, the academic community must contribute to the search and rescue of memory, with tools that today "allow us to discover information that would otherwise remain hidden." 

A photo, a face, a name

Another challenge presented by working with millions of files from the dictatorship is that the documents come in a wide variety of formats: handwritten texts, typed documents, photos, videos, audio recordings, letters, and drawings, all containing information about victims and witnesses of human rights violations.

Domingo Mery described how they have begun to test the effectiveness of various computational and artificial intelligence tools, such as Gemini and ChatGPT, in retrieving information. In the area of visual recognition, he explained, AI systems have been able to recognize images, group them with others that share similar characteristics, and describe them in text format. The team is also studying image quality enhancement, face detection techniques, and face recognition across images. In one test, he said, starting from a black-and-white photo of a person taken before their disappearance, the AI model found them in another image: on a sign carried by a relative at a march for the detained-disappeared.
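The face-matching step Mery describes typically reduces to comparing numerical "embeddings" of faces: a network turns each face into a vector, and two faces match when their vectors are close. The article does not say which tools the team uses for this, so the following is only a minimal sketch with invented, hand-made three-dimensional vectors (real systems produce 128- to 512-dimensional embeddings); the comparison itself is plain cosine similarity.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def find_matches(query_embedding, archive, threshold=0.8):
    """Return IDs of archive photos whose face embedding is close to the query."""
    return [
        photo_id
        for photo_id, emb in archive.items()
        if cosine_similarity(query_embedding, emb) >= threshold
    ]

# Hypothetical embeddings for two archive photos (file names are invented).
archive = {
    "march_banner_1984.jpg": [0.9, 0.1, 0.4],
    "id_card_1975.jpg":      [0.1, 0.9, 0.2],
}
# Embedding of the original black-and-white portrait (also invented).
query = [0.88, 0.15, 0.38]
print(find_matches(query, archive))  # -> ['march_banner_1984.jpg']
```

The threshold trades recall against precision: lowering it surfaces more candidate matches for a human reviewer, which matters when archival photos are degraded.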

Techniques have also been applied to convert handwritten text or old audio files into digital text, which can then be used to generate a data repository on which to begin working.

The power of machine learning

Having historical data in text format is certainly a first step. But reviewing those texts individually by hand is an almost impossible task. This is where advanced natural language processing (NLP) techniques come into play, says Jocelyn Dunstan Escudero.

The first goal, said the academic, is to generate an annotated corpus in Spanish: a repository of concepts, such as names or places, that have been reviewed, systematized, and tagged. This is feasible: the researcher's earlier work produced the first annotated corpus of primary health care in Spanish, which took around two years to develop and comprises around 10,000 annotations. With this corpus, it was possible to study waiting lists in the country.

To achieve this goal in the area of human rights, Jocelyn Dunstan indicated that the support of social scientists (linguists, historians, and political scientists, among many others) is required for the labeling and classification of the digitized data. An annotated corpus will make it possible to train computers to perform this same task automatically and on massive volumes of files, quickly recognizing entities such as cities, dates, or names of captors.
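To illustrate what one entry in such an annotated corpus might look like, here is a minimal sketch with an invented sentence and labels (nothing here comes from a real archive). Annotators typically mark character spans with entity types; the helper below converts those spans into token-level BIO tags, a common input format for training named-entity recognition models.

```python
# One hypothetical corpus entry: raw text plus labeled character spans.
doc = {
    "text": "Juan fue detenido en Santiago el 12 de octubre de 1974.",
    "entities": [
        {"start": 0,  "end": 4,  "label": "PERSON"},
        {"start": 21, "end": 29, "label": "CITY"},
        {"start": 33, "end": 54, "label": "DATE"},
    ],
}

def to_bio(text, entities):
    """Convert character-span annotations into token-level BIO tags:
    B- marks the first token of an entity, I- a continuation, O everything else."""
    tagged = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)   # locate this token in the raw text
        pos = start + len(token)
        tag = "O"
        for ent in entities:
            if start == ent["start"]:
                tag = "B-" + ent["label"]
            elif ent["start"] < start < ent["end"]:
                tag = "I-" + ent["label"]
        tagged.append((token, tag))
    return tagged

for token, tag in to_bio(doc["text"], doc["entities"]):
    print(f"{token}\t{tag}")
```

Once thousands of entries like this exist, a statistical model can learn to emit the same tags on unseen documents, which is what allows entity recognition to scale to millions of pages.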

Networks, connections, relationships

With information that has been described, classified, and labeled in this way, it will be possible to build graph databases: systems that store each piece of data together with its relationships to many others. In a graph, a piece of data such as a person's name can be linked to a date, to witnesses, and to a location, such as the last place where the person was seen alive. Thanks to these connections, graph databases form networks that link information more quickly, explained Juan , facilitating more targeted searches.

What is required to build a graph database? Data. Thousands of pieces of data, such as those Jocelyn Dunstan hopes to systematize. Only then can this immense network be built, in which each node, connected to others, allows concepts to be explored, with the hope of detecting relationships that previously went unnoticed and shedding light where shadows still linger.
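As a minimal sketch of the idea, assuming invented node and edge data (the article does not name a specific graph engine), a property graph can be represented as nodes with attributes plus typed edges, and then queried by following connections, for example: which documents mention the place where a person was last seen?

```python
# Hypothetical property graph: nodes carry attributes, edges carry a relation type.
nodes = {
    "p1": {"type": "person",   "name": "N.N."},
    "l1": {"type": "location", "name": "Villa Grimaldi"},
    "d1": {"type": "document", "name": "witness statement, 1975"},
}
edges = [
    ("p1", "LAST_SEEN_AT", "l1"),
    ("d1", "MENTIONS", "p1"),
    ("d1", "MENTIONS", "l1"),
]

def neighbors(node_id, relation=None):
    """All nodes connected to node_id, optionally filtered by relation type."""
    found = []
    for src, rel, dst in edges:
        if relation is not None and rel != relation:
            continue
        if src == node_id:
            found.append(dst)
        elif dst == node_id:
            found.append(src)
    return found

# Follow two hops: person -> last-seen location -> documents mentioning it.
place = neighbors("p1", "LAST_SEEN_AT")[0]
docs = [n for n in neighbors(place, "MENTIONS") if nodes[n]["type"] == "document"]
print(docs)  # -> ['d1']
```

The point of the structure is exactly the multi-hop query above: a relational lookup would need explicit joins for each hop, while in a graph the connections themselves are the index, so unexpected relationships surface by traversal.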