Nuestra MemorIA project will rescue information from human rights archives using artificial intelligence.

June, 2024. Harnessing the transformative power of artificial intelligence (AI) to manage, analyze and interpret historical archives from the era of the Chilean dictatorship (1973-1990): this is the objective of the Nuestra MemorIA project of the Millennium Institute for Foundational Research on Data (IMFD). The project is led by Jocelyn Dunstan Escudero, academic at DCC and IMC UC; Juan Reutter, academic at DCC and IMC UC and director of IMFD; Camila Diaz, executive director of IMFD; Antonia Fonck, of the Institute of History UC and IMFD research assistant; and the new IMFD collaborating researchers Domingo Mery, academic at DCC UC, and Hugo Rojas, director of Law Sciences at U. Alberto Hurtado and expert in transitional justice.

Juan Reutter, Jocelyn Dunstan, Antonia Fonck and Domingo Mery

 

Manually consolidating and sifting through thousands of scanned documents, old photographs and audio recordings scattered across various archives and collections is an almost impossible task. Nuestra MemorIA was born in response as a transdisciplinary project that aims to develop techniques designed specifically for analyzing historical documents of the dictatorship. To that end, it incorporates the experience and research methods of human rights experts from the humanities and social sciences.

The ultimate goal is to integrate fragmented information to help reconstruct historical knowledge, thus supporting the work of historians and social scientists.

Nuestra MemorIA is supported by the Museum of Memory and Human Rights, the National Institute of Human Rights (INDH), the Vicariate of Solidarity, the Committee for the Prevention of Torture and the Undersecretariat of Human Rights. The project reports on its progress in the Nuestra MemorIA podcast, available on all listening platforms, and on Instagram (instagram.com/nuestramememoria.cl) and X (x.com/nuestramememo).

Nuestra MemorIA Seminar

The initiative was presented on Tuesday, June 25, at a seminar attended by Pedro Bouchon, UC Vice Rector for Research; Loreto Valenzuela, UC Dean of Engineering; Patricio Bernedo, Director of the UC Center for Dialogue and Peace; Felipe Mallea, Head of the Studies Department of the National Migration Service; the team behind the project; and about 100 participants.

Pedro Bouchon

 

There are several challenges facing the research team. First, although the information exists, it is scattered among various organizations that have worked on its preservation. Then, there is the volume: the Vicaría de la Solidaridad alone has more than 85,000 records, a figure that is almost tripled in the Museo de la Memoria. In the Judicial Branch there are more than 10 million pages, in addition to those collected by different commissions, the National Archive, the National Migration Service, universities, among other organizations. 

Felipe Mallea

 

"What do we do with all that is scattered, stored in so many agencies?" was the question posed by Hugo Rojas, who is also a researcher at the Millennium Institute VioDemos. "This cannot be lost, it has to last for the next generations," added Rojas: these archives are essential in transitional justice processes, "to access the truth, so that this does not happen again." 

Hugo Rojas

 

Collaboration will be key to taking on this task, said Pedro Bouchon, who emphasized that the agreements with the Museum of Memory and the Vicariate of Solidarity make it possible to put science and engineering at the service of society. For Dean Loreto Valenzuela, the academic community must contribute to the search for and rescue of memory, with tools that today "can allow us to discover information that would otherwise remain hidden."

Loreto Valenzuela

A photo, a face, a name

Another challenge of working with the millions of dictatorship-era records is the diversity of their formats: handwritten and typewritten texts, photos, videos and audio recordings, letters and drawings, and data from victims and witnesses of human rights violations.

Domingo Mery described how the team has begun to test how well different computational and artificial intelligence tools, such as Gemini or ChatGPT, can rescue information. In the area of visual recognition, he explained, AI systems have been able to recognize images, group them with others of similar characteristics and describe them in text. The team is also studying image-quality enhancement, face detection, and recognition of the same face in other images. In one test, he said, starting from a black-and-white photo of a person taken before he was forcibly disappeared, the AI model found him in another image: on the sign carried by a family member at a march for the disappeared detainees.

Domingo Mery

 

Techniques have also been applied to convert handwriting or old audio files into digital text that can then be used to generate a repository of data on which to start working.

The power of machine learning

Having the data from the historical archives in text format is certainly a first step. But reviewing the documents one by one, by hand, is close to impossible. This is where state-of-the-art natural language processing (NLP) techniques come into play, says Jocelyn Dunstan Escudero.

Jocelyn Dunstan Escudero

 

The first goal, Dunstan said, would be to generate an annotated corpus in Spanish: a repository of concepts, such as names or places, that have been reviewed, systematized and labeled. This is feasible: her earlier work produced the first annotated corpus of primary health care in Spanish, which took about two years to develop and contains about 10,000 annotations. That corpus made it possible to study waiting lists in the country.

To achieve this goal in the area of human rights, said Jocelyn Dunstan, the support of social scientists (linguists, historians and political scientists, among many others) is required to label and classify the digitized data. An annotated corpus will make it possible to train computers to perform the same task automatically across the massive volumes of files, quickly recognizing entities such as cities, dates or names of captors.
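As a rough illustration of what an annotated corpus can look like in practice, the sketch below stores annotations as labeled character spans and converts them into token-level BIO tags, a format commonly used to train entity recognizers. It is only a sketch under stated assumptions: the sentence is invented, and the label names (PERSON, PLACE, DATE) and the helpers `Span`, `span_for` and `to_bio` are hypothetical, not the project's actual schema or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str   # hypothetical label name, e.g. "PERSON"


def span_for(text, phrase, label):
    """Build a Span for the first occurrence of phrase in text."""
    start = text.index(phrase)
    return Span(start, start + len(phrase), label)


def to_bio(text, spans):
    """Split text on whitespace and give every token a BIO tag.

    B-X marks the first token of an entity of type X, I-X a token
    inside the same entity, and O a token outside any entity.
    """
    tagged = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        tag = "O"
        for span in spans:
            if start >= span.start and end <= span.end:
                prefix = "B-" if start == span.start else "I-"
                tag = prefix + span.label
                break
        tagged.append((token, tag))
    return tagged


# Invented example sentence, not taken from any archive.
TEXT = "Ana Pérez wrote from Valparaíso in March 1975"
SPANS = [
    span_for(TEXT, "Ana Pérez", "PERSON"),
    span_for(TEXT, "Valparaíso", "PLACE"),
    span_for(TEXT, "March 1975", "DATE"),
]

if __name__ == "__main__":
    for token, tag in to_bio(TEXT, SPANS):
        print(f"{token}\t{tag}")
```

Thousands of sentences tagged this way by human annotators are the kind of training data a model needs before it can label new documents on its own.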

Networks, connections, relationships

Once information has been classified and labeled in this way, it will be possible to build graph databases: systems that store each piece of data together with many more, and more varied, attributes and relationships. In such a network, a piece of information such as a person's name can be linked to a date, to witnesses and to a place, for example the last place where the person was seen alive. Thanks to these additional features, graph databases work as networks that connect information more quickly, explained Juan Reutter, which makes more targeted searches possible.
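The idea of linking a name to a place, a date and witnesses can be sketched with a very small in-memory property graph. This is an illustrative assumption, not the database the project uses: the `PropertyGraph` class, the relation names (`LAST_SEEN_AT`, `WITNESSED_BY`) and all node identifiers are hypothetical placeholders.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph: nodes and edges both carry attributes."""

    def __init__(self):
        self.nodes = {}                 # node id -> dict of properties
        self.edges = defaultdict(list)  # node id -> [(relation, other id, props)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, relation, target, **props):
        # Stored in both directions so the edge can be followed from either end.
        self.edges[source].append((relation, target, props))
        self.edges[target].append((relation, source, props))

    def neighbors(self, node_id, relation=None):
        """Ids of nodes linked to node_id, optionally filtered by relation."""
        return [
            other
            for rel, other, _ in self.edges.get(node_id, [])
            if relation is None or rel == relation
        ]


def seen_at_same_place(graph, person_id):
    """Other people whose last known place matches that of person_id."""
    others = set()
    for place in graph.neighbors(person_id, "LAST_SEEN_AT"):
        for other in graph.neighbors(place, "LAST_SEEN_AT"):
            if other != person_id:
                others.add(other)
    return sorted(others)


# Placeholder records, invented for illustration only.
g = PropertyGraph()
g.add_node("person_a", kind="person")
g.add_node("person_b", kind="person")
g.add_node("witness_c", kind="person")
g.add_node("place_x", kind="place")
g.add_edge("person_a", "LAST_SEEN_AT", "place_x", date="1975-03")
g.add_edge("person_b", "LAST_SEEN_AT", "place_x", date="1975-04")
g.add_edge("person_a", "WITNESSED_BY", "witness_c")

if __name__ == "__main__":
    print(seen_at_same_place(g, "person_a"))
```

The query follows two hops, from a person to a place and from that place to other people, which is the kind of connection that is hard to spot when records sit in separate archives.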

Juan Reutter

 

What is required to build a network database? Data. Thousands of data points, such as those that Jocelyn Dunstan's line of research hopes to systematize.

Only in this way will it be possible to build an immense network in which each connection or node, connected with others, will make it possible to explore concepts, with the hope of detecting relationships that might have gone unnoticed before and of shedding light where shadows still persist. 

The seminar had more than 100 attendees