Critical Breach: Vulnerabilities in the anonymization of public data revealed on the eve of the new Data Protection Law
Knowing how many of us there are, how we are faring in health and education, how we vote, and what our origins are: there is broad consensus that such data should be accessible to those who seek to create public policy, innovate, research, and study, among many other uses. But how do we ensure the privacy of individuals' data in a world where it is increasingly easy to process huge amounts of information?
"The privacy and security of our data are areas increasingly affected by advances in computing," emphasizes Federico Olmedo, academic at the Department of Computer Science (DCC) of the University of Chile and researcher at the Millennium Institute Foundational Research on Data (IMFD). The work "A study of data anonymization in the public sector in Chile", carried out by researchers from the DCC and the IMFD, was recently presented at the Chilean Conference on Computing 2025, held at the Pontificia Universidad Católica de Valparaíso.

This paper exposes significant flaws in the anonymization practices applied to microdata released by Chilean public institutions, and the enormous challenge that organizations will face ahead of the entry into force of the new Data Protection Law (Law No. 21.719).
Researchers Tomás Rivas, Federico Olmedo, and Matías Toro, all from the Department of Computer Science at the University of Chile, conducted a systematic evaluation of the effectiveness of anonymization techniques applied to public datasets in sensitive areas such as health, education, migration and electoral processes.
The research is based on an exhaustive analysis of the structural privacy properties of databases published by Chilean public institutions and available to the general public. "In this analysis, we selected five databases that are meant to provide statistical, anonymized information on health, education, migration and elections. Each of these datasets was selected based on criteria of sensitivity, granularity of the microdata and public availability," highlights Matías Toro Ipinza, also an IMFD researcher and DCC U. Chile academic.
The techniques applied stress-test the privacy of these datasets using structural methods, which allowed the team to demonstrate that most of these databases have structural weaknesses: "In particular, we found that a large part of the records in these databases can be individually identified, and we showed that these vulnerabilities are not just theoretical," says Tomás Rivas, of the DCC U. Chile and lead author of the research.
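The structural-uniqueness idea behind these findings can be illustrated with a minimal sketch (the data, quasi-identifiers, and values below are synthetic, chosen for illustration, and not taken from the study's datasets): a record is re-identifiable when no other record shares its combination of quasi-identifiers, and the k-anonymity of a release is the size of its smallest such group.

```python
from collections import Counter

# Hypothetical microdata: each record lists quasi-identifiers that an
# intruder could plausibly learn elsewhere (commune, age range, sex).
# Values are fabricated for illustration only.
records = [
    ("Santiago", "30-39", "F"),
    ("Santiago", "30-39", "F"),
    ("Valparaiso", "60-69", "M"),
    ("Santiago", "20-29", "M"),   # unique combination -> re-identifiable
    ("Valparaiso", "60-69", "M"),
    ("Arica", "80-89", "F"),      # unique combination -> re-identifiable
]

# Group records by their quasi-identifier combination and count group sizes.
class_sizes = Counter(records)

# A record is structurally unique (k = 1) if no other record shares its
# quasi-identifier combination; such records are prime re-identification targets.
unique_records = sum(1 for r in records if class_sizes[r] == 1)
k_anonymity = min(class_sizes.values())

print(f"records: {len(records)}")
print(f"structurally unique: {unique_records}")
print(f"k-anonymity of the release: {k_anonymity}")
```

On this toy release, two of the six records are structurally unique, so the release as a whole offers only 1-anonymity despite containing no names.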
To do so, the researchers formulated realistic attacks under the motivated-intruder model, commonly used for this type of test, which assumes an ordinary user intent on obtaining information. With these tests, they confirmed that personal information in these databases can be retrieved, and that, therefore, the people whose data appear in these supposedly anonymized databases can be individually identified.
These attacks confirmed that sensitive personal information, such as household income decile, voting behavior, or the outcome of a residency application, can be inferred and retrieved in practice with moderate effort. The study successfully de-anonymized individuals in four of the five datasets evaluated.
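A motivated-intruder linkage attack of this kind can be sketched as follows (all records, field names, and values are fabricated for illustration; the study's actual datasets and attack details are not reproduced here): the intruder matches auxiliary knowledge about a target against the release's quasi-identifiers, and a unique match reveals the sensitive attribute.

```python
# Released "anonymized" microdata: quasi-identifiers plus a sensitive
# attribute (household income decile). All values are synthetic.
release = [
    {"commune": "Santiago", "age": "30-39", "sex": "F", "income_decile": 3},
    {"commune": "Santiago", "age": "30-39", "sex": "F", "income_decile": 7},
    {"commune": "Arica",    "age": "80-89", "sex": "F", "income_decile": 2},
]

# Auxiliary knowledge the intruder already has about a specific target,
# e.g. from a public registry or social media.
aux = {"commune": "Arica", "age": "80-89", "sex": "F"}

qids = ("commune", "age", "sex")

# Linkage step: keep every released record that matches the target's
# quasi-identifiers.
matches = [r for r in release if all(r[q] == aux[q] for q in qids)]

# A single match means the sensitive attribute is revealed with certainty;
# multiple matches still narrow the target down to a small group.
if len(matches) == 1:
    print(f"Re-identified: income decile = {matches[0]['income_decile']}")
else:
    print(f"Ambiguous: {len(matches)} candidate records")
```

Here the target's quasi-identifier combination is unique in the release, so the linkage succeeds and the intruder learns the income decile exactly, with no names ever appearing in the published data.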
New national regulations
The findings of this study become critically relevant in the context of Chile's new Personal Data Protection Law (Law No. 21.719). The research highlights a critical gap between legal compliance and effective privacy protection. "This law means that Chile can advance a modernized data governance framework, moving closer to international standards such as the European Union's GDPR," Olmedo highlights.
However, the law establishes a strict requirement: anonymized data is data for which individual identification has been irreversibly precluded. The study shows that the anonymization practices currently applied by Chilean public institutions do not meet this standard. "We also see that there is interest from the authorities and public organizations, as we contacted the institutions in the framework of this work and several of them are already taking measures," says Federico Olmedo.
The researchers emphasize that, while the law will be fully implemented by December 2026, it does not currently provide concrete technical standards for anonymization. Their findings expose the urgent need for clear and robust technical guidance, ideally issued by the newly created Personal Data Protection Agency, to ensure that public institutions can safeguard individual privacy and achieve regulatory compliance.
Local solutions to local problems
Regarding future work, Olmedo notes that it would be interesting to extend the analysis to other datasets, as well as to track how changes in the databases may alter re-identification risks. In addition, the researchers propose developing automated tools that public agencies could use to assess re-identification risks before publishing data. "We have to have the local capacity, as a country, to develop automated tools: it is very unlikely that a solution conceived for another reality, such as that of global-north countries where these challenges arose earlier, can be directly applied to our national databases, so promoting research, innovation and study of these issues is vital for our academic and development communities."
