Critical Breach: Vulnerabilities in the anonymization of public data revealed on the eve of the new Data Protection Law
Knowing how many of us there are, how we are faring in health and education, how we vote, and what our origins are: there is broad consensus that such data should be accessible to those who seek to create public policy, innovate, research, and study, among many other uses. But how do we ensure the privacy of individuals' data in a world where it is increasingly easy to process huge amounts of information?
"The privacy and security of our data are areas increasingly affected by advances in computing," emphasizes Federico Olmedo, academic at the Department of Computer Science (DCC) of the University of Chile and researcher at the Millennium Institute Foundational Research on Data (IMFD). The work "A study of data anonymization in the public sector in Chile", carried out by researchers from the DCC and the IMFD, was recently presented at the Chilean Conference on Computing 2025, held at the Pontificia Universidad Católica de Valparaíso.

This paper exposes significant flaws in the anonymization practices applied to microdata released by Chilean public institutions, and the enormous challenge that organizations will face ahead of the entry into force of the new Data Protection Law (Law No. 21.719).
Researchers Tomás Rivas, Federico Olmedo, and Matías Toro, all from the Department of Computer Science at the University of Chile, conducted a systematic evaluation of the effectiveness of anonymization techniques applied to public datasets in sensitive areas such as health, education, migration and electoral processes.
The research is based on an exhaustive analysis of the structural privacy properties of databases published by Chilean public institutions and available to the general public. "In this analysis, we selected five databases that are meant to provide statistical, anonymized information on health, education, migration and elections. Each of these datasets was selected based on criteria of sensitivity, granularity of the microdata and public availability," highlights Matías Toro Ipinza, also an IMFD researcher and DCC U. Chile academic.
The techniques applied stress-test the privacy of these datasets using structural methods, which allowed the team to demonstrate that most of these databases have structural weaknesses: "In particular, we found that a large part of the records in these databases can be individually identified, and we showed that these vulnerabilities are not just theoretical," says Tomás Rivas, of the DCC U. Chile and lead author of the research.
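The structural-uniqueness idea behind these findings can be illustrated with a minimal sketch (the data, quasi-identifiers, and values below are synthetic, chosen for illustration, and not taken from the study's datasets): a record is re-identifiable when no other record shares its combination of quasi-identifiers, and the k-anonymity of a release is the size of its smallest such group.

```python
from collections import Counter

# Hypothetical microdata: each record lists quasi-identifiers that an
# intruder could plausibly learn elsewhere (commune, age range, sex).
# Values are fabricated for illustration only.
records = [
    ("Santiago", "30-39", "F"),
    ("Santiago", "30-39", "F"),
    ("Valparaiso", "60-69", "M"),
    ("Santiago", "20-29", "M"),   # unique combination -> re-identifiable
    ("Valparaiso", "60-69", "M"),
    ("Arica", "80-89", "F"),      # unique combination -> re-identifiable
]

# Group records by their quasi-identifier combination and count group sizes.
class_sizes = Counter(records)

# A record is structurally unique (k = 1) if no other record shares its
# quasi-identifier combination; such records are prime re-identification targets.
unique_records = sum(1 for r in records if class_sizes[r] == 1)
k_anonymity = min(class_sizes.values())

print(f"records: {len(records)}")
print(f"structurally unique: {unique_records}")
print(f"k-anonymity of the release: {k_anonymity}")
```

On this toy release, two of the six records are structurally unique, so the release as a whole offers only 1-anonymity despite containing no names.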
To do so, the researchers formulated realistic attacks under the motivated-intruder model, commonly used for this type of test, which assumes an ordinary user intent on obtaining information. With these tests, they confirmed that personal information in these databases can be retrieved, and that, therefore, the people whose data appear in these supposedly anonymized databases can be individually identified.
These attacks confirmed that sensitive personal information, such as household income decile, voting behavior, or the outcome of a residency application, can be inferred and retrieved in practice with moderate effort. The study successfully de-anonymized individuals in four of the five datasets evaluated.
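A motivated-intruder linkage attack of this kind can be sketched as follows (all records, field names, and values are fabricated for illustration; the study's actual datasets and attack details are not reproduced here): the intruder matches auxiliary knowledge about a target against the release's quasi-identifiers, and a unique match reveals the sensitive attribute.

```python
# Released "anonymized" microdata: quasi-identifiers plus a sensitive
# attribute (household income decile). All values are synthetic.
release = [
    {"commune": "Santiago", "age": "30-39", "sex": "F", "income_decile": 3},
    {"commune": "Santiago", "age": "30-39", "sex": "F", "income_decile": 7},
    {"commune": "Arica",    "age": "80-89", "sex": "F", "income_decile": 2},
]

# Auxiliary knowledge the intruder already has about a specific target,
# e.g. from a public registry or social media.
aux = {"commune": "Arica", "age": "80-89", "sex": "F"}

qids = ("commune", "age", "sex")

# Linkage step: keep every released record that matches the target's
# quasi-identifiers.
matches = [r for r in release if all(r[q] == aux[q] for q in qids)]

# A single match means the sensitive attribute is revealed with certainty;
# multiple matches still narrow the target down to a small group.
if len(matches) == 1:
    print(f"Re-identified: income decile = {matches[0]['income_decile']}")
else:
    print(f"Ambiguous: {len(matches)} candidate records")
```

Here the target's quasi-identifier combination is unique in the release, so the linkage succeeds and the intruder learns the income decile exactly, with no names ever appearing in the published data.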
New national regulations
The findings of this study become critically relevant in the context of Chile's new Personal Data Protection Law (Law No. 21.719). The research highlights a critical gap between legal compliance and effective privacy protection. "This law means that Chile can advance a modernized data governance framework, moving closer to international standards such as the European Union's GDPR," Olmedo highlights.
However, the law establishes a strict requirement: anonymized data is data for which individual identification has been irreversibly precluded. The study shows that the anonymization practices currently applied by Chilean public institutions do not meet this standard. "We also see that there is interest from the authorities and public organizations, as we contacted the institutions in the framework of this work and several of them are already taking measures," says Federico Olmedo.
The researchers emphasize that, while the law will be fully implemented by December 2026, it does not currently provide concrete technical standards for anonymization. Their findings expose the urgent need for clear and robust technical guidance, ideally issued by the newly created Personal Data Protection Agency, to ensure that public institutions can safeguard individual privacy and achieve regulatory compliance.
Local solutions to local problems
Regarding future work, Olmedo notes that it would be interesting to extend the analysis to other datasets, as well as to track how changes in the databases may alter re-identification risks. In addition, the researchers propose developing automated tools that public agencies could use to assess re-identification risks before publishing data. "We have to have the local capacity, as a country, to develop automated tools: it is very unlikely that a solution conceived for another reality, such as that of global-north countries where these challenges arose earlier, can be directly applied to our national databases, so promoting research, innovation and study of these issues is vital for our academic and development communities."
