Jorge Pérez, professor at the Department of Computer Science of the University of Chile and associate researcher at the Millennium Institute for Foundational Research on Data, opened his presentation at the VI International Scientific Culture Conference with the concept “cat” shown in different languages and images. The conference, organized by Andrés Bello University, is open to the general public and aims to bring scientific knowledge closer to society as a whole.
The different meanings and uses of the word “cat” served as the thread for explaining how computers understand human language and represent the meaning of a word through codes. Amid laughter and spontaneous contributions from the audience, attendees were able to follow the logic by which the computer associates each word with the meaning it encodes.
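As a rough illustration of what representing a word “through codes” can look like, the Python sketch below gives each word a short list of numbers and compares words by cosine similarity. The article does not say which representation Pérez used; the vectors, the `WORD_VECTORS` table, and the function names are all invented for this example.

```python
import math

# Hypothetical 3-number "codes" for a few words. The values are invented
# for illustration and do not come from any real language model.
WORD_VECTORS = {
    "cat": [0.9, 0.8, 0.1],
    "gato": [0.88, 0.82, 0.12],  # Spanish for "cat", placed close to "cat"
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word):
    """Return the other word whose code points in the most similar direction."""
    target = WORD_VECTORS[word]
    candidates = [
        (other, cosine_similarity(target, vector))
        for other, vector in WORD_VECTORS.items()
        if other != word
    ]
    return max(candidates, key=lambda pair: pair[1])[0]
```

With these made-up values, `most_similar("cat")` picks “gato”, because the two codes point in nearly the same direction, while “car” scores far lower.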
The researcher also gave examples of how his research applies to political data analysis: during the constituent process of 2016, Pérez designed a Constitutional Explorer, which analyzed and detected the most recurrent topics in the citizen assemblies, broken down by commune.
Galaxy of concepts
His most recent work along these lines is the construction of a “galaxy where each star is a phrase that Michelle Bachelet said during the previous government. I computed the representation of each of them and put them in three dimensions to be able to explore them”, said the researcher while projecting a complex three-dimensional data cloud.
The remarkable thing about this analysis is that it is not a summary: it covers all the speeches of the former president, which is only possible because all of her speeches have been digitized.
As an example, the researcher used the phrase “everyone knows where the shoe squeezes”, typical of Bachelet: the system was able to identify every time it was said during her second government. The galaxy is available online for those who want to explore it, at bit.do/galaxia-presidencial
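The idea of the galaxy described above (compute a representation of each phrase, then place it in three dimensions) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say which representation or projection Pérez used, so word counts and a fixed-seed random projection stand in for them here, and the function names are hypothetical.

```python
import random
from collections import Counter


def bag_of_words(phrase, vocabulary):
    """Represent a phrase as word counts over a fixed vocabulary."""
    counts = Counter(phrase.lower().split())
    return [counts[word] for word in vocabulary]


def project_to_3d(vectors, seed=0):
    """Map each vector to (x, y, z) using three random directions.

    A random projection is a stand-in chosen for simplicity; it is not
    claimed to be the method behind the real galaxy.
    """
    rng = random.Random(seed)
    dimension = len(vectors[0])
    axes = [[rng.gauss(0, 1) for _ in range(dimension)] for _ in range(3)]
    return [
        tuple(sum(v * a for v, a in zip(vector, axis)) for axis in axes)
        for vector in vectors
    ]


def build_galaxy(phrases):
    """Return a list of (phrase, (x, y, z)) pairs, one "star" per phrase."""
    vocabulary = sorted({word for phrase in phrases for word in phrase.lower().split()})
    vectors = [bag_of_words(phrase, vocabulary) for phrase in phrases]
    return list(zip(phrases, project_to_3d(vectors)))
```

In this sketch, a phrase that is repeated word for word always lands on the same point, which is one simple way repeated sayings could be spotted in such a cloud.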
Scope of Artificial Intelligence
A question arose from the audience about the use of data in the context of Artificial Intelligence. “The way we train an Artificial Intelligence today is with data. We have so much data that we can tell a machine to learn from it, but the biases in the data will also be learned. We, as scientists, have to take care of that: an Artificial Intelligence will not learn something that I am not showing it”, said Pérez, clarifying that such biases appear in all languages, not only in Spanish.
Jorge Pérez left the audience with a challenge: in this type of research, whether it is based on data or seeks to learn how to use data better, “ethics should be present at all times and scientists should reward work that advances ethics”. He concluded: “The computer can understand human language to a certain extent, and it can help us do better, but we must take care not to transfer our biases”.