Every time a user searches the internet, the data goes to a server. The same applies to almost everything we share by e-mail, social networks, mobile apps and other channels. This information has reached such a volume that a line of research in computer science is now devoted to new models that improve storage systems and that understand and take advantage of the connections among the data.
Currently, large information companies and services work with a model known as knowledge graphs, a system that makes it possible to offer context related to a single search. For instance, you can look for the “best places to visit” in Spain or wonder about the “male actor” in that movie you liked. These methods allow Google, for example, to suggest places to visit in Barcelona (Spain), such as Park Güell and La Rambla, along with nearby restaurants and museum opening hours. And what about that actor in that movie you liked? Google shows you not only his name, but also other movies he has starred in and his colleagues.
But computer science aims to make those results even more intelligent: “The goal is to use the information available in any database to answer really complex questions, the kind that involve discovering the relationships between many variables”, explains Pablo Barceló, deputy director of the Millennium Institute Foundational Research of Data at the University of Chile.
“Knowledge graphs are a system to explore and organize the connections that exist among data, a network that stands as the basis of the architecture on which Google and other search engines rely. They constitute a new paradigm in information management, one that is being used today by the largest technology companies”, Barceló adds.
The quest for a better language for complex queries
The expertise of the Millennium Institute Foundational Research of Data in the study of knowledge graphs is recognized worldwide. Members of the institute are part of an international team of 12 researchers who, along with the LDBC Council and companies such as IBM, Oracle and Neo4j, have been working for two years on G-Core, a query language able to discover, extract and understand the most relevant connections between pairs of data.
“One piece of information only acquires value in relation to another”, says Claudio Gutiérrez, IMFD senior researcher and professor at the University of Chile. “Its richness lies not in the information itself, but in the connections that can be discovered between one node and another”.
The newly created language was presented by the international group at the SIGMOD/PODS 2018 conference, one of the most important scientific events in data management, recently held in Houston, United States.
According to Gutiérrez, “G-Core is the only language that discovers the paths between data, and it can therefore generate invaluable information to reveal, for example, connections between power and business, or behavioral patterns between one node of information and a completely different one”.
The researchers foresee important applications in the social sciences, economics and accountability studies, but the language could also be applied in any area with similar requirements. “We work on developing and improving the techniques used to access information, and that is why G-Core is so important at a scientific level: this advance could have a great impact on what we know about knowledge graphs and how we use them”, the researcher explains.
The institute expects that this language will soon be turned into an application for professionals who need to extract information from complex, highly interrelated data sets.
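The idea of discovering paths between data can be illustrated with a minimal sketch. This is not G-Core and not the researchers' method: it is a plain breadth-first search in Python over a toy graph, and every name and link in it is invented for the example.

```python
from collections import deque

# Hypothetical toy knowledge graph (invented data):
# each node maps to the nodes it is directly connected to.
GRAPH = {
    "Alice": ["Acme Corp"],
    "Acme Corp": ["Alice", "Bob", "Ministry X"],
    "Bob": ["Acme Corp"],
    "Ministry X": ["Acme Corp", "Carol"],
    "Carol": ["Ministry X"],
}


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal as a list of nodes, or None."""
    queue = deque([[start]])  # each queue entry is a full path found so far
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # the two nodes are not connected


print(shortest_path(GRAPH, "Alice", "Carol"))
# → ['Alice', 'Acme Corp', 'Ministry X', 'Carol']
```

The answer here is the chain of connections itself, not just the fact that a connection exists, which is the kind of information Gutiérrez describes as valuable.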
Scientifically based and waiting to be tested
Marcelo Arenas, director of the Millennium Institute Foundational Research of Data and professor at the Universidad Católica, explains that today the most commonly used graph query language is Cypher, but G-Core offers two specific improvements: “First, we demonstrate mathematically that all the queries that are made will yield results, something that Cypher is not able to guarantee”, he says. “Also, while the answers provided by Cypher are shown in a form similar to tables, those of G-Core are presented to the user in the form of knowledge graphs; therefore, further queries can be run over the result of a query, potentially without limit, with each one refining the search”, Arenas concludes.
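The second improvement, answers that are themselves graphs and can therefore be queried again, can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions: the data, the function names and the representation of a graph as a set of edges are all hypothetical and are not G-Core syntax.

```python
# Hypothetical toy data: a graph stored as a set of (source, relation, target) edges.
GRAPH = {
    ("Ana", "acted_in", "Film A"),
    ("Luis", "acted_in", "Film A"),
    ("Ana", "acted_in", "Film B"),
    ("Film A", "released_in", "2017"),
    ("Film B", "released_in", "2018"),
}


def edges_with_relation(graph, relation):
    """Query: keep only edges of one relation type. The result is again a graph."""
    return {edge for edge in graph if edge[1] == relation}


def edges_touching(graph, node):
    """Query: keep only edges that start or end at the given node."""
    return {edge for edge in graph if node in (edge[0], edge[2])}


# Because each query returns a graph, its output can feed the next query.
acting = edges_with_relation(GRAPH, "acted_in")
ana_roles = edges_touching(acting, "Ana")

print(sorted(ana_roles))
# → [('Ana', 'acted_in', 'Film A'), ('Ana', 'acted_in', 'Film B')]
```

Each step narrows the previous result without leaving the graph model, which is what allows a search to be refined query after query.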