IMFD researchers' work wins award at SIGMOD/PODS

June 2023. Every year, the Association for Computing Machinery (ACM), founded in 1947 as the first scientific and educational society in the field of computing, organizes the SIGMOD/PODS Conference. Currently, the event is considered one of the most important international forums in the area of data management, where researchers gather to explore new ideas, results, techniques, and experiences. It is in this context that the best papers presented are also awarded: in the 2023 edition, which takes place between June 18 and 23 in Seattle (USA), one of these awards goes to a study co-authored by Domagoj Vrgoč, an academic at the Institute of Mathematical and Computational Engineering at the Catholic University of Chile and a researcher at the Millennium Institute Foundational Research on Data (IMFD), and Renzo Angles, an academic at the Department of Computer Science at the University of Talca and an IMFD researcher.

According to the organizers, the study entitled "PG-Schema: Schemas for Property Graphs" was chosen as the "Best Paper in the Industrial Track" due to its exceptional quality, originality, and contribution to the field of graph databases. It is the result of joint work by researchers from several higher education institutions, such as the University of Warsaw (Poland), the University of Bayreuth (Germany), and the University of Edinburgh (Scotland), and from companies such as Amazon Web Services, TigerGraph, Neo4j, and RelationalAI, among others.

"SIGMOD/PODS is one of the largest conferences in the world and one of the most prestigious in the field of databases. Every year, it brings together around 2,000 participants. It has a section called SIGMOD, which covers the more practical aspects, and PODS, which focuses on the more theoretical field," says Domagoj Vrgoč, PhD in Computer Science from the University of Edinburgh (Scotland). As for the paper itself, he points out that the award refers to the fact that the study is "a work with a significant impact on the industry and carried out in collaboration with people from the industry. In fact, it is a paper with more than 20 authors, which represents a very large collaboration that took a long time and solves a specific problem that exists in the field."

Renzo Angles comments that this article was developed by members of the Property Graph Schema Working Group within the Linked Data Benchmark Council (LDBC). "In mid-2019, we began discussing the characteristics of graph-based data models and the lack of a standard way to represent their structure or schema. In this regard, the article proposes a formalism for specifying schemas for graphs with properties, that is, a language that allows for the precise description of the types of nodes, edges, and properties existing in a graph-based database, in addition to specifying simple and complex constraints on those types and their relationships."

The potential of the study

In the field of databases, there is a very important branch known as graph databases, in which data is modeled conceptually. "Every entity you want to represent, such as a person, a city, or a workplace, will be a node in your graph. And when you want to link data, you add edges that tell you what the connection is between the different entities. This means that it is a model that does not have a fixed structure; when you want to add a new entity, you simply connect it through edges," says Vrgoč.

This feature means that a fixed structure is not required, unlike in relational databases, the more traditional area of this field of research. "Graph databases do not have a schema, which is understood as a description that tells you 'everything looks like this' and which exists very strongly in the world of relational databases," says the academic. Instead, he adds, in a graph database there may be "nodes that represent People, but some include only name and country, and others only show a name and age. Therefore, it is not necessary for everything to be structured. In contrast, in relational databases, everything must have the same attributes."
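To make the idea concrete, the following minimal Python sketch models a tiny property graph in which two Person nodes carry different properties, as in the example Vrgoč describes. The identifiers, labels, and data are hypothetical, and the dictionary layout is only one possible in-memory representation, not how any particular graph database stores its data.

```python
# Illustrative only: a tiny in-memory property graph.
# All identifiers, labels, and values here are hypothetical.
nodes = {
    "n1": {"labels": {"Person"}, "props": {"name": "Ana", "country": "Chile"}},
    "n2": {"labels": {"Person"}, "props": {"name": "Luis", "age": 41}},
    "n3": {"labels": {"City"}, "props": {"name": "Talca"}},
}

# Each edge: (source node id, edge label, target node id, edge properties).
edges = [
    ("n1", "LIVES_IN", "n3", {"since": 2015}),
]


def property_keys_by_label(nodes):
    """For each label, collect the distinct sets of property keys seen on its nodes."""
    shapes = {}
    for node in nodes.values():
        for label in node["labels"]:
            # frozenset(dict) yields the dict's keys as a hashable set.
            shapes.setdefault(label, set()).add(frozenset(node["props"]))
    return shapes


shapes = property_keys_by_label(nodes)

# Adding a new kind of entity needs no schema change: just add a node and an edge.
nodes["n4"] = {"labels": {"Workplace"}, "props": {"name": "IMFD"}}
edges.append(("n2", "WORKS_AT", "n4", {}))
```

In this sketch, `shapes["Person"]` ends up holding two different key sets, one with name and country and one with name and age, which is exactly the kind of variation a relational table would not allow.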

According to Vrgoč, this feature gives graph databases a lot of flexibility, but it can also cause problems "when you have a very large knowledge graph, where you do need a schema that tells you what type of data you have."

The study helps fill that gap. "The paper is called PG-Schema because it refers to a language that allows you to define the schema for a graph database format that is widely used in the industry and is called 'property graphs'. And that's exactly what the work is: a language that allows you to compactly describe what kind of data you have in your database without having to display all that data. It is based on a certain syntax, develops semantics, and makes it easier to define that."
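As a rough illustration of what a schema adds, the Python sketch below declares a node type with required and optional properties and checks nodes against it. This is not PG-Schema syntax and not the authors' formalism; the `NodeType` class, the `check_node` function, and the sample data are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeType:
    """Hypothetical node type: a label plus required and optional typed properties."""
    label: str
    required: dict  # property name -> expected Python type
    optional: dict = field(default_factory=dict)


def check_node(props, node_type):
    """Return a list of violations of node_type found in a node's properties."""
    errors = []
    for key, expected in node_type.required.items():
        if key not in props:
            errors.append(f"missing required property '{key}'")
        elif not isinstance(props[key], expected):
            errors.append(f"property '{key}' has the wrong type")
    for key, value in props.items():
        if key in node_type.required:
            continue  # already checked above
        if key not in node_type.optional:
            errors.append(f"unexpected property '{key}'")
        elif not isinstance(value, node_type.optional[key]):
            errors.append(f"property '{key}' has the wrong type")
    return errors


# A compact description of what Person nodes may look like,
# written without listing any actual data.
person_type = NodeType(
    label="Person",
    required={"name": str},
    optional={"age": int, "country": str},
)

ok = check_node({"name": "Ana", "country": "Chile"}, person_type)
bad = check_node({"age": 41}, person_type)
```

Here `ok` is an empty list, while `bad` reports the missing name. A real schema language would also cover edge types and constraints across types, which this sketch leaves out.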

Vrgoč's contribution to the paper was to establish a grammar for that language: "The work I did with a subgroup of that international team, mainly with Filip Murlak (University of Warsaw, Poland) and Wim Martens (University of Bayreuth, Germany), was to design a base language that allows me to describe what I have in a node, what I can have in an edge, how they are linked, and what my graph looks like in general. Then, with the rest of the team, we developed several extensions that ultimately led to this language."

"The development of the article took quite some time because there was a discussion between the real needs raised by the members of the group working in the industry and the theoretical foundations proposed by the members belonging to academia. The end result is a schema specification language that allows graph schemas with different types of constraints to be represented, while respecting important theoretical conditions," says Renzo Angles, who is confident that the article will have an impact on the development of schema specification languages for graph-based database systems.

The researchers hope that, due to its potential, this language will be incorporated into a new ISO standard for graph query languages. "Our work is a proposal with input for the group that defines that standard, but it is not yet something that is established in the industry. For now, it is a research project," says Domagoj Vrgoč.

The authors of the study are: Renzo Angles (University of Talca); Angela Bonifati (University of Lyon); Stefania Dumbrava (ENSIIE); George Fletcher (Eindhoven University of Technology, the Netherlands); Alastair Green (Mr); Jan Hidders (Birkbeck, University of London); Bei Li (Google); Leonid Libkin (University of Edinburgh & RelationalAI); Victor Marsault (UPEM / CNRS); Wim Martens (University of Bayreuth); Filip Murlak (University of Warsaw, Poland); Stefan Plantikow (Neo4j); Ognjen Savkovic (Free University of Bozen-Bolzano); Michael Schmidt (Amazon Web Services); Juan (data.world); Sławek Staworko (RelationalAI); Dominik Tomaszuk (University of Bialystok); Hannes Voigt (Neo4j); Domagoj Vrgoč (Pontificia Universidad Catolica de Chile); Mingxi Wu (TigerGraph Inc.); Dušan Živković (Integral Data Solutions).

Source: IMC UC