IMFD researchers present advances in query systems for graph databases in SIGMOD/PODS

“Worst-Case-Optimal Similarity Joins on Graph Databases” is the title of the research presented at SIGMOD/PODS 2024 by researchers from the Millennium Institute Foundational Research on Data (IMFD) Benjamín Bustos, Aidan Hogan (IMFD deputy director) and Gonzalo Navarro, academics at the Computer Science Department, Universidad de Chile; Diego Arroyuelo and the IMFD director Juan Reutter, academics at the Computer Science Department, Universidad Católica de Chile, and Adrián Gómez-Brandon (Universidade da Coruña).

About this research, Gonzalo Navarro explains: “Recently, there is a lot of work about graph databases, which represent information in the form of nodes (objects) and edges that connect nodes (and represent relationships between objects), because they are more flexible than relational databases. The queries over these graphs are based on a small pattern graph, which has nodes and edges but where the identities of the nodes and edges can be variable. Each way of giving value to the variables so that that pattern appears in the graph is an answer to the query. As a very simplified example, in a graph with information about drugstores, a query such as (F, sells, X), (X, component, paracetamol), (X, price, P), (F, has-agreement, health insurance), will result on the drugstores F with an agreement with my health insurance company that sell some medicine X with paracetamol, and the price P at which they sell it. Responding to those queries efficiently is not easy, and it is not uncommon for some to take seconds or even minutes. A quality standard for these algorithms is to be worst-case optimal. This type of algorithms are relatively recent and there is still much research to carry out in this field.”

In this context, the academic expresses that a line of work developed at the IMFD is the so-called multimodal graphs, “which contain more semantics than the mere connectivity between the nodes. This type of information is very common in real life, and to be consulted effectively requires enriching the language of pattern graphs with specific expressions of the data types. For example, some nodes may be geographic locations and have associated location information, and I would like to be able to search for the previous query, but restricted to the drugstore within a certain radius of distance. Or that the current time of the consult is within the working period of the drugstore. Or that the price is less than a certain limit.”

Professor Gonzalo Navarro points out that this scientific article addresses the problem of being able to add metric information to objects and conditions of proximity to pattern graphs. He explains that metrics are generalizations of Euclidean distances (such as those of the plane or space), which cover not only the example of geographical distance but others such as similarity between images, audio, video, texts, etc. (which are modeled as distances in very high-dimensional spaces). “We also wanted to maintain the worst-case-optimality condition of the search algorithms,” he says.

In this way, the researchers show that it is possible to include this type of information and support a particular type of queries, “which is that of the k nearest neighbors. A graph is formed, among the objects that have the metric, where each node knows which are the k nodes closest/similar to it. The query can then ask that one object be among the k closest/similar to another. This allows us to ask, for example, about churches in my same city whose façade resembles (as an image) one that interests me. We show that the graph of the k neighbors can be joined to the database graph in such a way as to elegantly integrate it into the same worst-case-optimal algorithm, reusing in a certain way all the development already known about this algorithm. We show that, in several cases, we can retain the worst-case-optimality, although not in all. And we implement the new algorithm and show that it is significantly better than a basic solution. This opens the door to supporting other types of queries (by distance, as in the example of pharmacies), and other types of data (topological, geographic regions, temporal, etc.),” comments the academic.

SOURCE: DCC Communications U. of Chile

More news
View : All
1st semester
2nd semester
No news in this category
Show more
Nothing to show