Millennium DB: the powerful multimodal engine created at the Millennium Institute Foundational Research on Data

December 2023. In a world where enormous amounts of data are produced every second, how can we extract the information that is useful to us? This is the question that researchers at the Millennium Institute Foundational Research on Data have been working to answer for more than a decade.

" Millennium DB is currently a data manager that allows knowledge graphs containing very large volumes of data to be handled efficiently and effectively using the most modern techniques available and known from research," explains Domagoj Vrgoč, an academic at the Institute of Mathematical and Computational Engineering at the Catholic University of Chile, a researcher at Millennium Institute Foundational Research on Data one of the main authors of the research that led to this work.

Millennium DB is a modular, open-source data management engine for efficiently managing a variety of knowledge graphs that store large volumes of data. Millennium DB uses techniques that are currently at the forefront of scientific research in this field: it combines proven data management techniques, state-of-the-art algorithms for worst-case-optimal joins, and specialized algorithms for evaluating path queries. It allows different information formats to be combined into graphs, from which useful information can be obtained in different ways.
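
To give a rough idea of what a path query over a knowledge graph looks like, here is a minimal Python sketch. It is not Millennium DB's code or API: the toy triples, the `build_index` and `reachable` helpers, and the plain breadth-first search are all assumptions made for illustration, and a real engine would rely on far more specialized algorithms and storage.

```python
from collections import defaultdict, deque

# Hypothetical toy data for illustration only; not taken from Millennium DB or Wikidata.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "works_at", "institute"),
    ("alice", "works_at", "university"),
]


def build_index(triples):
    """Index edges as (source, label) -> list of target nodes."""
    index = defaultdict(list)
    for source, label, target in triples:
        index[(source, label)].append(target)
    return index


def reachable(index, start, label):
    """Return every node reachable from `start` by following one or more
    edges with the given label (a simple breadth-first search)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in index.get((node, label), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


index = build_index(TRIPLES)
# Whom can "alice" reach by following "knows" edges any number of times?
print(sorted(reachable(index, "alice", "knows")))
```

In this sketch the question "whom does alice know, directly or indirectly?" is answered by walking the `knows` edges until no new nodes appear, which is the kind of question a path query expresses.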

"This software was developed from scratch in Chile, and it has the capacity to compete with and even surpass other systems that have been in development for years. In general, all these types of tools are created in the global northern hemisphere: with Millenium DB, we can say that in Chile we do and develop quality research that can compete with developments in other countries," explains Carlos Rojas, researcher at Millennium Institute Foundational Research on Data project director.

Millennium DB has already been tested with data from different areas. The first tests were carried out with Wikidata, and "one example of where we used this model was for the analysis of the country's social and political situation during the period when the Constitutional Convention was in operation," explains Juan, deputy director of the IMFD. "Beyond the outcome of the Convention, we had a large amount of information during that period: comments on social media, complete broadcasts of the debates that took place in the former Congress, surveys that we also conducted with different audiences and with temporal components, and the text itself as it was being created. We were able to systematize all of this in this model and obtain very enriching analyses that allow us to combine different types of variables."

For Jazmine Maldonado, Director of Innovation and Technology Transfer at the Millennium Institute Foundational Research on Data, the model has great potential to be developed as a product that can boost different market sectors. "For example, let's think about retail: you have a range of information that you obtain from customers, such as purchasing preferences, the searches they perform, and the periods when they search for something specific, and you also have product prices, characteristics, perhaps images, codes. Companies have all this information available, and it is difficult to find a way to systematize it that allows them to obtain data that is useful for their core business."