As a first step towards enhancing our ability to acquire and interpret legacy data, we must standardize and integrate its structure and representation.
Existing techniques are ill-suited to the modern data landscape: the wide variety of formats now in use, the sheer scale of the data, its rapid rate of change, and its mixed quality and completeness, among other obstacles.
We thus need to explore foundational data representations and languages that are built from scratch with these modern challenges in mind. We will then need to develop new techniques for querying and reasoning over data adapted to this novel setting. The aim of this research line is to address these problems as a whole, with emphasis on:
- Following a “multimodel” approach, which abstracts away the specifics of each concrete format and provides a convenient integrated view of the variety of sources available.
- Developing techniques for handling the aforementioned tasks at scale.
- Designing suitable notions of approximation for query answering and reasoning under uncertainty and incompleteness, with qualitative and/or quantitative guarantees, when exact computation is infeasible.
- Incorporating statistical operators into query languages, so as to simplify subsequent analysis.
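To make the first and third points above concrete, the following is a minimal Python sketch, not part of the research line itself: all source names, schemas, and sample data are invented for illustration. It normalizes heterogeneous CSV and JSON sources into one integrated record view, then answers an aggregate query approximately via sampling, with a quantitative Hoeffding-style error guarantee of the kind the third bullet alludes to.

```python
import csv
import io
import json
import math
import random

# Hypothetical sources in two concrete formats (schema and data invented).
CSV_SOURCE = """id,name,rating
1,alpha,4.0
2,beta,3.5
3,gamma,5.0
"""

JSON_SOURCE = """[
  {"id": 4, "name": "delta", "rating": 2.5},
  {"id": 5, "name": "epsilon"}
]"""


def records_from_csv(text):
    """Normalize CSV rows into plain dicts (the shared integrated view)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": int(row["id"]), "name": row["name"],
               "rating": float(row["rating"]) if row.get("rating") else None}


def records_from_json(text):
    """Normalize JSON documents into the same view; missing fields become None."""
    for doc in json.loads(text):
        yield {"id": doc.get("id"), "name": doc.get("name"),
               "rating": doc.get("rating")}


def integrated_view(*sources):
    """A single iterator over heterogeneous sources, hiding format specifics."""
    for src in sources:
        yield from src


def approx_mean(values, sample_size, value_range, delta=0.05, rng=None):
    """Estimate the mean of `values` from a uniform sample with replacement.

    Hoeffding's inequality for values bounded in [lo, hi] gives: with
    probability at least 1 - delta, the true mean lies within +/- eps
    of the returned estimate, where eps shrinks as 1/sqrt(sample_size).
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    sample = [rng.choice(values) for _ in range(sample_size)]
    est = sum(sample) / sample_size
    lo, hi = value_range
    eps = (hi - lo) * math.sqrt(math.log(2 / delta) / (2 * sample_size))
    return est, eps


# Example query over the integrated view: approximate average rating.
rows = list(integrated_view(records_from_csv(CSV_SOURCE),
                            records_from_json(JSON_SOURCE)))
ratings = [r["rating"] for r in rows if r["rating"] is not None]
est, eps = approx_mean(ratings, sample_size=100, value_range=(0.0, 5.0))
```

The design point of the sketch: the query logic touches only the integrated dict view, never the underlying CSV or JSON, and the approximation comes with an explicit, tunable error bound rather than a best-effort answer.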