Tech Talk by Prof. Dominik Tomaszuk: Features and metrics for data formats

When and where: Thursday, October 3, at 12:00, in the Philippe Flajolet room of the Department of Computer Science (Departamento de Ciencias de la Computación), Universidad de Chile (Beauchef 851, Edificio Poniente, 3rd floor, Santiago. Metro stations: Toesca or Parque O'Higgins).

ABSTRACT: In information technology, a data format defines a set of syntax rules for encoding data. Many data formats exist today for encoding text, images, video, and other types of data. A single data format commonly supports several types of data, and the same data can be encoded in several different data formats. This many-to-many relationship raises several questions: What is the best data format? Are two data formats comparable? What kind of data (or data model) is a data format able to support? All of these questions concern the features of a data format. Documentation about data formats often contains descriptions such as "lightweight format", "concise format", and "human-readable format". Unfortunately, these adjectives are of limited use because they have no standard meaning. In this talk we propose a set of features for data formats (e.g., Flexibility), each with a clear definition and evaluation metrics. We then apply these metrics to compare general-purpose data formats (e.g., XML and CSV) and application-oriented formats (e.g., GraphML and GraphSON).

Bio: Dr. Dominik Tomaszuk is a researcher at the University of Bialystok, Faculty of Mathematics and Informatics (Institute of Informatics), Poland. He holds an M.Sc. (2008) in Computer Science from the Bialystok University of Technology, Poland, and a Ph.D. (2014) in Computer Science from the Warsaw University of Technology, Poland. His current research focuses on the Semantic Web, RDF, property graphs, NoSQL databases, and cheminformatics.