Publications

2021

Foundations of Symbolic Languages for Model Interpretability

Marcelo Arenas, Daniel Báez, Pablo Barceló, Jorge Pérez, Bernardo Subercaseaux (2021). In: Advances in Neural Information Processing Systems 34 (NeurIPS 2021). https://proceedings.neurips.cc/paper/2021/hash/60cb558c40e4f18479664069d9642d5a-Abstract.html

Parametric Sensitivity in Graph Neural Network for Urban Cluster Detection, a Case Study

Camila Vera, Francesca Lucchini, Felipe Gutiérrez, Hernán Valdivieso, Hans Lobel, Naim Bro, Marcelo Mendoza (2021). In: Complex Networks 2021: The 10th International Conference on Complex Networks and Their Applications. November 30 - December 02, 2021. Madrid, Spain, pp 598-600. https://2021.complexnetworks.org/wp-content/uploads/sites/5/2022/01/CNA2021-Cover.png

Is it Possible to Verify if a Transaction is Spendable?

Domagoj Vrgoč, Juan Reutter, Marcelo Arenas, Thomas Reisenegger (2021). In: Frontiers in Blockchain, 14 December 2021. https://doi.org/10.3389/fbloc.2021.770503

Counter a Reactive Media System

Dhavan Shah, Yini Zhang, Jon Pevehouse, Sebastián Valenzuela (2021). In: "Fixing American Politics: Solutions for the Media Age". https://www.taylorfrancis.com/books/edit/10.4324/9781003212515/fixing-american-politics-roderick-hart

Youth environmental activism in the age of social media: the case of Chile (2009-2019)

Andrés Scherman, Sebastián Rivera, Sebastián Valenzuela (2021). In: Journal of Youth Studies. Latest Articles. https://doi.org/10.1080/13676261.2021.2010691

A cross sectional study found differential risks for COVID-19 seropositivity amongst health care professionals in Chile

Marcela Zuñiga, Anne J Lagomarcino, Sergio Muñoz, Alfredo Peña Alonso, Miguel L O'Ryan, Sebastián Valenzuela (2021). In: Journal of Clinical Epidemiology 144(9). https://www.researchgate.net/publication/357336538_A_cross_sectional_study_found_differential_risks_for_COVID-19_seropositivity_amongst_health_care_professionals_in_Chile

A Formal Framework for Complex Event Recognition

Alejandro Grez, Cristian Riveros, Martín Ugarte, Stijn Vansummeren (2021). In: ACM Transactions on Database Systems. Volume 46. Issue 4. December 2021. Article No.: 16, pp 1–49. https://doi.org/10.1145/3485463

A survey of RDF stores & SPARQL engines for querying knowledge graphs

Waqas Ali, Muhammad Saleem, Bin Yao, Aidan Hogan, Axel-Cyrille Ngonga Ngomo (2021). In: The VLDB Journal volume 31, pages 1–26 (2022). https://link.springer.com/article/10.1007/s00778-021-00711-3

Surnames and Social Rank: Long-term Traits of Social Mobility in Colombia and Chile

Juliana Jaramillo, Andrés Álvarez, Naim Bro (2021). In: CAF. http://cafscioteca.azurewebsites.net/handle/123456789/1848

A comprehensive review of the video-to-text problem

Jesus Perez-Martin, Benjamin Bustos, Silvio Jamil F. Guimarães, Ivan Sipiran, Jorge Pérez, Grethel Coello Said (2021). In: Artificial Intelligence Review (2022). https://link.springer.com/article/10.1007/s10462-021-10104-1

A practical succinct dynamic graph representation

Miguel E. Coimbra, Joana Hrotkó, Alexandre P. Francisco, Luís M.S. Russo, Guillermo de Bernardo, Susana Ladra, Gonzalo Navarro (2021). In: Information and Computation. Available online 29 December 2021, 104862. https://doi.org/10.1016/j.ic.2021.104862

A Neural Networks Approach to SPARQL Query Performance Prediction

Daniel Arturo Casal Amat; Carlos Buil-Aranda; Carlos Valle-Vidal (2021). In: 2021 XLVII Latin American Computing Conference (CLEI). https://ieeexplore.ieee.org/document/9639899

Boosting perturbation-based iterative algorithms to compute the median string

Pedro Mirabal; José Abreu; Diego Seco; Óscar Pedreira; Edgar Chávez (2021). In: IEEE Access (Volume: 9). https://ieeexplore.ieee.org/document/9661386

You Are Fake News! Factors Impacting Journalists’ Debunking Behaviors on Social Media

Hong Tien Vu, Magdalena Saldaña (2021). In: Digital Journalism. Latest Articles. https://doi.org/10.1080/21670811.2021.2004554

Ensuring Data Readiness for Quality Requirements with Help from Procedure Reuse

Rada Chirkova, Jon Doyle, Juan Reutter (2021). In: Journal of Data and Information Quality. Volume 13. Issue 3. September 2021. Article No.: 15, pp 1–15. https://doi.org/10.1145/3428154

Procesamiento de Lenguaje Natural: dónde estamos y qué estamos haciendo.

Felipe Bravo-Márquez, Jocelyn Dunstan (2021). In: Revista Bits de Ciencia. Núm. 21 (2021): Inteligencia artificial. Inteligencia artificial: aplicaciones de la inteligencia artificial. https://revistasdex.uchile.cl/index.php/bits/article/view/2772

Premio Turing 2019: la revolución de la animación 3D por computadora

Benjamín Bustos, Nancy Hitschfeld (2021). In: Revista Bits de Ciencia. Núm. 21 (2021): Inteligencia artificial. https://revistasdex.uchile.cl/index.php/bits/article/view/2765

Historia y evolución de la inteligencia artificial

Andrés Abeliuk, Claudio Gutiérrez (2021). In: Revista Bits de Ciencia. Núm. 21 (2021): Inteligencia artificial. https://revistasdex.uchile.cl/index.php/bits/article/view/2767

Indigenous Movements, Parties, And the State: Comparative Lessons From Latin America

Carla Alberti (2021). In: APSA Comparative Politics Newsletter. Volume 31. Issue 2. Fall 2021. https://www.comparativepoliticsnewsletter.org/wp-content/uploads/2021/12/2021_fall.pdf

El proyecto Cybersyn: sus antecedentes técnicos

Juan Álvarez, Claudio Gutiérrez (2021). In: Cuadernos de Beauchef. Vol. 5 Núm. 1 (2021): Nostalgia del futuro: ciencia, tecnología y sociedad en Chile. A 50 años del proyecto Cybersyn, pp 101-116. https://revistasdex.uchile.cl/index.php/cdb/article/view/3354/3290

Conectando la visión y el lenguaje

Jesús Pérez-Martín, Jorge Pérez, Benjamín Bustos (2021). In: Revista Bits de Ciencia. https://revistasdex.uchile.cl/index.php/bits/article/view/2780/2712

Aprendizaje profundo en sistemas de recomendación

Denis Parra (2021). In: Revista Bits de Ciencia, Núm. 21 (2021): Inteligencia artificial. https://revistasdex.uchile.cl/index.php/bits/article/view/2777

Aprendizaje de representaciones en grafos y su importancia en el análisis de redes

Marcelo Mendoza (2021). In: Revista Bits de Ciencia. https://revistasdex.uchile.cl/index.php/bits/article/view/2776/2709

¿Puede una máquina ver mejor que un humano?

Javier Carrasco, Aidan Hogan, Jorge Pérez (2021). In: Revista Bits de Ciencia. Núm. 21 (2021): Inteligencia artificial. https://revistasdex.uchile.cl/index.php/bits/article/view/2771

Words, Tweets, and Reviews: Leveraging Affective Knowledge Between Multiple Domains

Felipe Bravo-Marquez, Cristián Tamblay (2021). In: Cognitive Computation volume 14, pages 388–406 (2022). https://link.springer.com/article/10.1007/s12559-021-09923-9

Three popular application domains of sentiment and emotion analysis are: 1) the automatic rating of movie reviews, 2) extracting opinions and emotions on Twitter, and 3) inferring sentiment and emotion associations of words. The textual elements of these domains differ in their length, i.e., movie reviews are usually longer than tweets and words are obviously shorter than tweets, but they also share the property that they can be plausibly annotated according to the same affective categories (e.g., positive, negative, anger, joy). Moreover, state-of-the-art models for these domains are all based on the approach of training supervised machine learning models on manually annotated examples. This approach suffers from an important bottleneck: Manually annotated examples are expensive and time-consuming to obtain and not always available. In this paper, we propose a method for transferring affective knowledge between words, tweets, and movie reviews using two representation techniques: Word2Vec static embeddings and BERT contextualized embeddings. We build compatible representations for movie reviews, tweets, and words, using these techniques, and train and evaluate supervised models on all combinations of source and target domains. Our experimental results show that affective knowledge can be successfully transferred between our three domains, that contextualized embeddings tend to outperform their static counterparts, and that better transfer learning results are obtained when the source domain has longer textual units than the target domain.

Time- and Space-Efficient Regular Path Queries on Graphs

Diego Arroyuelo, Aidan Hogan, Gonzalo Navarro, Javiel Rojas-Ledesma (2021). In: arXiv:2111.04556. https://doi.org/10.48550/arXiv.2111.04556

We introduce a time- and space-efficient technique to solve regular path queries over labeled graphs. We combine a bit-parallel simulation of the Glushkov automaton of the regular expression with the ring index introduced by Arroyuelo et al., exploiting its wavelet tree representation of the triples in order to efficiently reach the states of the product graph that are relevant for the query. Our query algorithm is able to simultaneously process several automaton states, as well as several graph nodes/labels. Our experimental results show that our representation uses 3-5 times less space than the alternatives in the literature, while generally outperforming them in query times (1.67 times faster than the next best).

Time series classification for rumor detection

F. Weiss, E. Milios, Marcelo Mendoza (2021). In: 11th International Conference of Pattern Recognition Systems (ICPRS 2021), 2021 p. 176 – 18. https://digital-library.theiet.org/content/conferences/10.1049/icp.2021.1466

The Automatic Learning for the Rapid Classification of Events (ALeRCE) Alert Broker

F. Förster, G. Cabrera-Vives, E. Castillo-Navarrete, P. A. Estévez, P. Sánchez-Sáez, J. Arredondo, F. E. Bauer, R. Carrasco-Davis, M. Catelan, F. Elorrieta, S. Eyheramendy, P. Huijse, G. Pignata, E. Reyes, I. Reyes, D. Rodríguez-Mancini, D. Ruz-Mieres, C. Valenzuela, I. Alvarez-Maldonado, N. Astorga, J. Borissova, A. Clocchiatti, D. De Cicco, C. Donoso-Oliva, M. J. Graham, R. Kurtev, A. Mahabal, J.C. Maureira, R. Molina-Ferreiro, A. Moya, W. Palma, M. Pérez-Carrasco, P. Protopapas, M. Romero, L. Sabatini-Gacitúa, A. Sánchez, J. San Martín, C. Sepúlveda-Cobo, E. Vera, J. R. Vergara (2021). In: arXiv:2008.03303. https://doi.org/10.48550/arXiv.2008.03303

Temporal Regular Path Queries

Marcelo Arenas, Pedro Bahamondes, Amir Aghasadeghi, Julia Stoyanovich (2021). In: arXiv:2107.01241. https://doi.org/10.48550/arXiv.2107.01241

State-space Representation of Matérn and Damped Simple Harmonic Oscillator Gaussian Processes

Jordán, Andrés; Eyheramendy, Susana; Buchner, Johannes (2021). In: Research Notes of the AAS. arXiv:2109.10685. https://doi.org/10.48550/arXiv.2109.10685

“Searching for changing-state AGNs in massive datasets – I: applying deep learning and anomaly detection techniques to find AGNs with anomalous variability behaviours”

P. Sánchez-Sáez, H. Lira, L. Martí, N. Sánchez-Pi, J. Arredondo, F. E. Bauer, A. Bayo, G. Cabrera-Vives, C. Donoso-Oliva, P. A. Estévez, S. Eyheramendy, F. Förster, L. Hernández-García, A. M. Muñoz Arancibia, M. Pérez-Carrasco, M. Sepúlveda, J. R. Vergara (2021). In: arXiv:2106.07660. https://arxiv.org/abs/2106.07660

Score-Based Explanations in Data Management and Machine Learning: An Answer-Set Programming Approach to Counterfactual Analysis

Leopoldo Bertossi (2021). In: arXiv:2106.10562. https://arxiv.org/abs/2106.10562

Recordar el futuro, la frontera de las tecnologías de búsqueda

Ricardo Baeza-Yates (2021). In: "De neuronas a galaxias. ¿Es el universo un holograma?". https://dialnet.unirioja.es/servlet/libro?codigo=794380

Principles of Databases (Preliminary Version)

Marcelo Arenas, Pablo Barceló, Leonid Libkin, Wim Martens, Andreas Pieris (2021). In: https://www.theoinf.uni-bayreuth.de/pool/documents/Paper2021-25/Paper2021/pdm-public.pdf

On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability Results

Marcelo Arenas, Pablo Barceló, Leopoldo Bertossi, Mikaël Monet (2021). In: arXiv:2104.08015. https://arxiv.org/abs/2104.08015

Old Adults Are More Engaged on Facebook, Especially in Politics: Evidence From Users in 46 Countries

Pablo Ortellado, M. M. Ribeiro, G. Kessler, Gabriel Vommaro, Juan Carlos Rodríguez-Raga, Eduarth Heinen, Laura Fernanda Cely, Juan Pablo Luna, Sergio Toro (2021). In: SSRN Electronic Journal. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3839688

Notas para el debate constitucional: breve análisis comparado sobre tres dimensiones institucionales en disputa

Sergio Toro (2021). In: Thomson Reuters. https://www.researchgate.net/publication/344800298_Notas_para_el_debate_Constitucional_en_Chile_Breve_analisis_comparado_sobre_tres_dimensiones_institucionales_en_disputa

New insights from GWAS on longitudinal and cross-sectional BMI and related phenotypes in admixed children with Native American and European ancestries

Esteban Barrientos, Tomás Norambuena, Danilo Alvares, Juan Cristobal Gana, Valeria Leiva, Veronica Mericq, Cristian Meza, Ana Pereira, José L. Santos, Lucas Vicuña, Susana Eyheramendy (2021). In: medRxiv. https://doi.org/10.1101/2021.09.24.21263664

Neural Abstractive Unsupervised Summarization of Online News Discussions

Marcelo Mendoza, Ignacio Tampe, Evangelos Milios (2021). In: arXiv:2106.03953. https://arxiv.org/abs/2106.03953

Las condiciones sociohistóricas de América Latina

Sergio Toro, Danny Monsalve, Noelia Carrasco, Rodrigo Pulgar (2021). In: "Las condiciones sociohistóricas de América Latina: Un abordaje desde el desarrollo, la autoridad, la política y la historia".

Knowledge graphs

Eva Blomqvist, Michael Cochez, Claudia d'Amato, Gerard de Melo, Sabrina Kirrane, José Emilio Labra Gayo, Roberto Navigli, Sebastian Neumaier, Axel-Cyrille Ngonga Ngomo, Axel Polleres, Sabbir M. Rashid, Anisa Rula, Lukas Schmelzeisen, Juan F. Sequeda, Steffen Staab, Antoine Zimmermann, Aidan Hogan, Claudio Gutiérrez (2021). In: ACM Computing Surveys. Volume 54. Issue 4. May 2022. Article No.: 71, pp 1–37. https://doi.org/10.1145/3447772

In this article, we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After some opening remarks, we motivate and contrast various graph-based data models, as well as languages used to query and validate knowledge graphs. We explain how knowledge can be represented and extracted using a combination of deductive and inductive techniques. We conclude with high-level future research directions for knowledge graphs.

Investigative journalism in Latin America today

Magdalena Saldaña, Silvio Waisbord (2021). In: Investigative Journalism. Third edition. https://www.taylorfrancis.com/chapters/edit/10.4324/9780429060281-20/investigative-journalism-latin-america-today-magdalena-salda%C3%B1a-silvio-waisbord

Innovación social en ciudades portuarias de Chile. De la logística a la articulación territorial en la Región del Biobío

Mabel Alarcón, Hernán Cuevas, Violeta Monero, Claudia García, Alejandro Tudela, Verónica Alarcón, Sergio Toro (2021). In: Red Internacional del Libro Ltda. https://rileditores.com/tienda/innovacion-social-en-ciudades-portuarias-de-chile-de-la-logistica-a-la-articulacion-territorial-en-la-region-del-biobio/

Improved screening of COVID-19 cases through a Bayesian network symptoms model and psychophysical olfactory test

Pedro A. Saa, Eduardo A. Undurraga, Carlos Valencia, Carolina López, Luis Méndez, Javier Pizarro-Berdichevsky, Andrés Finkelstein-Kulka, Sandra Solari, Eduardo Agosin, Susana Eyheramendy, Nicolás Salas, Pedro Bahamondes, Martín Ugarte, Pablo Barceló, Marcelo Arenas (2021). In: medRxiv. https://www.sciencedirect.com/science/article/pii/S2589004221013900

Graph Neural Networks with Local Graph Parameters

Pablo Barceló, Floris Geerts, Juan Reutter, Maksimilian Ryschkov (2021). In: arXiv:2106.06707. https://arxiv.org/abs/2106.06707

Foundations of Symbolic Languages for Model Interpretability

Marcelo Arenas, Daniel Baez, Pablo Barceló, Jorge Pérez, Bernardo Subercaseaux (2021). In: arXiv:2110.02376. https://arxiv.org/abs/2110.02376

Fake News Detection via English-to-Spanish Translation: Is It Really Useful?

Sebastián Ruíz, Eliana Providel, Marcelo Mendoza (2021). In: International Conference on Human-Computer Interaction. HCII 2021: Social Computing and Social Media: Experience Design and Social Network Analysis pp 136–148. Part of the Lecture Notes in Computer Science book series (LNISA, volume 12774). https://link.springer.com/chapter/10.1007/978-3-030-77626-8_9

Social networks are used every day to report daily events, although the information published on them often turns out to be fake news. Detecting such fake news has become a research topic that can be approached using deep learning. However, most of the current research on the topic is available only for the English language. When working on fake news detection in other languages, such as Spanish, one of the barriers is the low quantity of labeled datasets available in Spanish. Hence, we explore if it is convenient to translate an English dataset to Spanish using Statistical Machine Translation. We use the translated dataset to evaluate the accuracy of several deep learning architectures and compare the results from the translated dataset and the original dataset in fake news classification. Our results suggest that the approach is feasible, although it requires high-quality translation techniques, such as those found in the translation’s neural-based models.

Extending Sticky-Datalog+/-via Finite-Position Selection Functions: Tractability, Algorithms, and Optimization

Leopoldo Bertossi, Mostafa Milani (2021). In: arXiv:2108.00903. https://arxiv.org/abs/2108.00903

Weakly-Sticky (WS) Datalog+/- is an expressive member of the family of Datalog+/- program classes that is defined on the basis of the conditions of stickiness and weak-acyclicity. Conjunctive query answering (QA) over the WS programs has been investigated, and its tractability in data complexity has been established. However, the design and implementation of practical QA algorithms and their optimizations have been open. In order to fill this gap, we first study Sticky and WS programs from the point of view of the behavior of the chase procedure. We extend the stickiness property of the chase to that of generalized stickiness of the chase (GSCh) modulo an oracle that selects (and provides) the predicate positions where finitely many values appear during the chase. Stickiness modulo a selection function S that provides only a subset of those positions defines sch(S), a semantic subclass of GSCh. Program classes with selection functions include Sticky and WS, and another syntactic class that we introduce and characterize, namely JWS, of jointly-weakly-sticky programs, which contains WS. The selection functions for these last three classes are computable, and no external, possibly non-computable oracle is needed. We propose a bottom-up QA algorithm for programs in the class sch(S), for a general selection function S. As a particular case, we obtain a polynomial-time QA algorithm for JWS and weakly-sticky programs. Unlike WS, JWS turns out to be closed under magic-sets query optimization. As a consequence, both the generic polynomial-time QA algorithm and its magic-set optimization can be particularized and applied to WS.

Evaluating Interactive Comparison Techniques in a Multiclass Density Map for Visual Crime Analytics

Lukas Svicarovic, María Jesús Lobo, Denis Parra (2021). In: EuroVis 2021 - Short Papers. https://doi.org/10.2312/evs.20211059

Techniques for presenting objects spatially via density maps have been thoroughly studied, but there is a lack of research on how to display this information in the presence of several classes, i.e., multiclass density maps. Moreover, there is even less research on how to design an interactive visualization for comparison tasks on multiclass density maps. One application domain which requires this type of visualization for comparison tasks is crime analytics, and the lack of research in this area results in ineffective visual designs. To fill this gap, we study four types of techniques to compare multiclass density maps, using car theft data. The interactive techniques studied are swipe, translucent overlay, magic lens, and juxtaposition. The results of a user study (N=32) indicate that juxtaposition yields the worst performance to compare distributions, whereas swipe and magic lens perform the best in terms of time needed to complete the experiment. Our research provides empirical evidence on how to design interactive idioms for multiclass density spatial data, and it opens a line of research for other domains and visual tasks.

Differences in citation patterns across areas, article types and age groups of researchers

Marcelo Mendoza (2021). In: Publications 2021, 9(4), 47; https://doi.org/10.3390/publications9040047

The evaluation of research proposals and academic careers is subject to indicators of scientific productivity. Citations are critical signs of impact for researchers, and many indicators are based on these data. The literature shows that there are differences in citation patterns between areas. The scope and depth that these differences may have motivate extending these studies to consider types of articles and age groups of researchers. In this work, we conducted an exploratory study to elucidate what evidence there is about the existence of these differences in citation patterns. To perform this study, we collected historical data from Scopus. Analyzing these data, we evaluate if there are measurable differences in citation patterns. This study shows that there are evident differences in citation patterns between areas, types of publications, and age groups of researchers that may be relevant when carrying out researchers’ academic evaluation.

Data Science for Engineers: A Teaching Ecosystem

Felipe Tobar, Felipe Bravo-Marquez, Jocelyn Dunstan, Joaquin Fontbona, Alejandro Maass, Daniel Remenik, Jorge F. Silva (2021). In: IEEE Signal Processing Magazine (Volume: 38, Issue: 3, May 2021). https://ieeexplore.ieee.org/document/9418568/authors#authors

We describe an ecosystem for teaching data science (DS) to engineers that blends theory, methods, and applications, developed at the Faculty of Physical and Mathematical Sciences (FCFM is its Spanish acronym), Universidad de Chile, over the last three years. This initiative has been motivated by the increasing demand for DS qualifications both from academic and professional environments.

Cross-lingual hate speech detection based on multilingual domain-specific word embeddings

Aymé Arango, Jorge Pérez, Barbara Poblete (2021). In: arXiv:2104.14728. https://arxiv.org/abs/2104.14728

Automatic hate speech detection in online social networks is an important open problem in Natural Language Processing (NLP). Hate speech is a multidimensional issue, strongly dependent on language and cultural factors. Despite its relevance, research on this topic has been almost exclusively devoted to English. Most supervised learning resources, such as labeled datasets and NLP tools, have been created for this same language. Considering that a large portion of users worldwide speak in languages other than English, there is an important need for creating efficient approaches for multilingual hate speech detection. In this work we propose to address the problem of multilingual hate speech detection from the perspective of transfer learning. Our goal is to determine if knowledge from one particular language can be used to classify another language, and to determine effective ways to achieve this. We propose a hate specific data representation and evaluate its effectiveness against general-purpose universal representations most of which, unlike our proposed model, have been trained on massive amounts of data. We focus on a cross-lingual setting, in which one needs to classify hate speech in one language without having access to any labeled data for that language. We show that the use of our simple yet specific multilingual hate representations improves classification results. We explain this with a qualitative analysis showing that our specific representation is able to capture some common patterns in how hate speech presents itself in different languages.
Our proposal constitutes, to the best of our knowledge, the first attempt for constructing multilingual specific-task representations. Despite its simplicity, our model outperformed the previous approaches for most of the experimental setups. Our findings can orient future solutions toward the use of domain-specific representations.

COVID-19 in Chile: A Health Crisis amidst a Political Crisis amidst a Social Crisis

Ingrid Bachmann, Sebastián Valenzuela, Arturo Figueroa-Bustos (2021). In: "Political Communication in the Time of Coronavirus". https://www.taylorfrancis.com/chapters/edit/10.4324/9781003170051-5/covid-19-chile-ingrid-bachmann-sebasti%C3%A1n-valenzuela-arturo-figueroa-bustos

Once regarded as the poster child for democratic stability and sound policymaking in Latin America, in the last two decades Chile has experienced increasing levels of mistrust in political institutions and media elites, as well as disenfranchisement. In the wake of the mass protests of October 2019, the COVID-19 pandemic found the Chilean government at record levels of disapproval and with citizens skeptical of messages by authorities and legacy media. Based on data from an online survey and a narrative analysis of public discourse of key government interventions during the first six months of the pandemic, this chapter pays attention to individuals’ perceptions regarding the coronavirus crisis and offers a qualitative assessment of how the government’s handling was addressed in the public sphere. Findings show that Chileans have been skeptical of government measures and critical of officials’ handling of the situation, regardless of their support for the administration. With the news media struggling to hold authorities accountable, the resulting crisis has only deepened the political, economic, and social divisions within Chilean society.

Castigo a los oficialismos y ciclo político de derecha en América Latina

Cristóbal Rovira Kaltwasser, Juan Pablo Luna (2021). In: Revista Uruguaya de Ciencia Política. https://www.researchgate.net/publication/352705640_Castigo_a_los_oficialismos_y_ciclo_politico_de_derecha_en_America_Latina

This article presents a characterization of three types of strategies (non-electoral, non-partisan electoral, and partisan) pursued by the right in contemporary Latin America. By analyzing the recent course of the party systems in the region, we also argue that the so-called turn to the right constitutes a process of power alternation generated by the punishment of incumbents of the last decade and a half (mostly on the left), rather than a structural ideological realignment. This alternation in power occurs in a context in which established parties tend to disappear or become substantially weaker, and in which short-lived electoral vehicles are gaining traction. Finally, we argue that there does not seem to be space in the region today -especially because of the social crisis associated with the effects of the covid-19 pandemic- for the strengthening of a neoliberal right. However, the current context does seem propitious for the emergence of right-wing outsiders, capable of structuring a pro-order agenda that incorporates, in different proportions, ‘iron-hand’ policies, value conservatism, and market liberalism.

Cabildo Abierto: oportunidades y desafíos para la construcción partidaria en un sistema de partidos institucionalizado

Felipe Monestier, Lihuen Nocetto, Fernando Rosenblatt (2021). In: "De la estabilidad al equilibrio inestable: elecciones y comportamiento electoral".

“Cabildo Abierto: oportunidades y desafíos para la construcción partidaria en un sistema de partidos institucionalizado”. In Juan Andrés Moraes and Verónica Pérez Bentancur, De la estabilidad al equilibrio inestable: elecciones y comportamiento electoral. Instituto de Ciencia Política (with Felipe Monestier and Lihuen Nocetto).

Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review

Jesus Perez-Martin, Benjamin Bustos, Silvio Jamil F. Guimarães, Ivan Sipiran, Jorge Pérez, Grethel Coello Said (2021). In: arXiv:2103.14785. https://arxiv.org/abs/2103.14785

An irregularly spaced first-order moving average model

Cesar Ojeda, Wilfredo Palma, Susana Eyheramendy, Felipe Elorrieta (2021). In: arXiv:2105.06395. https://arxiv.org/abs/2105.06395

AI & Human Values

Laurence Devillers, Françoise Fogelman-Soulié, Ricardo Baeza-Yates (2021). In: Reflections on Artificial Intelligence for Humanity pp 76–89. https://link.springer.com/chapter/10.1007/978-3-030-69128-8_6

A trend study in the stratification of social media use among urban youth: Chile 2009-2019

Teresa Correa, Sebastián Valenzuela (2021). In: Journal of Quantitative Description: Digital Media. https://doi.org/10.51685/jqd.2021.009

A novel bivariate autoregressive model for predicting and forecasting irregularly observed time series

Felipe Elorrieta, Susana Eyheramendy, Wilfredo Palma, Cesar Ojeda (2021). In: arXiv:2104.12248. https://arxiv.org/abs/2104.12248

A New Content-Based Image Retrieval System for SARS-CoV-2 Computer-Aided Diagnosis

Gabriel Molina, Marcelo Mendoza, Ignacio Loayza, Camilo Núñez, Mauricio Araya, Víctor Castañeda, Mauricio Solar (2021). In: International Conference on Medical Imaging and Computer-Aided Diagnosis MICAD 2021: Proceedings of 2021 International Conference on Medical Imaging and Computer-Aided Diagnosis (MICAD 2021) pp 316–324. https://link.springer.com/chapter/10.1007/978-981-16-3880-0_33

A data-driven strategy to combine word embeddings in information retrieval

Alfredo Silva, Marcelo Mendoza (2021). In: arXiv:2105.12788. https://arxiv.org/abs/2105.12788

What “Emergency Sources” Expect From Journalists: Applying the Hierarchy of Influences Model to Disaster News Coverage

Daniela Grassau, Sebastián Valenzuela, Soledad Puente (2021). In: International Journal of Communication. Vol 15 (2021). https://ijoc.org/index.php/ijoc/article/view/14450

Using pre-analysis plans in qualitative research

Pérez Bentancur, Verónica; Piñeiro Rodríguez, Rafael; Rosenblatt, Fernando (2021). In: Qualitative & Multi-Method Research . https://zenodo.org/record/5495552#.YmqrCJLMJpQ

Towards intellectual freedom in an AI Ethics Global Community

Christoph Ebell, Richard Benjamins, Hengjin Cai, Mark Coeckelbergh, Tania Duarte, Merve Hickok, Aurelie Jacquet, Angela Kim, Joris Krijger, John MacIntyre, Piyush Madhamshettiwar, Lauren Maffeo, Jeanna Matthews, Larry Medsker, Peter Smith, Savannah Thais, Ricardo Baeza-Yates (2021). In: AI and Ethics volume 1, pages 131–138 (2021). https://link.springer.com/article/10.1007/s43681-021-00052-5

Topic Models Ensembles for AD-HOC Information Retrieval

Pablo Ormeño, Carlos Valle, Marcelo Mendoza (2021). In: Information 2021, 12(9), 360; https://doi.org/10.3390/info12090360

TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification

Andrés Villa, Juan-Manuel Perez-Rua, Victor Escorcia, Vladimir Araujo, Juan Carlos Niebles, Alvaro Soto (2021). In: arXiv:2106.11173. https://arxiv.org/abs/2106.11173

The Shapley Value of Tuples in Query Answering

Ester Livshits, Leopoldo Bertossi, Benny Kimelfeld, Moshe Sebag (2021). In: Logical Methods in Computer Science, September 2, 2021, Volume 17, Issue 3. https://doi.org/10.46298/lmcs-17(3:22)2021

The Personal Is the Political? What Do WhatsApp Users Share and How It Matters for News Knowledge, Polarization and Participation in Chile

Ingrid Bachmann, Matías Bargsted, Sebastián Valenzuela (2021). In: Digital Journalism. Volume 9, 2021 - Issue 2: Digital Journalism in Latin America. Guest Editors: Pablo J. Boczkowski and Eugenia Mitchelstein. https://doi.org/10.1080/21670811.2019.1693904

The Marriage of Univalence and Parametricity

Nicolas Tabareau, Éric Tanter, Matthieu Sozeau (2021). In: Journal of the ACM, Volume 68, Issue 1, February 2021, Article No.: 5, pp 1–44. https://doi.org/10.1145/3429979

The future is big graphs: a community view on graph processing systems

Sakr Sherif, Bonifati Angela, Voigt Hannes, Iosup Alexandru, Ammar Khaled, Aref Walid, Besta Maciej, Boncz Peter A, Daudjee Khuzaima, Della Valle Emanuele, Dumbrava Stefania, Hartig Olaf, Haslhofer Bernhard, Hegeman Tim, Hidders Jan, Hose Katja, Iamnitchi Adriana, Kalavri Vasiliki, Kapp Hugo, Martens Wim, Özsu Tamer M., Peukert Eric, Plantikow Stefan, Ragab Mohamed, Ripeanu Matei R., Salihoglu Semih, Schulz Christian, Selmer Petra, Sequeda Juan F., Shinavier Joshua, Szárnyas Gábor, Tommasini Riccardo, Tumeo Antonino, Uta Alexandru, Varbanescu Ana L., Wu Hsiang-Yun, Yakovets Nikolay, Yan Da, Yoneki Eiko, Renzo Angles, Marcelo Arenas (2021). In: Communications of the ACM, September 2021, Vol. 64 No. 9, Pages 62-71. https://cacm.acm.org/magazines/2021/9/255040-the-future-is-big-graphs/fulltext

The Complexity of Counting Problems Over Incomplete Databases

Mikael Monet, Marcelo Arenas, Pablo Barceló (2021). In: ACM Transactions on Computational Logic, Volume 22, Issue 4, October 2021, Article No.: 21, pp 1–52. https://doi.org/10.1145/3461642

Surname affinity in Santiago, Chile: A network-based approach that uncovers urban segregation.

Naim Bro, Marcelo Mendoza (2021). In: PLoS ONE. https://doi.org/10.1371/journal.pone.0244372

Supporting the classification of patients in public hospitals in Chile by designing, deploying and validating a system based on natural language processing

Fabián Villena, Jorge Pérez, René Lagos, Jocelyn Dunstan (2021). In: BMC Medical Informatics and Decision Making volume 21, Article number: 208 (2021). https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-021-01565-z

Succinct Encoding of Binary Strings Representing Triangulations

José Fuentes-Sepúlveda, Raquel Viaña, Diego Seco (2021). In: Algorithmica volume 83, pages 3432–3468 (2021). https://link.springer.com/article/10.1007/s00453-021-00861-4

Subversive affordances as a form of digital transnational activism: The case of Telegram’s native proxy

Marcelo Santos, Ksenia Tsyganova, Magdalena Saldaña (2021). In: New Media & Society. https://doi.org/10.1177/14614448211054830

Stronger and safer together: Motivations for and challenges of (trans) national collaboration in investigative reporting in Latin America

Lourdes Cueva, Magdalena Saldaña (2021). In: Digital Journalism Volume 9, 2021 - Issue 2: Digital Journalism in Latin America. Guest Editors: Pablo J. Boczkowski and Eugenia Mitchelstein. https://doi.org/10.1080/21670811.2020.1775103

Specifying and computing causes for query answers in databases via database repairs and repair-programs

Leopoldo Bertossi (2021). In: Knowledge and Information Systems 63(4&5):1-33. https://www.researchgate.net/publication/346056817_Specifying_and_computing_causes_for_query_answers_in_databases_via_database_repairs_and_repair-programs

Space-efficient representations of raster time series

Fernando Silva-Coira, José Paramá, Guillermo de Bernardo, Diego Seco (2021). In: Information Sciences. Volume 566, August 2021, Pages 300-325. https://doi.org/10.1016/j.ins.2021.03.035

SHREC 2021: Retrieval of Cultural Heritage Objects

Ivan Sipiran, Patrick Lazo, Cristian Lopez, Milagritos Jimenez, Nihar Bagewadie, Hieu Dao, Shankar Gangisetty, Martin Hanik, Ngoc-Phuong Ho-Thi, Mike Holenderski, Dmitri Jarnikovhi, Arniel Labrada, Stefan Lengauer, Roxane Licandro, Dinh-Huan Nguyen, Thang-Long Nguyen-Ho, Luis A. Perez Rey, Bang-Dang Pham, Minh-Khoi Pham, Reinhold Preiner, Tobias Schreck, Quoc-Huy Trinh, Loek Tonnaer, Christoph von Tycowicz, The-Anh Vu-Le, Benjamín Bustos (2021). In: Computers & Graphics. Volume 100, November 2021, Pages 1-20. https://doi.org/10.1016/j.cag.2021.07.010

Rethinking the Virtuous Circle Hypothesis on Social Media: Subjective versus Objective Knowledge and Political Participation

Sangwon Lee, Trevor Diehl, Sebastián Valenzuela (2021). In: Human Communication Research, Volume 48, Issue 1, January 2022, Pages 57–87, https://doi.org/10.1093/hcr/hqab014

Regularizing conjunctive features for classification

Pablo Barceló, Alexander Baumgartner, Victor Dalmau, Benny Kimelfeld (2021). In: Journal of Computer and System Sciences 119. https://www.researchgate.net/publication/349384840_Regularizing_conjunctive_features_for_classification

Recursion in SPARQL

Domagoj Vrgoc, Juan Reutter, Adrián Soto (2021). In: Semantic Web, vol. 12, no. 5, pp. 711-740, 2021. https://content.iospress.com/articles/semantic-web/sw200401

Range Majorities and Minorities in Arrays

Djamal Belazzougui, Travis Gagie, Ian Munro, Yakov Nekrich, Gonzalo Navarro (2021). In: Algorithmica volume 83, pages 1707–1733. https://link.springer.com/article/10.1007/s00453-021-00799-7

Raising the Red Flag: Democratic Elitism and the Protests in Chile

Matthew Rhodes-Purdy, Fernando Rosenblatt (2021). In: Perspectives on Politics, First View, pp. 1-13. https://doi.org/10.1017/S1537592721000050

Question Answering over Knowledge Graphs with Neural Machine Translation and Entity Linking

Daniel Diomedi, Aidan Hogan (2021). In: arXiv:2107.02865. https://arxiv.org/abs/2107.02865

Query Games in Databases

Ester Livshits, Benny Kimelfeld, Moshe Sebag, Leopoldo Bertossi (2021). In: ACM SIGMOD Record. Volume 50. Issue 1. March 2021, pp 78–85. https://doi.org/10.1145/3471485.3471504

Pumping Lemmas for Weighted Automata

Agnishom Chattopadhyay, Filip Mazowiecki, Anca Muscholl, Cristian Riveros (2021). In: Logical Methods in Computer Science. https://lmcs.episciences.org/7692/pdf

PUC Chile Team at VQA-Med 2021: Approaching VQA as a Classification Task via Fine-Tuning a Pretrained CNN

Ricardo Schilling, Pablo Messina, Denis Parra, Hans Löbel (2021). In: CEUR Workshop Proceedings. http://ceur-ws.org/Vol-2936/paper-113.pdf

PUC Chile team at TBT Task: Diagnosis of Tuberculosis Type using segmented CT scans

José Miguel Quintana, Daniel Florea, Ria Deane, Denis Parra, Pablo Pino, Pablo Messina, Hans Löbel (2021). In: CEUR Workshop Proceedings. http://ceur-ws.org/Vol-2936/paper-112.pdf

PUC Chile team at Concept Detection: K Nearest Neighbors with Perceptual Similarity

Gregory Schuit, Vicente Castro, Pablo Pino, Denis Parra, Hans Lobel (2021). In: CEUR Workshop Proceedings. http://ceur-ws.org/Vol-2936/paper-114.pdf

PUC Chile team at Caption Prediction: ResNet visual encoding and caption classification with Parametric ReLU

Vicente Castro, Pablo Pino, Denis Parra, Hans Lobel (2021). In: CEUR Workshop Proceedings. http://ceur-ws.org/Vol-2936/paper-95.pdf

Predicting affinity ties in a surname network

Marcelo Mendoza, Naim Bro (2021). In: PLoS ONE. https://doi.org/10.1371/journal.pone.0256603

Power-Law Distributed Graph Generation With MapReduce

Fernanda López-Gallegos, Rodrigo Paredes, Renzo Angles (2021). In: IEEE Access. Volume: 9. https://ieeexplore.ieee.org/document/9467285

PolyLM: Learning about Polysemy through Language Modeling

Alan Ansell, Felipe Bravo-Marquez, Bernhard Pfahringer (2021). In: arXiv:2101.10448. https://arxiv.org/abs/2101.10448

Political Parties, Diminished Sub-types and Democracy

Rafael Piñeiro, Gabriel Vommaro, Juan Pablo Luna, Fernando Rosenblatt (2021). In: Party Politics. Vol 27, Issue 2, 2021. https://doi.org/10.1177/1354068820923723

Peripheral elaboration model: The impact of incidental news exposure on political participation

Saif Shahin, Homero Gil de Zúñiga, Magdalena Saldaña (2021). In: Journal of Information Technology & Politics. Volume 18, 2021 - Issue 2. https://doi.org/10.1080/19331681.2020.1832012

Overcoming catastrophic forgetting using sparse coding and meta learning

Julio Hurtado, Hans Lobel, Alvaro Soto (2021). In: IEEE Access (Volume: 9). https://ieeexplore.ieee.org/document/9459700

Optimizing Reusable Knowledge for Continual Learning via Metalearning

Julio Hurtado, Alain Raymond-Saez, Alvaro Soto (2021). In: arXiv:2106.05390. https://arxiv.org/abs/2106.05390

On the Approximation Ratio of Ordered Parsings

Nicola Prezza, Carlos Ochoa, Gonzalo Navarro (2021). In: IEEE Transactions on Information Theory (Volume: 67, Issue: 2, Feb. 2021). https://ieeexplore.ieee.org/document/9281349

Offering an Entrepreneurship Course to All Engineering Students: Lessons Learned from ING2030 in Puc-Chile

Isabel Hilliger, Constance Fleet, Constanza Melian, Jorge Baier (2021). In: Advances in Engineering Education. https://www.researchgate.net/publication/354224445_Offering_an_Entrepreneurship_Course_to_All_Engineering_Students_Lessons_Learned_from_ING2030_in_Puc-Chile

Novel loci and Mapuche genetic ancestry are associated with pubertal growth traits in Chilean boys

Tomás Norambuena, José Patricio Miranda, Ana Pereira, Veronica Mericq, Linda Ongaro, Francesco Montinaro, José L. Santos, Susana Eyheramendy, Lucas Vicuña (2021). In: Human Genetics volume 140, pages 1651–1661 (2021). https://link.springer.com/article/10.1007/s00439-021-02290-3

Much Ado About Facebook? Evidence from 80 Congressional Campaigns in Chile

Cristian Pérez, Sergio Toro, Fernando Rosenblatt, Bárbara Poblete, Sebastián Valenzuela, Andrés Cruz, Naim Bro, Daniel Alcatruz, Andrea Escobar, Juan Pablo Luna (2021). In: Journal of Information Technology & Politics. Volume 19, 2022 - Issue 2. https://doi.org/10.1080/19331681.2021.1936334

Misleading information in Spanish: a survey

Eliana Providel, Marcelo Mendoza (2021). In: Social Network Analysis and Mining volume 11, Article number: 36 (2021). https://link.springer.com/article/10.1007/s13278-021-00746-y

Misleading information spread on social networks is often supported by activists who promote this type of information and bots that amplify their visibility. The need for useful and timely mechanisms of credibility assessment in social media has become increasingly indispensable. Efforts to tackle this problem in Spanish are growing. The last years have witnessed many efforts to develop methods to detect fake news, rumors, stances, and bots on the Spanish social web. This work leads to a systematic review of the literature that relates the efforts to develop this area in the Spanish language. The work identifies pending tasks for this community and challenges that require coordination among the leading investigators on the subject.

Mind The Gap! The Role of Political Identity and Attitudes in the Emergence of Belief Gaps

Magdalena Saldaña, Shannon McGregor, Tom Johnson (2021). In: International Journal of Public Opinion Research, Volume 33, Issue 3, Autumn 2021, Pages 607–625, https://doi.org/10.1093/ijpor/edab006

To more fully understand the belief gap hypothesis, this study examines the effect of political identity, education, and partisan media consumption on the formation of attitudes and false beliefs. Using a two-wave, nationally representative online survey of the U.S., we assess people’s attitudes and beliefs toward climate change, on the one hand, and Syrian refugees, on the other. Building on previous studies, we demonstrate that the effect of one’s political identity on attitudes and false beliefs is contingent upon education, which appears to widen the belief gap in consort with political identity.

Merging Web Tables for Relation Extraction with Knowledge Graphs

Jhomara Luzuriaga, Emir Muñoz, Aidan Hogan, Henry Rosales-Mendez (2021). In: IEEE Transactions on Knowledge and Data Engineering. https://ieeexplore.ieee.org/document/9506867

We propose methods for extracting triples from Wikipedia’s HTML tables using a reference knowledge graph. Our methods use a distant-supervision approach to find existing triples in the knowledge graph for pairs of entities on the same row of a table, postulating the corresponding relation for pairs of entities from other rows in the corresponding columns, thus extracting novel candidate triples. Binary classifiers are applied on these candidates to detect correct triples and thus increase the precision of the output triples. We extend this approach with a preliminary step where we first group and merge similar tables, thereafter applying extraction on the larger merged tables. More specifically, we propose an observed schema for individual tables, which is used to group and merge tables. We compare the precision and number of triples extracted with and without table merging, where we show that with merging, we can extract a larger number of triples at a similar precision. Ultimately, from the tables of English Wikipedia, we extract 5.9 million novel and unique triples for Wikidata at an estimated precision of 0.718.
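The distant-supervision step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `extract_candidates`, the `min_support` parameter, and the toy data are assumptions for illustration. Tables are assumed to be rectangular lists of already-linked entity identifiers, and the classifier and table-merging steps are omitted.

```python
from collections import Counter

def extract_candidates(kg, table, min_support=1):
    """Propose new triples for a table using relations already in the graph.

    kg    -- set of (subject, predicate, object) triples (hypothetical toy graph)
    table -- list of rows, each a list of entity identifiers, all the same length
    """
    # Index the graph by (subject, object) so row pairs can be looked up quickly.
    by_pair = {}
    for s, p, o in kg:
        by_pair.setdefault((s, o), set()).add(p)

    n_cols = len(table[0]) if table else 0
    candidates = set()
    for i in range(n_cols):
        for j in range(n_cols):
            if i == j:
                continue
            # Count how many rows already witness each predicate for columns (i, j).
            support = Counter()
            for row in table:
                for p in by_pair.get((row[i], row[j]), ()):
                    support[p] += 1
            # Postulate a sufficiently supported predicate for every other row.
            for p, count in support.items():
                if count < min_support:
                    continue
                for row in table:
                    triple = (row[i], p, row[j])
                    if triple not in kg:
                        candidates.add(triple)
    return candidates

kg = {("Chile", "capital", "Santiago")}
table = [["Chile", "Santiago"], ["Peru", "Lima"]]
new_triples = extract_candidates(kg, table)
```

In this toy case the single known triple on the first row suggests the relation `capital` for the two columns, so the second row yields one candidate. In the approach the abstract describes, such candidates would then be filtered by binary classifiers to raise precision.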

Measuring heterogeneous perception of urban space with massive data and machine learning: An application to safety

Tomás Ramírez, Ricardo Hurtubia, Tomás Rossetti, Hans Löbel (2021). In: Landscape and Urban Planning. Volume 208, April 2021, 104002. https://doi.org/10.1016/j.landurbplan.2020.104002

In the last decade, large street imagery data sets and machine learning developments have allowed increasing scalability of methodologies to understand the effects of landscape attributes on the way they are perceived. However, these new methodologies have not incorporated individual heterogeneity in their analysis, even though differences by gender and other sociodemographic characteristics in the perception of safety and other aspects of landscapes and public spaces have been widely studied in social sciences and urban planning in lower scale studies. In the present study, we combine computational and statistical tools to develop a methodological proposal with high scalability and low implementation cost, which helps to identify and measure heterogeneous perception and its correlation to the presence of elements in the landscape. To achieve this, we implement a survey of perception of public spaces, collecting sociodemographic information of respondents. Then, we fit a discrete choice model to quantify perceptions of these spaces using a parametrization of images that jointly considers semantic segmentation and object detection as input. Our results show heterogeneity in the perception of safety in public spaces according to gender and the observer’s habitual mobility choices. The model is then applied to the city of Santiago, Chile. This produces a map of safety perception for different types of users. The proposed method and the obtained results can be a relevant input for the design of public spaces and decision making in the urban planning process.

Matrix Query Languages

Floris Geerts, Thomas Muñoz, Jan Van den Bussche, Cristian Riveros, Domagoj Vrgoc (2021). In: ACM SIGMOD Record. Volume 50. Issue 3. September 2021 pp 6–19. https://doi.org/10.1145/3503780.3503782

Due to the importance of linear algebra and matrix operations in data analytics, there has been a renewed interest in developing query languages that combine both standard relational operations and linear algebra operations. We survey aspects of the matrix query language MATLANG and extensions thereof, and connect matrix query languages to classical query languages and arithmetic circuits.

Knowledge graphs

Claudio Gutierrez, Juan F. Sequeda (2021). In: Communications of the ACM. Volume 64. Issue 3. March 2021, pp 96–104. https://doi.org/10.1145/3418294

Tracking the historical events that lead to the interweaving of data and knowledge.

Indexing highly repetitive string collections, part II: Compressed indexes

Gonzalo Navarro (2021). In: ACM Computing Surveys. Volume 54. Issue 2. March 2022. Article No.: 26, pp 1–32. https://doi.org/10.1145/3432999

Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through applications like bioinformatics, the string collections experienced a growth that outperforms Moore’s Law and challenges our ability to handle them even in compressed form. It turns out, fortunately, that many of these rapidly growing string collections are highly repetitive, so that their information content is orders of magnitude lower than their plain size. The statistical compression methods used for classical collections, however, are blind to this repetitiveness, and therefore a new set of techniques has been developed to properly exploit it. The resulting indexes form a new generation of data structures able to handle the huge repetitive string collections that we are facing. In this survey, formed by two parts, we cover the algorithmic developments that have led to these data structures.

In this second part, we describe the fundamental algorithmic ideas and data structures that form the base of all the existing indexes, and the various concrete structures that have been proposed, comparing them both in theoretical and practical aspects, and uncovering some new combinations. We conclude with the current challenges in this fascinating field.

Indexing highly repetitive string collections, part I: Repetitiveness measures

Gonzalo Navarro (2021). In: ACM Computing Surveys. Volume 54. Issue 2. March 2022. Article No.: 29, pp 1–31. https://doi.org/10.1145/3434399

Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through applications like bioinformatics, the string collections experienced a growth that outperforms Moore’s Law and challenges our ability to handle them even in compressed form. It turns out, fortunately, that many of these rapidly growing string collections are highly repetitive, so that their information content is orders of magnitude lower than their plain size. The statistical compression methods used for classical collections, however, are blind to this repetitiveness, and therefore a new set of techniques has been developed to properly exploit it. The resulting indexes form a new generation of data structures able to handle the huge repetitive string collections that we are facing. In this survey, formed by two parts, we cover the algorithmic developments that have led to these data structures.

In this first part, we describe the distinct compression paradigms that have been used to exploit repetitiveness, and the algorithmic techniques that provide direct access to the compressed strings. In the quest for an ideal measure of repetitiveness, we uncover a fascinating web of relations between those measures, as well as the limits up to which the data can be recovered, and up to which direct access to the compressed data can be provided. This is the basic aspect of indexability, which is covered in the second part of this survey.

Incremental Word Vectors for Time-Evolving Sentiment Lexicon Induction

Arun Khanchandani, Bernhard Pfahringer, Felipe Bravo-Marquez (2021). In: Cognitive Computation 14, pages 425–441 (2022). https://link.springer.com/article/10.1007/s12559-021-09831-y

A sentiment lexicon is a list of expressions annotated according to affect categories such as positive, negative, anger and fear. Lexicons are widely used in sentiment classification of tweets, especially when labeled messages are scarce. Sentiment lexicons are prone to obsolescence due to: 1) the arrival of new sentiment-conveying expressions such as #trumpwall and #PrayForParis and 2) temporal changes in sentiment patterns of words (e.g., a scandal associated with an entity). In this paper, we propose a methodology for automatically inducing continuously updated sentiment lexicons from Twitter streams by training incremental word sentiment classifiers from time-evolving distributional word vectors. We experiment with various sketching techniques for efficiently building incremental word context matrices and study how the lexicon adapts to drastic changes in the sentiment pattern. Change is simulated by randomly picking some words from a testing partition of words and swapping their context with the context of words exhibiting the opposite sentiment. Our experimental results show that our approach allows for successful tracking of the sentiment of words over time even when drastic change is induced.

Improving Signal-Strength Aggregation for Mobile Crowdsourcing Scenarios

Javier Madariaga, Javier Bustos-Jiménez, Benjamín Bustos, Diego Madariaga (2021). In: Sensors 2021, 21(4), 1084; https://doi.org/10.3390/s21041084

Due to its huge impact on the overall quality of service (QoS) of wireless networks, both academic and industrial research have actively focused on analyzing the received signal strength in areas of particular interest. In this paper, we propose the improvement of signal-strength aggregation with a special focus on Mobile Crowdsourcing scenarios by avoiding common issues related to the mishandling of log-scaled signal values, and by the proposal of a novel aggregation method based on interpolation. Our paper presents two clear contributions. First, we discuss the misuse of log-scaled signal-strength values, which is a persistent problem within the mobile computing community. We present the physical and mathematical formalities on how signal-strength values must be handled in a scientific environment. Second, we present a solution to the difficulties of aggregating signal strength in Mobile Crowdsourcing scenarios, such as a low number of measurements and nonuniformity in spatial distribution. Our proposed method obtained consistently lower Root Mean Squared Error (RMSE) values than other commonly used methods at estimating the expected value of signal strength over an area. Both contributions of this paper are important for several recent pieces of research that characterize signal strength for an area of interest.
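The first contribution, the mishandling of log-scaled values, can be illustrated with a short sketch. It assumes readings in dBm and shows only the standard conversion between dBm and milliwatts; the function names and sample values are illustrative, and the paper's interpolation-based aggregation method is not reproduced here.

```python
import math

def dbm_to_mw(dbm):
    """Convert a power level from dBm (log scale) to milliwatts (linear scale)."""
    return 10 ** (dbm / 10)

def mw_to_dbm(mw):
    """Convert a power level from milliwatts back to dBm."""
    return 10 * math.log10(mw)

def naive_mean_dbm(values):
    """Arithmetic mean taken directly on dBm values (the common mistake)."""
    return sum(values) / len(values)

def linear_mean_dbm(values):
    """Mean power: average in the linear domain, then convert back to dBm."""
    linear = [dbm_to_mw(v) for v in values]
    return mw_to_dbm(sum(linear) / len(linear))

# Two hypothetical readings, one strong and one weak.
samples = [-60.0, -90.0]
naive = naive_mean_dbm(samples)     # averages the exponents, not the power
correct = linear_mean_dbm(samples)  # dominated by the stronger reading
```

Averaging the dBm values directly amounts to a geometric mean of the underlying powers, so it underestimates the mean received power whenever the readings differ. For the two sample readings, the naive result is -75 dBm, whereas the linear-domain mean is close to -63 dBm.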

Improved structures to solve aggregated queries for trips over public transportation networks

Nieves Brisaboa, Antonio Fariña, Daniil Galavtianov, Tirso Rodeiro, M. Andrea Rodríguez (2021). In: Information Sciences Volume 584, January 2022, Pages 752-783. https://doi.org/10.1016/j.ins.2021.10.079

We address the problem of storing and analyzing large datasets of passenger trips over public transportation networks that are of interest to network administrators trying to balance transportation offers (e.g., frequency of vehicles) according to the historical demand. We exploit the fact that all passenger trips made within the same vehicle share the same trajectories to reduce their redundancy and provide a representation, based on well-known compact data structures, that not only reduces the space requirements of the original passenger’s trajectories but also efficiently supports querying. Our solution uses two complementary representations: T-Matrices, which excels at querying the aggregated network load, and TTCTR, which represents all passenger trips and aims at counting the trips following a given pattern (i.e., how many passengers started/ended a trip at a given location or moved from a given location to another). In addition, we propose XCTR, a variant of TTCTR, which efficiently answers a wider range of queries at the cost of a moderate performance loss for some queries and some space overhead. Overall, our representation can handle a dataset of ten million trips within approximately 65% of its original size while supporting a wide range of queries in the order of microseconds.

Guarded Ontology-Mediated Queries

Pablo Barceló, Gerald Berger, Georg Gottlob, Andreas Pieris (2021). In: Outstanding Contributions to Logic book series. Hajnal Andréka and István Németi on Unity of Science pp 27–52. https://link.springer.com/chapter/10.1007/978-3-030-64187-0_2

We concentrate on ontology-mediated queries (OMQs) expressed using guarded Datalog $^{\exists}$ and conjunctive queries. Guarded Datalog $^{\exists}$ is a rule-based knowledge representation formalism inspired by the guarded fragment of first-order logic, while conjunctive queries represent a prominent database query language that lies at the core of relational calculus (i.e., first-order queries). For such guarded OMQs we discuss three main algorithmic tasks: query evaluation, query containment, and first-order rewritability. The first one is the task of computing the answer to an OMQ over an input database. The second one is the task of checking whether the answer to an OMQ is contained in the answer of some other OMQ on every input database. The third one asks whether an OMQ can be equivalently rewritten as a first-order query. For query evaluation, we explain how classical results on the satisfiability problem for the guarded fragment of first-order logic can be applied. For query containment, we discuss how tree automata techniques can be used. Finally, for first-order rewritability, we explain how techniques based on a more sophisticated automata model, known as cost automata, can be exploited.

Graphing else matters: exploiting aspect opinions and ratings in explainable graph-based recommendations

Iván Cantador, Andrés Carvallo, Fernando Diez, Denis Parra (2021). In: arXiv:2107.03226. https://arxiv.org/abs/2107.03226

The success of neural network embeddings has entailed a renewed interest in using knowledge graphs for a wide variety of machine learning and information retrieval tasks. In particular, current recommendation methods based on graph embeddings have shown state-of-the-art performance. These methods commonly encode latent rating patterns and content features. Different from previous work, in this paper, we propose to exploit embeddings extracted from graphs that combine information from ratings and aspect-based opinions expressed in textual reviews. We then adapt and evaluate state-of-the-art graph embedding techniques over graphs generated from Amazon and Yelp reviews on six domains, outperforming baseline recommenders. Our approach has the advantage of providing explanations which leverage aspect-based opinions given by users about recommended items. Furthermore, we also provide examples of the applicability of recommendations utilizing aspect opinions as explanations in a visualization dashboard, which allows obtaining information about the most and least liked aspects of similar users obtained from the embeddings of an input graph.

Gendered bureaucracies: Women mayors and the size and composition of local governments

Carla Alberti, Diego Diaz-Rioseco, Giancarlo Visconti (2021). In: Governance. https://doi.org/10.1111/gove.12591

While women are underrepresented in politics, recent improvements in women’s representation in legislative and executive bodies have spurred academic interest in the effects of electing women on a wide array of outcomes. Effects on bureaucracies, however, have received less attention. Do women mayors reform local bureaucracies differently than their men counterparts? We take advantage of rich administrative data from Chile to explore the effects of having a woman mayor on the size and gender composition of municipal bureaucracies. Using a regression discontinuity design in close electoral races, we find that women mayors reduce the size of local bureaucracies while simultaneously increasing the share of women public employees. Our findings thus show that women mayors’ approach to bureaucratic reform once in office differs from that of their men counterparts, and contribute to existing research on the consequences of electing women.

Forecasting copper electrorefining cathode rejection by means of recurrent neural networks with attention mechanism

Pedro Pablo Correa, Aldo Cipriano, Felipe Nuñez, Juan Carlos Salas, Hans Löbel (2021). In: IEEE Access (Volume: 9). https://ieeexplore.ieee.org/document/9410222

Electrolytic refining is the last step of pyrometallurgical copper production. Here, smelted copper is converted into high-quality cathodes through electrolysis. Cathodes that do not meet the physical quality standards are rejected and further reprocessed or sold at a minimum profit. Prediction of cathodic rejection is therefore of utmost importance to accurately forecast the electrorefining cycle economic production. Several attempts have been made to estimate this process's outcomes, mostly based on physical models of the underlying electrochemical reactions. However, they do not withstand the complexity of real operations. Data-driven methods, such as deep learning, allow modeling complex non-linear processes by learning representations directly from the data. We study the use of several recurrent neural network models to estimate the cathodic rejection of a cathodic cycle, using a series of operational measurements throughout the process. We provide an ARMAX model as a benchmark. Basic recurrent neural network models are analyzed first: a vanilla RNN and an LSTM model provide an initial approach. These are further composed into an Encoder-Decoder model that uses an attention mechanism to selectively weight the input steps that provide most information upon inference. This model obtains 5.45% relative error, improving on the proposed benchmark by 81.4%. Finally, we study the attention mechanism’s output to distinguish the most relevant electrorefining process steps. We identify the initial state as critical in predicting cathodic rejection. This information can be used as an input for decision support systems or control strategies to reduce cathodic rejection and improve electrolytic refining’s profitability.

Fair Top-k Ranking with multiple protected groups

Meike Zehlike, Tom Suhr, Francesco Bonchi, Carlos Castillo, Sara Hajian, Ricardo Baeza-Yates (2021). In: Elsevier Information Processing & Management. Volume 59, Issue 1, January 2022, 102707. https://doi.org/10.1016/j.ipm.2021.102707

Ranking items or people is a fundamental operation at the basis of several processes and services, not all of them happening online. Ranking is required for different tasks, including search, personalization, recommendation, and filtering. While traditionally ranking has been aimed solely at maximizing some global utility function, recently the awareness of potential discrimination against some of the elements being ranked has captured the attention of researchers, who have thus started devising ranking systems that are non-discriminatory or fair for the items being ranked. So far, researchers have mostly focused on group fairness, which is usually expressed in the form of constraints on the fraction of elements from some protected groups that should be included in the top-$k$ positions, for any relevant $k$. These constraints are needed in order to correct implicit societal biases existing in the input data and reflected in the relevance or fitness score computed.

In this article, we tackle the problem of selecting a subset of $k$ individuals from a pool of $n \gg k$ candidates, maximizing global utility (i.e., selecting the “best” candidates) while respecting given group-fairness criteria. In particular, to tackle this Fair Top-$k$ Ranking problem, we adopt a ranked group-fairness definition which extends the standard notion of group fairness based on protected groups, by ensuring that the proportion of protected candidates in every prefix of the top-$k$ ranking remains statistically above, or indistinguishable from, a given minimum threshold. Our notion of utility requires, intuitively, that every individual included in the top-$k$ should be more qualified than every candidate not included; and that for every pair of candidates in the top-$k$, the more qualified candidate should be ranked above.

The main contribution of this paper is an algorithm for producing a fair top-$k$ ranking that can be used when more than one protected group is present, which means that a statistical test based on a multinomial distribution needs to be used instead of one for a binomial distribution, as the original FA*IR algorithm does. This poses important technical challenges and increases both the space and time complexity of the re-ranking algorithm. Our experimental assessment on real-world datasets shows that our approach yields small distortions with respect to rankings that maximize utility without considering our fairness criteria.
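
As a rough illustration of the prefix-based fairness test described in this abstract, the following sketch covers only the simpler single-protected-group (binomial) case that the abstract attributes to the original FA*IR algorithm. It is not the multinomial algorithm the paper contributes, it applies no multiple-testing adjustment, and the names `min_protected`, `is_fair`, `p` (target protected proportion) and `alpha` (significance level) are assumptions chosen for the example.

```python
from math import comb

def binom_cdf(m, n, p):
    """P[X <= m] for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(m + 1))

def min_protected(prefix_len, p, alpha):
    """Smallest m such that P[Binomial(prefix_len, p) <= m] > alpha.

    A prefix holding fewer than m protected candidates would be
    statistically below the target proportion p at level alpha.
    """
    for m in range(prefix_len + 1):
        if binom_cdf(m, prefix_len, p) > alpha:
            return m
    return prefix_len

def is_fair(ranking, p, alpha):
    """ranking: list of booleans, True marks a protected candidate.

    Checks every prefix of the ranking against its minimum count.
    """
    count = 0
    for i, is_protected in enumerate(ranking, start=1):
        count += is_protected
        if count < min_protected(i, p, alpha):
            return False
    return True

# Minimum protected counts for prefixes of length 1..10.
table = [min_protected(i, 0.5, 0.1) for i in range(1, 11)]
```

With `p = 0.5` and `alpha = 0.1`, a ranking whose first four positions contain no protected candidate fails the check at the fourth prefix.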

Enhanced Word Embedding Variations for the Detection of Substance Abuse and Mental Health Issues on Social Media Writings

D Ramírez-Cifuentes, C Largeron, J Tissier, A Freire, Ricardo Baeza-Yates (2021). In: IEEE Access (Volume: 9). https://ieeexplore.ieee.org/document/9535518

Substance abuse and mental health issues are severe conditions that affect millions. Signs of certain conditions have been traced on social media through the analysis of posts. In this paper we analyze textual cues that characterize and differentiate Reddit posts related to depression, eating disorders, suicidal ideation, and alcoholism, along with control posts. We also generate enhanced word embeddings for binary and multi-class classification tasks dedicated to the detection of these types of posts. Our enhancement method to generate word embeddings focuses on identifying terms that are predictive for a class and aims to move their vector representations close to each other while moving them away from the vectors of terms that are predictive for other classes. Variations of the embeddings are defined and evaluated through predictive tasks, a cosine similarity-based method, and a visual approach. We generate predictive models using variations of our enhanced representations with statistical and deep learning approaches. We also propose a method that leverages the properties of the enhanced embeddings in order to build features for predictive models. Results show that variations of our enhanced representations outperform in Recall, Accuracy, and F1-Score the embeddings learned with Word2vec, DistilBERT, GloVe’s fine-tuned pre-learned embeddings and other methods based on domain adapted embeddings. The approach presented has the potential to be used on similar binary or multi-class classification tasks that deal with small domain-specific textual corpora.

Engineering Practical Lempel-Ziv Tries

Rodrigo Cánovas, Johannes Fischer, Dominik Köppl, Marvin Löbel, Rajeev Raman, Diego Arroyuelo, Gonzalo Navarro (2021). In: ACM Journal of Experimental Algorithmics. Volume 26. December 2021. Article No.: 14, pp 1–47. https://doi.org/10.1145/3481638

The Lempel-Ziv 78 (LZ78) and Lempel-Ziv-Welch (LZW) text factorizations are popular, not only for bare compression but also for building compressed data structures on top of them. Their regular factor structure makes them computable within space bounded by the compressed output size. In this article, we carry out the first thorough study of low-memory LZ78 and LZW text factorization algorithms, introducing more efficient alternatives to the classical methods, as well as new techniques that can run within less memory than is necessary to hold the compressed file. Our results build on hash-based representations of tries that may have independent interest.
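
For readers unfamiliar with the factorization this abstract refers to, here is a minimal textbook-style sketch of LZ78 using a hash table keyed by (parent phrase, character) as the trie. It only illustrates the classical method; it does not reproduce the low-memory algorithms or the specific trie representations studied in the article, and the function names are assumptions made for the example.

```python
def lz78_factorize(text):
    """Return LZ78 factors as (phrase_id, next_char) pairs.

    Phrase id 0 denotes the empty phrase; the i-th factor (1-based)
    defines phrase i as phrase[phrase_id] + next_char.
    """
    trie = {}      # (parent_id, char) -> child_id: a hash-based trie
    factors = []
    node = 0       # current position in the trie (0 = root)
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            node = child                      # keep extending the match
        else:
            trie[(node, ch)] = len(trie) + 1  # new phrase gets the next id
            factors.append((node, ch))
            node = 0
    if node != 0:
        # Input ended inside an already known phrase.
        factors.append((node, ""))
    return factors

def lz78_decode(factors):
    """Rebuild the text from the factors produced above."""
    phrases = [""]
    pieces = []
    for ref, ch in factors:
        phrase = phrases[ref] + ch
        phrases.append(phrase)
        pieces.append(phrase)
    return "".join(pieces)

factors = lz78_factorize("abababab")
```

Each factor extends a previously seen phrase by one character, which is why the trie never needs more nodes than there are factors.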

Effect of adding physical links on the robustness of the Internet modeled as a physical-logical interdependent network using simple strategies

Ivana Bachmann, Valeria Valdés, Javier Bustos-Jiménez, Benjamin Bustos (2021). In: International Journal of Critical Infrastructure Protection. Volume 36, March 2022, 100483. https://doi.org/10.1016/j.ijcip.2021.100483

In this work we model the Internet as a physical–logical interdependent network composed of the logical Internet network (Autonomous System level network), the physical Internet network (Internet backbone), and their interactions. We have tested the effect of adding physical links over the Internet’s robustness against both physical random attacks and localized attacks. We add links using strategies that are simple enough to be used when information of the physical network is incomplete or not accurate enough to use more complex strategies. To measure the effect of adding links to the physical network our tests consider the logical network and the set of interlinks to be constant. We tested four physical link addition strategies: random addition, distance addition, local hubs addition, and degree addition, over three different physical network models: Gabriel Graphs, $n$-nearest neighbors, and relative neighborhood graphs, and two extreme space shapes based on the geography of real countries: a long and narrow space with a width to length ratio of (1:25), and square space with a (1:1) width to length ratio. Our results show that there are High Damage Localized Attacks (HDLA): localized attacks that cause the failure of more than half of the logical network after removing less than 9% of the physical nodes. Some HDLA can even result in total failure. We found that HDLA are caused by the failure of “bridge nodes” in the logical network. Our results show that adding links to the physical network improves the robustness against localized attacks and physical random attacks. Adding physical links also decreases the damage caused by HDLA, but does not fully prevent them. We found that degree and random addition strategies improve the Internet’s robustness the most, while distance addition is the most cost-efficient link addition strategy in terms of robustness improvement.
We also found that the high robustness and low cost efficiency of random strategy is related to the length of the links added, highlighting the importance of simple features such as the length of the links added over the robustness of physical–logical interdependent networks. Our findings suggest that given cost constraints it may be better to add more physical links using distance addition than it is to add fewer physical links using degree or random link addition strategies, and that more cost-efficient versions of degree strategy could be obtained by simply limiting the length of the links added by the strategy.

Domestic Isomorphic Pressures in the Design of FOI Oversight Institutions in Latin America

Rafael Piñeiro Rodríguez, Paula Muñoz, Cecilia Rossel, Fabrizio Scrollini (2020). In: Governance. https://onlinelibrary.wiley.com/doi/10.1111/gove.12614

Even though many countries in Latin America have adopted FOI Laws, there are significant differences in the institutional design of FOI oversight institutions. Most explanations highlight the role of political competition in motivating political actors to design strong de jure FOI oversight institutions. The design of FOI oversight institutions in Chile, Peru and Uruguay, however, cannot fully be explained by political competition. We show how isomorphic pressures help explain variation in the de jure strength of the FOI oversight institutions. Our findings highlight the importance of considering domestic constraints on the diffusion of one-size-fits-all models. To analyze each case, we conducted a systematic process-tracing analysis. Our in-depth analysis allowed us to assess different theories concerning the specific institutional design of FOI oversight institutions.

DockerPedia: A Knowledge Graph of Software Images and their Metadata

Maximiliano Osorio, Daniel Garijo, Idafen Santana-Perez, Carlos Buil-Aranda (2021). In: International Journal of Software Engineering and Knowledge Engineering, Vol. 32, No. 01, pp. 71-89 (2022). https://doi.org/10.1142/S0218194022500036

An increasing number of researchers use software images to capture the requirements and code dependencies needed to carry out computational experiments. Software images preserve the computational environment required to execute a scientific experiment and have become a crucial asset for reproducibility. However, software images are usually not properly documented and described, making it challenging for scientists to find, reuse and understand them. In this paper, we propose a framework for automatically describing software images in a machine-readable manner by (i) creating a vocabulary to describe software images; (ii) developing an annotation framework designed to automatically document the underlying environment of software images; and (iii) creating DockerPedia, a Knowledge Graph with over 150,000 annotated software images, automatically described using our framework. We illustrate the usefulness of our approach in finding images with specific software dependencies, comparing similar software images, addressing versioning problems when running computational experiments, and flagging problems with vulnerable software dependencies.

Divisive politics and democratic dangers in Latin America

Carla Alberti, Thomas Carothers, Andreas E. Feldmann, Juan Pablo Luna, Paula Muñoz, Angelika Rettberg, Oliver Stuenkel, Guillermo Trejo (2021). In: Carnegie Endowment for International Peace. https://carnegieendowment.org/files/Carothers_Feldmann_Polarization_in_Latin_America_final1.pdf

Divisive politics have hit many Latin American countries hard in recent years, fueled by numerous underlying fissures and issues including economic inequality and exclusion, corruption, ideological differences, high levels of violence, and chronically weak state capacity. The coronavirus pandemic has only intensified these pressures. Latin America thus enters 2021 shadowed by an ominous sense that democracy is under extraordinary strain.

To help shine a light on these troubled waters and chart the risks ahead, this collection of essays by a notable set of regional experts examines recent developments in six key countries: Bolivia, Brazil, Chile, Colombia, Mexico, and Peru. Taken together, the different country accounts present a sobering picture, though not an unrelievedly negative one. Divisions are deep, economic troubles are widespread, and the pandemic continues to devastate the lives of countless people in the region. The risks for democracy are serious, ranging from the rupture of basic democratic structures to the potential emergence of new illiberal political figures and forces. Remedial steps are possible, but they will be challenging to carry out. The collection seeks to help engaged actors and observers throughout the region and beyond better understand the troubling dynamics of rising political division and formulate effective responses.

Detecting Anomalies at a TLD Name Server Based on DNS Traffic Predictions

Diego Madariaga, Javier Madariaga, Martín Panza, Javier Bustos-Jiménez, Benjamin Bustos (2021). In: IEEE Transactions on Network and Service Management (Volume: 18, Issue: 1, March 2021). https://ieeexplore.ieee.org/document/9320589

The Domain Name System (DNS) is a critical component of Internet infrastructure, as almost every activity on the Internet starts with a DNS query. Given its importance, there is increasing concern over its vulnerability to attacks and failures, as they can negatively affect all Internet-based resources. Thus, detecting these events is crucial to preserve the correct functioning of all DNS components, such as high-volume name servers for top-level domains (TLD). This article presents a near real-time Anomaly Detection Based on Prediction (AD-BoP) method, providing a useful and easily explainable methodology to effectively detect DNS anomalies. AD-BoP is based on the prediction of expected DNS traffic statistics, and could be especially helpful for TLD registry operators to preserve their services’ reliability. After an exhaustive analysis, AD-BoP is shown to improve the current state-of-the-art for anomaly detection in authoritative TLD name servers.
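
The general idea of prediction-based anomaly detection mentioned in this abstract can be sketched as follows. The abstract does not specify the forecasting model or decision rule used by AD-BoP, so the moving-average forecast, the standard-deviation threshold, and the names `detect_anomalies`, `window` and `threshold` are all stand-in assumptions for illustration only.

```python
from statistics import mean, stdev

def detect_anomalies(series, window=5, threshold=3.0):
    """Flag index t when the observed value deviates from a forecast.

    The forecast is the mean of the previous `window` observations
    (a hypothetical stand-in for a real traffic predictor); a point is
    flagged when its absolute error exceeds `threshold` times the
    standard deviation of that same window.
    """
    anomalies = []
    for t in range(window, len(series)):
        history = series[t - window:t]
        predicted = mean(history)
        spread = stdev(history) or 1e-9   # avoid a zero threshold on flat data
        if abs(series[t] - predicted) > threshold * spread:
            anomalies.append(t)
    return anomalies

# Hypothetical per-interval query counts with one spike at index 6.
queries = [100, 102, 98, 101, 99, 100, 500, 101, 100, 99]
flagged = detect_anomalies(queries)
```

Note that in this simplified version a flagged spike remains in the history window and inflates the spread for the next few points; a practical detector would likely exclude or correct flagged values before forecasting.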

Delayed and Approved: A Quantitative Study of Conflicts and the Environmental Impact Assessments of Energy Projects in Chile 2012–2017

Sebastián Huneeus, Andrés Cruz, Daniel Alcatruz, Bryan Castillo, Camilo Betranou, Javier Cisterna, Sergio Toro, Juan Pablo Luna, Diego Sazo (2021). In: Sustainability 2021, 13(13), 6986; https://doi.org/10.3390/su13136986

The Sistema de Evaluación de Impacto Ambiental (Environmental Impact Assessment System—SEIA) evaluates all projects potentially harmful to human health and the environment in Chile. Since its establishment, many projects approved by the SEIA have been contested by organized communities, especially in the energy sector. The question guiding our research is whether socio-environmental conflicts affect the evaluation times and the approval rates of projects under assessment. Using a novel database comprising all energy projects assessed by the SEIA, we analyzed 380 energy projects that entered the SEIA review process between 2012 and 2017 and matched these projects with protest events. Using linear and logit regression, we find no association between the occurrence of protests aimed at specific projects and the probability of project approval. We do, however, find that projects associated with the occurrence of protest events experience significantly longer review times. To assess the robustness of this finding, we compare two run-of-river plants proposed in Mapuche territory in Chile’s La Araucanía region. We discuss the broader implications of these findings for sustainable environmental decision making.

DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference

Cristóbal Eyzaguirre, Felipe del Río, Vladimir Araujo, Álvaro Soto (2021). In: arXiv:2109.11745. https://arxiv.org/abs/2109.11745

Large-scale pre-trained language models have shown remarkable results in diverse NLP applications. Unfortunately, these performance gains have been accompanied by a significant increase in computation time and model size, stressing the need to develop new or complementary strategies to increase the efficiency of these models. In this paper we propose DACT-BERT, a differentiable adaptive computation time strategy for BERT-like models. DACT-BERT adds an adaptive computational mechanism to BERT’s regular processing pipeline, which controls the number of Transformer blocks that need to be executed at inference time. By doing this, the model learns to combine the most appropriate intermediate representations for the task at hand. Our experiments demonstrate that our approach, when compared to the baselines, excels on a reduced computational regime and is competitive in other less restrictive ones.

Creativity in Generative Musical Networks: Evidence From Two Case Studies

Rodrigo Cádiz, Agustín Macaya, Manuel Cartagena, Denis Parra (2021). In: Frontiers in Robotics and AI. https://doi.org/10.3389/frobt.2021.680586

Deep learning, one of the fastest-growing branches of artificial intelligence, has become one of the most relevant research and development areas of recent years, especially since 2012, when a neural network surpassed the most advanced image classification techniques of the time. This spectacular development has not been alien to the world of the arts, as recent advances in generative networks have made possible the artificial creation of high-quality content such as images, movies or music. We believe that these novel generative models pose a great challenge to our current understanding of computational creativity. If a robot can now create music that an expert cannot distinguish from music composed by a human, or create novel musical entities that were not known at training time, or exhibit conceptual leaps, does it mean that the machine is then creative? We believe that the emergence of these generative models clearly signals that much more research needs to be done in this area. We would like to contribute to this debate with two case studies of our own: TimbreNet, a variational auto-encoder network trained to generate audio-based musical chords, and StyleGAN Pianorolls, a generative adversarial network capable of creating short musical excerpts, despite the fact that it was trained with images and not musical data. We discuss and assess these generative models in terms of their creativity and we show that they are in practice capable of learning musical concepts that are not obvious based on the training data, and we hypothesize that these deep models, based on our current understanding of creativity in robots and machines, can be considered, in fact, creative.

Correction to: Towards intellectual freedom in an AI Ethics Global Community

Christoph Ebell, Richard Benjamins, Hengjin Cai, Mark Coeckelbergh, Tania Duarte, Merve Hickok, Aurelie Jacquet, Angela Kim, Joris Krijger, John MacIntyre, Piyush Madhamshettiwar, Lauren Mafeo, Jeanna Matthews, Larry Medsker, Peter Smith & Savannah Thais, Ricardo Baeza-Yates (2021). In: AI and Ethics volume 1, pages 131–138 (2021). https://link.springer.com/article/10.1007/s43681-021-00052-5

Correction to: Supporting the classification of patients in public hospitals in Chile by designing, deploying and validating a system based on natural language processing

Fabián Villena, Jorge Pérez, René Lagos, Jocelyn Dunstan (2021). In: BMC Medical Informatics and Decision Making 21, 220 (2021). https://doi.org/10.1186/s12911-021-01587-7

Content-Based Medical Image Retrieval and Intelligent Interactive Visual Browser for Medical Education, Research and Care

Camilo Sotomayor, Víctor Castañeda, Humberto Farías, Gabriel Molina, Gonzalo Pereira, Steffen Hartel, Mauricio Solar, Mauricio Araya, Marcelo Mendoza (2021). In: Diagnostics. Volume 11. Issue 8. https://doi.org/10.3390/diagnostics11081470

Computing the depth distribution of a set of boxes

Jérémy Barbay, Pablo Pérez-Lantero, Javiel Rojas-Ledesma (2021). In: Theoretical Computer Science. Volume 883, 3 September 2021, Pages 69-82. https://doi.org/10.1016/j.tcs.2021.06.007

Competing Frames and Melodrama: The Effects of Facebook Posts on Policy Preferences about COVID-19

Ingrid Bachmann, Constanza Mujica, Daniela Grassau, Claudia Labarca, Daniel Halpern, Soledad Puente, Sebastián Valenzuela (2021). In: Digital Journalism. Volume 9, 2021 - Issue 9. https://doi.org/10.1080/21670811.2021.1943479

The tension between health and economic considerations regarding COVID-19 has resulted in a framing contest, in which proponents and adversaries of strong containment measures hold oppositional frames about the pandemic. This study examines the effects of competing news frames on social media users’ policy preferences and the moderation of framing effects played by melodramatic news treatment. Results from a pre-registered online survey experiment in Chile (N = 518) show that participants exposed to Facebook posts with an economic frame were significantly less supportive of measures that restrict mobility (e.g., quarantines) than participants in the control group. Contrary to expectations, exposure to a public health frame also reduced support for stay-at-home orders, and the presence of melodramatic features had no significant impact on users’ preferences. Other variables, however, did alter these framing effects, such as fear of COVID-19 and frequency of social media news use. These findings paint a rather complex picture of framing effects during the pandemic in a digital media environment.

Compact structure for sparse undirected graphs based on a clique graph partition

Felipe Glaria, Cecilia Hernandez, Susana Ladra, Lilian Salinas, Gonzalo Navarro (2021). In: Information Sciences. Volume 544, 12 January 2021, Pages 485-499. https://doi.org/10.1016/j.ins.2020.09.010

Compressing real-world graphs has many benefits such as improving or enabling the visualization in small memory devices, graph query processing, community search, and mining algorithms. This work proposes a novel compact representation for real sparse and clustered undirected graphs. The approach lists all the maximal cliques by using a fast algorithm and defines a clique graph based on its maximal cliques. Further, the method defines a fast and effective heuristic for finding a clique graph partition that avoids the construction of the clique graph. Finally, this partition is used to define a compact representation of the input graph. The experimental evaluation shows that this approach is competitive with the state-of-the-art methods in terms of compression efficiency and access times for neighbor queries, and that it recovers all the maximal cliques faster than using the original graph. Moreover, the approach makes it possible to query maximal cliques, which is useful for community detection.

Clinically Correct Report Generation from Chest X-Rays Using Templates

Pablo Pino, Cecilia Besa, Claudio Lagos, Denis Parra (2021). In: International Workshop on Machine Learning in Medical Imaging, pp 654–663. https://link.springer.com/chapter/10.1007/978-3-030-87589-3_67

We address the task of automatically generating a medical report from chest X-rays. Many authors have proposed deep learning models to solve this task, but they focus mainly on improving NLP metrics, such as BLEU and CIDEr, which are not suitable to measure clinical correctness in clinical reports. In this work, we propose CNN-TRG, a Template-based Report Generation model that detects a set of abnormalities and verbalizes them via fixed sentences, which is much simpler than other state-of-the-art NLG methods and achieves better results in medical correctness metrics.

We benchmark our model on the IU X-ray and MIMIC-CXR datasets against naive baselines as well as deep learning-based models, by employing the CheXpert labeler and MIRQI as clinical correctness evaluations, and NLP metrics as secondary evaluation. We also provide further evidence indicating that traditional NLP metrics are not suitable for this task by presenting their lack of robustness in multiple cases. We show that slightly altering a template-based model can increase NLP metrics considerably while maintaining high clinical performance. Our work contributes a simple but effective approach for chest X-ray report generation, and supports a model evaluation focused primarily on clinical correctness metrics and secondarily on NLP metrics.

Chillin’ Effects of Fake News: Changes in Practices Related to Accountability and Transparency in American Newsrooms Under the Influence of Misinformation and Accusations Against the News Media

Hong Tien Vu, Magdalena Saldaña (2021). In: Journalism & Mass Communication Quarterly. Vol 98, Issue 3, 2021. https://doi.org/10.1177/1077699020984781

This study examines how newsroom work in the United States has changed in response to some of the latest developments in the news media environment. Using nationally representative survey data, we explore what professional routines American journalists have adopted to avoid spreading or being accused of publishing misinformation. Findings suggest that journalists have added new or intensified practices to increase accountability and transparency. In addition, role conceptions, perception of fake news, and responsibility for social media audiences impact the adoption of such practices. Journalists are more likely to embrace transparency than accountability, suggesting the emergence of new journalistic norms in today’s newsrooms.

Characterization of Anorexia Nervosa on Social Media: Textual, Visual, Relational, Behavioral, and Demographical Analysis

Diana Ramírez-Cifuentes, Ana Freire, Nadia Sanz Lamora, Aida Álvarez, Alexandre González-Rodríguez, Meritxell Lozano Rochel, Roger Llobet Vives, Diego Alejandro Velazquez, Josep Maria Gonfaus, Jordi Gonzàlez, Ricardo Baeza-Yates (2021). In: Journal of Medical Internet Research. https://www.jmir.org/2021/7/e25925/

Eating disorders are psychological conditions characterized by unhealthy eating habits. Anorexia nervosa (AN) is defined as the belief of being overweight despite being dangerously underweight. The psychological signs involve emotional and behavioral issues. There is evidence that signs and symptoms can manifest on social media, wherein both harmful and beneficial content is shared daily.

Building a Party with Activists: The Case of the Uruguayan FA

Verónica Pérez, Rafael Piñeiro, Fernando Rosenblatt (2021). In: Johns Hopkins Stavros Niarchos Foundation. SNF Agora Institute. https://www.researchgate.net/publication/336894919_How_Party_Activism_Survives_Uruguay%27s_Frente_Amplio

Political parties with activists are in decline due to various external shocks. Societal changes, like the emergence of new technologies of communication, have diminished the role and number of activists, while party elites increasingly can make do without grassroots activists. However, recent scholarship concerning different democracies has shown how activism still matters for representation. This book contributes to this literature by analyzing the unique case of the Uruguayan Frente Amplio (FA), the only mass-organic, institutionalized leftist party in Latin America. Using thick description, systematic process tracing, and survey research, this case study highlights the value of an organization-centered approach for understanding parties’ role in democracy. Within the FA, organizational rules grant activists a significant voice, which imbues activists’ participation with a strong sense of efficacy. This book is an excellent resource for scholars and students of Latin America and comparative politics who are interested in political parties and the challenges confronting new democracies.

Attention is Turing Complete

Jorge Pérez, Pablo Barceló, Javier Marinkovic (2021). In: Journal of Machine Learning Research . https://jmlr.org/papers/v22/20-302.html

Alternatives to recurrent neural networks, in particular, architectures based on self-attention, are gaining momentum for processing input sequences. In spite of their relevance, the computational properties of such networks have not yet been fully explored. We study the computational power of the Transformer, one of the most paradigmatic architectures exemplifying self-attention. We show that the Transformer with hard-attention is Turing complete exclusively based on its capacity to compute and access internal dense representations of the data. Our study also reveals some minimal sets of elements needed to obtain this completeness result.

Assessing the Risk of Democratic Reversal in the United States: A Reply to Kurt Weyland

Juan Pablo Luna, Matías López (2021). In: PS: Political Science & Politics, Volume 54, Issue 3, July 2021, pp. 421-426. https://doi.org/10.1017/S1049096521000329

By replying to Kurt Weyland’s (2020) comparative study of populism, we revisit optimistic perspectives on the health of American democracy in light of existing evidence. Relying on a set-theoretical approach, Weyland concludes that populists succeed in subverting democracy only when institutional weakness and conjunctural misfortune are observed jointly in a polity, thereby conferring on the United States immunity to democratic reversal. We challenge this conclusion on two grounds. First, we argue that the focus on institutional dynamics neglects the impact of the structural conditions in which institutions are embedded, such as inequality, racial cleavages, and changing political attitudes among the public. Second, we claim that endogeneity, coding errors, and the (mis)use of Boolean algebra raise questions about the accuracy of the analysis and its conclusions. Although we are skeptical of crisp-set Qualitative Comparative Analysis as an adequate modeling choice, we replicate the original analysis and find that the paths toward democratic backsliding and continuity are both potentially compatible with the United States.

Anonymity and Asynchronicity as Key Design Dimensions for the Reciprocity of Online Democratic Deliberation

Leandro De Brasi, Claudio Gutierrez (2021). In: International Journal of Applied Philosophy. Volume 34, Issue 2, Fall 2020. https://doi.org/10.5840/ijap2021322143

The aim of this paper is to identify, given certain democratic normative standards regarding deliberation, some pros as well as cons of possible online deliberation designs due to variations in two key design dimensions: namely, asynchronicity and anonymity. In particular, we consider one crucial aspect of deliberative argumentation: namely, its reciprocity, which puts interaction centre stage to capture the back-and-forth of reasons. More precisely, we focus on two essential features of the deliberative interaction: namely, its listening widely and listening carefully. We conclude that one sort of online deliberation that combines the two design features of anonymity and asynchronicity is likely to better promote the reciprocity required for democratic deliberation than both natural and designed offline deliberations (such as the designed deliberation in Deliberative Polling) and online simulations of them.

An index for moving objects with constant-time access to their compressed trajectories

Nieves Brisaboa, Adrián Gómez-Brandón, Travis Gagie, José Paramá, Gonzalo Navarro (2021). In: International Journal of Geographical Information Science. Volume 35, 2021 - Issue 7. https://doi.org/10.1080/13658816.2020.1833015

As the number of vehicles and devices equipped with GPS technology has grown explosively, an urgent need has arisen for time- and space-efficient data structures to represent their trajectories. The most commonly desired queries are the following: queries about an object’s trajectory, range queries, and nearest neighbor queries. In this paper, we consider that the objects can move freely and we present a new compressed data structure for storing their trajectories, based on a combination of logs and snapshots, with the logs storing sequences of the objects’ relative movements and the snapshots storing their absolute positions sampled at regular time intervals. We call our data structure ContaCT because it provides Constant-time access to Compressed Trajectories. Its logs are based on a compact partial-sums data structure that returns cumulative displacement in constant time, and allows us to compute in constant time any object’s position at any instant, enabling a speedup when processing several other queries. We have compared ContaCT experimentally with another compact data structure for trajectories, called GraCT, and with a classic spatio-temporal index, the MVR-tree. Our results show that ContaCT outperforms the MVR-tree by orders of magnitude in space and also outperforms the compressed representation in time performance.

An Extended Account of Trace-Relating Compiler Correctness and Secure Compilation

Carmine Abate, Roberto Blanco, Ştefan Ciobâcă, Adrien Durier, Deepak Garg, Cătălin Hriţcu, Marco Patrignani, Éric Tanter, Jérémy Thibault (2021). In: ACM Transactions on Programming Languages and Systems. Volume 43. Issue 4. December 2021. Article No.: 14, pp 1–48. https://doi.org/10.1145/3460860

Compiler correctness, in its simplest form, is defined as the inclusion of the set of traces of the compiled program in the set of traces of the original program. This is equivalent to the preservation of all trace properties. Here, traces collect, for instance, the externally observable events of each execution. However, this definition requires the set of traces of the source and target languages to be the same, which is not the case when the languages are far apart or when observations are fine-grained. To overcome this issue, we study a generalized compiler correctness definition, which uses source and target traces drawn from potentially different sets and connected by an arbitrary relation. We set out to understand what guarantees this generalized compiler correctness definition gives us when instantiated with a non-trivial relation on traces. When this trace relation is not equality, it is no longer possible to preserve the trace properties of the source program unchanged. Instead, we provide a generic characterization of the target trace property ensured by correctly compiling a program that satisfies a given source property, and dually, of the source trace property one is required to show to obtain a certain target property for the compiled code. We show that this view on compiler correctness can naturally account for undefined behavior, resource exhaustion, different source and target values, side channels, and various abstraction mismatches. Finally, we show that the same generalization also applies to many definitions of secure compilation, which characterize the protection of a compiled program linked against adversarial code.

AHMoSe: A knowledge-based visual support system for selecting regression machine learning models

Diego Rojo, Nyi Nyi Htun, Robin De Croon, Katrien Verbert, Denis Parra (2021). In: Computers and Electronics in Agriculture Volume 187, August 2021, 106183. https://doi.org/10.1016/j.compag.2021.106183

Decision support systems have become increasingly popular in the domain of agriculture. With the development of automated machine learning, agricultural experts are now able to train, evaluate and make predictions using cutting edge machine learning (ML) models without the need for much ML knowledge. Although this automated approach has led to successful results in many scenarios, in certain cases (e.g., when few labeled datasets are available) choosing among different models with similar performance metrics is a difficult task. Furthermore, these systems do not commonly allow users to incorporate their domain knowledge that could facilitate the task of model selection, and to gain insight into the prediction system for eventual decision making. To address these issues, in this paper we present AHMoSe, a visual support system that allows domain experts to better understand, diagnose and compare different regression models, primarily by enriching model-agnostic explanations with domain knowledge. To validate AHMoSe, we describe a use case scenario in the viticulture domain, grape quality prediction, where the system enables users to diagnose and select prediction models that perform better. We also discuss feedback concerning the design of the tool from both ML and viticulture experts.

Agents of Representation: The Organic Connection between Society and Leftist Parties in Bolivia and Uruguay

Santiago Anria, Verónica Pérez, Rafael Piñeiro, Fernando Rosenblatt (2021). In: Politics & Society. https://journals.sagepub.com/doi/10.1177/00323292211042442

Adaptation to Extreme Environments in an Admixed Human Population from the Atacama Desert

Lucas Vicuña, Mario I Fernandez, Cecilia Vial, Patricio Valdebenito, Eduardo Chaparro, Karena Espinoza, Annemarie Ziegler, Alberto Bustamante, Susana Eyheramendy (2021). In: Genome Biology and Evolution, Volume 11, Issue 9, September 2019, Pages 2468–2479, https://doi.org/10.1093/gbe/evz172

A MOOC‐based flipped experience: Scaffolding SRL strategies improves learners’ time management and engagement

Mar Pérez-Sanagustín, Diego Sapunar-Opazo, Ronald Pérez-Álvarez, Isabel Hilliger, Anis Bey, Jorge Maldonado-Mahauad, Jorge Baier (2021). In: Computer Applications in Engineering Education. Volume 29, Issue 4. Special Issue: Distance learning, MOOCs and globalisation of engineering education. https://onlinelibrary.wiley.com/doi/abs/10.1002/cae.22337

A galaxy of apps: Mobile app reliance and the indirect influence on political participation through political discussion and trust

Thomas Johnson, Barb Kaye, Magdalena Saldaña (2021). In: Mobile Media & Communication. Vol 10, Issue 1, 2022. https://journals.sagepub.com/doi/10.1177/20501579211012430

A Downward Spiral? A Panel Study of Misinformation and Media Trust in Chile

Daniel Halpern, Felipe Araneda, Sebastián Valenzuela (2021). In: International Journal of Press/Politics. Vol 27, Issue 2, 2022. https://journals.sagepub.com/doi/abs/10.1177/19401612211025238?journalCode=hijb

A Compact Answer Set Programming Encoding of Multi-Agent Pathfinding

Rodrigo Gómez, Carlos Hernández, Jorge Baier (2021). In: IEEE Access (Volume: 9). https://ieeexplore.ieee.org/document/9333548

A Benchmark Dataset for Repetitive Pattern Recognition on Textured 3D Surfaces

Stefan Lengauer, Ivan Sipiran, Reinhold Preiner, Tobias Schreck, Benjamin Bustos (2021). In: Computer Graphics Forum. Volume 40, Issue 5. https://onlinelibrary.wiley.com/doi/10.1111/cgf.14352

In digital archaeology, a large research area is concerned with the computer-aided analysis of 3D captured ancient pottery objects. A key aspect thereby is the analysis of motifs and patterns that were painted on these objects’ surfaces. In particular, the automatic identification and segmentation of repetitive patterns is an important task serving different applications such as documentation, analysis and retrieval. Such patterns typically contain distinctive geometric features and often appear in repetitive ornaments or friezes, thus exhibiting a significant amount of symmetry and structure. At the same time, they can occur at varying sizes, orientations and irregular placements, posing a particular challenge for the detection of similarities. A key prerequisite to develop and evaluate new detection approaches for such repetitive patterns is the availability of an expressive dataset of 3D models, defining ground truth sets of similar patterns occurring on their surfaces. Unfortunately, such a dataset has not been available so far for this particular problem. We present an annotated dataset of 82 different 3D models of painted ancient Peruvian vessels, exhibiting different levels of repetitiveness in their surface patterns. To serve the evaluation of detection techniques of similar patterns, our dataset was labeled by archaeologists who identified clearly definable pattern classes. Those given, we manually annotated their respective occurrences on the mesh surfaces. Along with the data, we introduce an evaluation benchmark that can rank different recognition techniques for repetitive patterns based on the mean average precision of correctly segmented 3D mesh faces. An evaluation of different incremental sampling-based detection approaches, as well as a domain specific technique, demonstrates the applicability of our benchmark. 
With this benchmark we especially want to address the geometry processing community, and expect it will induce novel approaches for pattern analysis based on geometric reasoning like 2D shape and symmetry analysis. This can enable novel research approaches in the Digital Humanities and related fields, based on digitized 3D Cultural Heritage artifacts. Alongside the source code for our evaluation scripts we provide our annotation tools for the public to extend the benchmark and further increase its variety.

#NFA Admits an FPRAS: Efficient Enumeration, Counting, and Uniform Generation for Logspace Classes

Marcelo Arenas, Luis Croquevielle, Rajesh Jayaram, Cristian Riveros (2021). In: Journal of the ACM. Volume 68. Issue 6. December 2021. Article No.: 48, pp 1–40. https://doi.org/10.1145/3477045

In this work, we study two simple yet general complexity classes, based on logspace Turing machines, that provide a unifying framework for efficient query evaluation in areas such as information extraction and graph databases, among others. We investigate the complexity of three fundamental algorithmic problems for these classes: enumeration, counting, and uniform generation of solutions, and show that they have several desirable properties in this respect.

Both complexity classes are defined in terms of non-deterministic logspace transducers (NL-transducers). For the first class, we consider the case of unambiguous NL-transducers, and we prove constant delay enumeration and both counting and uniform generation of solutions in polynomial time. For the second class, we consider unrestricted NL-transducers, and we obtain polynomial delay enumeration, approximate counting in polynomial time, and polynomial-time randomized algorithms for uniform generation. More specifically, we show that each problem in this second class admits a fully polynomial-time randomized approximation scheme (FPRAS) and a polynomial-time Las Vegas algorithm (with preprocessing) for uniform generation. Remarkably, the key idea to prove these results is to show that the fundamental problem #NFA admits an FPRAS, where #NFA is the problem of counting the number of strings of length n (given in unary) accepted by a non-deterministic finite automaton (NFA). While this problem is known to be #P-complete and, more precisely, SpanL-complete, it was open whether this problem admits an FPRAS. In this work, we solve this open problem and obtain as a welcome corollary that every function in SpanL admits an FPRAS.

“Bolivia’s Old and New Illnesses” in Divisive Politics and Democratic Dangers in Latin America

Carla Alberti (2021). In: Carnegie Endowment for International Peace. https://carnegieendowment.org/2021/02/17/bolivia-s-old-and-new-illnesses-pub-83782

Bolivia’s long-standing sociopolitical divisions were inflamed by a disputed election, as well as by the coronavirus pandemic and its economic fallout. Now, the new government must restore citizens’ trust or risk continued unrest.

Work in Progress: A Cross-sectional Survey Study for Understanding and Addressing the Needs of Engineering Students During COVID-19

Isabel Hilliger, Constanza Melian, Javiera Meza, Gonzalo Cortés, Jorge Baier (2021). In: 2021 ASEE Virtual Annual Conference Content Access. https://strategy.asee.org/work-in-progress-a-cross-sectional-survey-study-for-understanding-and-addressing-the-needs-of-engineering-students-during-covid-19

In order to reduce the spread of COVID-19, many universities and colleges have closed their campus and implemented what researchers call ‘emergency online education’. This means that many faculty members are teaching in front of computer screens while students are staying at home and taking their courses remotely. Unfortunately, this leaves students without some advantages of residential education such as study spaces, face-to-face counseling, and recreational facilities. In the case of engineering students, this has also left them without access to maker spaces, laboratories, and field trips (among other activities that enrich their learning experience).

To understand how the consequences of this pandemic have affected students’ well-being, some researchers have implemented cross-sectional survey studies. These types of studies are frequently used to measure stakeholders’ needs of support services as they relate to courses, programs or involvement in institutional planning. So far, there is a growing body of knowledge regarding factors that have affected students’ mental health, along with scales to measure students’ anxiety levels. However, the pandemic has come with confusing and changing information, making it more difficult for educational institutions to implement timely support strategies to maintain some sense of well-being among their students. Given the close relationship between student well-being and learning outcomes, more studies are needed to not only understand factors that might negatively affect students’ learning experiences, but also examine interventions that might positively impact students’ resilience.

This paper presents a Work-In-Progress (WIP) that was carried out in a large engineering school in Latin America. Like many schools in many countries, this school shifted to online education during 2020. In order to monitor students’ needs in this remote learning context, a cross-sectional survey study was conducted to evaluate their use of different types of support interventions that have been implemented since the pandemic started. Specifically, this paper presents the perceived benefits of having implemented a mid-semester break of one week to reduce stress during the first academic period. During the week after the break, we applied an online anonymous survey to a convenience sample of 994 engineering students from different admission cohorts and majors. Findings not only reveal how many hours students declare that they spent studying, resting, and doing recreational activities during that break, but also the percentage of students that perceived that this break was beneficial to their overall well-being. Future work will focus on assessing other types of support interventions that were implemented throughout that year, besides providing recommendations to monitor and support engineering students in different educational settings.

When is approximate counting for conjunctive queries tractable?

Marcelo Arenas, Luis Croquevielle, Rajesh Jayaram, Cristian Riveros (2021). In: STOC 2021: Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing. June 2021. Pages 1015–1027. https://doi.org/10.1145/3406325.3451014

Conjunctive queries are one of the most common classes of queries used in database systems, and the best studied in the literature. A seminal result of Grohe, Schwentick, and Segoufin (STOC 2001) demonstrates that for every class G of graphs, the evaluation of all conjunctive queries whose underlying graph is in G is tractable if, and only if, G has bounded treewidth. In this work, we extend this characterization to the counting problem for conjunctive queries. Specifically, for every class C of conjunctive queries with bounded treewidth, we introduce the first fully polynomial-time randomized approximation scheme (FPRAS) for counting answers to a query in C, and the first polynomial-time algorithm for sampling answers uniformly from a query in C. As a corollary, it follows that for every class G of graphs, the counting problem for conjunctive queries whose underlying graph is in G admits an FPRAS if, and only if, G has bounded treewidth (unless BPP is different from P). In fact, our FPRAS is more general, and also applies to conjunctive queries with bounded hypertree width, as well as unions of such queries.

The key ingredient in our proof is the resolution of a fundamental counting problem from automata theory. Specifically, we demonstrate the first FPRAS and polynomial time sampler for the set of trees of size n accepted by a tree automaton, which improves the prior quasi-polynomial time randomized approximation scheme (QPRAS) and sampling algorithm of Gore, Jerrum, Kannan, Sweedyk, and Mahaney ’97. We demonstrate how this algorithm can be used to obtain an FPRAS for many open problems, such as counting solutions to constraint satisfaction problems (CSP) with bounded hypertree width, counting the number of error threads in programs with nested call subroutines, and counting valid assignments to structured DNNF circuits.

Visual-Syntactic Embedding for Video Captioning

Jesus Perez-Martin, Benjamin Bustos, Jorge Pérez (2021). In: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV). https://ieeexplore.ieee.org/document/9423097

Video captioning is the task of predicting a semantically and syntactically correct sequence of words given some context video. The most successful methods for video captioning have a strong dependency on the effectiveness of semantic representations learned from visual models, but often produce syntactically incorrect sentences, which harms their performance on standard datasets. In this paper, we address this limitation by considering syntactic representation learning as an essential component of video captioning. We construct a visual-syntactic embedding by mapping into a common vector space a visual representation that depends only on the video, with a syntactic representation that depends only on Part-of-Speech (POS) tagging structures of the video description. We integrate this joint representation into an encoder-decoder architecture that we call Visual-Semantic-Syntactic Aligned Network (SemSynAN), which guides the decoder (text generation stage) by aligning temporal compositions of visual, semantic, and syntactic representations. We tested our proposed architecture obtaining state-of-the-art results on two widely used video captioning datasets: the Microsoft Video Description (MSVD) dataset and the Microsoft Research Video-to-Text (MSR-VTT) dataset.

The Tractability of SHAP-Score-Based Explanations over Deterministic and Decomposable Boolean Circuits

Marcelo Arenas, Pablo Barceló, Leopoldo Bertossi, Mikaël Monet (2021). In: AAAI 2021 - 35th Conference on Artificial Intelligence, Feb 2021, Virtual, France. https://hal.inria.fr/hal-03147623

Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential SHAP-score, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractable problem, it has recently been claimed that the SHAP-score can be computed in polynomial time over the class of decision trees. In this paper, we provide a proof of a stronger result over Boolean models: the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits, also known as tractable Boolean circuits, generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees, Ordered Binary Decision Diagrams (OBDDs) and Free Binary Decision Diagrams (FBDDs). We also establish the computational limits of the notion of SHAP-score by observing that, under a mild condition, computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider, as removing one or the other renders the problem of computing the SHAP-score intractable (namely, #P-hard).

Second-Order Specifications and Quantifier Elimination for Consistent Query Answering in Databases

Leopoldo Bertossi (2021). In: To appear in CEUR Proc. SOQE WS colocated with KR21. https://arxiv.org/abs/2108.08423

Consistent answers to a query from a possibly inconsistent database are answers that are simultaneously retrieved from every possible repair of the database. Repairs are consistent instances that minimally differ from the original inconsistent instance. It has been shown before that database repairs can be specified as the stable models of a disjunctive logic program. In this paper we show how to use the repair programs to transform the problem of consistent query answering into a problem of reasoning w.r.t. a theory written in second-order predicate logic. It is also investigated how a first-order theory can be obtained instead by applying second-order quantifier elimination techniques.

Reasoning about Counterfactuals and Explanations: Problems, Results and Directions

Leopoldo Bertossi (2021). In: Informal Proceedings of 2nd Workshop on Explainable Logic-Based Artificial Intelligence (XLoKR'21). https://arxiv.org/abs/2108.11004

There are some recent approaches and results about the use of answer-set programming for specifying counterfactual interventions on entities under classification, and reasoning about them. These approaches are flexible and modular in that they allow the seamless addition of domain knowledge. Reasoning is enabled by query answering from the answer-set program. The programs can be used to specify and compute responsibility-based numerical scores as attributive explanations for classification results.

Ranked Enumeration of MSO Logic on Words

Pierre Bourhis, Alejandro Grez, Louis Jachiet, Cristian Riveros (2021). In: ICDT 2021: 24th International Conference on Database Theory. https://arxiv.org/abs/2010.08042

Querying in the Age of Graph Databases and Knowledge Graphs

Marcelo Arenas, Claudio Gutiérrez, Juan Sequeda (2021). In: SIGMOD/PODS '21: Proceedings of the 2021 International Conference on Management of Data. June 2021. Pages 2821–2828. https://doi.org/10.1145/3448016.3457545

Graphs have become the best way we know of representing knowledge. The computing community has investigated and developed the support for managing graphs by means of digital technology. Graph databases and knowledge graphs surface as the most successful solutions to this program. This tutorial will provide a conceptual map of the data management tasks underlying these developments, paying particular attention to data models and query languages for graphs.

Proceedings of the 11th International Temporal Web Analytics Workshop

Dirk Ahlers, Erik Wilde, Marc Spaniol, Ricardo Baeza-Yates, Omar Alonso (2021). In: ACM SIGIR Forum. Volume 55. Issue 2. December 2021. Article No.: 6, pp 1–7. https://doi.org/10.1145/3527546.3527555

LocWeb and TempWeb 2021 were the eleventh events in their workshop series and took place co-located on 12th April 2021 in conjunction with The Web Conference WWW 2021. They were intended to be held in Ljubljana, Slovenia as a potentially hybrid event, but due to the pandemic, were fully moved online.

LocWeb and TempWeb were held as one colocated session with a merged programme and shared topics to explore similarities and introduce attendees to the two related and complementary areas. LocWeb 2021 explored the intersection of location-based analytics and Web architecture with a focus on Web-scale services and location-aware information access. TempWeb 2021 discussed temporal analytics at a Web scale with experts from science and industry.

PG-Keys: Keys for Property Graphs

Renzo Angles, Angela Bonifati, Stefania Dumbrava, George Fletcher, Keith W. Hare, Jan Hidders, Victor E. Lee, Bei Li, Leonid Libkin, Wim Martens, Filip Murlak, Josh Perryman, Ognjen Savković, Michael Schmidt, Juan Sequeda, Slawek Staworko, Dominik Tomaszuk (2021). In: SIGMOD/PODS '21: Proceedings of the 2021 International Conference on Management of Data. June 2021. Pages 2423–2436. https://doi.org/10.1145/3448016.3457561

We report on a community effort between industry and academia to shape the future of property graph constraints. The standardization for a property graph query language is currently underway through the ISO Graph Query Language (GQL) project. Our position is that this project should pay close attention to schemas and constraints, and should focus next on key constraints. The main purposes of keys are enforcing data integrity and allowing the referencing and identifying of objects. Motivated by use cases from our industry partners, we argue that key constraints should be able to have different modes, which are combinations of basic restrictions that require the key to be exclusive, mandatory, and singleton. Moreover, keys should be applicable to nodes, edges, and properties since these all can represent valid real-life entities. Our result is PG-Keys, a flexible and powerful framework for defining key constraints, which fulfills the above goals. PG-Keys is a design by the Linked Data Benchmark Council’s Property Graph Schema Working Group, consisting of members from industry, academia, and ISO GQL standards group, intending to bring the best of all worlds to property graph practitioners. PG-Keys aims to guide the evolution of the standardization efforts towards making systems more useful, powerful, and expressive.

PePa Ping Dataset: Comprehensive Contextualization of Periodic Passive Ping in Wireless Networks

Diego Madariaga, Lucas Torrealba, Javier Madariaga, Javier Bustos-Jiménez, Benjamin Bustos (2021). In: MMSys '21: Proceedings of the 12th ACM Multimedia Systems Conference. June 2021. Pages 274–280. https://doi.org/10.1145/3458305.3478456

Among all Internet Quality of Service (QoS) indicators, Round-trip time (RTT), jitter and packet loss have been thoroughly studied due to their great impact on the overall network’s performance and the Quality of Experience (QoE) perceived by the users. Considering that, we managed to generate a real-world dataset with a comprehensive contextualization of these important quality indicators by passively monitoring the network in user-space. To generate this dataset, we first developed a novel Periodic Passive Ping (PePa Ping) methodology for Android devices. Contrary to other works, PePa Ping periodically obtains RTT, jitter, and number of lost packets of all TCP connections. This passive approach relies on the implementation of a local VPN server residing inside the client device to manage all Internet traffic and obtain QoS information of the connections established. The collected QoS indicators are provided directly by the Linux kernel, and therefore, they are exceptionally close to real QoS values experienced by users’ devices. Additionally, the PePa Ping application continuously measured other indicators related to each individual network flow, the state of the device, and the state of the Internet connection (either WiFi or Mobile). With all the collected information, each network flow can be precisely linked to a set of environmental data that provides a comprehensive contextualization of each individual connection.

MillenniumDB: A Persistent, Open-Source, Graph Database

Domagoj Vrgoč, Carlos Rojas, Renzo Angles, Marcelo Arenas, Diego Arroyuelo, Carlos Buil Aranda, Aidan Hogan, Gonzalo Navarro, Cristian Riveros, Juan Romero (2021). In: arXiv:2111.01540. https://arxiv.org/abs/2111.01540

In this systems paper, we present MillenniumDB: a novel graph database engine that is modular, persistent, and open source. MillenniumDB is based on a graph data model, which we call domain graphs, that provides a simple abstraction upon which a variety of popular graph models can be supported. The engine itself is founded on a combination of tried and tested techniques from relational data management, state-of-the-art algorithms for worst-case-optimal joins, as well as graph-specific algorithms for evaluating path queries. In this paper, we present the main design principles underlying MillenniumDB, describing the abstract graph model and query semantics supported, the concrete data model and query syntax implemented, as well as the storage, indexing, query planning and query evaluation techniques used. We evaluate MillenniumDB over real-world data and queries from the Wikidata knowledge graph, where we find that it outperforms other popular persistent graph database engines (including both enterprise and open source alternatives) that support similar query features.

Inspecting the concept knowledge graph encoded by modern language models

Carlos Aspillaga, Marcelo Mendoza, Alvaro Soto (2021). In: ACL Findings, The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP), Aug 2-4. https://aclanthology.org/2021.findings-acl.263.pdf

The field of natural language understanding has experienced exponential progress in the last few years, with impressive results in several tasks. This success has motivated researchers to study the underlying knowledge encoded by these models. Despite this, attempts to understand their semantic capabilities have not been successful, often leading to inconclusive or contradictory conclusions among different works. Via a probing classifier, we extract the underlying knowledge graph of nine of the most influential language models of recent years, including word embeddings, text generators, and context encoders. This probe is based on concept relatedness, grounded on WordNet. Our results reveal that all the models encode this knowledge, but suffer from several inaccuracies. Furthermore, we show that the different architectures and training strategies lead to different model biases. We conduct a systematic evaluation to discover specific factors that explain why some concepts are challenging. We hope our insights will motivate the development of models that capture concepts more precisely.

Improving video captioning with temporal composition of a visual-syntactic embedding

Jesus Perez-Martin; Benjamin Bustos; Jorge Pérez (2021). In: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV). https://ieeexplore.ieee.org/document/9423097

Finding neighborhoods in Santiago, Chile: a graph neural network approach to urban cluster detection

Marcelo Mendoza, Naim Bro, Hans Lobel (2021). In: https://urbcompsys.github.io/

Exploiting Learned Policies in Focal Search

Pablo Araneda, Matias Greco, Jorge A. Baier (2021). In: Proceedings of the International Symposium on Combinatorial Search (SoCS 2021). https://arxiv.org/abs/2104.10535

Recent machine-learning approaches to deterministic search and domain-independent planning employ policy learning to speed up search. Unfortunately, when attempting to solve a search problem by successively applying a policy, no guarantees can be given on solution quality. The problem of how to effectively use a learned policy within a bounded-suboptimal search algorithm remains largely an open question. In this paper, we propose various ways in which such policies can be integrated into Focal Search, assuming that the policy is a neural network classifier. Furthermore, we provide mathematical foundations for some of the resulting algorithms. To evaluate the resulting algorithms over a number of policies with varying accuracy, we use synthetic policies which can be generated for a target accuracy for problems where the search space can be held in memory. We evaluate our focal search variants over three benchmark domains using our synthetic approach, and on the 15-puzzle using a neural network learned using 1.5 million examples. We observe that Discrepancy Focal Search, which we show expands the node which maximizes an approximation of the probability that its corresponding path is a prefix of an optimal path, obtains, in general, the best results in terms of runtime and solution quality.
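
As a rough illustration of the Focal Search setting mentioned in this abstract, the sketch below implements a textbook-style focal search in Python: OPEN is ordered by f = g + h, FOCAL holds the open nodes with f ≤ w · f_min, and a secondary `priority` function picks which FOCAL node to expand. The `priority` function here is only a hypothetical stand-in for a learned policy, the toy graph is invented for the example, and none of this is the authors' implementation (in particular, it is not Discrepancy Focal Search).

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

State = Hashable


def focal_search(
    start: State,
    is_goal: Callable[[State], bool],
    successors: Callable[[State], Iterable[Tuple[State, float]]],
    h: Callable[[State], float],
    priority: Callable[[State, float], float],
    w: float = 1.5,
) -> Optional[Tuple[List[State], float]]:
    """Focal search sketch; with an admissible h the cost is at most w * optimal.

    OPEN is scanned linearly for clarity; a real implementation would use heaps.
    """
    g: Dict[State, float] = {start: 0.0}
    parent: Dict[State, Optional[State]] = {start: None}
    open_set = {start}
    while open_set:
        # Smallest f-value in OPEN defines the suboptimality bound for FOCAL.
        f_min = min(g[s] + h(s) for s in open_set)
        focal = [s for s in open_set if g[s] + h(s) <= w * f_min]
        # The secondary ordering (e.g. a policy score) chooses within FOCAL.
        current = min(focal, key=lambda s: priority(s, g[s]))
        if is_goal(current):
            path: List[State] = []
            node: Optional[State] = current
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], g[current]
        open_set.remove(current)
        for nxt, cost in successors(current):
            new_g = g[current] + cost
            if nxt not in g or new_g < g[nxt]:
                # Better path found: (re)open the node.
                g[nxt] = new_g
                parent[nxt] = current
                open_set.add(nxt)
    return None


# Toy graph (hypothetical): the optimal path A -> C -> D costs 5.
EDGES = {"A": [("B", 1), ("C", 4)], "B": [("D", 5)], "C": [("D", 1)], "D": []}
# Hypothetical "policy" ranking: lower means more preferred.
POLICY_RANK = {"A": 0, "B": 0, "C": 1, "D": 0}


def run(w: float):
    return focal_search(
        "A",
        lambda s: s == "D",
        lambda s: EDGES[s],
        lambda s: 0.0,                      # trivially admissible heuristic
        lambda s, g_val: POLICY_RANK[s],
        w,
    )
```

With `w = 1.0` the FOCAL list contains only minimum-f nodes, so the search behaves like A*; with a larger `w` the secondary ordering is free to steer the search, at the price of a possibly more expensive (but still bounded) solution.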

Explainability Queries for ML Models and its Connections with Data Management Problems

Barceló, Pablo (2021). In: 24th International Conference on Database Theory (ICDT 2021). https://drops.dagstuhl.de/opus/volltexte/2021/13709/

In this talk I will present two recent examples of my research on explainability problems over machine learning (ML) models. In rough terms, these explainability problems deal with specific queries one poses over an ML model in order to obtain meaningful justifications for their results. Both of the examples I will present deal with “local” and “post-hoc” explainability queries. Here “local” means that we intend to explain the output of the ML model for a particular input, while “post-hoc” refers to the fact that the explanation is obtained after the model is trained. In the process I will also establish connections with problems studied in data management, with the intention of suggesting new possibilities for cross-fertilization between that area and ML.
The first example I will present refers to computing explanations with scores based on Shapley values, in particular with the recently proposed, and already influential, SHAP-score. This score provides a measure of how different features in the input contribute to the output of the ML model. We provide a detailed analysis of the complexity of this problem for different classes of Boolean circuits. In particular, we show that the problem of computing SHAP-scores is tractable as long as the circuit is deterministic and decomposable, but becomes computationally hard if any of these restrictions is lifted. The tractability part of this result provides a generalization of a recent result stating that, for Boolean hierarchical conjunctive queries, the Shapley-value of the contribution of a tuple in the database to the final result can be computed in polynomial time.
The second example I will present refers to the comparison of different ML models in terms of important families of (local and post-hoc) explainability queries. For the models, I will consider multi-layer perceptrons and binary decision diagrams. The main object of study will be the computational complexity of the aforementioned queries over such models. The obtained results will show an interesting theoretical counterpart to wisdom’s claims on interpretability. This work also suggests the need for developing query languages that support the process of retrieving explanations from ML models, and also for obtaining general tractability results for such languages over specific classes of models.
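
To make the SHAP-score mentioned in the first example concrete, here is a minimal brute-force sketch in Python. It assumes Boolean features and a uniform distribution over inputs (both assumptions made for this illustration), and it enumerates all feature subsets, so it runs in exponential time. It is not the polynomial-time algorithm for deterministic and decomposable circuits referred to in the abstract; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial


def expected_output(model, entity, fixed):
    """Average of model(e') over all Boolean e' that agree with `entity` on `fixed`."""
    n = len(entity)
    free = [i for i in range(n) if i not in fixed]
    total = 0
    for bits in product((0, 1), repeat=len(free)):
        e = list(entity)
        for i, b in zip(free, bits):
            e[i] = b
        total += int(model(tuple(e)))
    return Fraction(total, 2 ** len(free))


def shap_score(model, entity, feature):
    """Shapley value of `feature` in the game S -> expected_output(model, entity, S)."""
    n = len(entity)
    others = [i for i in range(n) if i != feature]
    score = Fraction(0)
    for k in range(len(others) + 1):
        # Standard Shapley weight for coalitions of size k.
        weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
        for subset in combinations(others, k):
            s = set(subset)
            gain = (expected_output(model, entity, s | {feature})
                    - expected_output(model, entity, s))
            score += weight * gain
    return score


def conjunction(e):
    """Toy model: x0 AND x1 (any further features are ignored)."""
    return e[0] & e[1]
```

For the toy model and the input (1, 1), each of the two features receives a score of 3/8, and the scores add up to the difference between the model's output on the input (1) and its average output (1/4).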

Diversity, Fairness, and Sustainability in Population Protocols

Nan Kang, Frederik Mallmann-Trenn, Nicolás Rivera (2021). In: PODC'21: Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing. July 2021. Pages 67–76. https://doi.org/10.1145/3465084.3467940

Over the years, population protocols with the goal of reaching consensus have been studied in great depth. However, many systems in the real world do not result in all agents eventually reaching consensus, but rather in the opposite: they converge to a state of rich diversity. Consider for example task allocation in ants. If eventually all ants perform the same task, then the colony will perish (lack of food, no brood care, etc.). Thus, it is vital for the survival of the colony to have a diverse set of tasks and enough ants working on each task. What complicates matters is that ants need to switch tasks periodically to adjust to the needs of the colony; e.g., when too many foragers fall victim to other ant colonies. A further difficulty is that not all tasks are equally important, and certain proportions may need to be kept in the distribution of tasks. How can ants keep a healthy and balanced allocation of tasks?

To answer this question, we propose a simple population protocol for n agents on a complete graph and an arbitrary initial distribution of k colours (tasks). In this protocol we assume that each colour i has an associated weight (importance or value) w_i ≥ 1. Denoting by w the sum of the weights of the different colours, we show that the protocol converges in O(w^2 n log n) rounds to a configuration where the number of agents supporting each colour i is concentrated on the fair share w_i n / w and will stay concentrated for a large number of rounds, w.h.p.

That is, every task is performed by a number of agents proportional to its weight w_i. Our protocol has many interesting properties: agents do not need to know other colours and weights in the system, and our protocol requires very little memory per agent. Furthermore, the protocol guarantees fairness, meaning that over a long period each agent has each colour roughly a number of times proportional to the weight of the colour. Finally, our protocol also fulfils sustainability, meaning that no colour ever vanishes. All of these properties still hold when an adversary adds agents or colours.
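
As a small worked example of the fair share described above, take hypothetical values n = 1000 agents and k = 3 colours with weights 1, 2 and 2 (these numbers are chosen only for illustration and do not come from the paper):

```latex
\[
w = \sum_{i=1}^{k} w_i = 1 + 2 + 2 = 5, \qquad
\frac{w_1 n}{w} = \frac{1 \cdot 1000}{5} = 200, \qquad
\frac{w_2 n}{w} = \frac{w_3 n}{w} = \frac{2 \cdot 1000}{5} = 400 .
\]
```

The three fair shares add up to n, as expected, and the stated convergence bound would then be of order w^2 n log n = 25 · 1000 · log 1000 rounds, up to the constant hidden in the O-notation.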

Bounded-Suboptimal Search with Learned Heuristics

Matias Greco, Jorge A. Baier (2021). In: 2nd ICAPS Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning 2021. https://prl-theworkshop.github.io/prl2021/papers/PRL2021_paper_33.pdf

Reinforcement learning allows learning very accurate heuristics for hard combinatorial puzzles like the 15-puzzle, the 24-puzzle, and Rubik’s cube. In this paper, we empirically investigate how to exploit these learned heuristics in the context of (deterministic) heuristic search with bounded suboptimality guarantees, using the learned heuristics for the 15- and 24-puzzle of DeepCubeA. We show that Focal Search (FS), in its most straightforward form, that is, using the learned heuristic to sort the focal list, has poor performance when compared to Focal Discrepancy Search (FDS), a version of FS that we propose that uses a discrepancy function to sort the focal list. This is interesting because the best-performing algorithm does not use the heuristic values themselves but just the ranking between the successors of the node. In addition, we show FDS is competitive with satisficing search algorithms Weighted A* and Greedy Best-First Search.

Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations

Vladimir Araujo, Andrés Villa, Marcelo Mendoza, Marie-Francine Moens, Alvaro Soto (2021). In: Conference on Empirical Methods in Natural Language Processing (EMNLP), November 7-11. https://arxiv.org/abs/2109.04602

Current language models are usually trained using a self-supervised scheme, where the main focus is learning representations at the word or sentence level. However, there has been limited progress in generating useful discourse-level representations. In this work, we propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn suitable discourse-level representations. As a result, our proposed approach is able to predict future sentences using explicit top-down connections that operate at the intermediate layers of the network. By experimenting with benchmarks designed to evaluate discourse-related knowledge using pre-trained sentence representations, we demonstrate that our approach improves performance in 6 out of 11 tasks by excelling in discourse relationship detection.

Attentive visual semantic specialized network for video captioning

Jesus Perez-Martin; Benjamin Bustos; Jorge Pérez (2021). In: 2020 25th International Conference on Pattern Recognition (ICPR). https://ieeexplore.ieee.org/document/9412898

As an essential high-level task in video understanding, automatically describing a video with natural language has recently gained attention as a fundamental challenge in computer vision. Previous models for video captioning have several limitations, such as the existence of gaps in current semantic representations and the inexpressibility of the generated captions. To deal with these limitations, in this paper, we present a new architecture that we call Attentive Visual Semantic Specialized Network (AVSSN), which is an encoder-decoder model based on our Adaptive Attention Gate and Specialized LSTM layers. This architecture can selectively decide when to use visual or semantic information in the text generation process. The adaptive gate lets the decoder automatically select the relevant information, providing a better temporal state representation than existing decoders. Besides, the model is capable of learning to improve the expressiveness of generated captions by attending to their length, using a sentence-length-related loss function. We evaluate the effectiveness of the proposed approach on the Microsoft Video Description (MSVD) and the Microsoft Research Video-to-Text (MSR-VTT) datasets, achieving state-of-the-art performance with several popular evaluation metrics: BLEU-4, METEOR, CIDEr, and ROUGE-L.

Answer-Set Programs for Reasoning about Counterfactual Interventions and Responsibility Scores for Classification

Leopoldo Bertossi, Gabriela Reyes (2021). In: To appear in Proc. 1st International Joint Conference on Learning and Reasoning (IJCLR'21), Springer LNCS. arXiv:2107.10159. https://arxiv.org/abs/2107.10159

We describe how answer-set programs can be used to declaratively specify counterfactual interventions on entities under classification, and reason about them. In particular, they can be used to define and compute responsibility scores as attribution-based explanations for outcomes from classification models. The approach allows for the inclusion of domain knowledge and supports query answering. A detailed example with a naive-Bayes classifier is presented.

A Protocol to Follow-up with Students in Large-enrollment Courses

Matías Piña, Isabel Hilliger, Constanza Melian, Cristian Ruz, Tomás Gonzalez, Jorge Baier (2021). In: 2021 ASEE Virtual Annual Conference Content Access. https://peer.asee.org/a-protocol-to-follow-up-with-students-in-large-enrollment-courses

A polynomial-time approximation algorithm for counting words accepted by an NFA (invited paper)

Marcelo Arenas, Luis Alberto Croquevielle, Rajesh Jayaram, Cristian Riveros (2021). In: STOC 2021: Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing. June 2021. Pages 4. https://doi.org/10.1145/3406325.3465353

A New Boolean Encoding for MAPF and its Performance with ASP and MaxSAT Solvers

Roberto Asín Achá, Rodrigo López, Sebastián Hagerdorn, Jorge Baier (2021). In: Proceedings of the International Symposium on Combinatorial Search (SoCS 2021). https://ojs.aaai.org/index.php/SOCS/article/view/18546

A Disk-Based Index for Trajectories with an In-Memory Compressed Cache

Daniela Campos; Adrián Gómez-Brandón; Gonzalo Navarro (2021). In: 2021 Data Compression Conference (DCC). https://ieeexplore.ieee.org/document/9418747

VisRec: A Hands-on Tutorial on Deep Learning for Visual Recommender Systems

Denis Parra, Antonio Ossa-Guerra, Manuel Cartagena, Patricio Cerda-Mardini, Felipe del Rio (2021). In: IUI '21 Companion: 26th International Conference on Intelligent User Interfaces - Companion. April 2021. Pages 5–6. https://doi.org/10.1145/3397482.3450620

Tools Impact on the Quality of Annotations for Chat Untangling

Jhonny Cerezo, Felipe Bravo-Marquez, Alexandre Henri Bergel (2021). In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop. https://aclanthology.org/2021.acl-srw.22/

Stress Test Evaluation of Biomedical Word Embeddings

Vladimir Araujo, Andrés Carvallo, Carlos Aspillaga, Camilo Thorne, Denis Parra (2021). In: Proceedings of the 20th Workshop on Biomedical Language Processing. https://arxiv.org/abs/2107.11652

Predicting SPARQL Query Dynamics

Alberto Moya Loustaunau, Aidan Hogan (2021). In: K-CAP '21: Proceedings of the 11th on Knowledge Capture Conference. December 2021. Pages 161–168. https://doi.org/10.1145/3460210.3493565

OpenCSMap: A System for Geolocating Computer Science Publications

Felipe Manenl, Aidan Hogan (2021). In: CEUR Workshop Proceedings. https://aidanhogan.com/docs/opencsmap.pdf

Interventions Recommendation: Professionals’ Observations Analysis in Special Needs Education

Javier Muñoz, Felipe Bravo-Marquez (2021). In: Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications. https://aclanthology.org/2021.bea-1.18/

Gradually Structured Data

Stefan Malewski, Michael Greenberg, Éric Tanter (2021). In: Proceedings of the ACM on Programming Languages. Volume 5. Issue OOPSLA. October 2021. Article No.: 126, pp 1–29. https://doi.org/10.1145/3485503

Gradual Program Analysis for Null Pointers

Sam Estep, Jenna Wise, Jonathan Aldrich, Éric Tanter, Johannes Bader, Joshua Sunshine (2021). In: Proceedings of ECOOP 2021, LIPIcs, vol 194. arxiv:2105.06081. https://arxiv.org/abs/2105.06081

Fourth Workshop on Exploratory Search and Interactive Data Analytics (ESIDA)

Dorota Glowacka, Evangelos Milios, Axel J Soto, Osnat Mokryn, Fernando V Paulovich, Denis Parra (2021). In: IUI '21 Companion: 26th International Conference on Intelligent User Interfaces - Companion. April 2021. Pages 18–20. https://doi.org/10.1145/3397482.3450711

Fast Approximate Autocompletion for SPARQL Query Builders

Gabriel de la Parra, Aidan Hogan (2021). In: Visualization and Interaction for Ontologies and Linked Data Virtual Workshop. http://ceur-ws.org/Vol-3023/paper10.pdf

Expressive Power of Linear Algebra Query Languages

Floris Geerts, Thomas Muñoz, Cristian Riveros, Domagoj Vrgoč (2021). In: PODS'21: Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. June 2021. Pages 342–354. https://doi.org/10.1145/3452021.3458314

Evaluating a Learning Analytics Dashboard to Visualize Student Self-Reports of Time-on-task: A Case Study in a Latin American University

Isabel Hilliger, Constanza Miranda, Gregory Schuit, Fernando Duarte, Martin Anselmo, Denis Parra (2021). In: LAK21: 11th International Learning Analytics and Knowledge Conference. April 2021. Pages 592–598. https://doi.org/10.1145/3448139.3448203

Crisis communication: a comparative study of communication patterns across crisis events in social media

Hernan Sarmiento, Barbara Poblete (2021). In: SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing. March 2021. Pages 1711–1720. https://doi.org/10.1145/3412841.3442044

COVIDCube: An RDF Data Cube for Exploring Among-Country COVID-19 Correlations

Tamara Novoa-Rodríguez, Aidan Hogan (2021). In: CEUR Workshop Proceedings. http://ceur-ws.org/Vol-2980/paper395.pdf

We present an RDF Data Cube – integrated from numerous sources on the Web – that describes countries in terms of general variables (e.g., GDP, population density) and COVID-19 variables. On top of this data cube, we develop a system that computes and visualises correlations between these variables, providing insights into the factors that correlate with COVID-19 cases, deaths, etc., on an international level.

Adaptive Succinctness

Diego Arroyuelo, Rajeev Raman (2021). In: Algorithmica 84, 694–718 (2022). https://doi.org/10.1007/s00453-021-00872-1.

Representing a static set of integers $S$, $|S| = n$, from a finite universe $U = [1..u]$ is a fundamental task in computer science. Our concern is to represent $S$ in small space while supporting the operations of $\mathrm{rank}$ and $\mathrm{select}$ on $S$; if $S$ is viewed as its characteristic vector, the problem becomes that of representing a bit-vector, which is arguably the most fundamental building block of succinct data structures. Although there is an information-theoretic lower bound of $B(n, u) = \lg \binom{u}{n}$ bits on the space needed to represent $S$, this applies to worst-case (random) sets $S$, and sets found in practical applications are compressible. We focus on the case where elements of $S$ contain runs of $\ell > 1$ consecutive elements, one that occurs in many practical situations. Let $C^{(n)}$ denote the class of $\binom{u}{n}$ distinct sets of $n$ elements over the universe $[1..u]$. Let also $C_{g}^{(n)} \subset C^{(n)}$ contain the sets whose $n$ elements are arranged in $g \le n$ runs of $\ell_{i} \geq 1$ consecutive elements from $U$ for $i = 1, \ldots, g$, and let $C_{g, r}^{(n)} \subset C_{g}^{(n)}$ contain all sets that consist of $g$ runs, such that $r \le g$ of them have at least 2 elements.
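
To illustrate the rank and select operations on a set stored as runs, here is a minimal Python sketch. The `RunSet` class and its layout (one start and one length per run, plus prefix counts) are assumptions made for this illustration; this plain run-length representation does not achieve the space bounds studied in the paper and is not the authors' data structure.

```python
from bisect import bisect_right
from itertools import accumulate


class RunSet:
    """Static set of integers stored as maximal runs of consecutive elements."""

    def __init__(self, elements):
        self.starts = []   # first element of each run
        self.lengths = []  # number of elements in each run
        for x in sorted(set(elements)):
            if self.starts and x == self.starts[-1] + self.lengths[-1]:
                self.lengths[-1] += 1      # x extends the current run
            else:
                self.starts.append(x)      # x opens a new run
                self.lengths.append(1)
        # prefix[i] = number of elements stored in runs 0 .. i-1
        self.prefix = [0] + list(accumulate(self.lengths))

    def rank(self, x):
        """Number of elements of the set that are <= x."""
        i = bisect_right(self.starts, x) - 1   # last run starting at or before x
        if i < 0:
            return 0
        return self.prefix[i] + min(x - self.starts[i] + 1, self.lengths[i])

    def select(self, k):
        """The k-th smallest element of the set (1-based)."""
        if not 1 <= k <= self.prefix[-1]:
            raise IndexError("k out of range")
        i = bisect_right(self.prefix, k - 1) - 1  # run containing the k-th element
        return self.starts[i] + (k - 1 - self.prefix[i])
```

For the set {3, 4, 5, 9, 12, 13} this stores g = 3 runs, of which r = 2 have at least two elements; both operations then take time logarithmic in g rather than in n.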

Compact Representation of Spatial Hierarchies and Topological Relationships

José Fuentes-Sepúlveda; Diego Gatica; Gonzalo Navarro; M. Andrea Rodríguez; Diego Seco (2021). In: 2021 Data Compression Conference (DCC). https://ieeexplore.ieee.org/document/9418724

The topological model for spatial objects identifies common boundaries between regions, explicitly storing adjacency relations, which not only improves the efficiency of topology-related queries, but also provides advantages such as avoiding data duplication and facilitating data consistency. Recently, a compact representation of the topological model based on planar graph embeddings was proposed. In this article, we provide an elegant generalization of such a representation to support hierarchies of vector objects, which better fits the multi-granular nature of spatial data, such as the political and administrative partition of a country. This representation adds a small space on top of the succinct base representation of each granularity, while efficiently answering new topology-related queries between objects not necessarily at the same level of granularity.

2020

Translating navigation instructions in natural language to a high-level plan for behavioral robot navigation

Álvaro Soto, Xiaoxue Zang, Ashwini Pokle, Marynel Vázquez, Kevin T. Chen, Juan Carlos Niebles, Silvio Savarese. 2020. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. https://aclanthology.org/D18-1286/

We propose an end-to-end deep learning model for translating free-form natural language instructions to a high-level plan for behavioral robot navigation. We use attention models to connect information from both the user instructions and a topological representation of the environment. We evaluate our model’s performance on a new dataset containing 10,050 pairs of navigation instructions. Our model significantly outperforms baseline approaches. Furthermore, our results suggest that it is possible to leverage the environment map as a relevant knowledge base to facilitate the translation of free-form navigational instruction.

Knowledge Graphs

Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia d'Amato, Gerard de Melo, Claudio Gutierrez, José Emilio Labra Gayo, Sabrina Kirrane, Sebastian Neumaier, Axel Polleres, Roberto Navigli, Axel-Cyrille Ngonga Ngomo, Sabbir M. Rashid, Anisa Rula, Lukas Schmelzeisen, Juan Sequeda, Steffen Staab, Antoine Zimmermann. arXiv.org (https://arxiv.org/abs/2003.02320)

In this paper we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After some opening remarks, we motivate and contrast various graph-based data models and query languages that are used for knowledge graphs. We discuss the roles of schema, identity, and context in knowledge graphs. We explain how knowledge can be represented and extracted using a combination of deductive and inductive techniques. We summarise methods for the creation, enrichment, quality assessment, refinement, and publication of knowledge graphs. We provide an overview of prominent open knowledge graphs and enterprise knowledge graphs, their applications, and how they use the aforementioned techniques. We conclude with high-level future research directions for knowledge graphs.

GENE: Graph generation conditioned on named entities for polarity and controversy detection in social media

Marcelo Mendoza, Denis Parra, Álvaro Soto. Information Processing & Management, Volume 57, Issue 6, 2020, 102366, ISSN 0306-4573. https://doi.org/10.1016/j.ipm.2020.102366.

Many of the interactions between users on social networks are controversial, especially in polarized environments. In effect, rather than producing a space for deliberation, these environments foster the emergence of users that disqualify the position of others. On news sites, comments on the news are characterized by such interactions. This is detrimental to the construction of a deliberative and democratic climate, stressing the need for automatic tools that can provide an early detection of polarization and controversy. We introduce GENE (graph generation conditioned on named entities), a representation of user networks conditioned on the named entities (personalities, brands, organizations) which users comment upon. GENE models the leaning that each user has concerning entities mentioned in the news. GENE graphs are able to segment the user network according to their polarity. Using the segmented network, we study the performance of two controversy indices, the existing Random Walks Controversy (RWC) and another one we introduce, Relative Closeness Controversy (RCC). These indices measure the interaction between the network’s poles, providing a metric to quantify the emergence of controversy. To evaluate the performance of GENE, we model the network of users of a popular news site in Chile, collecting data in an observation window of more than three years. A large-scale evaluation using GENE, on thousands of news items, allows us to conclude that over 60% of user comments have a predictable polarity. This predictability of the user interaction scenario allows both controversy indices to detect a controversy successfully. In particular, our introduced RCC index shows satisfactory performance in the early detection of controversies using partial information collected during the first hours of the news event, with a sensitivity to the target class exceeding 90%.

In-Database Graph Analytics with Recursive SPARQL

Hogan A., Reutter J.L., Soto A. (2020) In-Database Graph Analytics with Recursive SPARQL. In: Pan J.Z. et al. (eds) The Semantic Web – ISWC 2020. ISWC 2020. Lecture Notes in Computer Science, vol 12506. Springer, Cham. https://doi.org/10.1007/978-3-030-62419-4_29

Works on knowledge graphs and graph-based data management often focus either on graph query languages or on frameworks for graph analytics, where there has been little work in trying to combine both approaches. However, many real-world tasks conceptually involve combinations of these approaches: a graph query can be used to select the appropriate data, which is then enriched with analytics, and then possibly filtered or combined again with other data by means of a query language. In this paper we propose a language that is well-suited for both graph querying and analytical tasks. We propose a minimalistic extension of SPARQL to allow for expressing analytical tasks over existing SPARQL infrastructure; in particular, we propose to extend SPARQL with recursive features, and provide a formal syntax and semantics for our language. We show that this language can express key analytical tasks on graphs (in fact, it is Turing complete). Moreover, queries in this language can also be compiled into sequences of iterations of SPARQL update statements. We show how procedures in our language can be implemented over off-the-shelf SPARQL engines, with a specialised client that can leverage database operations to improve the performance of queries. Results for our implementation show that procedures for popular analytics currently run in seconds or minutes for selective sub-graphs (our target use-case).

Global Vertex Similarity for Large-Scale Knowledge Graphs

Marco Caballero and Aidan Hogan; DCC, Universidad de Chile & IMFD. CEUR Workshop Proceedings. http://ceur-ws.org/Vol-2773/paper-16.pdf

We investigate global measures of vertex similarity for knowledge graphs. While vertex similarity has been explored in the context of directed, unlabelled graphs, measures based on recursive algorithms or learning frameworks can be costly to compute, assume labelled data, and/or provide poorly-interpretable results. Knowledge graphs further imply unique challenges for vertex similarity in terms of scale and diversity. We thus propose and explore global measures of vertex similarity for Knowledge Graphs that (i) are unsupervised; (ii) offer explanations of similarity results; (iii) take into consideration edge labels; and (iv) are robust in terms of redundant or interdependent information. Given that these measures can still be costly to compute precisely, we propose an approximation strategy that enables computation at scale. We compare our measures with a recursive measure (SimRank) for computing vertex similarity over subsets of Wikidata.

Versioned Queries over RDF Archives: All You Need is SPARQL?

Ignacio Cuevas and Aidan Hogan; Department of Computer Science, University of Chile & IMFD Chile. http://ceur-ws.org/Vol-2821/paper6.pdf

We explore solutions for representing archives of versioned RDF data using the SPARQL standard and off-the-shelf engines. We consider six representations of RDF archives based on named graphs, and describe how input queries can be automatically rewritten to return solutions for a particular version, or solutions that change between versions. We evaluate these alternatives over an archive of 8 weekly versions of Wikidata and 146 queries using Virtuoso as the SPARQL engine.

Suggesting Citations for Wikidata Claims based on Wikipedia’s External References

Paolo Curotto and Aidan Hogan. CEUR Workshop Proceedings. http://ceur-ws.org/Vol-2773/paper-15.pdf

Given a Wikidata claim, we explore automated methods for locating references that support that claim. Our goal is to assist human editors in referencing claims, and thus increase the ratio of referenced claims in Wikidata. As an initial approach, we mine links from the references section of English Wikipedia articles, download and index their content, and use standard relevance-based measures to find supporting documents. We consider various forms of search phrasings, as well as different scopes of search. We evaluate our methods in terms of the coverage of reference documents collected from Wikipedia. We also develop a gold standard of sample items for evaluating the relevance of suggestions. Our results in general reveal that the coverage of Wikipedia reference documents for claims is quite low, but where a reference document is available, we can often suggest it within the first few results.

Laconic Image Classification: Human vs. Machine Performance

Javier Carrasco, Aidan Hogan, Jorge Pérez. CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. October 2020. Pages 115–124. https://doi.org/10.1145/3340531.3411984

We propose laconic classification as a novel way to understand and compare the performance of diverse image classifiers. The goal in this setting is to minimise the amount of information (a.k.a. entropy) required in individual test images to maintain correct classification. Given a classifier and a test image, we compute an approximate minimal-entropy positive image for which the classifier provides a correct classification, becoming incorrect upon any further reduction. The notion of entropy offers a unifying metric that allows us to combine and compare the effects of various types of reductions (e.g., crop, colour reduction, resolution reduction) on classification performance, in turn generalising similar methods explored in previous works. Proposing two complementary frameworks for computing the minimal-entropy positive images of both human and machine classifiers, in experiments over the ILSVRC test-set, we find that machine classifiers are more sensitive entropy-wise to reduced resolution (versus cropping or reduced colour for machines, as well as reduced resolution for humans), supporting recent results suggesting a texture bias in the ILSVRC-trained models used. We also find, in the evaluated setting, that humans classify the minimal-entropy positive images of machine models with higher precision than machines classify those of humans.

Knowledge Graphs: Research Directions

Hogan A. (2020) Knowledge Graphs: Research Directions. In: Manna M., Pieris A. (eds) Reasoning Web. Declarative Artificial Intelligence. Reasoning Web 2020. Lecture Notes in Computer Science, vol 12258. Springer, Cham. https://doi.org/10.1007/978-3-030-60067-9_8

In these lecture notes, we provide an overview of some of the high-level research directions and open questions relating to knowledge graphs. We discuss six high-level concepts relating to knowledge graphs: data models, queries, ontologies, rules, embeddings and graph neural networks. While traditionally these concepts have been explored by different communities in the context of graphs, more recent works have begun to look at how they relate to one another, and how they can be unified. In fact, at a more foundational level, we can find some surprising relations between the different concepts. The research questions we explore mostly involve combinations of these concepts.

Web Ontology Language

Hogan A. (2020) Web Ontology Language. In: The Web of Data. Springer, Cham. https://doi.org/10.1007/978-3-030-51580-5_5

In this chapter, we provide a detailed primer on the second version of the Web Ontology Language (OWL 2) standard. We first motivate the need for such a standard, discussing the role and importance of ontologies on the Web. We then describe how ontology languages, which themselves can be formally defined through model theory, can subsequently be used to formally define ontologies. Thereafter we discuss the OWL vocabulary used to define the semantics of classes, properties, individuals, and datatypes within ontologies. We cover some of the main reasoning tasks for ontologies and the applications in which they are used. We discuss how these core reasoning tasks are undecidable for the full OWL (2) language and outline the sub-languages (aka. profiles) proposed by the standard that allow for more efficient reasoning procedures. We conclude by reflecting on the importance of having expressive ontologies on the Web of Data, and discuss open challenges.

The Web of Data

The Web of Data. Author: Aidan Hogan, Department of Computer Science, Universidad de Chile, Santiago de Chile; IMFD. https://doi.org/10.1007/978-3-030-51580-5

This book concisely brings together the key standards and best practices relating to modelling, querying, validating and linking machine-readable data and semantics on the Web. Alongside practical examples and formal definitions, the book shows how these standards contribute to – and have been used thus far on – the “Web of Data”: a machine readable evolution of the Web marked by increased automation, enabling powerful Web applications capable of discovering, cross-referencing, and organising data from numerous websites in a matter of seconds. The book is divided into nine chapters, the first of which highlights the fundamental shortcomings of the current Web that illustrate the need for increased machine readability. The next chapter outlines the core concepts of the “Web of Data”, discussing use-cases on the Web where they have already been deployed. “Resource Description Framework (RDF)” describes the graph-structured data model proposed by the Semantic Web community as a common data model for the Web. The chapter on “RDF Schema (RDFS) and Semantics” presents a lightweight ontology language used to define an initial semantics for RDF graphs. In turn, the chapter “Web Ontology Language (OWL)” elaborates on a much more expressive ontology language built upon RDFS. In “SPARQL Query Language” a language for querying and updating RDF graphs is described. “Shape Constraints and Expressions (SHACL/ShEx)” introduces two languages for describing the expected structure of – and expressing constraints over – RDF graphs for the purposes of validation. “Linked Data” discusses the principles and best practices by which interlinked (RDF) data can be published on the Web, and how they have been adopted. The final chapter highlights open problems and concludes with a general discussion on the future of the Web of Data. 
The book is intended for students, researchers and advanced practitioners interested in learning more about the Web of Data, and about closely related topics such as the Semantic Web, Knowledge Graphs, Linked Data, Graph Databases, Ontologies, etc. Offering a range of accessible examples and exercises, it can be used as a textbook for students and other newcomers to the field. It can also serve as a reference handbook for researchers and developers, as it offers up-to-date details on key standards (RDF, RDFS, OWL, SPARQL, SHACL, ShEx, RDB2RDF, LDP), along with formal definitions and references to further literature. The associated website webofdatabook.org offers a wealth of complementary material, including solutions to the exercises, slides for classes, interactive examples, and a section for comments and questions.

SPARQL Query Language

Hogan A. (2020) SPARQL Query Language. In: The Web of Data. Springer, Cham. https://doi.org/10.1007/978-3-030-51580-5_6

This chapter provides a detailed introduction to the SPARQL Protocol and RDF Query Language (SPARQL 1.1): the standard query language for RDF. After some initial motivation, we delve into the features of the query language, illustrated with concrete examples. We then formally define the semantics of these query features. We next discuss how federated queries can be used to evaluate queries over multiple remote sources on the Web. We detail the SPARQL Update language, which allows for modifying the data indexed by a SPARQL query service. We introduce SPARQL Entailment Profiles, which allow for query results to consider entailments, including support for RDF, RDFS and OWL semantics. We further discuss the HTTP-based protocol by which requests can be issued to a SPARQL service over the Web, as well as the SPARQL Service Description vocabulary, which can be used to describe and advertise the features supported by such services. We conclude by discussing the importance of SPARQL for the Web of Data, the key research directions that are currently being explored, as well as open challenges.

Shape Constraints and Expressions

Hogan A. (2020) Shape Constraints and Expressions. In: The Web of Data. Springer, Cham. https://doi.org/10.1007/978-3-030-51580-5_7

In this chapter, we introduce two languages for describing shapes and constraints for RDF graphs, namely the Shapes Constraint Language (SHACL) and the Shape Expressions Language (ShEx 2.1). Both languages allow for defining constraints over RDF graphs in terms of what data are expected, what data are obligatory, what data are allowed, and what data are disallowed. This in turn allows RDF graphs to be validated with respect to the specified constraints. We first look at SHACL, describing the SHACL-Core fragment and the constraints it allows. We then discuss how SHACL-SPARQL allows for further constraints to be expressed using SPARQL query syntax. Turning to ShEx, we describe its syntaxes, and how it differs from SHACL. We outline and provide a semantics for an abstract shapes syntax that generalises SHACL and ShEx. We conclude with a general discussion of the role of shapes languages on the Web of Data, as well as open challenges.

Resource Description Framework

Hogan A. (2020) Resource Description Framework. In: The Web of Data. Springer, Cham. https://doi.org/10.1007/978-3-030-51580-5_3

This chapter provides a detailed primer for the Resource Description Framework (RDF 1.1) standard, proposed as a common data model for publishing and exchanging structured data on the Web. We first motivate the need for a data model like RDF. We then describe the types of terms used in RDF: the basic building blocks of the framework. We discuss how these terms can be combined to make coherent statements in the form of RDF triples, and how triples form graphs and datasets. Thereafter we discuss the RDF vocabulary: a built-in set of terms used for modeling more complex data, such as complex relations and ordered lists. Finally, we give an overview of the different syntaxes by which RDF can be serialized and communicated.
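As a rough illustration of how terms combine into triples, and triples into graphs and datasets, here is a minimal Python sketch. It is not taken from the chapter: the `Triple` type, the `ex:` and `foaf:` prefixed names, and the dictionary layout of the dataset are assumptions made for the example, and terms are represented as plain strings rather than typed IRIs, literals and blank nodes.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; terms are plain strings in this sketch."""
    subject: str    # an IRI or a blank node (written "_:label")
    predicate: str  # an IRI
    object: str     # an IRI, a blank node, or a literal (written in quotes)


# An RDF graph is a set of triples, so repeating a triple adds nothing.
graph = {
    Triple("ex:alice", "foaf:knows", "_:b1"),
    Triple("_:b1", "foaf:name", '"Bob"'),
    Triple("ex:alice", "foaf:knows", "_:b1"),  # duplicate, collapses in the set
}

# An RDF dataset: one default graph plus zero or more named graphs.
# The named-graph IRI "ex:g1" is a hypothetical example.
dataset = {
    "default": graph,
    "named": {"ex:g1": {Triple("ex:alice", "foaf:name", '"Alice"')}},
}

# All subjects that appear in the default graph.
subjects = {t.subject for t in dataset["default"]}
```

Modelling the graph as a set mirrors the standard's definition of an RDF graph as a set of triples; a production system would normally use an RDF library instead of hand-rolled tuples.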

RDF Schema and Semantics

Hogan A. (2020) RDF Schema and Semantics. In: The Web of Data. Springer, Cham. https://doi.org/10.1007/978-3-030-51580-5_4

This chapter presents an in-depth primer for the RDF Schema (RDFS 1.1) standard, which is primarily used to define a lightweight semantics for the classes and properties used in RDF graphs. After an initial motivation and overview, we discuss the RDFS vocabulary, and how it can be used to define sub-classes, sub-properties, domains and ranges, amongst other types of definitions. We then describe in detail how the semantics of RDF(S) can be formalised in a model-theoretic way, discussing key concepts such as interpretations, models, satisfiability and entailment. We introduce various semantics for RDF(S), including the simple semantics, D semantics, RDF semantics, and the RDFS semantics. We conclude the chapter by discussing how rules can be used to support entailment under such semantics.
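To make the idea of rule-based entailment concrete, the following Python sketch applies two of the standard RDFS entailment patterns, rdfs9 (instances of a sub-class are instances of its super-class) and rdfs11 (sub-class is transitive), until a fixpoint is reached. It is an illustrative sketch only, not the chapter's own algorithm: the function name `rdfs_closure`, the `ex:` terms and the naive nested-loop evaluation are assumptions, and the remaining RDFS rules are omitted.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(graph):
    """Return graph plus all triples derivable with rules rdfs9 and rdfs11."""
    inferred = set(graph)
    changed = True
    while changed:
        new = set()
        for (s, p, o) in inferred:
            if p != SUBCLASS:
                continue
            for (s2, p2, o2) in inferred:
                # rdfs11: s subClassOf o, o subClassOf o2 => s subClassOf o2
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: s subClassOf o, s2 type s => s2 type o
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        # Stop once a full pass derives nothing that is not already known.
        changed = not new <= inferred
        inferred |= new
    return inferred


# Hypothetical example data.
graph = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
}
closure = rdfs_closure(graph)
```

Here the closure additionally contains that `ex:Dog` is a sub-class of `ex:Animal` and that `ex:rex` is both an `ex:Mammal` and an `ex:Animal`. The quadratic loop keeps the sketch short; real reasoners index triples by predicate.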

Linked Data

Hogan A. (2020) Linked Data. In: The Web of Data. Springer, Cham. https://doi.org/10.1007/978-3-030-51580-5_8

This chapter motivates, introduces and describes Linked Data, which centres around a concise set of principles by which data can be published and interlinked on the Web, and by which a Web of Data can ultimately be formed. We first discuss the core Linked Data principles, which espouse the use of HTTP IRIs to identify the entities described in data, returning a machine-readable description of the entity (typically RDF) when its corresponding IRI is looked up on the Web. We then discuss some further best practices for publishing data conformant with the Linked Data principles in a way that enhances interoperability. We discuss the Linking Open Data (LOD) project founded on the idea of publishing Open Data on the Web in a standard, machine-readable fashion using Linked Data; we describe the most prominent datasets and vocabularies that have resulted from this initiative. We then discuss tools and techniques for converting legacy data to RDF, discovering links, and hosting Linked Data. We subsequently discuss the Linked Data Platform: a standard that outlines the protocols and resources needed to build a new generation of read–write Linked Data applications. We conclude the chapter with a discussion of open challenges yet to be addressed in the context of Linked Data.

Explaining VQA predictions using visual grounding and a knowledge base

Felipe Riquelme, Alfredo De Goyeneche, Yundong Zhang, Juan Carlos Niebles, Alvaro Soto. Image and Vision Computing, Volume 101, 2020, 103968, ISSN 0262-8856. https://doi.org/10.1016/j.imavis.2020.103968.

Abstract: In this work, we focus on the Visual Question Answering (VQA) task, where a model must answer a question based on an image, and the VQA-Explanations task, where an explanation is produced to support the answer. We introduce an interpretable model capable of pointing out and consuming information from a novel Knowledge Base (KB) composed of real-world relationships between objects, along with labels mined from available region descriptions and object annotations. Furthermore, this model provides visual and textual explanations to complement the KB visualization. The use of a KB brings two important consequences: enhanced predictions and improved interpretability. We achieve this by introducing a mechanism that can extract relevant information from this KB, and can point out the relations better suited for predicting the answer. A supervised attention map is generated over the KB to select the relevant relationships from it for each question-image pair. Moreover, we add image attention supervision on the explanations module to generate better visual and textual explanations. We quantitatively show that the predicted answers improve when using the KB; similarly, explanations improve with this and when adding image attention supervision. Also, we qualitatively show that the KB attention helps to improve interpretability and enhance explanations. Overall, the results support the benefits of having multiple tasks to enhance the interpretability and performance of the model.

Fine-Grained Entity Linking

Henry Rosales-Méndez; Aidan Hogan; Barbara Poblete. IMFD, Chile. Department of Computer Science, University of Chile, Chile. Journal of Web Semantics, Volume 65, December 2020, 100600. https://doi.org/10.1016/j.websem.2020.100600

The Entity Linking (EL) task involves linking mentions of entities in a text with their identifier in a Knowledge Base (KB) such as Wikipedia, BabelNet, DBpedia, Freebase, Wikidata, YAGO, etc. Numerous techniques have been proposed to address this task down through the years. However, not all works adopt the same convention regarding the entities that the EL task should target; for example, while some EL works target common entities like “interview” appearing in the KB, others only target named entities like “Michael Jackson”. The lack of consensus on this issue (and others) complicates research on the EL task; for example, how can the performance of EL systems be evaluated and compared when systems may target different types of entities? In this work, we first design a questionnaire to understand what kinds of mentions and links the EL research community believes should be targeted by the task. Based on these results we propose a fine-grained categorization scheme for EL that distinguishes different types of mentions and links. We propose a vocabulary extension that allows such categories to be expressed in EL benchmark datasets. We then relabel (subsets of) three popular EL datasets according to our novel categorization scheme, where we additionally discuss a tool used to semi-automate the labeling process. We next present the performance results of five EL systems for individual categories. We further extend EL systems with Word Sense Disambiguation and Coreference Resolution components, creating initial versions of what we call Fine-Grained Entity Linking (FEL) systems, measuring the impact on performance per category.
Finally, we propose a configurable performance measure based on fuzzy sets that can be adapted for different application scenarios. Our results highlight a lack of consensus on the goals of the EL task, show that the evaluated systems do indeed target different entities, and further reveal some open challenges for the (F)EL task regarding more complex forms of reference for entities.

Differentiable adaptive computation time for visual reasoning

C. Eyzaguirre and Á. Soto, "Differentiable Adaptive Computation Time for Visual Reasoning," 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 12814-12822, https://doi.org/10.1109/CVPR42600.2020.01283

Abstract: This paper presents a novel attention-based algorithm for achieving adaptive computation called DACT, which, unlike existing ones, is end-to-end differentiable. Our method can be used in conjunction with many networks; in particular, we study its application to the widely known MAC architecture, obtaining a significant reduction in the number of recurrent steps needed to achieve similar accuracies, therefore improving its performance-to-computation ratio. Furthermore, we show that by increasing the maximum number of steps used, we surpass the accuracy of even our best non-adaptive MAC in the CLEVR dataset, demonstrating that our approach is able to control the number of steps without significant loss of performance. Additional advantages provided by our approach include considerably improving interpretability by discarding useless steps and providing more insights into the underlying reasoning process. Finally, we present adaptive computation as an equivalent to an ensemble of models, similar to a mixture-of-experts formulation. Both the code and the configuration files for our experiments are made available to support further research in this area.

Los gobiernos socialdemócratas en Chile

Ana Farías Antognini, Sergio Toro-Maureira (2020). In: "¿Giro a la izquierda o viraje al centro? Argentina y el Cono Sur, entre la continuidad y el cambio", Santiago C. Leiras (compiler). https://www.teseopress.com/giroalaizquierda/

“Los gobiernos socialdemócratas en Chile” in “¿Giro a la izquierda o viraje al centro? Argentina y el Cono Sur, entre la continuidad y el cambio”.

PLAN ESTRATÉGICO DE LA LOGÍSTICA URBANO-PORTUARIA Mejores ciudades portuarias en el Área Metropolitana de Concepción

Mabel Alarcón, Violeta Montero, Alejandro Tudela, Sergio Toro-Maureira (2020). In: https://www.researchgate.net/publication/345948379_PLAN_ESTRATEGICO_DE_LA_LOGISTICA_URBANO-PORTUARIA_Mejores_ciudades_portuarias_en_el_Area_Metropolitana_de_Concepcion

Technical Report: “PLAN ESTRATÉGICO DE LA LOGÍSTICA URBANO-PORTUARIA Mejores ciudades portuarias en el Área Metropolitana de Concepción”.

Special Issue on Database Theory

Pablo Barceló, Marco Calautti (2020). In: Theory of Computing Systems volume 65, pages 1–2 (2021). https://link.springer.com/article/10.1007/s00224-020-10006-9

Information extraction meets the Semantic Web: A survey

Martinez-Rodriguez, Jose L., Cinvestav Tamaulipas, Ciudad Victoria, Mexico; Hogan, Aidan, IMFD Chile; Department of Computer Science, University of Chile, Chile; Lopez-Arevalo, Ivan, Cinvestav Tamaulipas, Ciudad Victoria, Mexico. Journal Semantic Web, vol. 11, no. 2, pp. 255-335, 2020. https://doi.org/10.3233/SW-180333

Abstract: We provide a comprehensive survey of the research literature that applies Information Extraction techniques in a Semantic Web setting. Works in the intersection of these two areas can be seen from two overlapping perspectives: using Semantic Web resources (languages/ontologies/knowledge-bases/tools) to improve Information Extraction, and/or using Information Extraction to populate the Semantic Web. In more detail, we focus on the extraction and linking of three elements: entities, concepts and relations. Extraction involves identifying (textual) mentions referring to such elements in a given unstructured or semi-structured input source. Linking involves associating each such mention with an appropriate disambiguated identifier referring to the same element in a Semantic Web knowledge-base (or ontology), in some cases creating a new identifier where necessary. With respect to entities, works involving (Named) Entity Recognition, Entity Disambiguation, Entity Linking, etc. in the context of the Semantic Web are considered. With respect to concepts, works involving Terminology Extraction, Keyword Extraction, Topic Modeling, Topic Labeling, etc., in the context of the Semantic Web are considered. Finally, with respect to relations, works involving Relation Extraction in the context of the Semantic Web are considered. The focus of the majority of the survey is on works applied to unstructured sources (text in natural language); however, we also provide an overview of works that develop custom techniques adapted for semi-structured inputs, namely markup documents and web tables.

CompactNets: Compact Hierarchical Compositional Networks for Visual Recognition

Hans Lobel, Alvaro Soto, Pontificia Universidad Católica de Chile; René Vidal, The Johns Hopkins University. Computer Vision and Image Understanding, Volume 191, February 2020, 102841. https://doi.org/10.1016/j.cviu.2019.102841

Abstract: CNN-based models currently provide state-of-the-art performance in image categorization tasks. While these methods are powerful in terms of representational capacity, they are generally not conceived with explicit means to control complexity. This might lead to scenarios where resources are used in a non-optimal manner, increasing the number of unspecialized or repeated neurons, and overfitting to data. In this work we propose CompactNets, a new approach to visual recognition that learns a hierarchy of shared, discriminative, specialized, and compact representations. CompactNets naturally capture the notion of compositional compactness, a characterization of complexity in compositional models, consisting of using the smallest number of patterns to build a suitable visual representation. We employ a structural regularizer with group-sparse terms in the objective function that induces, on each layer, an efficient and effective use of elements from the layer below. In particular, this allows groups of top-level features to be specialized based on category information. We evaluate CompactNets on the ILSVRC12 dataset, obtaining compact representations and competitive performance, using an order of magnitude fewer parameters than common CNN-based approaches. We show that CompactNets are able to outperform other group-sparse-based approaches, in terms of performance and compactness. Finally, transfer-learning experiments on small-scale datasets demonstrate high generalization power, providing remarkable categorization performance with respect to alternative approaches.

An integrated model for textual social media data with spatio-temporal dimensions

Juglar Diaz, Barbara Poblete, Felipe Bravo Marquez. (2020) In: Information Processing & Management. Science Direct. https://www.sciencedirect.com/science/article/pii/S0306457319308738

Minding the AI Gap in LATAM

Barbara Poblete, Jorge Pérez (2020). In: Communications of the ACM. https://doi.org/10.1145/3416969

Societies and industries are rapidly changing due to the adoption of artificial intelligence (AI) and will face deep transformations in upcoming years. In this scenario, it becomes critical for under-represented communities in technology, in particular developing regions such as Latin America, to foster initiatives that are committed to developing tools for the local adoption of AI. Latin America, like many non-English-speaking regions, faces several problems in adopting AI technology, including the lack of diverse and representative resources for automated learning tasks. A highly problematic area in this regard is natural language processing (NLP), which is strongly dependent on labeled datasets for learning. However, most state-of-the-art NLP resources are allocated to English. Therefore, creating efficient NLP tools for diverse languages requires an important investment of time and financial resources. To deal with such issues, our group has worked toward creating language-agnostic approaches as well as adapting and improving existing NLP techniques to local problems. In addition, we have focused on producing new state-of-the-art, publicly available NLP data and models in Spanish. Next, we briefly present some of them.

The Semantic Web: Two decades on

Hogan, Aidan; IMFD; Department of Computer Science, University of Chile, Santiago, Chile. Journal Semantic Web, vol. 11, no. 1, pp. 169-185, 2020. https://doi.org/10.3233/SW-190387

Abstract: More than two decades have passed since the establishment of the initial cornerstones of the Semantic Web. Since its inception, opinions have remained divided regarding the past, present and potential future impact of the Semantic Web. In this paper – and in light of the results of over two decades of development on both the Semantic Web and related technologies – we reflect on the current status of the Semantic Web, the impact it has had thus far, and future challenges. We first review some of the external criticism of this vision that has been put forward by various authors; we draw together the individual critiques, arguing both for and against each point based on the current state of adoption. We then present the results of a questionnaire that we have posed to the Semantic Web mailing list in order to understand respondents’ perspective(s) regarding the degree to which the original Semantic Web vision has been realised, the impact it can potentially have on the Web (and other settings), its success stories thus far, as well as the degree to which they agree with the aforementioned critiques of the Semantic Web in terms of both its current state and future feasibility. We conclude by reflecting on future challenges and opportunities in the area.

ProNA2020 predicts protein-DNA, protein-RNA, and protein-protein binding proteins and residues from sequence

Qiu, Jiajun; Bernhofer, Michael; Heinzinger, Michael; Kemper, Sofie; Melo, Francisco; Rost, Burkhard, Tomás Norambuena (2020). In: Journal of Molecular Biology. Volume 432, Issue 7, 27 March 2020, Pages 2428-2443. https://doi.org/10.1016/j.jmb.2020.02.026

The Automatic Learning for the Rapid Classification of Events (ALeRCE) Alert Broker

F. Förster, G. Cabrera-Vives, E. Castillo-Navarrete, P. A. Estévez, P. Sánchez-Sáez, J. Arredondo, F. E. Bauer, R. Carrasco-Davis, M. Catelan, F. Elorrieta, P. Huijse, G. Pignata, E. Reyes, I. Reyes, D. Rodríguez-Mancini, D. Ruz-Mieres, C. Valenzuela, I. Alvarez-Maldonado, N. Astorga, J. Borissova, A. Clocchiatti, D. De Cicco, C. Donoso-Oliva, M. J. Graham, R. Kurtev, A. Mahabal, J.C. Maureira, R. Molina-Ferreiro, A. Moya, W. Palma, M. Pérez-Carrasco, P. Protopapas, M. Romero, L. Sabatini-Gacitúa, A. Sánchez, J. San Martín, C. Sepúlveda-Cobo, E. Vera, J. R. Vergara, Susana Eyheramendy (2020). In: arxiv:2008.03303v1. https://arxiv.org/abs/2008.03303

A Polygenic Risk Score Suggests Shared Genetic Architecture of Voice Break With Early Markers of Pubertal Onset in Boys

María C Lardone, Alexander S Busch, José L Santos, Patricio Miranda, Susana Eyheramendy, Ana Pereira, Anders Juul, Kristian Almstrup, Verónica Mericq (2020). In: The Journal of Clinical Endocrinology & Metabolism, Volume 105, Issue 3, March 2020, Pages e349–e357. https://doi.org/10.1210/clinem/dgaa003

Gobernanza en ciudades portuarias. Aprendizajes desde el Área Metropolitana de Concepción

Mabel Alarcón, Violeta Montero, Alejandro Tudela, Sergio Toro-Maureira (2020). Research Gate. https://www.researchgate.net/publication/345948467_GOBERNANZA_EN_CIUDADES_PORTUARIAS_Aprendizajes_desde_el_Area_Metropolitana_de_Concepcion

“GOBERNANZA EN CIUDADES PORTUARIAS Aprendizajes desde el Área Metropolitana de Concepción”

“Fake News is Anything They Say!”–Conceptualization and Weaponization of Fake News Among the American Public

Chau Tong, Hyungjin Gill, Jianing Li, Hernando Rojas, Sebastián Valenzuela (2020). In: Mass Communication and Society. Volume 23, 2020 - Issue 5: What IS News?. https://doi.org/10.1080/15205436.2020.1789661

Nuevos desafíos, enfoques y perspectivas para estudiar élites políticas

Alejandro Olivares, Bastián González, Sergio Toro-Maureira, Juan Carlos Arellano, Anabel Yanes-Rojas, José Zurita-Tapia, Amanda Vitoria, Claudio Robelo, Juan Bautista Canavesi (2020). In: IBEROAMERICANA. América Latina - España - Portugal, 20(74), 229–259. https://doi.org/10.18441/ibam.20.2020.74.229-259

This section analyzes the new challenges, analytical approaches, and methodologies for the study of different political elites. Based on the existing literature and a review of cases in Latin America, these articles reflect on the various research perspectives that treat governmental, legislative, and ministerial elites as the dependent variables or phenomena to be explained.

Studying incidental news: Antecedents, dynamics and implications

Neta Kligler-Vilenchik, Alfred Hermida, Mikko Villi, Sebastián Valenzuela (2020). In: Journalism. Vol 21, Issue 8, 2020. https://doi.org/10.1177/1464884920915372

The Future is Big Graphs! A Community View on Graph Processing Systems

Sherif Sakr, Angela Bonifati, Hannes Voigt, Alexandru Iosup, Khaled Ammar, Renzo Angles, Walid Aref, Marcelo Arenas, Maciej Besta, Peter A. Boncz, Khuzaima Daudjee, Emanuele Della Valle, Stefania Dumbrava, Olaf Hartig, Bernhard Haslhofer, Tim Hegeman, Jan Hidders, Katja Hose, Adriana Iamnitchi, Vasiliki Kalavri, Hugo Kapp, Wim Martens, M. Tamer Özsu, Eric Peukert, Stefan Plantikow, Mohamed Ragab, Matei R. Ripeanu, Semih Salihoglu, Christian Schulz, Petra Selmer, Juan F. Sequeda, Joshua Shinavier, Gábor Szárnyas, Riccardo Tommasini, Antonino Tumeo, Alexandru Uta, Ana Lucia Varbanescu, Hsiang-Yun Wu, Nikolay Yakovets, Da Yan, Eiko Yoneki (2020). In: arXiv:2012.06171v1. https://arxiv.org/abs/2012.06171

Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the next decade for big graph processing to continue to succeed?

Setting the agenda: The news media and public opinion, 3rd edition

Sebastián Valenzuela, Maxwell McCombs (2020). In: Polity Press. ISBN: 9781509535798. https://www.politybooks.com/bookdetail?book_slug=setting-the-agenda-mass-media-and-public-opinion-3rd-edition--9781509535798

News media strongly influence how we picture public affairs across the world, playing a significant and sometimes controversial role in determining which topics are at the centre of public attention and action. Setting the Agenda, first published in 2004, has become the go-to textbook on this crucial topic. In this timely third edition, Maxwell McCombs – a pioneer of agenda-setting research – and Sebastián Valenzuela – a senior scholar of agenda setting in Latin America – have expanded and updated the book for a new generation of students. In describing the media’s influence on what we think about and how we think about it, Setting the Agenda also examines the sources of media agendas, the psychological explanation for their impact on the public agenda, and their consequences for attitudes, opinions and behaviours. New to this edition is a discussion of agenda setting in the widened media landscape, including a full chapter on network agenda setting and a lengthened presentation on agenda melding. The book also contains expanded material on social media and the role of agenda setting beyond the realm of public affairs, as well as a foreword from Donald L. Shaw and David H. Weaver, the co-founders of agenda-setting theory. This exciting new edition is an invaluable source for students of media, communications and politics, as well as those interested in the role of news in shaping and directing public opinion.

R3MAT: A Rapid and Robust Graph Generator

Sherif Sakr, Angela Bonifati, Hannes Voigt, Alexandru Iosup, Khaled Ammar, Renzo Angles, Walid Aref, Marcelo Arenas, Maciej Besta, Peter A. Boncz, Khuzaima Daudjee, Emanuele Della Valle, Stefania Dumbrava, Olaf Hartig, Bernhard Haslhofer, Tim Hegeman, Jan Hidders, Katja Hose, Adriana Iamnitchi, Vasiliki Kalavri, Hugo Kapp, Wim Martens, M. Tamer Özsu, Eric Peukert, Stefan Plantikow, Mohamed Ragab, Matei R. Ripeanu, Semih Salihoglu, Christian Schulz, Petra Selmer, Juan F. Sequeda, Joshua Shinavier, Gábor Szárnyas, Riccardo Tommasini, Antonino Tumeo, Alexandru Uta, Ana Lucia Varbanescu, Hsiang-Yun Wu, Nikolay Yakovets, Da Yan, Eiko Yoneki (2020). In: IEEE Access. arxiv:2012.06171. https://arxiv.org/abs/2012.06171

Transforming RDF Data into Property Graphs

Renzo Angles; Roberto García (2020). In: IEEE Latin America Transactions (Volume: 18, Issue: 01, January 2020). https://ieeexplore.ieee.org/document/9049470

Mapping RDF Databases to Property Graph Databases

Renzo Angles; Harsh Thakkar; Dominik Tomaszuk (2020). In: IEEE Access. Vol. 8. https://ieeexplore.ieee.org/document/9088985

PGO: Describing Property Graphs in RDF

Dominik Tomaszuk; Renzo Angles; Harsh Thakkar (2020). In: IEEE Access (Volume: 8). https://ieeexplore.ieee.org/document/9115617

GSP4PDB: a web tool to visualize, search and explore protein-ligand structural patterns

Renzo Angles, Mauricio Arenas-Salinas, Roberto García, Jose Antonio Reyes-Suarez & Ehmke Pohl (2020). In: BMC Bioinformatics volume 21, Article number: 85 (2020). https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-3352-x

The Expressive Power of Graph Neural Networks as a Query Language

Pablo Barceló, Egor V. Kostylev, Mikaël Monet, Jorge Pérez, Juan L. Reutter, Juan-Pablo Silva (2020). In: ACM SIGMOD Record. Volume 49. Issue 2. June 2020, pp 6–17. https://doi.org/10.1145/3442322.3442324

On the expressiveness of Lara: A unified language for linear and relational algebra

Pablo Barceló, Nelson Higuera, Jorge Pérez, Bernardo Subercaseaux (2020). In: Leibniz International Proceedings in Informatics, LIPIcs. arxiv:1909.11693v1. https://arxiv.org/abs/1909.11693

First-Order Rewritability of Frontier-Guarded Ontology-Mediated Queries

Pablo Barceló, Gerald Berger, Carsten Lutz, Andreas Pieris (2020). In: arxiv:2011.09314v1. https://doi.org/10.48550/arXiv.2011.09314

We focus on ontology-mediated queries (OMQs) based on (frontier-)guarded existential rules and (unions of) conjunctive queries, and we investigate the problem of FO-rewritability, i.e., whether an OMQ can be rewritten as a first-order query. We adopt two different approaches. The first approach employs standard two-way alternating parity tree automata. Although it does not lead to a tight complexity bound, it provides a transparent solution based on widely known tools. The second approach relies on a sophisticated automata model, known as cost automata. This allows us to show that our problem is 2ExpTime-complete. In both approaches, we provide semantic characterizations of FO-rewritability that are of independent interest.

A More General Theory of Static Approximations for Conjunctive Queries

Pablo Barceló, Miguel Romero & Thomas Zeume (2020). In: Theory of Computing Systems. 64, pages 916–964 (2020). https://link.springer.com/article/10.1007/s00224-019-09924-0

Conjunctive query (CQ) evaluation is NP-complete, but becomes tractable for fragments of bounded hypertreewidth. Approximating a hard CQ by a query from such a fragment can thus allow for an efficient approximate evaluation. While underapproximations (i.e., approximations that return correct answers only) are well-understood, the dual notion of overapproximations (i.e., approximations that return complete – but not necessarily sound – answers), and also a more general notion of approximation based on the symmetric difference of query results, are almost unexplored. In fact, the decidability of the basic problems of evaluation, identification, and existence of those approximations has been open. This article establishes a connection between overapproximations and existential pebble games that allows for studying such problems systematically. Building on this connection, it is shown that the evaluation and identification problem for overapproximations can be solved in polynomial time. While the general existence problem remains open, the problem is shown to be decidable in 2EXPTIME over the class of acyclic CQs and in PTIME for Boolean CQs over binary schemata. Additionally, we propose a more liberal notion of overapproximations to remedy the known shortcoming that queries might not have an overapproximation, and study how queries can be overapproximated in the presence of tuple generating and equality generating dependencies. The techniques are then extended to symmetric difference approximations and used to provide several complexity results for the identification, existence, and evaluation problem for this type of approximations.

Solving a Special Case of the Intensional vs Extensional Conjecture in Probabilistic Databases

Mikaël Monet (2020). In: PODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. June 2020. Pages 149–163. https://doi.org/10.1145/3375395.3387642

The Limits of Efficiency for Open- and Closed-World Query Evaluation under Guarded TGDs

Pablo Barceló, Victor Dalmau, Cristina Feier, Carsten Lutz, Andreas Pieris (2020). In: PODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. June 2020. Pages 259–270. https://doi.org/10.1145/3375395.3387653

Ontology-mediated querying and querying in the presence of constraints are two key database problems where tuple-generating dependencies (TGDs) play a central role. In ontology-mediated querying, TGDs can formalize the ontology and thus derive additional facts from the given data, while in querying in the presence of constraints, they restrict the set of admissible databases. In this work, we study the limits of efficient query evaluation in the context of the above two problems, focusing on guarded and frontier-guarded TGDs and on UCQs as the actual queries. We show that a class of ontology-mediated queries (OMQs) based on guarded TGDs can be evaluated in FPT iff the OMQs in the class are equivalent to OMQs in which the actual query has bounded treewidth, up to some reasonable assumptions. For querying in the presence of constraints, we consider classes of constraint-query specifications (CQSs) that bundle a set of constraints with an actual query. We show a dichotomy result for CQSs based on guarded TGDs that parallels the one for OMQs except that, additionally, FPT coincides with PTime combined complexity. The proof is based on a novel connection between OMQ and CQS evaluation. Using a direct proof, we also show a similar dichotomy result, again up to some reasonable assumptions, for CQSs based on frontier-guarded TGDs with a bounded number of atoms in TGD heads. Our results on CQSs can be viewed as extensions of Grohe’s well-known characterization of the tractable classes of CQs (without constraints). Like Grohe’s characterization, all the above results assume that the arity of relation symbols is bounded by a constant. We also study the associated meta problems, i.e., whether a given OMQ or CQS is equivalent to one in which the actual query has bounded treewidth.

Semantic Optimization of Conjunctive Queries

Figueira, D.; Gottlob, G.; Pieris, A.; Barceló, P. (2020). In: Journal of the ACM, Volume 67, Issue 6, December 2020, Article No.: 34, pp 1–60. https://doi.org/10.1145/3424908. https://dl.acm.org/doi/10.1145/3424908

This work deals with the problem of semantic optimization of the central class of conjunctive queries (CQs). Since CQ evaluation is NP-complete, a long line of research has focussed on identifying fragments of CQs that can be efficiently evaluated. One of the most general restrictions corresponds to generalized hypertreewidth bounded by a fixed constant k ≥ 1; the associated fragment is denoted GHW_k. A CQ is semantically in GHW_k if it is equivalent to a CQ in GHW_k. The problem of checking whether a CQ is semantically in GHW_k has been studied in the constraint-free case, and it has been shown to be NP-complete. However, in case the database is subject to constraints such as tuple-generating dependencies (TGDs) that can express, e.g., inclusion dependencies, or equality-generating dependencies (EGDs) that capture, e.g., key dependencies, a CQ may turn out to be semantically in GHW_k under the constraints, while not being semantically in GHW_k without the constraints. This opens avenues to new query optimization techniques. In this article, we initiate and develop the theory of semantic optimization of CQs under constraints. More precisely, we study the following natural problem: Given a CQ and a set of constraints, is the query semantically in GHW_k, for a fixed k ≥ 1, under the constraints, or, in other words, is the query equivalent to one that belongs to GHW_k over all those databases that satisfy the constraints? We show that, contrary to what one might expect, decidability of CQ containment is a necessary but not a sufficient condition for the decidability of the problem in question. In particular, we show that checking whether a CQ is semantically in GHW₁ is undecidable in the presence of full TGDs (i.e., Datalog rules) or EGDs.
In view of the above negative results, we focus on the main classes of TGDs for which CQ containment is decidable and that do not capture the class of full TGDs, i.e., guarded, non-recursive, and sticky sets of TGDs, and show that the problem in question is decidable, while its complexity coincides with the complexity of CQ containment. We also consider key dependencies over unary and binary relations, and we show that the problem in question is decidable in elementary time. Furthermore, we investigate whether being semantically in GHW_k alleviates the cost of query evaluation. Finally, in case a CQ is not semantically in GHW_k, we discuss how it can be approximated via a CQ that falls in GHW_k in an optimal way. Such approximations might help finding “quick” answers to the input query when exact evaluation is intractable.

Abstracting gradual references

Matías Toro, Éric Tanter (2020). In: Science of Computer Programming. Volume 197, 1 October 2020, 102496. https://doi.org/10.1016/j.scico.2020.102496

Model Interpretability through the Lens of Computational Complexity

Pablo Barceló, Mikaël Monet, Jorge Pérez, Bernardo Subercaseaux (2020). In: arxiv:2010.12265. https://arxiv.org/abs/2010.12265

Learning to Detect Online Harassment on Twitter with the Transformer

Margarita Bugueño, Marcelo Mendoza (2020). In: Communications in Computer and Information Science. ECML PKDD 2019: Machine Learning and Knowledge Discovery in Databases pp 298–306. https://link.springer.com/chapter/10.1007/978-3-030-43887-6_23

Arabic dialect sentiment analysis with ZERO effort. Case study: Algerian dialect

Marcelo Mendoza, Imane Guellil, Faical Azoauau (2020). In: Inteligencia Artificial, 23(65), 124–135. https://doi.org/10.4114/intartif.vol23iss65pp124-135

Towards Large-scale RoI Indexing for Content-aware Data Discovery

Araya, M. ; Caceres, R. ; Gutierrez, L. ; Mendoza, M. ; Ponce, C. ; Valenzuela, C. (2020). In: Astronomical Data Analysis Software and Systems XXVII. ASP Conference Series, Vol. 522, Proceedings of a conference held. https://ui.adsabs.harvard.edu/abs/2020ASPC..522...57A/abstract

Bots in Social and Interaction Networks: Detection and Impact Estimation

Marcelo Mendoza, Maurizio Tesconi, Stefano Cresci (2020). In: ACM Transactions on Information Systems, Volume 39, Issue 1, January 2021, Article No.: 5, pp 1–32. https://doi.org/10.1145/3419369

Cryptocurrency mining games with economic discount and decreasing rewards

Arenas, Marcelo ; Reutter, Juan ; Toussaint, Etienne ; Ugarte, Martín ; Vial, Francisco ; Vrgoč, Domagoj (2020). In: 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020). https://drops.dagstuhl.de/opus/volltexte/2020/11915/

When is Approximate Counting for Conjunctive Queries Tractable?

Marcelo Arenas, Luis Alberto Croquevielle, Rajesh Jayaram, Cristian Riveros (2020). In: arxiv:2005.10029v3. https://doi.org/10.48550/arXiv.2005.10029

Efficient Logspace Classes for Enumeration, Counting, and Uniform Generation

Marcelo Arenas, Luis Alberto Croquevielle, Rajesh Jayaram, Cristian Riveros (2020). In: ACM SIGMOD Record, Volume 49, Issue 1, March 2020, pp 52–59. https://doi.org/10.1145/3422648.3422661

Chile’s new interdisciplinary institute for foundational research on data

Marcelo Arenas, Pablo Barceló (2020). In: Communications of the ACM, Volume 63, Issue 11, November 2020, pp 78–83. https://doi.org/10.1145/3416975

Article “Chile’s new interdisciplinary institute for foundational research on data” in Communications of the ACM.

Think the Vote: Information Processing, Selective Exposure to Social Media, and Support for Trump and Clinton

Thomas J. Johnson, Magdalena Saldaña, Barbara K. Kaye (2020). In: International Journal of Communication. Vol 14 (2020). https://ijoc.org/index.php/ijoc/article/view/13494

This study proposes a three-way interaction model that examines how (1) partisan selective exposure to political information on social media, (2) information processing, and (3) ideology influenced support for Hillary Clinton and Donald Trump for president. Findings indicate that processing election information systematically affected support for Clinton among those who were exposed to diverse information; otherwise, heuristics were the main cue to process political information. Conservatives supporting Trump relied on heuristic processing and avoided information that challenged their beliefs. Liberals, in contrast, were more likely to systematically process election information, but the effect was significant only for those who exposed themselves to diverse information. As such, systematic processing might not make a difference in highly polarized environments, where strong partisans are unlikely to engage with different viewpoints and expose themselves to diverse information.

Peripheral elaboration model: The impact of incidental news exposure on political participation

Saif Shahin, Magdalena Saldaña, Homero Gil de Zúñiga (2020). In: Journal of Information Technology & Politics. Volume 18, 2021. Issue 2. Pages 148-163. https://doi.org/10.1080/19331681.2020.1832012

This study places the “cognitive elaboration model” on news gathering and political behavior within the dual-processing “elaboration likelihood model” to derive hypotheses about the effects of incidental news exposure and tests them using two-wave panel data. Results indicate incidental news exposure predicts online participation but not offline participation – underlining the importance of differentiating between political behaviors in the two environments. The key finding, however, is that news elaboration mediates the positive relationship between incidental exposure and political participation, which is theorized as taking place through the peripheral route of elaboration – as opposed to intentional exposure, which engages the central route.

Score-Based Explanations in Data Management and Machine Learning

Leopoldo Bertossi (2020). In: International Conference on Scalable Uncertainty Management. SUM 2020: Scalable Uncertainty Management pp 17–31. https://link.springer.com/chapter/10.1007/978-3-030-58449-8_2

The shapley value of tuples in query answering

Livshits, Ester ; Bertossi, Leopoldo ; Kimelfeld, Benny ; Sebag, Moshe (2020). In: Leibniz International Proceedings in Informatics, LIPIcs. 23rd International Conference on Database Theory (ICDT 2020). https://drops.dagstuhl.de/opus/volltexte/2020/11944/

An ASP-Based Approach to Counterfactual Explanations for Classification

Leopoldo Bertossi (2020). In: International Joint Conference on Rules and Reasoning. RuleML+RR 2020: Rules and Reasoning pp 70–81. https://link.springer.com/chapter/10.1007/978-3-030-57977-7_5

We propose answer-set programs that specify and compute counterfactual interventions as a basis for causality-based explanations to decisions produced by classification models. They can be applied with black-box models and models that can be specified as logic programs, such as rule-based classifiers. The main focus is on the specification and computation of maximum responsibility causal explanations. The use of additional semantic knowledge is investigated.

Specifying and computing causes for query answers in databases via database repairs and repair-programs

Leopoldo Bertossi (2020). In: Knowledge and Information Systems volume 63, pages 199–231 (2021). https://link.springer.com/article/10.1007/s10115-020-01516-6

Stable Model Semantics for Recursive SHACL

Medina Andresel, Julien Corman, Magdalena Ortiz, Ognjen Savković, Juan Reutter, Mantas Šimkus (2020). In: WWW '20: Proceedings of The Web Conference 2020. April 2020. Pages 1570–1580. https://doi.org/10.1145/3366423.3380229

Current Challenges in Graph Databases (Invited Talk)

Juan Reutter (2020). In: 23rd International Conference on Database Theory (ICDT 2020). https://drops.dagstuhl.de/opus/volltexte/2020/11927/

Contextual Linear Types for Differential Privacy

Matías Toro, David Darais, Chike Abuah, Joe Near, Damián Árquez, Federico Olmedo, Éric Tanter (2020). In: arxiv:2010.11342. https://arxiv.org/abs/2010.11342

Querying APIs with SPARQL

Matthieu Mosser, Fernando Pieressa, Juan L. Reutter, Adrián Soto, Domagoj Vrgoč (2020). In: Information Systems Volume 105, March 2022, 101650. https://www.sciencedirect.com/science/article/abs/pii/S0306437920301125

Using Deep Learning to Detect Rumors in Twitter

Eliana Providel, Marcelo Mendoza (2020). In: International Conference on Human-Computer Interaction. HCII 2020: Social Computing and Social Media. Design, Ethics, User Behavior, and Social Network Analysis pp 321–334. Lecture Notes in Computer Science book series (LNISA, volume 12194). https://link.springer.com/chapter/10.1007/978-3-030-49570-1_22

Political parties, diminished subtypes, and democracy

Juan Pablo Luna; Rafael Pineiro Rodríguez; Fernando Rosenblatt; Gabriel Vommaro (2020). In: Party Politics. Vol 27, Issue 2, 2021. https://doi.org/10.1177/1354068820923723

Improving query expansion strategies with word embeddings

Alfredo Silva, Marcelo Mendoza (2020). In: DocEng '20: Proceedings of the ACM Symposium on Document Engineering 2020. September 2020. Article No.: 10. Pages 1–4. https://doi.org/10.1145/3395027.3419601

Spanish pre-trained BERT model and evaluation data

José Cañete, Gabriel Chaperon, Rodrigo Fuentes, Jorge Pérez (2020). In: Practical ML for Developing Countries Workshop @ ICLR 2020. https://pml4dc.github.io/iclr2020/program/pml4dc_10.html

An Interoperable Repository of Clinical Data

Mauricio Solar; Mauricio Araya-López; Juan Cockbaine; Victor Castaneda; Marcelo Mendoza (2020). In: 2020 Seventh International Conference on eDemocracy & eGovernment (ICEDEG). https://ieeexplore.ieee.org/document/9096707/keywords#keywords

This article presents an innovation project that aims to contribute, from an ICT perspective, to the needs of the health sector, specifically interoperability and the generation of information from distributed sources. To that end, the technological objective is to develop, at prototype level, a standardized interoperable repository and intelligent applications that can be deployed at scale, in order to contribute to the timely care of patients through information generated by processing radiological images and the reports associated with clinical records.

A Simple and Fast Bi-Objective Search Algorithm

Carlos Hernández Ulloa, William Yeoh, Jorge Baier, Han Zhang, Luis Suazo, Sven Koenig (2020). In: Proceedings of the International Conference on Automated Planning and Scheduling, 30(1), 143-151. https://ojs.aaai.org/index.php/ICAPS/article/view/6655

The 2(k) Neighborhoods for Grid Path Planning

Nicolás Rivera, Carlos Hernández, Nicolás Hormazábal, Jorge A Baier (2020). In: Journal of Artificial Intelligence Research. Vol. 67 (2020). https://doi.org/10.1613/jair.1.11383

Let’s build Bridges, not Walls: SPARQL Querying of TinkerPop Graph Databases with Sparql-Gremlin

Harsh Thakkar; Renzo Angles; Marko Rodriguez; Stephen Mallette; Jens Lehmann (2020). In: 2020 IEEE 14th International Conference on Semantic Computing (ICSC). https://ieeexplore.ieee.org/document/9031506

A MOOC-based flipped experience: Scaffolding SRL strategies improves learners’ time management and engagement

Mar Pérez‐Sanagustín, Diego Sapunar‐Opazo, Ronald Pérez‐Álvarez, Isabel Hilliger, Anis Bey, Jorge Maldonado‐Mahauad (2020). In: Computer Applications in Engineering Education. Volume 29, Issue 4. Special Issue: Distance learning, MOOCs and globalisation of engineering education. July 2021. Pages 750-768. https://doi.org/10.1002/cae.22337

Learning to combine classifiers outputs with the transformer for text classification

Margarita Bugueño, Marcelo Mendoza (2020). In: Intelligent Data Analysis. vol. 24, no. S1, pp. 15-41, 2020. https://content.iospress.com/articles/intelligent-data-analysis/ida200007

Text classification is a fairly explored task that has allowed dealing with a considerable amount of problems. However, one of its main difficulties is to conduct a learning process in data with class imbalance, i.e., datasets with only a few examples in some classes, which often represent the most interesting cases for the task. In this context, text classifiers overfit some particular classes, showing poor performance. To address this problem, we propose a scheme that combines the outputs of different classifiers, coding them in the encoder of a transformer. Feeding also a BERT encoding of each example, the encoder learns a joint representation of the text and the outputs of the classifiers. These encodings are used to train a new text classifier. Since the transformer is a highly complex model, we introduce a data augmentation technique, which allows the representation learning task to be driven without over-fitting the encoding to a particular class. The data augmentation technique also allows for producing a balanced dataset. The combination of both methods, representation learning, and data augmentation, allows improving the performance of trained classifiers. Results in benchmark data for two text classification tasks (stance classification and online harassment detection) show that the proposed scheme outperforms all of its direct competitors.

Correcting for differential recruitment in respondent-driven sampling data using ego-network information

Isabelle S. Beaudry, Krista J. Gile (2020). In: Electron. J. Statist. 14 (2) 2678 - 2713, 2020. https://doi.org/10.1214/20-EJS1718

Respondent-Driven sampling (RDS) is a sampling method devised to overcome challenges with sampling hard-to-reach human populations. The sampling starts with a limited number of individuals who are asked to recruit a small number of their contacts. Every surveyed individual is subsequently given the same opportunity to recruit additional members of the target population until a pre-established sample size is achieved. The recruitment process consequently implies that the survey respondents are responsible for deciding who enters the study. Most RDS prevalence estimators assume that participants select among their contacts completely at random. The main objective of this work is to correct the inference for departure from this assumption, such as systematic recruitment based on the characteristics of the individuals or based on the nature of relationships. To accomplish this, we introduce three forms of non-random recruitment, provide estimators for these recruitment behaviors and extend three estimators and their associated variance procedures. The proposed methodology is assessed through a simulation study capturing various sampling and network features. Finally, the proposed methods are applied to a public health setting.

The Complexity of Counting Problems over Incomplete Databases

Marcelo Arenas, Pablo Barceló, Mikaël Monet (2020). In: arXiv:2011.06330. https://doi.org/10.48550/arXiv.2011.06330

We study the complexity of various fundamental counting problems that arise in the context of incomplete databases, i.e., relational databases that can contain unknown values in the form of labeled nulls. Specifically, we assume that the domains of these unknown values are finite and, for a Boolean query q, we consider the following two problems: given as input an incomplete database D, (a) return the number of completions of D that satisfy q; or (b) return the number of valuations of the nulls of D yielding a completion that satisfies q. We obtain dichotomies between #P-hardness and polynomial-time computability for these problems when q is a self-join-free conjunctive query, and study the impact on the complexity of the following two restrictions: (1) every null occurs at most once in D (what is called Codd tables); and (2) the domain of each null is the same. Roughly speaking, we show that counting completions is much harder than counting valuations: for instance, while the latter is always in #P, we prove that the former is not in #P under some widely believed theoretical complexity assumption. Moreover, we find that both (1) and (2) can reduce the complexity of our problems. We also study the approximability of these problems and show that, while counting valuations always has a fully polynomial-time randomized approximation scheme (FPRAS), in most cases counting completions does not. Finally, we consider more expressive query languages and situate our problems with respect to known complexity classes.

Towards a Definitive Measure of Repetitiveness

Tomasz Kociumaka, Gonzalo Navarro, Nicola Prezza (2020). In: Lecture Notes in Computer Science book series (LNTCS, volume 12118). https://link.springer.com/chapter/10.1007/978-3-030-61792-9_17

Counting Problems over Incomplete Databases

Marcelo Arenas, Pablo Barceló, Mikaël Monet (2020). In: PODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. June 2020. Pages 165–177. https://doi.org/10.1145/3375395.3387656

We study the complexity of various fundamental counting problems that arise in the context of incomplete databases, i.e., relational databases that can contain unknown values in the form of labeled nulls. Specifically, we assume that the domains of these unknown values are finite and, for a Boolean query q, we consider the following two problems: given as input an incomplete database D, (a) return the number of completions of D that satisfy q; or (b) return the number of valuations of the nulls of D yielding a completion that satisfies q. We obtain dichotomies between #P-hardness and polynomial-time computability for these problems when q is a self-join-free conjunctive query, and study the impact on the complexity of the following two restrictions: (1) every null occurs at most once in D (what is called Codd tables); and (2) the domain of each null is the same. Roughly speaking, we show that counting completions is much harder than counting valuations (for instance, while the latter is always in #P, we prove that the former is not in #P under some widely believed theoretical complexity assumption). Moreover, we find that both (1) and (2) reduce the complexity of our problems. We also study the approximability of these problems and show that, while counting valuations always has a fully polynomial randomized approximation scheme, in most cases counting completions does not. Finally, we consider more expressive query languages and situate our problems with respect to known complexity classes.

Ranked Document Selection

J. Ian Munro, Gonzalo Navarro, Rahul Shah, Sharma V. Thankachan (2020). In: Theoretical Computer Science Volume 812, 6 April 2020, Pages 149-159. https://www.sciencedirect.com/science/article/abs/pii/S0304397519306310?via%3Dihub

On Dynamic Succinct Graph Representations

Miguel E. Coimbra; Alexandre P. Francisco; Luís M. S. Russo; Guillermo De Bernardo; Susana Ladra; Gonzalo Navarro (2020). In: Data Compression Conference (DCC), 2020, pp. 213-222, doi: 10.1109/DCC47342.2020.00029.

The Tractability of SHAP-scores over Deterministic and Decomposable Boolean Circuits

Marcelo Arenas, Pablo Barceló, Leopoldo Bertossi, Mikaël Monet (2020). In: arxiv:2007.14045. https://doi.org/10.48550/arXiv.2007.14045

Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential SHAP-score, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractable problem, it has recently been claimed that the SHAP-score can be computed in polynomial time over the class of decision trees. In this paper, we provide a proof of a stronger result over Boolean models: the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits, also known as tractable Boolean circuits, generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees, Ordered Binary Decision Diagrams (OBDDs) and Free Binary Decision Diagrams (FBDDs). We also establish the computational limits of the notion of SHAP-score by observing that, under a mild condition, computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider, as removing one or the other renders the problem of computing the SHAP-score intractable (namely, #P-hard).

Descriptive Complexity for Counting Complexity Classes

Marcelo Arenas ; Martin Muñoz ; Cristian Riveros (2020). In: lmcs:4493 - Logical Methods in Computer Science, February 10, 2020, Volume 16, Issue 1 - https://doi.org/10.23638/LMCS-16(1:9)2020

Descriptive Complexity has been very successful in characterizing complexity classes of decision problems in terms of the properties definable in some logics. However, descriptive complexity for counting complexity classes, such as FP and #P, has not been systematically studied, and it is not as developed as its decision counterpart. In this paper, we propose a framework based on Weighted Logics to address this issue. Specifically, by focusing on the natural numbers we obtain a logic called Quantitative Second Order Logics (QSO), and show how some of its fragments can be used to capture fundamental counting complexity classes such as FP, #P and FPSPACE, among others. We also use QSO to define a hierarchy inside #P, identifying counting complexity classes with good closure and approximation properties, and which admit natural complete problems. Finally, we add recursion to QSO, and show how this extension naturally captures lower counting complexity classes such as #L.

From access deprivation to skill acquisition: Cluster analysis of user behavior in face of a 12-hour legal blockage of WhatsApp in Brazil.

Marcelo Santos, Andrés Rosenberg, Magdalena Saldaña (2020). In: First Monday. https://firstmonday.org/article/view/10401/8316

This study takes advantage of a forceful legal 12-hour deprivation of access to WhatsApp messaging service nationwide in Brazil on 18 December 2015. Right after the blockage, we ran a survey to capture the reaction of Brazilians that were cut off from the App, aiming to understand which factors point to the capacity to successfully circumvent the blockage. Anxiety, digital skills and gender were found to be related to success, while isolation and age were not. Furthermore, a cluster analysis of 306 respondents that attempted to bypass the blockage identified four groups that summarize the reaction patterns in face of the blockage: the deprived, the challengers, the addicted and the elite. We discuss the possible implications of the findings for the field.

Stronger and Safer Together: Motivations for and Challenges of (Trans)National Collaboration in Investigative Reporting in Latin America

Lourdes M. Cueva Chacón, Magdalena Saldaña (2020). In: Digital Journalism. Volume 9, 2021 - Issue 2: Digital Journalism in Latin America. Guest Editors: Pablo J. Boczkowski and Eugenia Mitchelstein. https://www.tandfonline.com/doi/full/10.1080/21670811.2020.1775103

Despite the growing scholarship on investigative journalism in Latin America, very few studies have addressed collaboration across newsrooms in the region. By analyzing the responses of 251 journalists who work for investigative units in Latin American news outlets, this study explores a) the reasons why Latin American journalists are increasingly seeking to participate in national and transnational collaborative enterprises, b) the challenges they identify, and c) the role digital technologies are playing in this trend of transnational collaboration. Using mixed methods, we found that collaborations occur to enhance the impact of investigative projects, to reach larger audiences, and to achieve a big-picture coverage. We also found that safety is an important motivation to work in conjunction with other newsrooms—by collaborating, journalists are able to strengthen security measures and challenge censorship. Yet, coordinating teams—especially at the transnational level—remains the biggest challenge to overcome. Digital technologies are significantly related to reporters’ likelihood of collaborating, but these technologies require other reporting skills to be useful for investigative journalism. Implications for research and practice are discussed.

I Don’t Want You to Be My President! Incivility and Media Bias During the Presidential Election in Chile

Magdalena Saldaña, Andrés Rosenberg (2020). In: Social Media + Society. October 2020. https://journals.sagepub.com/doi/full/10.1177/2056305120969891

This study observes two relevant issues in today’s media ecosystem: incivility in online news comments and media bias during election periods. By analyzing 84 stories and 4670 comments published during the 2017 presidential election in Chile, we observed the extent to which news commenters addressed political figures using uncivil discourse, and the extent to which incivility and media bias were related in comments discussing the election. Results indicate incivility in comment sections of Chilean news outlets is higher than that found in the Global North, and the levels of uncivil speech are even higher when the conversation mentions female politicians, especially former president Michelle Bachelet. We also found a relationship between media bias and user bias—stories positively biased toward current president Sebastián Piñera were associated with more positive comments about him. Implications and future research are discussed.

Data Quality and Explainable AI

Floris Geerts, Leopoldo Bertossi (2020). In: Journal of Data and Information Quality. Volume 12. Issue 2. June 2020 Article No.: 11, pp 1–9. https://doi.org/10.1145/3386687

In this work, we provide some insights and develop some ideas, with few technical details, about the role of explanations in Data Quality in the context of data-based machine learning models (ML). In this direction, there are, as expected, roles for causality, and explainable artificial intelligence. The latter area not only sheds light on the models, but also on the data that support model construction. There is also room for defining, identifying, and explaining errors in data, in particular, in ML, and also for suggesting repair actions. More generally, explanations can be used as a basis for defining dirty data in the context of ML, and measuring or quantifying them. We think of dirtiness as relative to the ML task at hand, e.g., classification.

Causality-based Explanation of Classification Outcomes

Jordan Li, Maximilian Schleich, Dan Suciu, Zografoula Vagena, Leopoldo Bertossi (2020). In: DEEM'20: Proceedings of the Fourth International Workshop on Data Management for End-to-End Machine Learning. June 2020. Article No.: 6. Pages 1–10. https://doi.org/10.1145/3399579.3399865

We propose a simple definition of an explanation for the outcome of a classifier based on concepts from causality. We compare it with previously proposed notions of explanation, and study their complexity. We conduct an experimental evaluation with two real datasets from the financial domain.

An ASP-Based Approach to Counterfactual Explanations for Classification

Leopoldo Bertossi (2020). In: International Joint Conference on Rules and Reasoning. RuleML+RR 2020: Rules and Reasoning pp 70–81. Lecture Notes in Computer Science book series. https://link.springer.com/chapter/10.1007/978-3-030-57977-7_5

JSON: Data model and query languages

Leopoldo Bertossi (2020). In: arxiv:2011.07423v3. https://doi.org/10.48550/arXiv.2011.07423

JSON: Data model and query languages

Pierre Bourhis, Domagoj Vrgoč, Juan Reutter (2020). In: Information Systems. Volume 89, March 2020, 101478. https://doi.org/10.1016/j.is.2019.101478

Despite the fact that JSON is currently one of the most popular formats for exchanging data on the Web, there are very few studies on this topic and there is no agreement upon a theoretical framework for dealing with JSON. Therefore in this paper we propose a formal data model for JSON documents and, based on the common features present in available systems using JSON, we define a lightweight query language allowing us to navigate through JSON documents, study the complexity of basic computational tasks associated with this language, and compare its expressive power with practical languages for managing JSON data.

Recursive SPARQL for Graph Analytics

Aidan Hogan, Juan Reutter, Adrian Soto (2020). In: arxiv:2004.01816. https://doi.org/10.48550/arXiv.2004.01816

Predecessor Search

Gonzalo Navarro, Javiel Rojas-Ledesma (2020). In: ACM Computing Surveys. Volume 53. Issue 5. September 2021. Article No.: 105, pp 1–35. https://dl.acm.org/doi/10.1145/3409371

Crisis de la representación política en América Latina y los ciclos pendulares de coaliciones electorales oligárquicas y antisistema

Juan Pablo Luna (2020). In: Inclusión y cohesión social en el marco de la Agenda 2030 para el Desarrollo Sostenible: claves para un desarrollo social inclusivo en América Latina. Santiago: CEPAL, 2020. LC/TS.2020/59. p. 103-106. https://repositorio.cepal.org/handle/11362/46136

Extending General Compact Querieable Representations to GIS Applications

Nieves R. Brisaboa, Ana Cerdeira-Pena, Guillermo de Bernardo, Gonzalo Navarro, Óscar Pedreira (2020). In: Information Sciences Volume 506, January 2020, Pages 196-216. https://www.sciencedirect.com/science/article/abs/pii/S0020025519307418?via%3Dihub

Do Conditionalities Increase Support for Government Transfers?

Cesar Zucco, O. Gokce Baykal, Juan Pablo Luna (2020). In: Journal of Development Studies. Volume 56, 2020 - Issue 3. https://doi.org/10.1080/00220388.2019.1577388

Hate speech detection is not as easy as you may think: A closer look at model validation (extended version)

Aymé Arango, Bárbara Poblete, Jorge Pérez (2020). In: Information Systems. Volume 105, March 2022, 101584. https://doi.org/10.1016/j.is.2020.101584

Uruguay 2019: Party system restructuring and the end of the progressive cycle

Lihuen Nocetto, Rafael Piñeiro, Fernando Rosenblatt (2020). In: Revista de Ciencia Política. Volumen 40. N° 2. 2020, pp. 511-538. https://scielo.conicyt.cl/pdf/revcipol/v40n2/0718-090X-revcipol-S0718-090X2020005000117.pdf

Political parties, diminished subtypes, and democracy

Juan Pablo Luna, Rafael Piñeiro Rodríguez, Fernando Rosenblatt, Gabriel Vommaro (2020). In: Party Politics. 2021;27(2):294-307. https://journals.sagepub.com/doi/10.1177/1354068820923723

Solving Sum-of-Costs Multi-Agent Pathfinding with Answer-Set Programming

Rodrigo Gómez, Carlos Hernández, Jorge Baier (2020). In: Proceedings of the AAAI Conference on Artificial Intelligence. https://doi.org/10.1609/aaai.v34i06.6540

Multipath Adaptive A*: Factors That Influence Performance in Goal-Directed Navigation in Unknown Terrain

Carlos Hernández Ulloa, Roberto Asín-Achá, Jorge Baier (2020). In: IEEE Access (Volume: 8). https://doi.org/10.1109/ACCESS.2020.3003344

Hybrid Hashtags: #YouKnowYoureAKiwiWhen Your Tweet Contains Māori and English

David Trye, Andreea S. Calude, Felipe Bravo-Marquez, Te Taka Keegan (2020). In: Frontiers in Artificial Intelligence. Volume 2. https://www.frontiersin.org/articles/10.3389/frai.2020.00015/full

Computing coverage kernels under restricted settings

Jérémy Barbay, Pablo Pérez-Lantero, Javiel Rojas-Ledesma (2020). In: Theoretical Computer Science. Volume 815, 2 May 2020, Pages 270-288. https://doi.org/10.1016/j.tcs.2020.01.021

Tree path majority data structures

Travis Gagie, Meng He, Carlos Ochoa, Gonzalo Navarro (2020). In: Theoretical Computer Science. Volume 833, 12 September 2020, Pages 107-119. https://doi.org/10.1016/j.tcs.2020.05.039

Semantrix: A Compressed Semantic Matrix

Nieves Rodríguez Brisaboa; Antonio Fariña; Gonzalo Navarro; Tirso Varela Rodeiro (2020). In: 2020 Data Compression Conference (DCC). https://ieeexplore.ieee.org/document/9105851

Optimal Joins Using Compact Data Structures

Javiel Rojas-Ledesma, Juan Reutter, Gonzalo Navarro (2020). In: 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs). https://drops.dagstuhl.de/opus/volltexte/2020/11945/

Approximating Optimal Bidirectional Macro Schemes

Luís M. S. Russo; Ana D. Correia; Gonzalo Navarro; Alexandre P. Francisco (2020). In: 2020 Data Compression Conference (DCC). https://doi.org/10.1109/DCC47342.2020.00023

Fast and Compact Planar Embeddings

Leo Ferres, José Fuentes-Sepúlveda, Travis Gagie, Meng He, Gonzalo Navarro (2020). In: Computational Geometry Volume 89, August 2020, 101630. https://doi.org/10.1016/j.comgeo.2020.101630

Compressed Dynamic Range Majority and Minority Data Structures

Travis Gagie, Meng He, Gonzalo Navarro (2020). In: Algorithmica volume 82, pages 2063–2086 (2020). https://link.springer.com/article/10.1007/s00453-020-00687-6

Stability and incorporation: Toward a new concept of party system institutionalization

Rafael Piñeiro, Fernando Rosenblatt (2020). In: Party Politics. Volume: 26 issue: 2, page(s): 249-260. https://journals.sagepub.com/doi/10.1177/1354068818777895

WEFE: The Word Embeddings Fairness Evaluation Framework.

Pablo Badilla, Felipe Bravo-Marquez, Jorge Pérez (2020). In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20). https://www.ijcai.org/Proceedings/2020/0060.pdf

DCC-Uchile at SemEval-2020 Task 1: Temporal Referencing Word Embeddings

Frank D. Zamora-Reina, Felipe Bravo-Marquez (2020). In: SemEval-2020: International Workshop on Semantic Evaluation. https://felipebravom.com/publications/semeval2020.pdf

A mechanized formalization of GraphQL

Tomás Díaz, Federico Olmedo, Éric Tanter (2020). In: CPP 2020: Proceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs. January 2020. Pages 201–214. https://doi.org/10.1145/3372885.3373822

Gradualizing the Calculus of Inductive Constructions

Meven Lennon-Bertrand, Kenji Maillard, Nicolas Tabareau, Éric Tanter (2020). In: ACM Transactions on Programming Languages and Systems (TOPLAS), 44(2), 1-82. Arxiv. https://arxiv.org/abs/2011.10618

We investigate gradual variations on the Calculus of Inductive Constructions (CIC) for swifter prototyping with imprecise types and terms. We observe, with a no-go theorem, a crucial tradeoff between graduality and the key properties of normalization and closure of universes under dependent product that CIC enjoys. Beyond this Fire Triangle of Graduality, we explore the gradualization of CIC with three different compromises, each relaxing one edge of the Fire Triangle. We develop a parametrized presentation of Gradual CIC (GCIC) that encompasses all three variations, and develop their metatheory. We first present a bidirectional elaboration of GCIC to a dependently-typed cast calculus, CastCIC, which elucidates the interrelation between typing, conversion, and the gradual guarantees. We use a syntactic model of CastCIC to inform the design of a safe, confluent reduction, and establish, when applicable, normalization. We study the static and dynamic gradual guarantees as well as the stronger notion of graduality with embedding-projection pairs formulated by New and Ahmed, using appropriate semantic model constructions. This work informs and paves the way towards the development of malleable proof assistants and dependently-typed programming languages.

Trace-Relating Compiler Correctness and Secure Compilation

Carmine Abate, Roberto Blanco, Ștefan Ciobâcă, Adrien Durier, Deepak Garg, Cătălin Hrițcu, Marco Patrignani, Éric Tanter, Jérémy Thibault (2020). In: European Symposium on Programming ESOP 2020: Programming Languages and Systems pp 1–28. Lecture Notes in Computer Science book series. https://link.springer.com/chapter/10.1007/978-3-030-44914-8_1

Gradual verification of recursive heap data structures

Jenna Wise, Johannes Bader, Cameron Wong, Jonathan Aldrich, Joshua Sunshine, Éric Tanter (2020). In: Proceedings of the ACM on Programming Languages. Volume 4. Issue OOPSLA. November 2020. Article No.: 228, pp 1–28. https://doi.org/10.1145/3428296

cBiK: A Space-Efficient Data Structure for Spatial Keyword Queries

Carlos E. Sanjuan-Contreras; Gilberto Gutiérrez Retamal; Miguel A. Martínez-Prieto, Diego Seco (2020). In: IEEE Access ( Volume: 8). https://ieeexplore.ieee.org/document/9099243

The Little Prover

Eric Tanter (2020). In: Journal of Functional Programming, 30, E6. https://www.cambridge.org/core/journals/journal-of-functional-programming/article/review-of-the-little-prover-by-daniel-p-friedman-and-carl-eastlund-mit-press-2015/F85BF92A2EBE5A0B46063D3E6EA9D457

Analyzing the effect of the topology on succinct tree encodings

José Fuentes, Alexander Irribarra, Diego Seco (2020). In: SPIRE 2020: 27th International Symposium on String Processing and Information Retrieval. https://www.cs.ucf.edu/spire2020/wp-content/uploads/2020/10/11-wctaTopology.pdf

Compressing and randomly accessing sequences (note)

Laith Ali Abdusahib; Diego Arroyuelo; Rajeev Raman (2020). In: 2020 Data Compression Conference (DCC). https://ieeexplore.ieee.org/document/9105832/authors#authors

New initialization for algorithms to solve Median String Problem

Pedro Mirabal; José Abreu; Diego Seco; Oscar Pedreira; Edgar Chávez (2020). In: 2020 39th International Conference of the Chilean Computer Science Society (SCCC). https://ieeexplore.ieee.org/document/9281215

Neural language models for text classification in evidence-based medicine

Andres Carvallo, Denis Parra, Gabriel Rada, Daniel Perez, Juan Ignacio Vasquez, Camilo Vergara (2020). In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada. https://arxiv.org/abs/2012.00584

Scalable recommendation of wikipedia articles to editors using representation learning

Oleksii Moskalenko, Denis Parra, Diego Saez-Trumper (2020). In: CEUR Workshop Proceedings. http://ceur-ws.org/Vol-2697/paper1_complexrec.pdf

To index or not to index: Time-space trade-offs for positional ranking functions in search engines

Senen González, Mauricio Marín, Mauricio Oyarzún, Torsten Suel, Luis Valenzuela (2020). In: Information Systems Volume 89, March 2020, 101466. Science Direct. https://www.sciencedirect.com/science/article/abs/pii/S0306437919305186

Three Success Stories About Compact Data Structures

Diego Arroyuelo, José Fuentes-Sepúlveda, Diego Seco (2020). In: Communications of the ACM. Volume 63. Issue 11. November 2020, pp 64–65. https://dl.acm.org/doi/10.1145/3416971

Article "Three Success Stories About Compact Data Structures", published in Communications of the ACM, Latin America Region Special Section: Hot Topics.

CuratorNet: Visually-aware recommendation of art images

Pablo Messina, Manuel Cartagena, Patricio Cerda-Mardini, Felipe del Rio, Denis Parra (2020). In: CEUR Workshop Proceedings. http://ceur-ws.org/Vol-2697/paper2_complexrec.pdf

A Survey on Deep Learning and Explainability for Automatic Image-based Medical Report Generation

Pablo Messina, Pablo Pino, Denis Parra, Alvaro Soto, Cecilia Besa, Sergio Uribe, Marcelo Andía, Cristian Tejos, Claudia Prieto, Daniel Capurro (2020). In: arXiv preprint arXiv:2010.10563. https://arxiv.org/abs/2010.10563

Every year physicians face an increasing demand of image-based diagnosis from patients, a problem that can be addressed with recent artificial intelligence methods. In this context, we survey works in the area of automatic report generation from medical images, with emphasis on methods using deep neural networks, with respect to: (1) Datasets, (2) Architecture Design, (3) Explainability and (4) Evaluation Metrics. Our survey identifies interesting developments, but also remaining challenges. Among them, the current evaluation of generated reports is especially weak, since it mostly relies on traditional Natural Language Processing (NLP) metrics, which do not accurately capture medical correctness.

On Adversarial Examples for Biomedical NLP Tasks

Vladimir Araujo, Andres Carvallo, Carlos Aspillaga, Denis Parra (2020). In: arxiv:2004.11157. https://arxiv.org/abs/2004.11157

Interpretable Contextual Team-aware Item Recommendation: Application in Multiplayer Online Battle Arena Games

Andrés Villa, Vladimir Araujo, Francisca Cattan, Denis Parra (2020). In: Fourteenth ACM Conference on Recommender Systems. September 2020. Pages 503–508. https://dl.acm.org/doi/10.1145/3383313.3412211

The video game industry has adopted recommendation systems to boost users' interest with a focus on game sales. Other exciting applications within video games are those that help the player make decisions that would maximize their playing experience, which is a desirable feature in real-time strategy video games such as Multiplayer Online Battle Arena (MOBA) games like DotA and LoL. Among these tasks, the recommendation of items is challenging, given both the contextual nature of the game and how it exposes the dependence on the formation of each team. Existing works on this topic do not take advantage of all the available contextual match data and dismiss potentially valuable information. To address this problem we develop TTIR, a contextual recommender model derived from the Transformer neural architecture that suggests a set of items to every team member, based on the contexts of teams and roles that describe the match. TTIR outperforms several approaches and provides interpretable recommendations through visualization of attention weights. Our evaluation indicates that both the Transformer architecture and the contextual information are essential to get the best results for this item recommendation task. Furthermore, a preliminary user survey indicates the usefulness of attention weights for explaining recommendations as well as ideas for future work. The code and dataset are available at https://github.com/ojedaf/IC-TIR-Lol.

Algorithmic and HCI Aspects for Explaining Recommendations of Artistic Images

Vicente Dominguez, Ivania Donoso-Guzmán, Pablo Messina, Denis Parra (2020). In: ACM Transactions on Interactive Intelligent Systems. Volume 10. Issue 4. December 2020. Article No.: 30, pp 1–31. https://doi.org/10.1145/3369396

Inspecting state of the art performance and NLP metrics in image-based medical report generation

Pablo Pino, Denis Parra, Pablo Messina, Cecilia Besa, Sergio Uribe (2020). In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.

Adversarial Evaluation of BERT for Biomedical Named Entity Recognition

Vladimir Araujo, Andrés Carvallo, Denis Parra (2020). In: Proceedings of the The Fourth Widening Natural Language Processing Workshop. http://www.winlp.org/wp-content/uploads/2020/final_papers/34_Paper.pdf

Automatic document screening of medical literature using word and text embeddings in an active learning setting

Andrés Carvallo, Denis Parra, Hans Lobel, Álvaro Soto (2020). In: Scientometrics volume 125, pages 3047–3084 (2020). https://link.springer.com/article/10.1007/s11192-020-03648-6

Social QA in non-CQA platforms

José Herrera, Denis Parra, Barbara Poblete (2020). In: Elsevier Future Generation Computer Systems. Volume 105, April 2020, Pages 631-649. Science Direct. https://www.sciencedirect.com/science/article/abs/pii/S0167739X19308180?via%3Dihub

Probabilistic automata of bounded ambiguity

Nathanaël Fijalkow, Cristián Riveros, James Worrell (2020). In: Information and Computation. Volume 282, January 2022, 104648. https://doi.org/10.1016/j.ic.2020.104648

Towards Streaming Evaluation of Queries with Correlation in Complex Event Processing

Grez, Alejandro; Riveros, Cristian (2020). In: 23rd International Conference on Database Theory (ICDT 2020). https://drops.dagstuhl.de/opus/volltexte/2020/11938/

A Family of Centrality Measures for Graph Data Based on Subgraphs

Cristian Riveros, Jorge Salas (2020). In: 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs). https://drops.dagstuhl.de/opus/volltexte/2020/11947/

Ranked enumeration of MSO logic on words

Pierre Bourhis, Alejandro Grez, Louis Jachiet, Cristian Riveros (2020). In: arxiv:2010.08042. https://arxiv.org/abs/2010.08042

Constant-delay enumeration algorithms for document spanners over nested documents

Martín Muñoz, Cristian Riveros (2020). In: arxiv:2010.06037. https://arxiv.org/abs/2010.06037

Efficient Enumeration Algorithms for Regular Document Spanners

Fernando Florenzano, Cristian Riveros, Martín Ugarte, Stijn Vansummeren, Domagoj Vrgoč (2020). In: ACM Transactions on Database Systems. Volume 45. Issue 1. March 2020. Article No.: 3, pp 1–42. https://doi.org/10.1145/3351451

Kevin LaGrandeur, James J. Hughes (eds) (2017) Surviving the Machine Age. Intelligent Technology and the Transformation of Human Work. Cham: Palgrave Macmillan. 166 pages. ISBN: 978-3-319-84584-5

Claudio Gutiérrez (2020). In: Science & Technology Studies. Special Issue: Expertise and its tensions. Vol. 33 No. 2 (2020). https://sciencetechnologystudies.journal.fi/article/view/89243

Differential Privacy and SPARQL

Carlos Buil Aranda, Jorge Lobo, Federico Olmedo (2020). In: Semantic Web Journal. http://www.semantic-web-journal.net/content/differential-privacy-and-sparql

Efficient GPU thread mapping on embedded 2D fractals

Cristóbal A. Navarro, Felipe A. Quezada, Nancy Hitschfeld, Raimundo Vega, Benjamin Bustos (2020). In: Future Generation Computer Systems. Volume 113, December 2020, Pages 158-169. https://doi.org/10.1016/j.future.2020.07.006

Extending SPARQL with Similarity Joins

Benjamín Bustos, Sebastián Ferrada, Aidan Hogan (2020). In: International Semantic Web Conference. ISWC 2020: The Semantic Web – ISWC 2020 pp 201–217. Lecture Notes in Computer Science book series (LNISA, volume 12506). https://link.springer.com/chapter/10.1007/978-3-030-62419-4_12

An efficient algorithm for approximated self-similarity joins in metric spaces.

Sebastián Ferrada, Benjamin Bustos, Nora Reyes (2020). In: Information Systems. Volume 91, July 2020, 101510. https://doi.org/10.1016/j.is.2020.101510

A sketch-aided retrieval approach for incomplete 3D objects

Stefan Lengauer, Alexander Komar, Stephan Karl, Elisabeth Trinkl, Reinhold Preiner, Benjamin Bustos, Tobias Schreck (2020). In: Computers & Graphics. Volume 87, April 2020, Pages 111-122. https://doi.org/10.1016/j.cag.2020.02.001

3D Shape Matching for Retrieval and Recognition

Benjamin Bustos, Ivan Sipiran (2020). In: Liu, Y., Pears, N., Rosin, P.L., Huber, P. (eds) 3D Imaging, Analysis and Applications. Springer, Cham. https://doi.org/10.1007/978-3-030-44070-1_9

Mining Social Networks to Learn about Rumors, Hate Speech, Bias and Polarization – Abstract

Bárbara Poblete (2020). In: Proceedings of the Workshop on Online Misinformation- and Harm-Aware Recommender Systems co-located with 14th ACM Conference on Recommender Systems (RecSys 2020). http://ceur-ws.org/Vol-2758/OHARS-invited1.pdf

On the Expressiveness of Languages for Complex Event Recognition

Grez, Alejandro; Riveros, Cristian; Ugarte, Martín; Vansummeren, Stijn (2020). In: 23rd International Conference on Database Theory (ICDT 2020). https://drops.dagstuhl.de/opus/volltexte/2020/11939/

The monitoring problem for timed automata

Alejandro Grez, Filip Mazowiecki, Michał Pilipczuk, Gabriele Puppis, Cristian Riveros (2020). In: Arxiv, Cornell University. https://arxiv.org/abs/2002.07049

Pumping lemmas for weighted automata

Agnishom Chattopadhyay, Filip Mazowiecki, Anca Muscholl, Cristian Riveros (2020). In: arxiv, Cornell University. https://arxiv.org/abs/2001.06272

Expressive power of linear algebra query languages

Floris Geerts, Thomas Muñoz, Cristian Riveros, Domagoj Vrgoč (2020). In: Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. Arxiv, Cornell University. https://arxiv.org/abs/2010.13717

Knowledge Graphs: A Tutorial on the History of Knowledge Graph’s Main Ideas

Claudio Gutierrez, Juan F. Sequeda (2020). In: CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. October 2020. Pages 3509–3510. https://doi.org/10.1145/3340531.3412176

Knowledge Graphs can be considered as fulfilling an early vision in Computer Science of creating intelligent systems that integrate knowledge and data at large scale. Stemming from scientific advancements in research areas of Semantic Web, Databases, Knowledge representation, NLP, Machine Learning, among others, Knowledge Graphs have rapidly gained popularity in academia and industry in the past years. The integration of such disparate disciplines and techniques gives Knowledge Graphs their richness, but also presents practitioners and theoreticians with the challenge of knowing how current advances develop from early techniques in order to, on the one hand, take full advantage of them, and on the other, avoid reinventing the wheel. This tutorial will provide a historical context on the roots of Knowledge Graphs grounded in the advancements of Logic, Data and the combination thereof.

A User Interface for Exploring and Querying Knowledge Graphs (Extended Abstract)

Hernán Vargas, Carlos Buil-Aranda, Aidan Hogan, Claudia López (2020). In: Twenty-Ninth International Joint Conference on Artificial Intelligence Sister Conferences Best Papers. Pages 4785-4789. https://doi.org/10.24963/ijcai.2020/666

As the adoption of knowledge graphs grows, more and more non-expert users need to be able to explore and query such graphs. These users are not typically familiar with graph query languages such as SPARQL, and may not be familiar with the knowledge graph’s structure. In this extended abstract, we provide a summary of our work on a language and visual interface, called RDF Explorer, that helps non-expert users to navigate and query knowledge graphs. A usability study over Wikidata shows that users successfully complete more tasks with RDF Explorer than with the existing Wikidata Query Helper interface.

Postadmixture Selection on Chileans Targets Haplotype Involved in Pigmentation, Thermogenesis and Immune Defense against Pathogens

Lucas Vicuña, Olga Klimenkova, Tomás Norambuena, Felipe I Martinez, Mario I Fernandez, Vladimir Shchur, Susana Eyheramendy (2020). In: Genome Biology and Evolution, Volume 12, Issue 8, August 2020, Pages 1459–1470. https://doi.org/10.1093/gbe/evaa136

Detection of positive selection signatures in populations around the world is helping to uncover recent human evolutionary history as well as the genetic basis of diseases. Most human evolutionary genomic studies have been performed in European, African, and Asian populations. However, populations with Native American ancestry have been largely underrepresented. Here, we used a genome-wide local ancestry enrichment approach complemented with neutral simulations to identify postadmixture adaptations undergone by admixed Chileans through gene flow from Europeans into local Native Americans. The top significant hits (P = 2.4×10⁻⁷) are variants in a region on chromosome 12 comprising multiple regulatory elements. This region includes rs12821256, which regulates the expression of KITLG, a well-known gene involved in lighter hair and skin pigmentation in Europeans as well as in thermogenesis. Another variant from that region is associated with the long noncoding RNA RP11-13A1.1, which has been specifically involved in the innate immune response against infectious pathogens. Our results suggest that these genes were relevant for adaptation in Chileans following the Columbian exchange.

Alert Classification for the ALeRCE Broker System: The Light Curve Classifier

P. Sánchez-Sáez, I. Reyes, C. Valenzuela, F. Förster, S. Eyheramendy, F. Elorrieta, F. E. Bauer, G. Cabrera-Vives, P. A. Estévez, M. Catelan, G. Pignata, P. Huijse, D. De Cicco, P. Arévalo, R. Carrasco-Davis, J. Abril, R. Kurtev, J. Borissova, J. Arredondo, E. Castillo-Navarrete, D. Rodriguez, D. Ruz-Mieres, A. Moya, L. Sabatini-Gacitúa, C. Sepúlveda-Cobo, E. Camacho-Iñiguez (2020). In: arxiv:2008.03311. https://arxiv.org/abs/2008.03311

We present the first version of the ALeRCE (Automatic Learning for the Rapid Classification of Events) broker light curve classifier. ALeRCE is currently processing the Zwicky Transient Facility (ZTF) alert stream, in preparation for the Vera C. Rubin Observatory. The ALeRCE light curve classifier uses variability features computed from the ZTF alert stream, and colors obtained from AllWISE and ZTF photometry. We apply a Balanced Random Forest algorithm with a two-level scheme, where the top level classifies each source as periodic, stochastic, or transient, and the bottom level further resolves each of these hierarchical classes, amongst 15 total classes. This classifier corresponds to the first attempt to classify multiple classes of stochastic variables (including core- and host-dominated active galactic nuclei, blazars, young stellar objects, and cataclysmic variables) in addition to different classes of periodic and transient sources, using real data. We created a labeled set using various public catalogs (such as the Catalina Surveys and Gaia DR2 variable stars catalogs, and the Million Quasars catalog), and we classify all objects with ≥6 g-band or ≥6 r-band detections in ZTF (868,371 sources as of 2020/06/09), providing updated classifications for sources with new alerts every day. For the top level we obtain macro-averaged precision and recall scores of 0.96 and 0.99, respectively, and for the bottom level we obtain macro-averaged precision and recall scores of 0.57 and 0.76, respectively.
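
For readers unfamiliar with the macro-averaged scores quoted in the abstract above: precision and recall are computed per class and then averaged with equal weight for every class, regardless of class size. The following is a minimal sketch in Python; the function name and the toy labels are illustrative assumptions and are not taken from the paper or from the ALeRCE code base.

```python
from collections import Counter


def macro_precision_recall(y_true, y_pred):
    """Return (macro precision, macro recall) over the classes found in y_true."""
    classes = sorted(set(y_true))
    true_count = Counter(y_true)   # how many samples really belong to each class
    pred_count = Counter(y_pred)   # how many samples were predicted as each class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives

    # A class that is never predicted contributes precision 0 (a common convention).
    precisions = [hits[c] / pred_count[c] if pred_count[c] else 0.0 for c in classes]
    recalls = [hits[c] / true_count[c] for c in classes]
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


# Hypothetical top-level labels, for illustration only.
truth = ["periodic", "periodic", "stochastic", "transient"]
guess = ["periodic", "stochastic", "stochastic", "transient"]
precision, recall = macro_precision_recall(truth, guess)
```

Because every class counts equally, a rare class that is classified poorly lowers the macro average as much as a common one would.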

SHREC 2020: Retrieval of digital surfaces with similar geometric reliefs

Elia Moscoso Thompson, Silvia Biasotti, Andrea Giachetti, Claudio Tortorici, Naoufel Werghi, Ahmad Shaker Obeid, Stefano Berretti, Hoang-Phuc Nguyen-Dinh, Minh-Quan Le, Hai-Dang Nguyen, Minh-Triet Tran, Leonardo Gigli, Santiago Velasco Forero, Beatriz Marcotegui, Ivan Sipiran, Benjamin Bustos, Ioannis Romanelis, Vlassis Fotis, Ramamoorthy Luxman (2020). In: Computers & Graphics, Volume 91, October 2020, Pages 199-218. https://www.sciencedirect.com/science/article/abs/pii/S0097849320301138?via%3Dihub

This paper presents the methods that have participated in the SHREC’20 contest on retrieval of surface patches with similar geometric reliefs and the analysis of their performance over the benchmark created for this challenge. The goal of the contest is to verify the possibility of retrieving 3D models only based on the reliefs that are present on their surface and to compare methods that are suitable for this task. This problem is related to many real world applications, such as the classification of cultural heritage goods or the analysis of different materials. To address this challenge, it is necessary to characterize the local “geometric pattern” information, possibly forgetting model size and bending. Seven groups participated in this contest and twenty runs were submitted for evaluation. The performances of the methods reveal that good results are achieved with a number of techniques that use different approaches.

Attentive Visual Semantic Specialized Network for Video Captioning

Jesus Perez-Martin, Benjamin Bustos, Jorge Pérez (2020). In: Millennium Institute for Foundational Research on Data (IMFD), Chile; Department of Computer Science (DCC), University of Chile. https://users.dcc.uchile.cl/~jeperez/media/2020/icpr_2020_AVSSN.pdf

Semantic Search of Memes on Twitter

Jesus Perez-Martin, Benjamin Bustos, Magdalena Saldana (2020). In: Computational Methods Interest Group of the 70th International Communication Association Conference, May 2020 Virtual conference presentation. https://arxiv.org/abs/2002.01462

Memes are becoming a useful source of data for analyzing behavior on social media. However, a problem to tackle is how to correctly identify a meme. As the number of memes published every day on social media is huge, there is a need for automatic methods for classifying and searching in large meme datasets. This paper proposes and compares several methods for automatically classifying images as memes. Also, we propose a method that allows us to implement a system for retrieving memes from a dataset using a textual query. We experimentally evaluate the methods using a large dataset of memes collected from Twitter users in Chile, which was annotated by a group of experts. Though some of the evaluated methods are effective, there is still room for improvement.

A Survey on Frameworks Used for Robustness Analysis on Interdependent Networks

Ivana Bachmann, Javier Bustos-Jiménez, Benjamin Bustos (2020). In: Complexity, vol. 2020, Article ID 2363514, 17 pages, 2020. https://doi.org/10.1155/2020/2363514

The analysis of network robustness tackles the problem of studying how a complex network behaves under adverse scenarios, such as failures or attacks. In particular, the analysis of interdependent networks’ robustness focuses on the specific case of the robustness of interacting networks and their emerging behaviors. This survey systematically reviews the literature on frameworks that analyze the robustness of interdependent networks published between 2005 and 2017. This review shows that there exists a broad range of interdependent network models, robustness metrics, and studies that can be used to understand the behaviour of different systems under failure or attack. Regarding models, we found that there is a focus on systems where a node in one layer interacts with exactly one node at another layer. In studies, we observed a focus on network percolation, while among the metrics we observed a focus on measures that count network elements. Finally, for the networks used to test the frameworks, we found that the focus was on synthetic models, rather than analysis of real network systems. This review suggests opportunities in network research, such as the study of robustness on interdependent networks with multiple interactions and/or spatially embedded networks, and the use of interdependent network models in realistic network scenarios.

A Multi-resolution Approximation for Time Series

Heider Sanchez, Benjamin Bustos (2020). In: Neural Processing Letters. Springer. https://link.springer.com/article/10.1007/s11063-018-9929-y

IMFD IMPRESEE at TRECVID 2020: Description Generation by Visual-Syntactic Embedding

Jesus Perez-Martin, Benjamin Bustos, Jorge Pérez, Juan Manuel Barrios (2020). In: Millennium Institute for Foundational Research on Data. Impresee Inc., CA, USA. https://www-nlpir.nist.gov/projects/tvpubs/tv20.papers/imfd_impresee.pdf

2019

Socialized for News Media Use: How Family Communication, Information-Processing Needs, and Gratifications Determine Adolescents’ Exposure to News

Valenzuela S, Bachmann I, Aguilar M. Socialized for News Media Use: How Family Communication, Information-Processing Needs, and Gratifications Determine Adolescents’ Exposure to News. Communication Research. 2019;46(8):1095-1118. doi:10.1177/0093650215623833

Adolescence is a key period in the development of individuals’ news habits, but little is known about the processes involved in news media socialization. This study proposes an integrated model in which the influence of family communication on motivations and behaviors of adolescents in relation to news consumption occurs through the development of personality traits related to information processing (namely, need for cognition and need to evaluate). Structural equation modeling of data from a representative survey of 2,273 adolescents, aged 13 to 17, provides support for the theorized model, such that concept-oriented communication within families is associated with news exposure indirectly, via personality traits and motivations. Thus, the study provides an initial assessment of one way children are socialized to become news enthusiasts and news avoiders. It also provides empirical evidence that information-processing traits are influenced by family communication patterns, confirming what hitherto was theoretical speculation.

https://doi.org/10.1177/0093650215623833

Improved Compressed String Dictionaries

Nieves R. Brisaboa, Ana Cerdeira-Pena, Guillermo de Bernardo, and Gonzalo Navarro. 2019. Improved Compressed String Dictionaries. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM '19). Association for Computing Machinery, New York, NY, USA, 29–38. DOI: https://doi.org/10.1145/3357384.3357972

We introduce a new family of compressed data structures to efficiently store and query large string dictionaries in main memory. Our main technique is a combination of hierarchical Front-coding with ideas from longest-common-prefix computation in suffix arrays. Our data structures yield relevant space-time tradeoffs in real-world dictionaries. We focus on two domains where string dictionaries are extensively used and efficient compression is required: URL collections, a key element in Web graphs and applications such as Web mining; and collections of URIs and literals, the basic components of RDF datasets. Our experiments show that our data structures achieve better compression than the state-of-the-art alternatives while providing very competitive query times.

LINK: https://doi.org/10.1145/3357384.3357972
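
As background for the abstract above, the classic (non-hierarchical) form of Front-coding can be sketched in a few lines: the dictionary is sorted, split into buckets, and every string after the first in a bucket is stored as the length of the prefix it shares with its predecessor plus the remaining suffix. The Python sketch below shows only this plain bucketed scheme; it is not the hierarchical variant or the data structures proposed in the paper, and the function names and `bucket_size` parameter are illustrative assumptions.

```python
import bisect


def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def front_encode(sorted_strings, bucket_size=4):
    """Encode a sorted list as buckets of (header, [(lcp, suffix), ...])."""
    buckets = []
    for start in range(0, len(sorted_strings), bucket_size):
        chunk = sorted_strings[start:start + bucket_size]
        header, entries, prev = chunk[0], [], chunk[0]
        for s in chunk[1:]:
            k = common_prefix_len(prev, s)
            entries.append((k, s[k:]))  # keep only what differs from the predecessor
            prev = s
        buckets.append((header, entries))
    return buckets


def extract(buckets, i, bucket_size=4):
    """Return the i-th string (0-based) by decoding inside its bucket."""
    header, entries = buckets[i // bucket_size]
    s = header
    for k, suffix in entries[: i % bucket_size]:
        s = s[:k] + suffix
    return s


def locate(buckets, target, bucket_size=4):
    """Return the 0-based position of target, or -1 if it is absent."""
    headers = [h for h, _ in buckets]
    b = bisect.bisect_right(headers, target) - 1  # last bucket whose header <= target
    if b < 0:
        return -1
    s = headers[b]
    if s == target:
        return b * bucket_size
    for j, (k, suffix) in enumerate(buckets[b][1], start=1):
        s = s[:k] + suffix
        if s == target:
            return b * bucket_size + j
    return -1
```

Larger buckets save more space but make `extract` and `locate` decode more entries, which is the kind of space-time tradeoff the abstract refers to.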

Motif-driven Retrieval of Greek Painted Pottery

Lengauer, S., Komar, A., Labrada, A., Karl, S., Trinkl, E., Preiner, R., ... Schreck, T. (2019). Motif-driven Retrieval of Greek Painted Pottery. In S. Rizvic, & K. Rodriguez Echavarria (Eds.), Eurographics Workshop on Graphics and Cultural Heritage (pp. 91-98). Eurographics - European Association for Computer Graphics. https://doi.org/10.2312/gch.20191354

The analysis of painted pottery is instrumental for understanding ancient Greek society and human behavior of past cultures in Archaeology. A key part of this analysis is the discovery of cross references to establish links and correspondences. However, due to the vast amount of documented images and 3D scans of pottery objects in today’s domain repositories, manual search is very time consuming. Computer aided retrieval methods are of increasing importance. Current retrieval systems for this kind of cultural heritage data mostly only allow searching for pottery with a similar vessel shape. However, in many cases important similarity cues are given by motifs painted on these vessels. We present an interactive retrieval system that makes use of this information to allow for a motif-driven search in cultural heritage repositories. We address the problem of unsupervised motif extraction for preprocessing and the shape-based similarity search for Greek painted pottery. Our experimental evaluation on relevant repository data demonstrates the effectiveness of our approach on examples of different motifs of interest.

https://doi.org/10.2312/gch.20191354

Covering a Set of Points with k Bounding Boxes

C. S. Sepúlveda, A. Rodríguez and D. Seco, "Covering a Set of Points with k Bounding Boxes," 2019 38th International Conference of the Chilean Computer Science Society (SCCC), Concepcion, Chile, 2019, pp. 1-6, DOI: https://doi.org/10.1109/SCCC49216.2019.8966419

Covering a set of points with k orthogonal bounding boxes is useful for implementing spatio-temporal index structures that are built from a given dataset. In this work, we deal with the problem of covering a set of points with k axis-parallel boxes, under the restriction that the total area enclosed by the boxes must be minimized. To achieve this, we present a novel algorithm that, using dynamic programming techniques, finds the optimal solution for covering a set of points with k bounding boxes where the total sum of the areas of the boxes is minimum. This is compared with the process of generating k bounding boxes every l units of distance, achieving an improvement of about 50% in the useless area covered.

DOI https://doi.org/10.1109/SCCC49216.2019.8966419
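
The abstract above does not spell out its recurrence, so the following Python sketch is only an illustration of how dynamic programming can be applied to this kind of problem. It makes a strong simplifying assumption that is ours, not the paper's: the points arrive in a fixed order (for example, by timestamp along a trajectory) and each box must cover a contiguous run of that order. The function name `min_total_area` is hypothetical.

```python
def min_total_area(points, k):
    """Minimum total bounding-box area when splitting ordered points into k runs.

    Assumption (not from the paper): each box covers consecutive points.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    # area[i][j] = area of the bounding box of points[i..j], inclusive.
    area = [[0.0] * n for _ in range(n)]
    for i in range(n):
        min_x = max_x = points[i][0]
        min_y = max_y = points[i][1]
        for j in range(i, n):
            x, y = points[j]
            min_x, max_x = min(min_x, x), max(max_x, x)
            min_y, max_y = min(min_y, y), max(max_y, y)
            area[i][j] = (max_x - min_x) * (max_y - min_y)

    inf = float("inf")
    # best[m][j] = minimum total area to cover the first j points with m boxes.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            # The last box covers points[i..j-1]; the first i points use m-1 boxes.
            best[m][j] = min(best[m - 1][i] + area[i][j - 1] for i in range(m - 1, j))
    return best[k][n]
```

Under this assumption the table has k·n cells and each cell scans at most n split positions, so the sketch runs in O(k·n²) time after the O(n²) area precomputation.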

Compressed Data Structures for Astronomical Content-Aware Resource Search

M. Araya, D. Arroyuelo, C. Saldías, and M. Solar, "Compressed Data Structures for Astronomical Content-Aware Resource Search," 2019 38th International Conference of the Chilean Computer Science Society (SCCC), Concepcion, Chile, 2019, pp. 1-8, doi: https://doi.org/10.1109/SCCC49216.2019.8966420

We introduce an efficient approach that aims at supporting content-based queries on the Chilean Virtual Observatory. In particular, we are interested in retrieving relevant information from virtual-observatory tables. This introduces several challenges that make the information-retrieval process harder. We define an algorithm that uses a compressed data structure to count the number of occurrences of a string query within each column of a table. This kind of query has been used in the literature for faceted and semantic search, as well as for retrieving information from web tables, in order to improve search effectiveness. We show that, using only 15%-25% of the space of a table, our approach contains the table data (and hence the table can be deleted) and is able to answer queries efficiently in a few milliseconds.

DOI https://doi.org/10.1109/SCCC49216.2019.8966420

A Compact Rank/Select Data Structure for the Streaming Model

N. González and D. Arroyuelo, "A Compact Rank/Select Data Structure for the Streaming Model," 2019 38th International Conference of the Chilean Computer Science Society (SCCC), Concepcion, Chile, 2019, pp. 1-7, doi: https://doi.org/10.1109/SCCC49216.2019.8966418

For a sorted set $S$ from a universe $[1..u]$ received under the streaming model (i.e., elements are received one at a time, in sorted order), such that at a given time it contains $n$ elements $\{x_1, \ldots, x_n\}$, and whose characteristic bit vector is $C_S = 0^{\sigma_1} 1 1 \cdots 1 \, 0^{\sigma_2} 1 1 \cdots 1 \cdots 0^{\sigma_g} 1 1 \cdots 1$ (i.e., the set elements are actually arranged in $g < n$ intervals of size $\geq 1$), we propose a compact data structure that answers operations select and rank in $\Theta(\lg(g / \lg g))$ worst-case time, and append in $O(1)$ amortized time, using $2g \lg \frac{u-n}{g} + g \lg \frac{n}{g} + o(g \lg \lg g)$ bits of space. The structure is suitable in cases where $g \leq n/2$.

DOI https://doi.org/10.1109/SCCC49216.2019.8966418
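
To make the operations named in the abstract above concrete, here is a minimal Python sketch of a sorted set stored as runs of consecutive integers, supporting append (in sorted order), rank and select. It is a plain illustration under our own assumptions: it uses ordinary Python lists and binary search, so queries take O(lg g) time, and it does not reproduce the compact structure or the time and space bounds claimed in the paper. The class name `IntervalSet` is hypothetical.

```python
import bisect


class IntervalSet:
    """Sorted set of integers stored as maximal runs of consecutive values."""

    def __init__(self):
        self.starts = []   # first element of each run
        self.before = []   # number of elements stored before each run
        self.n = 0         # total number of elements
        self.last = None   # largest element appended so far

    def append(self, x):
        """Add x, which must be larger than every element already stored."""
        if self.last is not None and x <= self.last:
            raise ValueError("elements must arrive in strictly increasing order")
        if self.last is None or x != self.last + 1:
            # x does not extend the current run, so it opens a new one.
            self.starts.append(x)
            self.before.append(self.n)
        self.n += 1
        self.last = x

    def _run_length(self, i):
        end = self.before[i + 1] if i + 1 < len(self.before) else self.n
        return end - self.before[i]

    def rank(self, x):
        """Number of stored elements that are <= x."""
        i = bisect.bisect_right(self.starts, x) - 1
        if i < 0:
            return 0
        return self.before[i] + min(x - self.starts[i] + 1, self._run_length(i))

    def select(self, j):
        """The j-th smallest stored element (1-based)."""
        if not 1 <= j <= self.n:
            raise IndexError("select index out of range")
        i = bisect.bisect_right(self.before, j - 1) - 1
        return self.starts[i] + (j - 1 - self.before[i])
```

Storing one entry per run rather than per element is what makes the representation pay off when the number of runs g is small relative to n.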

Fine-Grained Evaluation for Entity Linking

Henry Rosales-Méndez, Aidan Hogan, Barbara Poblete. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). November 2019. Pages: 718–727. DOI: 10.18653/v1/D19-1066. https://www.aclweb.org/anthology/D19-1066

The Entity Linking (EL) task identifies entity mentions in a text corpus and associates them with an unambiguous identifier in a Knowledge Base. While much work has been done on the topic, we first present the results of a survey that reveal a lack of consensus in the community regarding what forms of mentions in a text and what forms of links the EL task should consider. We argue that no one definition of the Entity Linking task fits all, and rather propose a fine-grained categorization of different types of entity mentions and links. We then re-annotate three EL benchmark datasets – ACE2004, KORE50, and VoxEL – with respect to these categories. We propose a fuzzy recall metric to address the lack of consensus and conclude with fine-grained evaluation results comparing a selection of online EL systems.

https://www.aclweb.org/anthology/D19-1066

CompactNets: Compact Hierarchical Compositional Networks for Visual Recognition

Hans Lobel, René Vidal, Alvaro Soto. Computer Vision and Image Understanding. Volume 191, February 2020, 102841. Available online October 28, 2019.

CNN-based models currently provide state-of-the-art performance in image categorization tasks. While these methods are powerful in terms of representational capacity, they are generally not conceived with explicit means to control complexity. This might lead to scenarios where resources are used in a non-optimal manner, increasing the number of unspecialized or repeated neurons, and overfitting to data. In this work we propose CompactNets, a new approach to visual recognition that learns a hierarchy of shared, discriminative, specialized, and compact representations. CompactNets naturally capture the notion of compositional compactness, a characterization of complexity in compositional models, consisting of using the smallest number of patterns to build a suitable visual representation. We employ a structural regularizer with group-sparse terms in the objective function, which induces, on each layer, an efficient and effective use of elements from the layer below. In particular, this allows groups of top-level features to be specialized based on category information. We evaluate CompactNets on the ILSVRC12 dataset, obtaining compact representations and competitive performance, using an order of magnitude fewer parameters than common CNN-based approaches. We show that CompactNets are able to outperform other group-sparse-based approaches, in terms of performance and compactness. Finally, transfer-learning experiments on small-scale datasets demonstrate high generalization power, providing remarkable categorization performance with respect to alternative approaches.

https://doi.org/10.1016/j.cviu.2019.102841

SHACL2SPARQL: Validating a SPARQL Endpoint against Recursive SHACL Constraints

Julien Corman, Fernando Florenzano, Juan L. Reutter, Ognjen Savkovic. Free University of Bozen-Bolzano, Bolzano, Italy; PUC Chile and IMFD Chile. Proceedings of the ISWC 2019 Satellite Tracks, 18th International Semantic Web Conference (ISWC 2019), http://ceur-ws.org/Vol-2456/paper43.pdf

The article presents SHACL2SPARQL, a tool that validates an RDF graph stored as a SPARQL endpoint against possibly recursive SHACL constraints. It is based on the algorithm proposed in [3]. This implementation improves upon the original algorithm with a wider range of natively supported constraint operators, SPARQL query optimization techniques, and a mechanism to explain invalid targets.

LINK: http://ceur-ws.org/Vol-2456/paper43.pdf

Estimating the Dynamics of SPARQL Query Results Using Binary Classification

Alberto Moya Loustaunau and Aidan Hogan. IMFD; DCC, University of Chile. CEUR Workshop Proceedings, Vol. 2496. October 26-30, 2019.

We address the problem of estimating when the results of an input SPARQL query over dynamic RDF datasets will change. We evaluate a framework that extracts features from the query and/or from past versions of the target dataset and inputs them into binary classifiers to predict whether or not the results for a query will change at a fixed point in the near future. For this evaluation, we create a gold standard based on 23 versions of Wikidata and a curated collection of 221 SPARQL queries. Our results show that the quality of predictions possible using (only) features based on the query structure and lightweight statistics of the predicate dynamics – though capable of beating a random baseline – is not competitive with results obtained using (more costly to derive) knowledge of the complete historical changes in the query results.

Link: http://ceur-ws.org/Vol-2496/paper1.pdf

Distributed clustering of text collections

J. Zamora, H. Allende-Cid and M. Mendoza, "Distributed Clustering of Text Collections," in IEEE Access, vol. 7, pp. 155671-155685, 2019, doi: 10.1109/ACCESS.2019.2949455

Current data processing tasks require efficient approaches capable of dealing with large databases. A promising strategy consists in distributing the data across several computers that partially solve the undertaken problem. Finally, these partial answers are integrated to obtain a final solution. We introduce distributed shared nearest neighbors (D-SNN), a novel clustering algorithm that works with disjoint partitions of data. Our algorithm produces a global clustering solution that achieves a competitive performance regarding centralized approaches. The algorithm works effectively with high dimensional data, being advisable for document clustering tasks. Experimental results over five data sets show that our proposal is competitive in terms of quality performance measures when compared to state of the art methods.

https://doi.org/10.1109/ACCESS.2019.2949455

Applying Self-attention for Stance Classification

Bugueño M., Mendoza M. (2019) Applying Self-attention for Stance Classification. In: Nyström I., Hernández Heredia Y., Milián Núñez V. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2019. Lecture Notes in Computer Science, vol 11896. Springer, Cham. https://doi.org/10.1007/978-3-030-33904-3_5

Stance classification is the task of automatically identifying users’ positions about a specific topic. The classification of stance may help to understand how people react to a piece of target information, a task that is interesting in different areas such as advertising campaigns, brand analytics, and fake news detection, among others. The rise of social media has put into the focus of this task the classification of stance in online social networks. A number of methods have been designed for this purpose, showing that this problem is hard and challenging. In this work, we explore how to use self-attention models for stance classification. Instead of using attention mechanisms to learn directly from the text, we use self-attention to combine different baselines’ outputs. For a given post, we use the transformer architecture to encode each baseline output, exploiting relationships between baselines and posts. Then, the transformer learns how to combine the outputs of these methods, reaching a consistently better classification than the ones provided by the baselines. We conclude that self-attention models are helpful to learn from baselines’ outputs in a stance classification task.

https://doi.org/10.1007/978-3-030-33904-3_5

Path Queries on Functions

Gagie, T., He, M., & Navarro, G. (2019). Path queries on functions. Theoretical Computer Science, 770, 34–50. DOI https://doi.org/10.1016/j.tcs.2018.10.021

Let $f : [1 . . n] \to [1 . . n]$ be a function, and $ℓ : [1 . . n] \to [1 . . σ]$ indicate a label assigned to each element of the domain. We design several compact data structures that answer various kinds of summary queries on the labels of paths in f. For example, we can find either the minimum label in $f^{k} (i)$ for a given i and any $k \geq 0$ in a given range $[k_{1} . . k_{2}]$ , or the minimum label in $f^{- k} (i)$ for a given i and $k > 0$ , using $n \lg n + n \lg σ + o (n \lg n)$ bits and time $O (α (n))$ , the inverse Ackermann function. Within similar space we can count, in time $O (\lg n / \lg \lg n)$ , the number of labels within a range, and report each element with such labels in $O (\lg n / \lg \lg n)$ additional time. Several other tradeoffs and possible queries are considered, such as selection, top-r queries and τ-majorities. Finally, we consider queries that allow us to navigate on the graph of the function, such as the nearest common successor of two elements, or the nearest successor or predecessor of an element within a range of labels.

DOI https://doi.org/10.1016/j.tcs.2018.10.021

Validating Shacl Constraints over a Sparql Endpoint

Corman J., Florenzano F., Reutter J.L., Savković O. (2019) Validating Shacl Constraints over a Sparql Endpoint. In: Ghidini C. et al. (eds) The Semantic Web – ISWC 2019. ISWC 2019. Lecture Notes in Computer Science, vol 11778. Springer, Cham. https://doi.org/10.1007/978-3-030-30793-6_9

SHACL (Shapes Constraint Language) is a specification for describing and validating RDF graphs that has recently become a W3C recommendation. While the language is gaining traction in the industry, algorithms for SHACL constraint validation are still at an early stage. A first challenge comes from the fact that RDF graphs are often exposed as SPARQL endpoints, and therefore only accessible via queries. Another difficulty is the absence of guidelines about the way recursive constraints should be handled. In this paper, we provide algorithms for validating a graph against a SHACL schema, which can be executed over a SPARQL endpoint. We first investigate the possibility of validating a graph through a single query for non-recursive constraints. Then for the recursive case, since the problem has been shown to be NP-hard, we propose a strategy that consists in evaluating a small number of SPARQL queries over the endpoint, and using the answers to build a set of propositional formulas that are passed to a SAT solver. Finally, we show that the process can be optimized when dealing with recursive but tractable fragments of SHACL, without the need for an external solver. We also present a proof-of-concept evaluation of this last approach.

LINK: https://doi.org/10.1007/978-3-030-30793-6_9

BTC-2019: The 2019 Billion Triple Challenge Dataset

Herrera JM., Hogan A., Käfer T. (2019) BTC-2019: The 2019 Billion Triple Challenge Dataset. In: Ghidini C. et al. (eds) The Semantic Web – ISWC 2019. ISWC 2019. Lecture Notes in Computer Science, vol 11779. Springer, Cham. https://doi.org/10.1007/978-3-030-30796-7_11

Six datasets have been published under the title of Billion Triple Challenge (BTC) since 2008. Each such dataset contains billions of triples extracted from millions of documents crawled from hundreds of domains. While these datasets were originally motivated by the annual ISWC competition from which they take their name, they would become widely used in other contexts, forming a key resource for a variety of research works concerned with managing and/or analysing diverse, real-world RDF data as found natively on the Web. Given that the last BTC dataset was published in 2014, we prepare and publish a new version – BTC-2019 – containing 2.2 billion quads parsed from 2.6 million documents on 394 pay-level-domains. This paper first motivates the BTC datasets with a survey of research works using these datasets. Next we provide details of how the BTC-2019 crawl was configured. We then present and discuss a variety of statistics that aim to gain insights into the content of BTC-2019. We discuss the hosting of the dataset and the ways in which it can be accessed, remixed and used.

LINK: https://doi.org/10.1007/978-3-030-30796-7_11

RESOURCE DOI: https://doi.org/10.5281/zenodo.2634588

RDF Explorer: A Visual SPARQL Query Builder

Vargas H., Buil-Aranda C., Hogan A., López C. (2019) RDF Explorer: A Visual SPARQL Query Builder. In: Ghidini C. et al. (eds) The Semantic Web – ISWC 2019. ISWC 2019. Lecture Notes in Computer Science, vol 11778. Springer, Cham. DOI https://doi.org/10.1007/978-3-030-30793-6_37

Despite the growing popularity of knowledge graphs for managing diverse data at large scale, users who wish to pose expressive queries against such graphs are often expected to know (i) how to formulate queries in a language such as SPARQL, and (ii) how entities of interest are described in the graph. In this paper we propose a language that relaxes these expectations; the language’s operators are based on an interactive graph-based exploration that allows non-expert users to simultaneously navigate and query knowledge graphs; we compare the expressivity of this language with SPARQL. We then discuss an implementation of this language that we call RDF Explorer and discuss various desirable properties it has, such as avoiding interactions that lead to empty results. Through a user study over the Wikidata knowledge-graph, we show that users successfully complete more tasks with RDF Explorer than with the existing Wikidata Query Helper, while a usability questionnaire demonstrates that users generally prefer our tool and self-report lower levels of frustration and mental effort.

DOI https://doi.org/10.1007/978-3-030-30793-6_37

A Worst-Case Optimal Join Algorithm for SPARQL

Hogan A., Riveros C., Rojas C., Soto A. (2019) A Worst-Case Optimal Join Algorithm for SPARQL. In: Ghidini C. et al. (eds) The Semantic Web – ISWC 2019. ISWC 2019. Lecture Notes in Computer Science, vol 11778. Springer, Cham. DOI https://doi.org/10.1007/978-3-030-30793-6_15

Worst-case optimal multiway join algorithms have recently gained a lot of attention in the database literature. These algorithms not only offer strong theoretical guarantees of efficiency but have also been empirically demonstrated to significantly improve query runtimes for relational and graph databases. Despite these promising theoretical and practical results, however, the Semantic Web community has yet to adopt such techniques; to the best of our knowledge, no native RDF database currently supports such join algorithms. In this paper we demonstrate that this should change. We propose a novel procedure for evaluating SPARQL queries based on an existing worst-case optimal join algorithm called Leapfrog Triejoin. We propose an adaptation of this algorithm for evaluating SPARQL queries and implement it in Apache Jena. We then present experiments over the Berlin and WatDiv SPARQL benchmarks, and a novel benchmark that we propose based on Wikidata that is designed to provide insights into join performance for a more diverse set of basic graph patterns. Our results show that with this new join algorithm, Apache Jena often runs orders of magnitude faster than the base version and two other SPARQL engines: Virtuoso and Blazegraph.

DOI https://doi.org/10.1007/978-3-030-30793-6_15
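
The building block of Leapfrog Triejoin, mentioned in the abstract above, is a "leapfrog" intersection of several sorted lists: the iterator holding the smallest key repeatedly seeks forward to the largest key currently seen, until all iterators agree or one is exhausted. The Python sketch below shows only this single-variable intersection over plain sorted lists of distinct integers. It is an illustration under our own assumptions, not the trie-based multi-variable join and not the Apache Jena implementation described in the paper.

```python
import bisect


def leapfrog_intersect(lists):
    """Intersect sorted lists of distinct integers using leapfrog seeks."""
    if not lists or any(len(lst) == 0 for lst in lists):
        return []
    k = len(lists)
    pos = [0] * k  # current position of each list's iterator
    # Visit iterators in increasing order of their first key.
    order = sorted(range(k), key=lambda i: lists[i][0])
    max_key = lists[order[-1]][0]  # largest key among the current positions
    result = []
    p = 0
    while True:
        i = order[p]
        key = lists[i][pos[i]]
        if key == max_key:
            # The smallest current key equals the largest: every list contains it.
            result.append(key)
            pos[i] += 1
        else:
            # Seek forward to the first key >= max_key (binary search).
            pos[i] = bisect.bisect_left(lists[i], max_key, pos[i])
        if pos[i] >= len(lists[i]):
            return result  # one iterator is exhausted, so no more matches exist
        max_key = lists[i][pos[i]]
        p = (p + 1) % k


common = leapfrog_intersect([[1, 3, 4, 7, 9], [3, 4, 5, 9, 10], [0, 3, 9, 11]])
```

Because each seek jumps directly past keys that cannot match, the work done depends on how often the lists interleave rather than on their total length, which is the intuition behind the efficiency guarantees the abstract mentions.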

A Call to Contextualize Public Opinion-Based Research in Political Communication

Hernando Rojas & Sebastián Valenzuela (2019) A Call to Contextualize Public Opinion-Based Research in Political Communication, Political Communication, 36:4, 652-659, DOI: 10.1080/10584609.2019.1670897

For those of us who regularly conduct public opinion research outside of the United States and Europe, it is customary to have to explain whether our findings are “real,” that is, generalizable relationships that advance theory, or some kind of contextual artifact. Infamous Reviewer 2 will ask for an explanation of how context might be affecting the relationships that we are describing, and while it might be irritating to do so, in this case, Reviewer 2 is right. The issue of course is not having to explain how contexts matter, but instead why scholarship examining the US, or certain western countries, is not consistently subject to the same task. In this piece, we advocate for contextualizing public opinion research in all cases, because of course relationships among variables are always context/historic dependent. Rather than being a theoretical shortcoming, we argue that this becomes a theoretical strength: being able to identify the conditions under which our proposed relationships hold and those in which they do not. So, for example, rather than just saying that news consumption is positively related with political participation, as scholars including ourselves have been doing for years, we need to make explicit the news construction conditions under which this is the case, the participatory repertoire being considered, and normative implications of our claims. To engage contextualization, cross-national, cross cultural, cross group and historical comparisons are particularly useful. To build our case for the increasing need for contextualization in political communication research, we will first examine some early comparative research, then we will show some current problematic comparisons, and finally will end with some concluding remarks of the challenges for the field that lie ahead and the benefits of a contextual approach.

https://doi.org/10.1080/10584609.2019.1670897

Implementing the Topological Model Succinctly

Fuentes-Sepúlveda J., Navarro G., Seco D. (2019) Implementing the Topological Model Succinctly. In: Brisaboa N., Puglisi S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol 11811. Springer, Cham. DOI https://doi.org/10.1007/978-3-030-32686-9_35

We show that the topological model, a semantically rich standard to represent GIS data, can be encoded succinctly while efficiently answering a number of topology-related queries. We build on recent succinct planar graph representations so as to encode a model with $m$ edges within $4m + o(m)$ bits and answer various queries relating nodes, edges, and faces in $o(\log \log m)$ time, or any time in $\omega(\log m)$ for a few complex ones.

DOI https://doi.org/10.1007/978-3-030-32686-9_35

A Practical Alphabet-Partitioning Rank/Select Data Structure

Arroyuelo D., Sepúlveda E. (2019) A Practical Alphabet-Partitioning Rank/Select Data Structure. In: Brisaboa N., Puglisi S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol 11811. Springer, Cham. https://doi.org/10.1007/978-3-030-32686-9_32

This paper proposes a practical implementation of an alphabet-partitioning compressed data structure, which represents a string within compressed space and supports the fundamental operations $\mathrm{rank}$ and $\mathrm{select}$ efficiently. We show experimental results that indicate that our implementation outperforms the current realizations of the alphabet-partitioning approach (which is one of the most efficient approaches in practice). In particular, the time for operation $\mathrm{select}$ can be reduced by about 80%, using only 11% more space than current alphabet-partitioning schemes. We also show the impact of our data structure on several applications, like the intersection of inverted lists (where improvements of up to 60% are achieved, using only 2% of extra space), and the distributed-computation processing of $\mathrm{rank}$ and $\mathrm{select}$ operations. As far as we know, this is the first study about the support of $\mathrm{rank}$/$\mathrm{select}$ operations on a distributed-computing environment.
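For readers unfamiliar with the two operations named in the abstract, the following is a minimal sketch of their semantics on a plain Python string. It is a naive linear-time illustration, not the alphabet-partitioning structure of the paper; the function names and the indexing conventions (prefix-based rank, 1-based occurrence count with 0-based result for select) are assumptions chosen for this example.

```python
def rank(s, c, i):
    """Number of occurrences of symbol c in the prefix s[0:i]."""
    return s[:i].count(c)


def select(s, c, j):
    """0-based position of the j-th occurrence (j >= 1) of c in s, or -1 if absent."""
    pos = -1
    for _ in range(j):
        # Jump to the next occurrence strictly after the previous one.
        pos = s.find(c, pos + 1)
        if pos == -1:
            return -1
    return pos


text = "abracadabra"
third_a = select(text, "a", 3)
print(third_a, rank(text, "a", third_a + 1))
```

The two operations are inverses in the sense that the prefix ending at the j-th occurrence of a symbol contains exactly j copies of it. Compressed structures aim to answer both queries without scanning the string.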

DOI https://doi.org/10.1007/978-3-030-32686-9_32

Adaptive Succinctness

Arroyuelo D., Raman R. (2019) Adaptive Succinctness. In: Brisaboa N., Puglisi S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol 11811. Springer, Cham https://doi.org/10.1007/978-3-030-32686-9_33

Although there is an information-theoretic lower bound of $B(n, u) = \lg \binom{u}{n}$ bits on the space needed to represent a set S of n elements from a universe U = [1..u], this applies to worst-case (random) sets S, and sets found in practical applications are compressible. We focus on the case where S contains non-trivial runs of consecutive elements, one that occurs in many practical situations.

Let $C_{n}$ denote the class of $\binom{u}{n}$ distinct sets of $n$ elements over the universe $U = [1..u]$. Let also $C_{g}^{n} \subset C_{n}$ contain the sets whose $n$ elements are arranged in $g \leq n$ runs of $\ell_{i} \geq 1$ consecutive elements from $U$ for $i = 1, \dots, g$, and let $C_{g, r}^{n} \subset C_{g}^{n}$ contain all sets that consist of $g$ runs, such that $r \leq g$ of them have at least 2 elements.

We introduce new compressibility measures for sets, including:
- $L_{1} = \lg | C_{g}^{n} | = \lg (\binom{u - n + 1}{g}) + \lg (\binom{n - 1}{g - 1})$ and
- $L_{2} = \lg | C_{g, r}^{n} | = \lg (\binom{u - n + 1}{g}) + \lg (\binom{n - g - 1}{r - 1}) + \lg (\binom{g}{r})$
We show that $L_{2} \leq L_{1} \leq B(n, u)$.
We give data structures that use space close to bounds $L_{1}$ and $L_{2}$ and support $\mathrm{rank}$ and $\mathrm{select}$ in $O(1)$ time.
We provide additional measures involving entropy-coding run lengths and gaps between items, data structures to support these measures, and show experimentally that these approaches are promising for real-world datasets.
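The counting formulas behind $L_{1}$ and $L_{2}$ can be checked numerically. The sketch below evaluates them with Python's `math.comb` and compares the class sizes against a brute-force enumeration on a tiny universe. The function names, the sample parameters, and the reading of $C_{g,r}^{n}$ as "exactly r runs of length at least 2" are assumptions made for this illustration; the formulas as coded require $r \geq 1$ and $n - g \geq r$.

```python
from itertools import combinations
from math import comb, log2


def run_lengths(sorted_set):
    """Lengths of the maximal runs of consecutive integers in a sorted tuple."""
    lengths = []
    for i, x in enumerate(sorted_set):
        if i > 0 and x == sorted_set[i - 1] + 1:
            lengths[-1] += 1
        else:
            lengths.append(1)
    return lengths


def size_all(n, u):
    """|C_n|: all n-element subsets of [1..u]."""
    return comb(u, n)


def size_g(n, u, g):
    """|C_g^n|: subsets whose elements form exactly g runs."""
    return comb(u - n + 1, g) * comb(n - 1, g - 1)


def size_g_r(n, u, g, r):
    """|C_{g,r}^n|: g runs, r of which have at least 2 elements (needs r >= 1)."""
    return comb(u - n + 1, g) * comb(n - g - 1, r - 1) * comb(g, r)


def brute_force(n, u, g, r=None):
    """Count the same classes by enumeration (only feasible for tiny u)."""
    count = 0
    for s in combinations(range(1, u + 1), n):
        lengths = run_lengths(s)
        if len(lengths) != g:
            continue
        if r is not None and sum(length >= 2 for length in lengths) != r:
            continue
        count += 1
    return count


# Hypothetical sample parameters, chosen only for illustration.
n, u, g, r = 20, 100, 5, 3
B = log2(size_all(n, u))
L1 = log2(size_g(n, u, g))
L2 = log2(size_g_r(n, u, g, r))
print(f"B = {B:.1f} bits, L1 = {L1:.1f} bits, L2 = {L2:.1f} bits")
```

For sets with few long runs, both measures fall well below the worst-case bound, which is the compressibility the abstract refers to.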

DOI https://doi.org/10.1007/978-3-030-32686-9_33

Faster Dynamic Compressed d-ary Relations

Arroyuelo D., de Bernardo G., Gagie T., Navarro G. (2019) Faster Dynamic Compressed d-ary Relations. In: Brisaboa N., Puglisi S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol 11811. Springer, Cham. https://doi.org/10.1007/978-3-030-32686-9_30

The $k^{2}$-tree is a successful compact representation of binary relations that exhibit sparseness and/or clustering properties. It can be extended to $d$ dimensions, where it is called a $k^{d}$-tree. The representation boils down to a long bitvector. We show that interpreting the $k^{d}$-tree as a dynamic trie on the Morton codes of the points, instead of as a dynamic representation of the bitvector as done in previous work, yields operation times that are below the lower bound of dynamic bitvectors and offers improved time performance in practice.
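To make the Morton-code view concrete, here is a minimal sketch that interleaves the coordinate bits of a d-dimensional point and reads the result as a root-to-leaf path in a trie. It assumes k = 2 (so each trie node has $2^{d}$ children) and a most-significant-bit-first order with the first coordinate leading each group; these conventions and all function names are assumptions for illustration, not taken from the paper.

```python
def morton_encode(point, bits):
    """Interleave the bits of a d-dimensional point, most significant bit first."""
    code = 0
    for b in range(bits - 1, -1, -1):
        for coord in point:
            code = (code << 1) | ((coord >> b) & 1)
    return code


def morton_decode(code, d, bits):
    """Inverse of morton_encode for a d-dimensional point with `bits` bits per coordinate."""
    point = [0] * d
    for b in range(bits - 1, -1, -1):
        for k in range(d):
            shift = b * d + (d - 1 - k)
            point[k] = (point[k] << 1) | ((code >> shift) & 1)
    return tuple(point)


def trie_path(code, d, bits):
    """Child index chosen at each level: one group of d bits per level."""
    mask = (1 << d) - 1
    return [(code >> (d * b)) & mask for b in range(bits - 1, -1, -1)]


code = morton_encode((5, 3), bits=3)
print(code, trie_path(code, d=2, bits=3), morton_decode(code, d=2, bits=3))
```

Points that are close in space tend to share long prefixes of their Morton codes, so they share the upper part of their trie paths.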

LINK: https://doi.org/10.1007/978-3-030-32686-9_30

Populist Multiculturalism in the Andes: Balancing Political Control and Societal Autonomy

Alberti, Carla. Comparative Politics, Volume 52, Number 1, October 2019, pp. 43-63(21). City University of New York.

Radical populists in the Andes have combined a populist program and a multicultural agenda. However, while populism centralizes power in the hands of the leader and emphasizes the unity of the people, multiculturalism grants cultural rights that strengthen societal autonomy, generating an inherent tension between these two modes of incorporation. How are populist governments able to combine unity and fragmentation as well as centralization and autonomy? This article develops the concept of populist multiculturalism, focusing on the Movimiento al Socialismo (MAS) in Bolivia, which has supported autonomy rights while simultaneously curtailing their implementation. Specifically, it examines the implementation of indigenous autonomous governments and prior consultation and the relationship between indigenous organizations and the ruling party. The article also extends this concept to Ecuador and Venezuela.

DOI https://doi.org/10.5129/001041519X15638217741734

The semantic network of the Spanish dictionary during the last century: Structural stability and resilience

Camilo Garrido; Claudio Gutierrez; Guillermo Soto. University of Chile. Proceedings of Electronic Lexicography in the 21st Century Conference. October 2019, 831-848. https://elex.link/elex2019/wp-content/uploads/2019/09/eLex_2019_47.pdf

The semantic network of a dictionary is a mathematical structure that represents relationships among words of a language. In this work, we study the evolution of the semantic network of the Spanish dictionary during the last century, from 1925 to 2014. We analysed the permanence and changes of its structural properties, such as size of components, average shortest path length, and degree distribution. We found that global structural properties of the Spanish dictionary network are remarkably stable. In fact, if we remove all the labels from the network, networks from different editions of the Spanish dictionary are practically indistinguishable. On the other hand, local properties, such as the neighbourhood of a single word or the shared neighbourhood between a pair of words, change over the years, offering insights into the evolution of the lexicon. This paper presents preliminary evidence that dictionary networks are an interesting language tool and good proxies to study semantic clouds of words and their evolution in a given language.

LINK: https://elex.link/elex2019/wp-content/uploads/2019/09/eLex_2019_47.pdf

How Party Activism Survives: Uruguay's Frente Amplio

Pérez Bentancur, V., Piñeiro Rodríguez, R., & Rosenblatt, F. (2019). How Party Activism Survives: Uruguay's Frente Amplio. Cambridge: Cambridge University Press. DOI https://doi.org/10.1017/9781108750851

Political parties with activists are in decline due to various external shocks. Societal changes, like the emergence of new technologies of communication, have diminished the role and number of activists, while party elites increasingly can make do without grassroots activists. However, recent scholarship concerning different democracies has shown how activism still matters for representation. This book contributes to this literature by analyzing the unique case of the Uruguayan Frente Amplio (FA), the only mass-organic, institutionalized leftist party in Latin America. Using thick description, systematic process tracing, and survey research, this case study highlights the value of an organization-centered approach for understanding parties’ role in democracy. Within the FA, organizational rules grant activists a significant voice, which imbues activists’ participation with a strong sense of efficacy. This book is an excellent resource for scholars and students of Latin America and comparative politics who are interested in political parties and the challenges confronting new democracies.

DOI https://doi.org/10.1017/9781108750851

Arabic sentiment analysis: Studies, resources, and tools

Guellil, I., Azouaou, F. & Mendoza, M. Arabic sentiment analysis: studies, resources, and tools. Soc. Netw. Anal. Min. 9, 56 (2019). https://doi.org/10.1007/s13278-019-0602-x

To determine whether a document or a sentence expresses a positive or negative sentiment, three main approaches are commonly used: the lexicon-based approach, the corpus-based approach, and a hybrid approach. English has the highest number of sentiment analysis studies, while research is more limited for other languages, including Arabic and its dialects. Lexicon-based approaches need annotated sentiment lexicons (containing the valence and intensity of their terms and expressions). Corpus-based sentiment analysis requires annotated sentences. One of the significant problems related to the treatment of Arabic and its dialects is the lack of these resources. We present in this survey the most recent resources and advances in Arabic sentiment analysis. The survey covers recent work, most of it published between 2015 and 2019, classified by category (survey work or contribution work). For contribution work, we focus on the construction of sentiment lexicons and corpora. We also describe emergent trends related to Arabic sentiment analysis, principally associated with the use of deep learning techniques.

https://doi.org/10.1007/s13278-019-0602-x

A Simple Data Structure for Optimal Two-Sided 2D Orthogonal Range Queries

Grez A., Calí A., Ugarte M. (2019) A Simple Data Structure for Optimal Two-Sided 2D Orthogonal Range Queries. In: Cuzzocrea A., Greco S., Larsen H., Saccà D., Andreasen T., Christiansen H. (eds) Flexible Query Answering Systems. FQAS 2019. Lecture Notes in Computer Science, vol 11529. Springer, Cham. https://doi.org/10.1007/978-3-030-27629-4_7

Given an arbitrary set A of two-dimensional points over a totally-ordered domain, a two-sided planar range query consists of finding all points of A within an arbitrary quadrant. In this paper we present a novel data structure that uses linear space in |A| while allowing for two-dimensional orthogonal range queries with logarithmic pre-processing and constant-delay enumeration.

https://doi.org/10.1007/978-3-030-27629-4_7

Data mining for item recommendation in MOBA games

Vladimir Araujo, Felipe Rios, and Denis Parra. 2019. Data mining for item recommendation in MOBA games. In Proceedings of the 13th ACM Conference on Recommender Systems (RecSys '19). Association for Computing Machinery, New York, NY, USA, 393–397. DOI: https://doi.org/10.1145/3298689.3346986

E-Sports has been positioned as an important activity within MOBA (Multiplayer Online Battle Arena) games in recent years. There is existing research on recommender systems in this topic, but most of it focuses on the character recommendation problem. However, the recommendation of items is also challenging because of its contextual nature, depending on the other characters. We have developed a framework that suggests items for a character based on the match context. The system aims to help players who have recently started the game as well as frequent players to take strategic advantage during a match and to improve their purchasing decision making. By analyzing a dataset of ranked matches through data mining techniques, we can capture the purchase dynamics of experienced players and use them to generate recommendations. The results show that our proposed solution yields an mAP of up to 80%, suggesting that the method leverages context information successfully. These results, together with open issues we mention in the paper, call for further research in the area.

https://doi.org/10.1145/3298689.3346986

Clustering Approaches for Top-k Recommender Systems

Nicolás Torres and Marcelo Mendoza. International Journal on Artificial Intelligence Tools. Vol. 28, No. 05, 1950019 (2019) https://doi.org/10.1142/S0218213019500192

Clustering-based recommender systems bound the search for similar users within small user clusters, providing fast recommendations in large-scale datasets. The groups can then naturally be distributed into different data partitions, scaling up the number of users the recommender system can handle. Unfortunately, as the number of users and items included in a cluster solution increases, the precision of a clustering-based recommender system decreases. We present a novel approach that introduces a cluster-based distance function used for neighborhood computation. In our approach, clusters generated from the training data provide the basis for neighborhood selection. Then, to expand the search for relevant users, we use a novel measure that can exploit the global cluster structure to infer distances to users outside the cluster. Empirical studies on five widely known benchmark datasets show that our proposal is very competitive in terms of precision, recall, and NDCG. However, the strongest point of our method is scalability, reaching speedups of 20× in a sequential computing evaluation framework and up to 100× in a parallel architecture. These results show that an efficient implementation of our cluster-based CF method can handle very large datasets while also providing good results in terms of precision, avoiding the high computational costs involved in the application of more sophisticated techniques.

https://doi.org/10.1142/S0218213019500192

WekaDeeplearning4j: A deep learning package for Weka based on Deeplearning4j

Steven Lang, Felipe Bravo-Marquez, Christopher Beckham, Mark Hall, Eibe Frank. Knowledge-Based Systems. Volume 178, 15 August 2019, Pages 48-50. https://doi.org/10.1016/j.knosys.2019.04.013

Deep learning is a branch of machine learning that generates multi-layered representations of data, commonly using artificial neural networks, and has improved the state-of-the-art in various machine learning tasks (e.g., image classification, object detection, speech recognition, and document classification). However, most popular deep learning frameworks such as TensorFlow and PyTorch require users to write code to apply deep learning. We present WekaDeeplearning4j, a Weka package that makes deep learning accessible through a graphical user interface (GUI). The package uses Deeplearning4j as its backend, provides GPU support, and enables GUI-based training of deep neural networks such as convolutional and recurrent neural networks. It also provides pre-processing functionality for image and text data.

https://doi.org/10.1016/j.knosys.2019.04.013

Adaptation to extreme environments in an admixed human population from the Atacama Desert

Lucas Vicuña, Mario I Fernandez, Cecilia Vial, Patricio Valdebenito, Eduardo Chaparro, Karena Espinoza, Annemarie Ziegler, Alberto Bustamante, Susana Eyheramendy, Adaptation to Extreme Environments in an Admixed Human Population from the Atacama Desert, Genome Biology and Evolution, Volume 11, Issue 9, September 2019, Pages 2468–2479, https://doi.org/10.1093/gbe/evz172

Inorganic arsenic (As) is a toxic xenobiotic and carcinogen associated with severe health conditions. The urban population from the Atacama Desert in northern Chile was exposed to extremely high As levels (up to 600 µg/l) in drinking water between 1958 and 1971, leading to increased incidence of urinary bladder cancer (BC), skin cancer, kidney cancer, and coronary thrombosis decades later. Besides, the Andean Native-American ancestors of the Atacama population were previously exposed for millennia to elevated As levels in water (∼120 µg/l) for at least 5,000 years, suggesting adaptation to this selective pressure. Here, we performed two genome-wide selection tests—PBS_n₁ and an ancestry-enrichment test—in an admixed population from Atacama, to identify adaptation signatures to As exposure acquired before and after admixture with Europeans, respectively. The top second variant selected by PBS_n₁ was associated with LCE4A-C1orf68, a gene that may be involved in the immune barrier of the epithelium during BC. We performed association tests between the top PBS_n₁ hits and BC occurrence in our population. The strongest association (P = 0.012) was achieved by the LCE4A-C1orf68 variant. The ancestry-enrichment test detected highly significant signals (P = 1.3 × 10⁻⁹) mapping MAK16, a gene with important roles in ribosome biogenesis during the G1 phase of the cell cycle. Our results contribute to a better understanding of the genetic factors involved in adaptation to the pathophysiological consequences of exposure.

DOI https://doi.org/10.1093/gbe/evz172

Extending general compact queriable representations to GIS applications

Brisaboa, N. R., Cerdeira-Pena, A., de Bernardo, G., Navarro, G., & Pedreira, Ó. (2020). Extending general compact queriable representations to GIS applications. Information Sciences, 506, 196–216. DOI https://doi.org/10.1016/j.ins.2019.08.007

The raster model is commonly used for the representation of images in many domains and is especially useful in Geographic Information Systems (GIS) to store information about continuous variables of the space (elevation, temperature, etc.). Current representations of raster data are usually designed for external memory or, when stored in main memory, lack efficient query capabilities. In this paper, we propose compact representations to efficiently store and query raster datasets in the main memory. We present different representations for binary raster data, general raster data, and time-evolving raster data. We experimentally compare our proposals with traditional storage mechanisms such as linear quadtrees or compressed GeoTIFF files. Results show that our structures are up to 10 times smaller than classical linear quadtrees, and even comparable in space to non-queriable representations of raster data, while efficiently answering a number of typical queries.

DOI https://doi.org/10.1016/j.ins.2019.08.007

Taming the digital information tide to promote equality

Valenzuela, S., Rojas, H. Taming the digital information tide to promote equality. Nat Hum Behav 3, 1134–1136 (2019). https://doi.org/10.1038/s41562-019-0700-9

Interactive technologies are changing the ways we learn facts, develop attitudes and participate in politics, with the ensuing risk of increasing pre-existing inequalities. Addressing this challenge is the duty of researchers, technology companies, governments and news organizations.

https://doi.org/10.1038/s41562-019-0700-9

Getting Prepared to Be Prepared: How Interpersonal Skills Aid Fieldwork in Challenging Contexts

Alberti, C., & Jenne, N. (2019). Getting Prepared to Be Prepared: How Interpersonal Skills Aid Fieldwork in Challenging Contexts. Qualitative Sociology Review, 15(3), 42-62. https://doi.org/10.18778/1733-8077.15.3.03

This article deals with fieldwork in challenging research contexts that make preparation for field research particularly difficult. Challenging contexts include generally insecure places, politicized contexts, and unknown settings. Drawing on our experience in the field, we discuss four challenges that are common across these contexts: access, positionality, researcher well-being, and research design and data collection. Bringing together insights from fieldwork with urban elites and in the countryside, this paper describes problems that occurred in both settings and identifies a set of interpersonal skills that helped the authors to tackle the challenges of the field and seize the opportunities it offered. This article posits that recognizing the importance of certain interpersonal skills, namely openness, empathy, humility, and flexibility, precedes the identification of practical tools. Interpersonal skills, instead, focus on a general attitude that underlies researchers’ capacity to make informed choices about specific courses of action, preparing fieldworkers to be prepared to confront problems once they arise.

DOI: https://doi.org/10.18778/1733-8077.15.3.03

An ELMo-inspired approach to SemDeep-5’s Word-in-Context task

Alan Ansell, Felipe Bravo-Marquez, Bernhard Pfahringer. Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5), August 2019. Pages: 21–25. https://www.aclweb.org/anthology/W19-5804

This paper describes a submission to the Word-in-Context competition for the IJCAI 2019 SemDeep-5 workshop. The task is to determine whether a given focus word is used in the same or different senses in two contexts. We took an ELMo-inspired approach similar to the baseline model in the task description paper, where contextualized representations are obtained for the focus words and a classification is made according to the degree of similarity between these representations. Our model had a few simple differences, notably joint training of the forward and backward LSTMs, a different choice of states for the contextualized representations and a new similarity measure for them. These changes yielded a 3.5% improvement on the ELMo baseline.

https://www.aclweb.org/anthology/W19-5804.pdf

Analyzing the Design Space for Visualizing Neural Attention in Text Classification

Parra, D., Valdivieso, H., Carvallo, A., Rada, G., Verbert, K., & Schreck, T. (2019). Analyzing the Design Space for Visualizing Neural Attention in Text Classification. In Proc. IEEE VIS Workshop on Vis X AI: 2nd Workshop on Visualization for AI Explainability (VISxAI). https://observablehq.com/@clpuc/analyzing-the-design-space-for-visualizing-neural-attenti

Introduction: Deep Neural Networks (DNNs) are a type of machine learning model (Goodfellow et al., 2016) which have reported state-of-the-art results in several tasks in the past years. Despite the impressive results reported by these models in several fields such as computer vision (Krizhevsky et al., 2014), natural language processing (Mikolov et al., 2013) or recommender systems (Covington et al., 2016), one of their biggest drawbacks is their lack of interpretability and transparency. Some of the best performing DNN models have millions of parameters, so making sense of what these models learn is an active research challenge. These algorithms can help to solve and automate difficult and expensive tasks, but their adoption in critical domains, which usually requires liability, depends on making their decisions interpretable by humans. Some large funding programs such as DARPA XAI (Gunning and Aha, 2019) are addressing this problem, providing evidence of its importance. On the other side, recent legislation such as Europe’s GDPR gives people the right to explainability of automated decisions regarding their private data.

One of the most significant techniques introduced to DNNs in recent years is the so-called attention mechanism (Larochelle and Hinton, 2010). The idea is inspired by our visual system, since humans focus selectively on parts rather than on a whole image, combining information from several fixations to form the full scene (Mnih et al., 2014). This mechanism allows the network to focus on a subset of inputs or parameters when trained on a task. Attention has improved the performance of these models, and it has also given them a chance to be more explainable. Inspecting what the model is paying attention to helps to make the model accountable in tasks such as image classification, document classification or automatic image captioning. Despite this potential, researchers in the area of machine learning usually use the traditional visualization idioms available in software packages, rather than studying all the options for visual encodings to represent models, results or parameters more effectively. We see a chance of using design principles from information visualization in order to improve the way that neural attention models are visually presented.

This article focuses on the design space to analyze, inspect and understand what neural attention models are learning. In particular, we aim at contributing to the field of Explainable Artificial Intelligence (XAI), by describing the potential design space as well as informed decisions to take into account when presenting the results of neural networks using the attention mechanism. We also propose some initial ideas with a use case: classification of biomedical documents.

https://observablehq.com/@clpuc/analyzing-the-design-space-for-visualizing-neural-attenti

Comparing Word Embeddings for Document Screening based on Active Learning

Carvallo, Andres and Denis Parra. “Comparing Word Embeddings for Document Screening based on Active Learning.” CEUR Workshop Proceedings. BIRNDL@SIGIR (July 25, 2019). http://ceur-ws.org/Vol-2414/paper10.pdf

Document screening is a fundamental task within Evidence-based Medicine (EBM), a practice that provides scientific evidence to support medical decisions. Several approaches are attempting to reduce the workload of physicians who need to screen and label hundreds or thousands of documents in order to answer specific clinical questions. Previous works have attempted to semi-automate document screening, reporting promising results, but their evaluation is conducted using small datasets, which hinders generalization. Moreover, some recent works have used recently introduced neural language models, but no previous work has compared, for this task, the performance of different language models based on neural word embeddings, which have reported good results in recent years for several NLP tasks. In this work, we evaluate the performance of two popular neural word embeddings (Word2vec and GloVe) in an active learning-based setting for document screening in EBM, with the goal of reducing the number of documents that physicians need to label in order to answer clinical questions. We evaluate these methods on a small public dataset (HealthCLEF 2017) as well as a larger one (Epistemonikos). Our experiments indicate that Word2vec has less variance and better general performance than GloVe when using active learning strategies based on uncertainty sampling.
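As an illustration of the uncertainty sampling mentioned in the abstract, the sketch below picks the documents a binary relevance classifier is least sure about, so that they can be sent to a human for labeling. The abstract does not say which uncertainty criterion was used; the "closest to 0.5" rule, the function name, and the sample probabilities are assumptions for this example.

```python
def uncertainty_sample(probabilities, batch_size):
    """Indices of the batch_size documents whose predicted relevance
    probability is closest to 0.5 (the least confident predictions)."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:batch_size]


# Hypothetical classifier outputs for five unlabeled documents.
predicted = [0.95, 0.52, 0.10, 0.45, 0.70]
to_label = uncertainty_sample(predicted, batch_size=2)
print(to_label)
```

In an active learning loop, the selected documents would be labeled, added to the training set, and the classifier retrained before the next round of selection.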

http://ceur-ws.org/Vol-2414/paper10.pdf

A Meta-Analysis of the Effects of Cross-Cutting Exposure on Political Participation

Jörg Matthes, Johannes Knoll, Sebastián Valenzuela, David Nicolas Hopmann & Christian Von Sikorski (2019) A Meta-Analysis of the Effects of Cross-Cutting Exposure on Political Participation, Political Communication, 36:4, 523-542, DOI: 10.1080/10584609.2019.1619638

Scholars have advanced many theoretical explanations for expecting a negative or positive relationship between individuals’ cross-cutting exposure—either through interpersonal or mediated forms of communication—and their political participation. However, whether cross-cutting exposure is a positive or negative predictor of participation is still an unsettled question. To help fill this gap, we conducted a meta-analysis of 48 empirical studies comprising more than 70,000 participants examining the association between cross-cutting exposure and political participation. The meta-analysis produced two main findings. First, it shows that, over all studies, there is no significant relationship, r = .002, Zr = .002 (95% CI = −.04 to .05). Second, the null relationship cannot be explained by variations in the characteristics of cross-cutting environments (e.g., topic, place, or source of exposure), participation outcomes (e.g., online vs. offline activities), or methods employed (e.g., experiment vs. survey). Taken together, these results should alleviate concerns about negative effects of cross-cutting exposure on political engagement. Implications for future research are discussed.
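The abstract reports the pooled effect both as a correlation r and as Zr. Assuming Zr denotes Fisher's z-transformed correlation, as is common in meta-analysis, the short sketch below shows the transform and why the two values coincide for an effect this close to zero. The function names are illustrative only.

```python
import math


def fisher_z(r):
    """Fisher's z-transform of a correlation coefficient: atanh(r)."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Back-transform from Fisher's z to a correlation: tanh(z)."""
    return math.tanh(z)


r = 0.002
z = fisher_z(r)
print(round(z, 3), round(inverse_fisher_z(z), 3))
```

Because atanh(r) differs from r only by terms of order r cubed, the transform has no visible effect at three decimals for small correlations; it matters for larger ones.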

https://doi.org/10.1080/10584609.2019.1619638

Knowledge Discovery from News Events on Social Media

Mauricio Quezada, Department of Computer Science, Universidad de Chile; Millennium Institute for Foundational Research on Data, Santiago, Chile. FDIA 2019, July 2019. CEUR Workshop Proceedings Vol-2537. http://ceur-ws.org/Vol-2537/

Online activity involves the consumption and production of event-related content. There are about 500 million Twitter messages published every day, and according to surveys, 59% of its users use the platform as a way to get the news. Its high rate of production of multimodal content (text, images, and videos) necessitates having flexible models to understand the dynamics of the information disseminated on social media. This thesis proposes the creation of context models from user-generated messages on Twitter to discover knowledge as a way to perform high-level quantitative analysis of news events. These models are useful in three perspectives: the spatio-temporal context in which the events develop, the activity of users that react when a high-impact event happens, and the multimodal content that can be exploited to generate a comprehensive summary of the event. Our current work involves the creation of a geopolitical model that relates events and countries, allowing us to discover international relations; the study of what features make an event likely to provoke high activity from users, and a characterization that allows us to predict with high precision which events are going to produce high activity. This includes our ongoing work on generating automatic multimodal summaries of events based on the assumption that the users describe the non-textual content in their tweets when they express their facts and opinions around events.

http://ceur-ws.org/Vol-2537/paper-17.pdf

A Learning-Based Framework for Memory-Bounded Heuristic Search: First Results

Carlos Hernández Ulloa, Jorge Baier, William Yeoh, Vadim Bulitko, Sven Koenig. Proceedings of the 12th Annual Symposium on Combinatorial Search (SoCS 2019), July 2019. 978-1-57735-808-4. https://aaai.org/ocs/index.php/SOCS/SOCS19/paper/viewFile/18376/17491

Introduction: Memory-bounded search algorithms are typically used when the search space is too large for regular best-first search algorithms like A* to store in memory. There exists a large class of memory-bounded best-first search algorithms including Depth-First Branch-and-Bound (DFBnB), Iterative Deepening A* (IDA*) (Korf 1985), Recursive Best-First Search (RBFS) (Korf 1993), and Simplified Memory-Bounded A* (SMA*) (Russell 1992). Each of these algorithms relies on a different strategy to ensure that it uses only a bounded amount of memory: IDA* bounds the amount of memory used by repeatedly running depth-first searches, increasing the explored depth at each iteration. RBFS uses lower and upper bounds that are tightened over time as it explores the search space while keeping only b · d nodes in memory, where b is the branching factor and d is the depth of the tree. And, finally, SMA* keeps only a bounded number of nodes in memory by pruning the least promising nodes from the OPEN list when it runs out of memory. In this abstract, we summarize an alternative approach to memory-bounded best-first search. It is motivated by real-time heuristic search algorithms (Korf 1990), many of which iterate the following steps until the goal is reached: up to k nodes are expanded, where k is a user-defined bound; the h-values of expanded nodes are updated to make them more informed; the agent moves along a path in the search tree just expanded. We propose a general framework that iteratively (1) runs a memory-bounded best-first search algorithm that terminates when k nodes are generated. If no solution is found, (2) it updates the h-values of the generated nodes, and (3) purges the h-values of some nodes from memory. As such, the total number of h-values ever stored by our approach is upper-bounded by a constant.
Under certain (reasonable) conditions, our framework is complete and preserves the (sub)optimality guarantees of the given best-first search algorithm in tree-shaped search spaces. The main conceptual difference between our framework and the SMA* algorithm is that it can be combined with any best-first algorithm with very minor modifications. We present experimental results where we plug into our framework memory-bounded variants of Weighted A* (Pohl 1970). On traveling salesman problems we show that our framework is often able to find better solutions than DFBnB and Weighted DFBnB (wDFBnB) and in a smaller amount of time, especially in problems with large search spaces.

https://aaai.org/ocs/index.php/SOCS/SOCS19/paper/viewFile/18376/17491

Compiling Cost-Optimal Multi-Agent Pathfinding to ASP

Rodrigo N. Gómez, Carlos Hernández, Jorge Baier. Proceedings of the 12th Annual Symposium on Combinatorial Search (SoCS 2019), July 2019. 978-1-57735-808-4. https://aaai.org/ocs/index.php/SOCS/SOCS19/paper/viewFile/18374/17489

Introduction: Multi-Agent Pathfinding (MAPF) over grids is the problem of finding n non-conflicting paths that lead n agents from a given initial cell to a given goal cell. Sum-of-costs-optimal MAPF, or simply cost-optimal MAPF, in addition, minimizes the total number of actions performed by each agent before stopping at the goal. Being a combinatorial problem in nature, a number of compilations from MAPF to Satisfiability (SAT) (Surynek et al. 2016) and Answer Set Programming (ASP) exist (Erdem et al. 2013; Gebser et al. 2018). Here we propose and evaluate a new compilation of MAPF over grids to ASP. Unlike existing compilations we are aware of, both to SAT and to ASP, our encoding is the first that produces a number of clauses that is linear in the number of agents. In addition, the clauses that allow representing the optimization objective are also efficiently written, and do not depend on the size of the grid. Like makespan-optimal approaches, our algorithm searches for cost-optimal solutions with increasing makespan. When a solution is found, a provably correct upper bound on the maximum makespan at which a true cost-optimal solution exists is computed, and the solver is rerun once more.
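To make the terms "non-conflicting" and "sum-of-costs" concrete, here is a minimal Python sketch of a solution checker. It is not the ASP encoding from the paper; the function names (`position`, `has_conflict`, `cost`, `sum_of_costs`) and the sample paths are hypothetical, and the sketch assumes that agents stay at their last cell once they finish and that each path already respects grid adjacency.

```python
from typing import List, Tuple

Cell = Tuple[int, int]
Path = List[Cell]  # path[t] is the agent's cell at time step t


def position(path: Path, t: int) -> Cell:
    """Cell occupied at time t; the agent waits at its last cell forever."""
    return path[min(t, len(path) - 1)]


def has_conflict(paths: List[Path]) -> bool:
    """True if two agents share a cell or swap cells in one step."""
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                a_now, b_now = position(paths[i], t), position(paths[j], t)
                if a_now == b_now:
                    return True  # vertex conflict
                a_next = position(paths[i], t + 1)
                b_next = position(paths[j], t + 1)
                if a_next == b_now and b_next == a_now:
                    return True  # swap (edge) conflict
    return False


def cost(path: Path) -> int:
    """Number of actions (moves or waits) before the agent stops at its goal."""
    end = len(path) - 1
    # Trailing repetitions of the final cell are idle time after arrival.
    while end > 0 and path[end - 1] == path[-1]:
        end -= 1
    return end


def sum_of_costs(paths: List[Path]) -> int:
    return sum(cost(p) for p in paths)


# Agent 0 moves right twice; agent 1 waits once, then steps into (0, 1).
ok_paths = [[(0, 0), (0, 1), (0, 2)], [(1, 1), (1, 1), (0, 1)]]
# Two agents exchanging adjacent cells in a single step.
swap_paths = [[(0, 0), (0, 1)], [(0, 1), (0, 0)]]
```

A cost-optimal solver would look for a conflict-free set of paths that minimizes `sum_of_costs`; the checker above only validates and scores a candidate solution.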

https://aaai.org/ocs/index.php/SOCS/SOCS19/paper/viewFile/18374/17489

Boundedness of Conjunctive Regular Path Queries

Pablo Barceló and Diego Figueira and Miguel Romero. 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs). Vol 132. https://doi.org/10.4230/LIPIcs.ICALP.2019.104

We study the boundedness problem for unions of conjunctive regular path queries with inverses (UC2RPQs). This is the problem of, given a UC2RPQ, checking whether it is equivalent to a union of conjunctive queries (UCQ). We show the problem to be ExpSpace-complete, thus coinciding with the complexity of containment for UC2RPQs. As a corollary, when a UC2RPQ is bounded, it is equivalent to a UCQ of at most triple-exponential size, and in fact we show that this bound is optimal. We also study better behaved classes of UC2RPQs, namely acyclic UC2RPQs of bounded thickness, and strongly connected UCRPQs, whose boundedness problem is, respectively, PSpace-complete and Pi_2^P-complete. Most upper bounds exploit results on limitedness for distance automata, in particular extending the model with alternation and two-wayness, which may be of independent interest.

https://doi.org/10.4230/LIPIcs.ICALP.2019.104

Monadic Decomposability of Regular Relations

Pablo Barceló, Chih-Duo Hong, Xuan-Bach Le, Anthony W. Lin, Reino Niskanen. 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs). Vol 132. https://doi.org/10.4230/LIPIcs.ICALP.2019.103

Monadic decomposability – the ability to determine whether a formula in a given logical theory can be decomposed into a boolean combination of monadic formulas – is a powerful tool for devising a decision procedure for a given logical theory. In this paper, we revisit a classical decision problem in automata theory: given a regular (a.k.a. synchronized rational) relation, determine whether it is recognizable, i.e., it has a monadic decomposition (that is, a representation as a boolean combination of cartesian products of regular languages). Regular relations are expressive formalisms which, using an appropriate string encoding, can capture relations definable in Presburger Arithmetic. In fact, their expressive power coincides with relations definable in a universal automatic structure; equivalently, those definable by finite set interpretations in WS1S (Weak Second Order Theory of One Successor). Determining whether a regular relation admits a recognizable relation was known to be decidable (and in exponential time for binary relations), but its precise complexity has hitherto remained open. Our main contribution is to fully settle the complexity of this decision problem by developing new techniques employing infinite Ramsey theory. The complexity for DFA (resp. NFA) representations of regular relations is shown to be NLOGSPACE-complete (resp. PSPACE-complete).

https://doi.org/10.4230/LIPIcs.ICALP.2019.103

On the reproducibility of experiments of indexing repetitive document collections

Antonio Fariña, Miguel A. Martínez-Prieto, Francisco Claude, Gonzalo Navarro, Juan J. Lastra-Díaz, Nicola Prezza, Diego Seco, On the reproducibility of experiments of indexing repetitive document collections, Information Systems, Volume 83, 2019, Pages 181-194, ISSN 0306-4379. https://doi.org/10.1016/j.is.2019.03.007.

This work introduces a companion reproducible paper with the aim of allowing the exact replication of the methods, experiments, and results discussed in a previous work (Claude et al., 2016). In that parent paper, we proposed many and varied techniques for compressing indexes which exploit the fact that highly repetitive collections are formed mostly of documents that are near-copies of others. More concretely, we describe a replication framework, called uiHRDC (universal indexes for Highly Repetitive Document Collections), that allows our original experimental setup to be easily replicated using various document collections. The corresponding experimentation is carefully explained, providing precise details about the parameters that can be tuned for each indexing solution. Finally, note that we also provide uiHRDC as a reproducibility package.

DOI https://doi.org/10.1016/j.is.2019.03.007

Māori Loanwords: A Corpus of New Zealand English Tweets

David Trye, Andreea Calude, Felipe Bravo-Marquez, Te Taka Keegan. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. DOI: 10.18653/v1/P19-2018

Māori loanwords are widely used in New Zealand English for various social functions by New Zealanders within and outside of the Māori community. Motivated by the lack of linguistic resources for studying how Māori loanwords are used in social media, we present a new corpus of New Zealand English tweets. We collected tweets containing selected Māori words that are likely to be known by New Zealanders who do not speak Māori. Since over 30% of these words turned out to be irrelevant, we manually annotated a sample of our tweets into relevant and irrelevant categories. This data was used to train machine learning models to automatically filter out irrelevant tweets.

https://www.aclweb.org/anthology/P19-2018.pdf

A Lightweight Representation of News Events on Social Media

Mauricio Quezada and Barbara Poblete. 2019. A Lightweight Representation of News Events on Social Media. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'19). Association for Computing Machinery, New York, NY, USA, 1049–1052. DOI: https://doi.org/10.1145/3331184.3331300

The sheer amount of newsworthy information published by users in social media platforms makes it necessary to have efficient and effective methods to filter and organize content. In this scenario, off-the-shelf methods fail to process large amounts of data, which is usually approached by adding more computational resources. Simple data aggregations can help to cope with space and time constraints, while at the same time improving the effectiveness of certain applications, such as topic detection or summarization. We propose a lightweight representation of newsworthy social media data. The proposed representation leverages microblog features, such as redundancy and re-sharing capabilities, by using surrogate texts from shared URLs and word embeddings. Our representation allows us to achieve comparable clustering results to those obtained by using the complete data, while reducing running time and required memory. This is useful when dealing with noisy and raw user-generated social media data.

https://doi.org/10.1145/3331184.3331300

Hate speech detection is not as easy as you may think: A closer look at model validation (extended version)

Aymé Arango, Jorge Pérez, and Barbara Poblete. 2019. Hate Speech Detection is Not as Easy as You May Think: A Closer Look at Model Validation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'19). Association for Computing Machinery, New York, NY, USA, 45–54. DOI: https://doi.org/10.1145/3331184.3331262

Hate speech is an important problem that is seriously affecting the dynamics and usefulness of online social communities. Large-scale social platforms are currently investing important resources into automatically detecting and classifying hateful content, without much success. On the other hand, the results reported by state-of-the-art systems indicate that supervised approaches achieve almost perfect performance but only within specific datasets. In this work, we analyze this apparent contradiction between existing literature and actual applications. We study closely the experimental methodology used in prior work and its generalizability to other datasets. Our findings evidence methodological issues, as well as an important dataset bias. As a consequence, performance claims of the current state of the art have become significantly overestimated. The problems that we have found are mostly related to data overfitting and sampling issues. We discuss the implications for current research and re-conduct experiments to give a more accurate picture of the current state-of-the-art methods.

https://doi.org/10.1145/3331184.3331262

Cell cycle and protein complex dynamics in discovering signaling pathways

Inostroza, D., Hernández, C., Seco, D., Navarro, G., & Olivera-Nappa, A. (2019). Cell cycle and protein complex dynamics in discovering signaling pathways. Journal of Bioinformatics and Computational Biology, 1950011. DOI: https://doi.org/10.1142/s0219720019500112

Signaling pathways are responsible for the regulation of cell processes, such as monitoring the external environment, transmitting information across membranes, and making cell fate decisions. Given the increasing amount of biological data available and the recent discoveries showing that many diseases are related to the disruption of cellular signal transduction cascades, in silico discovery of signaling pathways in cell biology has become an active research topic in past years. However, reconstruction of signaling pathways remains a challenge mainly because of the need for systematic approaches for predicting causal relationships, like edge direction and activation/inhibition among interacting proteins in the signal flow. We propose an approach for predicting signaling pathways that integrates protein interactions, gene expression, phenotypes, and protein complex information. Our method first finds candidate pathways using a directed-edge-based algorithm and then defines a graph model to include causal activation relationships among proteins in candidate pathways, using cell cycle gene expression and phenotypes to infer consistent pathways in yeast. Then, we incorporate protein complex coverage information for deciding on the final predicted signaling pathways. We show that our approach improves the predictive results of the state of the art using different ranking metrics.

DOI https://doi.org/10.1142/S0219720019500112

When is Ontology-Mediated Querying Efficient?

P. Barceló, C. Feier, C. Lutz and A. Pieris, "When is Ontology-Mediated Querying Efficient?," 2019 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), Vancouver, BC, Canada, 2019, pp. 1-13, doi: 10.1109/LICS.2019.8785823

In ontology-mediated querying, description logic (DL) ontologies are used to enrich incomplete data with domain knowledge, which results in more complete answers to queries. However, the evaluation of ontology-mediated queries (OMQs) over relational databases is computationally hard. This raises the question of when OMQ evaluation is efficient, in the sense of being tractable in combined complexity or fixed-parameter tractable. We study this question for a range of ontology-mediated query languages based on several important and widely-used DLs, using unions of conjunctive queries as the actual queries. For the DL ELHI⊥, we provide a characterization of the classes of OMQs that are fixed-parameter tractable. For its fragment ELH⊥^dr, which restricts the use of inverse roles, we provide a characterization of the classes of OMQs that are tractable in combined complexity. Both results are in terms of equivalence to OMQs of bounded tree width and rest on a reasonable assumption from parameterized complexity theory. They are similar in spirit to Grohe’s seminal characterization of the tractable classes of conjunctive queries over relational databases. We further study the complexity of the meta problem of deciding whether a given OMQ is equivalent to an OMQ of bounded tree width, providing several completeness results that range from NP to 2ExpTime, depending on the DL used. We also consider the DL-Lite family of DLs, including members that, unlike ELHI⊥, admit functional roles.

https://doi.org/10.1109/LICS.2019.8785823

A Theoretical View on Reverse Engineering Problems for Database Query Languages

Pablo Barceló; University of Chile (CL). CEUR Workshop Proceedings. DL 2019, International Workshop on Description Logics, June 18-21, 2019. http://ceur-ws.org/Vol-2373/invited-1.pdf

Abstract. A typical reverse engineering problem for a query language L is the following: given a database D and sets P and N of tuples over D labeled as positive and negative examples, respectively, is there a query q in L that explains P and N, i.e., such that the evaluation of q on D contains all positive examples in P and none of the negative examples in N? Applications of reverse engineering problems include query-by-example, classifier engineering, and the study of the expressive power of query languages. In this talk I will present a family of tests that solve the reverse engineering problem described above for several query languages of interest, e.g., FO, CQs, UCQs, RPQs, CRPQs, etc. We will see that in many cases such tests directly provide optimal bounds for the problem, as well as for the size of the smallest query that explains the given labeled examples. I will also present restrictions that alleviate the complexity of the problem when it is too high. Finally, I will develop the relationship between reverse engineering and a separability problem recently introduced to assist the task of feature engineering with data management tools.
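The "explains" condition can be illustrated with a minimal Python sketch. It only checks whether one given query explains the examples; it does not search for such a query, which is the actual reverse engineering problem. The database is assumed to be a single binary relation E stored as a set of pairs, and the names `explains`, `two_step`, `all_edges`, and the sample data are hypothetical.

```python
from typing import Callable, Set, Tuple

Edge = Tuple[str, str]
Database = Set[Edge]                      # one binary relation E(x, y)
Query = Callable[[Database], Set[Edge]]   # a query returning pairs


def explains(q: Query, db: Database, positives: Set[Edge], negatives: Set[Edge]) -> bool:
    """q explains (P, N) on D if q(D) contains all of P and none of N."""
    answers = q(db)
    return positives <= answers and answers.isdisjoint(negatives)


def two_step(db: Database) -> Set[Edge]:
    """Conjunctive query q(x, z) :- E(x, y), E(y, z)."""
    return {(x, z) for (x, y1) in db for (y2, z) in db if y1 == y2}


def all_edges(db: Database) -> Set[Edge]:
    """Conjunctive query q(x, y) :- E(x, y)."""
    return set(db)


db = {("a", "b"), ("b", "c"), ("c", "d")}
P = {("a", "c")}   # positive examples
N = {("a", "d")}   # negative examples

print(explains(two_step, db, P, N))   # two_step(db) is {("a","c"), ("b","d")}
print(explains(all_edges, db, P, N))  # ("a","c") is not an edge of db
```

On this data, `two_step` explains the examples while `all_edges` does not, because the positive example ("a", "c") is missing from its answers.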

http://ceur-ws.org/Vol-2373/invited-1.pdf

The Paradox of Participation Versus Misinformation: Social Media, Political Engagement, and the Spread of Misinformation

Sebastián Valenzuela, Daniel Halpern, James E. Katz & Juan Pablo Miranda (2019) The Paradox of Participation Versus Misinformation: Social Media, Political Engagement, and the Spread of Misinformation, Digital Journalism, 7:6, 802-823, DOI: 10.1080/21670811.2019.1623701

The mechanisms by which users of platforms such as Facebook and Twitter spread misinformation are not well understood. In this study, we argue that the effects of informational uses of social media on political participation are inextricable from its effects on misinformation sharing. That is, political engagement is both a major consequence of using social media for news as well as a key antecedent of sharing misinformation. We test our expectations via a two-wave panel survey of online media users in Chile, a country experiencing information disorders comparable to those of the global North. Analyses of the proposed and alternative causal models with two types of structural equation specifications (fixed effects and autoregressive) support our theoretical model. We close with a discussion on how changes in the way people engage with news and politics – brought about by social media – have produced a new dilemma: how to sustain a citizenry that is enthusiastically politically active, yet not spreading misinformation?

https://doi.org/10.1080/21670811.2019.1623701

Connecting Knowledge Compilation Classes and Width Parameters

Amarilli, A., Capelli, F., Monet, M. et al. Connecting Knowledge Compilation Classes and Width Parameters. Theory Comput Syst 64, 861–914 (2020). https://doi.org/10.1007/s00224-019-09930-2

The field of knowledge compilation establishes the tractability of many tasks by studying how to compile them to Boolean circuit classes obeying some requirements such as structuredness, decomposability, and determinism. However, in other settings such as intensional query evaluation on databases, we obtain Boolean circuits that satisfy some width bounds, e.g., they have bounded treewidth or pathwidth. In this work, we give a systematic picture of many circuit classes considered in knowledge compilation and show how they can be systematically connected to width measures, through upper and lower bounds. Our upper bounds show that bounded-treewidth circuits can be constructively converted to d-SDNNFs, in time linear in the circuit size and singly exponential in the treewidth; and that bounded-pathwidth circuits can similarly be converted to uOBDDs. We show matching lower bounds on the compilation of monotone DNF or CNF formulas to structured targets, assuming a constant bound on the arity (size of clauses) and degree (number of occurrences of each variable): any d-SDNNF (resp., SDNNF) for such a DNF (resp., CNF) must be of exponential size in its treewidth, and the same holds for uOBDDs (resp., n-OBDDs) when considering pathwidth. Unlike most previous work, our bounds apply to any formula of this class, not just a well-chosen family. Hence, we show that pathwidth and treewidth respectively characterize the efficiency of compiling monotone DNFs to uOBDDs and d-SDNNFs with compilation being singly exponential in the corresponding width parameter. We also show that our lower bounds on CNFs extend to unstructured compilation targets, with an exponential lower bound in the treewidth (resp., pathwidth) when compiling monotone CNFs of constant arity and degree to DNNFs (resp., nFBDDs).

https://doi.org/10.1007/s00224-019-09930-2

From Belief in Conspiracy Theories to Trust in Others: Which Factors Influence Exposure, Believing and Sharing Fake News

Halpern D., Valenzuela S., Katz J., Miranda J.P. (2019) From Belief in Conspiracy Theories to Trust in Others: Which Factors Influence Exposure, Believing and Sharing Fake News. In: Meiselwitz G. (eds) Social Computing and Social Media. Design, Human Behavior and Analytics. HCII 2019. Lecture Notes in Computer Science, vol 11578. Springer, Cham. https://doi.org/10.1007/978-3-030-21902-4_16

Drawing on social-psychological and political research, we offer a theoretical model that explains how people become exposed to fake news, come to believe in them and then share them with their contacts. Using two waves of a nationally representative sample of Chileans with internet access, we pinpoint the relevant causal factors. Analysis of the panel data indicates that three groups of variables largely explain these phenomena: (1) Personal and psychological factors such as belief in conspiracy theories, trust in others, education and gender; (2) Frequency and specific uses of social media; and (3) Political views and online activism. Importantly, personal and political-psychological factors are more relevant in explaining this behavior than specific uses of social media.

https://doi.org/10.1007/978-3-030-21902-4_16

An Empirical Analysis of Rumor Detection on Microblogs with Recurrent Neural Networks

Bugueño M., Sepulveda G., Mendoza M. (2019) An Empirical Analysis of Rumor Detection on Microblogs with Recurrent Neural Networks. In: Meiselwitz G. (eds) Social Computing and Social Media. Design, Human Behavior and Analytics. HCII 2019. Lecture Notes in Computer Science, vol 11578. Springer, Cham. https://doi.org/10.1007/978-3-030-21902-4_21

The popularity of microblogging websites makes them important for information dissemination. Large volumes of fake or unverified information can emerge and spread, causing damage. Due to the ever-increasing volume of data and the nature of complex diffusion, automatic rumor detection is a very challenging task. Supervised classification and other approaches have been widely used to identify rumors in social media posts. However, despite achieving competitive results, only a few studies have delved into the nature of the problem itself in order to identify key empirical factors that allow defining both the baseline models and their performance. In this work, we learn discriminative features from tweet content and propagation trees by following their sequential propagation structure. To do this, we study the performance of a number of architectures based on recursive neural networks conditioning for rumor detection. In addition, to ingest tweets into each network, we study the effect of two different word embedding schemes: GloVe and Google News skip-grams. Results on the Twitter16 dataset show that model performance depends on many empirical factors and that some specific experimental configurations consistently lead to better results.

https://doi.org/10.1007/978-3-030-21902-4_21

Estimating Ground Shaking Regions with Social Media Propagation Trees

Mendoza M., Poblete B., Valderrama I. (2019) Estimating Ground Shaking Regions with Social Media Propagation Trees. In: Meiselwitz G. (eds) Social Computing and Social Media. Design, Human Behavior and Analytics. HCII 2019. Lecture Notes in Computer Science, vol 11578. Springer, Cham. https://doi.org/10.1007/978-3-030-21902-4_26

The Mercalli scale of quake damage is based on perceived effects and has a strong dependence on observers. Recently, we proposed a method for ground shaking intensity estimation based on lexical features extracted from tweets, showing good performance in terms of mean absolute error (MAE). One of the flaws of that method is the detection of the region of interest, i.e., the area of a country where the quake was felt. Our previous results showed enough recall in terms of municipality recovery but poor performance in terms of accuracy. One of the reasons that help to explain this effect is the presence of data noise, as many people comment on or confirm a quake in areas where the event was unperceived. This happens because people become aware of an event by watching news or by word-of-mouth propagation. To alleviate this problem in our earthquake detection system, we study how propagation features behave in a region of interest estimation task. The intuition behind our study is that the patterns that characterize a word-of-mouth propagation differ from the patterns that characterize a perceived event. If this intuition is true, we expect to separate both kinds of propagation modes. We do this by computing a number of features to represent propagation trees. Then, we train a learning algorithm using our features in the specific task of region of interest estimation. Our results show that propagation features behave well in this task, outperforming lexical features in terms of accuracy.

https://doi.org/10.1007/978-3-030-21902-4_26

Claim Behavior over Time in Twitter

Weiss F., Espinoza I., Hurtado J., Mendoza M. (2019) Claim Behavior over Time in Twitter. In: Meiselwitz G. (eds) Social Computing and Social Media. Design, Human Behavior and Analytics. HCII 2019. Lecture Notes in Computer Science, vol 11578. Springer, Cham. https://doi.org/10.1007/978-3-030-21902-4_33

Social media is the primary source of information for many people around the world, not only to know about their families and friends but also to read about news and trends in different areas of interest. Fake News or rumors can generate big problems of misinformation, being able to change the mindset of a large group of people concerning a specific topic. Many companies and researchers have put their efforts into detecting these rumors with machine learning algorithms creating reports of the influence of these “news” in social media (https://www.knightfoundation.org/reports/disinformation-fake-news-and-influence-campaigns-on-twitter). Only a few studies have been made in detecting rumors in real-time, considering the first hours of propagation. In this work, we study the spread of a claim, analyzing different characteristics and how propagation patterns behave in time. Experiments show that rumors have different behaviours that can be used to classify them within the first hours of propagation.

https://doi.org/10.1007/978-3-030-21902-4_33

Trajectory Patterns Based on Segment-Cutting Clustering

Luis Cabrera-Crot, Andrea Rodríguez, and Diego Seco; Universidad de Concepción, Chile. Mónica Caniupán, Universidad del Bío-Bío, Chile. CEUR Workshop Proceedings. AMW 2019, June 3-7, 2019. http://ceur-ws.org/Vol-2369/paper02.pdf

Trajectory patterns characterize similar behaviors among trajectories, which play an important role in applications such as urban planning, traffic congestion control, and studies of animal migration and natural phenomena. In this paper we model trajectories as a sequence of line segments that represent the steady movement of an object along time. We use a segment-clustering process to group trajectories’ segments and partial segments based on their temporal and spatial closeness. We then define a trajectory pattern that results from the aggregation of segment clusters, an aggregation that is based not only on spatial and temporal sequentiality, but also on the compatibility of trajectories in each segment cluster. The experimental assessment shows the effectiveness of the method.

LINK: http://ceur-ws.org/Vol-2369/paper02.pdf

Linear Recursion in G-CORE

Valentina Urzua and Claudio Gutierrez. Department of Computer Science, Universidad de Chile and IMFD. CEUR Workshop Proceedings. Alberto Mendelzon Workshop on Foundations of Data Management. June 3–7, 2019. http://ceur-ws.org/Vol-2369/short07.pdf

G-CORE is a query language with two key characteristics: it is closed under graphs and incorporates paths as first-class citizens. Currently G-CORE does not have recursion. In this paper we propose this extension and show how to code classical polynomial graph algorithms with it.

LINK: http://ceur-ws.org/Vol-2369/short07.pdf

RDF and Property Graphs Interoperability: Status and Issues

Renzo Angles, Universidad de Talca, Chile; Millennium Institute for Foundational Research on Data, Chile. Harsh Thakkar, University of Bonn, Germany. Dominik Tomaszuk, University of Bialystok, Poland. CEUR Workshop Proceedings, AMW 2019, June 3-7, 2019. http://ceur-ws.org/Vol-2369/paper01.pdf

RDF and Property Graph databases are two approaches for data management that are based on modeling, storing and querying graph-like data. In this paper, we present a short study about the interoperability between these approaches. We review the current solutions to the problem, identify their features, and discuss the inherent issues.

LINK: http://ceur-ws.org/Vol-2369/paper01.pdf

Preferences for Redistribution and Tax Burdens in Latin America

Bogliaccini, J., & Luna, J. (2019). Preferences for Redistribution and Tax Burdens in Latin America. In G. Flores-Macías (Ed.), The Political Economy of Taxation in Latin America (pp. 219-241). Cambridge: Cambridge University Press. DOI https://doi.org/10.1017/9781108655934.009

DOI https://doi.org/10.1017/9781108655934.009

Agenda Setting and Journalism

Valenzuela, S. (2019, June 25). Agenda Setting and Journalism. Oxford Research Encyclopedia of Communication. Retrieved 27 Oct. 2020, from https://oxfordre.com/communication/view/10.1093/acrefore/9780190228613.001.0001/acrefore-9780190228613-e-777.

People use the news media to learn about the world beyond their family, neighborhood, and workplace. As news consumers, we depend on what television, social media, websites, radio stations, and newspapers decide to inform us about. This is because all news media, whether through journalists or digital algorithms, select, process, and filter information to their users. Over time, the aspects that are prominent in the news media usually become prominent in public opinion. The ability of journalists to influence which issues, aspects of these issues, and persons related to these issues, are perceived as the most salient has come to be called the agenda-setting effect of journalism.

First described by Maxwell McCombs and Donald Shaw in a seminal study conducted during the 1968 elections in the United States, agenda-setting theory has expanded to include several other aspects beyond the transfer of salience of issues from the media agenda to the public agenda. These aspects include: the influence of journalism on the attributes of issues and people that make news; the networks between the different elements in the media and public agendas; the determinants of the news media agenda; the psychological mechanisms that regulate agenda-setting effects; and the consequences of agenda setting on both citizens’ and policymakers’ attitudes and behaviors. As one of the most comprehensive and international theories of journalism studies available, agenda setting continues to evolve in the expanding digital media landscape.

https://doi.org/10.1093/acrefore/9780190228613.013.777

Testing the Hypothesis of “Impressionable Years” With Willingness to Self-Censor in Chile

Nicolle Etchegaray, Andrés Scherman, Sebastián Valenzuela, Testing the Hypothesis of “Impressionable Years” With Willingness to Self-Censor in Chile, International Journal of Public Opinion Research, Volume 31, Issue 2, Summer 2019, Pages 331–348, https://doi.org/10.1093/ijpor/edy012

This study seeks to deepen our understanding of the factors that explain individuals’ willingness to self-censor (WtSC)—the proclivity to withhold an opinion from an audience perceived to disagree with that opinion. It does so by testing the “impressionable years” hypothesis, which states that the historical context experienced between the age of 18 and 25 years has a lasting effect on individual dispositions such as WtSC. The study was conducted in Chile, an ideal case to explore possible cohort effects because of the profound political changes experienced there in the past 50 years. Analysis of an original cross-sectional survey shows that—as expected—people who came of age in periods of political repression exhibit significantly higher levels of WtSC later in life compared with those who grew up during less repressive times.

https://doi.org/10.1093/ijpor/edy012

Regularizing Conjunctive Features for Classification

Pablo Barceló, Alexander Baumgartner, Victor Dalmau, and Benny Kimelfeld. 2019. Regularizing Conjunctive Features for Classification. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS '19). Association for Computing Machinery, New York, NY, USA, 2–16. DOI: https://doi.org/10.1145/3294052.3319680

We consider the feature-generation task wherein we are given a database with entities labeled as positive and negative examples, and the goal is to find feature queries that allow for a linear separation between the two sets of examples. We focus on conjunctive feature queries, and explore two fundamental problems: (a) deciding whether separating feature queries exist (separability), and (b) generating such queries when they exist. In the approximate versions of these problems, we allow a predefined fraction of the examples to be misclassified. To restrict the complexity of the generated classifiers, we explore various ways of regularizing (i.e., imposing simplicity constraints on) them by limiting their dimension, the number of joins in feature queries, and their generalized hypertree width (ghw). Among other results, we show that the separability problem is tractable in the case of bounded ghw; yet, the generation problem is intractable, simply because the feature queries might be too large. So, we explore a third problem: classifying new entities without necessarily generating the feature queries. Interestingly, in the case of bounded ghw we can efficiently classify without ever explicitly generating the feature queries.

https://doi.org/10.1145/3294052.3319680

Expressiveness of Matrix and Tensor Query Languages in terms of ML Operators

Pablo Barceló, Nelson Higuera, Jorge Pérez, and Bernardo Subercaseaux. 2019. Expressiveness of Matrix and Tensor Query Languages in terms of ML Operators. In Proceedings of the 3rd International Workshop on Data Management for End-to-End Machine Learning (DEEM'19). Association for Computing Machinery, New York, NY, USA, Article 9, 1–5. DOI: https://doi.org/10.1145/3329486.3329498

Tensors are one of the most widely used data structures in modern Machine Learning applications. Although they provide a flexible way of storing and accessing data, they often expose too many low-level details that may result in error-prone code that is difficult to maintain and extend. Abstracting low-level functionalities into high-level operators in the form of a query language is a task in which the Data Management community has extensive experience. It is thus important to understand how such experience can be applied in the design of useful languages for tensor manipulation.

In this short paper we study a matrix and a tensor query language that have been recently proposed in the database literature. We show, by using examples, how these proposals are in line with the practical interest in rethinking tensor abstractions. On the technical side, we compare the two languages in terms of operators that naturally arise in Machine Learning pipelines, such as convolution, matrix-inverse, and Einstein summation. We hope our results provide a theoretical kick-off for the discussion on the design of core declarative query languages for tensors.

https://doi.org/10.1145/3329486.3329498

Database Repairs and Consistent Query Answering: Origins and Further Developments

Leopoldo Bertossi. 2019. Database Repairs and Consistent Query Answering: Origins and Further Developments. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS '19). Association for Computing Machinery, New York, NY, USA, 48–58. DOI: https://doi.org/10.1145/3294052.3322190

In this article we review the main concepts around database repairs and consistent query answering, with emphasis on tracing back the origin, motivation, and early developments. We also describe some research directions that have spun from those main concepts and the original line of research. We emphasize, in particular, fruitful and recent connections between repairs and causality in databases.

https://doi.org/10.1145/3294052.3322190

Rapid Sequence Matching for Visualization Recommender Systems

Shaoliang Nie, Christopher G. Healey, Rada Y. Chirkova, and Juan L. Reutter. 2019. Rapid Sequence Matching for Visualization Recommender Systems. In Proceedings of the 45th Graphics Interface Conference on Proceedings of Graphics Interface 2019 (GI'19). Canadian Human-Computer Communications Society, Waterloo, CAN, Article 5, 1–8. DOI: https://doi.org/10.20380/GI2019.05

We present a method to support high quality visualization recommendations for analytic tasks. Visualization converts large datasets into images that allow viewers to efficiently explore, discover, and validate within their data. Visualization recommenders have been proposed that store past sequences: an ordered collection of design choices leading to successful task completion; then match them against an ongoing visualization construction. Based on this matching, a system recommends visualizations that better support the analysts’ tasks. A problem of scalability occurs when many sequences are stored. One solution would be to index the sequence database. However, during matching we require sequences that are similar to the partially constructed visualization, not only those that are identical. We implement a locality sensitive hashing algorithm that converts visualizations into set representations, then uses Jaccard similarity to store similar sequence nodes in common hash buckets. This allows us to match partial sequences against a database containing tens of thousands of full sequences in less than 100ms. Experiments show that our algorithm locates 95% or more of the sequences found in an exhaustive search, producing high-quality visualization recommendations.

LINK: https://doi.org/10.20380/GI2019.05
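
The bucketing idea in the abstract above can be illustrated with a small MinHash index. This is only a minimal sketch of a generic locality-sensitive hashing scheme for Jaccard similarity, not the authors' implementation: the class name `MinHashIndex`, the number of hash functions, the banding layout, and the string encoding of design choices are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def jaccard(a, b):
    """Exact Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _seeded_hash(seed, item):
    # Deterministic 64-bit hash of (seed, item); SHA-1 is used only for mixing.
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=32):
    """MinHash signature: the minimum hash value under each seeded hash."""
    items = set(items)
    if not items:
        raise ValueError("cannot sign an empty set")
    return tuple(min(_seeded_hash(seed, x) for x in items)
                 for seed in range(num_hashes))


class MinHashIndex:
    """Hypothetical index: sets sharing any signature band share a bucket."""

    def __init__(self, num_hashes=32, bands=8):
        if num_hashes % bands != 0:
            raise ValueError("bands must divide num_hashes")
        self.num_hashes = num_hashes
        self.bands = bands
        self.rows = num_hashes // bands
        self.buckets = defaultdict(set)
        self.sets = {}

    def _band_keys(self, items):
        sig = minhash_signature(items, self.num_hashes)
        for b in range(self.bands):
            yield (b, sig[b * self.rows:(b + 1) * self.rows])

    def add(self, key, items):
        self.sets[key] = set(items)
        for band_key in self._band_keys(items):
            self.buckets[band_key].add(key)

    def query(self, items, threshold=0.5):
        # Collect bucket-mates, then confirm with the exact similarity.
        candidates = set()
        for band_key in self._band_keys(items):
            candidates |= self.buckets.get(band_key, set())
        return sorted(c for c in candidates
                      if jaccard(items, self.sets[c]) >= threshold)
```

A query only compares against entries that share at least one bucket, which is what lets near-matches be found without scanning every stored set; the final exact check removes false positives.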

Efficient Logspace Classes for Enumeration, Counting, and Uniform Generation

Marcelo Arenas, Luis Alberto Croquevielle, Rajesh Jayaram, Cristian Riveros. PODS '19: Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, June 2019, Pages 59–73. https://doi.org/10.1145/3294052.3319704

We study two simple yet general complexity classes, which provide a unifying framework for efficient query evaluation in areas like graph databases and information extraction, among others. We investigate the complexity of three fundamental algorithmic problems for these classes: enumeration, counting and uniform generation of solutions, and show that they have several desirable properties in this respect. Both complexity classes are defined in terms of nondeterministic logarithmic-space transducers (NL transducers). For the first class, we consider the case of unambiguous NL transducers, and we prove constant delay enumeration, and both counting and uniform generation of solutions in polynomial time. For the second class, we consider unrestricted NL transducers, and we obtain polynomial delay enumeration, approximate counting in polynomial time, and polynomial-time randomized algorithms for uniform generation. More specifically, we show that each problem in this second class admits a fully polynomial-time randomized approximation scheme (FPRAS) and a polynomial-time Las Vegas algorithm (with preprocessing) for uniform generation. Remarkably, the key idea to prove these results is to show that the fundamental problem #NFA admits an FPRAS, where #NFA is the problem of counting the number of strings of length n (given in unary) accepted by a nondeterministic finite automaton (NFA). While this problem is known to be #P-complete and, more precisely, SpanL-complete, it was open whether this problem admits an FPRAS. In this work, we solve this open problem, and obtain as a welcome corollary that every function in SpanL admits an FPRAS.

LINK: https://doi.org/10.1145/3294052.3319704
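
To make the #NFA problem from the abstract above concrete, the following sketch counts accepted strings exactly by tracking, for each reachable set of states, how many strings of the current length lead to it. This is not the FPRAS from the paper: it is a plain exact method whose running time can be exponential in the number of states, and the function name `count_accepted` and the dictionary encoding of the transition relation are assumptions made for the example.

```python
from collections import Counter


def count_accepted(alphabet, delta, start, accepting, n):
    """Number of distinct strings of length n accepted by an NFA.

    delta maps (state, symbol) to an iterable of successor states.
    Strings are grouped by the set of states they can reach, so a string
    with several accepting runs is still counted once.
    """
    layer = Counter({frozenset([start]): 1})
    for _ in range(n):
        next_layer = Counter()
        for subset, count in layer.items():
            for symbol in alphabet:
                target = frozenset(
                    t for s in subset for t in delta.get((s, symbol), ())
                )
                if target:  # drop strings on which every run dies
                    next_layer[target] += count
        layer = next_layer
    accepting = set(accepting)
    return sum(c for subset, c in layer.items() if subset & accepting)


# Example NFA over {a, b} accepting strings that contain at least one 'a'.
# It is ambiguous: a string with several 'a's has several accepting runs.
delta = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "a"): {1},
    (1, "b"): {1},
}
print(count_accepted("ab", delta, start=0, accepting={1}, n=3))
```

Counting accepting runs instead of strings would overcount on ambiguous automata such as this one, which is one way to see why the unambiguous case is the easier of the two classes.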

Repair-Based Degrees of Database Inconsistency

Bertossi L. (2019) Repair-Based Degrees of Database Inconsistency. In: Balduccini M., Lierler Y., Woltran S. (eds) Logic Programming and Nonmonotonic Reasoning. LPNMR 2019. Lecture Notes in Computer Science, vol 11481. Springer, Cham. https://doi.org/10.1007/978-3-030-20528-7_15

We propose and investigate a concrete numerical measure of the inconsistency of a database with respect to a set of integrity constraints. It is based on a database repair semantics associated with cardinality-repairs. More specifically, it is shown that the computation of this measure can be intractable in data complexity, but answer-set programs are exhibited that can be used to compute it. Furthermore, it is established that there are polynomial-time deterministic and randomized approximations. The behavior of this measure under small updates is analyzed, obtaining fixed-parameter tractability results. We explore abstract extensions of this measure that appeal to generic classes of database repairs. Inconsistency measures and repairs at the attribute level are investigated as a particular, but relevant and natural case.

https://doi.org/10.1007/978-3-030-20528-7_15

Using Twitter to Infer User Satisfaction With Public Transport: The Case of Santiago, Chile

J. T. Méndez, H. Lobel, D. Parra and J. C. Herrera, "Using Twitter to Infer User Satisfaction With Public Transport: The Case of Santiago, Chile," in IEEE Access, vol. 7, pp. 60255-60263, 2019, doi: 10.1109/ACCESS.2019.2915107

User satisfaction is an important aspect to consider in any public transport system, and as such, regular and sound measurements of its levels are fundamental. However, typical evaluation schemes involve costly and time-consuming surveys. As a consequence, their frequency is not enough to characterize the satisfaction of the users properly and in a timely manner. In this paper, we propose a methodology, based on Twitter data, to capture the satisfaction of a large mass of users of public transport, allowing us to improve the characterization and location of their satisfaction level. We analyzed a massive volume of tweets referring to the public transport system in Santiago, Chile (Transantiago) using text mining techniques, such as sentiment analysis and topic modeling, in order to capture and group bus users’ expressions. Results show that, although the level of detail and variety of answers obtained from surveys are higher than the ones obtained by our method, the number of bus stops and bus services covered by the proposed scheme is larger. Moreover, the proposed methodology can be effectively used to diagnose problems in a timely manner, as it is able to identify and locate trends, and issues related to bus operating firms, whereas surveys tend to produce average answers. Based on the consistency and logic of the results, we argue that the proposed methodology can be used as a valuable complement to surveys, as both present different, but compatible characteristics.

https://doi.org/10.1109/ACCESS.2019.2915107

AffectiveTweets: a Weka Package for Analyzing Affect in Tweets

Bravo-Marquez, Felipe & Pfahringer, Bernhard & Mohammad, Saif & Frank, Eibe. (2019). AffectiveTweets: a Weka Package for Analyzing Affect in Tweets. Journal of Machine Learning Research. 20. 1-6. https://www.jmlr.org/papers/volume20/18-450/18-450.pdf

AffectiveTweets is a set of programs for analyzing emotion and sentiment of social media messages such as tweets. It is implemented as a package for the Weka machine learning workbench and provides methods for calculating state-of-the-art affect analysis features from tweets that can be fed into machine learning algorithms implemented in Weka. It also implements methods for building affective lexicons and distant supervision methods for training affective models from unlabeled tweets. The package was used by several teams in the shared tasks: EmoInt 2017 and Affect in Tweets SemEval 2018 Task 1.

https://www.jmlr.org/papers/volume20/18-450/18-450.pdf

Sketch-Aided Retrieval of Incomplete 3D Cultural Heritage Objects

Stefan Lengauer, Alexander Komar, Arniel Labrada, Stephan Karl, Elisabeth Trinkl, Reinhold Preiner, Benjamin Bustos, Tobias Schreck. 12th Eurographics Workshop on 3D Object Retrieval, Italy, May 2019. https://doi.org/10.2312/3dor.20191057

Due to advances in digitization technology, documentation efforts and digital library systems, increasingly large collections of visual Cultural Heritage (CH) object data become available, offering rich opportunities for domain analysis, e.g., for comparing, tracing and studying objects created over time. In principle, existing shape- and image-based similarity search methods can aid such domain analysis tasks. However, in practice, visual object data are given in different modalities, including 2D, 3D, sketches or conventional drawings like profile sections or unwrappings. In addition, collections may be distributed across different publications and repositories, posing a challenge for implementing encompassing search and analysis systems. We introduce a methodology and system for cross-modal visual search in CH object data. Specifically, we propose a new query modality based on 3D views enhanced by user sketches (3D+sketch). This allows for adding new context to the search, which is useful, e.g., for searching based on incomplete query objects, or for testing hypotheses on the existence of certain shapes in a collection. We present an appropriately designed workflow for constructing query views from incomplete 3D objects enhanced by a user sketch based on shape completion and texture inpainting. Visual cues additionally help users compare retrieved objects with the query. We apply our method on a set of relevant 3D and view-based CH object data, demonstrating the feasibility of our approach and its potential to support analysis of domain experts in Archaeology and the field of CH in general.

https://doi.org/10.2312/3dor.20191057

Bitcoin Price Prediction Through Opinion Mining

Germán Cheuque Cerda and Juan L. Reutter. 2019. Bitcoin Price Prediction Through Opinion Mining. In Companion Proceedings of The 2019 World Wide Web Conference (WWW '19). Association for Computing Machinery, New York, NY, USA, 755–762. DOI: https://doi.org/10.1145/3308560.3316454

The Bitcoin protocol and its underlying cryptocurrency have started to shape the way we view digital currency, and opened up a large list of new and interesting challenges. Amongst them, we focus on the question of how the price of digital currencies is affected, which is a natural question especially when considering the price rollercoaster we witnessed for bitcoin in 2017-2018. We work under the hypothesis that price is affected by the web footprint of influential people, whom we refer to as crypto-influencers.

In this paper we provide neural models for predicting bitcoin price. We compare what happens when the model is fed only with recent price history versus what happens when fed, in addition, with a measure of the positivity or negativity of the sayings of these influencers, measured through a sentiment analysis of their Twitter posts. We show preliminary evidence that Twitter data should indeed help to predict the price of bitcoin, even though the measures we use in this paper have a lot of room for refinement. In particular, we also discuss the challenges of measuring the correct sensation of these posts, and discuss the work that should help improve our discoveries even further.

https://doi.org/10.1145/3308560.3316454

NIFify: Towards Better Quality Entity Linking Datasets

Henry Rosales-Méndez, Aidan Hogan, and Barbara Poblete. 2019. NIFify: Towards Better Quality Entity Linking Datasets. In Companion Proceedings of The 2019 World Wide Web Conference (WWW '19). Association for Computing Machinery, New York, NY, USA, 815–818. DOI: https://doi.org/10.1145/3308560.3316465

The Entity Linking (EL) task identifies entity mentions in a text corpus and associates them with a corresponding unambiguous entry in a Knowledge Base. The evaluation of EL systems relies on the comparison of their results against gold standards. A common format used to represent gold standard datasets is the NLP Interchange Format (NIF), which uses RDF as a data model. However, creating gold standard datasets for EL is a time-consuming and error-prone process. In this paper we propose a tool called NIFify to help manually generate, curate, visualize and validate EL annotations; the resulting tool is useful, for example, in the creation of gold standard datasets. NIFify also serves as a benchmark tool that enables the assessment of EL results. Using the validation features of NIFify, we further explore the quality of popular EL gold standards.

https://doi.org/10.1145/3308560.3316465

Car Theft Reports: a Temporal Analysis from a Social Media Perspective

Juglar Diaz and Barbara Poblete. 2019. Car Theft Reports: a Temporal Analysis from a Social Media Perspective. In Companion Proceedings of The 2019 World Wide Web Conference (WWW '19). Association for Computing Machinery, New York, NY, USA, 779–782. DOI: https://doi.org/10.1145/3308558.3316462

Complex human behaviors related to crime require multiple sources of information to understand them. Social Media is a place where people share opinions and news. This allows events in the physical world like crimes to be reflected on Social Media. In this paper we study crimes from the perspective of Social Media, specifically car theft and Twitter. We use data of car theft reports from Twitter and car insurance companies in Chile to perform a temporal analysis. We found that there is an increasing correlation in recent years between the number of car theft reports on Twitter and data collected from insurance companies. We performed yearly, monthly, daily and hourly analyses. Though Twitter is an unstructured and very noisy source, it allows you to estimate the volume of thefts that are reported by the insurers. We experimented with a Moving Average to predict the tendency in the number of car thefts reported to insurers using Twitter data and found that one month is the best time window for prediction.

https://doi.org/10.1145/3308558.3316462
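
The moving-average idea mentioned in the abstract above can be sketched as follows. This is a generic trailing moving average over a single series with an error measure for comparing window sizes. How the study combines Twitter counts with insurer data is not described here, so the function names, the univariate setup, and the use of mean absolute error are assumptions for illustration only.

```python
def moving_average_forecast(series, window):
    """Forecast each point as the mean of the `window` values before it.

    Returns one forecast per position window..len(series); the last entry
    is the forecast for the next, not yet observed, point.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(series[t - window:t]) / window
        for t in range(window, len(series) + 1)
    ]


def mean_abs_error(series, window):
    """Mean absolute error of the forecasts against the observed values."""
    if len(series) <= window:
        raise ValueError("series must be longer than the window")
    predictions = moving_average_forecast(series, window)[:-1]
    observed = series[window:]
    return sum(abs(p - o) for p, o in zip(predictions, observed)) / len(observed)


def best_window(series, candidate_windows):
    """Candidate window with the lowest mean absolute error."""
    return min(candidate_windows, key=lambda w: mean_abs_error(series, w))
```

With daily counts, calling `best_window(counts, [7, 14, 30])` would compare one-week, two-week and one-month windows under this error measure.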

Mining the Relationship Between Car Theft and Places of Social Interest in Santiago Chile

Karen Oróstica and Barbara Poblete. 2019. Mining the Relationship Between Car Theft and Places of Social Interest in Santiago Chile. In Companion Proceedings of The 2019 World Wide Web Conference (WWW '19). Association for Computing Machinery, New York, NY, USA, 811–814. DOI: https://doi.org/10.1145/3308558.3316464

Recent work suggests that certain places can be more attractive for car theft based on how many people regularly visit them, as well as other factors. In this sense, we must also consider the city or district itself where vehicles are stolen. All cities have different cultural and socioeconomic characteristics that influence car theft patterns. In particular, the distribution of public services and places that attract large crowds could play a key role in the occurrence of car theft. Santiago, a city that displays drastic socioeconomic differences among its districts, presents increasingly high car theft rates. This represents a serious issue for the city, as for any other major city, which, at least for Santiago, has not been analyzed in depth using quantitative approaches. In this work, we present a preliminary study of how places that create social interest, such as restaurants, bars, schools, and shopping malls, increase car theft frequency in Santiago. We also study if some types of places are more attractive than others for this type of crime. To evaluate this, we propose to analyze car theft points (CTP) from insurance companies and their relationship with places of social interest (PSI) extracted from Google Maps, using a proximity-based approach. Our findings show a high correlation between CTP and PSI for all of the social interest categories that we studied in the different districts of Santiago. In particular, our work contributes to the understanding of the social factors that are associated with car thefts.

https://doi.org/10.1145/3308558.3316464
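
One simple way to read the proximity-based approach mentioned in the abstract above is to count, for each car theft point (CTP), how many places of social interest (PSI) lie within a fixed radius. The sketch below does this with great-circle distances. The 200 m radius, the function names and the sample coordinates are assumptions for illustration; the abstract does not say which distance or threshold the study uses.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_counts(theft_points, places, radius_m=200.0):
    """For each theft point, the number of places within radius_m metres."""
    return [
        sum(1 for p in places
            if haversine_m(t[0], t[1], p[0], p[1]) <= radius_m)
        for t in theft_points
    ]


# Hypothetical coordinates near central Santiago, for illustration only.
thefts = [(-33.4500, -70.6600)]
psi = [(-33.4501, -70.6600), (-33.5000, -70.6600)]
print(nearby_counts(thefts, psi))
```

The resulting counts per theft point, or per district after aggregation, are the kind of quantity that could then be correlated with theft frequency.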

Recommender Systems for Online Video Game Platforms: the Case of STEAM

Germán Cheuque, José Guzmán, and Denis Parra. 2019. Recommender Systems for Online Video Game Platforms: the Case of STEAM. In Companion Proceedings of The 2019 World Wide Web Conference (WWW '19). Association for Computing Machinery, New York, NY, USA, 763–771. DOI: https://doi.org/10.1145/3308560.3316457

The world of video games has changed considerably over recent years. Its diversification has dramatically increased the number of users engaged in online communities of this entertainment area, and consequently, the number and types of games available. This context of information overload underpins the development of recommender systems that could leverage the information that the video game platforms collect, hence following the trend of new games coming out every year. In this work we test the potential of state-of-the-art recommender models based respectively on Factorization Machines (FM), deep neural networks (DeepNN) and one derived from the mixture of both (DeepFM), chosen for their potential of receiving multiple inputs as well as different types of input variables. We evaluate our results measuring the ranking accuracy of the recommendation and the diversity/novelty of a recommendation list. All the algorithms achieve better results than a baseline based on implicit feedback (Alternating Least Squares model). The best performing algorithm is DeepNN; high-order interactions are more important than low-order ones for this recommendation task. We also analyze the effect of the sentiment extracted directly from game reviews, and find that it is not as relevant for recommendation as one might expect. We are the first to study the aforementioned recommender systems in the context of online video game platforms, reporting novel results which could be used as baselines in future work.

https://doi.org/10.1145/3308560.3316457

Serialization for Property Graphs

Tomaszuk D., Angles R., Szeremeta Ł., Litman K., Cisterna D. (2019) Serialization for Property Graphs. In: Kozielski S., Mrozek D., Kasprowski P., Małysiak-Mrozek B., Kostrzewa D. (eds) Beyond Databases, Architectures and Structures. Paving the Road to Smart Data Processing and Analysis. BDAS 2019. Communications in Computer and Information Science, vol 1018. Springer, Cham. https://doi.org/10.1007/978-3-030-19093-4_5

Graph serialization is very important for the development of graph-oriented applications. In particular, serialization methods are fundamental in graph data management to support database exchange, benchmarking of systems, and data visualization. This paper presents YARS-PG, a data format for serializing property graphs. YARS-PG was designed to be simple, extensible and platform independent, and to support all the features provided by the current database systems based on the property graph data model.

LINK: https://doi.org/10.1007/978-3-030-19093-4_5

Exposing the president: The political angle of a natural disaster in Chile

Magdalena Saldaña. International Symposium on Online Journalism ISOJ 2019. https://isoj.org/research/exposing-the-president-the-political-angle-of-a-natural-disaster-in-chile/

Chile is a country with high levels of digital news consumption but decreasing levels of confidence in journalism and traditional news media outlets. In a place where natural disasters are common, Chilean citizens usually turn to digital and social media to find out more information about how events unfold. By relying on in-depth interviews with reporters who covered the 2014 earthquake in northern Chile, this study examines how Chilean journalists approached a highly politicized natural disaster. Results show that reporters covered the earthquake as a political issue due to editorial prompting, and they used social media as another way to get close to the sources they already know, but not to look for alternative sources. The implications of these findings for media scholars and practitioners relate to the normalization of social media use among journalists, and the influence of a news outlet’s political leaning on journalistic practices.

https://isoj.org/research/exposing-the-president-the-political-angle-of-a-natural-disaster-in-chile/

Proof-of-Learning: A Blockchain Consensus Mechanism Based on Machine Learning Competitions

F. Bravo-Marquez, S. Reeves and M. Ugarte, "Proof-of-Learning: A Blockchain Consensus Mechanism Based on Machine Learning Competitions," 2019 IEEE International Conference on Decentralized Applications and Infrastructures (DAPPCON), Newark, CA, USA, 2019, pp. 119-124, doi: 10.1109/DAPPCON.2019.00023

This article presents WekaCoin, a peer-to-peer cryptocurrency based on a new distributed consensus protocol called Proof-of-Learning. Proof-of-learning achieves distributed consensus by ranking machine learning systems for a given task. The aim of this protocol is to alleviate the computational waste involved in hashing-based puzzles and to create a public distributed and verifiable database of state-of-the-art machine learning models and experiments.

https://doi.org/10.1109/DAPPCON.2019.00023

Assessing the best edit in perturbation-based iterative refinement algorithms to compute the median string

Mirabal, P., Abreu, J., & Seco, D. (2019). Assessing the best edit in perturbation-based iterative refinement algorithms to compute the median string. Pattern Recognition Letters, 120, 104–111. DOI: https://doi.org/10.1016/j.patrec.2019.02.004

Different pattern recognition techniques such as clustering, k-nearest neighbor classification, or instance reduction algorithms require prototypes to represent pattern classes. In many applications, strings are used to encode instances, for example, in contour representations or in biological data such as DNA, RNA, and protein sequences. Median strings have been used as representatives of a set of strings in different domains. Finding the median string is an NP-complete problem for several formulations. Alternatively, heuristic approaches that iteratively refine an initial coarse solution by applying edit operations have been proposed. We propose here a novel algorithm that outperforms the state-of-the-art heuristic approximations to the median string in terms of convergence speed by estimating the effect of a perturbation in the minimization of the expressions that define the median strings. We present comparative experiments to validate these results.

DOI https://doi.org/10.1016/j.patrec.2019.02.004
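
The family of heuristics that the abstract above refers to, iteratively refining a candidate by single edit operations, can be sketched as below. This is a generic greedy baseline that evaluates every possible single edit in each round. It is not the algorithm proposed in the paper, whose point is to estimate the effect of an edit rather than test them all. The function names and the choice of the set median as the starting point are assumptions for illustration.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def total_distance(candidate, strings):
    """Objective to minimise: sum of distances to every string in the set."""
    return sum(levenshtein(candidate, s) for s in strings)


def single_edits(candidate, alphabet):
    """All strings one deletion, substitution or insertion away."""
    for i in range(len(candidate)):
        yield candidate[:i] + candidate[i + 1:]
        for c in alphabet:
            if c != candidate[i]:
                yield candidate[:i] + c + candidate[i + 1:]
    for i in range(len(candidate) + 1):
        for c in alphabet:
            yield candidate[:i] + c + candidate[i:]


def approximate_median(strings, max_rounds=100):
    """Greedy refinement starting from the best string in the set itself."""
    if not strings:
        raise ValueError("need at least one string")
    alphabet = sorted(set("".join(strings)))
    best = min(strings, key=lambda s: total_distance(s, strings))
    best_cost = total_distance(best, strings)
    for _ in range(max_rounds):
        candidate = min(single_edits(best, alphabet),
                        key=lambda s: total_distance(s, strings),
                        default=None)
        if candidate is None:
            break
        cost = total_distance(candidate, strings)
        if cost >= best_cost:  # no single edit improves the objective
            break
        best, best_cost = candidate, cost
    return best, best_cost
```

For the set `["aab", "aba", "baa"]`, every member has total distance 4 to the set, while a single edit reaches a string with total distance 3, so the refinement step improves on the set median.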

Content-based artwork recommendation: integrating painting metadata with neural and manually-engineered visual features

Messina, P., Dominguez, V., Parra, D. et al. Content-based artwork recommendation: integrating painting metadata with neural and manually-engineered visual features. User Model User-Adap Inter 29, 251–290 (2019). https://doi.org/10.1007/s11257-018-9206-9

Recommender Systems help us deal with information overload by suggesting relevant items based on our personal preferences. Although there is a large body of research in areas such as movies or music, artwork recommendation has received comparatively little attention, despite the continuous growth of the artwork market. Most previous research has relied on ratings and metadata, and a few recent works have exploited visual features extracted with deep neural networks (DNN) to recommend digital art. In this work, we contribute to the area of content-based artwork recommendation of physical paintings by studying the impact of the aforementioned features (artwork metadata, neural visual features), as well as manually-engineered visual features, such as naturalness, brightness and contrast. We implement and evaluate our method using transactional data from UGallery.com, an online artwork store. Our results show that artwork recommendations based on a hybrid combination of artist preference, curated attributes, deep neural visual features and manually-engineered visual features produce the best performance. Moreover, we discuss the trade-off between automatically obtained DNN features and manually-engineered visual features for the purpose of explainability, as well as the impact of user profile size on predictions. Our research informs the development of next-generation content-based artwork recommenders which rely on different types of data, from text to multimedia.

https://doi.org/10.1007/s11257-018-9206-9

Dv2v: A Dynamic Variable-to-Variable Compressor

N. R. Brisaboa, A. Fariña, A. Gómez-Brandón, G. Navarro and T. V. Rodeiro, "Dv2v: A Dynamic Variable-to-Variable Compressor," 2019 Data Compression Conference (DCC), Snowbird, UT, USA, 2019, pp. 83-92, https://doi.org/10.1109/DCC.2019.00016

We present D-v2v, a new dynamic (one-pass) variable-to-variable compressor. Variable-to-variable compression aims at using a modeler that gathers variable-length input symbols and a variable-length statistical coder that assigns shorter codewords to the more frequent symbols. In D-v2v, we process the input text word-wise to gather variable-length symbols that can be either terminals (new words) or non-terminals, subsequences of words seen before in the input text. Those input symbols are set in a vocabulary that is kept sorted by frequency. Therefore, those symbols can be easily encoded with dense codes. Our D-v2v permits real-time transmission of data, i.e. compression/transmission can begin as soon as data become available. Our experiments show that D-v2v is able to overcome the compression ratios of the v2vDC, the state-of-the-art semi-static variable-to-variable compressor, and to almost reach p7zip values. It also achieves competitive performance at both compression and decompression.

LINK: https://doi.org/10.1109/DCC.2019.00016

A Compact Representation of Raster Time Series

N. Cruces, D. Seco, and G. Gutiérrez, "A Compact Representation of Raster Time Series," 2019 Data Compression Conference (DCC), Snowbird, UT, USA, 2019, pp. 103-111, DOI: https://doi.org/10.1109/DCC.2019.00018

The raster model is widely used in Geographic Information Systems to represent data that vary continuously in space, such as temperature, precipitation, and elevation, among other spatial attributes. In applications like weather forecast systems, not just a single raster, but a sequence of rasters covering the same region at different timestamps, known as a raster time series, needs to be stored and queried. Compact data structures have proven successful to provide space-efficient representations of rasters with query capabilities. Hence, a naive approach to save space is to use such a representation for each raster in a time series. However, in this paper, we show that it is possible to take advantage of the temporal locality that exists in a raster time series to reduce the space necessary to store it while keeping competitive query times for several types of queries.

DOI https://doi.org/10.1109/DCC.2019.00018

Datalog: Bag Semantics via Set Semantics

Leopoldo Bertossi, Georg Gottlob, and Reinhard Pichler. 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs). Vol 127. https://doi.org/10.4230/LIPIcs.ICDT.2019.16

Duplicates in data management are common and problematic. In this work, we present a translation of Datalog under bag semantics into a well-behaved extension of Datalog, the so-called warded Datalog±, under set semantics. From a theoretical point of view, this allows us to reason on bag semantics by making use of the well-established theoretical foundations of set semantics. From a practical point of view, this allows us to handle the bag semantics of Datalog by powerful, existing query engines for the required extension of Datalog. This use of Datalog± is extended to give a set semantics to duplicates in Datalog± itself. We investigate the properties of the resulting Datalog± programs, the problem of deciding multiplicities, and expressibility of some bag operations. Moreover, the proposed translation has the potential for interesting applications such as to Multiset Relational Algebra and the semantic web query language SPARQL with bag semantics.

https://doi.org/10.4230/LIPIcs.ICDT.2019.16

A Formal Framework for Complex Event Processing

Alejandro Grez and Cristian Riveros and Martín Ugarte. 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs). Vol 127. DOI: 10.4230/LIPIcs.ICDT.2019.5.

Complex Event Processing (CEP) has emerged as the unifying field for technologies that require processing and correlating distributed data sources in real-time. CEP finds applications in diverse domains, which has resulted in a large number of proposals for expressing and processing complex events. However, existing CEP languages lack a clear semantics, making them hard to understand and generalize. Moreover, there are no general techniques for evaluating CEP query languages with clear performance guarantees. In this paper we embark on the task of giving a rigorous and efficient framework to CEP. We propose a formal language for specifying complex events, called CEL, that contains the main features used in the literature and has a denotational and compositional semantics. We also formalize the so-called selection strategies, which had only been presented as by-design extensions to existing frameworks. With a well-defined semantics at hand, we discuss how to efficiently process complex events by evaluating CEL formulas with unary filters. We start by studying the syntactical properties of CEL and propose rewriting optimization techniques for simplifying the evaluation of formulas. Then, we introduce a formal computational model for CEP, called complex event automata (CEA), and study how to compile CEL formulas with unary filters into CEA. Furthermore, we provide efficient algorithms for evaluating CEA over event streams using constant time per event followed by constant-delay enumeration of the results. Finally, we gather the main results of this work to present an efficient and declarative framework for CEP.

LINK: https://doi.org/10.4230/LIPIcs.ICDT.2019.5

Evaluating content novelty in recommender systems

Mendoza, M., Torres, N. Evaluating content novelty in recommender systems. J Intell Inf Syst 54, 297–316 (2020). https://doi.org/10.1007/s10844-019-00548-x

Recommender systems are frequently evaluated using performance indexes based on variants and extensions of precision-like measures. As these measures are biased toward popular items, a list of recommendations simply must include a few popular items to perform well. To address the popularity bias challenge, new approaches for novelty and diversity evaluation have been proposed. On the one hand, novelty-based approaches model the quality of being new as opposed to that which is already known. Novelty approaches are commonly based on item views or user rates. On the other hand, diversity approaches model the quality of an item that is composed of different content elements. Diversity measures are commonly rooted in content-based features that characterize the diversity of the content of an item in terms of the presence/absence of a number of predefined nuggets of information. As item contents are also biased to popular contents (e.g., drama in movies or pop in music), diversity-based measures are also popularity biased. To alleviate the effect of popularity bias on diversity measures, we used an evaluation approach based on the degree of novelty of the elements that make up each item. We named this approach content novelty, as it mixes content and diversity approaches in a single and coherent evaluation framework. Experimental results show that our proposal is feasible and useful. Our findings demonstrate that the proposed measures yield consistent and interpretable results, producing insights that reduce the impact of popularity bias in the evaluation of recommender systems.

https://doi.org/10.1007/s10844-019-00548-x

Microservice-Oriented Platform for Internet of Big Data Analytics: A Proof of Concept

Li, Z.; Seco, D.; Sánchez Rodríguez, A.E. Microservice-Oriented Platform for Internet of Big Data Analytics: A Proof of Concept. Sensors 2019, 19, 1134. DOI https://doi.org/10.3390/s19051134

The ubiquitous Internet of Things (IoT) devices nowadays are generating various and numerous data from everywhere at any time. Since it is not always necessary to centralize and analyze IoT data cumulatively (e.g., the Monte Carlo analytics and Convergence analytics demonstrated in this article), the traditional implementations of big data analytics (BDA) will suffer from unnecessary and expensive data transmissions as a result of the tight coupling between computing resource management and data processing logic. Inspired by software-defined infrastructure (SDI), we propose the “microservice-oriented platform” to break the environmental monolith and further decouple data processing logics from their underlying resource management in order to facilitate BDA implementations in the IoT environment (which we name “IoBDA”). Given predesigned standard microservices with respect to specific data processing logics, the proposed platform is expected to largely reduce the complexity in and relieve inexperienced practices of IoBDA implementations. The potential contributions to the relevant communities include (1) new theories of a microservice-oriented platform on top of SDI and (2) a functional microservice-oriented platform for IoBDA with a group of predesigned microservices.

DOI https://doi.org/10.3390/s19051134

The agenda-setting role of the news media

Sebastián Valenzuela, Maxwell McCombs. "The agenda-setting role of the news media" in "An Integrated Approach to Communication Theory and Research"; Chapter Eight, 99-112. March 2019. DOI: 10.4324/9780203710753-10

One of the important roles of the mass media is the setting of agendas in daily life. What is emphasized in the media, whether traditional or digital, has been found to have a profound impact on not only what people think, but the salience of the issues at any given point in time. This chapter reviews the theory behind agenda-setting and the variables that form, shape, and prime the public’s opinions, attitudes, and behaviors. The chapter also examines the new media and how it impacts on agenda-setting theory and research.

https://doi.org/10.4324/9780203710753-10

The effect of explanations and algorithmic accuracy on visual recommender systems of artistic images

Vicente Dominguez, Pablo Messina, Ivania Donoso-Guzmán, and Denis Parra. 2019. The effect of explanations and algorithmic accuracy on visual recommender systems of artistic images. In Proceedings of the 24th International Conference on Intelligent User Interfaces (IUI '19). Association for Computing Machinery, New York, NY, USA, 408–416. DOI: https://doi.org/10.1145/3301275.3302274

There are very few works about explaining content-based recommendations of images in the artistic domain. Current works do not provide a perspective of the many variables involved in the user perception of several aspects of the system such as domain knowledge, relevance, explainability, and trust. In this paper, we aim to fill this gap by studying three interfaces, with different levels of explainability, for artistic image recommendation. Our experiments with N=121 users confirm that explanations of recommendations in the image domain are useful and increase user satisfaction, perception of explainability and relevance. Furthermore, our results show that the observed effects are also dependent on the underlying recommendation algorithm used. We tested two algorithms: Deep Neural Networks (DNN), which has high accuracy, and Attractiveness Visual Features (AVF) with high transparency but lower accuracy. Our results indicate that algorithms should not be studied in isolation, but rather in conjunction with interfaces, since both play a significant role in the perception of explainability and trust for image recommendation. Finally, using the framework by Knijnenburg et al., we provide a comprehensive model which synthesizes the effects between different variables involved in the user experience with explainable visual recommender systems of artistic images.

https://doi.org/10.1145/3301275.3302274

Do Conditionalities Increase Support for Government Transfers?

Cesar Zucco, Juan Pablo Luna & O. Gokce Baykal (2020) Do Conditionalities Increase Support for Government Transfers?, The Journal of Development Studies, 56:3, 527-544, DOI: 10.1080/00220388.2019.1577388

Conditional Cash Transfers (CCTs) have spread through the developing world in the past two decades. It is often assumed that CCTs enjoy political support in the population precisely because they impose conditions on beneficiaries. This article employs survey experiments in Brazil and Turkey to determine whether, and in what contexts, making government transfers conditional on the behavior of beneficiaries increases political support for the programs. Results show that conditional transfers are only marginally more popular than similar unconditional transfers in nationally representative samples, but that this difference is substantially larger among the better-off and among those primed to think of themselves as different from beneficiaries. These findings imply that conditionalities per se are not as strong a determinant of support for transfers as the literature suggests, but that they can still be helpful in building support for transfers among subsets of the population that are least likely to support them.

DOI https://doi.org/10.1080/00220388.2019.1577388

Tag-based information access in image collections: insights from log and eye-gaze analyses

Lin, Y., Parra, D., Trattner, C. et al. Tag-based information access in image collections: insights from log and eye-gaze analyses. Knowl Inf Syst 61, 1715–1742 (2019). https://doi.org/10.1007/s10115-019-01343-4

Tag clouds have been utilized as a “social” way to find and visualize information, providing both one-click access and a snapshot of the “aboutness” of a tagged collection. While many research projects have explored and compared various tag artifacts using information theory and simulations, fewer studies have been conducted to compare the effectiveness of different tag-based browsing interfaces from the user’s point of view. This research aims to investigate how users utilize tags in image search context and to what extent different organizations of tag browsing interfaces are useful for image search. We conducted two experiments to explore user behavior and performance with three interfaces: two tag-enabled interfaces (the regular and faceted tag-clouds) and a baseline (search-only) interface. Our results demonstrate the value of tags in the image search context, the role of tags in the exploratory search, and the strengths of two kinds of tag organization explored in this paper.

https://doi.org/10.1007/s10115-019-01343-4

Nowcasting earthquake damages with Twitter

Mendoza, M., Poblete, B. & Valderrama, I. Nowcasting earthquake damages with Twitter. EPJ Data Sci. 8, 3 (2019). https://doi.org/10.1140/epjds/s13688-019-0181-0

The Modified Mercalli intensity scale (Mercalli scale for short) is a qualitative measure used to express the perceived intensity of an earthquake in terms of damages. Accurate intensity reports are vital to estimate the type of emergency response required for a particular earthquake. In addition, Mercalli scale reports are needed to estimate the possible consequences of strong earthquakes in the future, based on the effects of previous events. Emergency offices and seismological agencies worldwide are in charge of producing Mercalli scale reports for each affected location after an earthquake. However, this task relies heavily on human observers in the affected locations, who are not always available or accurate. Consequently, Mercalli scale reports may take up to hours or even days to be published after an earthquake. We address this problem by proposing a method for early prediction of spatial Mercalli scale reports based on people’s reactions to earthquakes in social networks. By tracking users’ comments about real-time earthquakes, we create a collection of Mercalli scale point estimates at municipality (i.e., state subdivisions) level granularity. We introduce the concept of reinforced Mercalli support, which combines Mercalli scale point estimates with locally supported data (named ‘local support’). We use this concept to provide Mercalli scale estimates for real-world events by providing smooth point estimates using a spatial smoother that incorporates the distribution of municipalities in each affected region. Our method is the first method based on social media that can provide spatial reports of damages in the Mercalli intensity scale. Experimental results show that our method is accurate and provides early spatial Mercalli reports 30 minutes after an earthquake. Furthermore, we show that our method performs well for earthquake spatial detection and maximum intensity prediction tasks. 
Our findings indicate that social media is a valuable source of spatial information for quickly estimating earthquake damages.

https://doi.org/10.1140/epjds/s13688-019-0181-0

GraCT: A Grammar-based Compressed Index for Trajectory Data

Brisaboa, N. R., Gómez-Brandón, A., Navarro, G., & Paramá, J. R. (2019). GraCT: A Grammar-based Compressed Index for Trajectory Data. Information Sciences, 483, 106–135. DOI https://doi.org/10.1016/j.ins.2019.01.035

We introduce a compressed data structure for the storage of free trajectories of moving objects that efficiently supports various spatio-temporal queries. Our structure, dubbed GraCT, stores the absolute positions of all the objects at regular time intervals (snapshots) using a k²-tree, which is a space- and time-efficient region quadtree. Positions between snapshots are represented as logs of relative movements and compressed using a grammar-based compressor. The non-terminals of this grammar are enhanced with MBR information to enable fast queries.

The GraCT structure of a dataset occupies less than the raw data compressed with a powerful traditional compressor. Further, instead of requiring full decompression to access the data like a traditional compressor, GraCT supports direct access to object trajectories or to their position at specific time instants, as well as spatial range and nearest-neighbor queries on time instants and/or time intervals.

Compared to traditional methods for storing and indexing spatio-temporal data, GraCT requires two orders of magnitude less space and is competitive in query times. In particular, thanks to its compressed representation, the GraCT structure may reside in main memory in situations where any classical uncompressed index must resort to disk, thereby being one or two orders of magnitude faster.

DOI https://doi.org/10.1016/j.ins.2019.01.035
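
The snapshot-plus-log layout described in the GraCT abstract above can be illustrated with a minimal sketch. It only shows how absolute positions at regular snapshots combine with logs of relative movements to recover a position at any time instant. It does not implement the k²-tree, the grammar-based compression, or the MBR-enhanced non-terminals, and the names `encode`, `position_at` and `snapshot_period` are hypothetical, chosen here for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def encode(trajectory: List[Point], snapshot_period: int):
    """Split one trajectory into absolute snapshots and relative-movement logs.

    Positions at t = 0, period, 2*period, ... are kept as absolute points.
    Every other position is stored as a (dx, dy) step from the previous one,
    grouped under the snapshot that precedes it.
    """
    snapshots: Dict[int, Point] = {}
    logs: Dict[int, List[Point]] = {}
    for t, (x, y) in enumerate(trajectory):
        base = t - t % snapshot_period  # time of the preceding snapshot
        if t == base:
            snapshots[t] = (x, y)
            logs[t] = []
        else:
            px, py = trajectory[t - 1]
            logs[base].append((x - px, y - py))
    return snapshots, logs


def position_at(snapshots, logs, snapshot_period: int, t: int) -> Point:
    """Recover the position at time t by replaying steps from the snapshot."""
    base = t - t % snapshot_period
    x, y = snapshots[base]
    for dx, dy in logs[base][: t - base]:
        x += dx
        y += dy
    return (x, y)


if __name__ == "__main__":
    traj = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 3), (3, 4)]
    snaps, logs = encode(traj, snapshot_period=3)
    print(position_at(snaps, logs, 3, 5))
```

In this toy version a lookup replays at most `snapshot_period - 1` steps, which is the reason for keeping periodic snapshots at all; the abstract's compressed logs serve the same role while taking far less space.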

Interpretable Visual Question Answering by Visual Grounding From Attention Supervision Mining

Y. Zhang, J. C. Niebles and A. Soto, "Interpretable Visual Question Answering by Visual Grounding From Attention Supervision Mining," 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 2019, pp. 349-357, doi: 10.1109/WACV.2019.00043

A key aspect of visual question answering (VQA) models that are interpretable is their ability to ground their answers to relevant regions in the image. Current approaches with this capability rely on supervised learning and human annotated groundings to train attention mechanisms inside the VQA architecture. Unfortunately, obtaining human annotations specific for visual grounding is difficult and expensive. In this work, we demonstrate that we can effectively train a VQA architecture with grounding supervision that can be automatically obtained from available region descriptions and object annotations. We also show that our model trained with this mined supervision generates visual groundings that achieve a higher correlation with respect to manually-annotated groundings, meanwhile achieving state-of-the-art VQA accuracy.

https://doi.org/10.1109/WACV.2019.00043

Query rewriting for semantic query optimization in spatial databases

Mella, E., Rodríguez, M.A., Bravo, L. et al. Query rewriting for semantic query optimization in spatial databases. Geoinformatica 23, 79–104 (2019). https://doi.org/10.1007/s10707-018-00335-w

Query processing is an important challenge for spatial databases due to the use of complex data types that represent spatial attributes. In particular, due to the cost of spatial joins, several optimization algorithms based on indexing structures exist. The work in this paper proposes a strategy for semantic query optimization of spatial join queries. The strategy detects queries with empty results and rewrites queries to eliminate unnecessary spatial joins or to replace spatial by thematic joins. This is done automatically by analyzing the semantics imposed by the database schema through topological dependencies and topological referential integrity constraints. In this way, the strategy comes to complement current state-of-art algorithms for processing spatial join queries. The experimental evaluation with real data sets shows that the optimization strategy can achieve a decrease in the time cost of a join query using indexing structures in a spatial database management system (SDBMS).

LINK: https://doi.org/10.1007/s10707-018-00335-w

Explaining subjective perceptions of public spaces as a function of the built environment: A massive data approach

Rossetti, T., Lobel, H., Rocco, V., & Hurtubia, R. (2019). Explaining subjective perceptions of public spaces as a function of the built environment: A massive data approach. Landscape and Urban Planning, 181, 169-178. https://doi.org/10.1016/j.landurbplan.2018.09.020

People’s perceptions of the built environment influence the way they use and navigate it. Understanding these perceptions may be useful to inform the design, management and planning process of public spaces. Recently, several studies have used data collected at a massive scale and machine learning methods to quantify these perceptions, showing promising results in terms of predictive performance. Nevertheless, most of these models can be of little help in understanding users’ perceptions due to the difficulty associated with identifying the importance of each attribute of landscapes. In this work, we propose a novel approach to quantify perceptions of landscapes through discrete choice models, using semantic segmentations of images of public spaces, generated through machine learning algorithms, as explanatory variables. The proposed models are estimated using the Place Pulse dataset, with over 1.2 million perceptual indicators, and are able to provide useful insights into how users perceive the built environment as a function of its features. The models obtained are used to infer perceptual variables in the city of Santiago, Chile, and show they have a significant correlation with socioeconomic indicators.

https://doi.org/10.1016/j.landurbplan.2018.09.020

2018

On the Turing Completeness of Modern Neural Network Architectures

Jorge Pérez, Javier Marinković, Pablo Barceló. ICLR 2019 Conference. https://openreview.net/pdf?id=HyGBdo0qFm

Alternatives to recurrent neural networks, in particular, architectures based on attention or convolutions, have been gaining momentum for processing input sequences. In spite of their relevance, the computational properties of these alternatives have not yet been fully explored. We study the computational power of two of the most paradigmatic architectures exemplifying these mechanisms: the Transformer (Vaswani et al., 2017) and the Neural GPU (Kaiser & Sutskever, 2016). We show both models to be Turing complete exclusively based on their capacity to compute and access internal dense representations of the data. In particular, neither the Transformer nor the Neural GPU requires access to an external memory to become Turing complete. Our study also reveals some minimal sets of elements needed to obtain these completeness results.

Link: https://openreview.net/pdf?id=HyGBdo0qFm

Territorial sovereignty and the end of inter-cultural diplomacy along the “Southern frontier”

Carsten-Andreas Schulz. European Journal of International Relations. https://doi.org/10.1177/1354066118814890

European politics at the turn of the 19th century saw a dramatic reduction in the number and diversity of polities as the territorial nation-state emerged as the dominant form of political organization. The transformation had a profound impact on the periphery. The study examines how embracing the principle of territoriality transformed relations between settler societies and indigenous peoples in South America. As this shift coincided with independence from Spain, Creole elites rapidly dismantled the remnants of imperial heteronomy, ending centuries of inter-cultural diplomacy. The study illustrates this shift in the case of the “Southern frontier,” where Spain had maintained a practice of treaty making with the Mapuche people since the mid-17th century. This long-standing practice broke down shortly after Chile gained independence in 1818. What followed was a policy of coercive assimilation through military conquest and forced displacement — a policy that settler societies implemented elsewhere in the 19th century. In contrast to explanations that emphasize the spread of capitalist agriculture and racist ideologies, this study argues that territoriality spelled the end of inter-cultural diplomacy along the “Southern frontier.”

Link: https://doi.org/10.1177/1354066118814890

Competitiveness of a Non-Linear Block-Space GPU Thread Map for Simplex Domains

Cristobal A. Navarro, Matthieu Vernier, Benjamin Bustos, Nancy Hitschfeld. IEEE Trans. Parallel Distrib. Syst. 29(12): 2728-2741 (2018). https://doi.org/10.1109/TPDS.2018.2849705

This work presents and studies the efficiency problem of mapping GPU threads onto simplex domains. A non-linear map λ(ω) is formulated based on a block-space enumeration principle that reduces the number of thread-blocks by a factor of approximately 2× and 6× for 2-simplex and 3-simplex domains, respectively, when compared to the standard approach. Performance results show that λ(ω) is competitive and even the fastest map when run on recent GPU architectures such as the Tesla V100, where it reaches up to 1.5× speedup in 2-simplex tests. In 3-simplex tests, it reaches up to 2.3× speedup for small workloads and up to 1.25× for larger ones. The results obtained make λ(ω) a useful GPU optimization technique with applications on parallel problems that define all-pairs, all-triplets or nearest-neighbor interactions in a 2-simplex or 3-simplex domain.

Link: https://doi.org/10.1109/TPDS.2018.2849705
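
To make the block-space idea in the abstract above concrete, here is a small CPU-side sketch for the 2-simplex (triangular) case. It uses the standard closed form for row-major triangular enumeration, which is an assumption of this sketch and not necessarily the exact λ(ω) defined in the paper. The function names are hypothetical. A real GPU kernel would compute the root with floating-point arithmetic; the integer square root is used here only to keep the example exact.

```python
import math
from typing import List, Tuple


def block_to_lower_triangle(w: int) -> Tuple[int, int]:
    """Map a linear block index w to (row, col) in a lower triangle.

    Row i holds i + 1 blocks, so row i starts at index i * (i + 1) / 2.
    Inverting that gives i = floor((sqrt(8w + 1) - 1) / 2).
    """
    i = (math.isqrt(8 * w + 1) - 1) // 2
    j = w - i * (i + 1) // 2
    return i, j


def enumerate_blocks(n: int) -> List[Tuple[int, int]]:
    """List the blocks needed for an n x n lower triangle, diagonal included."""
    return [block_to_lower_triangle(w) for w in range(n * (n + 1) // 2)]


def block_saving(n: int) -> float:
    """Ratio of blocks in a full n x n grid to blocks in the triangle."""
    return (n * n) / (n * (n + 1) // 2)


if __name__ == "__main__":
    print(enumerate_blocks(3))
    print(block_saving(1000))
```

Launching a full square grid and discarding the upper half needs n² blocks, while the enumeration above needs n(n + 1)/2, which is where the factor of approximately 2 quoted for 2-simplex domains comes from.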

Profiling Graphs: Order from Chaos

Aidan Hogan. WWW (Companion Volume) 2018: 1481-1482. https://doi.org/10.1145/3184558.3191603

Graphs are being increasingly adopted as a flexible data model in scenarios (e.g., Google’s Knowledge Graph, Facebook’s Graph API, Wikidata, etc.) where multiple editors are involved in content creation, where the schema is ever changing, where data are incomplete, where the connectivity of resources plays a key role: scenarios where relational models traditionally struggle. But with this flexibility comes a conceptual cost: it can be difficult to summarise and understand, at a high level, the content that a given graph contains. Hence profiling graphs becomes of increasing importance to extract order, a posteriori, from the chaotic processes by which such graphs are generated. This talk will motivate the use of graphs as a data model, abstract recent trends in graph data management, and then turn to the issue of profiling and summarising graphs: what are the goals of such profiling, the principles by which graphs can be summarised, the main techniques by which this can/could be achieved? The talk will emphasise the importance of profiling graphs while highlighting a variety of open research questions yet to be tackled.

Link: https://doi.org/10.1145/3184558.3191603

Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Jan Van den Bussche, Marcelo Arenas: Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Houston, TX, USA, June 10-15, 2018. ACM 2018. https://dl.acm.org/citation.cfm?doid=3196959

This volume contains the proceedings of PODS 2018, which include a paper for the keynote address by Michael Benedikt (University of Oxford), abstracts based on two invited tutorials by Rajeev Raman (University of Leicester) and Arvind Narayanan (Princeton University), and 29 contributions that were selected by the Program Committee for presentation at the symposium.

In addition, this volume also contains papers from our two “Gems of PODS” speakers, Hung Ngo (Relational AI) and Phokion G. Kolaitis (UC Santa Cruz and IBM Research – Almaden). The Gems of PODS is an event, started in 2016, where the goal is to promote understanding of past seminal PODS results to the general audience. The Gems of PODS papers were selected by the Gems of PODS committee consisting of Marcelo Arenas (chair) (Pontificia Universidad Católica de Chile), Tova Milo (Tel Aviv University) and Dan Olteanu (Oxford University).

This year, PODS continued with two submission cycles that were introduced two years ago. The first cycle allowed for the possibility for papers to be revised and resubmitted. For the first cycle, 30 papers were submitted, 7 of which were directly selected for inclusion in the proceedings, and 5 were invited for a resubmission after a revision. The quality of most of the revised papers increased substantially with respect to the first submission, and 4 out of 5 revised papers were selected for the proceedings. For the second cycle, 53 papers were submitted, 18 of which were selected, resulting in 29 papers selected overall from a total number of 83 submissions.

An important task for the Program Committee has been the selection of the PODS 2018 Best Paper Award. The committee selected the paper: “Entity Matching with Active Monotone Classification” by Yufei Tao. On behalf of the committee, we would like to extend our sincere congratulations to the author!

Since 2008, PODS assigns the ACM PODS Alberto O. Mendelzon Test-of-Time Award to a paper or a small number of papers published in the PODS proceedings ten years prior that had the most impact over the intervening decade. This year’s committee, consisting of Maurizio Lenzerini, Wim Martens and Nicole Schweikardt, selected the following paper: “The Chase Revisited” by Alin Deutsch, Alan Nash and Jeff Remmel.

Link: https://dl.acm.org/citation.cfm?doid=3196959

Organic Visualization of Document Evolution

Ignacio Pérez-Messina, Claudio Gutiérrez, Eduardo Graells-Garrido. IUI 2018: 497-501. https://doi.org/10.1145/3172944.3173004

Recent availability of data about writing processes at keystroke-granularity has enabled research on the evolution of document writing. A natural task is to develop systems that can actually show this data, that is, user interfaces that transform the data of the process of writing –today a black box– into intelligible forms. On this line, we propose a data structure that captures a document’s fine-grained history and an organic visualization that serves as an interface to it. We evaluate a proof-of-concept implementation of the system through a pilot study using documents written by students at a public university. Our results are promising and reveal facets such as general strategies adopted, local edition density and hierarchical structure of the final text.

Link: https://doi.org/10.1145/3172944.3173004

Modelling Dynamics in Semantic Web Knowledge Graphs with Formal Concept Analysis

Larry González, Aidan Hogan. WWW 2018: 1175-1184. https://doi.org/10.1145/3178876.3186016

In this paper, we propose a novel data-driven schema for large-scale heterogeneous knowledge graphs inspired by Formal Concept Analysis (FCA). We first extract the sets of properties associated with individual entities; these property sets (aka. characteristic sets) are annotated with cardinalities and used to induce a lattice based on set-containment relations, forming a natural hierarchical structure describing the knowledge graph. We then propose an algebra over such schema lattices, which allows to compute diffs between lattices (for example, to summarise the changes from one version of a knowledge graph to another), to add lattices (for example, to project future changes), and so forth. While we argue that this lattice structure (and associated algebra) may have various applications, we currently focus on the use-case of modelling and predicting the dynamic behaviour of knowledge graphs. Along those lines, we instantiate and evaluate our methods for analysing how versions of the Wikidata knowledge graph have changed over a period of 11 weeks. We propose algorithms for constructing the lattice-based schema from Wikidata, and evaluate their efficiency and scalability. We then evaluate use of the resulting schema(ta) for predicting how the knowledge graph will evolve in future versions.

Link: https://doi.org/10.1145/3178876.3186016
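
The first steps described in the abstract above (extracting property sets per entity, annotating them with cardinalities, relating them by set containment, and computing diffs between versions) can be sketched as follows. This is a simplified illustration under stated assumptions: triples are plain `(subject, property, object)` tuples, `containment_edges` lists all strict-containment pairs and does not reduce them to a proper lattice, and all function names are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def characteristic_sets(triples: Iterable[Triple]) -> Counter:
    """Count how many entities share each set of properties."""
    props: Dict[str, set] = {}
    for subject, prop, _ in triples:
        props.setdefault(subject, set()).add(prop)
    return Counter(frozenset(ps) for ps in props.values())


def containment_edges(schema: Counter) -> List[Tuple[tuple, tuple]]:
    """Return (smaller, larger) pairs where one property set strictly contains the other."""
    sets = list(schema)
    return sorted(
        (tuple(sorted(a)), tuple(sorted(b)))
        for a in sets
        for b in sets
        if a < b  # strict subset test on frozensets
    )


def diff(new: Counter, old: Counter) -> Dict[FrozenSet[str], int]:
    """Change in cardinality of each property set between two versions."""
    keys = set(new) | set(old)
    return {k: new[k] - old[k] for k in keys if new[k] != old[k]}


if __name__ == "__main__":
    v1 = [("a", "name", "x"), ("a", "born", "y"), ("b", "name", "z")]
    v2 = v1 + [("b", "born", "w")]
    s1, s2 = characteristic_sets(v1), characteristic_sets(v2)
    print(containment_edges(s1))
    print(diff(s2, s1))
```

A diff computed this way summarises a version change as entities moving between property sets, which is the kind of signal the abstract proposes to use for predicting future versions.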

Querying APIs with SPARQL: Language and Worst-Case Optimal Algorithms

Matthieu Mosser, Fernando Pieressa, Juan L. Reutter, Adrián Soto, Domagoj Vrgoč. ESWC 2018: 639-654. https://doi.org/10.1007/978-3-319-93417-4_41

Although the amount of RDF data has been steadily increasing over the years, the majority of information on the Web is still residing in other formats, and is often not accessible to Semantic Web services. A lot of this data is available through APIs serving JSON documents. In this work we propose a way of extending SPARQL with the option to consume JSON APIs and integrate the obtained information into SPARQL query answers, thus obtaining a query language allowing to bring data from the “traditional” Web to the Semantic Web. Looking to evaluate these queries as efficiently as possible, we show that the main bottleneck is the amount of API requests, and present an algorithm that produces “worst-case optimal” query plans that reduce the number of requests as much as possible. We also do a set of experiments that empirically confirm the optimality of our approach.

Link: https://doi.org/10.1007/978-3-319-93417-4_41

An Introduction to Graph Data Management

Renzo Angles, Claudio Gutierrez. Graph Data Management 2018: 1-32. https://doi.org/10.1007/978-3-319-96193-4_1

Graph data management concerns the research and development of powerful technologies for storing, processing and analyzing large volumes of graph data. This chapter presents an overview of the foundations and systems for graph data management. Specifically, we present a historical overview of the area, study graph database models, characterize essential graph-oriented queries, review graph query languages, and explore the features of current graph data management systems (i.e. graph databases and graph-processing frameworks).

Link: https://doi.org/10.1007/978-3-319-96193-4_1

A compact representation for trips over networks built on self-indexes

Nieves R. Brisaboa, Antonio Fariña, Daniil Galaktionov, M. Andrea Rodríguez. Inf. Syst. 78: 1-22 (2018) https://doi.org/10.1016/j.is.2018.06.010

Representing the movements of objects (trips) over a network in a compact way while retaining the capability of exploiting such data effectively is an important challenge of real applications. We present a new Compact Trip Representation (CTR) that handles the spatio-temporal data associated with users’ trips over transportation networks. Depending on the network and types of queries, nodes in the network can represent intersections, stops, or even street segments.

CTR represents separately sequences of nodes and the time instants when users traverse these nodes. The spatial component is handled with a data structure based on the well-known Compressed Suffix Array (CSA), which provides both a compact representation and interesting indexing capabilities. The temporal component is self-indexed with either a Hu–Tucker-shaped Wavelet-Tree or a Wavelet Matrix that solve range-interval queries efficiently. We show how CTR can solve relevant counting-based spatial, temporal, and spatio-temporal queries over large sets of trips. Experimental results show the space requirements (around 50–70% of the space needed by a compact non-indexed baseline) and query efficiency (most queries are solved in the range of 1–1000 µs) of CTR.

Highlights:
- We provide a representation for trips over networks and answer counting-based queries.
- We adapt a Compressed Suffix Array to deal with the spatial component of trips.
- We use a Wavelet Matrix or a Hu–Tucker-shaped Wavelet Tree for the temporal component.
- Experiments show space needs as low as 50% of those of a plain representation.
- Experiments show counting-based query times typically within 1–1000 µs.

Link: https://doi.org/10.1016/j.is.2018.06.010

Assessing the public transport travel behavior consistency from smart card data

Catalina Espinoza, Marcela Munizaga, Benjamín Bustos, Martin Trépanier. Transportation Research Procedia 32:44-53. Elsevier, 2018. https://doi.org/10.1016/j.trpro.2018.10.008

The aim of this research is to measure, using smart card data, how much public transport users change their behavior over time. To quantify the change in behavior, we split the smart card records of a user into a set of separate time windows. Then, we measure the variation between each pair of time windows. Three algorithms that calculate the variation in users’ mobility are assessed. Using data from a Canadian transport system, we show that measuring the stability of user behavior at an individual level provides new insights for public transport operators, e.g., it can be used to measure users’ adaptability to changes in the transport system.
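The windowing idea described above can be sketched as follows. This is only an illustrative sketch: the record layout `(day, stop)`, the window length, and the use of cosine distance over stop-usage counts are assumptions made here, not the three algorithms assessed in the paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def split_into_windows(records, window_days):
    """Group (day, stop) records into consecutive windows of `window_days` days.

    Each window is summarised as a Counter of how often each stop was used.
    """
    windows = {}
    for day, stop in records:
        windows.setdefault(day // window_days, Counter())[stop] += 1
    return [windows[k] for k in sorted(windows)]

def cosine_distance(a, b):
    """Hypothetical variation measure: 0.0 = same usage profile, 1.0 = disjoint."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0

def pairwise_variation(records, window_days=7):
    """Variation between every pair of time windows of one user."""
    windows = split_into_windows(records, window_days)
    return [cosine_distance(a, b) for a, b in combinations(windows, 2)]
```

A user who boards at the same stop in two weeks and at a different stop in a third week would show zero variation between the first two windows and maximal variation against the third.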

DOI https://doi.org/10.1016/j.trpro.2018.10.008

Efficacy and the Reproduction of Political Activism. Evidence from the Broad Front in Uruguay

Bentancur, V. P., Rodríguez, R. P., & Rosenblatt, F. (2019). Efficacy and the Reproduction of Political Activism: Evidence From the Broad Front in Uruguay. Comparative Political Studies, 52(6), 838–867. https://doi.org/10.1177/0010414018806528

The professionalization of politics and the disappearance of party organizations based on activists seems an inescapable trend. This article shows, by studying the Broad Front of Uruguay as a deviant case, the relevance of organizational rules for explaining the reproduction of party activism. Using data from both an online survey of people differing in their levels of engagement with the Broad Front and in-depth interviews with party activists, we show that those with relatively low levels of engagement—“adherents”—and activists differ in their willingness to cooperate with the party and in the time they devote to party activities. Also, we find that reducing the perceived efficacy of political engagement strongly decreases activists’ self-reported willingness to engage with the party, while this reduction has no effect upon adherents. These findings suggest that the design of organizational rules that grant a political role to grassroots organizers can promote party activism.

DOI https://doi.org/10.1177/0010414018806528

Using Compressed Suffix-Arrays for a compact representation of temporal-graphs

Nieves R. Brisaboa, Diego Caro, Antonio Fariña, M. Andrea Rodríguez. Inf. Sci. 465: 459-483 (2018) https://doi.org/10.1016/j.ins.2018.07.023

Temporal graphs represent binary relationships that change over time. They can model the dynamism of, for example, social and communication networks. Temporal graphs are defined as sets of contacts that are edges tagged with the temporal intervals when they are active. This work explores the use of the Compressed Suffix Array (CSA), a well-known compact and self-indexed data structure in the area of text indexing, to represent large temporal graphs. The new structure, called Temporal Graph CSA (TGCSA), is experimentally compared with the most competitive compact data structures in the state of the art, namely, EdgeLog and CET. The experimental results show that TGCSA obtains a good space-time trade-off. It uses reasonable space and is efficient for solving complex temporal queries. Furthermore, TGCSA has wider expressive capabilities than EdgeLog and CET, because it is able to represent temporal graphs where contacts on an edge can temporally overlap.

Highlights:
- We consider the problem of representing temporal graphs in a compact way.
- We can represent temporal graphs with contacts on the same edge that temporally overlap.
- We design TGCSA and show how it solves typical temporal queries.
- We create a novel representation of Ψ that improves the performance of TGCSA.
- We obtain a reasonable space/time trade-off even on complex temporal queries.

Link: https://doi.org/10.1016/j.ins.2018.07.023

GraFa: Faceted Search & Browsing for the Wikidata Knowledge Graph

José Moreno-Vega, Aidan Hogan. International Semantic Web Conference (P&D/Industry/BlueSky) 2018. http://ceur-ws.org/Vol-2180/paper-44.pdf

We present a demo of the GraFa faceted search and browsing interface over the Wikidata knowledge graph. We describe the key aspects of the interface, including the types of interactions that the system allows, the ranking schemes employed, and other features to aid usability. We also discuss future plans for improving the system. Online Demo: http://grafa.dcc.uchile.cl/

Link: http://ceur-ws.org/Vol-2180/paper-44.pdf

DockerPedia: a Knowledge Graph of Docker Images

Maximiliano Osorio, Carlos Buil Aranda, Hernán Vargas. International Semantic Web Conference (P&D/Industry/BlueSky) 2018. http://ceur-ws.org/Vol-2180/paper-47.pdf

Docker is the most popular implementation of Operating System virtualization; its online registry service (Docker Hub) currently stores more than 4.5 million software images. Using that registry it is possible to download and deploy Docker images as software containers. However, these images only show information about the main software, hiding the dependencies needed to run it. To allow users to track what they deploy into their machines, we developed DockerPedia, a resource that publishes information about the packages within the Docker images as Linked Data.

Currently our resource includes 28% of the most downloaded images from Docker Hub, providing information about the software dependencies and their vulnerabilities. This makes it easy to reproduce the environment in which each image was deployed, as well as to check the security of an image without the need to download it.

Link: http://ceur-ws.org/Vol-2180/paper-47.pdf

QCan: Normalising Congruent SPARQL Queries

Jaime Salas, Aidan Hogan. International Semantic Web Conference (P&D/Industry/BlueSky) 2018. http://ceur-ws.org/Vol-2180/paper-54.pdf

We demonstrate a system to canonicalise (a.k.a. normalise) SPARQL queries for use-cases such as caching, query log analysis, query minimisation, signing queries, etc. Our canonicalisation method deterministically rewrites a given input query to an equivalent canonical form such that the canonical forms of two queries are syntactically (string) equal if and only if the queries give the same results on any database, modulo variable names. The method is sound and complete for a monotone fragment of SPARQL with selection (equalities), projection, join and union under both set and bag semantics. Considering other SPARQL features (e.g., optional, filter, graph, etc.), the underlying equivalence problem becomes undecidable, so we currently support only a best-effort canonicalisation for other SPARQL 1.0 features. We demonstrate a prototype of our canonicalisation framework, provide example rewritings, and discuss limitations, use-cases and future work. Demo link: http://qcan.dcc.uchile.cl

Link: http://ceur-ws.org/Vol-2180/paper-54.pdf

A Tractable Notion of Stratification for SHACL

Julien Corman, Juan L. Reutter, Ognjen Savkovic. International Semantic Web Conference (P&D/Industry/BlueSky) 2018. http://ceur-ws.org/Vol-2180/paper-11.pdf

One of the challenges of recent RDF-based applications is managing data quality [1], and several systems already provide RDF validation procedures (e.g., https://www.stardog.com/docs/, https://www.topquadrant.com/technology/shacl/). This created the need for a standardized declarative constraint language for RDF, and for mechanisms to detect violations of such constraints. An important step in this direction is SHACL, or Shapes Constraint Language (https://www.w3.org/TR/shacl/), which became a W3C recommendation in 2017. The SHACL specification however leaves explicitly undefined the validation of recursive constraints. In a previous article [2], we showed that extending the specification’s semantics to accommodate recursion leads to intractability (in the size of the graph) for the so-called “core constraint components” of SHACL. This result holds for stratified constraints already, which may come as a surprise, considering that stratification guarantees tractability in well-studied recursive languages such as Datalog. Our previous work identified a tractable fragment of SHACL’s core components. In this paper, we propose an alternative approach to gain tractability, retaining all SHACL operators, but strengthening the stratification condition traditionally used in logic programming. More exactly, we introduce a syntactic condition on shape constraints called “strict stratification”, which guarantees that graph validation is in PTIME in combined (i.e. graph and constraints) complexity. We also describe a procedure to perform such validation. The current paper is not self-contained, due to space limitations, but all definitions can be found in our previous article [2] or its online extended version [3].

Link: http://ceur-ws.org/Vol-2180/paper-11.pdf

Machine Translation vs. Multilingual Approaches for Entity Linking

Henry Rosales-Méndez, Aidan Hogan, Barbara Poblete. International Semantic Web Conference (P&D/Industry/BlueSky) 2018. http://ceur-ws.org/Vol-2180/paper-53.pdf

Entity Linking (EL) associates the entities mentioned in a given input text with their corresponding knowledge-base (KB) entries. A recent EL trend is towards multilingual approaches. However, one may ask: are multilingual EL approaches necessary with recent advancements in machine translation? Could we not simply focus on supporting one language in the EL system and translate the input text to that language? We present experiments along these lines comparing multilingual EL systems with their results over machine translated text.

Link: http://ceur-ws.org/Vol-2180/paper-53.pdf

End-to-End Joint Semantic Segmentation of Actors and Actions in Video

Jingwei Ji, Shyamal Buch, Alvaro Soto, Juan Carlos Niebles. ECCV (4) 2018: 734-749. https://doi.org/10.1007/978-3-030-01225-0_43

Traditional video understanding tasks include human action recognition and actor/object semantic segmentation. However, the combined task of providing semantic segmentation for different actor classes simultaneously with their action class remains a challenging but necessary task for many applications. In this work, we propose a new end-to-end architecture for tackling this task in videos. Our model effectively leverages multiple input modalities, contextual information, and multitask learning in the video to directly output semantic segmentations in a single unified framework. We train and benchmark our model on the Actor-Action Dataset (A2D) for joint actor-action semantic segmentation, and demonstrate state-of-the-art performance for both segmentation and detection. We also perform experiments verifying our approach improves performance for zero-shot recognition, indicating generalizability of our jointly learned feature space.

Link: https://doi.org/10.1007/978-3-030-01225-0_43

Towards Explanations for Visual Recommender Systems of Artistic Images

Vicente Domínguez, Pablo Messina, Christoph Trattner, Denis Parra. IntRS@RecSys 2018: 69-73. http://ceur-ws.org/Vol-2225/paper10.pdf

Explaining automatic recommendations is an active area of research since it has shown an important effect on users’ acceptance over the items recommended. However, there is a lack of research in explaining content-based recommendations of images based on visual features. In this paper, we aim to fill this gap by testing three different interfaces (one baseline and two novel explanation interfaces) for artistic image recommendation. Our experiments with N=121 users confirm that explanations of recommendations in the image domain are useful and increase user satisfaction, perception of explainability, relevance, and diversity. Furthermore, our experiments show that the results are also dependent on the underlying recommendation algorithm used. We tested the interfaces with two algorithms: Deep Neural Networks (DNN), with high accuracy but with difficult to explain features, and the more explainable method based on Attractiveness Visual Features (AVF). The better the accuracy performance –in our case the DNN method– the stronger the positive effect of the explainable interface. Notably, the explainable features of the AVF method increased the perception of explainability but did not increase the perception of trust, unlike DNN, which improved both dimensions. These results indicate that algorithms in conjunction with interfaces play a significant role in the perception of explainability and trust for image recommendation. We plan to further investigate the relationship between interface explainability and algorithmic performance in recommender systems.

Link: http://ceur-ws.org/Vol-2225/paper10.pdf

New Structures to Solve Aggregated Queries for Trips over Public Transportation Networks

Nieves R. Brisaboa, Antonio Fariña, Daniil Galaktionov, Tirso V. Rodeiro, M. Andrea Rodríguez. SPIRE 2018: 88-101. https://doi.org/10.1007/978-3-030-00479-8_8

Representing the trajectories of mobile objects has become a hot topic with the widespread use of smartphones and other GPS devices. However, few works have focused on representing trips over public transportation networks (buses, subway, and trains) where users’ trips can be seen as a sequence of stages performed within a vehicle shared with many other users. In this context, representing vehicle journeys reduces the redundancy because all the passengers inside a vehicle share the same arrival time for each stop. In addition, each vehicle journey follows exactly the sequence of stops corresponding to its line, which makes it unnecessary to represent that sequence for each journey.

To solve data management for transportation systems, we designed a conceptual model that gave us a better insight into this data domain and allowed us to define relevant terms and detect sources of redundancy among those data. Then, we designed two compact representations focused on users’ trips (TTCTR) and on vehicle trips (AcumM), respectively. Each approach has its own strengths and is able to answer some queries efficiently.

We include experimental results over synthetic trips generated from accurate schedules obtained from a real network description (from the bus transportation system of Madrid) to show the space/time trade-off of both approaches. We considered a wide range of different queries about the use of the transportation network such as counting-based/aggregate queries regarding the load of any line of the network at different times.

Link: https://doi.org/10.1007/978-3-030-00479-8_8

VoxEL: A Benchmark Dataset for Multilingual Entity Linking

Henry Rosales-Méndez, Aidan Hogan, Barbara Poblete. International Semantic Web Conference (2) 2018: 170-186. https://doi.org/10.1007/978-3-030-00668-6_11

The Entity Linking (EL) task identifies entity mentions in a text corpus and associates them with corresponding entities in a given knowledge base. While traditional EL approaches have largely focused on English texts, current trends are towards language-agnostic or otherwise multilingual approaches that can perform EL over texts in many languages. One of the obstacles to ongoing research on multilingual EL is a scarcity of annotated datasets with the same text in different languages. In this work we thus propose VoxEL: a manually-annotated gold standard for multilingual EL featuring the same text expressed in five European languages. We first motivate and describe the VoxEL dataset, using it to compare the behaviour of state of the art EL (multilingual) systems for five different languages, contrasting these results with those obtained using machine translation to English. Overall, our results identify how five state-of-the-art multilingual EL systems compare for various languages, how the results of different languages compare, and further suggest that machine translation of input text to English is now a competitive alternative to dedicated multilingual EL configurations.

Link: https://doi.org/10.1007/978-3-030-00668-6_11

Robust Detection of Extreme Events Using Twitter: Worldwide Earthquake Monitoring

Barbara Poblete, Jheser Guzman, Jazmine A. Maldonado Flores, Felipe A. Tobar. IEEE Trans. Multimedia 20(10): 2551-2561 (2018). https://doi.org/10.1109/TMM.2018.2855107

Timely detection and accurate description of extreme events, such as natural disasters and other crisis situations, are crucial for emergency management and mitigation. Extreme-event detection is challenging, since one has to rely upon reports from human observers appointed to specific geographical areas, or on an expensive and sophisticated infrastructure. In the case of earthquakes, geographically dense sensor networks are expensive to deploy and maintain. Therefore, only some regions, or even countries, are able to acquire useful information about the effects of earthquakes in their own territory. An inexpensive and viable alternative is to detect extreme real-world events through people’s reactions in online social networks. In particular, Twitter has gained popularity within the scientific community for providing access to real-time “citizen sensor” activity. Nevertheless, the massive number of messages in the Twitter stream, along with the noise it contains, gives rise to a number of difficulties when it comes to Twitter-based event detection. We contribute to addressing these challenges by proposing an online method for detecting unusual bursts in discrete-time signals extracted from Twitter. This method only requires a one-off semisupervised initialization and can be scaled to track multiple signals in a robust manner. We also show empirically how our proposed approach, which was envisioned for generic event detection, can be adapted for worldwide earthquake detection, where we compare the proposed model to the state of the art for earthquake tracking using social media. Experimental results validate our approach as a competitive alternative in terms of precision and recall to leading solutions, with the advantage of implementation simplicity and worldwide scalability.

Link: https://doi.org/10.1109/TMM.2018.2855107

GraFa: Scalable Faceted Browsing for RDF Graphs

José Moreno-Vega, Aidan Hogan. International Semantic Web Conference (1) 2018: 301-317. https://doi.org/10.1007/978-3-030-00671-6_18

Faceted browsing has become a popular paradigm for user interfaces on the Web and has also been investigated in the context of RDF graphs. However, current faceted browsers for RDF graphs encounter performance issues when faced with two challenges: scale, where large datasets generate many results, and heterogeneity, where large numbers of properties and classes generate many facets. To address these challenges, we propose GraFa: a faceted browsing system for heterogeneous large-scale RDF graphs based on a materialisation strategy that performs an offline analysis of the input graph in order to identify a subset of the exponential number of possible facet combinations that are candidates for indexing. In experiments over Wikidata, we demonstrate that materialisation allows for displaying (exact) faceted views over millions of diverse results in under a second while keeping index sizes relatively small. We also present initial usability studies over GraFa.

Link: https://doi.org/10.1007/978-3-030-00671-6_18

A Multi-resolution Approximation for Time Series

Sanchez, H., Bustos, B. A Multi-resolution Approximation for Time Series. Neural Process Lett 52, 75–96 (2020). https://doi.org/10.1007/s11063-018-9929-y

Time series are a common and well-known way of describing temporal data. However, most of the state-of-the-art techniques for analysing time series have focused on generating a representation for a single level of resolution. To analyse a time series at several levels of resolution, one would need to compute different representations, one for each resolution level. We introduce a multi-resolution representation for time series based on local trends and mean values. We require the level of resolution as a parameter, but it can be automatically computed if we consider the maximum resolution of the time series. Our technique represents a time series using trend-value pairs on each segment belonging to a resolution level. To provide a useful representation for data mining tasks, we also propose dissimilarity measures and a symbolic representation based on the SAX technique for efficient similarity search using a multi-resolution indexing scheme. We evaluate our method for classification and discord discovery tasks over a diversity of data domains, achieving a better performance in terms of efficiency and effectiveness compared with some of the best-known classic techniques. Indeed, for some of the experiments, the time series mining algorithms using our multi-resolution representation were an order of magnitude faster, in terms of distance computations, than the state of the art.
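To make the notion of trend-value pairs concrete, here is a minimal sketch. It assumes, purely for illustration, that resolution level r splits the series into 2^r equal-length segments, that the trend is the least-squares slope of the segment, and that the value is its mean; the paper's exact segmentation and trend definition may differ.

```python
def trend_value_pairs(series, level):
    """Return one (slope, mean) pair per segment at the given resolution level.

    Assumption: level r uses 2**r equal segments; trailing points that do not
    fill a whole segment are ignored.
    """
    n_segments = 2 ** level
    size = len(series) // n_segments
    if size < 2:
        raise ValueError("each segment needs at least two points")
    pairs = []
    for s in range(n_segments):
        seg = series[s * size:(s + 1) * size]
        mean_y = sum(seg) / size
        mean_x = (size - 1) / 2
        # Least-squares slope of the segment against positions 0..size-1.
        num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(seg))
        den = sum((x - mean_x) ** 2 for x in range(size))
        pairs.append((num / den, mean_y))
    return pairs

def multi_resolution(series, max_level):
    """Map each level 0..max_level to its list of trend-value pairs."""
    return {level: trend_value_pairs(series, level) for level in range(max_level + 1)}
```

For a series that rises and then falls symmetrically, level 0 reports a flat trend, while level 1 separates the rising half from the falling half, which is the kind of detail a single-resolution representation would miss.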

Link: https://doi.org/10.1007/s11063-018-9929-y

Do better ImageNet models transfer better… for image recommendation?

Felipe del-Rio, Pablo Messina, Vicente Dominguez, Denis Parra. CoRR abs/1807.09870 (2018) https://arxiv.org/abs/1807.09870

Visual embeddings from Convolutional Neural Networks (CNN) trained on the ImageNet dataset for the ILSVRC challenge have shown consistently good performance for transfer learning and are widely used in several tasks, including image recommendation. However, some important questions have not yet been answered in order to use these embeddings for a larger scope of recommendation domains: a) Are CNNs that perform better on ImageNet also better for transfer learning in content-based image recommendation? b) Does fine-tuning help to improve performance? and c) What is the best way to perform the fine-tuning?

In this paper we compare several CNN models pre-trained with ImageNet to evaluate their transfer learning performance on an artwork image recommendation task. Our results indicate that models with better performance in the ImageNet challenge do not always imply better transfer learning for recommendation tasks (e.g. NASNet vs. ResNet). Our results also show that fine-tuning can be helpful even with a small dataset, but not every fine-tuning strategy works. Our results can inform other researchers and practitioners on how to train their CNNs for better transfer learning towards image recommendation systems.

Link: https://arxiv.org/abs/1807.09870

On the Progression of Situation Calculus Universal Theories with Constants

Marcelo Arenas, Jorge A. Baier, Juan S. Navarro, Sebastian Sardiña. KR 2018: 484-493. http://aaai.org/ocs/index.php/KR/KR18/paper/view/18074/17173

The progression of action theories is an important problem in knowledge representation. Progression is second-order definable and known to be first-order definable and effectively computable for restricted classes of theories. Motivated by the fact that universal theories with constants (UTCs) are expressive and natural theories whose satisfiability is decidable, in this paper we provide a thorough study of the progression of situation calculus UTCs. First, we prove that progression of a (possibly infinite) UTC is always first-order definable and results in a UTC. Though first-order definable, we show that the progression of a UTC may be infeasible, that is, it may result in an infinite UTC that is not equivalent to any finite set of first-order sentences. We then show that deciding whether there is a feasible progression of a UTC is undecidable. Moreover, we show that deciding whether a sentence (in an expressive fragment of first-order logic) is in the progression of a UTC is coNEXPTIME-complete, and that there exists a family of UTCs for which the size of every feasible progression grows exponentially. Finally, we discuss resolution-based approaches to compute the progression of a UTC. This comprehensive analysis contributes to a better understanding of progression in action theories, both in terms of feasibility and difficulty.

Link: https://aaai.org/ocs/index.php/KR/KR18/paper/view/18074/17173

Certain Answers for SPARQL with Blank Nodes

Daniel Hernández, Claudio Gutierrez, Aidan Hogan. International Semantic Web Conference (1) 2018: 337-353. https://doi.org/10.1007/978-3-030-00671-6_20

Blank nodes in RDF graphs can be used to represent values known to exist but whose identity remains unknown. A prominent example of such usage can be found in the Wikidata dataset where, e.g., the author of Beowulf is given as a blank node. However, while SPARQL considers blank nodes in a query as existentials, it treats blank nodes in RDF data more like constants. Running SPARQL queries over datasets with unknown values may thus lead to counter-intuitive results, which may make the standard SPARQL semantics unsuitable for datasets with existential blank nodes. We thus explore the feasibility of an alternative SPARQL semantics based on certain answers. In order to estimate the performance costs that would be associated with such a change in semantics for current implementations, we adapt and evaluate approximation techniques proposed in a relational database setting for a core fragment of SPARQL. To further understand the impact that such a change in semantics may have on query solutions, we analyse how this new semantics would affect the results of user queries over Wikidata.

Link: https://doi.org/10.1007/978-3-030-00671-6_20

Canonicalisation of Monotone SPARQL Queries

Jaime Salas, Aidan Hogan. International Semantic Web Conference (1) 2018: 600-616. https://doi.org/10.1007/978-3-030-00671-6_35

Caching in the context of expressive query languages such as SPARQL is complicated by the difficulty of detecting equivalent queries: deciding if two conjunctive queries are equivalent is NP-complete, where adding further query features makes the problem undecidable. Despite this complexity, in this paper we propose an algorithm that performs syntactic canonicalisation of SPARQL queries such that the answers for the canonicalised query will not change versus the original. We can guarantee that the canonicalisation of two queries within a core fragment of SPARQL (monotone queries with select, project, join and union) is equal if and only if the two queries are equivalent; we also support other SPARQL features but with a weaker soundness guarantee: that the (partially) canonicalised query is equivalent to the input query. Despite the fact that canonicalisation must be harder than the equivalence problem, we show the algorithm to be practical for real-world queries taken from SPARQL endpoint logs, and further show that it detects more equivalent queries than purely syntactic methods. We also present the results of experiments over synthetic queries designed to stress-test the canonicalisation method, highlighting difficult cases.

Link: https://doi.org/10.1007/978-3-030-00671-6_35

Semantics and Validation of Recursive SHACL

Julien Corman, Juan L. Reutter, Ognjen Savkovic. International Semantic Web Conference (1) 2018: 318-336. https://doi.org/10.1007/978-3-030-00671-6_19

With the popularity of RDF as an independent data model came the need for specifying constraints on RDF graphs, and for mechanisms to detect violations of such constraints. One of the most promising schema languages for RDF is SHACL, a recent W3C recommendation. Unfortunately, the specification of SHACL leaves open the problem of validation against recursive constraints. This omission is important because SHACL by design favors constraints that reference other ones, which in practice may easily yield reference cycles.

In this paper, we propose a concise formal semantics for the so-called “core constraint components” of SHACL. This semantics handles arbitrary recursion, while being compliant with the current standard. Graph validation is based on the existence of an assignment of SHACL “shapes” to nodes in the graph under validation, stating which shapes are verified or violated, while verifying the targets of the validation process. We show in particular that the design of SHACL forces us to consider cases in which these assignments are partial, or, in other words, where the truth value of a constraint at some nodes of a graph may be left unknown.

Dealing with recursion also comes at a price, as validating an RDF graph against SHACL constraints is NP-hard in the size of the graph, and this lower bound still holds for constraints with stratified negation. Therefore we also propose a tractable approximation to the validation problem.

Link: https://doi.org/10.1007/978-3-030-00671-6_19

Faster and Smaller Two-Level Index for Network-Based Trajectories

Rodrigo Rivera, M. Andrea Rodríguez, Diego Seco. SPIRE 2018: 348-362. https://doi.org/10.1007/978-3-030-00479-8_28

Two-level indexes have been widely used to handle trajectories of moving objects that are constrained to a network. The top level of these indexes handles the spatial dimension, whereas the bottom level handles the temporal dimension. The latter turns out to be an instance of the interval-intersection problem, but it has been tackled by non-specialized spatial indexes. In this work, we propose the use of a compact data structure on the bottom level of these indexes. Our experimental evaluation shows that our approach is both faster and smaller than existing solutions.

Link: https://doi.org/10.1007/978-3-030-00479-8_28

Copyless cost-register automata: Structure, expressiveness, and closure properties

Filip Mazowiecki, Cristian Riveros, Copyless cost-register automata: Structure, expressiveness, and closure properties, Journal of Computer and System Sciences, Volume 100, 2019, Pages 1-29, ISSN 0022-0000. https://doi.org/10.1016/j.jcss.2018.07.002

Cost register automata (CRA) and its subclass, copyless CRA, were recently proposed by Alur et al. as a new model for computing functions over strings. We study some structural properties, expressiveness, and closure properties of copyless CRA. We show that copyless CRA is strictly less expressive than weighted automata and is not closed under reverse operation. To find a better class we impose restrictions on copyless CRA, which ends successfully with a new robust computational model that is closed under reverse and other extensions.

DOI https://doi.org/10.1016/j.jcss.2018.07.002

Tree Path Majority Data Structures

Travis Gagie, Meng He, Gonzalo Navarro. CoRR abs/1806.01804 (2018). https://arxiv.org/abs/1806.01804

We present the first solution to τ-majorities on tree paths. Given a tree of n nodes, each with a label from [1..σ], and a fixed threshold 0 < τ < 1, such a query gives two nodes u and v and asks for all the labels that appear more than τ·|P_uv| times in the path P_uv from u to v, where |P_uv| denotes the number of nodes in P_uv. Note that the answer to any query is of size up to 1/τ. On a w-bit RAM, we obtain a linear-space data structure with O((1/τ) log* n log log_w σ) query time. For any κ > 1, we can also build a structure that uses O(n log^[κ] n) space, where log^[κ] n denotes the function that applies logarithm κ times to n, and answers queries in time O((1/τ) log log_w σ). The construction time of both structures is O(n log n). We also describe two succinct-space solutions with the same query time as the linear-space structure. One uses 2nH + 4n + o(n)(H+1) bits, where H ≤ lg σ is the entropy of the label distribution, and can be built in O(n log n) time. The other uses nH + O(n) + o(nH) bits and is built in O(n log n) time w.h.p.
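The query itself can be illustrated with a naive baseline that simply walks the path and counts labels. This is not the data structure from the paper: it takes time proportional to the path length per query, and the `parent`, `depth`, and `label` dictionaries are a hypothetical input format chosen for the sketch.

```python
from collections import Counter

def tree_path(parent, depth, u, v):
    """Nodes on the path from u to v, climbing both ends to their lowest common ancestor."""
    left, right = [], []
    while u != v:
        if depth[u] >= depth[v]:
            left.append(u)
            u = parent[u]
        else:
            right.append(v)
            v = parent[v]
    return left + [u] + right[::-1]

def path_majorities(parent, depth, label, u, v, tau):
    """Labels appearing more than tau * |P_uv| times on the path from u to v."""
    path = tree_path(parent, depth, u, v)
    counts = Counter(label[x] for x in path)
    return {lab for lab, c in counts.items() if c > tau * len(path)}
```

Because each reported label must occupy more than a τ fraction of the path, at most 1/τ labels can ever be returned, which matches the bound on the answer size stated above.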

Link: https://arxiv.org/abs/1806.01804

Weighted Shortest Paths for RDF Graphs

Gonzalo Tartari, Aidan Hogan: WiSP: Weighted Shortest Paths for RDF Graphs. VOILA@ISWC 2018: 37-52. http://ceur-ws.org/Vol-2187/paper4.pdf

An important aspect of exploratory search over graph data is to understand what paths connect a given pair of nodes. Since the resulting paths can be manifold, various works propose ranking paths likely to be of interest to a user; these methods often rely on enumerating all such paths (up to a fixed length or number) before ranking is applied. In this paper, we instead propose applying a shortest path search on weighted versions of the graph in order to directly compute the most relevant path(s) between two nodes without fixed-length bounds, further obviating the need to enumerate irrelevant paths. We investigate weightings based on node degree, PageRank and edge frequency, contrasting the paths produced by these schemes over the Wikidata graph and discussing performance issues. Finally we conduct a user study over Wikidata where evaluators assess the quality of the paths produced; though inter-rater consensus on which paths are of most interest is low, we achieve statistically significant results to suggest that users find the weighted shortest paths more interesting than the baseline shortest paths without weights.
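As a rough illustration of the idea, the sketch below runs Dijkstra's algorithm on an undirected graph in which stepping onto a node costs that node's degree, so paths through high-degree hubs are penalised. The exact weighting functions used in the paper are not given in this abstract; the degree-as-cost scheme, the function name `degree_weighted_shortest_path`, and the toy edge list are all assumptions.

```python
import heapq

def degree_weighted_shortest_path(edges, source, target):
    """Dijkstra where entering a node costs its degree (assumed weighting)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt in adj.get(node, ()):
            nd = d + len(adj[nxt])  # cost of stepping onto nxt = degree of nxt
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))

    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical graph: "hub" links A and B directly but has many other neighbours.
edges = [("A", "hub"), ("hub", "B"), ("A", "x"), ("x", "y"), ("y", "B")]
edges += [("hub", f"h{i}") for i in range(5)]
print(degree_weighted_shortest_path(edges, "A", "B"))
```

Here the unweighted shortest path is A, hub, B, but the hub has degree 7, so the weighted search prefers the longer path A, x, y, B.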

Link: http://ceur-ws.org/Vol-2187/paper4.pdf

Reproducibility of Computational Environments for Scientific Experiments using Container-based Virtualization

Maximiliano Osorio, Carlos Buil Aranda, Hernán Vargas. SemSci@ISWC 2018: 43-51. http://ceur-ws.org/Vol-2184/paper-05.pdf

Experiment reproducibility is the ability to rerun an experiment, possibly with changes introduced to it, and obtain results that are consistent with the original ones. To allow reproducibility, the scientific community encourages researchers to publish descriptions of these experiments. However, these recommendations do not include an automated way of creating such descriptions: normally scientists have to annotate their experiments in a semi-automated way. In this paper we propose a system to automatically describe computational environments used in in-silico experiments. We propose to use Operating System (OS) virtualization (containerization) for distributing software experiments through software images, together with an annotation system for describing these software images. The images are a minimal version of an OS (container) that allows the deployment of multiple isolated software packages within it.

Link: http://ceur-ws.org/Vol-2184/paper-05.pdf

Efficient Evaluation and Static Analysis for Well-Designed Pattern Trees with Projection

Pablo Barceló, Markus Kröll, Reinhard Pichler, Sebastian Skritek. ACM Trans. Database Syst. 43(2): 8:1-8:44 (2018). https://doi.org/10.1145/3233983

Conjunctive queries (CQs) fail to provide an answer when the pattern described by the query does not exactly match the data. CQs might thus be too restrictive as a querying mechanism when data is semistructured or incomplete. The semantic web therefore provides a formalism—known as (projected) well-designed pattern trees (pWDPTs)—that tackles this problem: pWDPTs allow us to formulate queries that match parts of the query over the data if available, but do not ignore answers of the remaining query otherwise. Here we abstract away the specifics of semantic web applications and study pWDPTs over arbitrary relational schemas. Since the language of pWDPTs subsumes CQs, their evaluation problem is intractable. We identify structural properties of pWDPTs that lead to (fixed-parameter) tractability of various variants of the evaluation problem. We also show that checking if a pWDPT is equivalent to one in our tractable class is in 2EXPTIME. As a corollary, we obtain fixed-parameter tractability of evaluation for pWDPTs with such good behavior. Our techniques also allow us to develop a theory of approximations for pWDPTs.

Link: https://doi.org/10.1145/3233983

Pumping Lemmas for Weighted Automata

Filip Mazowiecki, Cristian Riveros. STACS 2018: 50:1-50:14. http://drops.dagstuhl.de/opus/volltexte/2018/8498/pdf/LIPIcs-STACS-2018-50.pdf

We present three pumping lemmas for three classes of functions definable by fragments of weighted automata over the min-plus semiring and the semiring of natural numbers. As a corollary we show that the hierarchy of functions definable by unambiguous, finitely-ambiguous, polynomially-ambiguous weighted automata, and the full class of weighted automata is strict for the min-plus semiring.

Link: http://drops.dagstuhl.de/opus/volltexte/2018/8498/pdf/LIPIcs-STACS-2018-50.pdf

Implicit Representation of Bigranular Rules for Multigranular Data

Stephen J. Hegner, M. Andrea Rodríguez. DEXA (1) 2018: 372-389. https://doi.org/10.1007/978-3-319-98809-2_23

Domains for spatial and temporal data are often multigranular in nature, possessing a natural order structure defined by spatial inclusion and time-interval inclusion, respectively. This order structure induces lattice-like (partial) operations, such as join, which in turn lead to join rules, in which a single domain element (granule) is asserted to be equal to, or contained in, the join of a set of such granules. In general, the efficient representation of such join rules is a difficult problem. However, there is a very effective representation in the case that the rule is bigranular; i.e., all of the joined elements belong to the same granularity, and, in addition, complete information about the (non)disjointness of all granules involved is known. The details of that representation form the focus of the paper.

Link: https://doi.org/10.1007/978-3-319-98809-2_23

Tactical distribution in local funding: The value of an aligned mayor

Bernardo Lara E., Sergio Toro M., Tactical distribution in local funding: The value of an aligned mayor, European Journal of Political Economy, Volume 56, 2019, Pages 74-89, ISSN 0176-2680 https://doi.org/10.1016/j.ejpoleco.2018.07.006.

Using Chile as a case study for understanding tactical distribution under extensive controls on expenditure, this paper examines whether political motives affect the allocation of funds from the central government to localities. Collecting local-level data of two infrastructure funding programs and using the voting gap percentage between the coalition candidate and opposition competitors in a Sharp Regression Discontinuity methodology, we find causal evidence in favor of three hypotheses: (i) a coalition criterion influences the funding allocation to the local level; (ii) an electoral cycle exists in local funding; and (iii) the degree of coalition targeting varies based on a locality’s history of coalition alignment. In sum, the central government regards politically aligned mayors as valuable electoral assets, especially in municipalities historically aligned with the coalition.

DOI https://doi.org/10.1016/j.ejpoleco.2018.07.006

First-Order Rewritability of Frontier-Guarded Ontology-Mediated Queries

Pablo Barceló, Gerald Berger, Carsten Lutz, Andreas Pieris. AMW 2018. http://ceur-ws.org/Vol-2100/paper8.pdf

Ontology-based data access (OBDA) is a successful application of knowledge representation and reasoning technologies in information management systems. One premier goal is to facilitate access to data that is heterogeneous and incomplete. This is achieved via an ontology that enriches the user query, typically a union of conjunctive queries, with domain knowledge. It turned out that the ontology and the user query can be seen as two components of one composite query, called ontology-mediated query (OMQ).

The problem of answering OMQs is thus central to OBDA. There is a consensus that the required level of scalability in OMQ answering can be achieved by using standard database management systems. To this end, a standard approach used nowadays is query rewriting: the ontology O and the database query q are combined into a new query qO, the so-called rewriting, which gives the same answer as the OMQ consisting of O and q over all input databases. It is of course essential that the rewriting qO is expressed in a language that can be handled by standard database systems. The typical language that is considered is the class of first-order (FO) queries.

In this work, we focus on two central OMQ languages based on guarded and frontier-guarded tuple-generating dependencies (TGDs), and we study the problem whether an OMQ is FO-rewritable, i.e., it can be equivalently expressed as a first-order query. Recall that a guarded (resp., frontier-guarded) TGD is a sentence of the form ∀x̄ ∀ȳ (φ(x̄, ȳ) → ∃z̄ ψ(x̄, z̄)), where φ and ψ are conjunctions of relational atoms, and φ has an atom that contains all the variables (x̄ ∪ ȳ) (resp., x̄) [1, 8]. Our goal is to develop specially tailored techniques that allow us to understand the above non-trivial problem, and also to pinpoint its computational complexity. To this end, as we discuss below, we follow two different approaches. Our results can be summarized as follows:

-We first focus on the simpler OMQ language based on guarded TGDs and atomic queries, and, in Section 2, we provide a characterization of FO-rewritability that forms the basis for applying tree automata techniques.

-We then exploit, in Section 3, standard two-way alternating parity tree automata. In particular, we reduce our problem to the problem of checking the finiteness of the language of an automaton. The reduction relies on a refined version of the characterization of FO-rewritability established in Section 2. This provides a transparent solution to our problem based on standard tools, but it does not lead to an optimal result.

-Towards an optimal result, we use, in Section 4, a more sophisticated automata model, known as cost automata. In particular, we reduce our problem to the problem of checking the boundedness of a cost automaton. This allows us to show that FO-rewritability for OMQs based on guarded TGDs and atomic queries is in 2EXPTIME, and in EXPTIME for predicates of bounded arity. The complexity analysis relies on an intricate result on the boundedness problem for a certain class of cost automata [5, 9].

-Finally, in Section 5, by using the results of Section 4, we provide a complete picture for the complexity of our problem, i.e., deciding whether an OMQ based on (frontier-)guarded TGDs and arbitrary (unions of) conjunctive queries is FO-rewritable.
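The guardedness conditions recalled in this abstract are purely syntactic, so they can be checked with a few lines of code. The sketch below is only an illustration of the definition: it assumes every atom is a pair (predicate name, tuple of variable names) with no constants, and the helper names `variables`, `is_guarded` and `is_frontier_guarded` are hypothetical.

```python
def variables(atoms):
    """Set of all variables occurring in a list of atoms."""
    return {v for _, args in atoms for v in args}

def is_guarded(body):
    """Guarded: some body atom contains every variable of the body."""
    body_vars = variables(body)
    return any(body_vars <= set(args) for _, args in body)

def is_frontier_guarded(body, head):
    """Frontier-guarded: some body atom contains every frontier variable,
    i.e. every variable shared by the body and the head."""
    frontier = variables(body) & variables(head)
    return any(frontier <= set(args) for _, args in body)

# R(x, y) -> exists z. S(y, z): the atom R(x, y) guards the whole body.
tgd1 = ([("R", ("x", "y"))], [("S", ("y", "z"))])

# R(x, y), S(y, w) -> exists z. T(y, z): not guarded, but the frontier {y} is.
tgd2 = ([("R", ("x", "y")), ("S", ("y", "w"))], [("T", ("y", "z"))])

# R(x, y), S(y, w) -> T(x, w): the frontier {x, w} fits in no single body atom.
tgd3 = ([("R", ("x", "y")), ("S", ("y", "w"))], [("T", ("x", "w"))])

for body, head in (tgd1, tgd2, tgd3):
    print(is_guarded(body), is_frontier_guarded(body, head))
```

Every guarded TGD is also frontier-guarded, since the frontier is a subset of the body variables; the second example shows that the converse fails.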

Link: http://ceur-ws.org/Vol-2100/paper8.pdf

First-Order Rewritability of Frontier-Guarded Ontology-Mediated Queries

Pablo Barceló, Gerald Berger, Carsten Lutz, Andreas Pieris. IJCAI 2018: 1707-1713. https://doi.org/10.24963/ijcai.2018/236

We focus on ontology-mediated queries (OMQs) based on (frontier-)guarded existential rules and (unions of) conjunctive queries, and we investigate the problem of FO-rewritability, i.e., whether an OMQ can be rewritten as a first-order query. We adopt two different approaches. The first approach employs standard two-way alternating parity tree automata. Although it does not lead to a tight complexity bound, it provides a transparent solution based on widely known tools. The second approach relies on a sophisticated automata model, known as cost automata. This allows us to show that our problem is 2EXPTIME-complete. In both approaches, we provide semantic characterizations of FO-rewritability that are of independent interest.

Link: https://doi.org/10.24963/ijcai.2018/236

SynKit: LTL Synthesis as a Service

Alberto Camacho, Christian J. Muise, Jorge A. Baier, Sheila A. McIlraith. IJCAI 2018: 5817-5819. https://doi.org/10.24963/ijcai.2018/848

Automatic synthesis of software from specification is one of the classic problems in computer science. In the last decade, significant advances have been made in the synthesis of programs from specifications expressed in Linear Temporal Logic (LTL). LTL synthesis technology is central to a myriad of applications from the automated generation of controllers for Internet of Things devices, to the synthesis of control software for robotic applications. Unfortunately, the number of existing tools for LTL synthesis is limited, and using them requires specialized expertise. In this paper we present SynKit, a tool that offers LTL synthesis as a service. SynKit integrates a RESTful API and a web service with an editor, a solver, and a strategy visualizer.

Link: https://doi.org/10.24963/ijcai.2018/848

A Model of Distributed Query Computation in Client-Server Scenarios on the Semantic Web

Olaf Hartig, Ian Letter, Jorge Pérez. IJCAI 2018: 5259-5263. https://doi.org/10.24963/ijcai.2018/733

This paper provides an overview of a model for capturing properties of client-server-based query computation setups. This model can be used to formally analyze different combinations of client and server capabilities, and compare them in terms of various fine-grain complexity measures. While the motivations and the focus of the presented work are related to querying the Semantic Web, the main concepts of the model are general enough to be applied in other contexts as well.

Link: https://doi.org/10.24963/ijcai.2018/733

LTL Realizability via Safety and Reachability Games

Alberto Camacho, Christian J. Muise, Jorge A. Baier, Sheila A. McIlraith. IJCAI 2018: 4683-4691. https://doi.org/10.24963/ijcai.2018/651

In this paper, we address the problem of LTL realizability and synthesis. State-of-the-art techniques rely on so-called bounded synthesis methods, which reduce the problem to a safety game. Realizability is determined by solving synthesis in a dual game. We provide a unified view of duality, and introduce novel bounded realizability methods via reductions to reachability games. Further, we introduce algorithms, based on AI automated planning, to solve these safety and reachability games. This is the first complete approach to LTL realizability and synthesis via automated planning. Experiments illustrate that reductions to reachability games are an alternative to reductions to safety games, and show that planning can be a competitive approach to LTL realizability and synthesis.

Link: https://doi.org/10.24963/ijcai.2018/651

A More General Theory of Static Approximations for Conjunctive Queries

Pablo Barceló, Miguel Romero, Thomas Zeume. ICDT 2018: 7:1-7:22. http://dx.doi.org/10.4230/LIPIcs.ICDT.2018.7

Conjunctive query (CQ) evaluation is NP-complete, but becomes tractable for fragments of bounded hypertreewidth. If a CQ is hard to evaluate, it is thus useful to evaluate an approximation of it in such fragments. While underapproximations (i.e., those that return correct answers only) are well-understood, the dual notion of overapproximations that return complete (but not necessarily sound) answers, and also a more general notion of approximation based on the symmetric difference of query results, are almost unexplored. In fact, the decidability of the basic problems of evaluation, identification, and existence of those approximations, is open. We develop a connection with existential pebble game tools that allows the systematic study of such problems. In particular, we show that the evaluation and identification of overapproximations can be solved in polynomial time. We also make progress in the problem of existence of overapproximations, showing it to be decidable in 2EXPTIME over the class of acyclic CQs. Furthermore, we look at when overapproximations do not exist, suggesting that this can be alleviated by using a more liberal notion of overapproximation. We also show how to extend our tools to study symmetric difference approximations. We observe that such approximations properly extend under- and over-approximations, settle the complexity of its associated identification problem, and provide several results on existence and evaluation.

Link: http://dx.doi.org/10.4230/LIPIcs.ICDT.2018.7

G-CORE: A Core for Future Graph Query Languages

Renzo Angles, Marcelo Arenas, Pablo Barceló, Peter A. Boncz, George H. L. Fletcher, Claudio Gutierrez, Tobias Lindaaker, Marcus Paradies, Stefan Plantikow, Juan F. Sequeda, Oskar van Rest, Hannes Voigt. SIGMOD Conference 2018: 1421-1432. https://doi.org/10.1145/3183713.3190654

We report on a community effort between industry and academia to shape the future of graph query languages. We argue that existing graph database management systems should consider supporting a query language with two key characteristics. First, it should be composable, meaning that graphs are the input and the output of queries. Second, the graph query language should treat paths as first-class citizens. Our result is G-CORE, a powerful graph query language design that fulfills these goals, and strikes a careful balance between path query expressivity and evaluation complexity.

Link: https://doi.org/10.1145/3183713.3190654

Finite LTL Synthesis as Planning

Alberto Camacho, Jorge A. Baier, Christian J. Muise, Sheila A. McIlraith. ICAPS 2018: 29-38. https://aaai.org/ocs/index.php/ICAPS/ICAPS18/paper/view/17790

LTL synthesis is the task of generating a strategy that satisfies a Linear Temporal Logic (LTL) specification interpreted over infinite traces. In this paper we examine the problem of LTLf synthesis, a variant of LTL synthesis where the specification of the behaviour of the strategy we generate is interpreted over finite traces — similar to the assumption we make in many planning problems, and important for the synthesis of business processes and other system interactions of finite duration. Existing approaches to LTLf synthesis transform LTLf into deterministic finite-state automata (DFA) and reduce the synthesis problem to a DFA game. Unfortunately, the DFA transformation is worst-case double-exponential in the size of the formula, presenting a computational bottleneck. In contrast, our approach exploits non-deterministic automata, and we reduce the synthesis problem to a non-deterministic planning problem. We leverage our approach not only for strategy generation but also to generate certificates of unrealizability — the first such method for LTLf. We employ a battery of techniques that exploit the structure of the LTLf specification to improve the efficiency of our transformation to automata. We combine these techniques with lazy determinization of automata and on-the-fly state abstraction. We illustrate the effectiveness of our approach on a set of established LTL synthesis benchmarks adapted to finite LTL.

Link: https://aaai.org/ocs/index.php/ICAPS/ICAPS18/paper/view/17790

Containment for Rule-Based Ontology-Mediated Queries

Pablo Barceló, Gerald Berger, Andreas Pieris. PODS 2018: 267-279. https://doi.org/10.1145/3196959.3196963

Many efforts have been dedicated to identifying restrictions on ontologies expressed as tuple-generating dependencies (tgds), a.k.a. existential rules, that lead to the decidability of answering ontology-mediated queries (OMQs). This has given rise to three families of formalisms: guarded, non-recursive, and sticky sets of tgds. We study the containment problem for OMQs expressed in such formalisms, which is a key ingredient for solving static analysis tasks associated with them. Our main contribution is the development of specially tailored techniques for OMQ containment under the classes of tgds stated above. This enables us to obtain sharp complexity bounds for the problems at hand.

Link: https://doi.org/10.1145/3196959.3196963

Constant Delay Algorithms for Regular Document Spanners

Fernando Florenzano, Cristian Riveros, Martín Ugarte, Stijn Vansummeren, Domagoj Vrgoc. PODS 2018: 165-177. https://doi.org/10.1145/3196959.3196987

Regular expressions and automata models with capture variables are core tools in rule-based information extraction. These formalisms, also called regular document spanners, use regular languages in order to locate the data that a user wants to extract from a text document, and then store this data into variables. Since document spanners can easily generate large outputs, it is important to have good evaluation algorithms that can generate the extracted data in a quick succession, and with relatively little precomputation time. Towards this goal, we present a practical evaluation algorithm that allows constant delay enumeration of a spanner’s output after a precomputation phase that is linear in the document. While the algorithm assumes that the spanner is specified in a syntactic variant of variable set automata, we also study how it can be applied when the spanner is specified by general variable set automata, regex formulas, or spanner algebras. Finally, we study the related problem of counting the number of outputs of a document spanner, providing a fine grained analysis of the classes of document spanners that support efficient enumeration of their results.
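To give a feel for what a spanner outputs, the sketch below uses Python's standard `re` module with named groups as capture variables and returns, for each match, a mapping from variable names to spans (start and end offsets in the document). This is only an illustration of the output format: `re.finditer` reports non-overlapping leftmost matches rather than all mappings of a spanner, and it is not the constant-delay algorithm of the paper. The function name `extract_spans` and the sample document are assumptions.

```python
import re

def extract_spans(pattern, document):
    """Return one {variable: (start, end)} mapping per regex match."""
    regex = re.compile(pattern)
    results = []
    for match in regex.finditer(document):
        mapping = {
            var: match.span(var)
            for var in regex.groupindex          # names of the capture groups
            if match.group(var) is not None      # skip groups that did not match
        }
        results.append(mapping)
    return results

document = "name: Ana, name: Luis"
spans = extract_spans(r"name: (?P<x>\w+)", document)
print(spans)
print([document[s:e] for m in spans for (s, e) in m.values()])
```

Storing offsets rather than substrings keeps each output of constant size per variable, which is the setting in which enumeration delay is usually measured.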

Link: https://doi.org/10.1145/3196959.3196987

Document Spanners for Extracting Incomplete Information: Expressiveness and Complexity

Francisco Maturana, Cristian Riveros, Domagoj Vrgoc. PODS 2018: 125-136. https://doi.org/10.1145/3196959.3196968

Rule-based information extraction has lately received a fair amount of attention from the database community, with several languages appearing in the last few years. Although information extraction systems are intended to deal with semistructured data, all language proposals introduced so far are designed to output relations, thus making them incapable of handling incomplete information. To remedy the situation, we propose to extend information extraction languages with the ability to use mappings, thus allowing us to work with documents which have missing or optional parts. Using this approach, we simplify the semantics of regex formulas and extraction rules, two previously defined methods for extracting information. We extend them with the ability to handle incomplete data, and study how they compare in terms of expressive power. We also study computational properties of these languages, focusing on the query enumeration problem, as well as satisfiability and containment.

Link: https://doi.org/10.1145/3196959.3196968

Extracting semantic knowledge from web context for multimedia IR: a taxonomy, survey and challenges

Teresa Bracamonte, Benjamin Bustos, Barbara Poblete, Tobias Schreck. Multimedia Tools Appl. 77(11): 13853-13889 (2018). https://doi.org/10.1007/s11042-017-4997-y

Since its invention, the Web has evolved into the largest multimedia repository that has ever existed. This evolution is a direct result of the explosion of user-generated content, explained by the wide adoption of social network platforms. The vast amount of multimedia content requires effective management and retrieval techniques. Nevertheless, Web multimedia retrieval is a complex task because users commonly express their information needs in semantic terms, but expect multimedia content in return. This dissociation between semantics and content of multimedia is known as the semantic gap. To solve this, researchers are looking beyond content-based or text-based approaches, integrating novel data sources. New data sources can consist of any type of data extracted from the context of multimedia documents, defined as the data that is not part of the raw content of a multimedia file. The Web is an extraordinary source of context data, which can be found in explicit or implicit relation to multimedia objects, such as surrounding text, tags, hyperlinks, and even in relevance-feedback. Recent advances in Web multimedia retrieval have shown that context data has great potential to bridge the semantic gap. In this article, we present the first comprehensive survey of context-based approaches for multimedia information retrieval on the Web. We introduce a data-driven taxonomy, which we then use in our literature review of the most emblematic and important approaches that use context-based data. In addition, we identify important challenges and opportunities, which had not been previously addressed in this area.

Link: https://doi.org/10.1007/s11042-017-4997-y

Report on the 2nd International Workshop on Recent Trends in News Information Retrieval (NewsIR’18)

Dyaa Albakour, David Corney, Julio Gonzalo, Miguel Martínez, Bárbara Poblete, Andreas Vlachos: Report on the 2nd International Workshop on Recent Trends in News Information Retrieval (NewsIR'18). SIGIR Forum 52(1): 140-146 (2018). https://doi.org/10.1145/3274784.3274799

The news industry has undergone a revolution in the past decade, with substantial changes continuing to this day. News consumption habits are changing due to the increase in the volume of news and the variety of sources. Readers need new mechanisms to cope with this vast volume of information in order to not only find a signal in the noise, but also to understand what is happening in the world given the multiple points of view describing events. These challenges in journalism relate to Information Retrieval (IR) and Natural Language Processing (NLP) fields such as: verification of a source’s reliability; the integration of news with other sources of information; real-time processing of both news content and social streams; de-duplication of stories; and entity detection and disambiguation. Although IR and NLP have been applied to news for decades, the changing nature of the space requires fresh approaches and a closer collaboration with our colleagues from the journalism environment. Following the success of the previous version of the workshop (NewsIR’16), the goal of this workshop, held in conjunction with ECIR 2018, is to continue to stimulate such discussion between the communities and to share interesting approaches to solve real user problems. A total number of 19 submissions were received and reviewed, of which 12 were accepted for presentation. In addition to that, we had over 30 registered participants in the workshop who were pleased to attend the two keynote talks given by well-known experts in the field – Edgar Meij (from industry) and Peter Tolmie (from academia) and oral and poster presentations from the accepted papers. The workshop also included a breakout session to discuss ideas for a future data challenge in news IR and closed with a focused panel discussion to reflect on the day. In summary, several ideas were presented in the workshop on solving complex information needs in the news domain. 
In addition, the workshop concluded with suggestions of important challenges and shared tasks to work on as a community for News IR.

Link: https://doi.org/10.1145/3274784.3274799

Early Tracking of People’s Reaction in Twitter for Fast Reporting of Damages in the Mercalli Scale

Marcelo Mendoza, Bárbara Poblete, Ignacio Valderrama. HCI International 2018 (14) 2018: 247-257. https://doi.org/10.1007/978-3-319-91485-5_19

The Modified Mercalli Intensity Scale is a measure of the severity of an earthquake for a nonscientist. Since the Mercalli scale is based on perceived effects, it has a strong dependence on observers. Typically, these reports take time to be prepared and, as a consequence, Mercalli intensities are published hours after the occurrence of an earthquake. The National Seismological Center of Chile needs to provide a preliminary overview of the observed effects of an earthquake. This has motivated us to create a system for early tracking of people’s reaction in social networks to infer Mercalli intensities. By tracking people’s comments about the effects of an earthquake, a collection of Mercalli point estimates is retrieved at county level of granularity. We introduce the concept of Reinforced Mercalli support, which combines Mercalli point estimates with social support, allowing us to discard socially unsupported estimates. Experimental results show that our proposal is accurate, providing early Mercalli reports 30 min after an earthquake and detecting the maximum Mercalli intensity of an event with high accuracy in terms of mean absolute error (MAE).

Link: https://doi.org/10.1007/978-3-319-91485-5_19

Domain-Independent Detection of Emergency Situations Based on Social Activity Related to Geolocations

Hernán Sarmiento, Bárbara Poblete, Jaime Campos. WebSci 2018: 245-254. https://doi.org/10.1145/3201064.3201077

In general, existing methods for automatically detecting emergency situations using Twitter rely on features based on domain-specific keywords found in messages. Keyword-based methods of this type usually require training on domain-specific labeled data, using multiple languages, and for different types of events (e.g., earthquakes, floods, wildfires, etc.). In addition to being costly, these approaches may fail to detect previously unexpected situations, such as uncommon catastrophes or terrorist attacks. However, collective mentions of certain keywords are not the only type of self-organizing phenomena that may arise in social media when a real-world extreme situation occurs. Just as nearby physical sensors become activated when stimulated, localized citizen sensors (i.e., users) will also react in a similar manner. To leverage this information, we propose to use self-organized activity related to geolocations to identify emergency situations. We propose to detect such events by tracking the frequencies and the probability distributions of the interarrival times of the messages related to specific locations. Using an off-the-shelf classifier that is independent of domain-specific features, we study and describe emergency situations based solely on location-based features in messages. Our findings indicate that anomalies in location-related social media user activity indeed provide information for automatically detecting emergency situations independent of their domain.

Link: https://doi.org/10.1145/3201064.3201077

What Should Entity Linking link?

Henry Rosales-Méndez, Barbara Poblete, Aidan Hogan. AMW 2018. http://ceur-ws.org/Vol-2100/paper10.pdf

Some decades have passed since the concept of “named entity” was used for the first time. Since then, new lines of research have emerged in this environment, such as linking the (named) entity mentions in a text collection with their corresponding knowledge-base entries.

However, this introduces problems with respect to a consensus on the definition of the concept of “entity” in the literature. This paper aims to highlight the importance of formalizing the concept of “entity” and the benefits it would bring to the Entity Linking community, in particular relating to the construction of gold standards for evaluation purposes.

Link: http://ceur-ws.org/Vol-2100/paper10.pdf

Towards a Robust Semantics for SHACL: Preliminary Discussion

Julien Corman, Juan L. Reutter, Ognjen Savkovic. AMW 2018. http://ceur-ws.org/Vol-2100/paper22.pdf

Validating RDF graphs against constraints has gained interest in recent years, due to the popularity of RDF and the growth of knowledge bases. SHACL, a constraint language for RDF, has recently become a W3C recommendation, with a specification detailing syntax, semantics and common use cases. Unfortunately, this (otherwise complete) specification does not cover validation against recursive constraints. This omission is important, because SHACL by design favors constraint references. We investigate the possibility of a formal semantics for SHACL which covers the recursive case, while being compliant with the current standard.

Link: http://ceur-ws.org/Vol-2100/paper22.pdf

The Property Graph Database Model

Renzo Angles. AMW 2018. http://ceur-ws.org/Vol-2100/paper26.pdf

Most of the current graph database systems have been designed to support property graphs. Surprisingly, there is no standard specification of the database model behind such systems. This paper presents a formal definition of the property graph database model. Specifically, we define the property graph data structure, basic notions of integrity constraints (e.g. graph schema), and a graph query language.
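As an informal companion to this description, the sketch below shows one common way to represent a property graph in code: a set of nodes, a map from edge identifiers to (source, target) pairs, a map from nodes and edges to label sets, and a map from (object, property name) pairs to values. The paper's formal definition is not reproduced in this abstract and may differ in its details; the class `PropertyGraph` and its methods are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class PropertyGraph:
    nodes: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)   # edge id -> (source, target)
    labels: dict = field(default_factory=dict)  # node/edge id -> set of labels
    props: dict = field(default_factory=dict)   # (node/edge id, key) -> value

    def add_node(self, node_id, labels=(), **props):
        self.nodes.add(node_id)
        self.labels[node_id] = set(labels)
        for key, value in props.items():
            self.props[(node_id, key)] = value

    def add_edge(self, edge_id, source, target, labels=(), **props):
        # Basic integrity constraint: both endpoints must already exist.
        if source not in self.nodes or target not in self.nodes:
            raise ValueError("edge endpoints must be existing nodes")
        self.edges[edge_id] = (source, target)
        self.labels[edge_id] = set(labels)
        for key, value in props.items():
            self.props[(edge_id, key)] = value

    def neighbors(self, node_id, label=None):
        """Targets of outgoing edges, optionally restricted to an edge label."""
        return sorted(
            dst for eid, (src, dst) in self.edges.items()
            if src == node_id and (label is None or label in self.labels[eid])
        )

g = PropertyGraph()
g.add_node("n1", labels={"Person"}, name="Ana")
g.add_node("n2", labels={"Person"}, name="Luis")
g.add_edge("e1", "n1", "n2", labels={"knows"}, since=2018)
print(g.neighbors("n1", "knows"))
```

Keeping edges as first-class objects with their own identifiers is what lets two nodes be connected by several edges, each with its own labels and properties.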

Link: http://ceur-ws.org/Vol-2100/paper26.pdf

A Simple, Efficient, Parallelizable Algorithm for Approximated Nearest Neighbors

Sebastián Ferrada, Benjamin Bustos, Nora Reyes. AMW 2018. http://ceur-ws.org/Vol-2100/paper3.pdf

The use of the join operator in metric spaces leads to what is known as a similarity join, where objects of two datasets are paired if they are somehow similar. We propose a heuristic that solves the 1-NN self-similarity join, that is, a similarity join of a dataset with itself that pairs each element with its nearest neighbor within the same dataset. Solving the problem using a simple brute-force algorithm requires O(n^2) distance calculations, since every element must be compared against all others. We propose a simple divide-and-conquer algorithm that gives an approximated solution for the self-similarity join and computes only O(n^(3/2)) distances. We show how the algorithm can be easily modified in order to improve the precision up to 31% (i.e., the percentage of correctly found 1-NNs) and such that 79% of the results are within the 10-NN, with no significant extra distance computations. We present how the algorithm can be executed in parallel and prove that using Θ(√n) processors, the total execution takes linear time. We end by discussing ways in which the algorithm can be improved in the future.
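To make the distance counts in the abstract concrete, the following sketch contrasts the exact brute-force self-join with a simple partition-based approximation. It is an illustration only and not the authors' algorithm: the random partition into groups of roughly √n elements, the Euclidean distance, and the function names are all assumptions made here. With about √n groups of about √n elements each, the grouped version performs on the order of n^(3/2) distance computations.

```python
import math
import random


def nearest_within(points, candidates, i):
    """Return the index in `candidates` (other than i) closest to points[i]."""
    best_j, best_d = None, math.inf
    for j in candidates:
        if j != i:
            d = math.dist(points[i], points[j])  # Euclidean distance (assumed metric)
            if d < best_d:
                best_j, best_d = j, d
    return best_j


def brute_force_1nn(points):
    """Exact 1-NN self-join: about n^2 distance computations."""
    everyone = range(len(points))
    return [nearest_within(points, everyone, i) for i in everyone]


def grouped_approx_1nn(points, seed=0):
    """Approximate 1-NN self-join: search only inside a random group of ~sqrt(n) points.

    Hypothetical partitioning scheme, used here only to show where an
    O(n^(3/2)) distance count can come from.
    """
    n = len(points)
    size = max(2, math.isqrt(n))
    order = list(range(n))
    random.Random(seed).shuffle(order)
    groups = [order[k:k + size] for k in range(0, n, size)]
    # Avoid a trailing group with a single element, which would have no neighbor.
    if len(groups) > 1 and len(groups[-1]) < 2:
        groups[-2].extend(groups.pop())
    result = [None] * n
    for group in groups:
        for i in group:
            result[i] = nearest_within(points, group, i)
    return result


if __name__ == "__main__":
    rng = random.Random(42)
    pts = [(rng.random(), rng.random()) for _ in range(100)]
    exact = brute_force_1nn(pts)
    approx = grouped_approx_1nn(pts)
    hits = sum(1 for e, a in zip(exact, approx) if e == a)
    print(f"exact matches: {hits} of {len(pts)}")
```

The approximate answer can never be closer than the exact one, so comparing the two lists gives a direct measure of precision in the sense used in the abstract (the fraction of correctly found 1-NNs).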

Link: http://ceur-ws.org/Vol-2100/paper3.pdf

Proceedings of the 12th Alberto Mendelzon International Workshop on Foundations of Data Management

Dan Olteanu, Barbara Poblete. Proceedings of the 12th Alberto Mendelzon International Workshop on Foundations of Data Management, Cali, Colombia, May 21-25, 2018. CEUR Workshop Proceedings 2100, CEUR-WS.org 2018. http://ceur-ws.org/Vol-2100/

AMW 2018, Alberto Mendelzon Workshop on Foundations of Data Management.

Proceedings of the 12th Alberto Mendelzon International Workshop on Foundations of Data Management, Cali, Colombia, May 21-25, 2018.

Edited by
Dan Olteanu, University of Oxford, UK
Bárbara Poblete, University of Chile, Chile.

Link: http://ceur-ws.org/Vol-2100/

A Data-Driven Graph Schema

Larry González, Aidan Hogan. AMW 2018. http://ceur-ws.org/Vol-2100/paper23.pdf

Graph-based data models [1] have become increasingly common in data management scenarios that require flexibility beyond what is offered by traditional relational databases. Such flexibility is particularly important in Web scenarios, where potentially many users may be involved (either directly or indirectly) in the creation, management, and curation of data. An example of such a scenario is the Wikidata knowledge graph [2] where users can add new properties and types that can be used to define further data. The flip-side of flexibility is higher levels of heterogeneity. Conceptually understanding the current state of a knowledge graph – in terms of what data it contains, what it is missing, how it can be effectively queried, what has changed recently, etc. – is thus a major challenge: it is unclear how to distil an adequate, high-level description that captures an actionable overview of knowledge graphs. We thus need well-founded methodologies to make sense of knowledge graphs, where an obvious approach is to define some notion of schema for such graphs.

The traditional approach in the Semantic Web has been what Pham and Boncz [3] call the schema first approach, which defines the schema that the data should follow. The most established language for specifying such schemas is RDFS. An alternative to the schema first approach is the schema last approach [3], which foregoes an upfront schema and rather lets the data evolve naturally; thereafter, the goal is to understand what the legacy graph data contain by extracting high-level summaries that characterise the graph, resulting in a data-driven schema. In this paper, we summarise recently published results [4] on a novel approach to compute a data-driven schema from knowledge graphs. We believe that such schemas are useful for understanding what a knowledge graph contains, and how it can be queried, among several other use-cases. Nevertheless, in this work we focus on the use-case of predicting how the knowledge graph will evolve in future versions, which could be used for measuring the time-to-live of cached SPARQL results, identifying missing properties for entities, etc.

Link: http://ceur-ws.org/Vol-2100/paper23.pdf

The Data Readiness Problem for Relational Databases

Rada Chirkova, Jon Doyle, Juan L. Reutter. AMW 2018. http://ceur-ws.org/Vol-2100/paper21.pdf

We consider the problem of determining whether organizations facing a new data-transformation task can avoid building a new transformation procedure from scratch by reusing their stored procedures. Because it can be difficult to obtain exact descriptions of what stored procedures do, our framework abstracts data-transforming tools as black-box procedures, in which a procedure description indicates the parts of the database that might be modified by the procedure and constraints on the states of the database that must hold before and after the application of this procedure.

In this paper we present our framework and study the problem of determining, given a database and a set of procedures, whether there is a sequence of procedures from this set such that their application to the database results in the satisfaction of a boolean query. This data readiness problem is undecidable in general, but we show decidability for a broad and realistic class of procedures.

Link: http://ceur-ws.org/Vol-2100/paper21.pdf

Semantics and Complexity of GraphQL

Olaf Hartig, Jorge Pérez. WWW 2018: 1155-1164. https://doi.org/10.1145/3178876.3186014

GraphQL is a recently proposed, and increasingly adopted, conceptual framework for providing a new type of data access interface on the Web. The framework includes a new graph query language whose semantics has been specified informally only. This has prevented the formal study of the main properties of the language. We embark on the formalization and study of GraphQL. To this end, we first formalize the semantics of GraphQL queries based on a labeled-graph data model. Thereafter, we analyze the language and show that it admits really efficient evaluation methods. In particular, we prove that the complexity of the GraphQL evaluation problem is NL-complete. Moreover, we show that the enumeration problem can be solved with constant delay. This implies that a server can answer a GraphQL query and send the response byte-by-byte while spending just a constant amount of time between every byte sent. Despite these positive results, we prove that the size of a GraphQL response might be prohibitively large for an internet scenario. We present experiments showing that current practical implementations suffer from this issue. We provide a solution to cope with this problem by showing that the total size of a GraphQL response can be computed in polynomial time. Our results on polynomial-time size computation plus the constant-delay enumeration can help developers to provide more robust GraphQL interfaces on the Web.

Link: https://doi.org/10.1145/3178876.3186014

Querying Wikimedia Images using Wikidata Facts

Sebastián Ferrada, Nicolás Bravo, Benjamín Bustos, Aidan Hogan. WWW (Companion Volume) 2018: 1815-1821. https://doi.org/10.1145/3184558.3191646

Despite its importance to the Web, multimedia content is often neglected when building and designing knowledge-bases: though descriptive metadata and links are often provided for images, video, etc., the multimedia content itself is often treated as opaque and is rarely analysed. IMGpedia is an effort to bring together the images of Wikimedia Commons (including visual information), and relevant knowledge-bases such as Wikidata and DBpedia. The result is a knowledge-base that incorporates similarity relations between the images based on visual descriptors, as well as links to the resources of Wikidata and DBpedia that relate to the image. Using the IMGpedia SPARQL endpoint, it is then possible to perform visuo-semantic queries, combining the semantic facts extracted from the external resources and the similarity relations of the images. This paper presents a new web interface to browse and explore the dataset of IMGpedia in a more friendly manner, as well as new visuo-semantic queries that can be answered using 6 million recently added links from IMGpedia to Wikidata. We also discuss future directions we foresee for the IMGpedia project.

Link: https://doi.org/10.1145/3184558.3191646

Building Knowledge Maps of Web Graphs

Valeria Fionda, Giuseppe Pirrò, Claudio Gutierrez. WWW (Companion Volume) 2018: 479-482. https://doi.org/10.1145/3184558.3186237

We research the problem of building knowledge maps of graph-like information. We live in the digital era and similarly to the Earth, the Web is simply too large and its interrelations too complex for anyone to grasp much of it through direct observation. Thus, the problem of applying cartographic principles also to digital landscapes is intriguing. We introduce a mathematical formalism that captures the general notion of map of a graph and enables its development and manipulation in a semi-automated way. We describe an implementation of our formalism on the Web of Linked Data graph and discuss algorithms that efficiently generate and combine (via an algebra) regions and maps. Finally, we discuss examples of knowledge maps built with a tool implementing our framework.

Link: https://doi.org/10.1145/3184558.3186237

Workshop on Linked Data on the Web co-located with The Web Conference 2018

Tim Berners-Lee, Sarven Capadisli, Stefan Dietze, Aidan Hogan, Krzysztof Janowicz, Jens Lehmann. LDOW@WWW 2018, Lyon, France April 23rd, 2018. CEUR Workshop Proceedings 2073, CEUR-WS.org 2018. https://dblp.org/db/conf/www/ldow2018.html

Link: https://dblp.org/db/conf/www/ldow2018.html

Synthesizing Controllers: On the Correspondence Between LTL Synthesis and Non-deterministic Planning

Alberto Camacho, Jorge A. Baier, Christian J. Muise, Sheila A. McIlraith. Canadian Conference on AI 2018: 45-59. https://doi.org/10.1007/978-3-319-89656-4_4

Linear Temporal Logic (LTL) synthesis can be understood as the problem of building a controller that defines a winning strategy, for a two-player game against the environment, where the objective is to satisfy a given LTL formula. It is an important problem with applications in software synthesis, including controller synthesis. In this paper we establish the correspondence between LTL synthesis and fully observable non-deterministic (FOND) planning. We study LTL interpreted over both finite and infinite traces. We also provide the first explicit compilation that translates an LTL synthesis problem to a FOND problem. Experiments with state-of-the-art LTL FOND and synthesis solvers show automated planning to be a viable and effective tool for highly structured LTL synthesis problems.

Link: https://doi.org/10.1007/978-3-319-89656-4_4

Automatically Generating Wikipedia Info-boxes from Wikidata

Tomás Sáez, Aidan Hogan. WWW (Companion Volume) 2018: 1823-1830. https://doi.org/10.1145/3184558.3191647

Info-boxes provide a summary of the most important meta-data relating to a particular entity described by a Wikipedia article. However, many articles have no info-box or have info-boxes with only minimal information; furthermore, there is a huge disparity between the level of detail available for info-boxes in English articles and those for other languages. Wikidata has been proposed as a central repository of facts to try to address such disparities, and has been used as a source of information to generate info-boxes. However, current processes still rely on human intervention either to create generic templates for entities of a given type or to create a specific info-box for a specific article in a specific language. As such, there are still many articles of Wikipedia without info-boxes but where relevant data are provided by Wikidata. In this paper, we investigate fully automatic methods to generate info-boxes for Wikipedia from the Wikidata knowledge graph. The primary challenge is to create ranking mechanisms that provide an intuitive prioritisation of the facts associated with an entity. We discuss this challenge, propose several straightforward metrics to prioritise information in info-boxes, and present an initial user evaluation to compare the quality of info-boxes generated by various metrics.

Link: https://doi.org/10.1145/3184558.3191647

TriAL: A Navigational Algebra for RDF Triplestores

Leonid Libkin, Juan L. Reutter, Adrián Soto, Domagoj Vrgoc. ACM Trans. Database Syst. 43(1): 5:1-5:46 (2018). https://doi.org/10.1145/3154385

Navigational queries over RDF data are viewed as one of the main applications of graph query languages, and yet the standard model of graph databases—essentially labeled graphs—is different from the triples-based model of RDF. While encodings of RDF databases into graph data exist, we show that even the most natural ones are bound to lose some functionality when used in conjunction with graph query languages. The solution is to work directly with triples, but then many properties taken for granted in the graph database context (e.g., reachability) lose their natural meaning.

Our goal is to introduce languages that work directly over triples and are closed, i.e., they produce sets of triples, rather than graphs. Our basic language is called TriAL, or Triple Algebra: it guarantees closure properties by replacing the product with a family of join operations. We extend TriAL with recursion and explain why such an extension is more intricate for triples than for graphs. We present a declarative language, namely a fragment of datalog, capturing the recursive algebra. For both languages, the combined complexity of query evaluation is given by low-degree polynomials. We compare our language with previously studied graph query languages such as adaptations of XPath, regular path queries, and nested regular expressions; many of these languages are subsumed by the recursive triple algebra. We also provide an implementation of recursive TriAL on top of a relational query engine, and we show its usefulness by running a wide array of navigational queries over real-world RDF data, while at the same time testing how our implementation compares to existing RDF systems.

Link: https://doi.org/10.1145/3154385

Characterising RDF data sets

Javier D. Fernández, Miguel A. Martínez-Prieto, Pablo de la Fuente Redondo, Claudio Gutiérrez. J. Information Science 44(2): 203-229 (2018). https://doi.org/10.1177/0165551516677945

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.

Link: https://doi.org/10.1177/0165551516677945

A Graph-Based Approach for Querying Protein-Ligand Structural Patterns

Renzo Angles, Mauricio Arenas. IWBBIO (1) 2018: 235-244. https://doi.org/10.1007/978-3-319-78723-7_20

In the context of protein engineering and biotechnology, the discovery and characterization of structural patterns is very relevant as it can give fundamental insights about protein structures. In this paper we present GSP4PDB, a bioinformatics web tool that lets users design, search and analyze protein-ligand structural patterns inside the Protein Data Bank (PDB). The novel feature of GSP4PDB is that a protein-ligand structural pattern is graphically designed as a graph such that the nodes represent protein components and the edges represent structural relationships. The resulting graph pattern is transformed into a SQL query, and executed in a PostgreSQL database system where the PDB data is stored. The results of the search are presented using a textual representation, and the corresponding binding-sites can be visualized using a JSmol interface.

Link: https://doi.org/10.1007/978-3-319-78723-7_20

Learning to Leverage Microblog Information for QA Retrieval

Jose Miguel Herrera, Barbara Poblete, Denis Parra. ECIR 2018: 507-520. https://doi.org/10.1007/978-3-319-76941-7_38

Community Question Answering (cQA) sites have emerged as platforms designed specifically for the exchange of questions and answers among users. Although users tend to find good quality answers in cQA sites, they also engage in a significant volume of QA interactions in other platforms, such as microblog networking sites. This is explained in part by the fact that microblog platforms contain up-to-date information on current events, provide rapid information propagation, and have social trust.

Despite the potential of microblog platforms, such as Twitter, for automatic QA retrieval, how to leverage them for this task is not clear. There are unique characteristics that differentiate Twitter from traditional cQA platforms (e.g., short message length, low quality and noisy information), which prevent prior findings in the area from being applied directly. In this work, we address this problem by studying: (1) the feasibility of Twitter as a QA platform and (2) the discriminating features that identify relevant answers to a particular query. In particular, we create a document model at conversation-thread level, which enables us to aggregate microblog information, and set up a learning-to-rank framework, using factoid QA as a proxy task. Our experimental results show microblog data can indeed be used to perform QA retrieval effectively. We identify domain-specific features and combinations of those features that better account for improving QA ranking, achieving an MRR of 0.7795 (improving 62% over our baseline method). In addition, we provide evidence that our method can retrieve complex answers to non-factoid questions.
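For readers unfamiliar with the metric reported above, mean reciprocal rank (MRR) averages, over all queries, the reciprocal of the position of the first relevant result. The sketch below shows the standard definition; the function name and the toy relevance lists are hypothetical and are not taken from the paper's data.

```python
def mean_reciprocal_rank(ranked_relevance):
    """Compute MRR from one list of booleans per query.

    Each inner list is a ranking (best first); True marks a relevant answer.
    A query with no relevant answer contributes 0.
    """
    if not ranked_relevance:
        return 0.0
    total = 0.0
    for ranking in ranked_relevance:
        for position, is_relevant in enumerate(ranking, start=1):
            if is_relevant:
                total += 1.0 / position
                break
    return total / len(ranked_relevance)


if __name__ == "__main__":
    # Toy example (made up): first relevant answer at rank 2, rank 1, and never.
    toy = [[False, True], [True], [False, False]]
    print(mean_reciprocal_rank(toy))  # (1/2 + 1 + 0) / 3 = 0.5
```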

Link: https://doi.org/10.1007/978-3-319-76941-7_38

Proceedings of the Second International Workshop on Recent Trends in News Information Retrieval co-located with 40th European Conference on Information Retrieval (ECIR 2018)

Dyaa Albakour, David Corney, Julio Gonzalo, Miguel Martinez, Barbara Poblete, Andreas Valochas. CEUR Workshop Proceedings 2079, CEUR-WS.org 2018. http://ceur-ws.org/Vol-2079/

Dyaa Albakour, David Corney, Julio Gonzalo, Miguel Martinez, Barbara Poblete, Andreas Valochas: Proceedings of the Second International Workshop on Recent Trends in News Information Retrieval co-located with 40th European Conference on Information Retrieval (ECIR 2018), Grenoble, France, March 26, 2018. CEUR Workshop Proceedings 2079, CEUR-WS.org 2018.

Link: http://ceur-ws.org/Vol-2079/

Graph Query Languages

Angles R., Reutter J., Voigt H. (2018) Graph Query Languages. In: Sakr S., Zomaya A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_75-1

In the Encyclopedia of Big Data Technologies:

A query language is a high-level computer language for the retrieval and modification of data held in databases or files. Query languages usually consist of a collection of operators which can be applied to any valid instances of the data structure types of a data model, in any combination desired.

In the context of graph data management, a graph query language (GQL) defines the way to retrieve or extract data which have been modeled as a graph and whose structure is defined by a graph data model. Therefore, a GQL is designed to support specific graph operations, such as graph pattern matching and shortest path finding.

Link: https://doi.org/10.1007/978-3-319-63962-8_75-1

Graph Path Navigation

Arenas M., Barceló P., Libkin L. (2018) Graph Path Navigation. In: Sakr S., Zomaya A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_214-1

In the Encyclopedia of Big Data Technologies:

Navigational query languages for graph databases allow recursive traversal of the edges of a graph while checking for the existence of a path that satisfies certain regular conditions. The basic building block of such languages is the class of regular path queries (RPQs), which are expressions that compute the pairs of nodes that are linked by a path whose label satisfies a regular expression. RPQs are often extended with features that make them more flexible for practical applications, e.g., with the ability to traverse edges in the backward direction (RPQs with inverses) or to express arbitrary patterns over the data (conjunctive RPQs).
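A common way to evaluate an RPQ is to search the product of the graph with an automaton for the regular expression. The sketch below illustrates that idea under simplifying assumptions made here: the regular expression is supplied directly as a nondeterministic finite automaton (a transition dictionary, a start state, and a set of accepting states), the graph is a list of labeled edges, and all names are hypothetical. It covers plain RPQs only, without inverses or conjunctions.

```python
from collections import defaultdict, deque


def evaluate_rpq(edges, transitions, start, accepting):
    """Return all pairs (u, v) linked by a path whose label word the NFA accepts.

    edges:       iterable of (source, label, target) triples
    transitions: dict mapping (state, label) -> set of next states
    """
    graph = defaultdict(list)
    nodes = set()
    for u, label, v in edges:
        graph[u].append((label, v))
        nodes.update((u, v))

    answers = set()
    for source in nodes:
        # Breadth-first search over (graph node, automaton state) pairs.
        seen = {(source, start)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in accepting:
                answers.add((source, node))
            for label, next_node in graph.get(node, []):
                for next_state in transitions.get((state, label), ()):
                    pair = (next_node, next_state)
                    if pair not in seen:
                        seen.add(pair)
                        queue.append(pair)
    return answers


if __name__ == "__main__":
    # Toy graph and an automaton for the expression knows+ (one or more "knows" edges).
    toy_edges = [("a", "knows", "b"), ("b", "knows", "c"), ("c", "worksAt", "d")]
    knows_plus = {(0, "knows"): {1}, (1, "knows"): {1}}
    print(sorted(evaluate_rpq(toy_edges, knows_plus, start=0, accepting={1})))
    # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Because each search visits every (node, state) pair at most once, the work per source node is bounded by the size of the graph times the size of the automaton.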

Link: https://doi.org/10.1007/978-3-319-63962-8_214-1

Containment of queries for graphs with data

Egor V. Kostylev, Juan L. Reutter, Domagoj Vrgoc. J. Comput. Syst. Sci. 92: 65-91 (2018) https://doi.org/10.1016/j.jcss.2017.09.005

We consider the containment problem for regular queries with memory and regular queries with data tests: two recently proposed query languages for graph databases that, in addition to allowing the user to ask topological queries, also track how the data changes along paths connecting various points in the database. Our results show that the problem is undecidable in general. However, by allowing only positive data comparisons we find natural fragments with better static analysis properties: the containment problem is PSpace-complete in the case of regular queries with data tests and ExpSpace-complete in the case of regular queries with memory.

Highlights:
- Study of graph query languages that can deal with data values.
- Containment problem for these languages is undecidable in general.
- Containment is decidable if one focuses on languages that can only check for equalities.
- Proofs make use of automata models.

Link: https://doi.org/10.1016/j.jcss.2017.09.005

Graph Data Models

Gutierrez C., Hidders J., Wood P.T. (2018) Graph Data Models. In: Sakr S., Zomaya A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. DOI https://doi.org/10.1007/978-3-319-63962-8_81-1

Link: https://doi.org/10.1007/978-3-319-63962-8_81-1

Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications

Marcelo Mendoza, Sergio A. Velastin. 22nd Iberoamerican Congress, CIARP 2017, Valparaíso, Chile, November 7-10, 2017, Proceedings. Lecture Notes in Computer Science 10657, Springer 2018, ISBN 978-3-319-75192-4. https://link.springer.com/book/10.1007%2F978-3-319-75193-1

This book constitutes the refereed post-conference proceedings of the 22nd Iberoamerican Congress on Pattern Recognition, CIARP 2017, held in Valparaíso, Chile, in November 2017.

The 87 papers presented were carefully reviewed and selected from 156 submissions. The papers feature research results in the areas of pattern recognition, image processing, computer vision, multimedia and related fields.

Link: https://link.springer.com/book/10.1007%2F978-3-319-75193-1

RDF Compression

Martínez-Prieto M.A., Fernández J.D., Hernández-Illera A., Gutiérrez C. (2018) RDF Compression. In: Sakr S., Zomaya A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. DOI https://doi.org/10.1007/978-3-319-63962-8_62-1

RDF compression can be defined as the problem of encoding an RDF dataset using fewer bits than those required by traditional text-based serialization formats like RDF/XML, NTriples, or Turtle, among others. These savings immediately lead to more efficient storage (i.e., archival) and lower transmission costs (i.e., fewer bits over the wire). Although this problem can be easily solved through universal compression (e.g., gzip or bzip2), optimized RDF-specific compressors take advantage of the particular features of RDF datasets (such as semantic redundancies) in order to save more bits or to provide retrieval operations on the compressed information. RDF self-indexes are focused on this latter task.

RDF self-indexes are RDF compressors that provide indexing features in a space close to that of the compressed dataset and can be accessed with no prior (or partial) decompression. These properties enhance scalability (i.e., fewer resources are required to serve semantic data) and speed up access as more information can be managed in higher levels of the memory hierarchy (typically, main memory or cache). In addition, efficient search algorithms have been proposed to resolve basic queries on top of self-indexed datasets. As a result, RDF self-indexes have been adopted as a core component of semantic search engines and lightweight Linked Data servers.

Finally, RDF stream compressors specifically focus on compressing a (continuous) stream of RDF data in order to improve exchange processes, typically in real-time. This constitutes a more recent trend that exploits different trade-offs between the space savings achieved by the compressor and the latency introduced in the compression/decompression processes.

This entry introduces basic notions of RDF compression, RDF self-indexing, and RDF stream compression and discusses how existing approaches deal with (and remove) redundant information in semantic datasets.

Link: https://doi.org/10.1007/978-3-319-63962-8_62-1

Feature-Based 3D Object Retrieval

Bustos B., Schreck T. (2017). In: Liu L., Özsu M. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4899-7993-3_161-2

3D objects are an important type of data with many applications in domains such as Engineering and Computer Aided Design, Science, Simulation, Visualization, Cultural Heritage, and Entertainment. Technological progress in acquisition, modeling, processing, and dissemination of 3D geometry leads to the accumulation of large repositories of 3D objects. Consequently, there is a strong need to research and develop technology to support the effective retrieval of 3D object data from 3D repositories.

Link: https://doi.org/10.1007/978-1-4899-7993-3_161-2