The great challenge of the black box: there is no silver bullet, but there are advances in explainability

One of the most important challenges for the current generation of computer scientists working with artificial intelligence models is deciphering the well-known "black box": computer systems whose internal workings are not visible and therefore cannot be audited.

To address this problem, Marcelo Arenas, an academic at IMC and DCC UC and researcher at IMFD, presented a proposal to tackle the growing challenge of Explainable Artificial Intelligence (XAI) in his keynote speech "From Explanations to Queries" at the IRIS-AI 25 conference. His proposal centers on the development of an Explainability Query Language that allows users to formally combine and query different notions of explanation.

The challenge of the "black box model"

"There is currently great interest in developing methods to explain the predictions made by machine learning (ML) models, which often operate as 'black boxes'. These models produce answers or scores, but it is not known why they produce them. This difficulty has led to the introduction of a large number of different queries and measures or scores for explainability," notes Arenas, who emphasizes that formal explainability does not admit a "silver bullet"; there is no single notion of explanation that is universally considered the best. Given the proliferation of methods (such as abductive explanations, contrastive explanations, or SHAP values), the central idea is to consider explainability as an iterative process that requires the combination of different notions.
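To make one of these notions concrete, here is a minimal sketch (an illustration, not code from the talk) of an abductive explanation, often called a sufficient reason: a partial instance that fixes enough feature values to determine the prediction no matter how the remaining features are filled in. The toy model and all names are assumptions for the example.

```python
# Minimal sketch (illustrative, not from the talk): checking whether a
# partial instance is an abductive explanation (sufficient reason) for
# a toy boolean classifier, by brute force over all completions.
from itertools import product

def model(x):
    # Toy "black box": classifies positive iff x1 and (x2 or x3).
    return x[0] and (x[1] or x[2])

def is_abductive_explanation(partial, n=3):
    """partial maps feature indices to fixed 0/1 values; the rest are
    free. Returns True iff every completion gets the same prediction."""
    free = [i for i in range(n) if i not in partial]
    preds = set()
    for bits in product([0, 1], repeat=len(free)):
        x = [partial.get(i) for i in range(n)]
        for i, b in zip(free, bits):
            x[i] = b
        preds.add(model(x))
    return len(preds) == 1

# Fixing x1=1 and x2=1 already determines the prediction:
print(is_abductive_explanation({0: 1, 1: 1}))  # True
# Fixing only x1=1 does not:
print(is_abductive_explanation({0: 1}))        # False
```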

Innovation: A declarative query language

The main proposal is to provide users with a language in which explainability questions can be posed as queries. This shifts the focus from developing a new efficient algorithm for each notion of explanation to optimizing the evaluation of the language itself.

This language must be declarative: the user specifies which notion of explainability to evaluate, not how to evaluate it, leaving optimization to the system. It must be grounded in database systems, building on well-known query languages such as first-order logic and relational algebra. And it must rely on fixed formulas: a notion of explainability is represented by a fixed query that can be evaluated on models of any dimension.
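As an illustration of such a fixed first-order query (in notation of my choosing, not necessarily the talk's exact syntax), the property "x is an abductive explanation for the positive class" can be written once and then evaluated on models of any input dimension:

\[
\mathrm{SR}(x) \;\equiv\; \forall y \, \big( (x \subseteq y \wedge \mathrm{Full}(y)) \rightarrow \mathrm{Pos}(y) \big)
\]

Here x ⊆ y denotes subsumption, Full(y) holds when y has no undefined values, and Pos(y) holds when the model classifies the complete instance y positively; all three predicate names are illustrative.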

The ML model is encoded as a database that stores instances and subsumption relations (to handle partial instances with undefined values).
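The following minimal sketch (my illustration; the talk's exact schema may differ) shows one way such an encoding could look: partial instances as tuples with None marking undefined values, plus the induced subsumption table.

```python
# Minimal sketch (illustrative encoding): partial instances over
# {0, 1, None}, with None marking an undefined feature. One common
# convention: p is subsumed by q when q agrees with p on every feature
# that p defines (i.e., q extends p).
from typing import Optional, Tuple

PartialInstance = Tuple[Optional[int], ...]

def subsumed(p: PartialInstance, q: PartialInstance) -> bool:
    """True iff q extends p: q matches p on all of p's defined features."""
    return all(pi is None or pi == qi for pi, qi in zip(p, q))

# A relational encoding could then store two tables: one with the
# instances, one with the pairs related by subsumption.
instances = [(1, None, 0), (1, 1, 0), (1, 0, 0)]
subsumption = [(p, q) for p in instances for q in instances if subsumed(p, q)]
print(subsumption)
```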

Managing inherent complexity

The language allows for the expression of complex combinations of explainability queries. For example, one can ask whether there is a common abductive explanation for two different classifications, or an explanation that distinguishes one classification from another.
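Using the illustrative SR query from above, the first of these combined questions could be expressed as a single formula asking for one partial instance that sits below both endpoints:

\[
\exists x \, \big( x \subseteq e_1 \wedge x \subseteq e_2 \wedge \mathrm{SR}(x) \big)
\]

where e_1 and e_2 are the two instances whose classifications are being compared (again, the notation is assumed for the example).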

A key technical challenge is complexity. Because the databases representing these models can be exponentially large, evaluating queries is inherently difficult. The complexity target for this language is P^NP (polynomial time with access to an NP oracle, a class within the polynomial hierarchy), which means that an efficient algorithm can delegate the hard subproblems to a SAT solver acting as an oracle; this reflects the high intrinsic complexity of certain explanation tasks.
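To illustrate the oracle pattern, here is a minimal sketch assuming the python-sat package (an assumption; the talk does not prescribe a toolkit). The classifier's negative region is hand-encoded as a CNF formula so that the abductive-explanation check becomes a single call to the SAT oracle:

```python
# Minimal sketch (illustrative, assuming the python-sat package): a SAT
# solver as the NP oracle. Same toy model as above: positive iff
# x1 and (x2 or x3). Its negative region in CNF is
# (-x1 or -x2) and (-x1 or -x3), over variables 1..3.
from pysat.solvers import Glucose3

neg_cnf = [[-1, -2], [-1, -3]]

def is_sufficient_reason(assumptions):
    """One NP-oracle call: a partial instance (a list of literals) is an
    abductive explanation for the positive class iff the negative region
    is unsatisfiable under those assumptions."""
    with Glucose3(bootstrap_with=neg_cnf) as solver:
        return not solver.solve(assumptions=assumptions)

print(is_sufficient_reason([1, 2]))  # True: x1=1, x2=1 forces positive
print(is_sufficient_reason([1]))     # False: x1=1 alone does not
```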

In essence, the research works at the forefront of database theory, applying it to obtain a sharp and formal understanding of XAI questions.

This research was presented at the First International Symposium on AI Research and Industry (IRIS-AI), an interdisciplinary conference that brings together researchers and industry professionals working across the broad spectrum of fields covered by AI. The symposium aims to provide an open and engaging forum for discussion on the latest advances and emerging trends in AI.

Check out the presentation at https://www.youtube.com/watch?v=nB4RmO7ylno