ACM Web Conference 2025: Marcelo Mendoza presents AI method to improve information extraction in news stories
In Sydney, Australia, Marcelo Mendoza, DCC UC academic, CENIA principal investigator and IMFD research associate, presented two new techniques that allow language models such as GPT-4o, Claude and Gemini to extract key data from news and to formulate and answer questions as a human reader would. The study was developed together with Hans Löbel, joint-appointment professor at DCC and Transport UC and CENIA researcher; Brian Keith of Universidad Católica del Norte; and PhD student Carlos Muñoz.
Extracting key information from news articles, organized around the questions "Who", "What", "When", "Where", "Why" and "How" (5W1H), has been a fundamental strategy in digital journalism for powering search systems. With the rise of large language models (LLMs) such as GPT (OpenAI), Gemini (Google) and Claude (Anthropic), among others, interest has been renewed in their potential to perform information extraction tasks more effectively.

Marcelo Mendoza presented the research paper "Imitating Human Reasoning to Extract 5W1H in News" at The ACM Web Conference 2025, which was held from April 28 to May 2 in Sydney, Australia. The research proposes an approach to improve the automatic extraction of 5W1H information from news text, using language models and focusing particularly on their ability to mimic human reasoning.
The research introduces two new "Chain of Thought" (CoT) techniques, which prompt AI models to reason in a way that imitates humans when performing complex tasks. The first is extractive reasoning, which directs the large language model (LLM) to identify and highlight relevant details directly in the text. The second is question-level reasoning, which guides the model to formulate and answer questions as a human reader would.
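To make the two ideas concrete, here is a minimal sketch of how such prompts might be assembled. It is an illustration under stated assumptions, not the prompts used in the study: the function names `extractive_prompt` and `question_level_prompt`, the `QUESTIONS` table and all of the instruction wording are hypothetical, and no model is actually called.

```python
# Illustrative sketch only: the templates below are assumptions made for this
# example and are not the prompt wording used in the paper.

# Hypothetical phrasing of the six 5W1H questions.
QUESTIONS = {
    "who": "Who is involved?",
    "what": "What happened?",
    "when": "When did it happen?",
    "where": "Where did it happen?",
    "why": "Why did it happen?",
    "how": "How did it happen?",
}


def extractive_prompt(article: str) -> str:
    """Extractive reasoning: quote the relevant passages first, then answer."""
    question_list = "".join(f"- {q}\n" for q in QUESTIONS.values())
    return (
        "Read the news article below.\n"
        "Step 1: Copy, word for word, the passages that carry the key facts.\n"
        "Step 2: Using only those passages, answer each question.\n"
        + question_list
        + f"\nArticle:\n{article}\n"
    )


def question_level_prompt(article: str) -> str:
    """Question-level reasoning: work through one question at a time."""
    steps = "".join(
        f"{i}. Ask yourself: {q} Note where the text says so, then answer.\n"
        for i, q in enumerate(QUESTIONS.values(), start=1)
    )
    return (
        "Read the news article below as a careful human reader would.\n"
        + steps
        + f"\nArticle:\n{article}\n"
    )


if __name__ == "__main__":
    sample = "Marcelo Mendoza presented the study in Sydney, Australia."
    print(extractive_prompt(sample))
    print(question_level_prompt(sample))
```

Either string could then be sent to any chat-style language model. The point of the sketch is only the difference in structure: the first prompt asks for highlighted evidence before answers, while the second walks through the questions one by one.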
Experiments with state-of-the-art large language models (LLMs) showed that the proposed CoT techniques far outperform traditional extraction methods.
In statements published on the website of the National Center for Artificial Intelligence, Marcelo Mendoza said: "The results of this study have the potential to transform the way in which automatic systems process news, facilitating more precise searches and a better organization of information on the web".

