REmatch

2023–2025

REmatch is a cross-platform library for C++, Python, and JavaScript, designed for structured information extraction from plain text. It uses its own query language, REQL (Regular Expressions Query Language), whose syntax is similar to that of traditional regular expressions but which handles capture variables in a more powerful and declarative way.
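The difference in output can be sketched with Python's standard `re` module, which is not REmatch. `re.finditer` scans left to right and reports non-overlapping matches, so for the pattern `a+` on `"aaa"` it returns a single span. The hypothetical helper `all_spans` below instead tries every substring and keeps each span that matches in full. This is roughly the exhaustive output an all-matches engine reports for one capture variable. It is a brute-force illustration of the semantics only, at least quadratic in the text length, and not REmatch's algorithm or API.

```python
import re

def all_spans(pattern: str, text: str) -> list[tuple[int, int]]:
    """Return every span (i, j) such that text[i:j] matches pattern entirely.

    Hypothetical brute-force helper: it tries all substrings, so it only
    illustrates *which* spans an all-matches semantics reports, not how
    an efficient engine computes them.
    """
    compiled = re.compile(pattern)
    spans = []
    for i in range(len(text) + 1):
        for j in range(i, len(text) + 1):
            # fullmatch requires the whole substring to match, not just a prefix
            if compiled.fullmatch(text[i:j]):
                spans.append((i, j))
    return spans

text = "aaa"
print([m.span() for m in re.finditer(r"a+", text)])  # [(0, 3)]
print(all_spans(r"a+", text))  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```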


Project

REmatch is a new library (C++/Python/JavaScript) that implements REQL (Regular Expressions Query Language), a query language designed specifically for efficient and exhaustive extraction of information from plain-text documents. Traditional regex libraries are built for pattern matching and typically report only the non-overlapping matches found in a left-to-right scan; REmatch's main purpose, by contrast, is to return every possible match of a query, which makes it well suited to text analysis and data mining. Its engine relies on constant-delay enumeration algorithms: after a preprocessing phase, the time between consecutive results does not depend on the size of the document, so the engine stays efficient even when a query produces a very large number of results.
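The constant-delay idea can be illustrated on one fixed query. The toy below handles only the pattern `c+` for a single character `c`; the function names are hypothetical, and this is not REmatch's general algorithm, which evaluates arbitrary REQL queries. One linear pass collects the maximal runs of `c`, and a generator then yields every matching span. Because every stored run is non-empty, each step of the generator does a bounded amount of work before the next `yield`, however long the text is.

```python
from typing import Iterator

def runs(ch: str, text: str) -> list[tuple[int, int]]:
    """Preprocessing: one left-to-right pass collecting maximal runs of ch.

    Each run is a half-open span (start, end) with start < end.
    """
    found = []
    start = None
    for pos, c in enumerate(text):
        if c == ch and start is None:
            start = pos                 # a new run begins here
        elif c != ch and start is not None:
            found.append((start, pos))  # the run ended just before pos
            start = None
    if start is not None:
        found.append((start, len(text)))  # the run reaches the end of the text
    return found

def enumerate_spans(ch: str, text: str) -> Iterator[tuple[int, int]]:
    """After preprocessing, yield every span matching ch+ with O(1) work between outputs."""
    for start, end in runs(ch, text):   # every run is non-empty...
        for i in range(start, end):     # ...so every i has at least one j
            for j in range(i + 1, end + 1):
                yield (i, j)

print(list(enumerate_spans("a", "aaba")))  # [(0, 1), (0, 2), (1, 2), (3, 4)]
```

A run of length L contains L(L+1)/2 matching spans, so the number of outputs alone can grow quadratically with the text length; this is why the delay between outputs, rather than the total running time, is the relevant efficiency measure.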

Use cases

  • Text data extraction: The library's main purpose is to extract information from plain-text documents using REQL queries.
  • Text and corpus analysis: Useful for tasks that need the context in which certain words appear, for example extracting each proper noun together with the sentence in which it appears.
  • Extraction of unlimited or optional fields: REQL's MultiMatch feature lets a variable capture a list of spans (text segments) rather than a single one, which is useful for extracting an unbounded number of fields or fields that may be absent. For example, a query can extract an entire sentence together with the list of all the words in it.
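With a conventional regex engine, a repeated capture group keeps only its last repetition (in Python, `re.fullmatch(r"(?:(\w+) ?)+", "a bc").group(1)` is `"bc"`), so a list of words has to be collected in a second pass. The sketch below does that with Python's `re` module to show the shape of a MultiMatch-style result: each result maps a sentence variable to one span and a words variable to a list of spans. The sentence rule, the dict representation, and the function name are assumptions for illustration, not REQL syntax or REmatch's output format.

```python
import re

SENTENCE = re.compile(r"[^.\s][^.]*\.")  # hypothetical rule: text up to a period
WORD = re.compile(r"\w+")

def sentences_with_words(text: str) -> list[dict[str, object]]:
    """Return one result per sentence: its span plus the spans of all its words."""
    results = []
    for sent in SENTENCE.finditer(text):
        # Restrict the word search to this sentence; spans stay relative to text
        words = [w.span() for w in WORD.finditer(text, sent.start(), sent.end())]
        results.append({"sentence": sent.span(), "words": words})
    return results

for result in sentences_with_words("Ana runs. Bo sat."):
    print(result)
# {'sentence': (0, 9), 'words': [(0, 3), (4, 8)]}
# {'sentence': (10, 17), 'words': [(10, 12), (13, 16)]}
```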

Team

  • Kyle Bossonney – Developer
  • Vicente Calisto – Developer
  • Gustavo Toro – Developer
  • Nicolás Van Sint Jan – Developer
  • Cristian Riveros – Professor (Millennium Institute for Foundational Research on Data / Pontifical Catholic University of Chile)
  • Domagoj Vrgoč – Professor (Millennium Institute for Foundational Research on Data / Pontifical Catholic University of Chile)

Documents

Paper REmatch