Marcelo Mendoza, Pablo Ormeño, Carlos Valle: «Ad-hoc information retrieval based on boosted latent Dirichlet allocated topics» (JCC 2018)

ABSTRACT: Latent Dirichlet Allocation (LDA) is a fundamental method in the text mining field. We propose LDA-based strategies for topic and model selection that exploit the semantic coherence of the inferred topics, boosting the quality of the models found. We then study how our boosted topic models perform in ad-hoc information retrieval tasks. Experimental results on four datasets show that our proposal improves the quality of the topics found, which benefits document retrieval tasks. Our method outperforms traditional LDA-based methods, showing that model selection based on semantic coherence is useful for document modeling and information retrieval tasks.

Date: November 6, 2018, 15:40.

Venue: Universidad Andrés Bello, Antonio Varas 880, Providencia, Santiago.