Eliminating Incorrect Cross-Language Links in Wikipedia

Subject: Wikipedia, cross-language links, multi-language information retrieval, [INFO] Computer Science [cs]
Authors: Bennacer, Nacéra; Bugiotti, Francesca; Galicia, Jorge; Patricio, Mariana; Quercini, Gianluca
Abstract: Many Wikipedia articles that cover the same topic in different language editions are interconnected via cross-language links, which support both the understanding of topics in multiple languages and cross-language information retrieval applications. However, cross-language links are added manually by Wikipedia users and, as such, are often incorrect. In this paper, we propose an approach to automatically eliminate incorrect cross-language links, based on the observation that groups of articles that are pairwise connected through cross-language links form independent connected components. For each incoherent component (i.e., one that contains two or more articles from the same language edition), our approach assigns a correctness score to its cross-language links and removes those with the lowest scores until the component becomes coherent. The results of our evaluation on a snapshot of Wikipedia in 8 languages indicate that our approach shows quantitative promise.
Source: info:eu-repo/semantics/altIdentifier/doi/10.1007/978-3-319-68786-5_9
Publisher: HAL CCSD
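
Illustration: the abstract describes grouping articles into connected components, flagging components that contain two or more articles from the same language edition, and removing the lowest-scored links until each component is coherent. The sketch below is a minimal illustration of that idea, not the authors' implementation. The abstract does not say how correctness scores are computed, so the scores here are assumed to be supplied by the caller. The function names (connected_components, is_coherent, make_coherent), the (language, title) article representation, and the greedy one-link-at-a-time removal are all assumptions made for this example.

```python
from collections import defaultdict


def connected_components(articles, links):
    """Group articles into connected components of the undirected link graph."""
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for start in articles:
        if start in seen:
            continue
        # Depth-first traversal from an unvisited article.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def is_coherent(component):
    """A component is coherent if it has at most one article per language."""
    languages = [language for language, _title in component]
    return len(languages) == len(set(languages))


def make_coherent(articles, scored_links):
    """Greedily drop the lowest-scored link until all components are coherent.

    `scored_links` maps (article_a, article_b) to a correctness score.
    The scores are hypothetical inputs; how to compute them is not
    specified in the abstract.
    """
    kept = dict(scored_links)
    removed = []
    while True:
        incoherent = [
            c for c in connected_components(articles, kept) if not is_coherent(c)
        ]
        if not incoherent:
            return kept, removed
        # Only links inside incoherent components are candidates for removal.
        suspect_nodes = set().union(*incoherent)
        candidates = [link for link in kept if link[0] in suspect_nodes]
        worst = min(candidates, key=lambda link: kept[link])
        removed.append(worst)
        del kept[worst]
```

As a usage example, take four articles, ("en", "Paris"), ("fr", "Paris"), ("it", "Parigi") and ("en", "Paris, Texas"), chained by three links where the link into "Paris, Texas" has the lowest score. They form one incoherent component with two English articles. Removing that single link splits it into two coherent components.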