Wiki Workshop 2025: Citation Index and Synthetic Quality Measure for Wikipedia (video)

At Wiki Workshop 2025, researchers presented a study that analyzed Wikipedia articles across 55 language editions and 18 thematic categories. The study introduced an original approach that combines a citation index with a synthetic quality measure for articles.

To construct the citation index, 6.6 billion internal links (wikilinks) between Wikipedia pages were analyzed. This enabled the identification of the most important articles in each language edition and thematic area. Articles were assigned to topics based on their links to the open semantic knowledge base Wikidata, allowing for the selection of 18 key topics and the identification of the most cited articles within each. The study provided rankings of the Top 10, Top 25, and Top 100 most cited articles for every language and topic.
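The counting step described above can be illustrated with a small sketch. It assumes that an article's citation index is the number of distinct articles linking to it and that each article has already been mapped to one topic. Both are simplifications made for illustration, and the function names, the toy link list, and the topic mapping are hypothetical rather than taken from the study.

```python
from collections import defaultdict


def citation_index(links):
    """Count the distinct articles that link to each target (assumed definition)."""
    sources = defaultdict(set)
    for src, dst in links:
        if src != dst:  # ignore self-links in this sketch
            sources[dst].add(src)
    return {title: len(srcs) for title, srcs in sources.items()}


def top_n_by_topic(index, topics, n):
    """Rank articles inside each topic by citation index, highest first."""
    by_topic = defaultdict(list)
    for title, count in index.items():
        topic = topics.get(title)
        if topic is not None:
            by_topic[topic].append((title, count))
    return {
        topic: sorted(items, key=lambda item: (-item[1], item[0]))[:n]
        for topic, items in by_topic.items()
    }


# Toy (source, target) wikilink pairs; real input would be billions of rows.
links = [
    ("Poznań", "Poland"), ("Warsaw", "Poland"), ("Poland", "Warsaw"),
    ("Poznań", "Warsaw"), ("Kraków", "Warsaw"), ("Warsaw", "Warsaw"),
    ("Kraków", "Poland"), ("Poland", "Kraków"),
]
# Hypothetical topic assignment, standing in for the Wikidata-based mapping.
topics = {"Warsaw": "cities", "Kraków": "cities", "Poznań": "cities",
          "Poland": "countries"}

index = citation_index(links)
ranking = top_n_by_topic(index, topics, n=2)
print(ranking)
```

At the scale of the study, the same counting would more plausibly run over database dumps in a distributed or streaming fashion; the in-memory dictionaries here only show the logic.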

In parallel, the quality of over 47 million Wikipedia articles was evaluated using a synthetic quality measure that combines features such as article length, the number and density of references, the number of images and sections, and the presence of templates flagging quality issues. This made it possible to compare article quality even between language editions with differing quality standards. Both the citation indices and the article quality scores were released as open datasets: the citation indices on the Hugging Face platform and the quality scores on Kaggle.
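To show how features like these might be folded into a single score, here is a minimal sketch. It is not the formula used in the study: the saturation thresholds, the equal weighting, and the penalty per quality-issue template are all hypothetical values chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass
class ArticleFeatures:
    length: int           # article length in characters
    references: int       # number of references
    images: int           # number of images
    sections: int         # number of sections
    issue_templates: int  # templates flagging quality issues


# Hypothetical saturation points: a feature at or above its threshold
# contributes its full share to the score.
THRESHOLDS = {
    "length": 10_000,
    "references": 30,
    "ref_density": 0.003,  # references per character
    "images": 10,
    "sections": 10,
}
ISSUE_PENALTY = 0.1  # hypothetical: each issue template removes 10% of the score


def quality_score(a: ArticleFeatures) -> float:
    """Return a score in [0, 100] from capped, equally weighted features."""
    ref_density = a.references / a.length if a.length else 0.0
    raw = {
        "length": a.length,
        "references": a.references,
        "ref_density": ref_density,
        "images": a.images,
        "sections": a.sections,
    }
    normalized = [min(raw[name] / THRESHOLDS[name], 1.0) for name in THRESHOLDS]
    base = 100 * sum(normalized) / len(normalized)
    penalty = max(0.0, 1.0 - ISSUE_PENALTY * a.issue_templates)
    return base * penalty


full = ArticleFeatures(length=20_000, references=80, images=12, sections=15, issue_templates=0)
stub = ArticleFeatures(length=1_000, references=1, images=0, sections=1, issue_templates=0)
flagged = ArticleFeatures(length=20_000, references=80, images=12, sections=15, issue_templates=2)

for name, article in [("full", full), ("stub", stub), ("flagged", flagged)]:
    print(name, round(quality_score(article), 1))
```

Capping each feature at a threshold keeps one very long or heavily illustrated article from dominating the score, which is one way a measure can stay comparable across editions of different sizes.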

The analysis revealed substantial differences in the quality and topical coverage among the various language editions of Wikipedia. The highest citation and quality indices were observed in the largest editions, such as the English and German Wikipedias, particularly in categories like cities, films, biographies, and universities. High quality scores were also recorded for the Catalan, Spanish, Korean, and Chinese editions. In less developed language editions, a notable drop in average article quality was observed as the analysis scope expanded to include more articles, indicating that high quality is often concentrated among the most cited entries.
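The drop in average quality as the scope widens can be checked by averaging quality scores over the Top N most cited articles for growing N. The sketch below assumes each article is given as a (citation index, quality score) pair; the function name, the cutoffs, and the toy numbers are illustrative and not taken from the released datasets.

```python
def mean_quality_at_cutoffs(articles, cutoffs=(10, 25, 100)):
    """Average quality of the n most cited articles, for each cutoff n."""
    ranked = sorted(articles, key=lambda a: a[0], reverse=True)
    result = {}
    for n in cutoffs:
        top = ranked[:n]
        if top:  # skip cutoffs when there are no articles at all
            result[n] = sum(quality for _, quality in top) / len(top)
    return result


# Toy edition: a few well-developed, highly cited articles and a weaker tail.
edition = [(100, 90.0), (80, 70.0), (10, 20.0), (5, 20.0)]
means = mean_quality_at_cutoffs(edition, cutoffs=(2, 4))
print(means)
```

A steep fall between the smaller and the larger cutoff would indicate that quality is concentrated in the most cited entries, as the study reports for less developed editions.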

The findings give the Wikipedia community and the academic community valuable insight into the strengths and weaknesses of individual language editions. The collected data and conclusions may inform targeted efforts to improve less developed Wikipedias, optimize editorial processes, and better monitor progress in content quality. The presented approach and released datasets also serve as a starting point for further, more detailed comparative research on Wikipedia on a global scale.

Future plans include extending the analysis to additional topics, more language editions, and new metrics such as page view statistics or the number of unique editors, which will allow for an even deeper understanding of the diversity, trends, and challenges faced by multilingual Wikipedia.

The paper “Utilizing citation index and synthetic quality measure to compare Wikipedia languages across various topics” is available open access. Its authors are Dr. Włodzimierz Lewoniewski, Prof. Krzysztof Węcel, and Prof. Witold Abramowicz.

Wiki Workshop is an annual international conference organised by academic and expert communities studying Wikipedia and other Wikimedia Foundation projects. It seeks to foster the exchange of knowledge, experience, and research findings that support Wikipedia’s continued development and the improvement of its content. The 2025 edition was held online on 21–22 May 2025. Further details are available on the Wiki Workshop website: wikiworkshop.org.

Sources: kie.ue.poznan.pl, ue.poznan.pl
