A scientific paper on automatically identifying important sources of information on a given topic in multilingual Wikipedia, based on an analysis of more than 230 million footnotes (references), has been published on the Elsevier website. The research presents several models for automatically assessing information sources. These models take into account how frequently a source occurs in references and how popular the content is among Wikipedia editors and readers.
For the study, Wikipedia articles were divided into 70 topics at varying levels of abstraction, covering areas such as culture, geography, history, society, science, technology, engineering, and mathematics. With reference information extracted from each Wikipedia article, one can examine how well individual topics are supported by verifiable information in different language versions. The figure below shows reference density (RpA, References per Article) values for each of the 70 topics and 42 language versions of Wikipedia.
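As a rough illustration of the reference density measure, the sketch below computes RpA as the total number of references divided by the number of articles for each language and topic pair. This is a minimal reading of "References per Article" and may differ from the exact definition used in the paper. The function name `references_per_article` and the sample records are hypothetical and invented for illustration; they are not data from the study.

```python
from collections import defaultdict


def references_per_article(articles):
    """Compute RpA per (language, topic) pair.

    `articles` is an iterable of (language, topic, reference_count) tuples,
    one tuple per article. This input layout is an assumption made for
    the example, not the format used in the paper.
    """
    # (language, topic) -> [total references, number of articles]
    totals = defaultdict(lambda: [0, 0])
    for language, topic, ref_count in articles:
        entry = totals[(language, topic)]
        entry[0] += ref_count
        entry[1] += 1
    # Divide total references by article count for each group.
    return {key: refs / count for key, (refs, count) in totals.items()}


# Invented sample data: one tuple per article.
sample = [
    ("en", "history", 12),
    ("en", "history", 8),
    ("pl", "history", 3),
    ("pl", "history", 5),
    ("pl", "history", 1),
]

rpa = references_per_article(sample)
for (language, topic), value in sorted(rpa.items()):
    print(f"{language} / {topic}: RpA = {value:.2f}")
```

Grouping by both language and topic mirrors the layout described for the figure, where each cell corresponds to one topic in one language version.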
In addition, scientific sources of information were identified, which made it possible to determine how the language versions differ in their Sci score. For example, in the English version of Wikipedia, the most extensive one, scientific sources account for about 2.6% of information sources. The share is 0.76% in Polish, 1.19% in Russian, 1.2% in German, 1.12% in French, 1.44% in Spanish, 0.74% in Chinese, 1.08% in Japanese, and 2.86% in Arabic.
The results of the scientific research were presented at the KES 2022 conference. The publication is available at: doi.org/10.1016/j.procs.2022.09.387