A paper on the automatic identification of reliable sources of information on various topics in the multilingual Wikipedia was presented at the 10th National Scientific Conference of Professor Zbigniew Czerwiński “Mathematics and informatics serving economics”. The conference was held online on September 24.
For over 20 years, Wikipedia has enabled tens of millions of Internet users from all over the world to contribute to knowledge on various topics in over 300 language versions. Currently, this encyclopedia has more than 57 million articles written by approximately 98 million users, and the total number of edits exceeds 3 billion.
Facts described in Wikipedia articles must be based on verifiable, published (preferably publicly available) source materials. When editing an article, authors should make sure that the views of the majority and of significant minorities appearing in these materials have all been taken into account. Information that is unsupported by sources, or supported only by poor-quality sources, may be removed from Wikipedia. Additionally, each Wikipedia language version may have its own set of criteria that determine what counts as a reliable source. It should be noted, however, that these criteria may be interpreted differently depending on the user and the context.
As with the evaluation of the quality of information, the evaluation of the reliability of a source is subjective. A person editing a Wikipedia article often has to decide whether a particular reference, which in most cases points to a website, is reliable. Some language versions of Wikipedia (usually the most developed ones, e.g. the English version) maintain lists of reliable or questionable sources of information. However, such lists contain fewer than 1,000 items, whereas there are over a billion different websites that could potentially serve as a source of information. These incomplete lists also need to be updated regularly, because the reputation or reliability of the same source may change over time. The criteria for the credibility of sources may likewise change over time, separately for each language version of Wikipedia. Finally, the reliability of the same source depends not only on the language, but also on the subject of the information being verified.
The research results presented show that, by analyzing open data from Wikipedia, it is possible to automate the assessment of source reliability in each language version of Wikipedia, separately for each subject of information.
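To give a rough idea of what an assessment split by language and subject could look like, here is a minimal Python sketch. It is not the method from the paper, which is not described here. It assumes, purely for illustration, that references are already available as (language, topic, URL) records and that how often a website is cited is a crude stand-in for its reliability. The function names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def domain_of(url):
    """Return the lower-cased host name of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def rank_sources(references):
    """Count cited domains separately for each (language, topic) pair.

    `references` is an iterable of (language, topic, url) tuples.
    This input format is an assumption made for the example.
    """
    counts = defaultdict(Counter)
    for language, topic, url in references:
        domain = domain_of(url)
        if domain:  # skip entries that do not contain a usable host name
            counts[(language, topic)][domain] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [
    ("en", "economics", "https://www.example.org/report"),
    ("en", "economics", "https://example.org/data"),
    ("en", "economics", "https://example.com/post"),
    ("pl", "economics", "https://example.com/artykul"),
]

ranking = rank_sources(sample)
# Most frequently cited domains in English-language articles on economics.
print(ranking[("en", "economics")].most_common(2))
```

Keeping a separate counter for every (language, topic) pair reflects the point made above: the same website can rank differently depending on the language version and on the subject of the article.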
More details can be found on the website of the 10th conference “Mathematics and informatics serving economics”.