The international conference KES 2022 (26th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems) took place on September 7-9. During the event, research on identifying important information sources across topics and language versions of Wikipedia was presented. This year’s edition of the KES conference was held in Verona (Italy) in hybrid mode.
Information in Wikipedia should be based on verifiable sources. However, not every website can serve as a source of information in this encyclopedia. Wikipedia's rules state that information in its articles should be based on credible, independent, published content with a good reputation for factuality and accuracy. Credibility is a subjective concept, though, and the reputation of the same source can be judged by different criteria depending on the person (or group of people) evaluating it. Thus, each language version of Wikipedia may have its own rules or criteria for how a website must be evaluated before it can be used as an information source. As a result, the credibility of the same source in Wikipedia depends on the topic and language version. Additionally, the credibility (reputation) rating of the same website may change over time.
As part of the presented research, over 230 million references in articles from various language versions of Wikipedia were analyzed. For example, the most developed version, English Wikipedia, contains over 70 million references, while the Polish version contains over 7.5 million references to various sources of information. Unique websites were identified from the reference metadata. For the English version, the number of such websites exceeded 1.7 million; for the Polish version, it exceeded 200,000.
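To illustrate what identifying unique websites from references can involve, here is a minimal Python sketch. It assumes that each reference exposes a URL and that a "website" can be approximated by the normalized host name; the function names `website_of` and `count_websites` and the sample URLs are hypothetical, and the actual study may have used different metadata and normalization rules.

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlsplit


def website_of(url: str) -> Optional[str]:
    """Return a normalized host name for a reference URL, or None if it has none.

    Assumption: the host name, lowercased and without a leading "www.",
    is a good enough stand-in for "website".
    """
    host = urlsplit(url.strip()).hostname  # already lowercased; None if no host
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host


def count_websites(reference_urls: Iterable[str]) -> Counter:
    """Count how many references point to each website."""
    hosts = (website_of(u) for u in reference_urls)
    return Counter(h for h in hosts if h)


# Made-up sample data, for illustration only.
refs = [
    "https://www.example.org/a",
    "http://example.org/b",
    "https://news.example.com/story",
    "not a url",
]
counts = count_websites(refs)
print(len(counts), "unique websites")
```

Counting by host name treats subdomains such as `news.example.com` as separate websites; grouping them under a registered domain would need a public suffix list, which this sketch leaves out.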
Additionally, references to scientific publications were identified. This made it possible to obtain the “Sci” index, which shows how frequently scientific sources of information occur within the analyzed language version of Wikipedia. The Wikipedia articles were then divided into topics, and a comparative analysis of websites across topics and language versions was carried out using different models for assessing the reliability of sources.
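The text does not give the exact formula behind the “Sci” index, so the following Python sketch only shows one plausible reading: the share of references that carry a scholarly identifier. The identifier list `SCI_KEYS`, the dictionary layout of a reference, and the function names are all assumptions made for illustration, not the method used in the research.

```python
from typing import Iterable, Mapping

# Assumption: a reference is "scientific" if it has at least one of these
# identifier fields filled in. The real study may use other criteria.
SCI_KEYS = ("doi", "pmid", "pmc", "arxiv", "bibcode")


def is_scientific(ref: Mapping) -> bool:
    """Return True if the reference has any non-empty scholarly identifier."""
    return any(str(ref.get(key) or "").strip() for key in SCI_KEYS)


def sci_index(refs: Iterable[Mapping]) -> float:
    """Share of references that look scientific (0.0 for an empty input)."""
    refs = list(refs)
    if not refs:
        return 0.0
    return sum(is_scientific(r) for r in refs) / len(refs)


# Made-up sample references, for illustration only.
sample = [
    {"url": "https://example.org/news", "doi": ""},
    {"url": "https://example.org/paper", "doi": "10.1000/xyz123"},
    {"url": "https://example.com/blog"},
    {"title": "Some preprint", "arxiv": "0000.00000"},
]
print(sci_index(sample))
```

Computing the same ratio per language version or per topic would then allow the kind of comparison described above.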