The neutral point of view (NPOV) has long been one of Wikipedia's most important principles. In theory, this means that every article should present facts without bias. In practice, however, achieving full neutrality is a major challenge, especially across millions of articles written by people from all over the world.
A recently published scientific paper entitled “Cross-Topic Sentiment Analysis of Wikipedia Articles: A Comparative Study of AI Models” shows how artificial intelligence can help analyze this issue. The paper was written by Włodzimierz Lewoniewski, Milena Stróżyna, Izabela Czumałowska, Aleksandra Wojewoda, and Krzysztof Węcel. The researchers examined around 7 million articles from the English-language Wikipedia, seeking to answer the question: is the language used in these texts truly neutral?
At first glance, one might think it is enough to check whether a text contains positive or negative words. However, the problem is much more complex. Wikipedia articles are long and cover multiple subtopics; their style varies depending on the field, such as politics or quantum physics; and they sometimes describe controversial topics. As a result, subtle differences in wording can suggest bias even when there are no obvious emotionally charged words.
As part of the research, several different approaches to language analysis were used:
- lexicon-based models, such as TextBlob and VADER, which rely on predefined word lists,
- modern language models based on transformer architecture, such as RoBERTa and DistilBERT.
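
To make the lexicon-based idea concrete, here is a minimal sketch of scoring a text from a predefined word list. It is a toy illustration only: the `TOY_LEXICON` entries and the `lexicon_score` function are hypothetical and do not reproduce TextBlob, VADER, or the method used in the paper.

```python
import re
from typing import Mapping

# Hypothetical toy lexicon mapping words to a polarity in [-1, 1].
# Real lexicons contain thousands of entries and extra rules.
TOY_LEXICON = {
    "excellent": 1.0,
    "good": 0.5,
    "bad": -0.5,
    "terrible": -1.0,
}


def lexicon_score(text: str, lexicon: Mapping[str, float] = TOY_LEXICON) -> float:
    """Return the mean polarity of the lexicon words found in `text`.

    Returns 0.0 (treated here as neutral) when no lexicon word occurs.
    """
    words = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


if __name__ == "__main__":
    print(lexicon_score("The city was founded in 1850."))
    print(lexicon_score("Critics called it an excellent film."))
```

A scorer like this only sees words that appear in its list, which is one reason why wording that is biased but not overtly emotional can go unnoticed.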
The results show that Wikipedia’s neutrality is not uniform; it varies across subject areas. The choice of model can significantly affect the assessment of a text, and long, complex articles have to be split into smaller fragments whose evaluations are then combined.
The findings may also have practical applications. For example, they could support better quality control on Wikipedia: automated systems could identify passages that deviate from neutrality, helping editors correct them more quickly. Another application is the fight against disinformation: similar methods can be used to analyze online articles and detect biased or manipulative content. Such technologies can help internet users better understand when a text is objective and when it is trying to influence their opinion.
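
The idea of combining evaluations from smaller fragments of a long article can be sketched as follows. This is an assumption-laden illustration: the source does not say how the paper splits articles or combines scores, so the word-based splitting, the length-weighted mean, and the names `split_into_chunks` and `aggregate_sentiment` are all hypothetical. The `scorer` argument stands in for any model that maps a text fragment to a number.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 200) -> List[str]:
    """Split text into consecutive fragments of at most `max_words` words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def aggregate_sentiment(
    text: str,
    scorer: Callable[[str], float],
    max_words: int = 200,
) -> float:
    """Score each fragment and return the length-weighted mean score.

    The weighting scheme is an assumption made for this sketch,
    not the aggregation reported in the paper.
    """
    chunks = split_into_chunks(text, max_words)
    if not chunks:
        return 0.0
    weights = [len(chunk.split()) for chunk in chunks]
    scores = [scorer(chunk) for chunk in chunks]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


if __name__ == "__main__":
    # Placeholder scorer: 1.0 if the fragment mentions "good", else 0.0.
    def demo_scorer(chunk: str) -> float:
        return 1.0 if "good" in chunk else 0.0

    print(aggregate_sentiment("good good plain plain", demo_scorer, max_words=2))
```

Weighting by fragment length keeps a short trailing fragment from counting as much as a full one; other choices, such as a plain average or taking the most extreme fragment, would be equally easy to substitute.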
One of the most important outcomes of the research is a publicly available dataset, released on Hugging Face, which contains sentiment scores assigned by different models to around 7 million English-language Wikipedia articles. Supplementary materials have also been made available, providing a clearer understanding of how the analysis was conducted.
The scientific paper was presented at the IJCAI 2025 conference. The publication is available under DOI: 10.1007/978-3-032-18920-2_34.
This research is supported by the project “OpenFact – artificial intelligence tools for verification of the veracity of information sources and fake news detection” (INFOSTRATEG-I/0035/2021-00), granted within the INFOSTRATEG I program of the National Center for Research and Development, under the topic: Verifying information sources and detecting fake news.