Neutral Point of View (NPOV) is one of the core principles of Wikipedia. Under this policy, content should be presented objectively, representing all significant perspectives without favoring any particular one.
Because anyone can edit Wikipedia, upholding this principle is a significant challenge: editors may unintentionally carry their own biases and emotions into articles. As a result, methods for automatically monitoring content neutrality are becoming increasingly important.
Sentiment Analysis
Sentiment analysis is a natural language processing technique commonly used to study opinions in social media posts, reviews, and comments. It classifies content as positive, neutral, or negative. It is most often applied to short, informal texts, however, whereas Wikipedia entries are longer and more formally structured, which makes direct application to encyclopedic articles more difficult.
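To make the three-way classification concrete, here is a minimal sketch of a lexicon-based classifier. It is not the implementation of any particular tool: the tiny word list, the function name `classify_sentiment`, and the 0.05 threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical toy lexicon (word -> polarity in [-1, 1]); real lexicons are far larger.
TOY_LEXICON = {
    "excellent": 1.0, "good": 0.5, "successful": 0.6,
    "bad": -0.5, "terrible": -1.0, "failed": -0.6,
}

def classify_sentiment(text, threshold=0.05):
    """Label a text 'positive', 'neutral', or 'negative'.

    The score is the mean polarity over all word tokens; the threshold
    is an illustrative assumption, not a value taken from the study.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return "neutral"
    # Words missing from the lexicon contribute 0 to the score.
    score = sum(TOY_LEXICON.get(t, 0.0) for t in tokens) / len(tokens)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

print(classify_sentiment("The film was excellent"))
print(classify_sentiment("The campaign failed and the results were terrible"))
print(classify_sentiment("The city was founded on the river."))
```

Because this sketch averages over every token, a single emotive word in a long, factual passage barely moves the score. That hints at why methods tuned on short, informal texts may behave differently on long encyclopedic articles.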
Research Presentation at IJCAI 2025
The paper “Cross-Topic Sentiment Analysis of Wikipedia Articles: A Comparative Study of AI Models” was presented at IJCAI 2025 (the 34th International Joint Conference on Artificial Intelligence), held in Montreal, Canada, in August 2025. The authors of the study are Dr. Włodzimierz Lewoniewski, Dr. Milena Stróżyna, Izabela Czumałowska, Aleksandra Wojewoda, and Prof. Krzysztof Węcel.
The study reported the results of analyzing nearly 7 million English Wikipedia articles using four sentiment analysis models: lexicon-based approaches (TextBlob, VADER) and transformer-based models (RoBERTa, DistilBERT).
The findings revealed that:
- the sentiment of Wikipedia articles varies significantly depending on the topic,
- different models produce divergent sentiment evaluations, highlighting the importance of tool selection in such analyses,
- it is possible to develop practical tools to support large-scale monitoring of neutrality in Wikipedia articles.
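The first two findings can be illustrated with a small sketch that computes per-topic label shares for one model and pairwise agreement between models. The records, topic names, and model names below are invented for illustration and are not results from the study; the helper functions are likewise hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical records: (topic, {model_name: label}). Illustrative values only.
RECORDS = [
    ("sports", {"lexicon_a": "positive", "lexicon_b": "positive", "transformer_a": "neutral"}),
    ("sports", {"lexicon_a": "positive", "lexicon_b": "neutral", "transformer_a": "neutral"}),
    ("history", {"lexicon_a": "negative", "lexicon_b": "negative", "transformer_a": "negative"}),
    ("history", {"lexicon_a": "neutral", "lexicon_b": "neutral", "transformer_a": "neutral"}),
]

def label_shares_by_topic(records, model):
    """For one model, return the share of each sentiment label within each topic."""
    counts = defaultdict(Counter)
    for topic, labels in records:
        counts[topic][labels[model]] += 1
    return {
        topic: {label: n / sum(c.values()) for label, n in c.items()}
        for topic, c in counts.items()
    }

def pairwise_agreement(records):
    """Return the fraction of articles on which each pair of models assigns the same label."""
    models = sorted(records[0][1])
    return {
        (a, b): sum(labels[a] == labels[b] for _, labels in records) / len(records)
        for a, b in combinations(models, 2)
    }

print(label_shares_by_topic(RECORDS, "lexicon_a"))
print(pairwise_agreement(RECORDS))
```

Raw agreement is the simplest possible measure. A chance-corrected statistic such as Cohen's kappa would be a natural refinement when label distributions are skewed.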
The researchers also released a dataset on the Hugging Face platform containing the sentiment classifications that the tested models produced for Wikipedia articles.
Significance of the Study
This work contributes to the development of methods for automatically assessing the quality of Wikipedia content. Unlike earlier studies that focused on short text fragments, the presented approach covers the entire English Wikipedia. This opens the way to systematic monitoring of compliance with the Neutral Point of View principle on one of the world’s most important knowledge platforms.
The proposed methodology may also be applied to assessing the quality and reliability of other online resources, which is crucial in the context of combating misinformation and ensuring access to trustworthy knowledge on the Internet.