In the 23rd edition of the competition held by the Scientific Society of Business Informatics (Polish abbreviation: “NTIE”), people associated with the Department of Information Systems at Poznań University of Economics and Business received awards for the best diploma theses in the field of information systems.
In the doctoral dissertation category, third place went to Dr. Włodzimierz Lewoniewski for the work entitled “The method of comparing and enriching information in multilingual wikis based on the analysis of their quality”. The thesis supervisor was Prof. Witold Abramowicz, and the auxiliary supervisor was Prof. Krzysztof Węcel.
Quality Assessment and Information Enrichment in Multilingual Wikipedia
Wikipedia currently has over 54 million articles in over 300 languages. Despite its popularity, this online encyclopedia is often criticized for the low quality of its information. However, depending on the topic and the language version, you can find valuable content there. Using machine learning algorithms and the semantic representation of Wikipedia in other knowledge bases (for example DBpedia), it is possible to automatically compare the same information across language versions and select the version with the highest quality.
As part of the doctoral dissertation, tools were developed to compute the values of quality measures from data in various formats and from various sources. The research analyzed data with a total volume of over 10 terabytes, and over a billion values of quality measures were determined across various Wikipedia language versions. Experiments showed that for local topics, the highest-quality information is usually found in the corresponding language version. For example, information about Polish cities is usually best in the Polish-language version of Wikipedia.
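The selection step described above can be illustrated with a minimal Python sketch. It assumes that each language version of an article has already been reduced to a single quality score; the function name `best_language_version` and the score values are hypothetical and are not taken from the dissertation, whose quality measures and models are more elaborate.

```python
from typing import Dict, Optional


def best_language_version(scores: Dict[str, float]) -> Optional[str]:
    """Return the language code with the highest quality score.

    Returns None when no scores are available. Ties are broken by
    language code so that the result is deterministic.
    """
    if not scores:
        return None
    # max() keeps the first maximum it sees, so iterating over sorted
    # language codes makes the alphabetically first code win a tie.
    return max(sorted(scores), key=lambda lang: scores[lang])


# Hypothetical, made-up quality scores for one article about a Polish city.
city_scores = {"pl": 0.91, "en": 0.78, "de": 0.64}
print(best_language_version(city_scores))
```

With these illustrative scores the Polish version is selected, which mirrors the observation about local topics.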
Quality models were also built, on the basis of local and international topics, to evaluate a particularly important part of Wikipedia articles: infoboxes, which are usually placed at the top of the article and contain the most important information about the subject. Here, measures of popularity can help in assessing infobox quality, because the more readers an article has, the more likely it is that someone quickly notices outdated or incorrect information. Therefore, if an article is popular in a given language version, corrections can be made faster. The models presented in the PhD thesis can be used to automatically enrich the different language versions of Wikipedia. Part of the research was carried out using data from DBpedia.
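As a rough illustration of what such enrichment could look like, the Python sketch below fills parameters missing from one language version's infobox with values from other versions, visiting the higher-quality versions first. It is only a sketch under stated assumptions: infobox parameter names are assumed to be already aligned across languages, the quality scores are assumed to be given, and the function `enrich_infobox` and all values are hypothetical placeholders rather than the method from the dissertation.

```python
from typing import Dict

Infobox = Dict[str, str]


def enrich_infobox(
    target_lang: str,
    infoboxes: Dict[str, Infobox],
    quality: Dict[str, float],
) -> Infobox:
    """Fill parameters missing in the target infobox from other versions.

    Existing values in the target language are never overwritten.
    Donor versions are visited from highest to lowest quality score;
    a version without a score is treated as having score 0.0.
    """
    enriched = dict(infoboxes.get(target_lang, {}))
    donors = sorted(
        (lang for lang in infoboxes if lang != target_lang),
        key=lambda lang: quality.get(lang, 0.0),
        reverse=True,
    )
    for lang in donors:
        for param, value in infoboxes[lang].items():
            # setdefault keeps the first value found for each parameter,
            # i.e. the one from the highest-quality version that has it.
            enriched.setdefault(param, value)
    return enriched


# Hypothetical placeholder data; parameter names are assumed to be aligned.
infoboxes = {
    "en": {"name": "name-from-en"},
    "pl": {"name": "name-from-pl", "population": "population-from-pl"},
    "de": {"population": "population-from-de", "mayor": "mayor-from-de"},
}
quality = {"pl": 0.9, "en": 0.7, "de": 0.6}
print(enrich_infobox("en", infoboxes, quality))
```

Keeping the target version's own values and only filling gaps is a deliberately conservative choice for this sketch; a fuller approach could also replace existing values when another version scores clearly higher.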
More information about quality assessment and information enrichment in Wikipedia can be found in scientific publications.