Features of Wikipedia Articles and Their Extraction Methods for Automatic Information Quality Assessment

This article presents and classifies features that can be extracted from Wikipedia articles for the purpose of automatic information quality assessment. Based on an analysis of the state of the art and on our own experiments, we define specific measures for various aspects of quality. We also propose a method for extracting features from various sources. The links between articles in different languages make it possible to compare and verify the quality of the information provided by Wikipedians. The resulting model can be used for the relative quality assessment of data contained in the structured parts of Wikipedia articles, namely infoboxes.

