New research examines how the structure of Wikipedia can help automatically identify content related to science fiction and fantasy. The results show that although Wikipedia contains enormous amounts of machine-readable data, interpreting these signals is not straightforward.
At first glance, the question “which Wikipedia articles are about science fiction or fantasy?” might seem simple to answer. In practice, it is much harder. The boundaries between these genres are fluid, and many works combine elements from different traditions, from mythology and horror to dystopia and magical realism.
Wikipedia is not only a collection of article texts. It is also a complex ecosystem of connections and metadata that can be analyzed at scale. The most important elements include:
- categories assigned to articles (e.g., “science fiction novels”),
- wikilinks, i.e., internal links between articles,
- structured data from Wikidata describing the type of object (e.g., novel, film, fictional character),
- WikiProject tags, which are labels created by Wikipedia’s editor communities.
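To make the four signals concrete, here is a minimal Python sketch of how they could be held together for one article and checked with simple rules. It is not the method used in the research: the `ArticleSignals` class, the keyword list, the set of Wikidata types, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical keyword and type lists, chosen only for illustration.
GENRE_KEYWORDS = ("science fiction", "fantasy")
FICTION_TYPES = {"novel", "film", "fictional character"}


@dataclass
class ArticleSignals:
    """Structural signals for one article (field names are assumptions)."""
    title: str
    categories: list[str] = field(default_factory=list)
    wikilinks: list[str] = field(default_factory=list)
    wikidata_types: list[str] = field(default_factory=list)
    wikiproject_tags: list[str] = field(default_factory=list)


def mentions_genre(labels):
    """Return True if any label contains one of the genre keywords."""
    return any(kw in label.lower() for label in labels for kw in GENRE_KEYWORDS)


def matched_signals(article, seed_articles):
    """List which of the four signals fire for this article."""
    matches = []
    if mentions_genre(article.categories):
        matches.append("category")
    # A link to an article already known to be about the genre.
    if any(link in seed_articles for link in article.wikilinks):
        matches.append("wikilink")
    # The Wikidata type says what kind of object this is, not its genre.
    if any(t in FICTION_TYPES for t in article.wikidata_types):
        matches.append("wikidata")
    if mentions_genre(article.wikiproject_tags):
        matches.append("wikiproject")
    return matches


# Made-up sample article, not real Wikipedia data.
example = ArticleSignals(
    title="Example Novel",
    categories=["Science fiction novels"],
    wikilinks=["Science fiction", "Example Author"],
    wikidata_types=["novel"],
    wikiproject_tags=["WikiProject Science Fiction"],
)
seed_articles = {"Science fiction", "Fantasy"}
result = matched_signals(example, seed_articles)
# result == ["category", "wikilink", "wikidata", "wikiproject"]
```

Even this toy version shows why the signals are hard to interpret: naive keyword matching would also flag a category such as "Fantasy football", and a Wikidata type like "novel" says nothing about genre on its own.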
The publication, titled “Science Fiction and Fantasy in Wikipedia: Exploring Structural and Semantic Cues”, uses public Wikipedia data dumps to examine different signals that may indicate whether an article is related to speculative fiction. The results may be useful for several communities. Researchers in digital humanities can use them to analyze the development of literary genres and popular culture on a global scale, and the Wikipedia community can use them to identify gaps in article tagging or in the category structure.
Research on the structure of Wikipedia has implications beyond science fiction and fantasy. Automatically recognizing the topics of articles can support cultural and literary studies, large-scale analyses of popular culture data, the development of artificial intelligence tools, and better search and recommendation systems on digital knowledge platforms.