In recent years, many scholars have analyzed the underrepresentation and misrepresentation of non-Western people in several resources widely used for research in Natural Language Processing. For example, Wikipedia has been found to contain biases about women and minority groups, since most Wikipedia contributors are white Western males (cf. Field, A., 2021). In addition, people belonging to ethnic minorities and non-Western people receive significantly less coverage in this encyclopedia (cf. Adams, J. et al., 2019).
The Under-Represented Writers (URW) Project is an attempt to explore and mitigate the underrepresentation of non-Western writers in the digital landscape by providing the following set of resources:
- the Under-Represented Writers (URW) and Under-Represented Books (URB) ontologies, designed to encode biographical information about writers and information about their works;
- a Knowledge Graph (KG), i.e., a dataset that includes knowledge gathered from several online resources: Wikipedia, Wikidata, Open Library, Goodreads, and Google Books;
- a set of strategies and resources for extracting biographical events from the raw-text biographies gathered from writers’ Wikipedia pages.