
Mitigating Gender Bias in Morphologically Rich Languages


If you have a question about this talk, please contact Andrew Caines.

Gender bias exists in corpora of all of the world’s languages: the bias is a function of what people talk about, not of the grammar of a language. Data-driven NLP systems trained on such corpora will therefore inherit this bias. Evidence of bias can be found in all sorts of NLP technologies: word vectors, language models, coreference systems and even machine translation. Most of the research done to mitigate gender bias in natural language corpora, however, has focused solely on English. For instance, in an attempt to remove gender bias from English corpora, NLP practitioners often augment the data by swapping gendered words: if “he is a smart doctor” appears, the sentence “she is a smart doctor” is added to the corpus as well before training a model.

The broader research question asked in this talk is the following: how can we mitigate gender bias in corpora from any of the world’s languages, not just English? The simple swapping heuristic for English, for example, will not generalize to most of the world’s languages. Indeed, it would not even apply to German, which marks gender on both nouns and adjectives and requires gender agreement throughout a sentence. For German the task is far more complicated: mapping “er ist ein kluger Arzt” to “sie ist eine kluge Ärztin” requires more than simply swapping “er” with “sie” and “Arzt” with “Ärztin”; one also has to modify the article (“ein”) and the adjective (“klug”).

In this talk, we present a machine-learning solution to this problem: we develop a novel neural random field that generates such sentence-to-sentence transformations while enforcing gender agreement. We explain how to perform inference and morphological reinflection to generate these transformations without any labeled training examples. Empirically, using a novel metric of gender bias, we show that the model reduces gender bias in corpora without sacrificing grammaticality. Additionally, we discuss concrete applications to coreference resolution and machine translation.
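As an illustration of the swap heuristic described above, and of why a naive version breaks down for a language with gender agreement, here is a minimal Python sketch. The word lists, sentences, and the swap_gender helper are illustrative assumptions for this page, not the speaker’s actual implementation.

```python
# Illustrative sketch only: hypothetical word lists and a toy swap_gender
# helper, not the system described in the talk.

# A tiny table of gendered word pairs for English.
EN_SWAPS = {"he": "she", "she": "he", "him": "her", "his": "her"}

def swap_gender(sentence, swaps):
    """Swap gendered tokens one for one (adequate for English, not German)."""
    return " ".join(swaps.get(tok.lower(), tok) for tok in sentence.split())

print(swap_gender("he is a smart doctor", EN_SWAPS))
# -> "she is a smart doctor"   (grammatical English)

# The same token-level swap ignores agreement in German: the article and
# adjective are left untouched, so the output is ungrammatical.
DE_SWAPS = {"er": "sie", "arzt": "Ärztin"}
print(swap_gender("er ist ein kluger Arzt", DE_SWAPS))
# -> "sie ist ein kluger Ärztin"   (should be "sie ist eine kluge Ärztin")
```

Handling German correctly requires propagating the gender change to the article and adjective as well, which is exactly the agreement problem the neural random field described in the talk is designed to address.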

This talk is part of the NLIP Seminar Series.

