BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Semantics derived automatically from language corpora necessarily 
 contain human biases - Arvind Narayanan\, Princeton University
DTSTART:20161011T130000Z
DTEND:20161011T140000Z
UID:TALK67474@talks.cam.ac.uk
CONTACT:Laurent Simon
DESCRIPTION:*Abstract:*\nJoint work with Aylin Caliskan-Islam and Joanna J
 . Bryson\n\nArtificial intelligence and machine learning are in a period o
 f astounding growth. However\, there are concerns that these technologies 
 may be used\, either with or without intention\, to perpetuate the prejudi
 ce and unfairness that unfortunately characterize many human institutions
 . Here we show for the first time that human-like semantic biases result f
 rom the application of standard machine learning to ordinary language---th
 e same sort of language humans are exposed to every day. We replicate a sp
 ectrum of standard human biases as exposed by the Implicit Association Tes
 t and other well-known psychological studies. We replicate these using a w
 idely used\, purely statistical machine-learning model---namely\, the GloV
 e word embedding---trained on a corpus of text from the Web. Our results i
 ndicate that language itself contains recoverable and accurate imprints of
  our historic biases\, whether these are morally neutral as towards insect
 s or flowers\, problematic as towards race or gender\, or even simply veri
 dical\, reflecting the status quo for the distribution of gender with resp
 ect to careers or first names. These regularities are captured by machine 
 learning along with the rest of semantics. In addition to our empirical fi
 ndings concerning language\, we also contribute new methods for evaluating
  bias in text\, the Word Embedding Association Test (WEAT) and the Word Em
 bedding Factual Association Test (WEFAT). Our results have implications no
 t only for AI and machine learning\, but also for the fields of psychology
 \, sociology\, and human ethics\, since they raise the possibility that me
 re exposure to everyday language can account for the biases we replicate h
 ere.\n\nLink to paper: https://arxiv.org/abs/1608.07187\n\n*Bio:*\nArvind 
 Narayanan is an Assistant Professor of Computer Science at Princeton. He l
 eads the Princeton Web Transparency and Accountability Project to uncover 
 how companies collect and use our personal information. Narayanan also lea
 ds a research team investigating the security\, anonymity\, and stability 
 of cryptocurrencies as well as novel applications of blockchains. He co-cr
 eated a Massive Open Online Course as well as a textbook on Bitcoin and cr
 yptocurrency technologies. His doctoral research showed the fundamental li
 mits of de-identification\, for which he received the Privacy Enhancing Te
 chnologies Award.\n\nNarayanan is an affiliated faculty member at the Cent
 er for Information Technology Policy at Princeton and an affiliate scholar
  at Stanford Law School's Center for Internet and Society.
LOCATION:LT2\, Computer Laboratory\, William Gates Building
END:VEVENT
END:VCALENDAR