Adversarial Explanations - You Shouldn't Trust Me: Learning Models Which Conceal Unfairness From Multiple Explanation Methods
- Speaker: Botty Dimanov (University of Cambridge)
- Date & Time: Tuesday 09 June 2020, 13:15-14:15
- Venue: Online on Teams
Abstract
Transparency of algorithmic systems has been discussed as a way for end-users and regulators to develop appropriate trust in machine learning models. One popular approach, LIME (Ribeiro et al., 2016), even suggests that model explanations can answer the question "Why should I trust you?" Here we show a straightforward method for modifying a pre-trained model to manipulate the outputs of many popular feature-importance explanation methods with little change in accuracy, demonstrating the danger of trusting such explanation methods. We show how this explanation attack can mask a model's discriminatory use of a sensitive feature, raising strong concerns about using such explanation methods to check model fairness.
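To make the attack concrete, here is a minimal sketch of one plausible instantiation, not necessarily the speaker's exact method: fine-tune the pre-trained model with an added penalty on the input gradient of a chosen sensitive feature, so that gradient-based saliency methods report it as unimportant, while a standard task loss keeps accuracy nearly unchanged. A PyTorch classifier over tabular inputs of shape `(batch, features)` is assumed, and the names `explanation_attack`, `sensitive_idx`, and `alpha` are illustrative.

```python
import torch
import torch.nn.functional as F

def explanation_attack(model, loader, sensitive_idx, alpha=1.0, lr=1e-4, epochs=5):
    """Illustrative sketch: fine-tune `model` so gradient-based feature-importance
    methods assign near-zero importance to feature `sensitive_idx`, while a
    standard cross-entropy term keeps accuracy close to the original model's."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x = x.clone().requires_grad_(True)        # we need gradients w.r.t. inputs
            task_loss = F.cross_entropy(model(x), y)  # preserves predictive accuracy
            # Input gradient of the loss: a proxy for saliency-style importance.
            grad_x = torch.autograd.grad(task_loss, x, create_graph=True)[0]
            # Penalise the apparent importance of the sensitive feature.
            concealment = grad_x[:, sensitive_idx].abs().mean()
            loss = task_loss + alpha * concealment
            opt.zero_grad()
            loss.backward()  # second-order term: gradients flow through grad_x
            opt.step()
    return model
```

Raising `alpha` trades accuracy for concealment; the abstract's claim is that this trade-off can be kept small, which is precisely what makes such attacks a concern for fairness auditing.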
Series
This talk is part of the Artificial Intelligence Research Group Talks (Computer Laboratory) series.
Included in Lists
- All Talks (aka the CURE list)
- Artificial Intelligence Research Group Talks (Computer Laboratory)
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- Department of Computer Science and Technology talks and seminars
- Guy Emerson's list
- Hanchen DaDaDash
- Interested Talks
- Martin's interesting talks
- ndk22's list
- ob366-ai4er
- Online on Teams
- PhD related
- rp587
- School of Technology
- Speech Seminars
- Trust & Technology Initiative - interesting events
- yk373's list
- yk449
Note: Ex-directory lists are not shown.