Interpretability - the myth, questions, and some answers

If you have a question about this talk, please contact Adrian Weller.

NOTE LOCATION: 2nd Floor Board Room

In this talk, I will provide an overview of my work on interpretability from the past couple of years. I will cover 1) our studies on factors that influence how humans understand explanations from machine learning models, 2) building inherently interpretable models, with and without a human in the loop, 3) improving interpretability when you already have a trained model (post-training interpretability), and 4) our work on ways to test and evaluate interpretability methods.

Among these, I will take a deeper dive into one of my recent works, Testing with Concept Activation Vectors (TCAV), a post-training interpretability method for complex models such as neural networks. This method interprets a neural net's internal state in terms of human-friendly, high-level concepts instead of low-level input features. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use concept activation vectors (CAVs) as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result: for example, how sensitive a prediction of "zebra" is to the presence of stripes. Using image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as for a medical application.
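To make the procedure concrete, here is a minimal sketch of the TCAV computation in Python. This is not the authors' code: the data is synthetic, all variable names are illustrative, and a toy linear-plus-ReLU "network head" stands in for the gradients that a real implementation would obtain by backpropagation through the network.

```python
# Minimal TCAV sketch on synthetic data (illustrative only, not the paper's code).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 50  # dimensionality of the chosen layer's activations

# Step 1: collect layer activations for concept examples (e.g. "striped" images)
# and for random counterexamples. Here both sets are synthetic stand-ins.
concept_acts = rng.normal(loc=1.0, size=(100, d))  # activations of concept images
random_acts = rng.normal(loc=0.0, size=(100, d))   # activations of random images

# Step 2: fit a linear classifier separating concept from random activations.
# The CAV is the unit vector normal to the decision boundary, pointing toward
# the concept side.
X = np.vstack([concept_acts, random_acts])
y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
clf = LogisticRegression(max_iter=1000).fit(X, y)
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])

# Step 3: for each input of the class of interest (e.g. "zebra"), compute the
# directional derivative of the class logit along the CAV. As a stand-in for
# autograd, the toy head is logit = relu(a) @ w_class, whose gradient w.r.t.
# the activations a is w_class * 1[a > 0], so it varies per example.
w_class = rng.normal(size=d)                       # stand-in for the class logit weights
class_acts = rng.normal(loc=0.5, size=(200, d))    # activations of class images
grads = w_class * (class_acts > 0)                 # d(logit)/d(activation), per example

directional_derivs = grads @ cav

# Step 4: the TCAV score is the fraction of class examples whose logit would
# increase if the activation moved in the concept direction.
tcav_score = float(np.mean(directional_derivs > 0))
print(f"TCAV score for the concept: {tcav_score:.2f}")
```

In the full method, this score is computed for several CAVs trained on different random counterexample sets, and a statistical test against CAVs trained on purely random "concepts" guards against spurious results.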

This talk is part of the Machine Learning @ CUED series.
