Approaches to avoiding negative side effects

Adrià Garriga Alonso (University of Cambridge)
Wednesday 30 May 2018, 17:00-18:30
Cambridge University Engineering Department, CBL Seminar room BE4-38. For directions see http://learning.eng.cam.ac.uk/Public/Directions.

If you have a question about this talk, please contact Adrià Garriga Alonso.

In this session we will learn about several approaches to avoiding negative side effects, from the papers:

“Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes”, Zhang et al. 2018 (emphasis mine)
“Low Impact Artificial Intelligences”, Armstrong and Levinstein 2017

The first paper’s approach is reasonably efficient to compute. However, it applies only to discrete-state factored MDPs, the human feedback it requires probably does not scale well, and it does not account for all kinds of positive or negative side effects.

The approaches from the second paper are less immediately applicable and are difficult to compute. Both papers provide some insights, and we will base our discussion of how to improve side-effect measures on them.

Relevant papers:

“Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes”, Shun Zhang, Edmund H. Durfee, and Satinder Singh, 2018, https://web.eecs.umich.edu/~baveja/Papers/ijcai-2018.pdf

“Low Impact Artificial Intelligences”, Armstrong and Levinstein 2017, https://arxiv.org/abs/1705.10720

“AI Safety Gridworlds”, Leike et al. 2017, https://arxiv.org/abs/1711.09883

“Concrete Problems in AI Safety”, Amodei et al. 2016, https://arxiv.org/abs/1606.06565

This talk is part of the Engineering Safe AI series.
