Approaches to avoiding negative side effects
- 👤 Speaker: Adrià Garriga Alonso (University of Cambridge)
- 📅 Date & Time: Wednesday 30 May 2018, 17:00 - 18:30
- 📍 Venue: Cambridge University Engineering Department, CBL Seminar room BE4-38. For directions see http://learning.eng.cam.ac.uk/Public/Directions
Abstract
In this session we will learn about several approaches to avoiding negative side-effects, from the papers:
- “Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes”, Zhang et al. 2018 (emphasis mine)
- “Low Impact Artificial Intelligences”, Armstrong and Levinstein 2017
The first paper’s approach is reasonably efficient to compute. However, it only applies to discrete-state factored MDPs, the human feedback it requires probably doesn’t scale well, and it doesn’t account for all kinds of positive or negative side effects.
The approaches from the second paper are less immediately applicable and are difficult to compute. Both papers provide some insights, and we will base our discussion of how to improve side-effect measures on them.
Relevant papers:
“Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes”, Shun Zhang, Edmund H. Durfee, and Satinder Singh, 2018, https://web.eecs.umich.edu/~baveja/Papers/ijcai-2018.pdf
“Low Impact Artificial Intelligences”, Armstrong and Levinstein 2017, https://arxiv.org/abs/1705.10720
“AI Safety Gridworlds”, Leike et al. 2017, https://arxiv.org/abs/1711.09883
“Concrete Problems in AI Safety”, Amodei et al. 2016, https://arxiv.org/abs/1606.06565
Series
This talk is part of the Engineering Safe AI series.