Approaches to avoiding negative side effects

If you have a question about this talk, please contact Adrià Garriga Alonso.

In this session we will learn about several approaches to avoiding negative side effects, drawn from two papers:
  • “Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes”, Zhang et al. 2018
  • “Low Impact Artificial Intelligences”, Armstrong and Levinstein 2017

The first paper’s approach is reasonably efficient to compute. However, it applies only to discrete-state factored MDPs, the human feedback it requires probably does not scale well, and it does not account for all kinds of positive or negative side effects.

The approaches from the second paper are less immediately applicable and are difficult to compute. Both papers provide some insights, and we will base our discussion of how to improve side-effect measures on them.

Relevant papers:

“Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes”, Shun Zhang, Edmund H. Durfee, and Satinder Singh, 2018, https://web.eecs.umich.edu/~baveja/Papers/ijcai-2018.pdf

“Low Impact Artificial Intelligences”, Armstrong and Levinstein 2017, https://arxiv.org/abs/1705.10720

“AI Safety Gridworlds”, Leike et al. 2017, https://arxiv.org/abs/1711.09883

“Concrete Problems in AI Safety”, Amodei et al. 2016, https://arxiv.org/abs/1606.06565

This talk is part of the Engineering Safe AI series.
