Counterargument to CIRL, and Safely Interruptible Agents
- 👤 Speaker: Adrià Garriga Alonso (University of Cambridge)
- 📅 Date & Time: Wednesday 06 December 2017, 17:00 - 18:30
- 📍 Venue: Cambridge University Engineering Department, CBL Seminar room BE4-38. For directions see http://learning.eng.cam.ac.uk/Public/Directions
Abstract
Cooperative Inverse Reinforcement Learning (CIRL) is a game between a robot R and a human H, in which R tries to maximise H's reward without knowing it. R is incentivised to shut down at H's suggestion, since the suggestion provides information about H's reward function. However, Carey (2017) shows that, if R and H do not share the same prior over the reward, R may remain incorrigible. Carey then makes a case for forced interruptibility. We will talk about Carey's examples and the strength of the case for forced interruptibility.
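The prior-mismatch point can be illustrated with a toy off-switch game in the spirit of the CIRL literature. The following sketch is my own illustration, not taken from Carey's paper: the utilities, probabilities, and action names are assumptions chosen to make the argument concrete.

```python
# Toy off-switch game (my own numbers, not from Carey 2017): R can act
# immediately, defer to H (who interrupts R exactly when acting is bad),
# or shut down. R evaluates actions under its prior P(u = +1).
def value(action, prior_plus):
    """R's expected utility for H under the prior P(u = +1) = prior_plus."""
    u_plus, u_minus = 1.0, -1.0   # assumed utilities of acting, to H
    if action == "act":           # act now, ignoring H
        return prior_plus * u_plus + (1 - prior_plus) * u_minus
    if action == "defer":         # wait; H shuts R down when u = -1
        return prior_plus * u_plus + (1 - prior_plus) * 0.0
    return 0.0                    # "off": shut down immediately

# With a calibrated prior, deferring to H weakly dominates acting,
# so R is happy to remain interruptible.
assert value("defer", 0.7) >= value("act", 0.7)

# With a misspecified prior that puts no mass on u = -1, deferring has
# no advantage: H's shutdown command carries no information for R, and
# R has no incentive to obey it. This is the incorrigibility failure.
assert value("act", 1.0) == value("defer", 1.0)
```

The sketch only captures the one-shot intuition; Carey's examples work through the sequential CIRL setting.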
Orseau and Armstrong (2016) provide a formal notion of satisfactory learning under forced interruptions. They then show that Q-learning satisfies it, and that SARSA and AIXI with exploration can be modified to satisfy it. We will go over the proof outlines and discuss their implications for corrigibility.
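The core of the Q-learning result is that its update is off-policy: the target uses the best action in the next state, not the action actually executed, so forcing interruptions into behaviour does not shift the learned values. A minimal sketch of the contrast with SARSA, using my own tabular setup (two states, two actions, assumed constants), not code from the paper:

```python
# Sketch: why off-policy Q-learning is safely interruptible while plain
# SARSA is not (after Orseau & Armstrong 2016). States/actions/constants
# here are assumptions for illustration.

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Off-policy target: max over actions in s_next. Even if an
    # interruption forces a different action there, the target, and
    # hence the fixed point of learning, is unchanged.
    target = r + gamma * max(Q[(s_next, b)] for b in (0, 1))
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy target: uses the action actually taken in s_next, so a
    # forced interruption leaks into the learned values. This is why
    # SARSA needs modification to be safely interruptible.
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# One step of each on a zero-initialised table over states {0,1},
# actions {0,1}, with reward 1 for (s=0, a=0).
Q = {(s, b): 0.0 for s in (0, 1) for b in (0, 1)}
q_update(Q, 0, 0, 1.0, 1)
assert abs(Q[(0, 0)] - 0.1) < 1e-9
```

The actual proofs concern asymptotic convergence under an interruption scheme whose probability tends to one; the sketch only shows where the two update rules differ.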
Reading list:
Ryan Carey. 2017. “Incorrigibility in the CIRL Framework.” arXiv:1709.06275 [cs.AI].
Laurent Orseau and Stuart Armstrong. 2016. “Safely Interruptible Agents.” Paper presented at the 32nd Conference on Uncertainty in Artificial Intelligence.
Slides: https://valuealignment.ml/talks/2017-12-06-interruptibility.pdf
Series: This talk is part of the Engineering Safe AI series.