'Off-Switch Games' and Corrigibility

If you have a question about this talk, please contact Adrià Garriga Alonso.

By default, an AI system has an incentive to prevent humans from switching it off or otherwise interfering in its operation, since such interference would stop it from maximising its reward. An AI system is ‘corrigible’ if it instead has an incentive to accept human correction. Inverse Reinforcement Learning (IRL) can mitigate this problem in some cases, but there is disagreement over whether IRL can guarantee corrigibility in general.

Papers:
https://arxiv.org/abs/1611.08219
https://intelligence.org/files/Corrigibility.pdf
https://intelligence.org/2017/08/31/incorrigibility-in-cirl/
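
As a rough illustration of the central result of the off-switch game paper above (a minimal sketch under strong assumptions, not the paper's own code): if the robot holds a belief over the human's utility u for its proposed action, and the human is perfectly rational, then deferring to the human is worth E[max(u, 0)], which is never less than acting immediately (E[u]) or switching itself off (0). The Gaussian belief and all parameter values below are illustrative choices.

import numpy as np

# Sketch of the off-switch game (Hadfield-Menell et al., 2016).
# The robot is uncertain about the human's utility u for its proposed
# action; we model its belief as a Gaussian (an illustrative assumption).
rng = np.random.default_rng(0)
u = rng.normal(loc=0.5, scale=1.0, size=1_000_000)  # samples from the belief over u

v_act = u.mean()                      # act immediately: robot expects E[u]
v_off = 0.0                           # switch itself off: utility 0
# Defer: propose the action and let a rational human veto it whenever u < 0.
v_defer = np.maximum(u, 0.0).mean()   # Monte Carlo estimate of E[max(u, 0)]

print(f"act={v_act:.3f}  off={v_off:.3f}  defer={v_defer:.3f}")
# v_defer >= max(v_act, v_off): uncertainty about u gives the robot a
# positive incentive to leave the off switch in the human's hands.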

This talk is part of the Engineering Safe AI series.
