AI safety papers

This page keeps track of AI safety papers I am reading, to help me remember why I want to read a paper, where I get stuck, etc.

The “Status/lessons” column tracks where I am in the paper and what I didn’t understand, what background I seem to be missing, etc.

The “Source/motivation” column tracks how I came across the paper and why I want to read it. (These two things are often connected, so I combined them into one column.)

| Title | Status/lessons | Source/motivation |
|---|---|---|
| “Reflective Oracles: A Foundation for Game Theory in Artificial Intelligence” | At statement of theorem 4.1. Decided I wasn’t comfortable enough with game theory (2019-01-09). | I’ve seen this paper mentioned a bunch. |
| “Logical Induction” | I read the beginning parts of this paper twice and watched Andrew Critch’s talk on YouTube. I am slowly digesting the definitions and so forth. | This seems to be one of MIRI’s big results, so I want to understand it. I think I originally decided to read it because I wanted to understand decision theory better. |
| “AI safety via debate” | I finished reading the paper (2019-01-04, 2019-01-05). I think I need to know more about computational complexity (to appreciate the debate hierarchy analogy) and about machine learning in general. | I wanted to understand the Paul/OpenAI approach better. |
| “Supervising strong learners by amplifying weak experts” | I finished reading the paper (2019-01-05). I think I need more familiarity with machine learning to appreciate the paper. | I wanted to understand the Paul/OpenAI approach better. |