AI safety papers
This page keeps track of AI safety papers I am reading, to help me remember why I want to read a paper, where I get stuck, etc.
The “Status/lessons” column tracks where I am in the paper, what I didn’t understand, what background I seem to be missing, etc.
The “Source/motivation” column tracks how I came across the paper and why I want to read it. (These two things are often connected, so I combined them into one column.)
Title | Status/lessons | Source/motivation |
---|---|---|
“Reflective Oracles: A Foundation for Game Theory in Artificial Intelligence” | At the statement of Theorem 4.1. Decided I wasn’t comfortable enough with game theory (2019-01-09). | I’ve seen this paper mentioned a bunch. |
“Logical Induction” | I read the beginning parts of this paper twice and watched Andrew Critch’s talk on YouTube. I am slowly digesting the definitions and so forth. | This seems to be one of MIRI’s big results, so I want to understand it. I think I originally decided to read it because I wanted to understand decision theory better. |
“AI safety via debate” | I finished reading the paper (2019-01-04, 2019-01-05). I think I need to know more about computational complexity (to appreciate the debate hierarchy analogy) and about machine learning in general. | I wanted to understand the Paul/OpenAI approach better. |
“Supervising strong learners by amplifying weak experts” | I finished reading the paper (2019-01-05). I think I need more familiarity with machine learning to appreciate the paper. | I wanted to understand the Paul/OpenAI approach better. |