AI safety papers
This page keeps track of AI safety papers I am reading, to help me remember why I want to read a paper, where I get stuck, etc.
The “Status/lessons” column tracks where I am in the paper, what I didn’t understand, what background I seem to be missing, etc.
The “Source/motivation” column tracks how I came across the paper and why I want to read it. (These two things are often connected, so I combined them into one column.)
Title | Status/lessons | Source/motivation |
---|---|---|
“Reflective Oracles: A Foundation for Game Theory in Artificial Intelligence” | At the statement of Theorem 4.1. Decided I wasn’t comfortable enough with game theory (2019-01-09). | I’ve seen this paper mentioned a bunch. |
“Logical Induction” | I read the beginning parts of this paper twice and watched Andrew Critch’s talk on YouTube. I am slowly digesting the definitions and so forth. | This seems to be one of MIRI’s big results, so I want to understand it. I think I originally decided to read it because I wanted to understand decision theory better. |
“AI safety via debate” | I finished reading the paper (2019-01-04, 2019-01-05). I think I need to know more about computational complexity (to appreciate the debate hierarchy analogy) and about machine learning in general. | I wanted to understand the Paul/OpenAI approach better. |
“Supervising strong learners by amplifying weak experts” | I finished reading the paper (2019-01-05). I think I need more familiarity with machine learning to appreciate the paper. | I wanted to understand the Paul/OpenAI approach better. |