Should be clear: what does it mean to condition the prior? It comes down to the relationship between the input and the training data.
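A minimal toy sketch of what conditioning the prior could mean mechanistically (hypothetical corpus and names, assuming the prior is just the empirical frequency of whole sequences): conditioning on a prompt keeps only the sequences consistent with it and renormalizes, which is the relationship between input and training data in miniature.

```python
from collections import defaultdict

# Toy "training data": a tiny corpus of sequences over a two-symbol alphabet.
corpus = ["ABAB", "ABBA", "AABB", "ABAB"]

# Prior over whole sequences = empirical frequency in the corpus.
prior = defaultdict(float)
for seq in corpus:
    prior[seq] += 1 / len(corpus)

def condition(prior, prompt):
    """Condition the prior on a prompt: keep only the sequences consistent
    with it (those that start with the prompt) and renormalize."""
    consistent = {s: p for s, p in prior.items() if s.startswith(prompt)}
    z = sum(consistent.values())
    return {s: p / z for s, p in consistent.items()}

print(condition(prior, "A"))    # every sequence starts with A: same as the prior
print(condition(prior, "AB"))   # mass concentrates on ABAB and ABBA
print(condition(prior, "ABA"))  # zoomed all the way in: only ABAB remains
```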

In the limit, it converges to … Reiterate that this is interesting because what it converges to in the limit looks different from a utility maximizer.

Maybe at the end of this, note that this section discusses conditioning the prior mechanistically, with toy examples. With GPT, the conditioning is far more semantic, and can't be as precise: natural language introduces many sources of uncertainty, so you can only constrain the simulation so much. It relies on extrapolation, and that is one of the things that makes a semantic simulator qualitatively different from BC/IL: the generator being learned, the physics of the training data, is not constrained to reiterate subsets of the data or filtered versions of it. It can actually generate new trajectories from the physics it infers from the training data.
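A toy illustration of that difference (hypothetical data and names, with "physics" approximated by nothing more than per-step transition statistics): a lookup over the training trajectories can only replay filtered subsets of the data, while even a crude model of the dynamics can roll out trajectories that appear nowhere in it.

```python
import random

# Toy training "trajectories": short sequences of moves on a line.
data = ["RRL", "RLR", "LRR"]

# Behavioral cloning as lookup: replay (a filtered subset of) the data.
def bc_sample(prompt):
    matches = [t for t in data if t.startswith(prompt)]
    return random.choice(matches) if matches else None

# "Physics" inferred from the data: per-step transition statistics.
transitions = {}
for traj in data:
    for a, b in zip(traj, traj[1:]):
        transitions.setdefault(a, []).append(b)

def simulate(prompt, steps):
    """Roll out a new trajectory from the inferred dynamics; it need not
    appear anywhere in the training data."""
    traj = prompt
    for _ in range(steps):
        traj += random.choice(transitions[traj[-1]])
    return traj

print(bc_sample("RR"))     # can only ever return "RRL"
print(simulate("RR", 4))   # e.g. "RRLRRL", a trajectory never seen in the data
```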

Is it fair to call this out of distribution? The term is somewhat unclear: if the data is general enough, everything in the distribution of language is in distribution… What would a perfect simulator be? Open questions.

An interesting analogy w/r/t the generalization-and-physics point: a hologram, if you reconstruct one part of the recorded light, can reconstruct the entire light field of the original setup. If you take a hologram of a lens, it will look like a zone plate, but it lets you do more than look at it: the zone plate will act like a lens, and you can actually use it as a lens, because the hologram captures the physics, not just a snapshot. A simulator is doing that too (the mechanism may or may not be the same): even though it only sees particular examples, it learns a physics that can simulate counterfactuals. A hologram isn't just a recording of a light field; it's a simulation of a lens.
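A hedged aside on the optics behind the analogy (standard thin-hologram, paraxial reasoning, not something from these notes): recording the interference of a plane reference wave with a wave converging to a focus at distance $f$ gives a fringe pattern, i.e., a sinusoidal zone plate,

$$t(r) \;\propto\; 1 + \cos\!\left(\frac{\pi r^2}{\lambda f}\right) \;=\; 1 + \tfrac{1}{2}e^{+i\pi r^2/\lambda f} + \tfrac{1}{2}e^{-i\pi r^2/\lambda f},$$

and the two complex-exponential terms are thin-lens phase factors with focal lengths $\pm f$. So re-illuminating the recording reproduces the lens's focusing behavior (plus a diverging and an undiffracted order), not just its appearance.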

Nice intuition: conditioning zooms into a smaller region of the space of possibilities. Wave function visualization; a moving version of the zoom.

Quantifying curation