- Left off by saying: here are a whole bunch of things we should be capable of with GPT, now that we know we're carving things out.
- In this section, we’re going to zoom in on the relationship between the input and the training data, to show how prompting can help us get different trajectories out of the training data.
- GPT learns a prior. Prompting is conditioning the prior.
- Should be clear what it means to condition the prior: the relationship between the input and the training data. (A toy sketch follows this block of notes.)
In the limit, it converges to … Reiterate that this is interesting because it looks different in the limit than a utility maximizer does.
Maybe at the end of this, note that this section talks about conditioning the prior mechanistically, with toy examples. But with GPT you're doing something a lot more semantic. You can't be as precise, because of the many sources of uncertainty in natural language - you can only constrain so much. It's based on extrapolation, and that's one of the things that makes a semantic simulator qualitatively different from BC/IL: the generator you're learning (the physics of the training data) is not constrained to reiterate subsets of the data or filtered versions of it. It can actually generate new trajectories based on the physics it infers from the training data.
Is it fair to call this out of distribution? Kind of an unclear term. If you have general enough data, everything in language is in distribution… What would a perfect simulator be? Open questions.
An interesting analogy w/r/t the generalization and physics thing: when you take a hologram and reilluminate part of it, it reconstructs the entire light field of the original setup. If you take a hologram of a lens, it will look like a zone plate. But it allows you to do more than that: the zone plate will act like a lens, and you can use it as a lens, because the hologram captures the physics and not just a snapshot. A simulator is doing that too (the mechanism is maybe the same, maybe different): even though it only sees examples, it learns a physics that can simulate counterfactuals. A hologram isn't just a recording of a light field, it's actually a simulation of a lens.
Nice intuition: zooming into a smaller region of the space of possibilities. Wave function visualization. A moving version of the zoom.
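A minimal toy sketch of "prompting conditions the prior" in the mechanistic sense (my construction, not from the source): a bigram character model stands in for the simulator, and the prompt is just the prefix we condition on.

```python
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat. the dog sat on the log."
counts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1  # empirical next-character frequencies: the learned "prior"

def sample(prompt, n=20, rng=random.Random(0)):
    out = prompt
    for _ in range(n):
        nxt = counts[out[-1]]  # condition on the current context
        out += rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return out

print(sample("the d"))  # the prompt selects a region of the prior to roll out from
print(sample("the c"))  # a different prefix, a different trajectory, same prior
```

Nothing is added by the prompt; it only selects which trajectories the learned statistics get rolled out from.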
Quantifying curation
- A simulator is trained to predict the next token in various contexts, not in a void. “Input” context.
- This context determines what (simulacra) the simulator simulates.
- For a very general simulator like GPT, the context can select between "different" simulacra.
- Even though it can simulate many different environments, agents, etc., it's constrained to act as this particular simulacrum.
- A Reddit comment and a scientific article are both natural language, but otherwise governed by very different dynamics
- (even) for GPT-infinity the context can select between properties like the effective beliefs and knowledge of the simulacra. GPT-infinity knows everything that can be inferred from the dataset, but it will still pretend to be stupid when simulating a student.
- A simulator doesn’t have to be general/”corrigible”, but being corrigible is an interesting property evoked by the word “simulator” which characterizes systems like GPT
- But most generally, it is a constraint on the input context, which is not limited to choosing between what we would think of as distinct entities.
- When a simulator generates rollouts, "actions" are appended in some way to the input, updating its belief state at each step. The simulation of a simulacrum is an iteratively updating context, hallucinated according to the transition rules of the simulator. (A schematic of this loop follows.)
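A schematic of that rollout loop (illustrative names, not an actual API; the Markov table is just a stand-in for whatever next-token sampler the simulator implements):

```python
import random

class MarkovSimulator:
    """Stand-in for a learned simulator: any next-token sampler would do here."""
    def __init__(self, table, seed=0):
        self.table, self.rng = table, random.Random(seed)

    def sample_next(self, context):
        return self.rng.choice(self.table.get(context[-1], [" "]))

def rollout(simulator, prompt, steps):
    context = prompt                            # the simulacrum's initial state
    for _ in range(steps):
        token = simulator.sample_next(context)  # inference: what comes next?
        context += token                        # append, updating the belief state
    return context                              # the hallucinated trajectory

sim = MarkovSimulator({"a": ["b"], "b": ["a", "c"], "c": ["a"]})
print(rollout(sim, "a", 10))
```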
- Ways to think of the prompt:
    - A constraint or "filter" on the world model
    - An evolving "state" vector (the simulacrum)
    - Code interpreted by the simulator
    - Working memory
As it is simulating, a simulator is performing inference, trying to guess what comes next given the context. The nature of simulacra can thus be “controlled” by the context. What is the nature of this control, and what are its limits?
- The nature of this "control" is subtractive: unlike a constructive specification, conditioning cannot add anything that isn't already in the prior; it can only narrow down which parts of the prior remain in play.
- For example, conditioning on a student persona doesn't add or remove capability in the underlying simulator; it filters out continuations inconsistent with a student (the GPT-infinity point above).
When thinking about the prompt as an evolving state, it may be easiest to think of the scope of control as being over the "current or past state" of the simulation. However, the context can exert control along any dimension that appears in the contexts of the training data, some of which may play a different metaphysical role.
- Types of control (non-exhaustive), illustrated with invented prompt fragments below:
    - "current state" - the environment, the characters present, and the contents of their minds
    - metadata - a flag that selects context, like the name of a website
    - future - the outcome/consequence of the trajectory
        - semantic future conditioning
        - actual future conditioning (due to structure of training data)
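Invented prompt fragments to make each type concrete (illustrative only, not from the source):

```python
# Each string is a hypothetical prompt exercising one type of control:
current_state   = "The lab was dark. Ada checked the seismograph again."  # environment + characters
metadata        = "r/AskHistorians | top comment (score: 4812):"          # flag selecting a context
semantic_future = "SPOILER: the detective catches the thief. Chapter 1:"  # stated outcome shapes what precedes it
actual_future   = "return-to-go: 95 | state: ... | action:"               # a la Decision Transformer
```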
- Future conditioning is one way to simulate goal-directed processes, even when the base simulator is not goal-directed.
- Decision Transformer (DT) example
- DT's training trajectories are random - not goal-directed at all.
- Thus the trajectories it generates, faithful to the training data, do not intrinsically prefer any patterns...
- but there will be patterns in expected remaining reward (the outcome).
- By conditioning the context on high-reward outcomes, we can extract high-scoring trajectories
- which exhibit goal-directed behaviors
- and are competitive with the current state of RL!
- Scott Garrabrant's definition of agency
- Have we created an agent? Have we created an optimizer? Is this different from other ways of creating agents, like RL, and how?
- It is fundamentally different from RL, in that it doesn’t aim to create expected utility maximizers or “optimal” agents. Behavior is different in the limit of capability.
- Although the outcome may be trajectories which reliably accomplish a goal, they are actually acts of inference given a prior, not optimization relative to a goal.
- Biased DT example
- Grid example (minimal sketch below)
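A minimal sketch of the kind of grid example this might be (my construction under assumed details: a 1-D grid, a left-biased random "training policy", and conditioning on the outcome "reached the goal"):

```python
import random

START, GOAL, HORIZON = 0, 3, 8
rng = random.Random(0)

def rollout():
    """One trajectory from the prior: random steps, not goal-directed."""
    pos, path = START, []
    for _ in range(HORIZON):
        step = rng.choices([-1, +1], weights=[0.7, 0.3])[0]  # biased prior: mostly leftward
        pos += step
        path.append(step)
        if pos == GOAL:
            return path, True
    return path, False

samples = [rollout() for _ in range(100_000)]
wins = [path for path, ok in samples if ok]

print("P(first step toward goal) under the prior:        0.30")
print(f"P(first step toward goal) given goal reached:     "
      f"{sum(p[0] == +1 for p in wins) / len(wins):.2f}")
print(f"Fraction of conditioned rollouts that are optimal: "
      f"{sum(len(p) == GOAL - START for p in wins) / len(wins):.2f}")
# Conditioning on the outcome yields goal-directed-looking behavior (first steps
# skew toward the goal), but the prior weighs in: many conditioned trajectories
# meander instead of taking the optimal straight path.
```

The conditioned rollouts are samples from a posterior, not solutions to a control problem - exactly the contrast drawn next.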
- Conditioned inference converges to the Bayesian expectation, which may not assign the highest probability to the optimal trajectory for accomplishing/ensuring the conditioned outcome. The prior weighs in, and can overwhelm "optimality".
- Utility maximization answers the question "how do I best accomplish this goal?", whereas a conditional simulation answers the question "given that this is the outcome, what happened?"
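In symbols (standard Bayes; this notation is mine, not the source's): a conditional simulation samples from a posterior over trajectories, while utility maximization solves for a single best one.

```latex
\text{utility maximizer:}\quad \tau^{*} = \arg\max_{\tau} U(\tau)
\qquad
\text{conditional simulation:}\quad \tau \sim P(\tau \mid \text{outcome}) \propto P(\text{outcome} \mid \tau)\, P(\tau)
```

The $P(\tau)$ factor is where the prior (normativity) weighs in and can overwhelm "optimality".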
- Consequences of this
- Normativity can be preserved
- Unlikely to take unlikely actions (unless condition is very unlikely)
- May not be optimal
- Not good for some kinds of goals in some kinds of simulators
- Carving out “agents” subtractively from a simulator which embodies a complicated world model (a model of normativity) allows us to extract goal-directed processes that act in a “reasonable” way (constrained by the prior), at the expense of optimality.
- Implications for Goodhart, complexity/fragility of value
- Problem in AI: we can only come up with a simple description of what we want. “Accomplish X”. But what we want isn’t actually simple; it exists against the ground of “normal reality” which we don’t want to discard altogether
- Subtractive specification allows you to start with a reality and add constraints. A simple constraint does not make the simulation simple. (Toy demonstration below.)
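A toy demonstration of that last point (my construction): condition short random walks on the simple outcome "end at +2" and see how much of the prior's variety survives.

```python
import itertools, math

walks = list(itertools.product([-1, +1], repeat=6))  # the "reality": 64 equally likely walks
constrained = [w for w in walks if sum(w) == 2]      # simple constraint: "accomplish X"

print(len(constrained))                              # 15 distinct walks still qualify
print(f"{math.log2(len(constrained)):.1f} bits of variety remain")  # ~3.9 bits, not one plan
# The constraint carves out a region of the prior; the rest of "normal reality"
# (here, the walk dynamics) is preserved rather than discarded.
```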