Meta: Over the past few months, we've held a seminar series on the Simulators theory by janus. As the theory is actively under development, the purpose of the series is to discover central structures and open problems. Our aim with this sequence is to share some of our discussion with a broader audience, and to encourage new research on the questions we uncover. Below, we outline the broader rationale and shared assumptions of the participants of the seminar.

Aligning AI is a crucial task that needs to be addressed as AI systems rapidly become more capable. A core part of the alignment problem involves "deconfusing": conceptual work alongside engineering, identifying unknown unknowns, and transitioning from philosophy to mathematics to algorithms to implementation. The problem is complex because we have to reason about something that doesn't yet exist. However, this does not mean that we should ignore evidence as it emerges. It is essential to carefully consider the GPT paradigm as it is being developed and deployed. One feasible-seeming approach is "accelerating alignment," which involves leveraging AI as it is developed to help solve the challenging problems of alignment. This is not a novel idea: it has been previously suggested in concepts such as seed AI, nanny AI, and iterated distillation and amplification (IDA).
