
Open Problems

Unresolved questions in the framework — what remains open, what is partially resolved, and what would close each gap. This page updates as experiments complete and adversarial review sessions identify new problems.

Memory / Viscosity Distinguishability

Critical · Open

Memory and viscosity are defined as distinct navigator properties — memory is path history encoded in weights, viscosity is current Hessian eigenvalue spectrum. In a frozen deployed model these are not independently measurable. The Pythia checkpoint series is the only currently available experimental setup that can attack this problem: compare Hessian eigenvalue structure across checkpoints that differ only in training stage. If viscosity at checkpoint T fully predicts behavior at checkpoint T+n independent of path, memory is not an independent variable.

What would close this

Pythia multi-scale checkpoint experiment (Priority 4 in experiment queue). Run the coupling probe and PyHessian at multiple checkpoints across multiple model scales. If the coupling measured at T+n always outperforms the coupling measured at T in predicting final behavior, memory is redundant as a separate variable.

Li et al. (2018); Goodfellow et al. (2014)

PyHessian Confirmation of Coupling → Viscosity Collapse

Critical · Open

The Skywork qualifier collapse hierarchy, which holds that coupling causally determines viscosity via the Hessian eigenvalue spectrum, is a mathematical argument supported so far only by a 2×2 Hessian demonstration. The empirical test requires computing the actual off-diagonal Hessian entries and the actual eigenvalue spectrum in GPT-2, then measuring whether the attention correlation proxy correlates with that spectrum. Until this is done, the collapse hierarchy is unconfirmed empirically.
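The 2×2 case mentioned above can be sketched in a few lines. This is an illustrative reconstruction, not the original demonstration: it assumes that "coupling" corresponds to the off-diagonal Hessian entry `c` and that the eigenvalue spread stands in for viscosity. The function name `eigenvalues_2x2` is hypothetical.

```python
import math


def eigenvalues_2x2(a: float, b: float, c: float) -> tuple[float, float]:
    """Eigenvalues (ascending) of the symmetric matrix [[a, c], [c, b]]."""
    mean = (a + b) / 2.0
    # Closed form: mean +/- sqrt(((a - b) / 2)^2 + c^2)
    radius = math.hypot((a - b) / 2.0, c)
    return mean - radius, mean + radius


# Hold the diagonal curvature fixed and increase the off-diagonal coupling c.
for c in (0.0, 0.25, 0.5, 0.9):
    low, high = eigenvalues_2x2(1.0, 1.0, c)
    print(f"c={c:.2f}  eigenvalues=({low:.2f}, {high:.2f})  spread={high - low:.2f}")
```

With equal diagonal entries the spread is exactly `2 * |c|`, so in this toy setting the off-diagonal term alone sets how far the spectrum splits. Whether the same dependence holds for the full GPT-2 Hessian is the open empirical question.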

What would close this

PyHessian run on GPT-2 small (Priority 1 in experiment queue). Per-layer correlation between attention probe values and Hessian eigenvalue density. If correlation > 0.8 across layers, proxy is valid. If low or non-monotonic, proxy and formal coupling are distinct quantities.
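The decision rule above could be applied roughly as follows once per-layer measurements exist. This is a minimal sketch: the numbers are placeholders rather than results, reducing the per-layer eigenvalue density to one scalar per layer is an assumption, and the helper names `pearson` and `proxy_verdict` are hypothetical. It checks only the 0.8 threshold, not monotonicity.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def proxy_verdict(probe_values, hessian_stats, threshold=0.8):
    """Return (r, verdict) using the threshold stated in the closing criterion."""
    r = pearson(probe_values, hessian_stats)
    verdict = "proxy valid" if r > threshold else "proxy and formal coupling distinct"
    return r, verdict


# Placeholder per-layer values (one entry per layer), NOT measured data.
probe_values = [0.12, 0.18, 0.25, 0.31, 0.40, 0.46]
hessian_stats = [1.1, 1.6, 2.4, 2.9, 4.2, 4.4]

r, verdict = proxy_verdict(probe_values, hessian_stats)
print(f"r={r:.3f} -> {verdict}")
```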

Sagun et al. (2017); Dauphin et al. (2014); PyHessian — github.com/amirgholami/PyHessian

Restoring Force for Harmonics

High · In progress

Harmonics requires a restoring force to produce oscillation around a reference state. Curvature eigenvalues describe local geometry but do not themselves generate a restoring force. Two partial resolutions exist. (1) Millidge (2023): flat basin dominance is volumetric, not mechanical. SGD ends up in flat regions because they occupy exponentially more parameter-space volume. This explains why training ends in flat regions, but it does not explain oscillatory behavior. (2) Active inference: Friston's free energy minimization produces oscillatory dynamics around posterior modes via a bidirectional prediction-error loop. This is an intrinsic oscillatory mechanism, not regularization. Whether either resolution bridges to formal frequency-matching between the training update rhythm and the Hessian eigenvalue spectrum remains open.

What would close this

A formal bridge from the active inference oscillatory mechanism to the gradient-descent loss landscape. The question to put to Millidge: does the active inference oscillatory mechanism translate to this setting? One candidate link is the connection between predictive coding's E/M step alternation and Hessian eigenvalue frequency-matching.

Millidge (2023) flat minima volume; Friston active inference; Smith (2018) superconvergence; Kirkpatrick et al. (2017) EWC

Archaeological Signal vs Out-of-Distribution Distinction

High · Open

The framework claims high-perplexity regions in a deployed model are archaeologically significant — evidence of unresolved training territory. But perplexity is also high for out-of-distribution inputs the model simply hasn't seen, rare n-gram combinations, domain shifts, and adversarial inputs. No formal criterion currently exists to distinguish 'the model visited this territory and couldn't resolve it' from 'the model hasn't seen this for reasons unrelated to training topology.'

What would close this

Pythia checkpoint comparison (Priority 4). Archaeological signal should stabilize over training checkpoints as the model repeatedly encounters and partially resolves the territory. OOD noise should shift randomly across checkpoints. A falsifiable criterion: perplexity trajectory over training that distinguishes the two cases.
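One candidate form such a criterion might take is sketched below. It is an assumption, not the framework's criterion: it compares the spread of perplexity in the late half of training with the spread in the early half, on the reasoning that a stabilizing trajectory should give a small ratio and an erratic one a ratio near 1. The trajectories are invented placeholders, and `late_to_early_spread` is a hypothetical helper.

```python
import math
import statistics


def late_to_early_spread(perplexities):
    """Ratio of late-half to early-half population standard deviation."""
    mid = len(perplexities) // 2
    early, late = perplexities[:mid], perplexities[mid:]
    early_sd = statistics.pstdev(early)
    late_sd = statistics.pstdev(late)
    if early_sd == 0.0:
        # No early variation to compare against: ratio is undefined or infinite.
        return math.inf if late_sd > 0.0 else float("nan")
    return late_sd / early_sd


# Placeholder perplexity-per-checkpoint trajectories, NOT measured data.
stabilizing = [90.0, 60.0, 45.0, 41.0, 40.0, 40.0, 39.5, 40.0]
erratic = [80.0, 95.0, 70.0, 100.0, 75.0, 98.0, 72.0, 96.0]

print(f"stabilizing ratio: {late_to_early_spread(stabilizing):.3f}")
print(f"erratic ratio:     {late_to_early_spread(erratic):.3f}")
```

A statistic like this cannot by itself separate archaeological signal from ordinary learning, since any monotonically improving region also shows a large early spread. It would need to be paired with a check that the trajectory plateaus at high perplexity.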

Song et al. (2024) prediction before plasticity; Nalisnick et al. (2019) do deep generative models know what they don't know

Basin Connectivity

Medium · Open

None of the seven navigator qualifiers can describe basin connectivity — the minimum loss barrier between two parameter-space locations. All seven qualifiers are locally defined at a point. Basin connectivity is a global property. The framework cannot describe whether two basins are separated by a low or high barrier, which matters for understanding whether training in one basin preserves or destroys signal from another.

What would close this

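Addition of a basin connectivity qualifier. Formal definition: B(θ_A, θ_B) = min_φ max_{t∈[0,1]} L(φ(t)) − max(L(θ_A), L(θ_B)), where φ ranges over continuous paths in parameter space with φ(0) = θ_A and φ(1) = θ_B. Empirical measurement requires linear mode connectivity analysis (Entezari et al., 2022).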
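The barrier definition above can be illustrated on a toy loss. This sketch evaluates only the straight-line path between the two points, as linear mode connectivity analysis does, so it yields an upper bound on B rather than the minimum over all paths. The double-well loss and the function names are assumptions chosen for illustration.

```python
def double_well_loss(theta):
    """Toy loss with minima at theta = [-1.0] and theta = [1.0]."""
    return sum((x * x - 1.0) ** 2 for x in theta)


def linear_barrier(loss, theta_a, theta_b, steps=101):
    """Max loss along the straight line from theta_a to theta_b,
    minus the larger of the two endpoint losses."""
    endpoint_max = max(loss(theta_a), loss(theta_b))
    path_max = max(
        loss([(1.0 - t) * a + t * b for a, b in zip(theta_a, theta_b)])
        for t in (i / (steps - 1) for i in range(steps))
    )
    return path_max - endpoint_max


barrier = linear_barrier(double_well_loss, [-1.0], [1.0])
print(f"linear-path barrier: {barrier:.3f}")
```

In one dimension the straight line is the only path, so the value here equals B exactly. In higher dimensions a curved path may cross a lower ridge, which is why the linear estimate is only a bound.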

Entezari et al. (2022) role of permutation invariance; Draxler et al. (2018) essentially no barriers

Symmetry Orbits

Medium · Open

All seven navigator qualifiers are constant across weight-space symmetry orbits — permutation of neurons within a layer produces functionally identical models at different parameter-space locations. The ablation drift vector is particularly damaged by this: the measured displacement may be entirely within the symmetry orbit rather than reflecting genuine functional change.
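The symmetry problem can be made concrete with a tiny network. The sketch below uses a hypothetical one-hidden-layer MLP with arbitrary weights. Permuting the hidden units leaves the output unchanged (up to floating-point summation order) while the flattened parameter vector moves a substantial distance, which is the kind of displacement a naive drift measurement would record.

```python
import math


def mlp(x, W1, b1, w2):
    """One hidden tanh layer followed by a linear readout to a scalar."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return sum(v * h for v, h in zip(w2, hidden))


def permute_hidden(W1, b1, w2, perm):
    """Reorder hidden units: rows of W1, entries of b1, entries of w2."""
    return [W1[i] for i in perm], [b1[i] for i in perm], [w2[i] for i in perm]


def flatten(W1, b1, w2):
    return [w for row in W1 for w in row] + list(b1) + list(w2)


def l2_distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Arbitrary illustrative weights: 2 inputs, 3 hidden units, 1 output.
original = ([[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]], [0.1, -0.2, 0.3], [1.0, -2.0, 0.5])
permuted = permute_hidden(*original, perm=[2, 0, 1])
x = [0.4, -1.2]

print("output difference:", abs(mlp(x, *original) - mlp(x, *permuted)))
print("parameter distance:", l2_distance(flatten(*original), flatten(*permuted)))
```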

What would close this

Scope clarification: explicit statement that the framework describes equivalence classes of weight configurations, not specific parameter vectors. Formal bound on symmetry orbit size included in ablation drift vector discussion.

Entezari et al. (2022); Kornblith et al. (2019) CKA

Phase Transitions / Grokking

Medium · Open

Grokking demonstrates catastrophic behavioral change while the loss surface remains smooth — a phase transition in representational geometry invisible to loss-surface descriptors. The framework describes weight-space geometry, not representational geometry. Grokking is therefore invisible to the current vocabulary.

What would close this

Scope boundary acknowledgment: the framework explicitly does not describe representational geometry. Addition of a note that phase transitions in representational geometry may occur independently of loss-surface topology changes.

Power et al. (2022) grokking; Nanda et al. (2023) progress measures for grokking

Heisenberg Conjugacy — Formal Grounding

Resolved

The terrain/navigator conjugacy was originally proposed as a formal analog to the Heisenberg uncertainty principle. Skywork adversarial review (April 2026) correctly identified that the formal uncertainty principle requires non-commuting operators in a Hilbert space — no equivalent non-commutativity exists in the loss landscape.

What would close this

Resolved. Reframed as structural analogy with genuine methodological content: stationarity and movement are incompatible measurement conditions. Not formally derivable from loss landscape geometry but real and useful as a methodological constraint.

Skywork adversarial review, April 2026