Loss Landscape Vocabulary Framework
v12 · April 2026 · Atlas Heritage Systems Inc. · Working document — not a finished product
Navigator Properties
The model's dynamic relationship to the terrain: how it moves through the terrain, resists movement, accumulates history, and distributes probability mass. Navigator properties are conjugate to terrain properties: precise measurement of one axis structurally degrades precision in the other. Note: the Skywork adversarial review identified that the seven qualifiers collapse to three independent variables (density, coupling, elasticity) plus four derived readouts (perplexity, probability, viscosity, memory).
Density: Training data coverage across the input space. Coarse means sparse coverage and a weak gradient signal. Fine means dense coverage and steep, well-defined valleys.
Kullback & Leibler (1951)
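As a rough illustration of coverage, the sketch below bins one-dimensional inputs and scores how unevenly the training samples fill the bins, using the KL divergence from a uniform histogram. The helper names (`coverage_histogram`, `kl_from_uniform`), the one-dimensional input space, and the choice of KL-from-uniform as a coverage score are all assumptions made for this example; the framework itself does not prescribe a measurement.

```python
import math

def coverage_histogram(samples, low, high, bins):
    """Fraction of samples falling in each equal-width bin over [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for x in samples:
        if low <= x <= high:
            # Clamp so that x == high lands in the last bin.
            counts[min(int((x - low) / width), bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples inside [low, high]")
    return [c / total for c in counts]

def kl_from_uniform(hist):
    """KL(hist || uniform) in nats: 0 for even coverage, log(bins) when one bin holds everything."""
    n = len(hist)
    return sum(p * math.log(p * n) for p in hist if p > 0.0)

# Hypothetical toy data: one evenly spread sample set, one clustered set.
even = coverage_histogram([0.1, 0.3, 0.6, 0.9], 0.0, 1.0, bins=4)
patchy = coverage_histogram([0.1, 0.12, 0.15, 0.2], 0.0, 1.0, bins=4)
print(kl_from_uniform(even), kl_from_uniform(patchy))
```

The evenly spread samples score zero, while the clustered samples score log(4), the maximum for four bins. In this toy reading, bins with no samples stand in for coarse regions.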
Perplexity: The exponential of the cross-entropy, that is, of the average surprise. High perplexity marks unmapped or contested terrain. The scar tissue of turbulent training lives in the high-perplexity regions of a deployed model.
Shannon (1948); Manning & Schütze (1999) ch.3
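The relation between average surprise and perplexity can be made concrete with a minimal sketch. It assumes we already have the probability the model assigned to each observed token; the function name `perplexity` and the sample values are illustrative only.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability of the observed tokens)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average surprise (cross-entropy) in nats.
    avg_surprise = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_surprise)

# Hypothetical probabilities assigned to the observed tokens.
confident = perplexity([0.9, 0.8, 0.95])   # well-mapped terrain
uncertain = perplexity([0.1, 0.05, 0.2])   # unmapped or contested terrain
uniform4 = perplexity([0.25] * 4)          # uniform guess over 4 options
print(confident, uncertain, uniform4)
```

A uniform guess over four options gives a perplexity of 4, which is why perplexity is often read as an effective number of equally likely choices.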
Probability: Sharpness of the output distribution at inference. High-probability outputs correspond to sharp, narrow valleys. Low-probability outputs correspond to flat regions or saddle points.
Bishop (2006) Pattern Recognition and Machine Learning, ch.4
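Output sharpness can be quantified in several ways. The sketch below uses two common readouts, the top probability and the Shannon entropy of a softmax distribution. The logits are made up for illustration, and the code measures only the output distribution; the link to valley shape is the framework's own claim, not something the code demonstrates.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; lower means a sharper distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical logits: one dominant option versus four tied options.
sharp = softmax([8.0, 1.0, 0.5, 0.0])
flat = softmax([1.0, 1.0, 1.0, 1.0])
print(max(sharp), entropy(sharp))
print(max(flat), entropy(flat))
```

The tied logits give the maximum entropy for four options, log(4), while the dominant logit concentrates almost all of the mass on one output.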
Coupling: Inter-parameter dependency, meaning how much moving one weight moves others. High coupling means parameter updates propagate widely. Coupling causally determines viscosity via the eigenvalue spectrum.
Sagun et al. (2017); Dauphin et al. (2014)
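One way to picture coupling is a two-parameter quadratic loss whose Hessian is [[a, c], [c, b]]. The off-diagonal entry c makes the gradient for one weight depend on the other. This sketch assumes the relevant eigenvalue spectrum is that of the loss Hessian, which the text does not state explicitly, and `coupling_ratio` is a hypothetical score invented for the example.

```python
import math

def hessian_eigenvalues(a, b, c):
    """Eigenvalues of the symmetric 2x2 Hessian [[a, c], [c, b]], smallest first."""
    mean = (a + b) / 2.0
    radius = math.sqrt(((a - b) / 2.0) ** 2 + c ** 2)
    return mean - radius, mean + radius

def coupling_ratio(a, b, c):
    """Hypothetical coupling score: off-diagonal curvature relative to the diagonal."""
    return abs(c) / math.sqrt(a * b)

# Same diagonal curvature, with and without cross-parameter coupling.
uncoupled = hessian_eigenvalues(1.0, 1.0, 0.0)
coupled = hessian_eigenvalues(1.0, 1.0, 0.9)
print(uncoupled, coupled, coupling_ratio(1.0, 1.0, 0.9))
```

With no coupling both eigenvalues equal 1. With c = 0.9 they split to roughly 0.1 and 1.9, so the same diagonal curvature now yields one nearly flat direction and one steep direction.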
Viscosity: Resistance to movement under gradient pressure. Icky means high resistance: flat, wide minima where competing orientations persist longer. Determined by coupling via the eigenvalue spectrum.
Keskar et al. (2016); Foret et al. (2020) sharpness-aware minimization
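Resistance to movement can be illustrated on a quadratic loss. Along a Hessian eigen-direction with eigenvalue λ, one plain gradient-descent step with learning rate η scales the distance to the minimum by |1 − ηλ|, so flat directions with small λ barely move. The sketch below counts the steps needed to halve that distance. The function name `steps_to_halve`, the learning rate, and the use of plain gradient descent are assumptions for illustration, not the framework's definition of viscosity.

```python
import math

def steps_to_halve(eigenvalue, lr=0.1):
    """Gradient-descent steps needed to halve the distance to the minimum
    along one Hessian eigen-direction of a quadratic loss."""
    shrink = abs(1.0 - lr * eigenvalue)  # per-step contraction factor
    if not 0.0 < shrink < 1.0:
        raise ValueError("need 0 < lr * eigenvalue < 2 and lr * eigenvalue != 1")
    return math.ceil(math.log(0.5) / math.log(shrink))

# Hypothetical eigenvalues: a steep direction and a flat direction.
steep = steps_to_halve(1.9)
flat = steps_to_halve(0.1)
print(steep, flat)
```

The steep direction halves its distance in 4 steps, while the flat direction needs 69. In this toy model, the step count plays the role of a viscosity readout.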
Elasticity: Restoring force toward prior weight configurations after perturbation. Catastrophic forgetting is the total loss of elasticity.
Kirkpatrick et al. (2017); Krogh & Hertz (1992) weight decay
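A restoring force toward prior weights can be sketched as the negative gradient of a quadratic penalty (stiffness / 2) · Σ importance_i · (w_i − anchor_i)². With uniform importance and a zero anchor this reduces to ordinary weight decay; per-weight importance is in the spirit of the Kirkpatrick et al. penalty. The names `restoring_force`, `stiffness`, and `importance` are hypothetical and chosen for this example.

```python
def restoring_force(weights, anchor, stiffness, importance=None):
    """Negative gradient of (stiffness / 2) * sum(importance_i * (w_i - anchor_i) ** 2).

    Points from the current weights back toward the anchor (prior) weights.
    """
    if importance is None:
        importance = [1.0] * len(weights)  # uniform importance: plain L2 pull
    return [-stiffness * f * (w - a)
            for w, a, f in zip(weights, anchor, importance)]

# Hypothetical perturbed weights and the prior configuration they drifted from.
current = [1.5, -0.5]
prior = [1.0, 0.0]

elastic = restoring_force(current, prior, stiffness=2.0)
forgetful = restoring_force(current, prior, stiffness=0.0)
print(elastic, forgetful)
```

With zero stiffness nothing pulls the weights back toward the prior configuration, which corresponds to the total loss of elasticity described above.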
Memory: Path dependency encoded in the weights. It is not the weights themselves but the history of how the model traveled through the loss landscape during training. In a frozen deployed model it cannot be measured independently of viscosity.
Li et al. (2018); Goodfellow et al. (2014)
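Path dependency can be illustrated with a toy loss that has a whole curve of equally good minima. For loss(u, v) = (u·v − 1)², every point with u·v = 1 is a minimum, so where training ends up depends on where it started and how it traveled. The loss, learning rate, and starting points below are assumptions chosen for illustration; the sketch shows path dependence but does not measure memory in the framework's sense.

```python
def loss(u, v):
    """Toy loss with a continuum of minima along the curve u * v = 1."""
    return (u * v - 1.0) ** 2

def train(u, v, lr=0.05, steps=2000):
    """Plain gradient descent on the toy loss from a given starting point."""
    for _ in range(steps):
        residual = u * v - 1.0
        grad_u = 2.0 * residual * v
        grad_v = 2.0 * residual * u
        u, v = u - lr * grad_u, v - lr * grad_v
    return u, v

# Two hypothetical training runs that differ only in their starting point.
run_a = train(2.0, 0.1)
run_b = train(0.1, 2.0)
print(run_a, loss(*run_a))
print(run_b, loss(*run_b))
```

Both runs reach essentially zero loss but stop at different weights. The final loss alone does not reveal which route was taken; only the weights carry a trace of it.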