Loss Landscape Vocabulary Framework

v12 · April 2026 · Atlas Heritage Systems Inc. · Working document — not a finished product

Navigator Properties

The model's dynamic relationship to terrain: how it moves through the landscape, resists movement, accumulates history, and distributes probability mass. Conjugate to terrain properties: precise measurement of one axis structurally degrades precision in the other.

Note: Skywork adversarial review identified that the seven qualifiers collapse to three independent variables (density, coupling, elasticity) plus four derived readouts (perplexity, probability, viscosity, memory).

Density: coarse / fine

Training data coverage across input space. Coarse means sparse coverage, weak gradient signal. Fine means dense coverage, steep well-defined valleys.

D_KL(P_data ‖ P_model)
Flagged: KL divergence measures model inadequacy to data, not parameter-space coverage directly

Kullback & Leibler (1951)
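
A minimal sketch of the divergence above for discrete distributions, assuming both are given as probability lists over the same outcomes. The function name and the three example distributions are hypothetical, chosen only for illustration. Per the flag, the number it returns describes how badly the model fits the data, not coverage directly.

```python
import math

def kl_divergence(p_data, p_model):
    """D_KL(P_data || P_model) in nats for two discrete distributions."""
    total = 0.0
    for p, q in zip(p_data, p_model):
        if p == 0.0:
            continue  # outcomes with no data mass contribute nothing
        if q == 0.0:
            return math.inf  # model gives zero mass where data has mass
        total += p * math.log(p / q)
    return total

# Hypothetical distributions, for illustration only.
p_data = [0.5, 0.3, 0.2]
p_model_close = [0.45, 0.35, 0.2]
p_model_far = [0.1, 0.1, 0.8]

kl_close = kl_divergence(p_data, p_model_close)  # small: model tracks data
kl_far = kl_divergence(p_data, p_model_far)      # large: model misses data
```

The divergence is asymmetric, so swapping the two arguments generally gives a different value.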

Perplexity: high / low

Average surprise. Exponential of cross-entropy. High perplexity marks unmapped or contested terrain. The scar tissue of turbulent training lives in high-perplexity regions of a deployed model.

PP(W) = P(w₁ … w_N)^(−1/N) = 2^H(W), where H(W) is the per-token cross-entropy in bits

Shannon (1948); Manning & Schütze (1999) ch.3
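
A short sketch of the perplexity formula, assuming the per-token probabilities the model assigned to a sequence are already available. The function name and the two example sequences are hypothetical. It computes H(W) in bits and exponentiates, which is equivalent to the inverse geometric mean of the token probabilities.

```python
import math

def perplexity(token_probs):
    """PP(W) = P(w_1..w_N)^(-1/N), computed in log space for stability."""
    n = len(token_probs)
    # H(W): average negative log2 probability per token, in bits.
    entropy_bits = -sum(math.log2(p) for p in token_probs) / n
    return 2.0 ** entropy_bits

# Hypothetical per-token probabilities for two sequences.
confident = [0.9, 0.8, 0.95, 0.85]   # well-mapped terrain
uncertain = [0.1, 0.05, 0.2, 0.1]    # unmapped or contested terrain

pp_low = perplexity(confident)
pp_high = perplexity(uncertain)
```

A model that spreads probability uniformly over k choices at every step has perplexity k, which gives a convenient sanity check.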

Probability: high / low

Output distribution sharpness at inference. High-probability outputs correspond to sharp, narrow valleys. Low-probability outputs correspond to flat regions or saddle points.

Softmax: P(x_i) = exp(z_i/T) / Σ_j exp(z_j/T)

Bishop (2006), Pattern Recognition and Machine Learning, ch. 4
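
A minimal sketch of the temperature softmax above, showing how T controls sharpness of the output distribution. The function name and the example logits are hypothetical. Subtracting the maximum before exponentiating is a standard stability trick and does not change the result.

```python
import math

def softmax(logits, temperature=1.0):
    """P(x_i) = exp(z_i/T) / sum_j exp(z_j/T)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                  # hypothetical logits
sharp = softmax(logits, temperature=0.5)  # low T: mass concentrates
base = softmax(logits)                    # T = 1
flat = softmax(logits, temperature=5.0)   # high T: mass spreads out
```

Lower T pushes the top probability toward 1; higher T pushes the distribution toward uniform.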

Coupling: high / low

Inter-parameter dependency. How much moving one weight moves others. High coupling means parameter updates propagate widely. Coupling causally determines viscosity through the Hessian eigenvalue spectrum.

H_ij = ∂²L/∂θ_i∂θ_j, i ≠ j (off-diagonal Hessian)
Approx: Fisher information matrix F

Sagun et al. (2017); Dauphin et al. (2014)
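
A sketch of reading coupling off the Hessian, assuming a toy two-parameter loss with an explicit cross term. The loss, the helper name, and the step size h are all hypothetical. Central finite differences stand in for the automatic differentiation a real model would need.

```python
def hessian(loss, theta, h=1e-3):
    """Central finite-difference estimate of H_ij = d2L / (dtheta_i dtheta_j)."""
    n = len(theta)

    def shifted(i, si, j, sj):
        t = list(theta)
        t[i] += si * h
        t[j] += sj * h
        return loss(t)

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            H[i][j] = (shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
                       - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * h * h)
    return H

# Toy loss (hypothetical): the 1.5 * t0 * t1 term is the coupling.
def toy_loss(t):
    return t[0] ** 2 + t[1] ** 2 + 1.5 * t[0] * t[1]

H = hessian(toy_loss, [1.0, -0.5])
coupling = H[0][1]  # off-diagonal entry, close to 1.5 for this loss
```

For this loss the diagonal entries come out near 2 and the off-diagonal entries near 1.5. Setting the cross term to zero drives the off-diagonals to zero, which is the low-coupling case.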

Viscosity: icky / not-icky

Resistance to movement under gradient pressure. Icky means high resistance — flat wide minima, competing orientations persist longer. Determined by coupling via eigenvalue spectrum.

Hessian eigenvalue spectrum {λ_i}
Low λ = flat = icky; High λ = sharp = not-icky
Flagged: eigenvalue spectrum is causally determined by coupling — not an independent variable

Keskar et al. (2016); Foret et al. (2020) sharpness-aware minimization
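
A small worked example of the flag above, assuming a symmetric 2×2 Hessian [[a, b], [b, d]] where b plays the role of coupling. The function name and the numeric values are hypothetical. The eigenvalues follow in closed form, so the dependence of the spectrum on the off-diagonal entry is visible directly.

```python
import math

def eigenvalues_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], ascending."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return (mean - radius, mean + radius)

# Same diagonal curvature, different coupling (hypothetical values).
uncoupled = eigenvalues_2x2(2.0, 0.0, 2.0)  # both eigenvalues equal 2
coupled = eigenvalues_2x2(2.0, 1.9, 2.0)    # approximately (0.1, 3.9)

flattest_uncoupled = uncoupled[0]
flattest_coupled = coupled[0]  # small eigenvalue: a flat, "icky" direction
```

With the diagonal held fixed, raising the off-diagonal term splits the spectrum: one direction sharpens while the other flattens. In this toy case that is the sense in which viscosity is a readout of coupling rather than a separate knob.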

Elasticity: stretchy / not-stretchy

Restoring force toward prior weight configurations after perturbation. Catastrophic forgetting is total loss of elasticity.

L2: λ‖θ‖²
EWC: L_total = L_task + λ Σ_i F_i (θ_i − θ_i*)²

Kirkpatrick et al. (2017); Krogh & Hertz (1992) weight decay
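
A sketch of the two penalties above, assuming a diagonal Fisher estimate is already available as a list. The function names, weights, and Fisher values are hypothetical. The code follows the coefficient as written in this document; Kirkpatrick et al. write it as λ/2.

```python
def l2_penalty(theta, lam):
    """Weight decay: lam * ||theta||^2, pulls every weight toward zero."""
    return lam * sum(t * t for t in theta)

def ewc_penalty(theta, theta_star, fisher, lam):
    """EWC: lam * sum_i F_i * (theta_i - theta_i*)^2, pulls toward theta*."""
    return lam * sum(f * (t - ts) ** 2
                     for f, t, ts in zip(fisher, theta, theta_star))

theta_star = [1.0, -2.0]  # hypothetical weights after the earlier task
fisher = [10.0, 0.1]      # hypothetical importance: weight 0 matters most

# Move each weight by the same amount, 0.5, away from theta_star.
move_important = ewc_penalty([1.5, -2.0], theta_star, fisher, lam=1.0)
move_unimportant = ewc_penalty([1.0, -1.5], theta_star, fisher, lam=1.0)
```

The same displacement costs roughly 100 times more on the high-Fisher weight, so the restoring force is strongest where the earlier task depended on the weight most. L2 has no such anchor and pulls uniformly toward zero.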

Memory: deep / shallow

Path dependency encoded in weights. Not the weights themselves — the history of how the model traveled through the loss landscape during training. Cannot be measured independently from viscosity in a frozen deployed model.

training trajectory integral; basin selection via initialization and curriculum
Flagged: memory/viscosity distinguishability test unperformed — central open experiment

Li et al. (2018); Goodfellow et al. (2014)
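
A toy illustration of basin selection via initialization, assuming a one-parameter double-well loss L(θ) = (θ² − 1)² and plain gradient descent. The loss, learning rate, step count, and the use of path length as a stand-in for the trajectory integral are all assumptions made for this sketch, not part of the framework.

```python
def loss(theta):
    return (theta ** 2 - 1.0) ** 2

def grad(theta):
    return 4.0 * theta * (theta ** 2 - 1.0)

def descend(theta0, lr=0.05, steps=200):
    """Plain gradient descent; returns the final theta and the path length."""
    theta = theta0
    path_length = 0.0
    for _ in range(steps):
        step = -lr * grad(theta)
        path_length += abs(step)  # crude trajectory integral (assumption)
        theta += step
    return theta, path_length

# Two runs that differ only in where they start.
theta_a, path_a = descend(0.5)   # settles in the basin at +1
theta_b, path_b = descend(-0.5)  # settles in the basin at -1
```

Both runs end at essentially zero loss, so the final loss value cannot distinguish them. Only the weights, and the route taken to reach them, record which basin was selected.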