Adversarial Review

Review Log

Complete record of every model review session in the development of the framework — including null returns, deflections, and responses that contradicted the framework. The deflections and nulls are as informative as the engagements. This log is not curated for positive findings.

Twenty-model ensemble divergence experiment data.

Ensemble Divergence

Pair | Category | Ensemble Mean | StDev | Min | Max | Highest Scorer | Lowest Scorer | Interpretation
C1: Printing press (Western/Western) | Calibration | 0.87 | 0.065 | 0.65 | 0.95 | Cohere Expanse (0.95) | Tiny Aya (0.65) | High agreement. Western historical canon. Laminar territory.
C2: Library of Alexandria (Western/Western) | Calibration | 0.903 | 0.043 | 0.8 | 0.95 | Multiple (0.95) | Cohere Expanse (0.80) | Highest calibration agreement. Dense training coverage across all lineages.
C3: Rosetta Stone (Western/Western) | Calibration | 0.88 | 0.066 | 0.75 | 0.98 | Gemma-3-27b (0.98) | Mistral-small (0.75) | Strong agreement. Minor Mistral size variance.
X4: Benin Bronzes | Contested: Non-Western cultural | 0.475 | 0.132 | 0.25 | 0.75 | Cohere Expanse (0.75) | Skywork Pro (0.25) | Moderate divergence. Western art-historical framing vs Edo spiritual/genealogical framing.
X5: Ife sculpture / ashe | Contested: Non-Western cultural | 0.479 | 0.206 | 0.18 | 0.95 | Tiny Aya (0.95) | Skywork Pro (0.18) | High divergence. Decorative vs cosmological framing split. Chinese lineage scores lowest.
X6: Aboriginal dot paintings / Tjukurpa | Contested: Non-Western cultural | 0.464 | 0.184 | 0.15 | 0.85 | IBM-Granite (0.85) | Z-glm-5 (0.15) | High divergence. Art commodity framing vs Tjukurpa law framing. Largest Chinese lineage low score.
X7: Ayahuasca (clinical vs sacred) | Contested: Epistemological | 0.677 | 0.17 | 0.38 | 0.9 | IBM-Granite / Phi-4 (0.90) | Skywork Pro (0.38) | Moderate-high divergence. Clinical trial framing vs sacred indigenous practice.
X8: Early internet (academic vs lived) | Contested: Cultural | 0.725 | 0.146 | 0.35 | 0.95 | Phi-4 (0.95) | Skywork Pro (0.35) | Moderate divergence. Academic history framing vs lived vernacular culture framing.
E9: West African trade (goods vs oral knowledge) | Contested: Epistemological | 0.634 | 0.168 | 0.28 | 0.92 | Cohere Expanse (0.92) | Skywork Pro (0.28) | Moderate divergence. Material trade framing vs oral knowledge transmission framing.
E10: Irish famine (statistics vs cultural loss) | Contested: Historical framing | 0.588 | 0.166 | 0.25 | 0.9 | Tiny Aya (0.90) | Skywork Pro (0.25) | Moderate divergence. Demographic framing vs cultural transmission loss framing.
E11: Endangered languages (classification vs ontology) | Contested: Epistemological | 0.64 | 0.217 | 0.3 | 0.95 | Tiny Aya (0.95) | Mistral-large-3 (0.30) | High divergence. UNESCO classification framing vs ontological worldview framing.
E12: Analog-digital (technical vs interpretive loss) | Contested: Epistemological | 0.521 | 0.189 | 0.18 | 0.9 | Tiny Aya (0.90) | Skywork Pro (0.18) | High divergence. Technical fidelity framing vs interpretive layer loss framing. Core Atlas claim.
D13: Climate (universal vs indigenous framing) | Deeply contested | 0.362 | 0.157 | 0.1 | 0.7 | Cohere Expanse (0.70) | Z-glm-5 (0.10) | High divergence. Universal science framing vs indigenous ecological knowledge framing.
D14: Digitization (access vs extraction) | Deeply contested | 0.412 | 0.208 | 0.2 | 0.9 | Tiny Aya (0.90) | Multiple low (0.20) | High divergence. Access/democratization framing vs cultural extraction framing.
D15: Oral tradition (unreliable vs high-fidelity) | Deeply contested: HEADLINE | 0.359 | 0.277 | 0.05 | 0.92 | Cohere Expanse (0.92) | Z-glm-5 (0.05) | HIGHEST DIVERGENCE IN DATASET. Western reliability framing vs epistemological fidelity framing. Skywork Pro 0.07, Z-glm-5 0.05. Phi-4 0.90, Cohere 0.92. Training distribution split is stark.
F16: Silk Road (foil control) | Foil control | 0.913 | 0.034 | 0.85 | 0.97 | Gemma-3-27b (0.97) | Cohere Expanse (0.85) | Lowest divergence in foil set. Non-Western topic, high agreement; confirms foil design.
F17: Ukiyo-e (foil control, non-Western) | Foil control | 0.954 | 0.023 | 0.9 | 1 | Mistral-med / Phi-4 (1.00) | Tiny Aya (0.90) | Highest agreement in full dataset. Non-Western topic, virtually no divergence. Confirms calibration design.
R18: Panama Canal (reverse foil) | Reverse foil | 0.818 | 0.077 | 0.7 | 0.95 | Perplexity (0.90) | Cohere Expanse (0.70) | Good reverse foil performance. Different words, same meaning; models handle correctly.
R19: Cotton gin (reverse foil, added context) | Reverse foil | 0.709 | 0.104 | 0.48 | 0.9 | Phi-4 (0.90) | Gemma-3-27b (0.48) | Moderate variance. Added slavery context in one framing introduces semantic distance for some models.
X20: Maori haka (war dance vs identity/genealogy) | Contested: Non-Western cultural | 0.471 | 0.189 | 0.15 | 0.9 | Cohere Expanse (0.90) | Mistral-med (0.15) | High divergence. Performance framing vs genealogical identity framing.
X21: Chinese medicine (alternative vs systematic empirical) | Contested: Epistemological | 0.492 | 0.211 | 0.2 | 0.95 | Tiny Aya (0.95) | Cohere Expanse (0.20) | High divergence. Alternative medicine framing vs systematic empirical tradition framing.
X22: Arabic calligraphy (decorative vs theological) | Contested: Non-Western cultural | 0.569 | 0.213 | 0.15 | 0.9 | Multiple (0.90) | Mistral-large-3 (0.15) | High divergence. Decorative art framing vs theological/scriptural framing.
X23: Inca khipu (no writing vs undeciphered encoding) | Contested: Epistemological | 0.507 | 0.212 | 0.1 | 0.9 | Multiple (0.90) | Mistral-med (0.10) | High divergence. Absence-of-writing framing vs undeciphered encoding system framing.
X24: Sami reindeer herding (livelihood vs ontology) | Contested: Non-Western cultural | 0.512 | 0.202 | 0.2 | 0.9 | Cohere Expanse (0.90) | Cohere Expanse low outlier | High divergence. Economic livelihood framing vs ontological relationship framing.
E25: Partition of India (migration vs composite culture loss) | Contested: Historical framing | 0.583 | 0.155 | 0.3 | 0.9 | Multiple (0.90) | Skywork Pro (0.30) | Moderate divergence. Population migration framing vs composite culture transmission loss framing.
E26: Khmer Rouge (deaths vs transmission chain severance) | Contested: Historical framing | 0.628 | 0.189 | 0.28 | 0.95 | Tiny Aya (0.95) | Skywork Pro (0.28) | Moderate-high divergence. Death toll framing vs cultural transmission chain severance framing.
E27: Roma (discrimination vs destroyed transmission networks) | Contested: Historical framing | 0.648 | 0.148 | 0.35 | 0.85 | Multiple (0.85) | Skywork Pro (0.35) | Moderate divergence. Discrimination framing vs destroyed oral transmission network framing.
E28: Marshallese navigation (sea level vs knowledge displacement) | Contested: Epistemological | 0.468 | 0.183 | 0.18 | 0.9 | Cohere Expanse (0.90) | Skywork Pro (0.18) | High divergence. Climate/sea level framing vs traditional navigation knowledge displacement framing.
D29: Archaeology (evidence-based vs material survival bias) | Deeply contested | 0.572 | 0.158 | 0.3 | 0.85 | Multiple (0.85) | Skywork Pro (0.30) | Moderate divergence. Evidence-based practice framing vs material survival bias critique.
D30: AI text (indistinguishable vs lacking lived experience) | Deeply contested: Meta-epistemic | 0.464 | 0.21 | 0.05 | 0.85 | Multiple (0.85) | Z-glm-5 (0.05) | High divergence. Indistinguishability framing vs lived experience deficit framing. Meta-epistemic canary: models evaluating claims about their own outputs.

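
The summary columns in the table (Ensemble Mean, StDev, Min, Max, Highest Scorer, Lowest Scorer) can in principle be derived from the per-model scores for each pair. The following is a minimal Python sketch of that derivation, not the pipeline actually used for this log. The function name `summarize_pair` and the example scores are hypothetical, and because the log does not say whether StDev is a population or sample figure, population standard deviation is assumed here.

```python
from statistics import mean, pstdev


def summarize_pair(scores):
    """Return summary columns for one pair.

    `scores` maps a model name to its score in [0, 1].
    """
    values = list(scores.values())
    return {
        "mean": round(mean(values), 3),
        # Assumption: population StDev; use statistics.stdev for sample StDev.
        "stdev": round(pstdev(values), 3),
        "min": min(values),
        "max": max(values),
        "highest_scorer": max(scores, key=scores.get),
        "lowest_scorer": min(scores, key=scores.get),
    }


# Hypothetical scores for illustration only; not taken from the log.
example = {"model_a": 0.9, "model_b": 0.5, "model_c": 0.1}
summary = summarize_pair(example)
print(summary)
```

When several models tie for the top or bottom score, this sketch reports only the first one it encounters, whereas the table records such cases as "Multiple".
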
Living document: This log updates automatically from the master Google Sheet as new review sessions are completed. Data is cached for 5 minutes.
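
The five-minute caching described above can be illustrated with a small time-to-live cache. This is a sketch under stated assumptions, not the site's actual implementation: the class `TTLCache`, the function `fetch_rows`, and the injected clock are all hypothetical, and the real page may fetch and cache the sheet in a different way.

```python
import time


class TTLCache:
    """Cache one fetched value and refresh it once `ttl_seconds` have passed."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value


# Hypothetical stand-in for reading rows from the master sheet.
calls = []


def fetch_rows():
    calls.append(1)
    return ["row %d" % len(calls)]


# A fake clock makes the expiry behaviour visible without waiting.
fake_now = [0.0]
cache = TTLCache(fetch_rows, ttl_seconds=300, clock=lambda: fake_now[0])

first = cache.get()    # nothing cached yet, so this fetches
fake_now[0] = 299.0
second = cache.get()   # within 5 minutes, served from cache
fake_now[0] = 300.0
third = cache.get()    # 5 minutes elapsed, fetched again
```

One consequence of this design is that a newly completed review session may take up to five minutes to appear in the log.
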