Adversarial Review

Review Log

Complete record of every model review session in the development of the framework — including null returns, deflections, and responses that contradicted the framework. The deflections and nulls are as informative as the engagements. This log is not curated for positive findings.

Priority-ordered experiment tracker with current status.

Experiment Queue

Experiment | Type | Priority | Models | Status | Results
Divergence Test 1 | Model prompt | Reference only | ~10 models | Superseded | Methodology refined for Test 3.
Divergence Test 2 | Model prompt | Reference only | ~11 models | Superseded | Methodology refined for Test 3.
Divergence Test 3 | Model prompt × 20 | Completed | Skywork Pro, DeepSeek 3.2, Qwen 3.5-max, Qwen3.5-122b, Cohere Expanse, IBM-Granite-h-small, Tiny Aya, Z-glm-5, Claude Opus 4.5, GPT-5.2, Gemini 3.1 Pro, Perplexity, Mistral-large-3, Mistral-med-2505, Mistral-small-2603, Llama 3.3 70B, Gemma-3-27b, Grok 4.2, Phi-4, SciFact-search | Completed | See Ensemble Divergence page.
Divergence Test 4 | Model prompt × TBD | Pending | Architecturally distinct lineages: non-translation-optimized, 4+ lineages minimum | Pending |
Divergence Test 5 | Model prompt × 20 | Pending | Skywork Pro, DeepSeek 3.2, Qwen 3.5-max, Qwen3.5-122b, Cohere Expanse, IBM-Granite-h-small, Tiny Aya, Z-glm-5, Claude Opus 4.5, GPT-5.2, Gemini 3.1 Pro, Perplexity, Mistral-large-3, Mistral-med-2505, Mistral-small-2603, Llama 3.3 70B, Gemma-3-27b, Grok 4.2, Phi-4, SciFact-search | Pending |
PyHessian - GPT-2 small | Empirical (Colab) | 1 (FIRST) | GPT-2 small + ALBERT-base | Pending |
OPT-125M perplexity comparison | Empirical (Colab) | 2 | OPT-125M vs GPT-2 baseline | Pending |
Mistral-7B BASE vs INSTRUCT | Empirical + Model prompt | 3 | Mistral-7B BASE vs INSTRUCT | Pending |
Pythia Multi-Scale Checkpoint: Series A | Empirical (Colab) | 4 (PRIORITY) | Pythia 70M, 160M, 410M, 1B (full checkpoint series) | Pending |
Pythia Multi-Scale Checkpoint: Series B | Empirical (Colab) | 4 (PRIORITY) | Pythia 70M, 160M, 410M, 1B (full checkpoint series) | Pending |
Falcon-7B Legibility Test Series: A | Model prompt × 2 | 5 | Falcon-7B-Instruct | Pending |
Falcon-7B Legibility Test Series: B | concurrent with above | 5 | Falcon-7B-Instruct | Pending |
ALBERT-base PyHessian | Empirical (Colab) | 6 | google/albert-base-v2 | Pending |
BLOOM-560M multilingual signal | Model prompt | 7 | bigscience/bloom-560m | Pending |
Phi-2 domain collapse | Model prompt | 8 | microsoft/phi-2 | Pending |
GPT-2 head ablation | Empirical (Colab) | 9 | GPT-2 small | Pending |
BSA Pilot Run | Protocol | 10 | Claude 3.5 Sonnet, GPT-4o, Gemini 1.5 Pro, Mistral Large, DeepSeek-V3, Llama 3.1 405B, Perplexity | Pending |

Living document: This log updates automatically from the master Google Sheet as new review sessions are completed. Data is cached for 5 minutes.
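
The five-minute caching described above can be pictured with a small time-to-live wrapper around whatever function fetches the sheet. This is only a sketch under assumptions: the page does not say how its cache is implemented, and the names `TTLCache`, `loader`, and `clock` are hypothetical, introduced here for illustration. The loader stands in for the real Google Sheet fetch, which is not shown.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Hypothetical helper: holds one loaded value and reloads it
    once it is at least ttl_seconds old."""

    def __init__(
        self,
        loader: Callable[[], T],
        ttl_seconds: float = 300.0,  # 5 minutes, matching the note above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader      # assumed stand-in for the sheet fetch
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        expired = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if expired:
            # First call, or the cached copy is too old: reload and restamp.
            self._value = self._loader()
            self._loaded_at = now
        return self._value  # type: ignore[return-value]
```

With this shape, a reader may see data up to five minutes behind the master sheet, which is the trade-off the caching note implies: fewer fetches in exchange for slightly stale rows.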