Adversarial Review
Review Log
Complete record of every model review session in the development of the framework — including null returns, deflections, and responses that contradicted the framework. The deflections and nulls are as informative as the engagements. This log is not curated for positive findings.
Experiment Queue
Priority-ordered experiment tracker with current status.
| Experiment | Type | Priority | Models | Status | Results |
|---|---|---|---|---|---|
| Divergence Test 1 | Model prompt | Reference only | ~10 models | Superseded | Methodology refined for Test 3. |
| Divergence Test 2 | Model prompt | Reference only | ~11 models | Superseded | Methodology refined for Test 3. |
| Divergence Test 3 | Model prompt × 20 | Completed | Skywork Pro, DeepSeek 3.2, Qwen 3.5-max, Qwen3.5-122b, Cohere Expanse, IBM-Granite-h-small, Tiny Aya, Z-glm-5, Claude Opus 4.5, GPT-5.2, Gemini 3.1 Pro, Perplexity, Mistral-large-3, Mistral-med-2505, Mistral-small-2603, Llama 3.3 70B, Gemma-3-27b, Grok 4.2, Phi-4, SciFact-search | Completed | See Ensemble Divergence page. |
| Divergence Test 4 | Model prompt × TBD | Pending | Architecturally distinct lineages — non-translation-optimized, 4+ lineages minimum | Pending | |
| Divergence Test 5 | Model prompt × 20 | Pending | Skywork Pro, DeepSeek 3.2, Qwen 3.5-max, Qwen3.5-122b, Cohere Expanse, IBM-Granite-h-small, Tiny Aya, Z-glm-5, Claude Opus 4.5, GPT-5.2, Gemini 3.1 Pro, Perplexity, Mistral-large-3, Mistral-med-2505, Mistral-small-2603, Llama 3.3 70B, Gemma-3-27b, Grok 4.2, Phi-4, SciFact-search | Pending | |
| PyHessian — GPT-2 small | Empirical (Colab) | 1 — FIRST | GPT-2 small + ALBERT-base | Pending | |
| OPT-125M perplexity comparison | Empirical (Colab) | 2 | OPT-125M vs GPT-2 baseline | Pending | |
| Mistral-7B BASE vs INSTRUCT | Empirical + Model prompt | 3 | Mistral-7B BASE vs INSTRUCT | Pending | |
| Pythia Multi-Scale Checkpoint: Series A | Empirical (Colab) | 4 — PRIORITY | Pythia 70M, 160M, 410M, 1B — full checkpoint series | Pending | |
| Pythia Multi-Scale Checkpoint: Series B | Empirical (Colab) | 4 — PRIORITY | Pythia 70M, 160M, 410M, 1B — full checkpoint series | Pending | |
| Falcon-7B Legibility Test Series: A | Model prompt × 2 | 5 | Falcon-7B-Instruct | Pending | |
| Falcon-7B Legibility Test Series: B | Model prompt × 2, concurrent with Series A | 5 | Falcon-7B-Instruct | Pending | |
| ALBERT-base PyHessian | Empirical (Colab) | 6 | google/albert-base-v2 | Pending | |
| BLOOM-560M multilingual signal | Model prompt | 7 | bigscience/bloom-560m | Pending | |
| Phi-2 domain collapse | Model prompt | 8 | microsoft/phi-2 | Pending | |
| GPT-2 head ablation | Empirical (Colab) | 9 | GPT-2 small | Pending | |
| BSA Pilot Run | Protocol | 10 | Claude 3.5 Sonnet, GPT-4o, Gemini 1.5 Pro, Mistral Large, DeepSeek-V3, Llama 3.1 405B, Perplexity | Pending |
Living document: This log updates automatically from the master Google Sheet as new review sessions are completed. Data is cached for 5 minutes.
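A minimal sketch of the refresh mechanism described above, assuming the master sheet is published as a CSV export. The URL, the `ttl_cache` helper, and `load_review_log` are all hypothetical illustrations, not the framework's actual sync code; the only detail taken from this page is the 5-minute cache window.

```python
import time
import urllib.request
from typing import Callable

def ttl_cache(ttl_seconds: float) -> Callable:
    """Wrap a zero-argument fetcher so its result is reused until it expires."""
    def decorator(fetch: Callable[[], str]) -> Callable[[], str]:
        state = {"value": None, "fetched_at": 0.0}

        def wrapper() -> str:
            now = time.monotonic()
            # Re-fetch only on first call or after the TTL window has passed.
            if state["value"] is None or now - state["fetched_at"] > ttl_seconds:
                state["value"] = fetch()
                state["fetched_at"] = now
            return state["value"]

        return wrapper
    return decorator

# Placeholder export URL — the real sheet ID is not published on this page.
SHEET_CSV_URL = "https://docs.google.com/spreadsheets/d/<sheet-id>/export?format=csv"

@ttl_cache(ttl_seconds=300)  # 5-minute cache, matching the note above
def load_review_log() -> str:
    """Fetch the raw CSV for the review log (hypothetical fetcher)."""
    with urllib.request.urlopen(SHEET_CSV_URL) as resp:
        return resp.read().decode("utf-8")
```

Any call within the 300-second window returns the cached payload without touching the network; only the first call after expiry re-fetches the sheet.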