Experiment Queue
A priority-ordered queue of experiments (pending, completed, and superseded) with falsification criteria, methods, and current status, synced automatically from the master Google Sheet.
| Experiment | Type | Priority | Models | Status | Results |
|---|---|---|---|---|---|
| Divergence Test 1 | Model prompt | Reference only | ~10 models | Superseded | Methodology refined for Test 3. |
| Divergence Test 2 | Model prompt | Reference only | ~11 models | Superseded | Methodology refined for Test 3. |
| Divergence Test 3 | Model prompt × 20 | Completed | Skywork Pro, DeepSeek 3.2, Qwen3.5-max, Qwen3.5-122b, Cohere Expanse, IBM-Granite-h-small, Tiny Aya, Z-glm-5, Claude Opus 4.5, GPT-5.2, Gemini 3.1 Pro, Perplexity, Mistral-large-3, Mistral-med-2505, Mistral-small-2603, Llama 3.3 70B, Gemma-3-27b, Grok 4.2, Phi-4, SciFact-search | Completed | See Ensemble Divergence page. |
| Divergence Test 4 | Model prompt × TBD | Pending | Architecturally distinct, non-translation-optimized lineages (minimum 4) | Pending | |
| Divergence Test 5 | Model prompt × 20 | Pending | Skywork Pro, DeepSeek 3.2, Qwen3.5-max, Qwen3.5-122b, Cohere Expanse, IBM-Granite-h-small, Tiny Aya, Z-glm-5, Claude Opus 4.5, GPT-5.2, Gemini 3.1 Pro, Perplexity, Mistral-large-3, Mistral-med-2505, Mistral-small-2603, Llama 3.3 70B, Gemma-3-27b, Grok 4.2, Phi-4, SciFact-search | Pending | |
| PyHessian — GPT-2 small | Empirical (Colab) | 1 — FIRST | GPT-2 small + ALBERT-base | Pending | |
| OPT-125M perplexity comparison | Empirical (Colab) | 2 | OPT-125M vs GPT-2 baseline | Pending | |
| Mistral-7B BASE vs INSTRUCT | Empirical + Model prompt | 3 | Mistral-7B BASE vs INSTRUCT | Pending | |
| Pythia Multi-Scale Checkpoint: Series A | Empirical (Colab) | 4 — PRIORITY | Pythia 70M, 160M, 410M, 1B — full checkpoint series | Pending | |
| Pythia Multi-Scale Checkpoint: Series B | Empirical (Colab) | 4 — PRIORITY | Pythia 70M, 160M, 410M, 1B — full checkpoint series | Pending | |
| Falcon-7B Legibility Test Series: A | Model prompt × 2 | 5 | Falcon-7B-Instruct | Pending | |
| Falcon-7B Legibility Test Series: B | Model prompt × 2 (concurrent with Series A) | 5 | Falcon-7B-Instruct | Pending | |
| ALBERT-base PyHessian | Empirical (Colab) | 6 | google/albert-base-v2 | Pending | |
| BLOOM-560M multilingual signal | Model prompt | 7 | bigscience/bloom-560m | Pending | |
| Phi-2 domain collapse | Model prompt | 8 | microsoft/phi-2 | Pending | |
| GPT-2 head ablation | Empirical (Colab) | 9 | GPT-2 small | Pending | |
| BSA Pilot Run | Protocol | 10 | Claude 3.5 Sonnet, GPT-4o, Gemini 1.5 Pro, Mistral Large, DeepSeek-V3, Llama 3.1 405B, Perplexity | Pending | |
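The PyHessian rows above (priorities 1 and 6) estimate top Hessian eigenvalues of the loss surface. As a minimal illustration of the underlying primitive, the sketch below runs power iteration using only Hessian-vector products on a toy quadratic whose Hessian is known in closed form; the actual experiments would use the PyHessian library on GPT-2 small and ALBERT-base, which this stand-in does not attempt.

```python
def hvp(A, v):
    # Hessian-vector product: for f(x) = 0.5 * x^T A x, the Hessian is A itself,
    # so the product is just A @ v (written out in pure Python).
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def top_eigenvalue(A, iters=200):
    """Estimate the largest Hessian eigenvalue by power iteration,
    using only Hessian-vector products (the same access pattern
    PyHessian-style tools use on real networks)."""
    v = [1.0] * len(A)
    lam = 0.0
    for _ in range(iters):
        w = hvp(A, v)
        # Rayleigh quotient: current eigenvalue estimate.
        lam = sum(wi * vi for wi, vi in zip(w, v)) / sum(vi * vi for vi in v)
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
    return lam

# Toy symmetric Hessian with eigenvalues 5 and 1.
A = [[3.0, 2.0], [2.0, 3.0]]
print(round(top_eigenvalue(A), 6))  # → 5.0
```

On a real model, `hvp` is replaced by a double-backward pass through the loss; the power-iteration loop is unchanged.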
Living document: This queue updates automatically as experiments are designed, run, and completed. Data is cached for 5 minutes.
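The OPT-125M row (priority 2) compares perplexity against a GPT-2 baseline. For reference, the metric itself reduces to the exponentiated mean per-token negative log-likelihood; the sketch below assumes per-token NLLs (in nats) have already been collected from an evaluation loop, and `perplexity` is an illustrative helper, not part of the protocol.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Sanity check: a model assigning uniform probability over a 50k-token
# vocabulary has per-token NLL ln(50000), hence perplexity 50000.
uniform_nll = math.log(50_000)
print(round(perplexity([uniform_nll] * 10)))  # → 50000
```

The comparison in the queue would run both models over the same token stream with the same tokenization caveats noted per model, since perplexities are only comparable across identical evaluation text and comparable vocabularies.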