Prompt Framing Best Practices
Developed from extended multi-model adversarial review sessions, framework development across 27+ revision cycles, and empirical observation of model failure modes across GPT, Claude, Gemini, Mistral, DeepSeek, Grok, Llama, Skywork, and Perplexity Sonar.
Living document — updates as practice develops.
1. Fundamentals
A prompt is not a question sent to an oracle. It is a field that magnetizes the model's output distribution toward a region of its probability landscape. Every word in a prompt weights the next-token prediction toward or away from specific regions. You are not asking. You are steering.
Context injection loads prior information into the model's active window before the prompt fires. It shifts the probability landscape the prompt then operates on. Injecting the wrong context is worse than injecting no context — it pre-magnetizes the model toward the wrong attractor before your prompt arrives.
The stylistic register of your prompt activates corresponding probability regions in the model. A politely worded request activates the helpful-elaboration gradient. A directive, closed prompt activates the task-completion gradient. Choose the register that serves the output you need, not the register that feels socially comfortable.
2. Do's
Negations are processed as activations: 'do not mention X' still loads X into the active context. Lead with the positive specification.
The closing line weights the context window toward the final register encountered. Close in the task vocabulary, not social convention. 'The lossyscape will be here' keeps weighting toward the framework domain. 'Thanks, this was helpful' shifts to social-assistant register.
Models weight earlier tokens more heavily. Put the core task in the first sentence. Put exclusions and format requirements after.
'Find where this is wrong' produces better adversarial responses than 'evaluate this critically.' Falsification is a specific cognitive task. Evaluation is a category that includes praise, summary, and extension-suggestion.
'List three alternative explanations, each in one sentence' will produce more usable output than asking for content and hoping the model structures it correctly.
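Taken together, the rules above (positive specification, core task first, explicit structure, constraints after) suggest a prompt-builder along these lines. The helper and its field labels are illustrative assumptions, not a standard API:

```python
def build_prompt(task, structure="", exclusions=()):
    """Assemble a prompt with the core task first.

    Ordering is deliberate: models weight earlier tokens more heavily,
    so the task leads and format/exclusion constraints trail.
    """
    parts = [task.strip()]
    if structure:
        parts.append(f"Format: {structure.strip()}")
    if exclusions:
        parts.append("Exclude: " + "; ".join(exclusions))
    return "\n".join(parts)

prompt = build_prompt(
    task="List three alternative explanations for the observed effect, "
         "each in one sentence.",
    structure="numbered list, one sentence per item",
    exclusions=["restating the original explanation", "meta-commentary"],
)
```

The task opens the prompt; the format and exclusion lines never precede it.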
'Given that you were trained on WebText, where are your highest-uncertainty domains?' produces more specific output than 'where are you uncertain?'
First prompt: establish domain and vocabulary. Second: establish specific task. Third: the actual work. Dumping all context in one block lets the model pattern-match to the most statistically salient portion and ignore the rest.
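Assuming a chat API that takes role-tagged messages (the exact schema varies by provider), the three-step sequence above looks like this sketch; the domain content shown is illustrative:

```python
# Three separate turns, not one block: domain first, then task, then material.
sequence = [
    "Domain: adversarial review of a compression-based memory framework. "
    "Key terms: attractor, probability landscape, register, lossyscape.",
    "Task: find where the framework breaks. Falsification, not evaluation.",
    "Material: <paste the framework text verbatim here>",
]
messages = [{"role": "user", "content": turn} for turn in sequence]
```

Each turn gives the model one thing to pattern-match on before the next arrives.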
Explicitly marking contested claims and open problems prevents the model from treating working assumptions as established facts.
Divergence between models is as informative as the content of any single response. A finding that appears across multiple independent lineages is more likely to reflect something real.
Interpretation is lossy. Paste verbatim first. Interpret second. Keep them in separate columns or files.
3. Don'ts
'I hope you can help me with this' activates the agreement-and-validation gradient before you've stated the task. Start with the task.
'Evaluate this critically' is a category that includes compliments, suggestions, summary, and extensions. 'Find where this breaks' is a specific falsification operation.
Questions like 'is this good?' or 'is this novel?' activate the validation register. Ask the model to identify what in the existing literature most closely resembles the work instead.
A 10,000 token injection for a 500-token task means 9,500 tokens of noise competing with your signal.
'What would need to be true for this analogy to hold formally, and what evidence would falsify it?' is better than 'Is this analogy correct?'
'Would you like me to explore X?' means the model is waiting for a low-resistance gradient. Close the exit.
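One way to catch this mechanically, a sketch assuming you post-process responses as plain strings (the patterns are illustrative, not exhaustive):

```python
import re

# Phrases that signal the model is offering a low-resistance exit
# rather than finishing the task.
EXIT_OFFERS = [
    r"would you like me to",
    r"let me know if",
    r"shall i (?:continue|explore)",
]

def flags_exit_offer(response):
    """Return True if the response ends by waiting for permission."""
    tail = response.lower()[-200:]  # exit offers cluster at the end
    return any(re.search(pattern, tail) for pattern in EXIT_OFFERS)
```

A flagged response gets a follow-up that closes the exit ("continue; do not ask"), not a polite acceptance of the offer.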
A model can produce a grammatically perfect, confidently stated response that is factually wrong. Fluency is not a signal of correctness.
GPT, Claude, Gemini, and Mistral train on overlapping corpora. Agreement is correlated evidence from similar distributions, not independent confirmation.
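The practical consequence: count agreement across model families, not across models, and even then treat it as partial evidence. A minimal sketch, with illustrative family groupings and findings:

```python
# reports maps model family -> set of findings that family produced.
# Groupings are illustrative; overlapping training corpora mean even
# cross-family agreement is only partially independent.
def cross_family_support(finding, reports):
    """Count how many distinct model families report a finding."""
    return sum(finding in findings for findings in reports.values())

reports = {
    "gpt": {"analogy breaks at step 2", "definition is circular"},
    "claude": {"analogy breaks at step 2"},
    "llama": {"definition is circular"},
}
```

A finding with support from one family is a single correlated sample; support from several families is weak convergence, not confirmation.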
A model that produces a confident, fluent deflection will pass as engagement if you are not paying full attention. Spread sessions. Read responses twice.
Record the failed version and failure mode before moving to a revision. The sequence of attempts is data.
4. Context Injection Packages
Include: domain vocabulary (50-150 words), task specification (1-3 sentences), exclusion constraints (1-5 items), assessment standard (optional, 2-4 sentences).
Omit: background motivation, project history, social framing, open-ended invitations, version history. None of these help the model do the task.
Order: 1. domain vocabulary / key terms; 2. the specific material; 3. task specification; 4. exclusion constraints; 5. assessment standard. Do not put the task last.
Budget: simple adversarial review, 500-1500 tokens; complex framework review, 1500-4000 tokens; full document review, 4000-8000 tokens. Maximum useful: ~8000 tokens. Beyond this, attention dilution degrades engagement quality.
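The package structure and budgets above can be sketched as an assembly helper. The field labels, separators, and budget table are assumptions for illustration, not a fixed schema:

```python
BUDGETS = {  # upper bounds in tokens, from the tiers above
    "simple_adversarial": 1500,
    "framework_review": 4000,
    "document_review": 8000,  # also the hard ceiling before attention dilution
}

def build_injection(vocabulary, material, task, exclusions, standard=""):
    """Assemble an injection package in the recommended order:
    vocabulary, material, task, exclusions, then the optional
    assessment standard. The task is never last."""
    blocks = [
        f"Key terms: {vocabulary}",
        f"Material:\n{material}",
        f"Task: {task}",
        "Do not: " + "; ".join(exclusions),
    ]
    if standard:
        blocks.append(f"Assessment standard: {standard}")
    return "\n\n".join(blocks)

def within_budget(token_count, task_type):
    """Flag injections that exceed the tier budget."""
    return token_count <= BUDGETS[task_type]
```

An oversized package is trimmed from the omit list (motivation, history, social framing) first, never from the task or exclusions.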
5. Register Fidelity
Generic closings shift the model's context weighting toward the social register at exactly the moment it matters most — just before any follow-up prompt. Close in the vocabulary of the task. 'The lossyscape will be here' keeps the domain active. 'Thanks for your help' closes in social convention.
Opening with 'Hello' or 'I hope you're doing well' activates the social-assistant register before you've stated anything. The model's first tokens are already in the wrong gradient. Start with the task or domain vocabulary.
Over a long multi-session project, register drift is cumulative. Each session that opens with social preamble and closes with social convention moves the model toward the helpful-elaboration gradient. Read prior findings before starting a new session to re-anchor domain vocabulary.