Prompting is engineering, not magic. Here are patterns that consistently help production LLM features behave predictably.
Core patterns
- System priming: set role and guardrails up front; keep it short to save tokens.
- Few-shot with contrast: include both good and bad examples; annotate why the bad ones are invalid (see the sketch after this list).
- Chain-of-thought (streamlined): request concise reasoning only when needed; avoid verbose traces in latency-sensitive paths.
- Tool-first: instruct the model to call tools/functions rather than answer in free text when structured output is required.
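
A minimal sketch of system priming plus few-shot with contrast, expressed as a provider-agnostic message array. The `ChatMessage` shape, the support-triage domain, and `buildTriagePrompt` are illustrative assumptions, not any specific SDK's API.

```ts
// Provider-agnostic chat message shape (an assumption; most chat APIs accept
// a similar { role, content } structure).
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// System priming: short role + guardrails; it is sent on every call, so keep it tight.
const system: ChatMessage = {
  role: "system",
  content:
    "You are a support-ticket triager. Reply with exactly one label: billing, bug, or other. " +
    "Never invent ticket details.",
};

// Few-shot with contrast: good examples to imitate, plus one bad example
// annotated with why it is invalid.
const fewShot: ChatMessage[] = [
  { role: "user", content: "Ticket: 'I was charged twice this month.'" },
  { role: "assistant", content: "billing" },
  { role: "user", content: "Ticket: 'The export button crashes the app.'" },
  { role: "assistant", content: "bug" },
  {
    role: "user",
    content:
      "Counterexample (do NOT imitate): answering 'The crash is probably caused by...' is invalid " +
      "because it explains instead of returning a single label.",
  },
  { role: "assistant", content: "Understood. I will reply with a single label only." },
];

// Assemble the full prompt for one ticket.
export function buildTriagePrompt(ticket: string): ChatMessage[] {
  return [system, ...fewShot, { role: "user", content: `Ticket: '${ticket}'` }];
}
```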
Output control
- Enforce JSON schemas and validate with Zod before using responses (see the sketch after this list).
- Use enumerations (“choose one of: A|B|C”) to constrain intent classification.
- Post-process with regex or a secondary lightweight model to tighten formats.
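
A minimal sketch of schema-first output control with Zod: an enum constrains the intent label and `safeParse` rejects malformed responses before they reach business logic. The field names and the response shape are illustrative assumptions.

```ts
import { z } from "zod";

// Constrain intent classification to an explicit enumeration ("choose one of: A|B|C").
const Intent = z.enum(["billing", "bug", "other"]);

// The full response schema the model is asked to emit as JSON.
const TriageResponse = z.object({
  intent: Intent,
  confidence: z.number().min(0).max(1),
  summary: z.string().max(280),
});
type TriageResponse = z.infer<typeof TriageResponse>;

// Validate before use; never pass raw model output to downstream code.
export function parseTriage(raw: string): TriageResponse | null {
  let candidate: unknown;
  try {
    candidate = JSON.parse(raw); // the model may return non-JSON; guard the parse
  } catch {
    return null;
  }
  const result = TriageResponse.safeParse(candidate);
  return result.success ? result.data : null; // null signals "retry or fall back"
}
```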
Latency and cost hygiene
- Keep context windows lean: strip stopwords in retrieval, dedupe similar chunks, cap history length.
- Prefer smaller, cheaper models for classification/routing; reserve larger models for generation.
- Cache deterministic prompts (e.g., policies) and reuse embeddings; a caching sketch follows this list.
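
A minimal sketch of embedding reuse via a content-hash cache. The in-memory `Map` and the `embed` callback are placeholders for whatever embedding client and cache backend (in-memory, Redis, etc.) you actually run.

```ts
import { createHash } from "node:crypto";

// Key by a hash of the exact text so identical chunks never hit the embedding API twice.
const embeddingCache = new Map<string, number[]>();

export async function cachedEmbed(
  text: string,
  embed: (text: string) => Promise<number[]>, // your embedding client goes here
): Promise<number[]> {
  const key = createHash("sha256").update(text).digest("hex");
  const hit = embeddingCache.get(key);
  if (hit) return hit;

  const vector = await embed(text);
  embeddingCache.set(key, vector);
  return vector;
}
```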
Evaluation
- Create prompt unit tests with known inputs and expected outputs; run them in CI (see the sketch after this list).
- Track drift: log prompt versions, model IDs, and latency histograms per route.
- Add guardrail checks: PII leakage, profanity, and policy violations before responses reach users.
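
A minimal sketch of a prompt unit test using Node's built-in test runner; `classifyTicket` and the golden cases are hypothetical stand-ins for your own route and dataset.

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical wrapper around your classification route (prompt + model call + parsing).
import { classifyTicket } from "./triage";

// Known inputs with expected outputs; small enough to run on every CI build.
const goldenCases = [
  { input: "I was charged twice this month.", expected: "billing" },
  { input: "The export button crashes the app.", expected: "bug" },
  { input: "Do you have a student discount?", expected: "other" },
];

for (const { input, expected } of goldenCases) {
  test(`classifies "${input}" as ${expected}`, async () => {
    const result = await classifyTicket(input);
    assert.equal(result.intent, expected);
  });
}
```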
Next steps
- Stand up a prompt registry (e.g., versioned JSON) to track changes.
- Add automatic schema validation and retries with backoff for malformed responses (sketch after this list).
- Build a small golden dataset and run it nightly to detect regressions after model updates.
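
A minimal sketch of retries with exponential backoff wrapped around schema validation; `callModel` is a placeholder for your actual completion call, and `parse` can be the Zod-backed `parseTriage` from the Output control sketch above.

```ts
// Re-validate and retry with exponential backoff when the model returns malformed output.
export async function generateWithRetries<T>(
  callModel: () => Promise<string>,   // placeholder for your completion call
  parse: (raw: string) => T | null,   // e.g. the Zod-backed parseTriage above
  maxAttempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const raw = await callModel();
    const parsed = parse(raw);
    if (parsed !== null) return parsed;

    // Exponential backoff before the next attempt: 250ms, 500ms, 1000ms, ...
    const delay = baseDelayMs * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
  throw new Error(`Model returned malformed output after ${maxAttempts} attempts`);
}
```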

