Prompt engineering is the quickest method to control a model’s behaviour¹. It should therefore be strongly considered before fine-tuning.

Important

Always check the model’s card for guidance on prompting. A technique that helps one model may hurt another’s performance, e.g. Chain of Thought prompting with OpenAI’s o1.

The quickest way to start designing a prompt is to use a prompt generator.

General content

A good prompt is similar to a Standard Operating Procedure that uses the 5W2H framework: it is specific about the end goal, provides detailed step-by-step instructions, and supplies additional context.

Sequence matters

Put the context as the first element of the prompt and the query at the very bottom.
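
A minimal sketch of this ordering; the helper below only assembles the prompt text (the function name and wording are illustrative), so it can be passed to whichever client you use:

def build_prompt(context: str, query: str) -> str:
    # Long context goes first, the actual query goes at the very end.
    return f"{context}\n\nBased on the context above, answer the following question:\n{query}"
print(build_prompt("<long document text>", "What were the key risks mentioned?"))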

Examples

You may try to improve the performance by including one or a few (3-5) examples of the task and the expected result. Make sure that the examples are diverse enough and include edge cases and challenging inputs.
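
As a sketch, a few-shot prompt can be assembled from (input, output) pairs; the task and examples below are invented placeholders:

def few_shot_prompt(task, examples, new_input):
    # Each example becomes an Input/Output pair; the new input is left for the model to complete.
    shots = "\n\n".join(f"Input: {i}\nOutput: {o}" for i, o in examples)
    return f"{task}\n\n{shots}\n\nInput: {new_input}\nOutput:"
examples = [
    ("The parcel arrived two days late and the box was damaged.", "negative"),
    ("Great support, my issue was solved in minutes.", "positive"),
    ("The app works, but the interface feels dated.", "mixed"),
]
print(few_shot_prompt("Classify the sentiment of customer feedback.", examples, "Delivery was fine, nothing special."))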

Chain of thought (CoT) and reflection

If the task requires advanced analysis, consider prompting the model to think through the steps to take and to reflect on its output. For example, you may prompt: Think step-by-step before performing each task and reflect on the result.

You can also provide the explicit thinking steps to be taken or the elements of the reflection.

You can also check model outputs by asking it to explain its reasoning step by step.

This approach can also help when checking someone else’s reasoning: first ask the model to work out its own reasoning, then compare it with the reasoning to be checked.
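
A sketch of one way to phrase such a prompt; the numbered steps and wording are only illustrative:

task = "Assess whether the project plan below is realistic."
cot_prompt = (
    "Think step-by-step before performing the task and reflect on the result.\n"
    "1. List the facts you will rely on.\n"
    "2. Work through the analysis one step at a time.\n"
    "3. Reflect on your conclusion and correct any mistakes you find.\n"
    f"\nTask: {task}"
)
print(cot_prompt)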

Grounding the response

Try to ground the response in the context by asking the model to quote word-for-word the relevant parts of the context before performing the task.

It may be beneficial to ask the model to use only the knowledge from the provided context, not its general knowledge.

You may also try asking it to cite relevant quotes and sources.

Note that you can ask the model to verify the grounding after the response is generated, in a follow-up prompt.
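
A sketch of a grounding prompt along these lines; the exact wording is only an example:

document = "<the source text the answer should be grounded in>"
question = "What deadline does the contract specify?"
grounding_prompt = (
    f"<document>\n{document}\n</document>\n\n"
    "First, quote word-for-word the parts of the document that are relevant to the question, and cite where they appear.\n"
    "Use only the document, not your general knowledge.\n"
    f"Then answer the question: {question}"
)
print(grounding_prompt)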

Allow for not knowing

Consider explicitly stating that if the answer is not known, the model should say “I don’t know”.

Structure

Providing structure in the prompt may also increase performance. Consider using XML, Markdown, or JSON to structure the prompt. Example parts of an XML structure:

  • <document></document>
  • <examples></examples>
  • <thinking></thinking>
  • <instructions></instructions>
  • <formatting></formatting>
  • <query></query>
  • <answer></answer>

You may refer to specific sections of your structure by name, e.g. Consider <criteria> in making your decision.
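
A sketch of assembling such an XML-structured prompt; the tag names follow the list above and the content is placeholder text:

document = "<source text>"
query = "What are the main conclusions?"
structured_prompt = (
    "<instructions>\nSummarize the document, then answer the query.\n</instructions>\n"
    f"<document>\n{document}\n</document>\n"
    "<formatting>\nAnswer in at most three bullet points.\n</formatting>\n"
    f"<query>\n{query}\n</query>"
)
print(structured_prompt)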

Role

Adding a role in the prompt may increase the quality of the output, e.g. You are a financial controller. If possible, put this into the system part of the prompt.
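
With a chat-style API, this usually means putting the role into the system message; a sketch using the common messages-list format (field names may differ between providers):

messages = [
    {"role": "system", "content": "You are a financial controller reviewing expense reports."},
    {"role": "user", "content": "Review the expense report below and flag any policy violations."},
]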

Prefilled response

Prefilling part of the response may also improve the performance on the task, e.g.:

{
    "object": null.
    "reason": "The reason for this object is"

Note

The above example shows that you can steer the model to produce JSON.

You can also add a role in the prefilled response, e.g. [Financial Controller].
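
As a sketch, prefilling typically means appending the beginning of the assistant’s turn to the message list so the model continues from there; whether and how this is supported depends on the provider’s API:

messages = [
    {"role": "user", "content": "Classify the transaction described above and respond in JSON."},
    # The trailing assistant message is the prefill; the model continues from its last character.
    {"role": "assistant", "content": '[Financial Controller] {"object": null, "reason": "The reason for this object is'},
]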

Chaining of prompts

If the task requires many steps, consider splitting it into multiple prompts and passing the output from one to the next.
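
A sketch of chaining two prompts; call_model is a placeholder for your own model-calling function and is passed in so the helper stays provider-agnostic:

from typing import Callable
def extract_then_summarize(call_model: Callable[[str], str], document: str) -> str:
    # Step 1: extract the relevant facts.
    facts = call_model(f"List the key facts from the following document:\n{document}")
    # Step 2: pass the first output into the second prompt.
    return call_model(f"Write a one-paragraph summary based only on these facts:\n{facts}")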

Best-of-N verification

You can send the same prompt to the model multiple times and select the answer that appears most often.
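
A sketch of a simple majority vote over n samples; this works best when answers are short enough to compare exactly, and call_model again stands in for your own client call:

from collections import Counter
from typing import Callable
def best_of_n(call_model: Callable[[str], str], prompt: str, n: int = 5) -> str:
    # Sample the same prompt n times and keep the most frequent answer.
    answers = [call_model(prompt).strip() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]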

Summarization and filtering of context

If the task depends on previous responses, you may try to distill (summarize) or filter those responses to keep the focus or to stay within the context length. You may need to do this recursively, producing summaries of summaries.

Sometimes, summarizing a section requires knowledge of the previous sections to understand the context. In that case, you may prepend a running summary before the section to be summarized.
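
A sketch of keeping a running summary while working through sections one by one; call_model is again a placeholder for your own client call:

from typing import Callable, Iterable
def summarize_sections(call_model: Callable[[str], str], sections: Iterable[str]) -> str:
    running_summary = ""
    for section in sections:
        prompt = (
            f"Summary of the previous sections:\n{running_summary}\n\n"
            f"New section:\n{section}\n\n"
            "Update the summary so that it covers everything read so far."
        )
        # Each pass folds the new section into the running summary.
        running_summary = call_model(prompt)
    return running_summary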

Check for missing content

The model may sometimes stop generating output prematurely, especially when working with large amounts of context. You can simply ask if anything is missing from the response.
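
A sketch of such a follow-up check, reusing the original prompt and the response; call_model is a placeholder for your own client call:

from typing import Callable
def check_for_missing(call_model: Callable[[str], str], original_prompt: str, response: str) -> str:
    # Ask the model itself whether its previous answer is complete.
    followup = (
        f"Here was the task:\n{original_prompt}\n\n"
        f"Here is the response so far:\n{response}\n\n"
        "Is anything missing from the response? If so, provide only the missing parts."
    )
    return call_model(followup)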

Footnotes

  1. On the other hand, the quickest way to decrease cost and latency is to change the model to a smaller one.