In the field of prompt engineering, controlling the output of a large language model (LLM) is crucial for creating reliable and predictable applications. A prompt stop sequence is a specific string of text or characters that signals the model to halt text generation. This powerful mechanism acts as a "stop sign," allowing a developer to precisely define the boundaries of a response and prevent the AI from generating irrelevant, repetitive, or excessive text. For example, in a question-and-answer format, using "Question:" as a stop sequence prevents the AI from generating a follow-up question after it has provided an answer.
Developer-Defined Stop Sequences
The most common way to control response termination is by providing custom stop sequences through an API. These are developer-defined strings that force the model to cease generation as soon as they are encountered. This technique is essential for enforcing a clean output format and ensuring the AI's response adheres to the desired structure. The stop sequence itself is not included in the final output, making it a clean way to truncate text. Most APIs allow multiple stop sequences to be defined in a single request.
| Custom Marker | Use Case | Outcome |
|---|---|---|
| `\n\n` (double newline) | Extracting a single paragraph. | The model stops after the first paragraph, preventing further text generation. |
| `User:` | Simulating a chat conversation. | Prevents the AI from "role-playing" the user and generating the next turn in the dialogue. |
| `###` | Separating distinct examples in few-shot prompting. | The model provides an answer and stops before creating another example, ensuring a concise response. |
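The truncation behavior described above can be sketched client-side in a few lines. This is a minimal illustration of the semantics, not any particular provider's API; the function name is made up for the example:

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence.

    The stop sequence itself is excluded from the result, mirroring
    how most completion APIs treat their stop parameters.
    """
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# A Q&A-style completion that would otherwise run on:
raw = "The capital of France is Paris.\n\nQuestion: What about Spain?"
print(truncate_at_stop(raw, ["\n\nQuestion:", "###"]))
# -> The capital of France is Paris.
```

Note that the earliest match wins when several stop sequences are supplied, which is why the follow-up "Question:" never reaches the user.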
Model-Native Termination: The EOS Token
Beyond custom markers, models have an inherent mechanism for stopping: the End-of-Sequence (EOS) token. This special token, such as `<|endoftext|>`, is part of the model's vocabulary from its initial training. The model learns to generate the EOS token when it determines a thought or response is logically complete, allowing it to self-terminate naturally without an explicit stop sequence from the developer. However, relying solely on the EOS token can be unpredictable, as the model may not always generate it at the desired point.
Technical Constraints as Safeguards
In addition to semantic stop sequences, several technical parameters function as hard limits to control generation. While not as nuanced as stop sequences, these constraints are critical for managing costs, preventing runaway generation, and ensuring system stability. It is a best practice to use these as a fallback, even when implementing stop sequences.
| Mechanism | Description | Outcome |
|---|---|---|
| Max Tokens | A setting (often called maximum length) that dictates the maximum number of tokens in the output. | Forces a hard stop once the token count is reached, which can cut off responses mid-sentence. |
| Context Window | The total memory limit of the model for both the input prompt and the generated output. | If the conversation exceeds this limit, generation may fail or be truncated to prevent memory overflow. |
| Repetition Penalty | An algorithmic setting that penalizes the model for repeating words or phrases. | Discourages the model from getting stuck in repetitive loops, helping guide it toward a natural conclusion. |
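The repetition penalty in the last row can be sketched as a simple adjustment to the model's logits before sampling. The scheme below (divide positive logits, multiply negative ones, for tokens already seen) follows the common formulation popularized by the CTRL paper; the token IDs and values are illustrative:

```python
def apply_repetition_penalty(logits: dict[int, float],
                             seen_tokens: set[int],
                             penalty: float = 1.2) -> dict[int, float]:
    """Down-weight tokens that have already appeared in the output.

    Positive logits are divided and negative logits multiplied by the
    penalty, so a penalized token always becomes less likely.
    """
    adjusted = {}
    for token_id, logit in logits.items():
        if token_id in seen_tokens:
            logit = logit / penalty if logit > 0 else logit * penalty
        adjusted[token_id] = logit
    return adjusted

logits = {7: 2.4, 8: -0.5, 9: 1.0}
print(apply_repetition_penalty(logits, seen_tokens={7, 8}))
# token 7 drops from 2.4 to 2.0, token 8 from -0.5 to -0.6; token 9 is untouched
```

A penalty of 1.0 disables the mechanism entirely; values much above ~1.5 tend to degrade fluency, which is why it is a nudge toward conclusion rather than a hard stop.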