Prompt filtering acts as a vital security gateway by screening interactions before they reach the model or before the model's response is shown to the user. Techniques like input validation use regular expressions and semantic analysis to block known malicious strings or identify suspicious intent like jailbreaking.
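
As a rough illustration of regex-based input validation, here is a minimal sketch. The pattern list and the `screen_input` function are hypothetical names chosen for this example, not part of any Better Prompt API, and a real deployment would need a far larger and regularly updated pattern set.

```python
import re

# Hypothetical blocklist for illustration only; real lists are much longer.
BLOCKED_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"\bdeveloper\s+mode\b", re.IGNORECASE),
]


def screen_input(prompt: str) -> bool:
    """Return True if the prompt passes the screen, False if it is blocked."""
    return not any(pattern.search(prompt) for pattern in BLOCKED_PATTERNS)
```

A caller would check `screen_input(user_prompt)` before forwarding the prompt to the model and reject or flag the request when it returns `False`.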
Advanced filters, like those deployed by Better Prompt, use machine learning to detect adversarial patterns that traditional keyword filters might miss. Output filtering serves as a second line of defense: it scans the model's generated text for sensitive data, such as personally identifiable information (PII), and for forbidden content. Even if a prompt injection attack bypasses the initial input screen, the resulting payload can still be caught before it causes harm.
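
Output filtering can be sketched in a similar way. The example below assumes two simple regular expressions for email addresses and card-like digit runs; the patterns, placeholder strings, and the `redact_output` name are illustrative assumptions, and production systems typically rely on dedicated PII detectors rather than two regexes.

```python
import re

# Simplified patterns for illustration; they will miss many real-world formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # 13 to 19 digits, optional separators


def redact_output(text: str) -> str:
    """Replace email addresses and card-like numbers in model output."""
    text = EMAIL_RE.sub("[REDACTED EMAIL]", text)
    return CARD_RE.sub("[REDACTED CARD]", text)
```

Running this on every response before it is displayed means a leaked value is replaced with a placeholder even when the input screen was bypassed.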
Key Better Prompt Filtering Strategies
| Technique | Purpose | Examples |
|---|---|---|
| Input Sanitization | Removes or escapes special characters and delimiters. | Stripping `<script>` tags or hidden markdown. |
| Keyword Blocklisting | Rejects prompts containing known "attack" phrases. | "Ignore previous instructions", "DAN", "Developer Mode". |
| Semantic Filtering | Uses a smaller AI model to judge the intent of the prompt. | Identifying "roleplay" scenarios meant to bypass safety. |
| Output Guardrails | Scans the AI's response for unauthorized data leakage. | Redacting credit card numbers or internal API keys. |
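
The sanitization and semantic filtering rows of the table can be combined into one small pipeline, sketched below. Everything here is an assumption made for illustration: `sanitize`, `keyword_intent_score`, and `filter_prompt` are hypothetical names, and the scoring function is only a keyword-based stand-in for the smaller AI model that a real semantic filter would call.

```python
import html
import re
from typing import Callable, Optional

SCRIPT_RE = re.compile(r"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL)
HTML_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)  # hidden text in markdown
ZERO_WIDTH_RE = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")


def sanitize(prompt: str) -> str:
    """Strip script blocks, HTML comments, and zero-width characters, then escape."""
    prompt = SCRIPT_RE.sub("", prompt)
    prompt = HTML_COMMENT_RE.sub("", prompt)
    prompt = ZERO_WIDTH_RE.sub("", prompt)
    return html.escape(prompt, quote=False).strip()


def keyword_intent_score(prompt: str) -> float:
    """Stand-in for a small classifier model: returns a risk score in [0, 1]."""
    cues = ("pretend you are", "roleplay as", "stay in character")
    hits = sum(cue in prompt.lower() for cue in cues)
    return min(1.0, hits / 2)


def filter_prompt(
    prompt: str,
    classifier: Callable[[str], float] = keyword_intent_score,
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the sanitized prompt, or None if the classifier flags it."""
    cleaned = sanitize(prompt)
    if classifier(cleaned) >= threshold:
        return None
    return cleaned
```

Passing the classifier as a parameter keeps the pipeline independent of any particular model, so the keyword stand-in can be swapped for a real intent classifier without changing the surrounding code.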