TL;DR
Sanitization removes or encodes dangerous characters from user input. Unlike validation (which rejects bad input), sanitization cleans input so it can be safely used. For HTML output, encode special characters. For HTML content users submit, use a library like DOMPurify to strip dangerous tags. Always use parameterized queries for SQL. Sanitization is not a replacement for proper data handling.
The Simple Explanation
Users submit data. Some might include malicious code. Sanitization cleans that data before you use it. If a username field contains <script>alert('hack')</script>, sanitization might strip the tags or encode them so they display as text instead of executing.
Sanitization vs Validation
| Aspect | Validation | Sanitization |
|---|---|---|
| What it does | Checks if input is correct | Cleans input to make it safe |
| Bad input | Reject it | Modify it |
| Example | Email must have @ | Remove script tags |
| When to use | Always, first | After validation if needed |
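The split in the table can be sketched in a few lines. This is a minimal illustration, not a complete signup handler: `isValidEmail`, `sanitizeDisplayName`, `handleSignup`, the loose email pattern, and the 50-character cap are all assumptions made for the example.

```javascript
// Hypothetical helpers for illustration; names and rules are assumptions.

// Validation: answers yes or no and never modifies the input.
function isValidEmail(input) {
  // Deliberately loose check: something@something.tld, no whitespace.
  return typeof input === 'string' && /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input);
}

// Sanitization: returns a cleaned copy of the input.
function sanitizeDisplayName(input) {
  return String(input)
    .replace(/[\u0000-\u001F\u007F]/g, '') // drop control characters
    .trim()                                // drop surrounding whitespace
    .slice(0, 50);                         // cap the length
}

function handleSignup(form) {
  // Validate first: bad input is rejected, not repaired.
  if (!isValidEmail(form.email)) {
    return { ok: false, error: 'Invalid email address' };
  }
  // Then sanitize the fields that are allowed to be cleaned up.
  return {
    ok: true,
    email: form.email,
    displayName: sanitizeDisplayName(form.displayName),
  };
}
```

The sanitized display name still needs output encoding when it is rendered into HTML; cleaning it here only makes it safer to store and process.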
Common Sanitization Tasks
HTML Output Encoding
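// User input: <script>alert('xss')</script>
// After encoding: &lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;
// Displays as text, does not execute
// In React, this is automatic with JSX: values interpolated as {userInput} are escaped before rendering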
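Outside a framework that escapes for you, the encoding step might look like the sketch below. `escapeHtml` and `HTML_ESCAPES` are hypothetical names, and the function assumes the value is placed in element content or a quoted attribute value; other contexts (inline JavaScript, URLs, CSS) need their own encoders.

```javascript
// Characters with special meaning in HTML, mapped to their entity forms.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Encode a value for use in element content or a quoted attribute value.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

const userInput = "<script>alert('xss')</script>";
const html = `<p>${escapeHtml(userInput)}</p>`;
// html is now: <p>&lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;</p>
```

Encoding at the point of output, rather than before storage, keeps the stored data intact and lets each rendering context apply its own rules.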
Rich Text / HTML Content
import DOMPurify from 'dompurify';
// User submits HTML content (like from a rich text editor)
const clean = DOMPurify.sanitize(userHtml);
// Removes script tags, event handlers, dangerous attributes
When Not to Rely on Sanitization
- SQL queries: Use parameterized queries, not escaping
- Shell commands: Avoid shell entirely, use library APIs
- File paths: Use allowlists, not sanitization
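As a rough sketch of the allowlist approach for file paths: the user never supplies a path, only a key that is looked up in a fixed table. `REPORT_FILES`, `resolveReportPath`, and the paths themselves are hypothetical.

```javascript
// Hypothetical allowlist: the only files this endpoint may serve.
const REPORT_FILES = new Map([
  ['monthly', '/srv/reports/monthly.pdf'],
  ['annual', '/srv/reports/annual.pdf'],
]);

// Returns the fixed path for a known key, or null for anything else.
// The user-supplied value is never concatenated into a path.
function resolveReportPath(requestedName) {
  return REPORT_FILES.get(requestedName) ?? null;
}

resolveReportPath('monthly');          // '/srv/reports/monthly.pdf'
resolveReportPath('../../etc/passwd'); // null
```

Because traversal strings such as `../` simply fail the lookup, there is nothing to sanitize and no bypass to find in the cleaning logic.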
Sanitization is not foolproof. Attackers find bypasses. Use sanitization as defense-in-depth, not your only protection. Combine with validation, parameterized queries, and output encoding.
What is the difference between sanitization and validation?
Validation checks if input is correct (right format, expected values) and rejects bad input. Sanitization modifies input to make it safe (removing or encoding dangerous characters). Validation says yes or no. Sanitization cleans and allows. Use both: validate first, then sanitize if needed.
Should I sanitize input or output?
Both, but for different purposes. Sanitize input when you need to clean data for storage or processing. Encode output when rendering into a specific context (HTML, JavaScript, URLs); for SQL, use parameterized queries rather than encoding. Output encoding is often more reliable because you know the exact context and can apply appropriate encoding.
Can sanitization replace parameterized queries?
No. Parameterized queries separate data from code structurally. Sanitization tries to clean data but can miss edge cases. Always use parameterized queries for SQL. Sanitization is a defense-in-depth measure, not a replacement for proper data handling.
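To make the structural separation concrete, here is a minimal sketch. It assumes a database client `db` that exposes `query(text, values)` with `$1`-style placeholders, which is the shape used by drivers such as node-postgres; `findUserByEmail` and the `users` table are made up for the example.

```javascript
// `db` is a hypothetical client exposing query(text, values);
// the table and column names are assumptions for illustration.
async function findUserByEmail(db, email) {
  // $1 is a placeholder: the SQL text is fixed, and `email` travels
  // separately as data, so it is never parsed as SQL.
  const text = 'SELECT id, email FROM users WHERE email = $1';
  return db.query(text, [email]);
}
```

Even an input like `x' OR '1'='1` leaves the query text unchanged; it is simply compared against the `email` column as a literal string.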