What is Input Sanitization? Cleaning User Data

TL;DR

Sanitization removes or encodes dangerous characters in user input. Unlike validation, which rejects bad input, sanitization cleans input so it can be used safely. When displaying user input in HTML, encode special characters. When users submit HTML content, use a library like DOMPurify to strip dangerous tags. Always use parameterized queries for SQL. Sanitization is a defense-in-depth measure, not a replacement for proper data handling.

The Simple Explanation

Users submit data, and some of it may contain malicious code. Sanitization cleans that data before you use it. If a username field contains <script>alert('hack')</script>, sanitization might strip the tags or encode them so they display as text instead of executing.

Sanitization vs Validation

Aspect       | Validation                 | Sanitization
What it does | Checks if input is correct | Cleans input to make it safe
Bad input    | Reject it                  | Modify it
Example      | Email must have @          | Remove script tags
When to use  | Always, first              | After validation if needed
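The difference between the two can be sketched in a few lines. This is a minimal illustration, not a complete implementation: isValidEmail, sanitizeDisplayName, and handleSignup are hypothetical names, and the email pattern is deliberately loose.

```javascript
// Hypothetical helpers for illustration; the rules are deliberately simple.

// Validation: answers yes or no and never changes the input.
function isValidEmail(input) {
  // Loose format check: something@something.something, no whitespace.
  return typeof input === 'string' && /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input);
}

// Sanitization: returns a cleaned copy of the input.
function sanitizeDisplayName(input) {
  return String(input)
    .replace(/\s+/g, ' ')                  // collapse runs of whitespace
    .replace(/[\u0000-\u001F\u007F]/g, '') // drop remaining control characters
    .trim();
}

function handleSignup(form) {
  // Validate first: bad input is rejected outright.
  if (!isValidEmail(form.email)) {
    return { ok: false, error: 'Invalid email address' };
  }
  // Then sanitize the fields that are kept.
  return { ok: true, email: form.email, name: sanitizeDisplayName(form.name) };
}
```

The validator never modifies the email, and the sanitizer never rejects the name. Keeping the two roles separate makes it easier to see which rule applies to which field.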

Common Sanitization Tasks

HTML Output Encoding

Encode for HTML display

// User input: <script>alert('xss')</script>
// After encoding: &lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;
// Displays as text, does not execute

// In React, JSX escapes interpolated values automatically:
{userInput}
// Safe by default when rendered as text; dangerouslySetInnerHTML bypasses this escaping
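Outside a framework that escapes for you, HTML encoding can be sketched as a small function. This is a minimal version, assuming the value is placed in HTML text or inside a quoted attribute; escapeHtml and HTML_ESCAPES are illustrative names, and a template engine's built-in escaping is usually the better choice.

```javascript
// Characters with special meaning in HTML and their entity replacements.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Replace each special character with its entity so the browser
// treats the value as text rather than markup.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

const encoded = escapeHtml("<script>alert('xss')</script>");
// encoded === "&lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;"
```

This covers HTML text and quoted attribute values only. Values placed inside script blocks, URLs, or CSS need encoding specific to those contexts.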

Rich Text / HTML Content

Using DOMPurify

import DOMPurify from 'dompurify';

// User submits HTML content (like from a rich text editor)
const clean = DOMPurify.sanitize(userHtml);
// Removes script tags, event handlers, dangerous attributes

When Not to Rely on Sanitization

  • SQL queries: Use parameterized queries, not escaping
  • Shell commands: Avoid shell entirely, use library APIs
  • File paths: Use allowlists, not sanitization
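As an illustration of the file path case, an allowlist maps a small set of known names to fixed paths, so user input is never turned into a path at all. The sketch below uses hypothetical names and paths (REPORT_FILES, resolveReportPath, /srv/reports/...).

```javascript
// Hypothetical allowlist: the only files this code is willing to serve.
const REPORT_FILES = new Map([
  ['monthly', '/srv/reports/monthly.pdf'],
  ['annual', '/srv/reports/annual.pdf'],
]);

// Look the requested name up in the allowlist. The user-supplied string
// is only ever used as a lookup key, never joined into a path.
function resolveReportPath(requestedName) {
  const filePath = REPORT_FILES.get(requestedName);
  if (filePath === undefined) {
    throw new Error('Unknown report');
  }
  return filePath;
}
```

A request for '../../etc/passwd' fails because it is not a key in the map. There is no traversal sequence to strip and no cleaning step to bypass.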

Sanitization is not foolproof. Attackers find bypasses. Use sanitization as defense-in-depth, not your only protection. Combine with validation, parameterized queries, and output encoding.

What is the difference between sanitization and validation?

Validation checks if input is correct (right format, expected values) and rejects bad input. Sanitization modifies input to make it safe (removing or encoding dangerous characters). Validation says yes or no. Sanitization cleans and allows. Use both: validate first, then sanitize if needed.

Should I sanitize input or output?

Both, but for different purposes. Sanitize input when you need to clean data for storage or processing. Encode output when rendering for a specific context (HTML, JavaScript, URLs). Output encoding is often more reliable because you know the exact context and can apply the appropriate encoding. SQL is the exception: use parameterized queries there rather than encoding.
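To show why context matters, here is a small sketch of encoding at output time for a URL query parameter, using the built-in encodeURIComponent. The buildSearchUrl name and the /search path are assumptions for illustration.

```javascript
// Encode a raw value at the moment it is placed into a URL query string.
// The stored value stays unmodified; only the output is encoded.
function buildSearchUrl(query) {
  return '/search?q=' + encodeURIComponent(query);
}

const url = buildSearchUrl('cats & dogs');
// url === '/search?q=cats%20%26%20dogs'
```

The same raw string would need HTML entity encoding, not percent-encoding, if it were rendered as page text. That is why encoding is applied per output context instead of once at input.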

Can sanitization replace parameterized queries?

No. Parameterized queries separate data from code structurally. Sanitization tries to clean data but can miss edge cases. Always use parameterized queries for SQL. Sanitization is a defense-in-depth measure, not a replacement for proper data handling.
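The structural separation can be sketched as follows. The db object is a hypothetical client with a query(text, values) method, and the $1 placeholder follows the PostgreSQL convention used by clients such as node-postgres; other databases use different placeholder syntax.

```javascript
// `db` is a hypothetical client exposing query(text, values).
// The SQL text is a fixed string; the user-supplied name travels
// separately in the values array and is never concatenated into it.
function findUserByName(db, name) {
  const text = 'SELECT id, name FROM users WHERE name = $1';
  return db.query(text, [name]);
}
```

Even if name contains quotes or SQL keywords, the query text sent to the database never changes, so there is nothing for a sanitizer to miss.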
