Quoting And Encoding: Best Practices For Handling Untrusted Strings

Escaping Untrusted Inputs to Prevent Security Issues

Why Escaping Untrusted Strings Matters

Failing to properly escape untrusted strings can lead to injection attacks, where malicious input is interpreted as code or commands by an application. Real-world examples include SQL injection attacks that obtain unauthorized database access, cross-site scripting (XSS) vulnerabilities that execute scripts in a victim’s browser, and operating system command injection by concatenating strings into commands. These common weaknesses can compromise confidentiality, integrity, and availability of systems and data, with far-reaching business and legal implications.

Injection Attacks Overview

Injection attacks exploit applications that mishandle untrusted input by letting it be interpreted as executable code or commands. For example, an application might insert user input directly into an SQL query. By adding extra SQL syntax to their input, attackers can change the meaning of the query to read or modify data they are not authorized to access. Similarly, passing untrusted strings unmodified into operating system commands can let attackers run arbitrary commands with the application’s privileges. XSS works by getting malicious script into web pages viewed by other users. In each case, the root cause is the failure to properly delimit and escape untrusted strings before using them in sensitive contexts.

Real-World Examples of Vulnerabilities

Major data breaches have resulted from injection attacks. SQL injection was the entry point for the Heartland Payment Systems compromise disclosed in 2009 and for the 2015 TalkTalk breach. The 2017 Equifax breach stemmed from a different kind of injection: an unpatched Apache Struts vulnerability that let attackers inject expressions the server then evaluated. Not every headline breach belongs in this category; Uber’s 2022 incident, for instance, began with social engineering of a contractor’s credentials, not an injection flaw. PHP applications have been frequent targets, with XSS in WordPress plugins and remote command execution flaws in vBulletin among the examples. Attackers actively scan for injection flaws and quickly weaponize them once discovered.

Costs of Failures to Escape Properly

Exploited injection vulnerabilities impose substantial costs, both direct and indirect. Breaches lead to investigation and remediation expenses, legal and regulatory fines, and loss of customer trust. Equifax reportedly allocated $400 million for breach-related costs in the year following its 2017 compromise, which exploited an injection flaw in Apache Struts. Some estimates put the collective long-term cost of major breaches over the past decade above $2.7 trillion. While retaining and transmitting data always carries some risk, proper encoding and escaping of untrusted strings provides a readily available layer of defense against injection techniques.

Methods for Escaping Untrusted Strings

Using Prepared Statements for Databases

Prepared statements, also called parameterized queries, are the standard technique for safely passing external values into database queries from application code. The database parses and plans the query template once, and the values bound to its placeholders travel separately, so they are always treated as data and never interpreted as SQL. Leading APIs, frameworks, and ORMs provide built-in support for prepared statements. Using them consistently defeats even complex SQL injection payloads because no untrusted string is ever concatenated into the query text. Placeholders stand in for values only; identifiers such as table or column names cannot be bound and should be chosen from a fixed allowlist instead.
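
As a rough illustration of the difference between concatenation and binding, the following Python sketch uses the standard sqlite3 module with an in-memory database. The users table and the function names are hypothetical and exist only for this example; other database drivers use different placeholder styles.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical schema used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: the input becomes part of the SQL text itself.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the SQL text is fixed and the value is bound separately.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # the OR clause matches every row
    print(find_user_safe(conn, payload))    # no user has that literal name
```

With the payload shown, the concatenated query returns both rows, while the bound version looks for a user whose name is literally the payload string and finds none.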

Parameterizing OS Command Invocations

Like prepared statements for queries, code that invokes operating system commands should pass a fixed program name together with a separate array of arguments, instead of building a single command string for a shell to parse. Environment variables are another injection vector: user input should be validated before it is placed in an environment variable that a command string later expands. Where possible, developers should use platform APIs that execute the program directly, without a shell, so that no escaping is required. When a shell is unavoidable, apply the language’s shell-quoting function to every argument instead of relying on hand-written escaping.
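
A minimal Python sketch of the argument-array approach follows, using subprocess.run from the standard library. To stay portable it launches the Python interpreter itself as a stand-in child process that simply prints back the argument it received; echo_argument is a hypothetical name for this example.

```python
import subprocess
import sys


def echo_argument(untrusted: str) -> str:
    # The command is a list: each element reaches the child as one argv entry.
    # No shell parses the string, so ;, &&, $() and quotes stay literal.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    print(echo_argument("alice; echo pwned"))
```

The child receives the whole string, metacharacters included, as a single argument. With real tools it is also worth placing "--" before untrusted arguments where the tool supports it, so that a value beginning with "-" is not read as an option.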

Encoding Special Characters in Outputs

Cross-site scripting flaws arise when applications fail to encode special characters on output, allowing injected scripts to execute in the browsers that render the page. When dynamically generating HTML, PDFs, emails, and other outputs, context-specific encoding that escapes <, >, &, ", ' and related characters keeps XSS payloads from surviving into the rendered content as markup. OWASP documents the escaping required for each interpreter context, such as HTML body text, attribute values, JavaScript, CSS, and URLs. Auto-escaping templates and frameworks that track output context automate this defense. A little escaping goes a long way toward blocking script injection through an application’s outputs.

Encoding Strings for Safety

HTML Encoding to Prevent XSS

HTML escaping encodes special characters into entity references such as &lt; that render harmlessly in browsers. Marking suspect input as safe through a framework’s bypass mechanism reintroduces the vulnerability, whereas consistent escaping provides ongoing protection. Escaping functions such as PHP’s htmlspecialchars() and htmlentities(), and similar utilities in other languages, handle the details reliably for HTML body text and quoted attribute values. Auto-escaping templates help developers apply the correct encoding for each context throughout the application stack. Encoding too early in a workflow risks double-encoding, so escape at the point of output. HTML entity encoding lets valid data display as written while blocking XSS through those channels.
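
To make the idea concrete, here is a small Python sketch built on html.escape from the standard library. The render_comment function and its markup are hypothetical; a real application would normally rely on an auto-escaping template engine instead of string concatenation.

```python
from html import escape


def render_comment(author: str, body: str) -> str:
    # escape() rewrites &, < and >, and with quote=True (the default)
    # also " and ', so the values are safe in text and quoted attributes.
    return (
        '<div class="comment" data-author="' + escape(author) + '">'
        + escape(body)
        + "</div>"
    )


if __name__ == "__main__":
    print(render_comment('a"b', "<script>alert(1)</script>"))
    # Escaping twice shows the double-encoding problem:
    print(escape(escape("Tom & Jerry")))
```

The second call illustrates why escaping belongs at the output step only: a value escaped once on input and again on output would display "&amp;" to the user instead of "&".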

Percent Encoding for URLs and Queries

Percent encoding replaces characters outside the unreserved set (letters, digits, -, ., _, and ~) with %XX hexadecimal representations of their bytes. By encoding reserved characters, it prevents untrusted values from injecting extra parameters or path segments into URLs, cookies, and API requests. A space becomes %20, a slash becomes %2F, and so on, as defined in RFC 3986. In JavaScript, use encodeURIComponent() for individual parameter values; encodeURI() is meant for whole URLs and leaves reserved characters such as &, =, ?, and / unescaped. Encode exactly once, at the point where the URL is assembled, to avoid double-encoding while still blocking attacks. By percent-encoding all external data that flows into URL parameters and query strings, applications keep malicious inputs from altering how a URL is interpreted.
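
The following Python sketch shows one way this might look with urllib.parse from the standard library. The function names and the example.com URLs are placeholders for illustration only.

```python
from urllib.parse import quote, urlencode


def build_search_url(base: str, term: str, page: int) -> str:
    # urlencode() percent-encodes each key and value, so "&" or "=" inside
    # term cannot start a new parameter. Spaces become "+" in query strings.
    return base + "?" + urlencode({"q": term, "page": page})


def build_profile_url(base: str, username: str) -> str:
    # safe="" also encodes "/", keeping the value inside one path segment.
    return base + "/users/" + quote(username, safe="")


if __name__ == "__main__":
    print(build_search_url("https://example.com/search", "a&admin=1", 2))
    print(build_profile_url("https://example.com", "a b/c"))
```

In the first call, the attempted extra parameter admin=1 ends up inside the value of q instead of becoming a parameter of its own.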

Encoding for Filesystem Paths

Directory traversal attacks smuggle sequences such as ../ or ..\ into filesystem calls, sometimes in percent-encoded or otherwise obfuscated form, to reach files outside the intended directory. Language APIs usually provide path canonicalization and validation helpers. The most reliable defense is to decode the input once, resolve the resulting path to its canonical form, and confirm that it still lies inside the intended base directory. Allowing only allowlisted characters adds a further restriction, whereas stripping known malicious sequences is fragile on its own, because removing ../ from ....// leaves ../ behind. Canonicalization reduces unusual encodings to a single representation while still handling Unicode and spaces correctly. Explicit validation ensures that both decoding and filtering are applied to filepath inputs that may be disguising an attack.
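
A minimal sketch of the resolve-then-check approach in Python, assuming Python 3.9 or later for Path.is_relative_to. The resolve_under helper and the directory names are hypothetical.

```python
from pathlib import Path


def resolve_under(base_dir: str, untrusted_name: str) -> Path:
    base = Path(base_dir).resolve()
    # Joining and then resolving collapses ".." segments and follows symlinks.
    candidate = (base / untrusted_name).resolve()
    # Reject anything whose canonical form has left the base directory.
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes base directory")
    return candidate


if __name__ == "__main__":
    print(resolve_under("/srv/app/uploads", "reports/q1.txt"))
    try:
        resolve_under("/srv/app/uploads", "../../etc/passwd")
    except ValueError as exc:
        print("rejected:", exc)
```

Because the check runs on the canonical path, it does not need a list of dangerous sequences: an absolute path or any number of ".." segments is rejected for the same reason, namely that the result falls outside the base directory.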

Best Practices for Handling Untrusted Strings

Validate and Sanitize All Inputs

As the first consumer of untrusted data, an application’s input handling merits strong defenses. Validate format, length, and character encoding, convert values to their expected types, and sanitize before accepting input. Rejecting malformed encodings, illegal characters, and known malicious byte sequences at this stage lets the context-specific encoding applied later work on well-formed data. Normalizing and canonicalizing data into the form the application expects is also beneficial, and should happen before validation so that checks run on the final form. Good input hygiene blocks many injection payloads outright while preserving valid inputs, and passes cleaned values on to the parameterized handling described below. Validation complements escaping at the sink; it does not replace it.
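
As one possible shape for such checks, the Python sketch below normalizes, validates against an allowlist pattern, and converts types. The username policy, the quantity limit, and the function names are assumptions made up for the example, not recommendations for any particular application.

```python
import re
import unicodedata

# Hypothetical policy: 3 to 32 lowercase ASCII letters, digits, or underscores.
USERNAME_RE = re.compile(r"[a-z0-9_]{3,32}")


def parse_username(raw: str) -> str:
    # Normalize first so the check runs on the form the application will use.
    normalized = unicodedata.normalize("NFKC", raw)
    # fullmatch() requires the whole string to fit the allowlist pattern.
    if not USERNAME_RE.fullmatch(normalized):
        raise ValueError("invalid username")
    return normalized


def parse_quantity(raw: str, maximum: int = 1000) -> int:
    # Convert to the expected type, then enforce the allowed range.
    value = int(raw, 10)  # raises ValueError for input such as "12abc"
    if not 1 <= value <= maximum:
        raise ValueError("quantity out of range")
    return value


if __name__ == "__main__":
    print(parse_username("alice_01"))
    print(parse_quantity("42"))
```

Rejecting input that does not match is generally simpler to reason about than trying to repair it, and the validated value still goes through parameterization or escaping when it reaches a sink.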

Parameterize Instead of Concatenating

Keep executable strings separate from external data by using templates with placeholders, not concatenation. Prepared statements encapsulate database queries and confine parameters to the role of values. Similarly, OS commands should receive their arguments as arrays, with the program name and options drawn from an allowlist. Templating also works well for file path construction when combined with validation. Concatenation gives attackers a way to break out of the intended context and insert payloads. Parameterization removes that opportunity by fixing the boundary between code and data, so valid data is handled without exposure.

Context Matters: Choose Encoding Method Wisely

Each encoding targets a specific interpreter context and escapes the characters that are significant there. HTML encoding protects browsers but not databases or shells. Tracing how data flows through an application reveals the sinks that need an encoding defense. For example, XSS targets browser rendering and requires output encoding at that point, while path traversal arises in file access contexts. Encoding for the specific sink defends against the vulnerabilities that sink enables. Encoding too early risks double-encoding later. Matching the encoding to the context keeps interpretation consistent.
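
The point that one value needs different treatment in different sinks can be seen in a short Python sketch that uses only standard-library encoders. The encode_for dispatcher is a hypothetical helper for illustration, and shlex.quote targets POSIX shells only.

```python
import shlex
from html import escape
from urllib.parse import quote

# One encoder per sink; each escapes only what its own interpreter treats as special.
ENCODERS = {
    "html": lambda v: escape(v, quote=True),
    "url": lambda v: quote(v, safe=""),
    "shell": shlex.quote,
}


def encode_for(context: str, value: str) -> str:
    # An unknown context raises KeyError instead of falling back silently.
    return ENCODERS[context](value)


if __name__ == "__main__":
    value = "O'Reilly & <b>"
    for context in ENCODERS:
        print(context, "->", encode_for(context, value))
```

The three results differ from one another, and none of them would be a safe substitute for another: the HTML-encoded form still contains characters a shell would interpret, and the shell-quoted form still contains raw markup.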

Use Framework Utilities Whenever Possible

Web frameworks and platform APIs incorporate robust, built-in injection protections designed for their environment. Using framework escaping and validation functions raises the security posture considerably over ad hoc defenses. Context-aware templating encodes outputs and protects the templates themselves by avoiding raw inclusion. Mistakes can still slip past a framework, but its built-in filtering and escaping are harder to bypass than hand-rolled code and merit reliance. Extending framework defenses with input validation and a clear view of data flows combines strong security with rapid development.

Example Code Snippets for Proper Encoding

Prepared SQL Statements

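// Assumes $dbConnection is a mysqli connection and $userId holds untrusted input.
$stmt = $dbConnection->prepare('SELECT * FROM users WHERE id = ?');
$stmt->bind_param('i', $userId); // 'i' binds the value as an integer
$stmt->execute();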

Shell Command Parameterization

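// Assumes Symfony's Process component, which escapes each array element itself.
// Calling escapeshellarg() first would pass literal quote characters to grep.
$command = array('grep', '-i', '--', $userInput, '/etc/passwd'); // '--' ends option parsing
$process = new Process($command);
$process->run();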

JavaScript Output Encoding

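// textContent treats the value strictly as text, so markup in title is never parsed.
// An HTML sanitizer such as DOMPurify is needed only when assigning to innerHTML.
let div = document.createElement('div');
div.textContent = title;
document.body.appendChild(div);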

Common Mistakes to Avoid

Blindly Trusting User Input

Failing to recognize where untrusted strings enter an application lies at the root of injection dangers. Treat all inputs as contaminated until they have been validated. Neutralize payloads by escaping at sinks. Never include raw input in an executable context. Distrust inputs, and confine them to templated, parameterized use.

Forgetting to Encode for Specific Contexts

Each encoding targets a particular language or interpreter. Blocking XSS requires HTML entity encoding at output, since database escaping does nothing for the browser. Path traversal likewise needs its own specialized handling. Identify the data flows into vulnerable sinks and add encoding tailored to each context. Otherwise, injected input can manipulate the intended logic.

Disabling Escaping for Convenience

Bypassing framework escaping or allowlisting to make a task easier lets mistakes go undetected. Keep default defenses in place, even when they are inconvenient at first. If legitimate input is rejected for containing dangerous characters, redesign the feature instead of weakening the check. Disabling protective measures courts disaster. Keep context-aware encoding enabled without exception.