Feature Request: Add ARGS_RAW, ARGS_GET_RAW, ARGS_POST_RAW, and ARGS_NAMES_RAW Variables #3501

@fzipi

Summary

ModSecurity v2, v3 (libmodsecurity3), and Coraza all automatically URL-decode the values stored in the ARGS, ARGS_GET, ARGS_POST, and ARGS_NAMES collections before exposing them to rules — a behaviour inherited from Apache's argument parsing. These are often called "cooked" variables.

When a rule author applies t:urlDecodeUni or t:urlDecode on top of a cooked variable, double URL decoding occurs. This request proposes adding four complementary "raw" variables (ARGS_RAW, ARGS_GET_RAW, ARGS_POST_RAW, and ARGS_NAMES_RAW) that expose the original, un-decoded form of each argument, giving rule authors a reliable foundation to reason about encoding without hidden side effects.

This is a direct follow-up to issue #2118 (originally filed in the SpiderLabs/ModSecurity tracker) and includes a prototype patch authored by @airween for ModSecurity v2:
https://github.com/SpiderLabs/ModSecurity/compare/v2/master...airween:v2/args_raw?expand=1


Problem Description

1. Variables are silently pre-decoded

When ModSecurity parses a URL-encoded query string or application/x-www-form-urlencoded body, the values stored in the standard argument collections are the already URL-decoded forms. For example, given:

GET /search?q=%3Cscript%3E HTTP/1.1

The value stored in ARGS:q is <script>, not the original %3Cscript%3E.

This behaviour mirrors how Apache and Nginx expose parsed arguments to modules, and has historically been a design goal. However, it creates a significant problem at the rule-writing level.
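The effect can be reproduced outside the engine. The snippet below is only an illustration, assuming Python's `urllib.parse` as a stand-in for the argument parser; it is not ModSecurity code.

```python
from urllib.parse import parse_qs, urlsplit

# Request target from the example above.
target = "/search?q=%3Cscript%3E"

query = urlsplit(target).query   # query string exactly as sent
args = parse_qs(query)           # percent-decoding happens during parsing

print(query)      # q=%3Cscript%3E
print(args["q"])  # ['<script>']
```

The parsed collection only ever holds the decoded form; the encoded form is recoverable only from the unparsed query string.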

2. t:urlDecodeUni causes double decoding

Many CRS rules — and custom rules written by operators — apply t:urlDecodeUni to catch evasion attempts that rely on URL encoding:

SecRule ARGS "@rx (?i)<script" \
    "id:12345,phase:2,block,t:none,t:urlDecodeUni"

Since ARGS is already decoded, t:urlDecodeUni runs on an already-decoded value. This means:

| What the attacker sends | ARGS value (after engine decoding) | After t:urlDecodeUni |
|---|---|---|
| %3Cscript%3E | <script> | <script> ✅ (correct) |
| %253Cscript%253E (double-encoded) | %3Cscript%3E | <script> ✅ (correct by accident) |
| %25script | %script | %script ✅ (correct) |
| search%20term | search term | search term ✅ (correct) |

So far, so good. But consider a case from issue #807 in the CRS repository where an Apache user submitted a password containing a literal % sign:

| What the user sends | Apache decodes %25 → % | ARGS value | After t:urlDecodeUni |
|---|---|---|---|
| Secret%2500 | Secret%00 | Secret%00 | Secret\x00 ❌ False positive (null byte) |

The user submitted %25 (the URL encoding of %) and Apache decoded it to %. ModSecurity then received %00 in ARGS and t:urlDecodeUni interpreted it as a null byte — even though the original input was entirely benign.
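The two decoding rounds can be traced step by step. This is a minimal sketch that uses Python's `urllib.parse.unquote` as an approximation of both the server-side decode and t:urlDecodeUni (the real transformation also handles %uXXXX sequences, which is not modelled here).

```python
from urllib.parse import unquote

sent = "Secret%2500"                   # the user's password is the literal text "Secret%00"
args_value = unquote(sent)             # first decode: web server / engine
after_transform = unquote(args_value)  # second decode: stand-in for t:urlDecodeUni

print(repr(args_value))       # 'Secret%00'
print(repr(after_transform))  # 'Secret\x00'
```

The null byte exists only after the second decode; it was never part of what the user typed.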

3. Security consequences of double decoding

The double-decoding problem has three distinct security consequences:

a) False Positives

Legitimate percent-signs in arguments (passwords, tokens, search queries containing %) can be treated as URL-encoded sequences after the engine has already decoded them, triggering incorrect rule matches.

b) Detection Gaps (False Negatives)

Rule authors who know about the pre-decoding may deliberately omit t:urlDecodeUni to avoid double decoding. This avoids the false positives but introduces a detection gap wherever the value reaching the rule is still encoded. On an integration that does not pre-decode, a single-URL-encoded evasion attempt (e.g., %3Cscript%3E for <script>) arrives unchanged, and a double-encoded payload (%253Cscript%253E) arrives as %3Cscript%3E on any integration. In both cases the rule applies no further normalization and does not match.

c) Inconsistency Across Deployments

The degree of pre-decoding depends on the integration layer:

| Integration | ARGS pre-decoded? |
|---|---|
| ModSecurity + Apache | ✅ Yes (Apache decodes query strings) |
| ModSecurity + Nginx | ✅ Yes (Nginx decodes query strings) |
| libmodsecurity3 standalone | ⚠️ Depends on host application |
| Coraza + Caddy | ✅ Yes |
| Coraza + Envoy (proxy-wasm) | ✅ Yes |

This means a rule that behaves correctly on Apache may behave differently in a standalone libmodsecurity3 deployment, making portable rule development unreliable.
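The portability problem can be simulated with a toy model. In the sketch below, `rule_matches` is a hypothetical stand-in for a SecRule with or without a URL-decode transformation, and the two "integrations" differ only in whether they decode the argument before the rule sees it.

```python
import re
from urllib.parse import unquote

XSS = re.compile(r"(?i)<script")

def rule_matches(value: str, url_decode: bool) -> bool:
    """Hypothetical rule: optionally URL-decode, then apply the regex."""
    if url_decode:
        value = unquote(value)
    return bool(XSS.search(value))

sent = "%3Cscript%3E"
seen_by_rule = {
    "pre-decoding integration": unquote(sent),  # rule receives '<script>'
    "non-decoding integration": sent,           # rule receives '%3Cscript%3E'
}

for name, value in seen_by_rule.items():
    without_t = rule_matches(value, url_decode=False)
    with_t = rule_matches(value, url_decode=True)
    print(f"{name}: without decode={without_t}, with decode={with_t}")
```

In this model the rule without a decode transformation matches on the pre-decoding integration and misses on the non-decoding one, while the rule with the transformation matches on both (at the cost of double decoding on the first).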


Proposed Solution

Introduce four new variable collections that expose the original, pre-parsing form of each argument:

| New Variable | Corresponding Existing Variable | Contents |
|---|---|---|
| ARGS_GET_RAW | ARGS_GET | Raw query string argument values (URL-encoded form) |
| ARGS_POST_RAW | ARGS_POST | Raw application/x-www-form-urlencoded body values |
| ARGS_RAW | ARGS | Union of ARGS_GET_RAW and ARGS_POST_RAW |
| ARGS_NAMES_RAW | ARGS_NAMES | Raw argument names (URL-encoded form) |

The new variables must:

  1. Preserve the original encoded form exactly as it arrived in the HTTP request, before any URL decoding by the engine or host web server.
  2. Support the same key-based access syntax as their counterparts: ARGS_RAW:fieldname, ARGS_GET_RAW:q, etc.
  3. Support the same exclusion/inclusion syntax: !ARGS_RAW:safe_field, ARGS_RAW:/regex/.
  4. Populate at the same phase as their counterparts: phase 1 for query-string arguments, which are available in the request-headers phase, and phase 2 for request-body arguments.
  5. Not be pre-decoded by the engine — ever. The variable must contain the literal bytes from the wire.
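As a rough illustration of these requirements, the sketch below builds the four raw collections from a query string and a form body without decoding anything. The function names and the dictionary layout are hypothetical and do not reflect the prototype patch or any engine's internals; the cooked ARGS collection is included only for comparison.

```python
from urllib.parse import unquote_plus

def split_pairs(urlencoded: str) -> list[tuple[str, str]]:
    """Split 'a=1&b=2' into (name, value) pairs without decoding anything."""
    pairs = []
    for item in urlencoded.split("&"):
        if item:
            name, _, value = item.partition("=")
            pairs.append((name, value))
    return pairs

def build_collections(query: str, body: str) -> dict[str, list]:
    """Hypothetical model of the proposed raw collections."""
    get_raw, post_raw = split_pairs(query), split_pairs(body)
    args_raw = get_raw + post_raw  # union of GET and POST arguments
    return {
        "ARGS_GET_RAW": get_raw,
        "ARGS_POST_RAW": post_raw,
        "ARGS_RAW": args_raw,
        "ARGS_NAMES_RAW": [name for name, _ in args_raw],
        # Cooked counterpart, decoded once, shown for comparison only.
        "ARGS": [(unquote_plus(n), unquote_plus(v)) for n, v in args_raw],
    }

c = build_collections("q=%3Cscript%3E", "pass%77ord=Secret%2500")
print(c["ARGS_RAW"])        # [('q', '%3Cscript%3E'), ('pass%77ord', 'Secret%2500')]
print(c["ARGS_NAMES_RAW"])  # ['q', 'pass%77ord']
print(c["ARGS"])            # [('q', '<script>'), ('password', 'Secret%00')]
```

Note that argument names are kept raw as well, so an encoded name such as pass%77ord is visible to rules in its original form.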

Prototype

@airween produced a working prototype patch for ModSecurity v2 in 2019:
https://github.com/SpiderLabs/ModSecurity/compare/v2/master...airween:v2/args_raw?expand=1

This demonstrates the implementation is feasible and provides a concrete reference for the v3 implementation.


Usage Example

With the new variables, rule authors can write encoding-aware rules without ambiguity:

Detecting URL-encoded XSS evasion against the raw value

# Use ARGS_RAW to inspect the original, un-decoded value.
# No decoding transformation is applied (t:none), so the pattern matches the
# literal percent-encoded bytes as they arrived on the wire.
# "capture" is required so that %{TX.0} in logdata is populated.
SecRule ARGS_RAW "@rx (?i)%3[cC]script" \
    "id:20001,phase:2,block,capture,t:none,\
    msg:'URL-encoded XSS in raw argument',\
    logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}'"

Detecting double URL encoding

# Apply one round of decoding to the RAW value, then look for sequences that are still encoded.
# %253C (double-encoded <) becomes %3C after that single decode and matches.
# "capture" is required so that %{TX.0} in logdata is populated.
SecRule ARGS_RAW "@rx %[0-9a-fA-F]{2}" \
    "id:20002,phase:2,block,capture,t:none,t:urlDecodeUni,\
    msg:'Double URL Encoding Detected',\
    logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}'"
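The decode-once-then-inspect idea can be checked in isolation. The sketch below assumes `urllib.parse.unquote` as an approximation of a single t:urlDecodeUni pass and looks for any remaining %XX sequence; `looks_double_encoded` is a hypothetical helper name.

```python
import re
from urllib.parse import unquote

STILL_ENCODED = re.compile(r"%[0-9a-fA-F]{2}")

def looks_double_encoded(raw_value: str) -> bool:
    """Decode the raw value once, then check for a remaining %XX sequence."""
    return bool(STILL_ENCODED.search(unquote(raw_value)))

for raw in ("%253Cscript%253E", "%3Cscript%3E", "search%20term", "Secret%2500"):
    print(raw, looks_double_encoded(raw))
```

Only the first and last inputs are flagged. The last one is the password from the issue #807 example: a literal percent sign followed by two hex digits is indistinguishable from double encoding under this check, so it remains a heuristic that operators may need to tune or exclude per field.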

OWASP CRS integration

The OWASP CRS currently uses ARGS with t:urlDecodeUni in many detection rules. With ARGS_RAW available, CRS could either:

  1. Switch detection rules to use ARGS_RAW + t:urlDecodeUni — the transformation is then semantically meaningful and non-redundant regardless of the host integration.
  2. Add complementary rules using ARGS_RAW for encoding-specific detections (double encoding, encoding of dangerous characters) without touching existing rules.

This resolves a long-standing portability issue in CRS rule development.


Impact Assessment

Backward Compatibility

No breaking change. The new variables are additive. All existing rules using ARGS, ARGS_GET, ARGS_POST, and ARGS_NAMES continue to work exactly as before.

Performance

The raw values are available during HTTP request parsing before URL decoding — storing them requires one additional string copy per argument, comparable to the overhead of the existing collection. No additional parsing or transformation is required.

Rule Portability

With ARGS_RAW available as a first-class variable, rule authors can write transformations that are unambiguous regardless of the integration layer. t:urlDecodeUni on ARGS_RAW means exactly one round of URL decoding, always, on every engine, on every web server.


Related Issues and References


Requested Action

  1. Implement ARGS_RAW, ARGS_GET_RAW, ARGS_POST_RAW, and ARGS_NAMES_RAW in libmodsecurity3, using the airween v2 patch as a reference implementation.
  2. Update the ModSecurity Reference Manual to document the new variables, their semantics, and guidance on when to use them vs. the cooked variables.
  3. Coordinate with the Coraza project to ensure the same variables are implemented with consistent semantics.
  4. Notify the OWASP CRS team once the feature is available so CRS rules can be updated to take advantage of the improved encoding-aware variable access.
