Summary
ModSecurity v2, v3 (libmodsecurity3), and Coraza all automatically URL-decode the values stored in the ARGS, ARGS_GET, ARGS_POST, and ARGS_NAMES collections before exposing them to rules — a behaviour inherited from Apache's argument parsing. These are often called "cooked" variables.
When a rule author applies t:urlDecodeUni or t:urlDecode on top of a cooked variable, double URL decoding occurs. This request proposes the addition of four complementary "raw" variables — ARGS_RAW, ARGS_GET_RAW, ARGS_POST_RAW, and ARGS_NAMES_RAW — that expose the original, un-decoded form of each argument, giving rule authors a reliable foundation to reason about encoding without hidden side effects.
This is a direct follow-up to issue #2118 (originally filed in the SpiderLabs/ModSecurity tracker) and includes a prototype patch authored by @airween for ModSecurity v2:
https://github.com/SpiderLabs/ModSecurity/compare/v2/master...airween:v2/args_raw?expand=1
Problem Description
1. Variables are silently pre-decoded
When ModSecurity parses a URL-encoded query string or application/x-www-form-urlencoded body, the values stored in the standard argument collections are the already URL-decoded forms. For example, given:
GET /search?q=%3Cscript%3E HTTP/1.1
The value stored in ARGS:q is <script>, not the original %3Cscript%3E.
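The effect can be reproduced outside the WAF. The following minimal sketch uses Python's standard `urllib.parse` as a stand-in for the engine's argument parser (an assumption for illustration only; it is not ModSecurity code) to contrast the decoded value a rule sees with the bytes that were on the wire. The names `cooked` and `raw` are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit

request_target = "/search?q=%3Cscript%3E"
query = urlsplit(request_target).query

# "Cooked" view: parse_qsl percent-decodes names and values, analogous to ARGS.
cooked = dict(parse_qsl(query, keep_blank_values=True))

# "Raw" view: split on & and = only, leaving percent-escapes untouched.
raw = dict(pair.partition("=")[::2] for pair in query.split("&") if pair)

print(cooked["q"])  # <script>
print(raw["q"])     # %3Cscript%3E
```

The raw view is what the proposed ARGS_GET_RAW collection would hold for this request.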
This behaviour mirrors how Apache and Nginx expose parsed arguments to modules, and has historically been a design goal. However, it creates a significant problem at the rule-writing level.
2. t:urlDecodeUni causes double decoding
Many CRS rules — and custom rules written by operators — apply t:urlDecodeUni to catch evasion attempts that rely on URL encoding:
SecRule ARGS "@rx (?i)<script" \
"id:12345,phase:2,block,t:none,t:urlDecodeUni"
Since ARGS is already decoded, t:urlDecodeUni runs on an already-decoded value. This means:
| What the attacker sends | ARGS value (after engine decoding) | After t:urlDecodeUni |
|---|---|---|
| %3Cscript%3E | <script> | <script> ✅ (correct) |
| %253Cscript%253E (double-encoded) | %3Cscript%3E | <script> ✅ (correct by accident) |
| %25script | %script | %script ✅ (correct) |
| search%20term | search term | search term ✅ (correct) |
So far, so good. But consider a case from issue #807 in the CRS repository where an Apache user submitted a password containing a literal % sign:
| What the user sends | Apache decodes %25 → % | ARGS value | After t:urlDecodeUni |
|---|---|---|---|
| Secret%2500 | Secret%00 | Secret%00 | Secret\x00 ❌ False positive: null byte! |
The user submitted %25 (the URL encoding of %) and Apache decoded it to %. ModSecurity then received %00 in ARGS and t:urlDecodeUni interpreted it as a null byte — even though the original input was entirely benign.
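The two decoding passes can be traced step by step. This is a minimal sketch that uses Python's `urllib.parse.unquote` as an approximation of both the engine's pre-decoding and t:urlDecodeUni (an assumption: `unquote` does not implement the `%uXXXX` handling of t:urlDecodeUni, which is irrelevant for this input).

```python
from urllib.parse import unquote

sent = "Secret%2500"                   # what the client put on the wire
args_value = unquote(sent)             # first decode, done before rules run
after_transform = unquote(args_value)  # second decode, standing in for t:urlDecodeUni

print(repr(args_value))       # 'Secret%00'
print(repr(after_transform))  # 'Secret\x00'
```

The first pass already yields the password the user intended; only the second pass manufactures the null byte.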
3. Security consequences of double decoding
The double-decoding problem has three distinct security consequences:
a) False Positives
Legitimate percent-signs in arguments (passwords, tokens, search queries containing %) can be treated as URL-encoded sequences after the engine has already decoded them, triggering incorrect rule matches.
b) Detection Gaps (False Negatives)
Rule authors who know about the pre-decoding may deliberately omit t:urlDecodeUni to avoid double decoding. This correctly avoids false positives but introduces a detection gap: the rule then sees only what the engine's single decoding pass produces. A double-URL-encoded evasion attempt (e.g., %253Cscript%253E for <script>) reaches the rule as %3Cscript%3E, and because the rule applies no further normalization, a pattern such as <script never matches.
c) Inconsistency Across Deployments
The degree of pre-decoding depends on the integration layer:
| Integration | ARGS pre-decoded? |
|---|---|
| ModSecurity + Apache | ✅ Yes (Apache decodes query strings) |
| ModSecurity + Nginx | ✅ Yes (Nginx decodes query strings) |
| libmodsecurity3 standalone | |
| Coraza + Caddy | ✅ Yes |
| Coraza + Envoy (proxy-wasm) | ✅ Yes |
This means a rule that behaves correctly on Apache may behave differently in a standalone libmodsecurity3 deployment, making portable rule development unreliable.
Proposed Solution
Introduce four new variable collections that expose the original, pre-parsing form of each argument:
| New Variable | Corresponding Existing Variable | Contents |
|---|---|---|
| ARGS_GET_RAW | ARGS_GET | Raw query string argument values (URL-encoded form) |
| ARGS_POST_RAW | ARGS_POST | Raw application/x-www-form-urlencoded body values |
| ARGS_RAW | ARGS | Union of ARGS_GET_RAW and ARGS_POST_RAW |
| ARGS_NAMES_RAW | ARGS_NAMES | Raw argument names (URL-encoded form) |
The new variables must:
- Preserve the original encoded form exactly as it arrived in the HTTP request, before any URL decoding by the engine or host web server.
- Support the same key-based access syntax as their counterparts: ARGS_RAW:fieldname, ARGS_GET_RAW:q, etc.
- Support the same exclusion/inclusion syntax: !ARGS_RAW:safe_field, ARGS_RAW:/regex/.
- Populate at the same phase as their counterparts (phase 2 for POST body, phase 1 for GET args available in headers phase).
- Not be pre-decoded by the engine — ever. The variable must contain the literal bytes from the wire.
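The requirements above can be sketched as a parser that splits the encoded string once and records each pair twice. This is an illustrative Python model, not the prototype patch: `parse_args` and all variable names are hypothetical, and `unquote_plus` stands in for the engine's own decoder.

```python
from urllib.parse import unquote_plus

def parse_args(encoded: str):
    """Split a urlencoded string once and keep both raw and decoded pairs.

    Hypothetical helper: returns (raw_pairs, cooked_pairs) as lists so that
    repeated argument names are preserved in order.
    """
    raw_pairs, cooked_pairs = [], []
    for pair in encoded.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        raw_pairs.append((name, value))  # literal text from the wire
        cooked_pairs.append((unquote_plus(name), unquote_plus(value)))
    return raw_pairs, cooked_pairs

args_get_raw, args_get = parse_args("q=%253Cscript%253E&user%5Bname%5D=a+b")
args_post_raw, args_post = parse_args("pw=Secret%2500")

# ARGS_RAW is the union of the GET and POST raw collections.
args_raw = args_get_raw + args_post_raw
args_names_raw = [name for name, _ in args_raw]

print(args_raw)
print(args_names_raw)
```

Keeping the raw pair costs one extra stored string per name and value, and the decoded collections are built exactly as before, so existing rules are unaffected.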
Prototype
@airween produced a working prototype patch for ModSecurity v2 in 2019:
https://github.com/SpiderLabs/ModSecurity/compare/v2/master...airween:v2/args_raw?expand=1
This demonstrates the implementation is feasible and provides a concrete reference for the v3 implementation.
Usage Example
With the new variables, rule authors can write encoding-aware rules without ambiguity:
Detecting URL-encoded XSS evasion against the raw value
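# Use ARGS_RAW to inspect the value exactly as it arrived, before any decoding.
# t:urlDecodeUni would be safe to add because ARGS_RAW has NOT been pre-decoded;
# this rule omits it so that the pattern matches the encoded form directly.
# The capture action is needed so that TX.0 is populated for logdata.
SecRule ARGS_RAW "@rx (?i)%3[cC]script" \
"id:20001,phase:2,block,capture,t:none,\
msg:'URL-encoded XSS in raw argument',\
logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}'"
Detecting double URL encoding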
# After one round of decoding on the RAW value, look for still-encoded sequences.
# This detects %253C (double-encoded <), which decodes once to %3C.
# The capture action is needed so that TX.0 is populated for logdata.
SecRule ARGS_RAW "@rx %[0-9a-fA-F]{2}" \
"id:20002,phase:2,block,capture,t:none,t:urlDecodeUni,\
msg:'Double URL Encoding Detected',\
logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}'"
OWASP CRS integration
The OWASP CRS currently uses ARGS with t:urlDecodeUni in many detection rules. With ARGS_RAW available, CRS could either:
- Switch detection rules to use ARGS_RAW + t:urlDecodeUni. The transformation is then semantically meaningful and non-redundant regardless of the host integration.
- Add complementary rules using ARGS_RAW for encoding-specific detections (double encoding, encoding of dangerous characters) without touching existing rules.
This resolves a long-standing portability issue in CRS rule development.
Impact Assessment
Backward Compatibility
No breaking change. The new variables are additive. All existing rules using ARGS, ARGS_GET, ARGS_POST, and ARGS_NAMES continue to work exactly as before.
Performance
The raw values are available during HTTP request parsing before URL decoding — storing them requires one additional string copy per argument, comparable to the overhead of the existing collection. No additional parsing or transformation is required.
Rule Portability
With ARGS_RAW available as a first-class variable, rule authors can write transformations that are unambiguous regardless of the integration layer. t:urlDecodeUni on ARGS_RAW means exactly one round of URL decoding, always, on every engine, on every web server.
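To make the "exactly one round" claim concrete, the sketch below runs the sample inputs from the tables earlier in this issue through one and two decoding passes. It again uses Python's `urllib.parse.unquote` as an approximation of URL decoding (an assumption; it is not the engine's decoder), and the `results` dictionary is a hypothetical name.

```python
from urllib.parse import unquote

samples = [
    "%3Cscript%3E",
    "%253Cscript%253E",
    "%25script",
    "search%20term",
    "Secret%2500",
]

results = {}
for wire in samples:
    raw_once = unquote(wire)                # ARGS_RAW plus one decode
    cooked_twice = unquote(unquote(wire))   # pre-decoded ARGS plus one more decode
    results[wire] = (raw_once, cooked_twice)
    marker = "same" if raw_once == cooked_twice else "DIFFERS"
    print(f"{wire!r}: {raw_once!r} vs {cooked_twice!r} ({marker})")
```

The two columns diverge only for inputs that still contain a valid percent-escape after the first pass, which are precisely the cases where the rule author needs to know how many decoding rounds have run.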
Related Issues and References
- ModSecurity issue #2118, "'urlDecode|urlDecodeUni' transformations replaces the decoded strings": original report and CRS team discussion.
- CRS issue SpiderLabs/owasp-modsecurity-crs#807, "Using a URL Encoded Percent sign, followed by hex digits other than 20-7e produces a false positive": false positive from double decoding of a literal %.
- CRS PR SpiderLabs/owasp-modsecurity-crs#578, "Add urlDecodeUni() operation to ARG/ARGS_NAMES": discussion about urlDecodeUni on ARGS collections.
- airween's v2 prototype patch: https://github.com/SpiderLabs/ModSecurity/compare/v2/master...airween:v2/args_raw?expand=1
- Coraza parallel feature request: corazawaf/coraza#1491, "Add ARGS_RAW, ARGS_GET_RAW, ARGS_POST_RAW, and ARGS_NAMES_RAW Collections".
Requested Action
- Implement ARGS_RAW, ARGS_GET_RAW, ARGS_POST_RAW, and ARGS_NAMES_RAW in libmodsecurity3, using the airween v2 patch as a reference implementation.
- Update the ModSecurity Reference Manual to document the new variables, their semantics, and guidance on when to use them vs. the cooked variables.
- Coordinate with the Coraza project to ensure the same variables are implemented with consistent semantics.
- Notify the OWASP CRS team once the feature is available so CRS rules can be updated to take advantage of the improved encoding-aware variable access.