Feature Request: Add `ARGS_RAW`, `ARGS_GET_RAW`, `ARGS_POST_RAW`, and `ARGS_NAMES_RAW` Variables #3501

## Summary

ModSecurity v2, v3 (libmodsecurity3), and Coraza all automatically URL-decode the values stored in the `ARGS`, `ARGS_GET`, `ARGS_POST`, and `ARGS_NAMES` collections before exposing them to rules — a behaviour inherited from Apache's argument parsing. These are often called **"cooked" variables**.

When a rule author applies `t:urlDecodeUni` or `t:urlDecode` on top of a cooked variable, **double URL decoding** occurs. This request proposes the addition of four complementary **"raw" variables** — `ARGS_RAW`, `ARGS_GET_RAW`, `ARGS_POST_RAW`, and `ARGS_NAMES_RAW` — that expose the original, un-decoded form of each argument, giving rule authors a reliable foundation to reason about encoding without hidden side effects.

This is a direct follow-up to issue #2118 (originally filed in the SpiderLabs/ModSecurity tracker) and includes a prototype patch authored by [@airween](https://github.com/airween) for ModSecurity v2:  
https://github.com/SpiderLabs/ModSecurity/compare/v2/master...airween:v2/args_raw?expand=1

---

## Problem Description

### 1. Variables are silently pre-decoded

When ModSecurity parses a URL-encoded query string or `application/x-www-form-urlencoded` body, the values stored in the standard argument collections are the **already URL-decoded** forms. For example, given:

```
GET /search?q=%3Cscript%3E HTTP/1.1
```

The value stored in `ARGS:q` is `<script>`, not the original `%3Cscript%3E`.

This behaviour mirrors how Apache and Nginx expose parsed arguments to modules, and has historically been a design goal. However, it creates a significant problem at the rule-writing level.

### 2. `t:urlDecodeUni` causes double decoding

Many CRS rules — and custom rules written by operators — apply `t:urlDecodeUni` to catch evasion attempts that rely on URL encoding:

```apache
SecRule ARGS "@rx (?i)<script" \
    "id:12345,phase:2,block,t:none,t:urlDecodeUni"
```

Since `ARGS` is already decoded, `t:urlDecodeUni` runs on an **already-decoded value**. This means:

| What the attacker sends | `ARGS` value (after engine decoding) | After `t:urlDecodeUni` |
|---|---|---|
| `%3Cscript%3E` | `<script>` | `<script>` ✅ (correct) |
| `%253Cscript%253E` (double-encoded) | `%3Cscript%3E` | `<script>` ✅ (correct by accident) |
| `%25script` | `%script` | `%script` ✅ (correct) |
| `search%20term` | `search term` | `search term` ✅ (correct) |

So far, so good. But consider a case from issue #807 in the CRS repository where an Apache user submitted a password containing a literal `%` sign:

| What the user sends | Apache decodes `%25` → `%` | `ARGS` value | After `t:urlDecodeUni` |
|---|---|---|---|
| `Secret%2500` | `Secret%00` | `Secret%00` | `Secret\x00` ❌ False positive — null byte! |

The user submitted `%25` (the URL encoding of `%`) and Apache decoded it to `%`. ModSecurity then received `%00` in `ARGS` and `t:urlDecodeUni` interpreted it as a null byte — even though the original input was entirely benign.
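
The rows of both tables can be reproduced with a small sketch. It is only an approximation: Python's `urllib.parse.unquote` stands in both for the engine's decoding pass and for `t:urlDecodeUni` (it does not handle `%uXXXX` sequences), and the helper names `cooked` and `after_url_decode` are made up for illustration.

```python
from urllib.parse import unquote


def cooked(wire_value: str) -> str:
    """One decoding pass, standing in for the value the engine stores in ARGS."""
    return unquote(wire_value)


def after_url_decode(args_value: str) -> str:
    """A second decoding pass, standing in for t:urlDecodeUni applied by a rule."""
    return unquote(args_value)


samples = [
    "%3Cscript%3E",
    "%253Cscript%253E",
    "%25script",
    "search%20term",
    "Secret%2500",  # benign password "Secret%00" as sent by the browser
]

for wire in samples:
    args_value = cooked(wire)
    # repr() makes the null byte in the last row visible as \x00
    print(f"{wire!r:22} -> {args_value!r:18} -> {after_url_decode(args_value)!r}")
```

The last sample is the interesting one: the first pass yields the user's real password, and only the second pass turns it into something that looks malicious.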

### 3. Security consequences of double decoding

The double-decoding problem has **three distinct security consequences**:

#### a) False Positives
Legitimate percent-signs in arguments (passwords, tokens, search queries containing `%`) can be treated as URL-encoded sequences after the engine has already decoded them, triggering incorrect rule matches.

#### b) Detection Gaps (False Negatives)
Rule authors who know about the pre-decoding may deliberately *omit* `t:urlDecodeUni` to avoid double decoding. This avoids the false positives described above but introduces a detection gap: a double-encoded evasion attempt (e.g., `%253Cscript%253E`) reaches the rule as `%3Cscript%3E` after the engine's single decoding pass, and a rule that applies no further normalization never sees `<script>`. The same gap applies to single-encoded payloads on any integration where the arguments are not pre-decoded.

#### c) Inconsistency Across Deployments
The degree of pre-decoding depends on the integration layer:

| Integration | ARGS pre-decoded? |
|---|---|
| ModSecurity + Apache | ✅ Yes (Apache decodes query strings) |
| ModSecurity + Nginx | ✅ Yes (Nginx decodes query strings) |
| libmodsecurity3 standalone | ⚠️ Depends on host application |
| Coraza + Caddy | ✅ Yes |
| Coraza + Envoy (proxy-wasm) | ✅ Yes |

This means a rule that behaves correctly on Apache may behave differently in a standalone libmodsecurity3 deployment, making portable rule development unreliable.

---

## Proposed Solution

Introduce four new variable collections that expose the **original, pre-parsing form** of each argument:

| New Variable | Corresponding Existing Variable | Contents |
|---|---|---|
| `ARGS_GET_RAW` | `ARGS_GET` | Raw query string argument values (URL-encoded form) |
| `ARGS_POST_RAW` | `ARGS_POST` | Raw `application/x-www-form-urlencoded` body values |
| `ARGS_RAW` | `ARGS` | Union of `ARGS_GET_RAW` and `ARGS_POST_RAW` |
| `ARGS_NAMES_RAW` | `ARGS_NAMES` | Raw argument *names* (URL-encoded form) |

The new variables must:

1. **Preserve the original encoded form** exactly as it arrived in the HTTP request, before any URL decoding by the engine or host web server.
2. **Support the same key-based access syntax** as their counterparts: `ARGS_RAW:fieldname`, `ARGS_GET_RAW:q`, etc.
3. **Support the same exclusion/inclusion syntax**: `!ARGS_RAW:safe_field`, `ARGS_RAW:/regex/`.
4. **Populate at the same phase** as their counterparts (phase 1 for query-string arguments, phase 2 for request-body arguments).
5. **Not be pre-decoded** by the engine — ever. The variable must contain the literal bytes from the wire.
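
As a rough illustration of requirements 1 and 5, the following sketch splits a query string and a form body into name/value pairs without decoding anything, and builds the cooked view next to it. It is not taken from the prototype patch: the function names and the dictionary layout are assumptions made for this example, and `urllib.parse.unquote_plus` only approximates the engine's decoder.

```python
from urllib.parse import unquote_plus


def split_pairs(urlencoded: str) -> list[tuple[str, str]]:
    """Split on '&' and on the first '=' only; no byte is decoded here."""
    pairs = []
    for field in urlencoded.split("&"):
        if not field:
            continue  # skip empty fields such as a trailing '&'
        name, _, value = field.partition("=")
        pairs.append((name, value))
    return pairs


def build_collections(query_string: str, form_body: str = "") -> dict[str, list]:
    """Build hypothetical raw collections alongside the cooked ARGS view."""
    get_raw = split_pairs(query_string)
    post_raw = split_pairs(form_body)
    args_raw = get_raw + post_raw  # union of GET and POST arguments
    return {
        "ARGS_GET_RAW": get_raw,
        "ARGS_POST_RAW": post_raw,
        "ARGS_RAW": args_raw,
        "ARGS_NAMES_RAW": [name for name, _ in args_raw],
        # Cooked view for comparison: names and values decoded once.
        "ARGS": [(unquote_plus(n), unquote_plus(v)) for n, v in args_raw],
    }


collections = build_collections("q=%3Cscript%3E&user%5Bid%5D=7", "pw=Secret%2500")
for name, entries in collections.items():
    print(name, entries)
```

The raw collections cost one extra stored copy of each name and value; the split itself is the same work the parser already does before decoding.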

### Prototype

[@airween](https://github.com/airween) produced a working prototype patch for ModSecurity v2 in 2019:  
https://github.com/SpiderLabs/ModSecurity/compare/v2/master...airween:v2/args_raw?expand=1

This demonstrates the implementation is feasible and provides a concrete reference for the v3 implementation.

---

## Usage Example

With the new variables, rule authors can write encoding-aware rules without ambiguity:

### Detecting URL-encoded XSS evasion against the raw value

```apache
# Use ARGS_RAW to inspect the original, un-decoded value.
# No decoding transformation is applied: the pattern matches the encoded bytes as sent.
SecRule ARGS_RAW "@rx (?i)%3[cC]script" \
    "id:20001,phase:2,block,t:none,\
    msg:'URL-encoded XSS in raw argument',\
    logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}'"
```

### Detecting double URL encoding

```apache
# After exactly one round of decoding on the RAW value, look for still-encoded sequences.
# This detects %253C (double-encoded <), which decodes once to %3C.
SecRule ARGS_RAW "@rx %[0-9a-fA-F]{2}" \
    "id:20002,phase:2,block,t:none,t:urlDecodeUni,\
    msg:'Double URL Encoding Detected',\
    logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}'"
```
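
The check described in the rule comment (one decoding pass over the raw value, then a search for a remaining `%XX` sequence) can be sketched outside the engine as follows. This is an approximation under stated assumptions: `urllib.parse.unquote` stands in for `t:urlDecodeUni`, and `looks_double_encoded` is a hypothetical helper name.

```python
import re
from urllib.parse import unquote

# A percent sign followed by two hex digits that survives one decoding pass.
STILL_ENCODED = re.compile(r"%[0-9a-fA-F]{2}")


def looks_double_encoded(raw_value: str) -> bool:
    """Decode the raw wire value exactly once, then look for a remaining %XX."""
    return STILL_ENCODED.search(unquote(raw_value)) is not None


for raw in ["%3Cscript%3E", "%253Cscript%253E", "search%20term", "%25script", "Secret%2500"]:
    print(f"{raw!r:22} double-encoded: {looks_double_encoded(raw)}")
```

Note that a literal `%` followed by two hex digits, as in `Secret%2500`, is indistinguishable from double encoding at this level, so a rule of this kind would likely need field-specific exclusions.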

### OWASP CRS integration

The OWASP CRS currently uses `ARGS` with `t:urlDecodeUni` in many detection rules. With `ARGS_RAW` available, CRS could either:

1. **Switch detection rules to use `ARGS_RAW` + `t:urlDecodeUni`** — the transformation is then semantically meaningful and non-redundant regardless of the host integration.
2. **Add complementary rules using `ARGS_RAW`** for encoding-specific detections (double encoding, encoding of dangerous characters) without touching existing rules.

This resolves a long-standing portability issue in CRS rule development.

---

## Impact Assessment

### Backward Compatibility
No breaking change. The new variables are additive. All existing rules using `ARGS`, `ARGS_GET`, `ARGS_POST`, and `ARGS_NAMES` continue to work exactly as before.

### Performance
The raw values are available during HTTP request parsing before URL decoding — storing them requires one additional string copy per argument, comparable to the overhead of the existing collection. No additional parsing or transformation is required.

### Rule Portability
With `ARGS_RAW` available as a first-class variable, rule authors can write transformations that are **unambiguous regardless of the integration layer**. `t:urlDecodeUni` on `ARGS_RAW` means exactly one round of URL decoding, always, on every engine, on every web server.

---

## Related Issues and References

- **ModSecurity issue #2118** — Original report and CRS team discussion: https://github.com/owasp-modsecurity/ModSecurity/issues/2118
- **CRS issue #807** — False positive from double decoding of literal `%`: https://github.com/SpiderLabs/owasp-modsecurity-crs/issues/807
- **CRS PR #578** — Discussion about `urlDecodeUni` on ARGS collections: https://github.com/SpiderLabs/owasp-modsecurity-crs/pull/578
- **airween's v2 prototype patch**: https://github.com/SpiderLabs/ModSecurity/compare/v2/master...airween:v2/args_raw?expand=1
- **Coraza parallel feature request**: https://github.com/corazawaf/coraza/issues/1491

---

## Requested Action

1. Implement `ARGS_RAW`, `ARGS_GET_RAW`, `ARGS_POST_RAW`, and `ARGS_NAMES_RAW` in libmodsecurity3, using the airween v2 patch as a reference implementation.
2. Update the ModSecurity Reference Manual to document the new variables, their semantics, and guidance on when to use them vs. the cooked variables.
3. Coordinate with the Coraza project to ensure the same variables are implemented with consistent semantics.
4. Notify the OWASP CRS team once the feature is available so CRS rules can be updated to take advantage of the improved encoding-aware variable access.

