
Commit 60514e4

Merge branch 'main' into tyler.finethy/benchmark-co-instrumentation
2 parents 348e549 + 6f87e75 commit 60514e4

357 files changed (+13435, -6186 lines)

Some content is hidden: large commits have some content hidden by default.

.claude/CLAUDE.md

Lines changed: 2 additions & 44 deletions
@@ -1,44 +1,2 @@
-# dd-trace-py Project Guide
-
-## Skills
-
-This project has custom skills that provide specialized workflows. **Always check if a skill exists before using lower-level tools.**
-
-### run-tests
-
-**Use whenever:** Running any tests, validating code changes, or when "test" is mentioned.
-
-**Purpose:** Intelligently runs the test suite using `scripts/run-tests`:
-- Discovers affected test suites based on changed files
-- Selects minimal venv combinations (avoiding hours of unnecessary test runs)
-- Manages Docker services automatically
-- Handles riot/hatch environment setup
-
-**Never:** Run pytest directly or use `hatch run tests:test` - these bypass the project's test infrastructure.
-
-**Usage:** Use the Skill tool with command "run-tests"
-
-### lint
-
-**Use whenever:** Formatting code, validating style/types/security, or before committing changes.
-
-**Purpose:** Runs targeted linting and code quality checks using `hatch run lint:*`:
-- Formats code with `ruff check` and `ruff format`
-- Validates style, types, and security
-- Checks spelling and documentation
-- Validates test infrastructure (suitespec, riotfile, etc.)
-- Supports running all checks or targeting specific files
-
-**Common Commands:**
-- `hatch run lint:fmt -- <file>` - Format a specific file after editing (recommended after every edit)
-- `hatch run lint:typing -- <file>` - Type check specific files
-- `hatch run lint:checks` - Run all quality checks (use before committing)
-- `hatch run lint:security -- -r <dir>` - Security scan a directory
-
-**Never:** Skip linting before committing. Always run `hatch run lint:checks` before pushing.
-
-**Usage:** Use the Skill tool with command "lint"
-
----
-
-<!-- Add more skills below as they are created -->
+<!-- Do not edit. Canonical instructions live in ../AGENTS.md -->
+@../AGENTS.md

.claude/settings.local.json

Lines changed: 3 additions & 1 deletion
@@ -25,7 +25,9 @@
 "mcp__github__pull_request_read",
 "WebFetch(domain:github.com)",
 "Skill(run-tests)",
-"Bash(scripts/run-tests:*)"
+"Bash(scripts/run-tests:*)",
+"Bash(gh pr list:*)",
+"Bash(git remote:*)"
 ],
 "deny": []
 }

.cursor/rules/iast.mdc

Lines changed: 286 additions & 0 deletions
@@ -0,0 +1,286 @@
+---
+description: IAST (Interactive Application Security Testing) - How it works and development guidelines
+globs:
+- "**/appsec/_iast/**"
+- "**/appsec/iast/**"
+- "**/tests/appsec/iast/**"
+- "**/tests/appsec/iast_aggregated_memcheck/**"
+- "**/tests/appsec/iast_memcheck/**"
+- "**/tests/appsec/iast_packages/**"
+- "**/tests/appsec/iast_tdd_propagation/**"
+- "**/tests/appsec/integrations/**"
+---
+
+# IAST Development Guide
+
+**Note:** General development patterns, code style, and testing guidelines are in `AGENTS.md` under "IAST/AppSec Development".
+
+**Synonyms:** IAST = Code Security = Runtime Code Analysis (all refer to the same product)
+
+## What is IAST?
+
+IAST (Interactive Application Security Testing) analyzes running Python applications for security vulnerabilities by:
+1. **Taint tracking** - Performing taint analysis on data from untrusted sources (user input, HTTP requests, etc.)
+2. **Detecting vulnerabilities** - Identifying when tainted data reaches security-sensitive functions (sinks)
+
+**Official Documentation**:
+- [Datadog Code Security (IAST) Overview](https://docs.datadoghq.com/security/code_security/iast/)
+- [Getting Started with Code Security](https://docs.datadoghq.com/security/code_security/iast/setup/)
+- [Understanding Vulnerability Types](https://docs.datadoghq.com/security/code_security/iast/troubleshooting/)
+
+## How IAST Works: High-Level Architecture
+
+### 1. AST Patching (Code Instrumentation)
+
+IAST rewrites Python modules at import time by patching their AST (Abstract Syntax Tree) before the code is compiled to bytecode:
+
+- **Module Watchdog**: Hooks into Python's import system (`ddtrace.internal.module.ModuleWatchdog`)
+- **AST Visitor**: Analyzes and modifies Python AST before compilation (`ddtrace/appsec/_iast/_ast/visitor.py`)
+- **String Operations**: Patches string operations (concat, slice, format, etc.) to propagate taint
+- **Call Sites**: Instruments function calls to track taint flow
+
+**Location**: `ddtrace/appsec/_iast/_ast/`
+- `ast_patching.py` - Main AST patching logic
+- `visitor.py` - AST visitor for code transformation
+- `iastpatch.c` - C extension for fast AST manipulation
+
+**Activation**: Via `ModuleWatchdog.register_pre_exec_module_hook()` in `ddtrace/appsec/_iast/__init__.py`
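
The import-time rewrite described above can be illustrated with the standard `ast` module. This is a minimal sketch, not ddtrace's visitor: `AddToAspect`, `patch_source` and `add_aspect` are hypothetical names, and only `+` expressions are rewritten into calls to an aspect function, which is the general shape of aspect-based taint propagation.

```python
import ast


class AddToAspect(ast.NodeTransformer):
    """Rewrite every `a + b` into `add_aspect(a, b)` (hypothetical aspect name)."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        # Rewrite children first so nested additions are transformed too.
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            call = ast.Call(
                func=ast.Name(id="add_aspect", ctx=ast.Load()),
                args=[node.left, node.right],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node


def patch_source(source: str, filename: str = "<patched>"):
    """Parse, rewrite and compile module source before it is executed."""
    tree = AddToAspect().visit(ast.parse(source, filename))
    ast.fix_missing_locations(tree)  # give the new Name node a location
    return compile(tree, filename, "exec")


calls = []


def add_aspect(left, right):
    # A real aspect would also propagate taint ranges; this one only records the call.
    calls.append((left, right))
    return left + right


namespace = {"add_aspect": add_aspect}
exec(patch_source("greeting = 'hello, ' + 'world'"), namespace)
print(namespace["greeting"])
```

A real visitor covers many more node types (f-strings, slicing, method calls such as `str.join`) and skips modules that are not meant to be instrumented.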
+
+### 2. Taint Tracking (C++ Native Extension)
+
+Taint information is stored and propagated efficiently using a C++ native extension:
+
+- **TaintedObject**: Associates Python objects with taint metadata (source, ranges)
+- **Taint Ranges**: Track which parts of strings/bytes are tainted
+- **Context Management**: Per-request taint state using context-local storage
+- **Propagation Aspects**: Functions that propagate taint through operations
+
+**Location**: `ddtrace/appsec/_iast/_taint_tracking/`
+- Native C++ code compiled with CMake
+- `aspects.py` - Python API for taint propagation
+- `_native.cpython-*.so` - Compiled C++ extension
+
+**Key Concepts**:
+- **Taint Source**: Where untrusted data enters (HTTP params, headers, body)
+- **Taint Propagation**: Following data through operations (concat, slice, replace, etc.)
+- **Taint Range**: Start position and length of a tainted region within a string
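
Taint ranges and their propagation through concatenation and slicing can be sketched in pure Python. The `TaintRange` record, its `(start, length, source)` layout and the helper names below are assumptions made for illustration; the actual bookkeeping lives in the C++ extension and exposes a different API.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class TaintRange:
    start: int   # offset of the first tainted character
    length: int  # number of tainted characters
    source: str  # hypothetical origin label


def concat_ranges(left: str, left_ranges: List[TaintRange],
                  right_ranges: List[TaintRange]) -> List[TaintRange]:
    """Ranges of `left + right`: right-hand ranges shift by len(left)."""
    shift = len(left)
    return list(left_ranges) + [replace(r, start=r.start + shift) for r in right_ranges]


def slice_ranges(ranges: List[TaintRange], start: int, stop: int) -> List[TaintRange]:
    """Ranges of `value[start:stop]` for non-negative bounds and step 1."""
    result = []
    for r in ranges:
        lo = max(r.start, start)
        hi = min(r.start + r.length, stop)
        if lo < hi:  # keep only the part of the range that survives the slice
            result.append(TaintRange(lo - start, hi - lo, r.source))
    return result


user_input = "1 OR 1=1"
tainted = [TaintRange(0, len(user_input), "http.request.parameter")]
query_prefix = "SELECT * FROM users WHERE id = "
query = query_prefix + user_input
query_ranges = concat_ranges(query_prefix, [], tainted)
print(query_ranges)
```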
+
+### 3. Module Patching (Taint Sinks)
+
+IAST wraps security-sensitive functions to detect vulnerabilities:
+
+- **IASTFunction**: Wraps target functions using the `wrapt` library
+- **Taint Sinks**: Security-sensitive functions (exec, eval, SQL, file operations, etc.)
+- **Vulnerability Detection**: Checks if tainted data reaches sinks
+
+**Location**: `ddtrace/appsec/_iast/_patch_modules.py`
+
+**Supported Vulnerability Types** (`ddtrace/appsec/_iast/taint_sinks/`):
+- `sql_injection.py`
+- `command_injection.py`
+- `path_traversal.py`
+- `ssrf.py`
+- `code_injection.py`
+- `header_injection.py`
+- `weak_hash.py`
+- `weak_cipher.py`
+- `weak_randomness.py`
+- `insecure_cookie.py`
+- `unvalidated_redirect.py`
+- `untrusted_serialization.py`
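
Sink wrapping can be sketched with `functools.wraps` standing in for `wrapt`. All names here are hypothetical: `TaintedStr` replaces the native taint lookup, `execute` is a fake database call, and the `reported` list stands in for the reporting pipeline.

```python
import functools
from typing import Callable, List

reported: List[str] = []  # hypothetical stand-in for vulnerability reporting


class TaintedStr(str):
    """Hypothetical marker type; the real engine tracks taint natively."""


def is_tainted(value) -> bool:
    return isinstance(value, TaintedStr)


def taint_sink(vulnerability_type: str) -> Callable:
    """Wrap a security-sensitive function and report tainted arguments."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if any(is_tainted(arg) for arg in (*args, *kwargs.values())):
                reported.append(vulnerability_type)
            return func(*args, **kwargs)  # the sink itself still runs unchanged
        return wrapper
    return decorator


@taint_sink("SQL_INJECTION")
def execute(query: str) -> str:
    # Hypothetical database call used only for illustration.
    return f"executed: {query}"


execute("SELECT 1")  # untainted: nothing reported
execute(TaintedStr("SELECT * FROM users WHERE id = 1 OR 1=1"))  # reported
```

Note that `TaintedStr("a") + "b"` evaluates to a plain `str`: a subclass alone loses taint on every string operation, which is why propagation is done through aspects.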
+
+### 4. Overhead Control Engine (OCE)
+
+Performance optimization to limit IAST overhead:
+
+- **Request Sampling**: Analyze only X% of requests (default: 30%)
+- **Vulnerability Limits**: Max vulnerabilities per request
+- **Concurrent Request Limits**: Max requests analyzed simultaneously
+- **Per-Vulnerability Quotas**: Limit overhead per vulnerability type
+
+**Location**: `ddtrace/appsec/_iast/_overhead_control_engine.py`
+
+**Configuration**: Settings defined in `ddtrace/internal/settings/asm.py` (e.g., `_iast_request_sampling`, `_iast_sink_points_enabled`)
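
The controls listed above can be illustrated with a small sketch. The class, method and parameter names are hypothetical, and counter-based sampling is used instead of randomness so the behavior is deterministic; ddtrace's engine and its handling of `_iast_request_sampling` may work differently.

```python
import threading


class OverheadControl:
    """Hypothetical sketch of request sampling plus a concurrency cap."""

    def __init__(self, sampling_percent: int = 30, max_concurrent: int = 2) -> None:
        self._percent = sampling_percent
        self._seen = 0
        self._lock = threading.Lock()
        # Each analyzed request holds one slot until it finishes.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def acquire_request(self) -> bool:
        """Return True if the incoming request should be analyzed."""
        with self._lock:
            self._seen += 1
            n = self._seen
            # Counter-based sampling: exactly `percent` out of every 100 requests
            # (with 30%, requests 4, 7 and 10 of the first ten are selected).
            sampled = (n * self._percent) // 100 > ((n - 1) * self._percent) // 100
        # Never block the request: if no slot is free, skip analysis instead.
        return sampled and self._slots.acquire(blocking=False)

    def release_request(self) -> None:
        """Call when an analyzed request finishes."""
        self._slots.release()


class RequestBudget:
    """Hypothetical per-request cap on the number of reported vulnerabilities."""

    def __init__(self, max_vulnerabilities: int = 2) -> None:
        self.remaining = max_vulnerabilities

    def can_report(self) -> bool:
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True
```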
+
+### 5. Vulnerability Reporting
+
+When a vulnerability is detected:
+1. Evidence is collected (tainted data, location, stack trace)
+2. Vulnerability is reported via the tracer span
+3. Deduplication prevents duplicate reports
+4. Data is sent to the Datadog backend for analysis
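
Deduplication can be sketched as a bounded cache of already-reported locations. Keying on `(type, file, line)` and evicting the least recently seen entry are assumptions for illustration, not a description of ddtrace's policy; `Deduplicator` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Tuple

Location = Tuple[str, str, int]  # (vulnerability type, file path, line number)


class Deduplicator:
    """Hypothetical sketch: report each location once, with bounded memory."""

    def __init__(self, max_entries: int = 1000) -> None:
        self._seen: "OrderedDict[Location, None]" = OrderedDict()
        self._max_entries = max_entries

    def should_report(self, vuln_type: str, path: str, line: int) -> bool:
        key = (vuln_type, path, line)
        if key in self._seen:
            self._seen.move_to_end(key)  # refresh recency of a known location
            return False
        self._seen[key] = None
        if len(self._seen) > self._max_entries:
            self._seen.popitem(last=False)  # evict the least recently seen key
        return True
```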
+
+## Key IAST Concepts for Development
+
+### Taint Tracking Terminology
+
+- **Taint Sources** (Origins): Where untrusted data enters the application (HTTP params, headers, body, cookies)
+- **Taint Propagation**: How tainted data flows through string operations (concat, slice, replace, format, etc.)
+- **Taint Ranges**: Specific byte/character offsets within strings that are tainted (start position + length)
+- **Sink Points**: Security-sensitive functions where vulnerabilities are detected (SQL execute, OS commands, file operations, eval/exec)
+- **Update Origins**: Adding or modifying taint source information to track data lineage
+
+### Call Site Instrumentation
+
+IAST uses **Call Site Instrumentation** (CSI) instead of traditional callee instrumentation:
+- Modifies calls to target functions rather than the functions themselves
+- Enables selective instrumentation based on context (e.g., skip calls made inside framework internals)
+- Reduces overhead by instrumenting only application code, not low-level library internals
+
+### Tainted Ranges and Offsets
+
+Ranges track which parts of strings contain tainted data:
+- **Offset**: Starting position of the tainted substring (in Unicode code points for `str`, in bytes for `bytes`/`bytearray`)
+- **Length**: Size of tainted region
+- **Source**: Reference to the origin of the tainted data
+- Used in vulnerability evidence to highlight exactly which user input caused the issue
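
Ranges are what let evidence point at the exact user-controlled substring. A minimal sketch, assuming non-overlapping `(start, length)` pairs measured in characters of a Python `str`; `highlight` is a hypothetical helper and the `[[...]]` markers are made up, not the evidence format sent to Datadog:

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start offset, length) in characters of a Python str


def highlight(value: str, ranges: List[Range],
              open_mark: str = "[[", close_mark: str = "]]") -> str:
    """Wrap each tainted region of `value` in markers (ranges must not overlap)."""
    parts = []
    cursor = 0
    for start, length in sorted(ranges):
        parts.append(value[cursor:start])  # untainted text before this range
        parts.append(open_mark + value[start:start + length] + close_mark)
        cursor = start + length
    parts.append(value[cursor:])  # untainted tail
    return "".join(parts)


query = "SELECT * FROM users WHERE id = 1 OR 1=1"
print(highlight(query, [(31, 8)]))
```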
+
+### Security Controls (Validators & Sanitizers)
+
+User-configurable validation/sanitization functions that apply **secure marks** to tainted ranges:
+- **Input Validators**: Check if input is safe, apply marks to input arguments
+- **Sanitizers**: Transform input to make it safe, apply marks to return value
+- **Secure Marks**: Flags indicating a range is safe for specific vulnerability types
+- If all ranges reaching a sink have appropriate secure marks, the vulnerability is suppressed
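
Secure marks can be modeled as a bit set per range. In this sketch `VulnMark`, `mark_sql_safe` and `is_vulnerable` are hypothetical names: `mark_sql_safe` shows what a registered SQL sanitizer would do to the ranges of its return value, and `is_vulnerable` applies the suppression rule stated above.

```python
from dataclasses import dataclass
from enum import Flag
from typing import List


class VulnMark(Flag):
    """Hypothetical secure-mark bits, one per vulnerability type."""
    NONE = 0
    SQL_INJECTION = 1
    COMMAND_INJECTION = 2
    PATH_TRAVERSAL = 4


@dataclass(frozen=True)
class TaintRange:
    start: int
    length: int
    secure_marks: VulnMark = VulnMark.NONE


def mark_sql_safe(ranges: List[TaintRange]) -> List[TaintRange]:
    """Add the SQL secure mark to every range (what a sanitizer's output gets)."""
    return [TaintRange(r.start, r.length, r.secure_marks | VulnMark.SQL_INJECTION)
            for r in ranges]


def is_vulnerable(ranges: List[TaintRange], vuln: VulnMark) -> bool:
    """Report unless every tainted range carries the secure mark for `vuln`."""
    return any(not (r.secure_marks & vuln) for r in ranges)
```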
+
+### Vulnerability Detection Flow
+
+1. **Taint data at sources** - Mark HTTP request data with origin information
+2. **Propagate through operations** - Track tainted ranges through string manipulations via aspects
+3. **Check at sink points** - When tainted data reaches a vulnerable function, report if not secured
+4. **Apply overhead controls** - Request sampling, vulnerability quotas, and deduplication limit impact
+
+### Implementation References
+
+- **Taint Sinks**: `ddtrace/appsec/_iast/taint_sinks/` - Each file handles a specific vulnerability type
+- **Aspects**: `ddtrace/appsec/_iast/_taint_tracking/aspects.py` - Propagation functions for string operations
+- **Patch Modules**: `ddtrace/appsec/_iast/_patch_modules.py` - Registry of instrumented sink points
+- **Vulnerability Base**: `ddtrace/appsec/_iast/taint_sinks/_base.py` - Base class for all vulnerability types
+
+## Important Technical Details
+
+### Flask Applications
+
+Flask apps need special patching for main module instrumentation:
+
+```python
+from ddtrace.appsec._iast import ddtrace_iast_flask_patch
+
+if __name__ == "__main__":
+    ddtrace_iast_flask_patch()  # Call before app.run()
+    app.run()
+```
+
+This patches the main Flask app file so IAST works on functions defined in `app.py`.
+
+### Gevent Compatibility
+
+IMPORTANT: Avoid top-level `import inspect` in IAST code - it interferes with gevent's monkey patching and causes sporadic worker timeouts in Gunicorn applications.
+
+**Solution**: Import `inspect` locally within functions when needed.
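
The guideline amounts to keeping `import inspect` inside the function body, so importing the module that defines the helper does not import `inspect`. A minimal sketch with a hypothetical `caller_name` helper:

```python
def caller_name() -> str:
    """Return the name of the function that called this helper."""
    # Local import, per the guideline above: `inspect` is only loaded on first call.
    import inspect

    frame = inspect.currentframe()
    try:
        caller = frame.f_back if frame is not None else None
        return caller.f_code.co_name if caller is not None else "<unknown>"
    finally:
        del frame  # drop the frame reference to avoid a reference cycle
```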
+
+### Native Code Development
+
+When working with IAST's C++ taint tracking code:
+
+1. **Prefer**: Native C++ types (`std::string`, `int`, `char`)
+2. **If needed**: CPython API with `PyObject*` (careful with reference counting!)
+3. **Last resort**: Pybind11 (adds complexity)
+
+**Build & Test C++ Code**:
+```bash
+cmake -DCMAKE_BUILD_TYPE=Debug -DPYTHON_EXECUTABLE=python \
+    -S ddtrace/appsec/_iast/_taint_tracking \
+    -B ddtrace/appsec/_iast/_taint_tracking
+
+make -f ddtrace/appsec/_iast/_taint_tracking/tests/Makefile native_tests
+ddtrace/appsec/_iast/_taint_tracking/tests/native_tests
+```
+
+## Testing IAST Code
+
+### Python Tests
+
+```bash
+# Run IAST tests
+python -m pytest -vv -s --no-cov tests/appsec/iast/
+
+# Run specific vulnerability tests
+python -m pytest -vv tests/appsec/iast/taint_sinks/test_sql_injection.py
+
+# Run with IAST enabled
+DD_IAST_ENABLED=true python -m pytest tests/appsec/iast/
+```
+
+### End-to-End Tests
+
+E2E tests use test servers defined in `tests/appsec/appsec_utils.py`:
+- `django_server` - Django test application
+- `flask_server` - Flask test application
+- `fast_api` - FastAPI test application
+
+Test application location: `tests/appsec/integrations/django_tests/django_app`
+
+**Running E2E tests**:
+```bash
+# Start testagent
+docker compose up -d testagent
+
+# Run E2E tests
+python -m pytest tests/appsec/iast/test_integration.py -v
+```
+
+### C++ Native Tests
+
+```bash
+# Run C++ tests (build them first, as in "Build & Test C++ Code" above)
+./ddtrace/appsec/_iast/_taint_tracking/tests/native_tests
+```
+
+## Key Files Reference
+
+**Core Implementation**:
+- `ddtrace/appsec/_iast/__init__.py` - Entry point, initialization, fork safety
+- `ddtrace/appsec/_iast/_overhead_control_engine.py` - Performance control (OCE)
+- `ddtrace/appsec/_iast/_patch_modules.py` - Module patching registry
+
+**AST Patching**:
+- `ddtrace/appsec/_iast/_ast/ast_patching.py` - AST transformation
+- `ddtrace/appsec/_iast/_ast/visitor.py` - AST visitor
+- `ddtrace/appsec/_iast/_loader.py` - Patched module execution
+
+**Taint Tracking**:
+- `ddtrace/appsec/_iast/_taint_tracking/` - C++ native taint tracking
+- `ddtrace/appsec/_iast/_taint_tracking/aspects.py` - Taint propagation API
+
+**Vulnerability Detection**:
+- `ddtrace/appsec/_iast/taint_sinks/` - All vulnerability detectors
+- `ddtrace/appsec/_iast/taint_sinks/_base.py` - Base vulnerability class
+
+**Security Controls**:
+- `ddtrace/appsec/_iast/secure_marks/` - Validators and sanitizers
+
+## Environment Variables
+
+**Public Configuration**: All public IAST environment variables are documented in the [ddtrace Configuration Guide](https://ddtrace.readthedocs.io/en/stable/configuration.html#code-security).
+
+**Private/Internal Environment Variables** (for development and debugging):
+
+```bash
+# Enable debug-level taint propagation logging
+_DD_IAST_PROPAGATION_DEBUG=true
+
+# Enable IAST internal debug logging
+_DD_IAST_DEBUG=true
+
+# Enable specific taint sink detection (comma-separated list)
+_DD_IAST_SINK_POINTS_ENABLED=sql_injection,command_injection,path_traversal
+
+# Specify modules to patch for AST instrumentation
+_DD_IAST_PATCH_MODULES=benchmarks.,tests.appsec.,scripts.iast.
+
+# Fast build mode - skips some compilation optimizations (development only)
+DD_FAST_BUILD=1
+```
+
+**Note**: Private environment variables (prefixed with `_DD_`) are not officially supported and may change without notice. They are primarily for internal development and debugging.
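
The `_DD_IAST_PATCH_MODULES` example above lists module-name prefixes that end in a dot. Assuming the value is comma-separated and matched by prefix (an interpretation of the example, not ddtrace's actual parser), a matcher could look like this; the hypothetical `parse_list` helper would also split `_DD_IAST_SINK_POINTS_ENABLED`.

```python
import os
from typing import Iterable, List


def parse_list(raw: str) -> List[str]:
    """Split a comma-separated value, trimming whitespace and dropping empty items."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def should_patch(module_name: str, prefixes: Iterable[str]) -> bool:
    """Return True if `module_name` falls under one of the configured prefixes."""
    # Appending "." lets the prefix "tests.appsec." also match the package
    # "tests.appsec" itself, while "tests.appsecfoo" does not match.
    candidate = module_name + "."
    return any(candidate.startswith(prefix) for prefix in prefixes)


# Hypothetical usage: fall back to the example value when the variable is unset.
prefixes = parse_list(os.environ.get("_DD_IAST_PATCH_MODULES", "benchmarks.,tests.appsec.,scripts.iast."))
print(should_patch("tests.appsec.iast.test_sqli", prefixes))
```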

.cursor/rules/testing.mdc

Lines changed: 4 additions & 4 deletions
@@ -15,10 +15,10 @@ Always use the **`run-tests` skill** for validating code changes. This skill int
 ## Key Principles
 
 1. **Always use the run-tests skill** when testing code changes - it's optimized for intelligent suite discovery
-2. **Never run pytest directly** - bypasses the project's test infrastructure
-3. **Never use `hatch run tests:test`** - doesn't use the suite discovery system
-4. **Minimal venvs for iteration** - run 1-2 venvs initially, expand only if needed
-5. **Use `--dry-run` first** - see what would run before executing
+2. **Never run pytest directly** - bypasses the project's test infrastructure (use `scripts/run-tests` or `riot` via `scripts/ddtest`)
+3. **Minimal venvs for iteration** - run 1-2 venvs initially, expand only if needed
+4. **Use `--dry-run` first** - see what would run before executing
+5. **Follow official docs** - `docs/contributing-testing.rst` is the source of truth for testing procedures
 
 ## When Tests Fail
 
.github/CODEOWNERS

Lines changed: 1 addition & 1 deletion
@@ -112,6 +112,7 @@ tests/internal/symbol_db/ @DataDog/debugger-python
 .gitlab/tests/debugging.yml @DataDog/debugger-python
 
 # ASM
+.cursor/rules/iast.mdc @DataDog/asm-python
 .gitlab/tests/appsec.yml @DataDog/asm-python
 benchmarks/appsec* @DataDog/asm-python
 benchmarks/bm/iast_utils* @DataDog/asm-python
@@ -137,7 +138,6 @@ ddtrace/profiling @DataDog/profiling-python
 ddtrace/internal/settings/profiling.py @DataDog/profiling-python
 ddtrace/internal/datadog/profiling @DataDog/profiling-python
 tests/profiling @DataDog/profiling-python
-tests/profiling_v2 @DataDog/profiling-python
 .gitlab/tests/profiling.yml @DataDog/profiling-python
 
 # MLObs
