28 commits
ec31dbf
feat(importers): add async-wait import execution mode
valentijnscholten Jun 13, 2026
c60168d
feat(importers): make scan_added notifications dedup-aware
valentijnscholten Jun 14, 2026
3df12e9
refactor(importers): await dedup before all import notifications
valentijnscholten Jun 14, 2026
210c7bf
refactor: rename import_execution_mode to deduplication_execution_mode
valentijnscholten Jun 14, 2026
dc3f4f4
refactor: rename DD_IMPORT_ASYNC_WAIT_TIMEOUT to DD_DEDUPLICATION_ASY…
valentijnscholten Jun 14, 2026
ca1826c
refactor: keep block_execution as global switch, deduplication_execut…
valentijnscholten Jun 14, 2026
40cd011
docs: 2.60 upgrade notes for deduplication_execution_mode
valentijnscholten Jun 14, 2026
cf47aab
fix(importers): honor profile deduplication_execution_mode for UI imp…
valentijnscholten Jun 14, 2026
0e01d50
test: revert set_block_execution to the block_execution checkbox
valentijnscholten Jun 14, 2026
bc51061
refactor(migrations): split into schema add + data seed (2 migrations)
valentijnscholten Jun 14, 2026
336cbcc
test: use versioned_fixtures so dedup mode tests pass under V3_FEATUR…
valentijnscholten Jun 14, 2026
1533c15
test(perf): add async_wait deduplication performance test
valentijnscholten Jun 14, 2026
8f6f82b
fix(migrations): renumber import-execution-mode migrations onto dev (…
valentijnscholten Jun 25, 2026
6d40a21
fix(importers): drop ignore_result=False override on post_process_fin…
valentijnscholten Jun 25, 2026
af54343
test(e2e): selenium integration test for async_wait deduplication mode
valentijnscholten Jun 25, 2026
45ed913
fix(importers): make async_wait actually join via per-dispatch ignore…
valentijnscholten Jun 26, 2026
b708a44
test(dedupe): replace Selenium async_wait test with real-worker API g…
valentijnscholten Jun 26, 2026
c43a6db
test(dedupe): make async_wait integration test deterministic via gate…
valentijnscholten Jun 26, 2026
00533b4
test(dedupe): scope async_wait dedup delay to the test's findings, wi…
valentijnscholten Jun 27, 2026
1fec8dc
test(dedupe): log WARN when the test-only dedup delay fires
valentijnscholten Jun 27, 2026
c843098
test(dedupe): assert only deduplication_complete for the async control
valentijnscholten Jun 27, 2026
8620a3a
test(dedupe): note async_wait eager test is not a real guard
valentijnscholten Jun 28, 2026
65bb7b5
docs(dedupe): document import/reimport deduplication execution mode
valentijnscholten Jun 28, 2026
5b35236
feat(importers): log async_wait deduplication wait duration
valentijnscholten Jun 28, 2026
10dffaf
chore: stop tracking local-only files committed by mistake
valentijnscholten Jun 28, 2026
311ee6a
fix(migrations): renumber dedup-execution-mode migrations onto curren…
Maffooch Jun 30, 2026
1eb6fdb
test(perf): bump async_wait dedup query baseline to match measured co…
Maffooch Jun 30, 2026
91e8a0b
Merge branch 'dev' into import-execution-mode
Maffooch Jun 30, 2026
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -153,3 +153,5 @@ docs/.hugo_build.lock
# claude etc
MEMORY.md
.claude/
CLAUDE.md
CLAUDE.local.md
6 changes: 6 additions & 0 deletions docker-compose.override.integration_tests.yml
Expand Up @@ -46,6 +46,12 @@ services:
    environment:
      DD_DATABASE_URL: ${DD_TEST_DATABASE_URL:-postgresql://defectdojo:defectdojo@postgres:5432/test_defectdojo}
      DD_V3_FEATURE_LOCATIONS: ${DD_V3_FEATURE_LOCATIONS:-False}
      # Delay deduplication batches so the async_wait integration test can
      # deterministically distinguish a blocking join (async_wait) from a
      # non-blocking one (async). Scoped by _FILTER to that test's findings so
      # other dedupe tests are unaffected. Integration-test stack only; never prod.
      DD_DEDUPLICATION_BATCH_PROCESS_TEST_DELAY: 10
      DD_DEDUPLICATION_BATCH_PROCESS_TEST_DELAY_FILTER: "async_wait finding"
  initializer:
    environment:
      PYTHONWARNINGS: error # We are strict about Warnings during testing
43 changes: 41 additions & 2 deletions docs/content/en/open_source/upgrading/2.60.md
Expand Up @@ -2,6 +2,45 @@
title: 'Upgrading to DefectDojo Version 2.60.x'
toc_hide: true
weight: -20260601
description: No special instructions.
description: New deduplication execution mode for import/reimport.
---
There are no special instructions for upgrading to 2.60.x. Check the [Release Notes](https://github.com/DefectDojo/django-DefectDojo/releases/tag/2.60.0) for the contents of the release.

## Deduplication execution mode for import/reimport

This release adds a new `deduplication_execution_mode` setting that controls how
import/reimport deduplication post-processing is dispatched and whether the API
response waits for it. It can be set per user (profile) and overridden per request
on the import and reimport endpoints.

Modes:

- `async` (default): deduplication and the rest of post-processing are dispatched
to the background and the response returns immediately. This is the historical
behavior; nothing changes for existing users.
- `async_wait`: post-processing is still dispatched to the background, but the
request waits for deduplication to finish before responding. As a result the
`scan_added` notification and the statistics in the import/reimport response
reflect the deduplicated state (findings that turned out to be duplicates are
no longer counted/listed as new). JIRA push, product grading and other
non-deduplication tasks remain asynchronous and are not awaited.
- `sync`: import deduplication runs inline in the web request.
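As a rough illustration of the per-request override, the sketch below assembles
the form fields for an import request and rejects unknown modes before anything
is sent. The helper `build_import_fields` and the example `scan_type` and
`engagement` values are assumptions made for illustration; only the
`deduplication_execution_mode` field and its three values come from the notes above.

```python
# Hypothetical helper, not part of DefectDojo: it only assembles form fields.
VALID_MODES = ("async", "async_wait", "sync")


def build_import_fields(scan_type, engagement_id, mode=None):
    """Return form fields for an import request, optionally overriding the mode."""
    fields = {"scan_type": scan_type, "engagement": str(engagement_id)}
    if mode is not None:
        if mode not in VALID_MODES:
            raise ValueError(f"unknown deduplication_execution_mode: {mode!r}")
        # Omitting the field leaves the choice to the user's profile setting.
        fields["deduplication_execution_mode"] = mode
    return fields


# Example values are placeholders chosen for this sketch.
fields = build_import_fields("ZAP Scan", 42, mode="async_wait")
```

Leaving `mode` unset keeps the field out of the request entirely, which is the
case where the profile setting applies.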

The wait in `async_wait` is bounded by the new `DD_DEDUPLICATION_ASYNC_WAIT_TIMEOUT`
environment variable (default `60` seconds). If deduplication has not finished within
the timeout, for example because no worker picks up the work, the request responds
anyway with `deduplication_complete` set to `false` (the same outcome as `async`)
rather than hanging.
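
The bounded wait can be pictured with a small, self-contained sketch. It is not
DefectDojo's implementation: a thread pool stands in for the background worker,
`time.sleep` stands in for the deduplication task, and the function name
`wait_for_deduplication` is hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError


def wait_for_deduplication(future, timeout_seconds):
    """Return True if the background work finished in time, False otherwise."""
    try:
        future.result(timeout=timeout_seconds)
    except FutureTimeoutError:
        # Give up waiting; the work itself keeps running in the background.
        return False
    return True


with ThreadPoolExecutor(max_workers=2) as pool:
    fast = pool.submit(time.sleep, 0.01)  # finishes well inside its timeout
    slow = pool.submit(time.sleep, 0.5)   # outlives its timeout
    fast_done = wait_for_deduplication(fast, timeout_seconds=0.3)
    slow_done = wait_for_deduplication(slow, timeout_seconds=0.05)
```

In this sketch the boolean plays the role of `deduplication_complete`: the
caller responds either way and only reports whether the wait succeeded.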

The import/reimport response now also includes a `deduplication_complete` boolean
indicating whether deduplication had finished by the time the response was produced.

### Relationship to `block_execution`

The existing `block_execution` profile flag is unchanged. It remains the global
switch that forces **all** of a user's asynchronous tasks (notifications, JIRA
push, product grading, deduplication, ...) to run in the foreground.
`deduplication_execution_mode` is independent and narrower — it only affects
import/reimport deduplication post-processing. A user who has `block_execution`
enabled continues to get fully synchronous imports; the upgrade migration seeds
their `deduplication_execution_mode` to `sync` so behavior is unchanged.

No action is required to upgrade. Check the [Release Notes](https://github.com/DefectDojo/django-DefectDojo/releases/tag/2.60.0) for the contents of the release.
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,18 @@ The endpoints also have to match for the findings to be considered duplicates, s

- Dedupe is triggered on import/reimport and during certain updates run via Celery in the background.

### Import/reimport deduplication execution mode

For import and reimport you can control how deduplication post-processing is dispatched and whether the API response waits for it. Set it per user on the profile page (**Deduplication execution mode**), or override it per request with the `deduplication_execution_mode` field on the import/reimport endpoints (the request value takes precedence over the profile).

- `async` (default): deduplication and the rest of post-processing run in the background and the response returns immediately. Historical behavior; the response is produced before findings are deduplicated.
- `async_wait`: post-processing is still dispatched to the background, but the request waits for deduplication to finish before responding. The `scan_added` notification and the statistics in the response then reflect the deduplicated state (findings that turned out to be duplicates are no longer counted/listed as new). JIRA push, product grading and other non-deduplication tasks remain asynchronous and are not awaited. The wait is bounded by `DD_DEDUPLICATION_ASYNC_WAIT_TIMEOUT` (default `60` seconds); if no worker picks up the work in time, the request responds anyway rather than hanging.
- `sync`: import deduplication runs inline in the web request.

The import/reimport response includes a `deduplication_complete` boolean indicating whether deduplication had finished by the time the response was produced (`true` for `sync` and for a completed `async_wait`, `false` for `async`).
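
A client can use this flag to decide whether the returned statistics are final. The sketch below is one way to do that, assuming the response has already been parsed into a dictionary; the helper name and the sample payloads are hypothetical, and treating a missing field as "not complete" is an assumption of this sketch.

```python
def statistics_are_deduplicated(response):
    """True only when the response says deduplication had already finished."""
    # Assumption: a response without the field is treated as not complete.
    return response.get("deduplication_complete") is True


# Sample payloads invented for this sketch; real responses carry more fields.
completed = {"test": 1, "deduplication_complete": True}
pending = {"test": 2, "deduplication_complete": False}
without_flag = {"test": 3}

results = [
    statistics_are_deduplicated(completed),
    statistics_are_deduplicated(pending),
    statistics_are_deduplicated(without_flag),
]
```

When the helper returns `False`, the statistics may still count findings as new that deduplication will later mark as duplicates.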

This is independent of the global `block_execution` profile flag, which forces **all** of a user's asynchronous tasks (notifications, JIRA push, product grading, deduplication, ...) to the foreground. When no execution mode is set on the request or the profile, a user with `block_execution=True` falls back to `sync`.
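
The precedence described in this section can be summarized in a few lines. This is a sketch of the documented order only (request value, then profile value, then `block_execution`, then the `async` default); the function `resolve_mode` and its parameters are hypothetical and are not the actual `Dojo_User.resolve_deduplication_execution_mode` implementation.

```python
def resolve_mode(request_value=None, profile_value=None, block_execution=False):
    """Pick the effective mode following the documented precedence."""
    if request_value:
        return request_value   # per-request override wins
    if profile_value:
        return profile_value   # otherwise the user's profile setting
    if block_execution:
        return "sync"          # no mode set, but everything runs in the foreground
    return "async"             # historical default


effective = resolve_mode(request_value="async_wait", profile_value="sync")
```

Here `effective` is `"async_wait"`, because the request value takes precedence over the profile.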

## Service field and its impact

- By default, `HASH_CODE_FIELDS_ALWAYS = ["service"]`, meaning the `service` associated with a finding is appended to the hash for all scanners.
42 changes: 37 additions & 5 deletions dojo/api_v2/serializers.py
Expand Up @@ -24,12 +24,14 @@
from dojo.importers.default_reimporter import DefaultReImporter
from dojo.location.models import Location
from dojo.models import (
DEDUPLICATION_EXECUTION_MODE_CHOICES,
IMPORT_ACTIONS,
SEVERITIES,
SEVERITY_CHOICES,
STATS_FIELDS,
App_Analysis,
Development_Environment,
Dojo_User,
DojoMeta,
Endpoint,
Engagement,
Expand Down Expand Up @@ -431,6 +433,16 @@ class CommonImportScanSerializer(serializers.Serializer):
allow_null=True, default=None, queryset=User.objects.all(),
)
push_to_jira = serializers.BooleanField(default=False)
deduplication_execution_mode = serializers.ChoiceField(
required=False,
allow_null=True,
choices=DEDUPLICATION_EXECUTION_MODE_CHOICES,
help_text="Override how import/reimport deduplication post-processing is executed for this request. "
"'async' dispatches it to the background and responds immediately (default). "
"'async_wait' dispatches to the background but waits for deduplication to finish before responding, so "
"notifications and the returned statistics reflect the deduplicated state. 'sync' runs deduplication inline. "
"If omitted, falls back to the user's profile setting (deduplication_execution_mode).",
)
environment = serializers.CharField(required=False)
build_id = serializers.CharField(
required=False, help_text="ID of the build that was scanned.",
Expand Down Expand Up @@ -476,6 +488,14 @@ class CommonImportScanSerializer(serializers.Serializer):
help_text=_("Also referred to as 'Organization' ID."),
)
statistics = ImportStatisticsSerializer(read_only=True, required=False)
deduplication_complete = serializers.BooleanField(
read_only=True,
required=False,
help_text="Whether deduplication had finished by the time this response was produced. "
"True for 'sync' and for 'async_wait' when deduplication completed within the timeout; "
"False for 'async' (deduplication is still running in the background) or when an "
"'async_wait' import timed out waiting for it.",
)
pro = serializers.ListField(read_only=True, required=False)
apply_tags_to_findings = serializers.BooleanField(
help_text="If set to True, the tags will be applied to the findings",
Expand Down Expand Up @@ -534,6 +554,7 @@ def process_scan(
data["product_id"] = test.engagement.product.id
data["product_type_id"] = test.engagement.product.prod_type.id
data["statistics"] = {"after": test.statistics}
data["deduplication_complete"] = importer.deduplication_complete
duration = time.perf_counter() - start_time
LargeScanSizeProductAnnouncement(response_data=data, duration=duration)
ScanTypeProductAnnouncement(response_data=data, scan_type=context.get("scan_type"))
Expand Down Expand Up @@ -632,6 +653,14 @@ def setup_common_context(self, data: dict) -> dict:
if eng_end_date:
context["target_end"] = context.get("engagement_end_date")

# Resolve the effective deduplication execution mode: request override (if any)
# takes precedence over the user's profile setting, otherwise default async.
request = self.context.get("request")
user = getattr(request, "user", None)
context["deduplication_execution_mode"] = Dojo_User.resolve_deduplication_execution_mode(
user, data.get("deduplication_execution_mode"),
)

return context


Expand Down Expand Up @@ -805,11 +834,11 @@ def process_scan(
try:
logger.debug(f"process_scan called with context: {context}")
start_time = time.perf_counter()
processor = None
if test := context.get("test"):
statistics_before = test.statistics
context["test"], _, _, _, _, _, test_import = self.get_reimporter(
**context,
).process_scan(
processor = self.get_reimporter(**context)
context["test"], _, _, _, _, _, test_import = processor.process_scan(
context.pop("scan", None),
)
if test_import:
Expand All @@ -821,9 +850,10 @@ def process_scan(
# Do not close old findings when creating a brand new test: there are no
# existing findings to compare against, and close_old_findings would
# incorrectly close findings from other tests in the same scope.
context["test"], _, _, _, _, _, _ = self.get_importer(
processor = self.get_importer(
**{**context, "close_old_findings": False},
).process_scan(
)
context["test"], _, _, _, _, _, _ = processor.process_scan(
context.pop("scan", None),
)
else:
Expand All @@ -842,6 +872,8 @@ def process_scan(
if statistics_delta:
data["statistics"]["delta"] = statistics_delta
data["statistics"]["after"] = test.statistics
if processor is not None:
data["deduplication_complete"] = processor.deduplication_complete
duration = time.perf_counter() - start_time
LargeScanSizeProductAnnouncement(response_data=data, duration=duration)
ScanTypeProductAnnouncement(response_data=data, scan_type=context.get("scan_type"))
17 changes: 13 additions & 4 deletions dojo/celery_dispatch.py
Expand Up @@ -61,10 +61,11 @@ def dojo_dispatch_task(task_or_sig: _SupportsSi | _SupportsApplyAsync | Signatur

- Inject `async_user_id` if missing.
- Capture and inject pghistory context if available.
- Respect `force_sync=True` (foreground execution) and user `block_execution`.
- Respect `force_sync=True` (foreground execution) and the user's
block_execution flag.
- Respect `force_async=True` (background execution even when the caller
would otherwise run synchronously, e.g. user has `block_execution`).
`force_async` wins over `force_sync` and `block_execution`.
would otherwise run synchronously, e.g. user has block_execution).
`force_async` wins over `force_sync` and block_execution.
- Support `countdown=<seconds>` for async dispatch.

Returns:
Expand All @@ -75,6 +76,11 @@ def dojo_dispatch_task(task_or_sig: _SupportsSi | _SupportsApplyAsync | Signatur
from dojo.decorators import dojo_async_task_counter, we_want_async # noqa: PLC0415 circular import

countdown = cast("int", kwargs.pop("countdown", 0))
# Per-dispatch result storage. The task default is `ignore_result` (global
# CELERY_TASK_IGNORE_RESULT=True), so AsyncResult.get() is a no-op. Callers
# that need to join on the result later (e.g. import 'async_wait' mode) pass
# ignore_result=False to force this one dispatch to store its result.
ignore_result = kwargs.pop("ignore_result", None)
injected = _inject_async_user(kwargs)
injected = _inject_pghistory_context(injected)

Expand All @@ -83,7 +89,10 @@ def dojo_dispatch_task(task_or_sig: _SupportsSi | _SupportsApplyAsync | Signatur

if we_want_async(*sig.args, func=getattr(sig, "type", None), **sig_kwargs):
# DojoAsyncTask.apply_async tracks async dispatch. Avoid double-counting here.
return sig.apply_async(countdown=countdown)
apply_kwargs = {"countdown": countdown}
if ignore_result is not None:
apply_kwargs["ignore_result"] = ignore_result
return sig.apply_async(**apply_kwargs)

# Track foreground execution as a "created task" as well (matches historical dojo_async_task behavior)
dojo_async_task_counter.incr(str(sig.task), args=sig.args, kwargs=sig_kwargs)
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('dojo', '0272_reencrypt_tool_config_credentials_aes_gcm'),
    ]

    operations = [
        migrations.AddField(
            model_name='usercontactinfo',
            name='deduplication_execution_mode',
            field=models.CharField(blank=True, choices=[('async', 'Async (do not wait)'), ('async_wait', 'Async, wait for deduplication'), ('sync', 'Synchronous (block)')], help_text="Controls how import/reimport deduplication post-processing is executed. 'Async' dispatches it to the background and returns immediately (default). 'Async, wait for deduplication' dispatches to the background but waits for deduplication to finish before responding, so notifications and statistics reflect the deduplicated state. 'Synchronous' runs the import deduplication inline. Can be overridden per request. Independent of block_execution, which forces all async tasks (notifications, jira, ...) to the foreground.", max_length=20, null=True),
        ),
    ]
30 changes: 30 additions & 0 deletions dojo/db_migrations/0274_seed_deduplication_execution_mode.py
@@ -0,0 +1,30 @@
from django.db import migrations


def seed_deduplication_execution_mode(apps, schema_editor):
    """
    Seed the new import deduplication execution mode from the legacy block_execution flag.

    block_execution remains the global "run all async tasks in the foreground" switch;
    users who had it enabled get the synchronous deduplication mode so import behavior is
    unchanged for them.
    """
    UserContactInfo = apps.get_model("dojo", "UserContactInfo")
    UserContactInfo.objects.filter(block_execution=True).update(deduplication_execution_mode="sync")


def unseed_deduplication_execution_mode(apps, schema_editor):
    """Reverse: clear the seeded synchronous mode."""
    UserContactInfo = apps.get_model("dojo", "UserContactInfo")
    UserContactInfo.objects.filter(deduplication_execution_mode="sync").update(deduplication_execution_mode=None)


class Migration(migrations.Migration):

    dependencies = [
        ('dojo', '0273_usercontactinfo_deduplication_execution_mode'),
    ]

    operations = [
        migrations.RunPython(seed_deduplication_execution_mode, unseed_deduplication_execution_mode),
    ]
4 changes: 4 additions & 0 deletions dojo/engagement/ui/views.py
Expand Up @@ -948,6 +948,10 @@ def process_form(
"create_finding_groups_for_all_findings": form.cleaned_data.get("create_finding_groups_for_all_findings", None),
"environment": self.get_development_environment(environment_name=form.cleaned_data.get("environment")),
})
# Honor the user's profile deduplication_execution_mode for UI imports. The API resolves
# this in the serializer; the UI has no per-import selector, so fall back to the profile
# (or block_execution) instead of silently defaulting to async.
context["deduplication_execution_mode"] = Dojo_User.resolve_deduplication_execution_mode(request.user)
# Create the engagement if necessary
self.create_engagement(context)
# close_old_findings_product_scope is a modifier of close_old_findings.
16 changes: 15 additions & 1 deletion dojo/finding/helper.py
Expand Up @@ -2,7 +2,7 @@
from contextlib import suppress
from datetime import datetime
from itertools import batched
from time import strftime
from time import sleep, strftime

from django.conf import settings
from django.db import transaction
Expand Down Expand Up @@ -470,6 +470,20 @@ def post_process_findings_batch(
    force_sync=False,
    **kwargs,
):
    # Test-only hook: when DEDUPLICATION_BATCH_PROCESS_TEST_DELAY > 0 (set only in
    # the integration-test stack) block this batch so the async_wait integration
    # test can deterministically distinguish 'async_wait' (which joins on this
    # task) from 'async' (which does not). Default 0 -> no effect in production.
    # DEDUPLICATION_BATCH_PROCESS_TEST_DELAY_FILTER (a finding-title prefix) scopes
    # the delay to that one test's findings so unrelated dedupe tests are not slowed.
    if (test_delay := settings.DEDUPLICATION_BATCH_PROCESS_TEST_DELAY) > 0:
        delay_filter = settings.DEDUPLICATION_BATCH_PROCESS_TEST_DELAY_FILTER
        if not delay_filter or Finding.objects.filter(id__in=finding_ids, title__istartswith=delay_filter).exists():
            logger.warning(
                "post_process_findings_batch: TEST-ONLY delay of %ss for %d finding(s) (filter=%r)",
                test_delay, len(finding_ids) if finding_ids else 0, delay_filter,
            )
            sleep(test_delay)

    logger.debug(
        f"post_process_findings_batch called: finding_ids_count={len(finding_ids) if finding_ids else 0}, "