DefectDojo · valentijnscholten · Jun 13, 2026 · Jun 14, 2026 · Jun 14, 2026 · Jun 14, 2026
@@ -153,3 +153,5 @@ docs/.hugo_build.lock
 # claude etc
 MEMORY.md
 .claude/
+CLAUDE.md
+CLAUDE.local.md
@@ -46,6 +46,12 @@ services:
     environment:
       DD_DATABASE_URL: ${DD_TEST_DATABASE_URL:-postgresql://defectdojo:defectdojo@postgres:5432/test_defectdojo}
       DD_V3_FEATURE_LOCATIONS: ${DD_V3_FEATURE_LOCATIONS:-False}
+      # Delay deduplication batches so the async_wait integration test can
+      # deterministically distinguish a blocking join (async_wait) from a
+      # non-blocking one (async). The _FILTER variable below (a finding-title
+      # prefix) scopes the delay to that test's findings so other dedupe tests
+      # are unaffected. Integration-test stack only; never prod.
+      DD_DEDUPLICATION_BATCH_PROCESS_TEST_DELAY: 10
+      DD_DEDUPLICATION_BATCH_PROCESS_TEST_DELAY_FILTER: "async_wait finding"
   initializer:
     environment:
       PYTHONWARNINGS: error  # We are strict about Warnings during testing

@@ -2,6 +2,45 @@
 title: 'Upgrading to DefectDojo Version 2.60.x'
 toc_hide: true
 weight: -20260601
-description: No special instructions.
+description: New deduplication execution mode for import/reimport.
 ---
-There are no special instructions for upgrading to 2.60.x. Check the [Release Notes](https://github.com/DefectDojo/django-DefectDojo/releases/tag/2.60.0) for the contents of the release.
+
+## Deduplication execution mode for import/reimport
+
+This release adds a new `deduplication_execution_mode` setting that controls how
+import/reimport deduplication post-processing is dispatched and whether the API
+response waits for it. It can be set per user (profile) and overridden per request
+on the import and reimport endpoints.
+
+Modes:
+
+- `async` (default): deduplication and the rest of post-processing are dispatched
+  to the background and the response returns immediately. This is the historical
+  behavior; nothing changes for existing users.
+- `async_wait`: post-processing is still dispatched to the background, but the
+  request waits for deduplication to finish before responding. As a result the
+  `scan_added` notification and the statistics in the import/reimport response
+  reflect the deduplicated state (findings that turned out to be duplicates are
+  no longer counted/listed as new). JIRA push, product grading and other
+  non-deduplication tasks remain asynchronous and are not awaited.
+- `sync`: import deduplication runs inline in the web request.
+
+The wait in `async_wait` is bounded by the new `DD_DEDUPLICATION_ASYNC_WAIT_TIMEOUT`
+environment variable (default `60` seconds). If deduplication has not finished within
+the timeout, for example because no worker picked up the work, the request responds
+anyway (degrading to the `async` outcome) rather than hanging.
+
+The import/reimport response now also includes a `deduplication_complete` boolean
+indicating whether deduplication had finished by the time the response was produced.
+
+### Relationship to `block_execution`
+
+The existing `block_execution` profile flag is unchanged. It remains the global
+switch that forces **all** of a user's asynchronous tasks (notifications, JIRA
+push, product grading, deduplication, ...) to run in the foreground.
+`deduplication_execution_mode` is independent and narrower — it only affects
+import/reimport deduplication post-processing. A user who has `block_execution`
+enabled continues to get fully synchronous imports; the upgrade migration seeds
+their `deduplication_execution_mode` to `sync` so behavior is unchanged.
+
+No action is required to upgrade. Check the [Release Notes](https://github.com/DefectDojo/django-DefectDojo/releases/tag/2.60.0) for the contents of the release.
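Reviewer sketch (not part of the patch): one way a client might opt in to `async_wait` per request and decide whether the returned statistics are final. Only `deduplication_execution_mode` and `deduplication_complete` come from the notes above; the helper names and the other form fields (`scan_type`, `engagement`) are assumptions for illustration, and no HTTP call is made.

```python
from typing import Any, Optional

# The three values listed in the upgrade notes.
VALID_MODES = ("async", "async_wait", "sync")


def build_import_fields(scan_type: str, engagement_id: int,
                        mode: Optional[str] = None) -> dict[str, str]:
    """Build form fields for an import request.

    Field names other than deduplication_execution_mode are assumed here.
    """
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"unknown deduplication_execution_mode: {mode!r}")
    fields = {"scan_type": scan_type, "engagement": str(engagement_id)}
    if mode is not None:
        # Omitting the field lets the server fall back to the profile setting.
        fields["deduplication_execution_mode"] = mode
    return fields


def statistics_are_final(response: dict[str, Any]) -> bool:
    """True only when the server reports that deduplication had finished."""
    return bool(response.get("deduplication_complete", False))
```

A caller that gets `False` back (plain `async`, or an `async_wait` that timed out) would treat the statistics as provisional and re-read them later.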
@@ -108,6 +108,18 @@ The endpoints also have to match for the findings to be considered duplicates, s
 
 - Dedupe is triggered on import/reimport and during certain updates run via Celery in the background.
 
+### Import/reimport deduplication execution mode
+
+For import and reimport you can control how deduplication post-processing is dispatched and whether the API response waits for it. Set it per user on the profile page (**Deduplication execution mode**), or override it per request with the `deduplication_execution_mode` field on the import/reimport endpoints (the request value takes precedence over the profile).
+
+- `async` (default): deduplication and the rest of post-processing run in the background and the response returns immediately. Historical behavior; the response is produced before findings are deduplicated.
+- `async_wait`: post-processing is still dispatched to the background, but the request waits for deduplication to finish before responding. The `scan_added` notification and the statistics in the response then reflect the deduplicated state (findings that turned out to be duplicates are no longer counted/listed as new). JIRA push, product grading and other non-deduplication tasks remain asynchronous and are not awaited. The wait is bounded by `DD_DEDUPLICATION_ASYNC_WAIT_TIMEOUT` (default `60` seconds); if deduplication has not finished in time, for example because no worker picked up the work, the request responds anyway rather than hanging.
+- `sync`: import deduplication runs inline in the web request.
+
+The import/reimport response includes a `deduplication_complete` boolean indicating whether deduplication had finished by the time the response was produced (`true` for `sync` and for a completed `async_wait`; `false` for `async` or for an `async_wait` that timed out).
+
+This is independent of the global `block_execution` profile flag, which forces **all** of a user's asynchronous tasks (notifications, JIRA push, product grading, deduplication, ...) to the foreground. When neither the request nor the profile sets an execution mode, a user with `block_execution=True` falls back to `sync`.
+
 ## Service field and its impact
 
 - By default, `HASH_CODE_FIELDS_ALWAYS = ["service"]`, meaning the `service` associated with a finding is appended to the hash for all scanners.

@@ -24,12 +24,14 @@
 from dojo.importers.default_reimporter import DefaultReImporter
 from dojo.location.models import Location
 from dojo.models import (
+    DEDUPLICATION_EXECUTION_MODE_CHOICES,
     IMPORT_ACTIONS,
     SEVERITIES,
     SEVERITY_CHOICES,
     STATS_FIELDS,
     App_Analysis,
     Development_Environment,
+    Dojo_User,
     DojoMeta,
     Endpoint,
     Engagement,
@@ -431,6 +433,16 @@ class CommonImportScanSerializer(serializers.Serializer):
         allow_null=True, default=None, queryset=User.objects.all(),
     )
     push_to_jira = serializers.BooleanField(default=False)
+    deduplication_execution_mode = serializers.ChoiceField(
+        required=False,
+        allow_null=True,
+        choices=DEDUPLICATION_EXECUTION_MODE_CHOICES,
+        help_text="Override how import post-processing (deduplication, jira push, grading, ...) is executed for "
+        "this request. 'async' dispatches post-processing to the background and responds immediately (default). "
+        "'async_wait' dispatches to the background but waits for deduplication to finish before responding, so "
+        "notifications and the returned statistics reflect the deduplicated state. 'sync' runs everything inline. "
+        "If omitted, falls back to the user's profile setting (deduplication_execution_mode).",
+    )
     environment = serializers.CharField(required=False)
     build_id = serializers.CharField(
         required=False, help_text="ID of the build that was scanned.",
@@ -476,6 +488,14 @@ class CommonImportScanSerializer(serializers.Serializer):
         help_text=_("Also referred to as 'Organization' ID."),
     )
     statistics = ImportStatisticsSerializer(read_only=True, required=False)
+    deduplication_complete = serializers.BooleanField(
+        read_only=True,
+        required=False,
+        help_text="Whether deduplication had finished by the time this response was produced. "
+        "True for 'sync' and for 'async_wait' when deduplication completed within the timeout; "
+        "False for 'async' (deduplication is still running in the background) or when an "
+        "'async_wait' import timed out waiting for it.",
+    )
     pro = serializers.ListField(read_only=True, required=False)
     apply_tags_to_findings = serializers.BooleanField(
         help_text="If set to True, the tags will be applied to the findings",
@@ -534,6 +554,7 @@ def process_scan(
                 data["product_id"] = test.engagement.product.id
                 data["product_type_id"] = test.engagement.product.prod_type.id
                 data["statistics"] = {"after": test.statistics}
+                data["deduplication_complete"] = importer.deduplication_complete
             duration = time.perf_counter() - start_time
             LargeScanSizeProductAnnouncement(response_data=data, duration=duration)
             ScanTypeProductAnnouncement(response_data=data, scan_type=context.get("scan_type"))
@@ -632,6 +653,14 @@ def setup_common_context(self, data: dict) -> dict:
         if eng_end_date:
             context["target_end"] = context.get("engagement_end_date")
 
+        # Resolve the effective deduplication execution mode: a request override
+        # (if any) takes precedence over the user's profile setting; otherwise
+        # fall back to block_execution (sync) or the default (async).
+        request = self.context.get("request")
+        user = getattr(request, "user", None)
+        context["deduplication_execution_mode"] = Dojo_User.resolve_deduplication_execution_mode(
+            user, data.get("deduplication_execution_mode"),
+        )
+
         return context
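Reviewer sketch (not part of the patch): the body of `Dojo_User.resolve_deduplication_execution_mode` is not shown in this diff, so the function below is only a guess at the precedence the docs and comments describe (request value, then profile value, then `block_execution`, then `async`). The name `resolve_mode` and its flat arguments are hypothetical.

```python
from typing import Optional

VALID_MODES = {"async", "async_wait", "sync"}


def resolve_mode(request_value: Optional[str],
                 profile_value: Optional[str],
                 block_execution: bool) -> str:
    """Pick the effective mode using the documented precedence."""
    # 1. A per-request override wins when present.
    if request_value in VALID_MODES:
        return request_value
    # 2. Otherwise use the profile setting (None or "" means "not set").
    if profile_value in VALID_MODES:
        return profile_value
    # 3. With no mode set anywhere, block_execution falls back to sync.
    if block_execution:
        return "sync"
    # 4. Default: historical async behavior.
    return "async"
```

Treating an empty profile value as "not set" is an assumption based on the model field allowing both blank and null.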
 
 
@@ -805,11 +834,11 @@ def process_scan(
         try:
             logger.debug(f"process_scan called with context: {context}")
             start_time = time.perf_counter()
+            processor = None
             if test := context.get("test"):
                 statistics_before = test.statistics
-                context["test"], _, _, _, _, _, test_import = self.get_reimporter(
-                    **context,
-                ).process_scan(
+                processor = self.get_reimporter(**context)
+                context["test"], _, _, _, _, _, test_import = processor.process_scan(
                     context.pop("scan", None),
                 )
                 if test_import:
@@ -821,9 +850,10 @@ def process_scan(
                 # Do not close old findings when creating a brand new test: there are no
                 # existing findings to compare against, and close_old_findings would
                 # incorrectly close findings from other tests in the same scope.
-                context["test"], _, _, _, _, _, _ = self.get_importer(
+                processor = self.get_importer(
                     **{**context, "close_old_findings": False},
-                ).process_scan(
+                )
+                context["test"], _, _, _, _, _, _ = processor.process_scan(
                     context.pop("scan", None),
                 )
             else:
@@ -842,6 +872,8 @@ def process_scan(
                 if statistics_delta:
                     data["statistics"]["delta"] = statistics_delta
                 data["statistics"]["after"] = test.statistics
+                if processor is not None:
+                    data["deduplication_complete"] = processor.deduplication_complete
             duration = time.perf_counter() - start_time
             LargeScanSizeProductAnnouncement(response_data=data, duration=duration)
             ScanTypeProductAnnouncement(response_data=data, scan_type=context.get("scan_type"))

@@ -61,10 +61,11 @@ def dojo_dispatch_task(task_or_sig: _SupportsSi | _SupportsApplyAsync | Signatur
 
     - Inject `async_user_id` if missing.
     - Capture and inject pghistory context if available.
-    - Respect `force_sync=True` (foreground execution) and user `block_execution`.
+    - Respect `force_sync=True` (foreground execution) and the user's
+      block_execution flag.
     - Respect `force_async=True` (background execution even when the caller
-      would otherwise run synchronously, e.g. user has `block_execution`).
-      `force_async` wins over `force_sync` and `block_execution`.
+      would otherwise run synchronously, e.g. user has block_execution).
+      `force_async` wins over `force_sync` and block_execution.
     - Support `countdown=<seconds>` for async dispatch.
 
     Returns:
@@ -75,6 +76,11 @@ def dojo_dispatch_task(task_or_sig: _SupportsSi | _SupportsApplyAsync | Signatur
     from dojo.decorators import dojo_async_task_counter, we_want_async  # noqa: PLC0415 circular import
 
     countdown = cast("int", kwargs.pop("countdown", 0))
+    # Per-dispatch result storage. The task default is `ignore_result` (global
+    # CELERY_TASK_IGNORE_RESULT=True), so AsyncResult.get() is a no-op. Callers
+    # that need to join on the result later (e.g. import 'async_wait' mode) pass
+    # ignore_result=False to force this one dispatch to store its result.
+    ignore_result = kwargs.pop("ignore_result", None)
     injected = _inject_async_user(kwargs)
     injected = _inject_pghistory_context(injected)
 
@@ -83,7 +89,10 @@ def dojo_dispatch_task(task_or_sig: _SupportsSi | _SupportsApplyAsync | Signatur
 
     if we_want_async(*sig.args, func=getattr(sig, "type", None), **sig_kwargs):
         # DojoAsyncTask.apply_async tracks async dispatch. Avoid double-counting here.
-        return sig.apply_async(countdown=countdown)
+        apply_kwargs = {"countdown": countdown}
+        if ignore_result is not None:
+            apply_kwargs["ignore_result"] = ignore_result
+        return sig.apply_async(**apply_kwargs)
 
     # Track foreground execution as a "created task" as well (matches historical dojo_async_task behavior)
     dojo_async_task_counter.incr(str(sig.task), args=sig.args, kwargs=sig_kwargs)
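Reviewer sketch (not part of the patch): the bounded join that `async_wait` relies on, illustrated with `concurrent.futures` as a stand-in for a Celery result. The function names are hypothetical, and the real code presumably joins on the `AsyncResult` returned by the dispatch above rather than on a `Future`.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def wait_for_deduplication(result: Future, timeout: float) -> bool:
    """Return True if the background work finished within `timeout` seconds."""
    try:
        result.result(timeout=timeout)
    except FutureTimeout:
        # Degrade to the 'async' outcome: respond now instead of hanging.
        return False
    return True


def run_demo(work_seconds: float, timeout: float) -> bool:
    """Submit fake dedupe work and report whether the join completed in time."""
    # Note: leaving the `with` block waits for the worker thread to finish;
    # that is demo cleanup only and does not change the returned flag.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(time.sleep, work_seconds)
        return wait_for_deduplication(future, timeout)
```

The returned flag plays the role of `deduplication_complete`: `True` when the join succeeded, `False` when the timeout elapsed first.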

@@ -0,0 +1,16 @@
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('dojo', '0272_reencrypt_tool_config_credentials_aes_gcm'),
+    ]
+
+    operations = [
+        migrations.AddField(
+            model_name='usercontactinfo',
+            name='deduplication_execution_mode',
+            field=models.CharField(blank=True, choices=[('async', 'Async (do not wait)'), ('async_wait', 'Async, wait for deduplication'), ('sync', 'Synchronous (block)')], help_text="Controls how import/reimport deduplication post-processing is executed. 'Async' dispatches it to the background and returns immediately (default). 'Async, wait for deduplication' dispatches to the background but waits for deduplication to finish before responding, so notifications and statistics reflect the deduplicated state. 'Synchronous' runs the import deduplication inline. Can be overridden per request. Independent of block_execution, which forces all async tasks (notifications, jira, ...) to the foreground.", max_length=20, null=True),
+        ),
+    ]
@@ -0,0 +1,30 @@
+from django.db import migrations
+
+
+def seed_deduplication_execution_mode(apps, schema_editor):
+    """
+    Seed the new import deduplication execution mode from the legacy block_execution flag.
+
+    block_execution remains the global "run all async tasks in the foreground" switch;
+    users who had it enabled get the synchronous deduplication mode so import behavior is
+    unchanged for them.
+    """
+    UserContactInfo = apps.get_model("dojo", "UserContactInfo")
+    UserContactInfo.objects.filter(block_execution=True).update(deduplication_execution_mode="sync")
+
+
+def unseed_deduplication_execution_mode(apps, schema_editor):
+    """Reverse: clear the seeded synchronous mode."""
+    UserContactInfo = apps.get_model("dojo", "UserContactInfo")
+    UserContactInfo.objects.filter(deduplication_execution_mode="sync").update(deduplication_execution_mode=None)
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('dojo', '0273_usercontactinfo_deduplication_execution_mode'),
+    ]
+
+    operations = [
+        migrations.RunPython(seed_deduplication_execution_mode, unseed_deduplication_execution_mode),
+    ]
@@ -948,6 +948,10 @@ def process_form(
             "create_finding_groups_for_all_findings": form.cleaned_data.get("create_finding_groups_for_all_findings", None),
             "environment": self.get_development_environment(environment_name=form.cleaned_data.get("environment")),
         })
+        # Honor the user's profile deduplication_execution_mode for UI imports. The API resolves
+        # this in the serializer; the UI has no per-import selector, so fall back to the profile
+        # (or block_execution) instead of silently defaulting to async.
+        context["deduplication_execution_mode"] = Dojo_User.resolve_deduplication_execution_mode(request.user)
         # Create the engagement if necessary
         self.create_engagement(context)
         # close_old_findings_product_scope is a modifier of close_old_findings.

@@ -2,7 +2,7 @@
 from contextlib import suppress
 from datetime import datetime
 from itertools import batched
-from time import strftime
+from time import sleep, strftime
 
 from django.conf import settings
 from django.db import transaction
@@ -470,6 +470,20 @@ def post_process_findings_batch(
     force_sync=False,
     **kwargs,
 ):
+    # Test-only hook: when DEDUPLICATION_BATCH_PROCESS_TEST_DELAY > 0 (set only in
+    # the integration-test stack) block this batch so the async_wait integration
+    # test can deterministically distinguish 'async_wait' (which joins on this
+    # task) from 'async' (which does not). Default 0 -> no effect in production.
+    # DEDUPLICATION_BATCH_PROCESS_TEST_DELAY_FILTER (a finding-title prefix) scopes
+    # the delay to that one test's findings so unrelated dedupe tests are not slowed.
+    if (test_delay := settings.DEDUPLICATION_BATCH_PROCESS_TEST_DELAY) > 0:
+        delay_filter = settings.DEDUPLICATION_BATCH_PROCESS_TEST_DELAY_FILTER
+        if not delay_filter or Finding.objects.filter(id__in=finding_ids, title__istartswith=delay_filter).exists():
+            logger.warning(
+                "post_process_findings_batch: TEST-ONLY delay of %ss for %d finding(s) (filter=%r)",
+                test_delay, len(finding_ids) if finding_ids else 0, delay_filter,
+            )
+            sleep(test_delay)
 
     logger.debug(
         f"post_process_findings_batch called: finding_ids_count={len(finding_ids) if finding_ids else 0}, "