
Conversation

@jjhwan-h (Contributor) commented Jul 31, 2025

Summary

This PR addresses issue #2152, in which response files are overwritten even when the -sd (SkipDedupe) flag is used.

What was the issue?

When the -sd flag is enabled, input targets are processed without deduplication. However, each response is still written to the same file path, derived from the SHA1 hash of the URL, so repeated targets overwrite one another's output files.
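
A minimal sketch of the naming scheme described above (whether the hash input is exactly the URL string is an assumption; the point is that identical input always yields an identical path):

package main

import (
    "crypto/sha1"
    "fmt"
)

func main() {
    // One SHA1 over the target URL yields one fixed path, so without the
    // fix, repeated hits on the same target all map to the same file.
    url := "http://localhost:8000"
    hash := sha1.Sum([]byte(url))
    fmt.Printf("output/response/localhost_8000/%x.txt\n", hash)
}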

What’s changed?

  • When SkipDedupe is enabled and a response file with the same name already exists, the response is written to a new file with a suffix (_1, _2, ...).
  • Repeated input targets are now counted using HybridMap, and the number of processing iterations is determined by this count.
  • countTargetFromRawTarget was modified to return a known duplicateTargetErr so that duplicates can be handled when deduplication is disabled.
  • The response-writing logic in analyze and process was refactored to ensure unique file writes under concurrency (a sketch follows this list).
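
A minimal sketch of that unique-write behavior, assuming a responsePath ending in .txt (writeResponseUnique is an illustrative name, not the actual function in runner.go):

package main

import (
    "fmt"
    "os"
    "strings"
)

// writeResponseUnique attempts an exclusive create (os.O_EXCL) and, on a
// name collision, retries with _1, _2, ... suffixes until a slot is free.
func writeResponseUnique(responsePath string, data []byte) error {
    basePath := strings.TrimSuffix(responsePath, ".txt")
    for idx := 0; ; idx++ {
        targetPath := responsePath
        if idx > 0 {
            targetPath = fmt.Sprintf("%s_%d.txt", basePath, idx)
        }
        f, err := os.OpenFile(targetPath, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0644)
        if err == nil {
            _, writeErr := f.Write(data)
            _ = f.Close()
            return writeErr
        }
        if !os.IsExist(err) {
            return err // unrelated filesystem error: stop retrying
        }
        // file already exists: try the next suffix
    }
}

func main() {
    _ = os.MkdirAll("output/response/localhost_8000", 0755)
    path := "output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt"
    _ = writeResponseUnique(path, []byte("HTTP/1.1 200 OK\n"))
    _ = writeResponseUnique(path, []byte("HTTP/1.1 200 OK\n")) // lands in ..._1.txt
}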

Why is this useful?

This change:

  • Accurately tracks and stores multiple response outputs for identical input targets.
  • Prevents unintentional data loss due to file overwrites.
  • Honors the intent behind the -sd flag.

Result

# test.txt
localhost:8000
localhost:9000
localhost:8000
localhost:9000
localhost:8000
localhost:9000
localhost:8000
localhost:9000
$ ./httpx/httpx -l test.txt -stream -skip-dedupe -sr
$ tree output/response/
output/response/
├── index.txt
├── localhost_8000
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_1.txt
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_2.txt
│   └── 59bd7616010ed02cd66f44e94e9368776966fe3b_3.txt
└── localhost_9000
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f.txt
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f_1.txt
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f_2.txt
    └── 981d6875d791d0a1a28393b5ec62d61dff1e977f_3.txt

2 directories, 9 files
$ ./httpx/httpx -l test.txt -skip-dedupe -sr
$ tree output/response/
output/response/
├── index.txt
├── localhost_8000
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_1.txt
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_2.txt
│   └── 59bd7616010ed02cd66f44e94e9368776966fe3b_3.txt
└── localhost_9000
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f.txt
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f_1.txt
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f_2.txt
    └── 981d6875d791d0a1a28393b5ec62d61dff1e977f_3.txt

2 directories, 9 files
$ ./httpx/httpx -l test.txt -sr
$ tree output/
output/
└── response
    ├── index.txt
    ├── localhost_8000
    │   └── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
    └── localhost_9000
        └── 981d6875d791d0a1a28393b5ec62d61dff1e977f.txt

3 directories, 3 files
$ ./httpx/httpx -l test.txt -stream -sr
$ tree output/
output/
└── response
    ├── index.txt
    ├── localhost_8000
    │   └── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
    └── localhost_9000
        └── 981d6875d791d0a1a28393b5ec62d61dff1e977f.txt

3 directories, 3 files

Related issue

Closes #2152

Summary by CodeRabbit

  • New Features

    • Duplicate targets are detected and — when configured — processed multiple times.
    • Result metadata now includes a filename hash used to produce unique response and screenshot filenames.
  • Bug Fixes

    • Response saving uses exclusive creation with auto-increment suffixes on collisions; appending/truncation respects resume and dedupe settings.
    • Improved duplicate-target counting and propagation for accurate processing.
  • Refactor

    • Per-target processing reworked to honor per-target counts and stream mode.
  • Tests

    • Updated tests to verify duplicate detection and handling.


@coderabbitai bot commented Jul 31, 2025

Walkthrough

Adds per-target duplicate detection and counting, propagates duplicate entries when deduplication is disabled, refactors per-target execution to run each target the recorded number of times, changes response/screenshot filename hashing and uses exclusive-create with incremental suffixes to avoid overwrites, and exposes FileNameHash on Result.

Changes

  • Duplicate detection & counting (runner/runner.go): Added duplicateTargetErr; countTargetFromRawTarget signals duplicates; prepareInput and loadAndCloseFile increment per-target counts and total hosts when Options.SkipDedupe is true.
  • Per-target execution flow (runner/runner.go): Refactored RunEnumeration to use runProcess(times) and execute targets according to stored counts; stream mode still runs once.
  • Response / screenshot naming & persistence (runner/runner.go): Switched to FileNameHash-based naming (FileNameHash added to results); response/screenshot writes use exclusive-create (O_EXCL) and retry with incrementing suffixes (_1, _2, ...) to ensure unique files; index handling appends when Resume or SkipDedupe are enabled.
  • Types / public API (runner/types.go): Added exported field FileNameHash string to Result (json/csv/mapstructure ignored).
  • Tests & error handling (runner/runner_test.go): Updated tests to import github.com/pkg/errors, set Options.SkipDedupe appropriately, and assert duplicates using errors.Is(err, duplicateTargetErr).

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Runner
    participant FS as FileSystem

    Note over Runner: Input parsing & duplicate counting
    User->>Runner: provide target list + options
    Runner->>Runner: countTargetFromRawTarget(raw)
    alt target exists
        Runner-->>Runner: returns duplicateTargetErr
        alt SkipDedupe enabled
            Runner->>Runner: increment stored count for target
        else
            Runner->>Runner: ignore duplicate (do not increment)
        end
    else new target
        Runner->>Runner: store target with count=1
    end

    Note over Runner,FS: Processing & response persistence
    loop for each target run (1..count)
        Runner->>Runner: runProcess(1) — perform enumeration
        Runner->>FS: attempt create response file (O_EXCL)
        alt name collision
            FS->>FS: compute filename with suffix (_1, _2, ...)
            FS->>Runner: return created file
        else success
            FS->>Runner: file created
        end
        Runner->>FS: append to index if Resume or SkipDedupe enabled
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I nibble repeats and keep a count,
I stamp each file so names won't mount.
When collisions hop and try to bite,
I add a tail and sleep at night.
— a rabbit, proud of dedupe flight 🐇

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 33.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Linked Issues check (❓ Inconclusive): The PR partially addresses issue #2152 by implementing duplicate target counting and unique file suffixing (_1, _2, ...), but reviewer comments indicate the index file generation still doesn't properly accumulate across repeated requests. Resolution: verify that the index file correctly accumulates all response files across multiple iterations of repeated inputs, not just the latest iteration, to fully resolve the reported inconsistency.

✅ Passed checks (3 passed)
  • Description Check: skipped because CodeRabbit's high-level summary is enabled.
  • Title check: The title accurately describes the main fix: preventing response file overwrite when the -sd (SkipDedupe) flag is used, which aligns with the primary objective of issue #2152.
  • Out of Scope Changes check: Changes are appropriately scoped to address the issue: duplicate target tracking, unique file naming via FileNameHash, and refactored response-writing logic align with the stated objective of preventing overwrites when SkipDedupe is enabled.



@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
runner/runner.go (1)

2152-2176: Consider adding a safety limit to prevent potential infinite loops.

While the file creation logic correctly handles duplicates, there's no upper limit on the suffix counter. In edge cases or under concurrent load, this could theoretically loop indefinitely.

Consider adding a reasonable upper limit:

 finalPath := responsePath
 idx := 0
+const maxSuffixAttempts = 1000
 for {
+    if idx >= maxSuffixAttempts {
+        gologger.Error().Msgf("Exceeded maximum attempts to create unique file for '%s'", responsePath)
+        break
+    }
     targetPath := finalPath
     if idx > 0 {
         basePath := strings.TrimSuffix(responsePath, ".txt")
         targetPath = fmt.Sprintf("%s_%d.txt", basePath, idx)
     }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between faac44c and 3cac134.

📒 Files selected for processing (2)
  • runner/runner.go (6 hunks)
  • runner/runner_test.go (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
runner/runner_test.go (1)
runner/options.go (1)
  • Options (173-351)
🔇 Additional comments (5)
runner/runner.go (4)

448-449: LGTM! Good use of a sentinel error for explicit duplicate detection.

The error declaration follows Go's best practices for sentinel errors.
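
For reference, the pattern being praised looks roughly like this (simplified: a plain map stands in for the HybridMap, and the stdlib errors package is shown, though the project wires this up through github.com/pkg/errors in its tests):

package main

import (
    "errors"
    "fmt"
)

var duplicateTargetErr = errors.New("duplicate target")

// countTargetFromRawTarget mirrors the shape described here: return the
// sentinel instead of silently returning 0, so callers can branch on it.
func countTargetFromRawTarget(seen map[string]bool, raw string) (int, error) {
    if seen[raw] {
        return 0, duplicateTargetErr
    }
    seen[raw] = true
    return 1, nil
}

func main() {
    seen := map[string]bool{}
    _, _ = countTargetFromRawTarget(seen, "localhost:8000")
    if _, err := countTargetFromRawTarget(seen, "localhost:8000"); errors.Is(err, duplicateTargetErr) {
        fmt.Println("duplicate seen; with -sd this would increment the stored count")
    }
}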


642-645: Good improvement to error handling!

Making duplicate detection explicit with an error return is better than silently returning 0. This allows callers to handle duplicates appropriately based on their needs.


1245-1267: Well-structured refactoring for repeated target processing!

The runProcess helper function cleanly encapsulates the logic for processing targets multiple times based on their count. Good handling of edge cases with the default to 1 when parsing fails.
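
The helper itself isn't quoted in this review, but from the description it is shaped roughly like this (a plain map stands in for the HybridMap, and the runProcess/process names are illustrative):

package main

import (
    "fmt"
    "strconv"
)

// runProcess runs the per-target work `times` times, defaulting to a single
// run when the stored count is missing, unparsable, or non-positive.
func runProcess(target string, counts map[string][]byte, process func(string)) {
    times := 1
    if v, ok := counts[target]; ok {
        if n, err := strconv.Atoi(string(v)); err == nil && n > 0 {
            times = n
        }
    }
    for i := 0; i < times; i++ {
        process(target)
    }
}

func main() {
    counts := map[string][]byte{"localhost:8000": []byte("4")}
    runProcess("localhost:8000", counts, func(t string) { fmt.Println("processing", t) })
}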


455-465: Consider initializing counts consistently.

The current implementation always sets the initial count to "1" for new targets (line 458), but this might not accurately reflect the actual number of times a target appears in the input. Additionally, the error from strconv.Atoi at line 461 is ignored, which could lead to unexpected behavior if the stored value is corrupted.

Consider this more robust implementation:

-				r.hm.Set(target, []byte("1")) //nolint
+				r.hm.Set(target, []byte("1"))
 			} else if r.options.SkipDedupe && errors.Is(err, duplicateTargetErr) {
 				if v, ok := r.hm.Get(target); ok {
-					cnt, _ := strconv.Atoi(string(v))
+					cnt, err := strconv.Atoi(string(v))
+					if err != nil {
+						// Handle corrupted count, default to 1
+						cnt = 1
+					}
 					r.hm.Set(target, []byte(strconv.Itoa(cnt+1)))
 					numHosts += 1
 				}

Likely an incorrect or invalid review comment.

runner/runner_test.go (1)

10-10: Test updates correctly reflect the new duplicate handling behavior!

Good changes:

  • Explicitly setting SkipDedupe: false makes the test's intent clear
  • Using errors.Is is the correct way to check for sentinel errors in Go
  • The test properly validates that duplicates are detected and return the expected error

Also applies to: 128-130, 145-145
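
As a shape reference only (the real test drives a Runner; this standalone sketch just shows why errors.Is is the right assertion with github.com/pkg/errors):

package main

import (
    "fmt"

    "github.com/pkg/errors"
)

var duplicateTargetErr = errors.New("duplicate target")

func main() {
    // Wrapping keeps the sentinel reachable through Unwrap, which is why
    // the tests assert with errors.Is rather than comparing errors directly.
    err := errors.Wrap(duplicateTargetErr, "adding target localhost:8000")
    fmt.Println(errors.Is(err, duplicateTargetErr)) // true
}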

Comment on lines 622 to 632
expandedTarget, err := r.countTargetFromRawTarget(target)
if err == nil && expandedTarget > 0 {
    numTargets += expandedTarget
    r.hm.Set(target, nil) //nolint
    r.hm.Set(target, []byte("1")) //nolint
} else if r.options.SkipDedupe && errors.Is(err, duplicateTargetErr) {
    if v, ok := r.hm.Get(target); ok {
        cnt, _ := strconv.Atoi(string(v))
        r.hm.Set(target, []byte(strconv.Itoa(cnt+1)))
        numTargets += 1
    }
}

🛠️ Refactor suggestion

Extract duplicate counting logic to reduce code duplication.

This logic is duplicated from prepareInput. Consider extracting it into a helper method to maintain DRY principles and ensure consistent behavior.

Consider creating a helper method:

func (r *Runner) incrementTargetCount(target string) {
    if v, ok := r.hm.Get(target); ok {
        cnt, err := strconv.Atoi(string(v))
        if err != nil {
            cnt = 1
        }
        r.hm.Set(target, []byte(strconv.Itoa(cnt+1)))
    }
}

Then use it in both locations:

 } else if r.options.SkipDedupe && errors.Is(err, duplicateTargetErr) {
-    if v, ok := r.hm.Get(target); ok {
-        cnt, _ := strconv.Atoi(string(v))
-        r.hm.Set(target, []byte(strconv.Itoa(cnt+1)))
-        numTargets += 1
-    }
+    r.incrementTargetCount(target)
+    numTargets += 1
 }
🤖 Prompt for AI Agents
In runner/runner.go around lines 622 to 632, the logic for incrementing the
count of duplicate targets is duplicated from the prepareInput method. Extract
this duplicate counting logic into a new helper method on the Runner struct, for
example incrementTargetCount, which safely retrieves the current count, handles
conversion errors by defaulting to 1, increments the count, and updates the map.
Replace the duplicated code in both places with calls to this new helper method
to adhere to DRY principles and maintain consistent behavior.

@Mzack9999 Mzack9999 self-requested a review October 20, 2025 19:39
@coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
runner/runner.go (2)

658-663: Fix error handling in duplicate count increment.

Same issue as in prepareInput (lines 492-495): the error from strconv.Atoi is ignored, which can cause the count to reset to 1 instead of incrementing.

Apply this diff:

 } else if r.options.SkipDedupe && errors.Is(err, duplicateTargetErr) {
     if v, ok := r.hm.Get(target); ok {
-        cnt, _ := strconv.Atoi(string(v))
+        cnt, err := strconv.Atoi(string(v))
+        if err != nil {
+            cnt = 1
+        }
         r.hm.Set(target, []byte(strconv.Itoa(cnt+1)))
         numTargets += 1
     }
 }

492-495: Fix error handling in duplicate count increment.

The error from strconv.Atoi is silently ignored. If the stored value is corrupted (not a valid integer), cnt will be 0, and storing cnt+1 (which is 1) will reset the count instead of incrementing it. This causes undercounting when duplicates are encountered.

Apply this diff:

 } else if r.options.SkipDedupe && errors.Is(err, duplicateTargetErr) {
     if v, ok := r.hm.Get(target); ok {
-        cnt, _ := strconv.Atoi(string(v))
+        cnt, err := strconv.Atoi(string(v))
+        if err != nil {
+            cnt = 1
+        }
         r.hm.Set(target, []byte(strconv.Itoa(cnt+1)))
         numHosts += 1
     }
 }
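
A quick illustration of the failure mode described above: on a corrupted value, strconv.Atoi returns 0 alongside the error, so ignoring the error turns the increment into a reset:

package main

import (
    "fmt"
    "strconv"
)

func main() {
    cnt, err := strconv.Atoi("not-a-number") // corrupted stored count
    fmt.Println(cnt, err != nil)             // 0 true
    fmt.Println(cnt + 1)                     // 1: the count resets instead of incrementing
}
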
🧹 Nitpick comments (1)
runner/runner.go (1)

2235-2258: Add upper bound to file suffix retry loop.

The loop has no maximum iteration limit. While unlikely in practice, if many duplicate responses accumulate or if there's a bug, this could iterate excessively. Consider adding a reasonable upper bound (e.g., 1000 or 10000 attempts) and logging a warning if the limit is reached.

Apply this diff:

 finalPath := responsePath
 idx := 0
+const maxSuffixAttempts = 1000
 for {
+    if idx >= maxSuffixAttempts {
+        gologger.Error().Msgf("Exceeded maximum suffix attempts (%d) for response file '%s'", maxSuffixAttempts, responsePath)
+        break
+    }
     targetPath := finalPath
     if idx > 0 {
         basePath := strings.TrimSuffix(responsePath, ".txt")
         targetPath = fmt.Sprintf("%s_%d.txt", basePath, idx)
     }
     f, err := os.OpenFile(targetPath, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0644)
     if err == nil {
         _, writeErr := f.Write(data)
         f.Close()
         if writeErr != nil {
             gologger.Error().Msgf("Could not write to '%s': %s", targetPath, writeErr)
         }
         break
     }
     if !os.IsExist(err) {
         gologger.Error().Msgf("Failed to create file '%s': %s", targetPath, err)
         break
     }
     idx++
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63130de and bd4dc82.

📒 Files selected for processing (1)
  • runner/runner.go (5 hunks)
🔇 Additional comments (1)
runner/runner.go (1)

1316-1338: Clean implementation of per-target iteration.

The nested runProcess function effectively handles repeated processing based on the duplicate count. The error handling when retrieving the count from the map (line 1334) is correct and defensive.

@Mzack9999 (Member) left a comment

LGTM!
Follow-ups:

  • Index file generation always points to the same original SHA1 filename:
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/Users/user/go/src/github.com/projectdiscovery/httpx/cmd/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)

@jjhwan-h jjhwan-h requested a review from Mzack9999 October 23, 2025 08:37
@jjhwan-h (Contributor, Author) commented

Should the entries in the index file match the files under the output/response/ directory?

Currently, it looks like a new index file is created for every request.
As a result, when requests are repeated, the index file and the files inside the response/ folder become inconsistent.

Round 1

output/response/
├── index.txt
└── localhost_8000
    ├── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
    └── 59bd7616010ed02cd66f44e94e9368776966fe3b_1.txt

#index.txt
/discovery/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/discovery/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b_1.txt http://localhost:8000 (200 OK)

Round 2

output/response/
├── index.txt
└── localhost_8000
    ├── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
    ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_1.txt
    ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_2.txt
    └── 59bd7616010ed02cd66f44e94e9368776966fe3b_3.txt

#index.txt
/discovery/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b_2.txt http://localhost:8000 (200 OK)
/discovery/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b_3.txt http://localhost:8000 (200 OK)

@Mzack9999 Mzack9999 added the Type: Bug Inconsistencies or issues which will cause an issue or problem for users or implementors. label Jan 9, 2026
@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
runner/runner.go (1)

2276-2296: Add maximum retry limit to prevent infinite loop.

The exclusive file creation with incremental suffix logic correctly prevents overwrites under concurrent execution. However, the unbounded for idx = 0; ; idx++ loop could theoretically run forever if there's a persistent filesystem issue (not just file existence). Consider adding a reasonable maximum retry limit (e.g., 1000) to prevent potential hangs:

-for idx = 0; ; idx++ {
+const maxRetries = 1000
+for idx = 0; idx < maxRetries; idx++ {
     targetPath := responsePath
     if idx > 0 {
         targetPath = fmt.Sprintf("%s_%d.txt", basePath, idx)
     }
     f, err := os.OpenFile(targetPath, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0644)
     if err == nil {
         _, writeErr := f.Write(data)
         _ = f.Close()
         if writeErr != nil {
             gologger.Error().Msgf("Could not write to '%s': %s", targetPath, writeErr)
         }
         break
     }
     if !os.IsExist(err) {
         gologger.Error().Msgf("Failed to create file '%s': %s", targetPath, err)
         break
     }
 }
+if idx >= maxRetries {
+    gologger.Error().Msgf("Exceeded maximum retries (%d) for file '%s'", maxRetries, responsePath)
+}
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3212c6 and 048c43f.

📒 Files selected for processing (3)
  • runner/runner.go
  • runner/runner_test.go
  • runner/types.go
🧰 Additional context used
🧬 Code graph analysis (1)
runner/runner_test.go (1)
runner/options.go (1)
  • Options (177-364)
🔇 Additional comments (11)
runner/types.go (1)

105-105: LGTM: FileNameHash field properly added.

The new FileNameHash field is correctly added with appropriate struct tags to exclude it from JSON, CSV, and mapstructure outputs while remaining accessible within the codebase for response file naming.
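
The declaration looks roughly like this (surrounding Result fields elided; the exact tag spelling is inferred from the "json/csv/mapstructure ignored" note):

package main

import "fmt"

type Result struct {
    // ... other fields elided ...
    FileNameHash string `json:"-" csv:"-" mapstructure:"-"`
}

func main() {
    r := Result{FileNameHash: "59bd7616010ed02cd66f44e94e9368776966fe3b"}
    fmt.Println(r.FileNameHash + ".txt") // response filename derived from the hash
}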

runner/runner_test.go (1)

10-10: LGTM: Test updated to validate sentinel error.

The test changes correctly:

  • Import the errors package for sentinel error checking
  • Explicitly set SkipDedupe: false to clarify test intent
  • Use errors.Is(err, duplicateTargetErr) to verify the expected duplicate error

This aligns with the new per-target duplicate tracking behavior.

Also applies to: 158-160, 175-175

runner/runner.go (9)

505-505: LGTM: Sentinel error properly defined.

The duplicateTargetErr sentinel follows Go best practices for error comparison using errors.Is.


512-522: Duplicate counting logic is correct but consider edge case.

The logic properly increments the target count when duplicates are encountered with SkipDedupe enabled. The ignored error from strconv.Atoi at line 518 is acceptable since a parsing failure results in cnt = 0, and the later logic (lines 1358-1361) defaults to cnt = 1 for non-positive values, ensuring at least one execution.


679-689: LGTM: Consistent duplicate handling in file loading.

The duplicate counting logic in loadAndCloseFile mirrors the implementation in prepareInput, ensuring consistent behavior when reading targets from files versus direct input.


700-701: LGTM: Sentinel error properly returned for duplicates.

The function correctly returns duplicateTargetErr when a target already exists in the map, enabling callers to handle duplicates appropriately based on the SkipDedupe setting.


932-933: LGTM: Index files properly configured for append mode.

The index file handling correctly uses append mode when either Resume or SkipDedupe is enabled, ensuring that entries accumulate rather than being overwritten. The comments clearly document this behavior.

Also applies to: 947-948
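
A sketch of the append-vs-truncate decision described here (openIndex is an illustrative name; the real logic lives inline in runner.go):

package main

import "os"

// openIndex appends when Resume or SkipDedupe is set, so index entries
// accumulate across repeated runs; otherwise it truncates per run.
func openIndex(path string, resume, skipDedupe bool) (*os.File, error) {
    if resume || skipDedupe {
        return os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
    }
    return os.Create(path) // fresh index in normal mode
}

func main() {
    if f, err := openIndex("output/response/index.txt", false, true); err == nil {
        _, _ = f.WriteString("entry\n")
        _ = f.Close()
    }
}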


1341-1363: LGTM: runProcess helper properly implements repeated target execution.

The refactored logic correctly:

  • Runs each target the specified number of times based on duplicate counts
  • Handles multiple request URIs when configured
  • Defaults to a single execution in Stream mode or when count parsing fails
  • Ensures each duplicate target instance is processed as intended by the SkipDedupe feature

2283-2291: LGTM: Atomic file creation prevents race conditions.

The use of os.O_EXCL flag ensures atomic exclusive creation, making this logic safe under concurrent execution. Multiple goroutines processing duplicate targets will automatically receive unique suffixed filenames without risk of overwriting each other's responses.


2492-2492: LGTM: FileNameHash properly propagated to Result.

The FileNameHash field is correctly assigned and will be used downstream in the output routine for consistent file naming. This value reflects the actual suffix used (if any) during response file creation.


153-157: Index file preservation works correctly for skip-dedupe mode; per-request truncation is intentional in normal mode.

The startup logic at lines 153-157 correctly prevents index file removal when SkipDedupe is enabled. The per-request append logic at lines 933 and 948 further ensures that index entries accumulate across repeated requests in skip-dedupe mode (files opened with O_APPEND|O_CREATE). In normal mode (when both Resume and SkipDedupe are false), the index is intentionally recreated per request using os.Create, which is the expected behavior for that mode.

Comment on lines +1140 to +1141
domainResponseFile := fmt.Sprintf("%s.txt", resp.FileNameHash)
screenshotResponseFile := fmt.Sprintf("%s.png", resp.FileNameHash)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get context around lines 1140-1141 to see the conditional protection
sed -n '1130,1160p' runner/runner.go | cat -n

Repository: projectdiscovery/httpx

Length of output: 1651


🏁 Script executed:

#!/bin/bash
# Search for FileNameHash assignments and their conditions
rg -n "FileNameHash" runner/runner.go | head -20

Repository: projectdiscovery/httpx

Length of output: 253


🏁 Script executed:

#!/bin/bash
# Check the conditional logic protecting the output writing at lines 1140-1141
sed -n '1100,1150p' runner/runner.go | cat -n

Repository: projectdiscovery/httpx

Length of output: 2016


🏁 Script executed:

#!/bin/bash
# Check where FileNameHash is assigned and under what conditions
sed -n '2480,2510p' runner/runner.go | cat -n

Repository: projectdiscovery/httpx

Length of output: 1312


🏁 Script executed:

#!/bin/bash
# Search for where fileNameHash variable is created/assigned
rg -n "fileNameHash\s*:=" runner/runner.go -B 5 | head -40

Repository: projectdiscovery/httpx

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Check the full context of where FileNameHash is set in the Response struct
rg -n "FileNameHash" runner/runner.go -B 3 -A 1

Repository: projectdiscovery/httpx

Length of output: 589


🏁 Script executed:

#!/bin/bash
# Find where fileNameHash is computed/assigned before being used in the Result
rg -n "fileNameHash\s*=" runner/runner.go | head -20

Repository: projectdiscovery/httpx

Length of output: 148


🏁 Script executed:

#!/bin/bash
# Check the full analyze function to see if fileNameHash is always set
sed -n '2350,2370p' runner/runner.go | cat -n

Repository: projectdiscovery/httpx

Length of output: 935


🏁 Script executed:

#!/bin/bash
# Look for fileNameHash initialization in analyze function
rg -n "fileNameHash" runner/runner.go -B 2 -A 0 | grep -E "^\d+[-:]" | head -30

Repository: projectdiscovery/httpx

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Find the full context of fileNameHash initialization - where it's declared and all assignment paths
rg -n "fileNameHash" runner/runner.go -B 10 | head -80

Repository: projectdiscovery/httpx

Length of output: 1287


🏁 Script executed:

#!/bin/bash
# Check if fileNameHash has a default initialization or is always set
sed -n '2280,2320p' runner/runner.go | cat -n

Repository: projectdiscovery/httpx

Length of output: 1299


🏁 Script executed:

#!/bin/bash
# Verify the exact line where fileNameHash is first declared
rg -n "var fileNameHash|fileNameHash :=" runner/runner.go

Repository: projectdiscovery/httpx

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Verify the complete code flow - check if fileNameHash is assigned in all code paths
sed -n '2254,2310p' runner/runner.go | cat -n

Repository: projectdiscovery/httpx

Length of output: 2184


🏁 Script executed:

#!/bin/bash
# Check the actual usage of domainResponseFile and screenshotResponseFile at lines 1140-1141
sed -n '1137,1200p' runner/runner.go | cat -n

Repository: projectdiscovery/httpx

Length of output: 3485


🏁 Script executed:

#!/bin/bash
# Verify how fileNameHash is handled when StoreResponse and StoreChain are both false
rg -n "StoreResponse|StoreChain" runner/runner.go | grep -E "2254|2256|2299|2300|2301"

Repository: projectdiscovery/httpx

Length of output: 122


🏁 Script executed:

#!/bin/bash
# Verify that scanopts used in analyze function and r.scanopts in output routine refer to the same options
# Check the Runner structure to see how scanopts is stored
rg -n "scanopts" runner/runner.go | grep -E "Runner|this|self" | head -20

Repository: projectdiscovery/httpx

Length of output: 493


🏁 Script executed:

#!/bin/bash
# Confirm the relationship between scanopts and r.scanopts by checking the analyze function signature
sed -n '2200,2260p' runner/runner.go | grep -E "func.*analyze|scanopts"

Repository: projectdiscovery/httpx

Length of output: 409


🏁 Script executed:

#!/bin/bash
# Verify whether domainResponseFile and screenshotResponseFile created at lines 1140-1141 are only used inside the StoreResponse/StoreChain conditional
sed -n '1140,1180p' runner/runner.go | cat -n

Repository: projectdiscovery/httpx

Length of output: 2359


🏁 Script executed:

#!/bin/bash
# Final verification: Check if there are any uses of domainResponseFile or screenshotResponseFile outside the conditional blocks
sed -n '1137,1220p' runner/runner.go | cat -n

Repository: projectdiscovery/httpx

Length of output: 4375


Add fallback hash or guard FileNameHash usage against empty values.

The switch from domainFile-based naming to resp.FileNameHash introduces a potential issue: FileNameHash is only set within the if scanopts.StoreResponse || scanopts.StoreChain block in the analyze function (lines 2256-2301). If both conditions are false, FileNameHash remains an empty string. However, the output routine creates domainResponseFile and screenshotResponseFile at lines 1140-1141 using this potentially empty hash, and screenshotResponseFile is later used if r.scanopts.Screenshot is enabled. This can result in screenshot files named only .png without a hash prefix. Either:

  • Ensure FileNameHash is always computed regardless of StoreResponse/StoreChain settings, or
  • Guard screenshotResponseFile usage with a fallback hash if FileNameHash is empty.
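
One way the suggested guard could look (responseFileNames and the fallback parameter are hypothetical; reusing a domainFile-derived name, as the comment itself suggests, is assumed):

package main

import "fmt"

// responseFileNames derives the response/screenshot filenames, falling back
// to a non-empty identifier when FileNameHash was never computed.
func responseFileNames(fileNameHash, fallback string) (txt, png string) {
    hash := fileNameHash
    if hash == "" {
        hash = fallback // e.g. a domainFile-derived name, per the suggestion above
    }
    return hash + ".txt", hash + ".png"
}

func main() {
    txt, png := responseFileNames("", "59bd7616010ed02cd66f44e94e9368776966fe3b")
    fmt.Println(txt, png)
}
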
🤖 Prompt for AI Agents
In @runner/runner.go around lines 1140-1141, the code builds
domainResponseFile and screenshotResponseFile using resp.FileNameHash which can
be empty because FileNameHash is only set inside analyze when
scanopts.StoreResponse or scanopts.StoreChain are true; fix by either computing
the hash unconditionally in analyze (ensure resp.FileNameHash is assigned
regardless of scanopts) or add a guard/fallback where domainResponseFile and
screenshotResponseFile are constructed (e.g., if resp.FileNameHash == "" use
domainFile or compute a fallback hash), and ensure any use of
screenshotResponseFile when r.scanopts.Screenshot is enabled checks for and uses
that fallback.

@Mzack9999 Mzack9999 merged commit 59adfd6 into projectdiscovery:dev Jan 9, 2026
15 checks passed

Labels

Type: Bug Inconsistencies or issues which will cause an issue or problem for users or implementors.


Development

Successfully merging this pull request may close these issues.

disable dedupe in response file write when -sd is used
