Share document accumulator between Java/Kotlin plugins #900

Merged: jupblb merged 3 commits into main from michal/semadb-collapse on Jun 3, 2026

@jupblb (Member) commented Jun 3, 2026

Unifies SemanticDB output across javac and kotlinc via a shared SemanticdbDocumentBuilder and per-source accumulator.

  • Javac visitor now uses the same builder/dedup policy as kotlinc.
  • Per-source state (builder + LocalSymbolsCache) survives across javac's multi-ANALYZE rounds for multi-type source files.
  • Replaces buggy on-disk appendSemanticdb merge (which silently discarded later rounds and shifted local IDs).
  • One snapshot changes: LombokBuilder's 10× duplicate refs are now deduped to 1×.

jupblb added 3 commits June 3, 2026 17:40
Extracts SemanticdbDocumentBuilder into semanticdb-shared so both
plugins use one accumulator that enforces a single output policy:
- exact-duplicate occurrences and SymbolInformations are suppressed
- occurrences are sorted by (startLine, startCharacter) before assembly
- the assembled document is stamped with SEMANTICDB4 + caller language
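A minimal sketch of this output policy in plain Java. The record types (`Range`, `Occurrence`, `SymbolInfo`, `Document`) and the class name `DocumentBuilderSketch` are hypothetical stand-ins for the SemanticDB protobuf messages and the real `SemanticdbDocumentBuilder`, whose API is not shown here. Exact duplicates are dropped by a `LinkedHashSet` (records compare field by field), and occurrences are stably sorted by `(startLine, startCharacter)` when the document is built.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DocumentBuilderSketch {
  // Hypothetical stand-ins for the SemanticDB protobuf messages.
  public record Range(int startLine, int startCharacter, int endLine, int endCharacter) {}
  public record Occurrence(Range range, String symbol, String role) {}
  public record SymbolInfo(String symbol, String displayName) {}
  public record Document(String schema, String language,
                         List<Occurrence> occurrences, List<SymbolInfo> symbols) {}

  // LinkedHashSet drops exact duplicates (records use structural equality)
  // and preserves first-emission order.
  private final Set<Occurrence> occurrences = new LinkedHashSet<>();
  private final Set<SymbolInfo> symbols = new LinkedHashSet<>();

  public void addOccurrence(Occurrence occ) { occurrences.add(occ); }

  public void addSymbol(SymbolInfo info) { symbols.add(info); }

  public Document build(String language) {
    List<Occurrence> sorted = new ArrayList<>(occurrences);
    // Stable sort by (startLine, startCharacter); ties keep emission order.
    sorted.sort(Comparator.comparingInt((Occurrence o) -> o.range().startLine())
        .thenComparingInt(o -> o.range().startCharacter()));
    return new Document("SEMANTICDB4", language, sorted, new ArrayList<>(symbols));
  }

  public static void main(String[] args) {
    DocumentBuilderSketch builder = new DocumentBuilderSketch();
    Occurrence ref = new Occurrence(new Range(3, 4, 3, 8), "java/", "REFERENCE");
    for (int i = 0; i < 10; i++) {
      builder.addOccurrence(ref); // e.g. repeated synthetic positions
    }
    builder.addOccurrence(new Occurrence(new Range(1, 0, 1, 5), "a/B#", "DEFINITION"));
    Document doc = builder.build("JAVA");
    System.out.println(doc.occurrences().size());                     // 2
    System.out.println(doc.occurrences().get(0).range().startLine()); // 1
  }
}
```

Feeding the same reference ten times, as with repeated synthetic positions, leaves a single occurrence, and the definition on line 1 is ordered before it.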

Pre-refactor, javac's accumulator was a raw ArrayList (no dedup, no
sort), while kotlinc's accumulator did both via List.contains() / a
post-hoc sortWith(). Standardizing on the kotlin policy means future
plugins inherit it for free and downstream tools see consistent output.

Tests, javac snapshots, and kotlinc snapshots all still pass byte-for-byte:
javac's traversal already produced source-ordered, non-duplicate
output, so the new policy is a no-op on the existing corpus.

Javac fires ANALYZE once per top-level type, so a multi-type source file
like Interfaces.java produces several ANALYZE events for the same target
SemanticDB file. The old appendSemanticdb logic tried to merge across
rounds via HashSet dedup of full protos, but the merged result was never
assigned back, so effectively only the first round's data was kept on
disk. Worse, even when merging worked, occurrences differed across
rounds (later rounds lose enclosing_range positions for already-attributed
types) and local symbol IDs (`local 0`, `local 1`, ...) drifted because
each round had a fresh LocalSymbolsCache.
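The local-ID drift can be reproduced with a toy cache. `LocalIds` below is a hypothetical stand-in for `LocalSymbolsCache` (its real keying is not described here) that numbers locals in first-seen order; the declaration keys are made up. With a fresh cache per ANALYZE round, the second round's first local reuses `local 0`; with one cache per source file, numbering continues.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalIdDriftSketch {
  // Hypothetical stand-in for LocalSymbolsCache: hands out "local N" IDs in
  // first-seen order, keyed by some identity of the local declaration.
  static final class LocalIds {
    private final Map<String, String> ids = new HashMap<>();

    String idFor(String declarationKey) {
      String id = ids.get(declarationKey);
      if (id == null) {
        id = "local " + ids.size();
        ids.put(declarationKey, id);
      }
      return id;
    }
  }

  // Simulates two ANALYZE rounds for one source file: the first round visits
  // two locals inside type A, the second visits one local inside type B.
  public static List<String> assignIds(boolean shareAcrossRounds) {
    List<List<String>> rounds = List.of(List.of("A.x", "A.y"), List.of("B.z"));
    List<String> assigned = new ArrayList<>();
    LocalIds cache = new LocalIds();
    for (List<String> round : rounds) {
      if (!shareAcrossRounds) {
        cache = new LocalIds(); // old behavior: fresh cache per round
      }
      for (String decl : round) {
        assigned.add(cache.idFor(decl));
      }
    }
    return assigned;
  }

  public static void main(String[] args) {
    System.out.println(assignIds(false)); // [local 0, local 1, local 0]
    System.out.println(assignIds(true));  // [local 0, local 1, local 2]
  }
}
```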

Replace the on-disk read/merge/write with an in-memory PerSourceState
keyed by output path. It bundles the shared SemanticdbDocumentBuilder
and the LocalSymbolsCache, both of which now survive across ANALYZE
rounds. SemanticdbDocumentBuilder switches to per-key dedup (occurrences
by (range, symbol, role), symbols by symbol name) with first-emission-wins
semantics, so the round that originally analyzed a given type wins
its richer information without losing any new occurrences/symbols added
by later rounds.
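A sketch of the per-source accumulator with per-key, first-emission-wins dedup, assuming plain Java records in place of the protobuf messages. `PerSourceStateSketch`, `stateFor`, `OccKey`, and the output path are illustrative names, not the plugin's actual API, and the real `PerSourceState` also carries the shared `LocalSymbolsCache`, which is omitted here. `putIfAbsent` keeps the first value seen for each key, so a later round's poorer duplicate (for example one missing `enclosingRange`) is ignored while genuinely new entries are still added.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerSourceStateSketch {
  public record Range(int startLine, int startCharacter, int endLine, int endCharacter) {}
  // enclosingRange may be null: later rounds can lose it for already-attributed types.
  public record Occurrence(Range range, String symbol, String role, Range enclosingRange) {}
  public record SymbolInfo(String symbol, String signature) {}
  // Dedup key for occurrences: (range, symbol, role); enclosingRange is not part of it.
  private record OccKey(Range range, String symbol, String role) {}

  public static final class PerSourceState {
    private final Map<OccKey, Occurrence> occurrences = new LinkedHashMap<>();
    private final Map<String, SymbolInfo> symbols = new LinkedHashMap<>();

    // putIfAbsent keeps the first emission for each key and drops later ones.
    public void addOccurrence(Occurrence o) {
      occurrences.putIfAbsent(new OccKey(o.range(), o.symbol(), o.role()), o);
    }

    public void addSymbol(SymbolInfo s) {
      symbols.putIfAbsent(s.symbol(), s);
    }

    public List<Occurrence> occurrences() { return new ArrayList<>(occurrences.values()); }

    public List<SymbolInfo> symbols() { return new ArrayList<>(symbols.values()); }
  }

  // One state per output path, kept in memory across ANALYZE rounds.
  private final Map<Path, PerSourceState> states = new HashMap<>();

  public PerSourceState stateFor(Path outputPath) {
    return states.computeIfAbsent(outputPath, p -> new PerSourceState());
  }

  public static void main(String[] args) {
    PerSourceStateSketch plugin = new PerSourceStateSketch();
    Path out = Path.of("Interfaces.java.semanticdb"); // illustrative output path
    Range name = new Range(2, 17, 2, 18);

    // Round 1: ANALYZE for the first top-level type emits the rich entries.
    PerSourceState round1 = plugin.stateFor(out);
    round1.addOccurrence(new Occurrence(name, "a/A#", "DEFINITION", new Range(2, 0, 4, 1)));
    round1.addSymbol(new SymbolInfo("a/A#", "interface A"));

    // Round 2: same output path, so the same state is reused.
    PerSourceState round2 = plugin.stateFor(out);
    round2.addOccurrence(new Occurrence(name, "a/A#", "DEFINITION", null)); // poorer duplicate
    round2.addSymbol(new SymbolInfo("a/A#", ""));                          // poorer duplicate
    round2.addSymbol(new SymbolInfo("a/B#", "interface B"));               // new symbol

    System.out.println(round1 == round2);                    // true
    System.out.println(round2.occurrences().size());         // 1
    System.out.println(round2.symbols().get(0).signature()); // interface A
    System.out.println(round2.symbols().size());             // 2
  }
}
```

Keying by output path rather than by compilation unit is what lets every ANALYZE round for the same source file land in one accumulator.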

LombokBuilder.java snapshot regenerated: lombok's repeated synthetic
positions are now collapsed (10x duplicate `reference java/` -> 1x),
matching the kotlinc plugin's policy.
@jupblb merged commit a9bf7b2 into main on Jun 3, 2026 (12 checks passed).
@jupblb deleted the michal/semadb-collapse branch on June 3, 2026 at 19:23.
jupblb added a commit that referenced this pull request Jun 3, 2026
Rebased onto origin/main (post #899 + #900).

- semanticdb-javac: replace SemanticDB protobuf emission with SCIP shards
  (ScipShardWriter/Aggregator/Occurrences/Symbols/Signatures + ScipVisitor).
- semanticdb-kotlinc: add the SCIP shard infrastructure alongside the
  existing SemanticDB code path (dropped fully in stacked PR B).
- scip-semanticdb: ScipShardAggregator walks per-source shards, applies
  SymbolRewriter, and merges into a single Index.
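The commit message names only the components, so the following is a rough sketch of the aggregation step under stated assumptions: in-memory records stand in for SCIP `Document`/`Index` messages, the symbol rewriter is modeled as a plain `UnaryOperator<String>`, and merging is simple concatenation of documents in shard order. None of these names are the actual `ScipShardAggregator` or `SymbolRewriter` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class ShardAggregatorSketch {
  // Hypothetical in-memory stand-ins for SCIP Document/Index messages.
  public record Occurrence(String symbol, int line) {}
  public record Document(String relativePath, List<Occurrence> occurrences) {}
  public record Shard(List<Document> documents) {}
  public record Index(List<Document> documents) {}

  // Walks shards in order, rewrites every symbol, and concatenates the
  // documents into a single Index.
  public static Index aggregate(List<Shard> shards, UnaryOperator<String> symbolRewriter) {
    List<Document> merged = new ArrayList<>();
    for (Shard shard : shards) {
      for (Document doc : shard.documents()) {
        List<Occurrence> rewritten = new ArrayList<>();
        for (Occurrence occ : doc.occurrences()) {
          rewritten.add(new Occurrence(symbolRewriter.apply(occ.symbol()), occ.line()));
        }
        merged.add(new Document(doc.relativePath(), rewritten));
      }
    }
    return new Index(merged);
  }

  public static void main(String[] args) {
    Shard a = new Shard(List.of(new Document("src/A.java", List.of(new Occurrence("a/A#", 1)))));
    Shard b = new Shard(List.of(new Document("src/B.kt", List.of(new Occurrence("b/B#", 3)))));
    // Illustrative rewriter only; the real SymbolRewriter's mapping is not described here.
    Index index = aggregate(List.of(a, b), s -> "rewritten/" + s);
    System.out.println(index.documents().size()); // 2
    System.out.println(index.documents().get(1).occurrences().get(0).symbol()); // rewritten/b/B#
  }
}
```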