
Fix #3221: preserve dataclass field defaults in stubgen #3225

Open

Arths17 wants to merge 8 commits into facebook:main from Arths17:fix/issue-3221-dataclass-stubgen

Conversation

Contributor

@Arths17 Arths17 commented Apr 24, 2026

Summary
Stubgen was dropping non-literal dataclass field defaults from generated stubs, which made parameters in the synthesized __init__ signature appear required when they were actually optional.

Fix
Treat annotated class fields inside dataclasses as having a default sentinel when the source expression is not a simple literal. This preserves the public constructor shape in generated .pyi output without changing literal defaults.
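The rule can be sketched as follows. Note this is a hypothetical Python illustration of the behavior described above, not the actual pyrefly code (which lives in Rust); `stub_default` is an invented helper name:

```python
# Sketch of the default-handling rule: literal defaults survive verbatim,
# while non-literal default expressions collapse to the `...` sentinel so
# the synthesized __init__ parameter still reads as optional in the stub.
import ast

def stub_default(expr: str) -> str:
    """Return the default to emit in a .pyi for a dataclass field."""
    node = ast.parse(expr, mode="eval").body
    # Simple literals (numbers, strings, None, True/False) are kept as-is.
    if isinstance(node, ast.Constant):
        return expr
    # Anything else (e.g. field(default_factory=list), names, calls)
    # becomes the sentinel, preserving the constructor shape.
    return "..."

print(stub_default("0"))                            # literal: kept
print(stub_default("field(default_factory=list)"))  # non-literal: "..."
```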

Files changed

  • pyrefly/lib/stubgen/extract.rs
  • pyrefly/lib/stubgen/mod.rs
  • pyrefly/lib/test/stubgen/dataclasses/input.py
  • pyrefly/lib/test/stubgen/dataclasses/expected.pyi

Testing

  • cargo test -p pyrefly stubgen::tests::test_stubgen_dataclasses -- --exact

Notes

  • The unrelated local workspace edit in .vscode/tasks.json was left untouched and not included in the commit.

Arths17 added 8 commits April 6, 2026 20:43
modified:   perf/scale_test/mock_org/__pycache__/api.cpython-311.pyc
modified:   perf/scale_test/mock_org/__pycache__/core.cpython-311.pyc
modified:   perf/scale_test/mock_org/__pycache__/events.cpython-311.pyc
modified:   perf/scale_test/mock_org/__pycache__/handlers.cpython-311.pyc
modified:   perf/scale_test/mock_org/__pycache__/models.cpython-311.pyc
modified:   perf/scale_test/mock_org/__pycache__/repositories.cpython-311.pyc
modified:   perf/scale_test/mock_org/__pycache__/utils.cpython-311.pyc
modified:   perf/scale_test/mock_org/analytics.py
modified:   perf/scale_test/mock_org/api.py
modified:   perf/scale_test/mock_org/core.py
modified:   perf/scale_test/mock_org/events.py
modified:   perf/scale_test/mock_org/handlers.py
modified:   perf/scale_test/mock_org/models.py
modified:   perf/scale_test/mock_org/repositories.py
modified:   perf/scale_test/mock_org/utils.py
- Emit = ... for dataclass fields whose defaults are non-literal expressions
- Keep literal defaults unchanged so generated stubs preserve constructor shape
- Add a stubgen regression test for dataclass defaults

Fixes facebook#3221
Copilot AI review requested due to automatic review settings April 24, 2026 04:52
@meta-cla meta-cla Bot added the cla signed label Apr 24, 2026

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR, as represented by the provided diffs, adds a synthetic “mock_org” Python package and a generator script intended for performance/scale testing (deep inheritance graphs, heavy typing, dense docstrings, and cross-module imports).

Changes:

  • Adds large generated modules under perf/scale_test/mock_org/ to stress indexing/type-analysis workflows.
  • Adds perf/scale_test/generate_mock_org.py to (re)generate the dataset and a short README describing usage.
  • Adds perf/scale_test/mock_org/__init__.py exporting the mock package’s modules.

Reviewed changes

Copilot reviewed 12 out of 28 changed files in this pull request and generated 7 comments.

Summary per file:

  • perf/scale_test/mock_org/models.py: Adds a large synthetic module with generics, dataclasses, and async call patterns.
  • perf/scale_test/mock_org/events.py: Adds another large synthetic module with similar patterns for scale testing.
  • perf/scale_test/mock_org/core.py: Adds another large synthetic module with similar patterns for scale testing.
  • perf/scale_test/mock_org/analytics.py: Adds another large synthetic module with similar patterns for scale testing.
  • perf/scale_test/mock_org/__init__.py: Declares package exports for the mock org modules.
  • perf/scale_test/generate_mock_org.py: Adds the generator used to produce the synthetic package/modules.
  • perf/scale_test/README.md: Documents the perf suite and regeneration command.


Comment on lines +1 to +9
#!/usr/bin/env python3
"""Generate a synthetic enterprise-scale Python codebase for indexing benchmarks.

This generator creates a package with many modules that intentionally stress:
- deep inheritance and large class graphs
- heavy typing usage (Generic, Protocol, Annotated, callable-heavy signatures)
- dense cross-module and circular import structure
- high metadata volume via Google-style docstrings
"""

Copilot AI Apr 24, 2026


The PR title/description is about a Rust stubgen change (“preserve dataclass field defaults”), but the provided diffs only add a Python perf/scale-test dataset and generator. Please update the PR title/description to match these changes, or split the perf fixture additions into a separate PR so the stubgen fix can be reviewed/tracked independently (e.g., against #3221).

V = TypeVar("V")
W = TypeVar("W")

ComplexCallable = Callable[[List[T], Dict[str, Any]], Awaitable[V]]

Copilot AI Apr 24, 2026


In models.py, ComplexCallable expects Dict[str, Any] metadata, collect() passes an int value in that metadata, but noop_callback is typed as accepting Dict[str, str]. This makes noop_callback incompatible with ComplexCallable[int, str, float] (invariant Dict), which is likely to create type errors if the module is used as a typed perf fixture. Consider making these consistent (e.g., change noop_callback to accept Dict[str, Any], or tighten ComplexCallable/collect() to use Dict[str, str] like other modules).

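A minimal, self-contained sketch of the mismatch described in this comment. The names mirror the snippets in this thread, but the bodies are illustrative rather than the fixture's exact code; note it runs fine at runtime, which is exactly why only a type checker catches the inconsistency:

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, TypeVar

T = TypeVar("T")
V = TypeVar("V")

# The alias expects Dict[str, Any] metadata.
ComplexCallable = Callable[[List[T], Dict[str, Any]], Awaitable[V]]

async def noop_callback(values: List[int], metadata: Dict[str, str]) -> float:
    # Declared Dict[str, str]; because Dict is invariant, this does not
    # satisfy the Dict[str, Any] parameter ComplexCallable requires.
    return float(len(values))

async def collect(items: List[int], callback: ComplexCallable[int, float]) -> List[float]:
    out: List[float] = []
    for position, item in enumerate(items):
        # The metadata value is an int, which is only valid for Dict[str, Any]
        # and contradicts noop_callback's Dict[str, str] annotation.
        out.append(await callback([item], {"position": position}))
    return out

result = asyncio.run(collect([1, 2, 3], noop_callback))
print(result)  # executes despite the typing inconsistency
```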
Comment on lines +96 to +98
output: List[V] = []
for position, item in enumerate(items):
output.append(await callback([item], {"position": position}))
Copilot AI Apr 24, 2026

Same Dict[str, Any] vs Dict[str, str] metadata inconsistency as noted above.
]


async def noop_callback(values: List[int], metadata: Dict[str, str]) -> float:
Copilot AI Apr 24, 2026

Same metadata typing inconsistency; the suggested change below widens noop_callback to accept Dict[str, Any].

Suggested change
async def noop_callback(values: List[int], metadata: Dict[str, str]) -> float:
async def noop_callback(values: List[int], metadata: Dict[str, Any]) -> float:

"""
output: List[V] = []
for position, item in enumerate(items):
output.append(await callback([item], {{"position": position}}))

Copilot AI Apr 24, 2026


The generator emits {"position": position} (an int) for collect(), but the checked-in generated modules are inconsistent (some use str(position)). This output drift makes regeneration non-idempotent and can cause confusing diffs and inconsistent typing across modules. Suggest updating the generator template to match the intended checked-in output (e.g., consistently str(position) or consistently typed metadata), then re-generate the committed modules to keep the dataset reproducible.

Suggested change
output.append(await callback([item], {{"position": position}}))
output.append(await callback([item], {{"position": str(position)}}))

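For context on the doubled braces in the generator template above: templates rendered through str.format use `{{` and `}}` to escape literal braces, which is why the template line and its rendered output differ by one level of braces. A toy reproduction (not the generator's actual template string):

```python
# In str.format templates, {{ and }} render as literal { and }, so this
# template line emits the dict literal discussed in the review comment.
TEMPLATE = 'output.append(await callback([item], {{"position": str(position)}}))'
rendered = TEMPLATE.format()
print(rendered)
```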
Comment thread perf/scale_test/README.md

Generated synthetic Python modules for stressing import and type indexing.

Use generate_mock_org.py to regenerate this dataset.

Copilot AI Apr 24, 2026


Committing multiple ~2K-line generated Python modules significantly increases repo size and can slow down clones, indexing, and CI steps that scan the tree. If the benchmark harness can generate these files at build/benchmark time, consider committing only the generator (and perhaps a small “golden” seed) rather than the full generated output; alternatively, document why committing the generated modules is required (determinism, offline runs, etc.) and consider keeping the generated set minimal.

Suggested change
Use generate_mock_org.py to regenerate this dataset.
The generated modules are checked in so the benchmark can run deterministically
and in offline environments without requiring a generation step during CI or
local performance runs.
Use `generate_mock_org.py` only when intentionally refreshing this dataset.
Keep the committed generated corpus minimal so repository size, clone time,
indexing, and CI tree scans stay as small as practical.

@@ -0,0 +1,2042 @@
from __future__ import annotations

Copilot AI Apr 24, 2026


These mock_org/*.py modules appear to be generated artifacts, but they don’t include an explicit “generated file / do not edit” header. Adding a short module-level comment (e.g., “Generated by perf/scale_test/generate_mock_org.py”) helps prevent manual edits and makes it clear how to update the fixture.

@github-actions

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

Contributor

@kinto0 kinto0 left a comment


Is there a way to avoid the huge binaries being added to the codebase?

Contributor

kinto0 commented Apr 24, 2026

Looks like someone else already started this in #3223. Next time, please comment on the issue when you're working on it so we can assign accordingly! If you want to collaborate with them, ping them in the Discord. I'll leave this open until we discuss this more.
