Fix #3221: preserve dataclass field defaults in stubgen #3225
Arths17 wants to merge 8 commits into facebook:main from
Conversation
modified: perf/scale_test/mock_org/__pycache__/api.cpython-311.pyc
modified: perf/scale_test/mock_org/__pycache__/core.cpython-311.pyc
modified: perf/scale_test/mock_org/__pycache__/events.cpython-311.pyc
modified: perf/scale_test/mock_org/__pycache__/handlers.cpython-311.pyc
modified: perf/scale_test/mock_org/__pycache__/models.cpython-311.pyc
modified: perf/scale_test/mock_org/__pycache__/repositories.cpython-311.pyc
modified: perf/scale_test/mock_org/__pycache__/utils.cpython-311.pyc
modified: perf/scale_test/mock_org/analytics.py
modified: perf/scale_test/mock_org/api.py
modified: perf/scale_test/mock_org/core.py
modified: perf/scale_test/mock_org/events.py
modified: perf/scale_test/mock_org/handlers.py
modified: perf/scale_test/mock_org/models.py
modified: perf/scale_test/mock_org/repositories.py
modified: perf/scale_test/mock_org/utils.py
- Emit `= ...` for dataclass fields whose defaults are non-literal expressions
- Keep literal defaults unchanged so generated stubs preserve constructor shape
- Add a stubgen regression test for dataclass defaults

Fixes facebook#3221
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR, as represented by the provided diffs, adds a synthetic “mock_org” Python package and a generator script intended for performance/scale testing (deep inheritance graphs, heavy typing, dense docstrings, and cross-module imports).
Changes:
- Adds large generated modules under `perf/scale_test/mock_org/` to stress indexing/type-analysis workflows.
- Adds `perf/scale_test/generate_mock_org.py` to (re)generate the dataset and a short README describing usage.
- Adds `perf/scale_test/mock_org/__init__.py` exporting the mock package's modules.
Reviewed changes
Copilot reviewed 12 out of 28 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| perf/scale_test/mock_org/models.py | Adds a large synthetic module with generics, dataclasses, and async call patterns. |
| perf/scale_test/mock_org/events.py | Adds another large synthetic module with similar patterns for scale testing. |
| perf/scale_test/mock_org/core.py | Adds another large synthetic module with similar patterns for scale testing. |
| perf/scale_test/mock_org/analytics.py | Adds another large synthetic module with similar patterns for scale testing. |
| perf/scale_test/mock_org/__init__.py | Declares package exports for the mock org modules. |
| perf/scale_test/generate_mock_org.py | Adds the generator used to produce the synthetic package/modules. |
| perf/scale_test/README.md | Documents the perf suite and regeneration command. |
```python
#!/usr/bin/env python3
"""Generate a synthetic enterprise-scale Python codebase for indexing benchmarks.

This generator creates a package with many modules that intentionally stress:
- deep inheritance and large class graphs
- heavy typing usage (Generic, Protocol, Annotated, callable-heavy signatures)
- dense cross-module and circular import structure
- high metadata volume via Google-style docstrings
"""
```
The PR title/description is about a Rust stubgen change (“preserve dataclass field defaults”), but the provided diffs only add a Python perf/scale-test dataset and generator. Please update the PR title/description to match these changes, or split the perf fixture additions into a separate PR so the stubgen fix can be reviewed/tracked independently (e.g., against #3221).
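For context on what those stressors look like in practice, a module shaped by the docstring above might resemble the following (an illustrative sketch, not an excerpt from the generated dataset; all names are invented):

```python
# Illustrative sketch only -- not taken from the generated modules.
from typing import Annotated, Protocol, TypeVar

T = TypeVar("T")


class Repository(Protocol[T]):
    """Protocol participating in the heavy-typing stressor."""

    def get(self, key: str) -> T: ...


class BaseEntity:
    """Root of a deliberately deep inheritance chain."""


class AuditedEntity(BaseEntity):
    """Entity carrying Google-style docstring metadata.

    Attributes:
        revision: Annotated field used to bulk up type metadata volume.
    """

    revision: Annotated[int, "monotonic revision counter"]
```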
```python
V = TypeVar("V")
W = TypeVar("W")

ComplexCallable = Callable[[List[T], Dict[str, Any]], Awaitable[V]]
```
In models.py, `ComplexCallable` expects `Dict[str, Any]` metadata, `collect()` passes an int value in that metadata, but `noop_callback` is typed as accepting `Dict[str, str]`. This makes `noop_callback` incompatible with `ComplexCallable[int, str, float]` (invariant `Dict`), which is likely to create type errors if the module is used as a typed perf fixture. Consider making these consistent (e.g., change `noop_callback` to accept `Dict[str, Any]`, or tighten `ComplexCallable`/`collect()` to use `Dict[str, str]` like other modules).
```python
output: List[V] = []
for position, item in enumerate(items):
    output.append(await callback([item], {"position": position}))
```
```python
async def noop_callback(values: List[int], metadata: Dict[str, str]) -> float:
```
Suggested change:

```diff
-async def noop_callback(values: List[int], metadata: Dict[str, str]) -> float:
+async def noop_callback(values: List[int], metadata: Dict[str, Any]) -> float:
```
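For readers unfamiliar with the invariance point above, here is a minimal sketch of the same shape of mismatch, reduced to concrete value types where a checker reliably reports it (all names are invented for illustration, not taken from models.py):

```python
from typing import Awaitable, Callable, Dict, List, TypeVar

T = TypeVar("T")
V = TypeVar("V")

# Alias analogous in shape to ComplexCallable, but requiring Dict[str, int].
IntMetaCallback = Callable[[List[T], Dict[str, int]], Awaitable[V]]


async def str_meta_callback(values: List[str], metadata: Dict[str, str]) -> float:
    return float(len(values))


# Dict is invariant in its value type, so a parameter declared Dict[str, str]
# does not accept Dict[str, int]; a type checker flags this assignment.
cb: IntMetaCallback[str, float] = str_meta_callback  # error: incompatible type
```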
| """ | ||
| output: List[V] = [] | ||
| for position, item in enumerate(items): | ||
| output.append(await callback([item], {{"position": position}})) |
The generator emits `{"position": position}` (an int) for `collect()`, but the checked-in generated modules are inconsistent (some use `str(position)`). This drift makes regeneration non-idempotent and can cause confusing diffs and inconsistent typing across modules. Suggest updating the generator template to match the intended checked-in output (e.g., consistently `str(position)` or consistently typed metadata), then re-generating the committed modules to keep the dataset reproducible.
Suggested change:

```diff
-        output.append(await callback([item], {{"position": position}}))
+        output.append(await callback([item], {{"position": str(position)}}))
```
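One way to keep the dataset reproducible after such a fix is a drift check that regenerates into a temporary directory and compares against the checked-in tree. The sketch below assumes `generate_mock_org.py` accepts a hypothetical `--out` directory flag, which may not match its real CLI:

```python
# check_mock_org_fresh.py -- sketch of a regeneration drift check.
import subprocess
import sys
import tempfile
from pathlib import Path

CHECKED_IN = Path("perf/scale_test/mock_org")


def main() -> int:
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical --out flag; adapt to the generator's actual interface.
        subprocess.run(
            [sys.executable, "perf/scale_test/generate_mock_org.py", "--out", tmp],
            check=True,
        )
        # Compare only .py sources; __pycache__ and other noise are ignored.
        fresh = {p.relative_to(tmp): p for p in Path(tmp).rglob("*.py")}
        committed = {p.relative_to(CHECKED_IN): p for p in CHECKED_IN.rglob("*.py")}
        if fresh.keys() != committed.keys():
            print("file sets differ between generator output and checked-in dataset")
            return 1
        stale = [
            str(rel)
            for rel in fresh
            if fresh[rel].read_bytes() != committed[rel].read_bytes()
        ]
        if stale:
            print("stale generated files:", *stale, sep="\n  ")
            return 1
        return 0


if __name__ == "__main__":
    sys.exit(main())
```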
```text
Generated synthetic Python modules for stressing import and type indexing.

Use generate_mock_org.py to regenerate this dataset.
```
Committing multiple ~2K-line generated Python modules significantly increases repo size and can slow down clones, indexing, and CI steps that scan the tree. If the benchmark harness can generate these files at build/benchmark time, consider committing only the generator (and perhaps a small “golden” seed) rather than the full generated output; alternatively, document why committing the generated modules is required (determinism, offline runs, etc.) and consider keeping the generated set minimal.
Suggested change:

```diff
-Use generate_mock_org.py to regenerate this dataset.
+The generated modules are checked in so the benchmark can run deterministically
+and in offline environments without requiring a generation step during CI or
+local performance runs.
+Use `generate_mock_org.py` only when intentionally refreshing this dataset.
+Keep the committed generated corpus minimal so repository size, clone time,
+indexing, and CI tree scans stay as small as practical.
```
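If committing the corpus turns out to be unnecessary, a generate-on-demand setup is straightforward. For example, under pytest, a session-scoped fixture along these lines could build the dataset at benchmark time (again assuming a hypothetical `--out` flag on the generator):

```python
# conftest.py sketch: build the dataset at benchmark time instead of committing it.
import subprocess
import sys

import pytest


@pytest.fixture(scope="session")
def mock_org_dataset(tmp_path_factory: pytest.TempPathFactory):
    out = tmp_path_factory.mktemp("mock_org")
    # Hypothetical --out flag; adjust to the real generator CLI.
    subprocess.run(
        [sys.executable, "perf/scale_test/generate_mock_org.py", "--out", str(out)],
        check=True,
    )
    return out
```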
```diff
@@ -0,0 +1,2042 @@
+from __future__ import annotations
```
These mock_org/*.py modules appear to be generated artifacts, but they don’t include an explicit “generated file / do not edit” header. Adding a short module-level comment (e.g., “Generated by perf/scale_test/generate_mock_org.py”) helps prevent manual edits and makes it clear how to update the fixture.
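For instance, the generator could prepend a header like this to every module it writes (the wording is illustrative):

```python
# Generated by perf/scale_test/generate_mock_org.py -- do not edit by hand.
# Re-run the generator to refresh this fixture.
from __future__ import annotations
```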
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
kinto0 left a comment
is there a way to avoid the huge binaries being added to the codebase?
looks like someone else already started this in #3223. next time please comment on the issue when you're working on it so we can assign accordingly! if you want to collaborate with them, ping them in the discord. I'll leave this open until we discuss this more
Summary
Stubgen was dropping non-literal dataclass field defaults from generated stubs, which made parameters in synthesized `__init__` signatures look required when they were actually optional.

Fix

Treat annotated class fields inside dataclasses as having a default sentinel when the source expression is not a simple literal. This preserves the public constructor shape in generated `.pyi` output without changing literal defaults.

Files changed

- `pyrefly/lib/stubgen/extract.rs`
- `pyrefly/lib/stubgen/mod.rs`
- `pyrefly/lib/test/stubgen/dataclasses/input.py`
- `pyrefly/lib/test/stubgen/dataclasses/expected.pyi`

Testing

`cargo test -p pyrefly stubgen::tests::test_stubgen_dataclasses -- --exact`

Notes

`.vscode/tasks.json` was left untouched and not included in the commit.
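As an illustration of the described behavior, a fixture in the spirit of the dataclasses test might look like this (a sketch, not the actual contents of `input.py`/`expected.pyi`):

```python
from dataclasses import dataclass, field


@dataclass
class Settings:
    retries: int = 3                               # literal default: kept verbatim
    tags: list[str] = field(default_factory=list)  # non-literal: replaced by sentinel

# Expected shape of the generated stub (sketch):
#
# @dataclass
# class Settings:
#     retries: int = 3
#     tags: list[str] = ...
```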