Notes for Claude Code when working in this repo (the Nucleus Python SDK).
The official Python client for Nucleus. Wraps the `/v1/nucleus` REST endpoints on `scaleapi`. Distributed on PyPI as `scale-nucleus`.
- Sources live under `nucleus/`.
- Backend lives in the `scaleapi` repo at `server/src/routes/v1/select.ts` and `server/src/lib/select/api/`.
- The default API base URL is `NUCLEUS_ENDPOINT = "https://api.scale.com/v1/nucleus"` (`nucleus/constants.py`). Override via the `endpoint=` kwarg or `NUCLEUS_ENDPOINT` env var (e.g. point at fedramp).
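The override order for the base URL can be sketched as follows. This is a minimal illustration, not the SDK's own code: `resolve_endpoint` is a hypothetical helper, and the precedence shown (explicit argument, then env var, then default) is an assumption about how the two overrides interact.

```python
import os
from typing import Optional

# Default base URL, as quoted above from nucleus/constants.py.
NUCLEUS_ENDPOINT = "https://api.scale.com/v1/nucleus"


def resolve_endpoint(endpoint: Optional[str] = None) -> str:
    """Hypothetical helper: pick the API base URL.

    Assumed precedence: explicit argument, then the NUCLEUS_ENDPOINT
    environment variable, then the built-in default.
    """
    if endpoint is not None:
        return endpoint
    return os.environ.get("NUCLEUS_ENDPOINT", NUCLEUS_ENDPOINT)
```

With neither override set, `resolve_endpoint()` returns the default URL. Check `nucleus/__init__.py` for the actual resolution logic before relying on this order.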
Releases follow Semantic Versioning and are tracked in `CHANGELOG.md` using the Keep a Changelog format.
When making a user-facing change, the convention (see PRs #459, #455) is:
- Bump `version = "..."` in `pyproject.toml` under `[tool.poetry]`. This is the single version source; there is no `__version__` in `nucleus/__init__.py`.
  - Patch bump for additive, backwards-compatible changes (new fields, new methods).
  - Minor bump for new features that change behaviour or remove deprecated paths.
  - Major bump for breaking changes (Python version drops, sentinel removal, etc.).
- Prepend a `## [X.Y.Z](https://github.com/scaleapi/nucleus-python-client/releases/tag/vX.Y.Z) - YYYY-MM-DD` section to `CHANGELOG.md` with `### Added` / `### Changed` / `### Fixed` / `### Removed` subsections as appropriate.
- Commit the version bump + CHANGELOG entry alongside the code change in the same PR.
Pure refactors / doc-only PRs (#456) sometimes skip the version bump. When in doubt, bump.
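The bump rules and the CHANGELOG heading format above can be sketched in a few lines. Both `bump` and `changelog_heading` are hypothetical helpers written for illustration; nothing like them is known to exist in this repo, and the version numbers used are made up.

```python
from datetime import date


def bump(version: str, part: str) -> str:
    """Return `version` (X.Y.Z) with the given part bumped.

    `part` is "major", "minor", or "patch"; lower parts reset to 0.
    """
    major, minor, patch = (int(p) for p in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def changelog_heading(version: str, released: date) -> str:
    """Build the CHANGELOG.md section heading for a release."""
    url = (
        "https://github.com/scaleapi/nucleus-python-client"
        f"/releases/tag/v{version}"
    )
    return f"## [{version}]({url}) - {released.isoformat()}"
```

For example, `bump("1.4.2", "patch")` gives `"1.4.3"`, and `changelog_heading("1.4.3", date(2025, 1, 31))` gives a heading ending in `- 2025-01-31`.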
- Branch naming: `<author>/<kebab-description>` (e.g. `vinayparakala/expose-phash-on-dataset-item`).
- PR title commonly starts with the Linear ticket: `[DE-XXXX] <description>` (see `git log --oneline -20`).
- PRs land via squash merge.
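A quick local check of the naming conventions above might look like this. It is only a sketch: the regexes are assumptions (lowercase alphanumeric segments for branches, a numeric ticket id for titles), and since the ticket prefix is common rather than mandatory, a failed match is a hint and not an error.

```python
import re

# Assumption: author and description segments are lowercase alphanumerics,
# with the description in kebab-case.
BRANCH_RE = re.compile(r"[a-z0-9]+/[a-z0-9]+(?:-[a-z0-9]+)*")

# Assumption: the XXXX in [DE-XXXX] is a run of digits.
PR_TITLE_RE = re.compile(r"\[DE-\d+\] \S.*")


def follows_branch_convention(name: str) -> bool:
    """True if `name` looks like <author>/<kebab-description>."""
    return BRANCH_RE.fullmatch(name) is not None


def follows_pr_title_convention(title: str) -> bool:
    """True if `title` looks like [DE-XXXX] <description>."""
    return PR_TITLE_RE.fullmatch(title) is not None
```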
- `nucleus/__init__.py`: `NucleusClient`, top-level operations.
- `nucleus/dataset.py`: `Dataset` class. Most user-facing methods live here (item upload/fetch, generators, queries, slices, autotags, exports). Generators page through the backend via `nucleus/utils.py:paginate_generator`.
- `nucleus/dataset_item.py`: `DatasetItem` dataclass. `DatasetItem.from_json` is the single deserialization entry point for items coming back from the API: every SDK method that returns a `DatasetItem` (generators, queries, `iloc`/`refloc`/`loc`, the `items` property) routes through it. To expose a new server-side field on items, add it to the dataclass + `from_json` and you're done on the SDK side.
- `nucleus/utils.py`: `convert_export_payload` and `format_dataset_item_response` are the shared shapers used by the export and single-item endpoints. They wrap raw JSON into typed objects via the respective `from_json` classmethods.
- `nucleus/constants.py`: All API payload keys are constants here. When adding a new field, add a `*_KEY` constant first and reference it from `from_json` / `to_payload` rather than inlining the string.
- `nucleus/annotation.py`, `nucleus/prediction.py`: Annotation and prediction types. Each has its own `from_json` / `to_payload`.
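The "constant first, then dataclass + `from_json`" pattern for exposing a new field can be sketched as below. `ItemSketch` is a simplified stand-in, not the real `DatasetItem`; the constant names and values, and the `phash` field itself, are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Stand-ins for entries in nucleus/constants.py; real names/values may differ.
REFERENCE_ID_KEY = "reference_id"
IMAGE_URL_KEY = "image_url"
PHASH_KEY = "phash"  # hypothetical constant for the newly exposed field


@dataclass
class ItemSketch:
    """Simplified stand-in for DatasetItem (not the real class)."""

    reference_id: str
    image_url: Optional[str] = None
    # New field: optional with a default, so payloads without it still parse.
    phash: Optional[str] = None

    @classmethod
    def from_json(cls, payload: Dict[str, Any]) -> "ItemSketch":
        """Single deserialization entry point; keys come from constants."""
        return cls(
            reference_id=payload[REFERENCE_ID_KEY],
            image_url=payload.get(IMAGE_URL_KEY),
            phash=payload.get(PHASH_KEY),
        )
```

Because every item-returning path goes through `from_json`, reading the new key there (via `.get`, so older responses still deserialize) is enough to surface it everywhere.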
Run the suite from the repo root:

    poetry install
    poetry run pytest tests

Many tests require a real `NUCLEUS_API_KEY` and hit the live API; use `pytest -k <name>` to scope. Pre-commit hooks (`.pre-commit-config.yaml`) run black, ruff, isort.