fix: JsonItemsParser should yield floats, not Decimals #1052
ijson.items() parses non-integer JSON numbers as decimal.Decimal by default, which the CDK cannot serialize downstream (orjson raises 'Decimal is not JSON serializable'). This broke any JsonItemsDecoder stream with decimal fields (e.g. Amazon Brand Analytics clickShare/conversionShare, Sales & Traffic, Vendor reports). Pass use_float=True so non-integer numbers are parsed as float, matching the json.loads/orjson behavior of the other JSON parsers. Adds a regression test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pull request overview
This PR fixes a serialization regression in the CDK’s streaming JSON path parser (JsonItemsParser) where non-integer JSON numbers were being parsed as decimal.Decimal (breaking downstream orjson serialization). It aligns JsonItemsParser numeric parsing with json.loads/orjson by enabling float parsing in ijson.
Changes:
- Pass `use_float=True` to `ijson.items(...)` so non-integer numbers are parsed as `float` instead of `Decimal`.
- Add a regression unit test asserting float/int types and `orjson`-serializability for decoded records.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py | Ensures JsonItemsParser yields floats (not Decimal) for non-integer JSON numbers via use_float=True. |
| unit_tests/sources/declarative/decoders/test_composite_decoder.py | Adds regression coverage to confirm float parsing and orjson serialization compatibility. |
No actionable comments were generated in the recent review.
Walkthrough
This PR modifies JSON number parsing in the declarative source framework.
/autofix
What
Follow-up to #1049.
`JsonItemsParser` calls `ijson.items(...)` without `use_float=True`, so non-integer JSON numbers are parsed as `decimal.Decimal`. The CDK can't serialize `Decimal` downstream (`orjson` raises `TypeError: Type is not JSON serializable: decimal.Decimal`), so any `JsonItemsDecoder` stream with decimal fields fails at serialization.
Discovered while migrating `source-amazon-seller-partner`'s report streams: Brand Analytics (`clickShare`, `conversionShare`), Sales & Traffic, and Vendor reports all carry decimals and broke on read. The original `GzipJsonDecoder` used `json.loads` (floats), so this is a regression.
How
Pass `use_float=True` to `ijson.items`, so non-integer numbers come back as `float` (integers stay `int`), matching the `json.loads`/`orjson` behavior of `JsonParser`/`JsonLineParser`. Adds a regression test asserting floats and `orjson`-serializability.
Validation
Verified on a real 3.45 GB Amazon Search Terms report: `clickShare`/`conversionShare` are now `float` (were `Decimal`), and `orjson.dumps` succeeds. 42 decoder unit tests pass.
🤖 Generated with Claude Code
Summary by CodeRabbit