Skip to content

feat(cubeorchestrator): Split numeric query result into typed variants#11036

Open
ovr wants to merge 1 commit into
masterfrom
feat/numeric-query-result
Open

feat(cubeorchestrator): Split numeric query result into typed variants#11036
ovr wants to merge 1 commit into
masterfrom
feat/numeric-query-result

Conversation

@ovr

@ovr ovr commented Jun 8, 2026

Copy link
Copy Markdown
Member

Replace the single f64 DBResponsePrimitive::Number / FieldValue::Number variant with precise Int64/UInt64/Float64 variants so integer-vs-float type is preserved through the query result path and large 64-bit integers no longer lose precision via f64.

@ovr ovr requested review from a team as code owners June 8, 2026 14:53
@github-actions github-actions Bot added the rust Pull requests that update Rust code label Jun 8, 2026
@claude

claude Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ovr's task in 2m 11s —— View job


Review

Tasks & detailed findings (click to expand) — 1 high, 2 medium, 2 low
  • Pull diff and inspect changed files
  • Inspect DBResponsePrimitive serialize/deserialize asymmetry
  • Inspect FieldValue integer cast semantics in scan.rs
  • Inspect JS → FieldValue path in stream.rs
  • Inspect Int128/UInt128 reachability
  • Post inline comments

Summary

The split into typed numeric variants is the right direction — i64::MIN/u64::MAX round-trip cleanly through the new deserializer, and the size-of test guards layout. Two concerns worth resolving before merge:

  1. Wire format change is silent and asymmetric (high). The hand-written Serialize emits every numeric variant as a JSON string (collect_str), but Deserialize only accepts JSON numbers. So:

    • On-the-wire JSON output for every column cell changes from e.g. 42 / 2.0 to "42" / "2". Any consumer that expects numeric JSON (JS clients reading result.data, downstream Rust deserializing back through this type, snapshot tests, third-party tooling) is affected. Worth either confirming all consumers tolerate strings, or emitting numbers when they fit JS safe range and strings only beyond 2^53.
    • A value serialized and then deserialized by the same type does not round-trip: Int64(5)"5"String("5") (because visit_str returns the String variant). This will silently break any internal flow that re-parses an already-serialized result.
  2. Integer cast semantics changed for narrow targets (medium). In scan.rs, the Int16/Int32 branches now do number as i16 / number as i32 on Int64/UInt64. Rust's as between integer types truncates / wraps, whereas the previous Float64.round() as i16 saturates (since Rust 1.45). So 1_000_000_i64 now becomes 16960 for an Int16 column instead of i16::MAX. If saturation is preferred (it usually is for a "tried to fit too big a number into too small a column" case), use i16::try_from(number).unwrap_or(if number > 0 { i16::MAX } else { i16::MIN }) or .clamp(i16::MIN as i64, i16::MAX as i64) as i16.

  3. stream.rs doesn't take advantage of the new variants (medium). JsNumberFieldValue::Float64 always. JS Number doesn't distinguish int vs float, but integer-valued doubles are common from JS code; consider checking n.fract() == 0.0 && n.is_finite() && n >= i64::MIN as f64 && n <= i64::MAX as f64 and mapping to Int64. Or document that the JS streaming path is an explicit exception.

  4. Int128/UInt128 visitor branches are likely dead code (low). serde_json's default Number representation never calls visit_i128 / visit_u128 — those are only reached if arbitrary_precision is enabled or if a different deserializer feeds values larger than u64. With the current serde_json setup, the Int128/UInt128 variants will only appear if something constructs them directly. Either enable arbitrary_precision, add a visit_str parse fallback for big integer literals, or document the limitation.

  5. Test asserts only Float64 for 2.0 (low). test_from_js_raw_data_parses_numeric_variants feeds 2.0 and expects Float64(2.0). Note that serde_json parses 2.0 via visit_f64 (because of the decimal point), so this works — but 2 (integer literal) would route to visit_u64 and yield UInt64(2). A test case covering "integer-valued floats vs integer literals" would make the distinction explicit.

Notes

  • Size test (size_of::<DBResponsePrimitive>() == 32) is a good regression guard. Worth a one-line comment that the value is dominated by the String payload ({ptr,len,cap} = 24 + tag rounded to 8).
  • Display for floats uses Rust's default f64::to_string, which can print 2 for 2.0 — same shape as the new Serialize. Fine for diagnostics.
    — branch: feat/numeric-query-result

@ovr ovr force-pushed the feat/numeric-query-result branch from fbf6637 to 86eb368 Compare June 8, 2026 14:54
@claude

claude Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ovr's task in 1m 17s —— View job


Code Review — 3 medium concerns (all pre-existing inline threads still apply)

Full review

Summary

This PR replaces a single Number(f64) cell with Int64/UInt64/Float64 (and Int128/UInt128 as decimal strings) so that integer-vs-float type and full 64-bit precision survive the result path. The intent is right and the test coverage for parsing + the explicit size_of guard at 32 bytes is good. Three concerns from the previous review are still live in this revision — I am not re-posting them inline.

Re-affirmed 3 prior threads still apply:

  1. Wire-format change + asymmetric round-tripquery_result_transform.rs:1187. Every numeric variant now serializes as a JSON string via collect_str, but Deserialize only accepts JSON numbers. Two effects:

    • External payload for every numeric cell changes from 42 / 2.0 to "42" / "2". JS consumers, BI clients, and any snapshot tests will see strings where they previously saw numbers. The PR description doesn't call this out — worth being explicit in the changelog if intentional.
    • Int64(5)"5" → re-deserialized as String("5"), not Int64(5). Any code that re-parses a serialized result silently loses typing. A to_stringfrom_str round-trip test would lock this in (or accept strings on the deserialize side and make the type its own inverse).
    • Also: Float64(2.0) serializes as "2", indistinguishable from Int64(2) → "2" on the wire. If the goal is to preserve int-vs-float across the boundary, the string form drops the distinction the variants were introduced to keep.
  2. Integer-cast semantics change in scan.rsscan.rs:949 (and the Int32 / Int64 branches below). Previously f64.round() as iN saturated (Rust ≥1.45), so out-of-range values pinned to iN::MAX/iN::MIN. New Int64(n) as i16 / UInt64(n) as i16 truncate/wrap: 1_000_000_i64 as i16 == 16960. Same applies to the Int64 arm when handed a UInt64 ≥ i64::MAX — it silently wraps negative. The String arm logs a warning on parse failure; the typed-int arms now silently corrupt the value, which is the opposite of what one would expect. i16::try_from(n).unwrap_or(saturated) (or an explicit warn-and-saturate) restores the prior behavior.

  3. Streaming path always emits Float64stream.rs:283. JsNumber is the only numeric type in JS, so integer-valued JS numbers (1, 42, every COUNT(*) result coming back from a JS data source) now land as Float64 and the new int variants don't help. If int/float preservation is the goal end-to-end, this path should detect f.fract() == 0.0 within the safe-int range and emit Int64. If keeping Float64-only here is intentional (e.g. JS can't tell us anyway), a one-line comment explaining why would save the next reader a detour.

Other notes

  • db_primitive_to_field_value in orchestrator.rs:135-137 flattens Int128/UInt128 into FieldValue::String. Consumers downstream see no signal that this was a number that overflowed u64/i64 — they'd need to attempt parse::<i128>(). Adding an Int128/UInt128 arm to FieldValue (or at least documenting this collapse) would be more consistent with the rest of the change.
  • The 32-byte size assertion is great; consider duplicating the spirit on FieldValue if it ever gains 128-bit variants.

Comment on lines +1175 to +1187
match self {
DBResponsePrimitive::Null => serializer.serialize_none(),
DBResponsePrimitive::Boolean(b) => serializer.serialize_bool(*b),
DBResponsePrimitive::Int64(n) => serializer.collect_str(n),
DBResponsePrimitive::UInt64(n) => serializer.collect_str(n),
DBResponsePrimitive::Float64(n) => serializer.collect_str(n),
DBResponsePrimitive::Int128(s) => serializer.serialize_str(s),
DBResponsePrimitive::UInt128(s) => serializer.serialize_str(s),
DBResponsePrimitive::String(s) => serializer.serialize_str(s),
DBResponsePrimitive::Uncommon(v) => v.serialize(serializer),
}
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wire format change + asymmetric round-trip.

Every numeric variant now serializes as a JSON string (collect_str), but Deserialize only accepts JSON numbers. Two consequences:

  1. The on-wire JSON payload for every numeric cell changes from 42 / 2.0 to "42" / "2". JS consumers reading result.data and any external client (BI tools, snapshot tests) will see strings where they used to see numbers.
  2. A round-trip through this type is lossy: Int64(5) → serializes to "5" → deserializes via visit_str → ends up as DBResponsePrimitive::String("5"), not Int64(5). Any internal code that re-parses an already-serialized result silently loses the numeric type.

Consider:

  • Emitting JSON numbers when the value fits in JS's safe integer range (|n| <= 2^53), and strings only beyond that, OR
  • Always emitting strings but also accepting strings on the deserialize side so the type is its own inverse.

A test that explicitly round-trips serde_json::to_stringfrom_str and asserts variant equality would lock this in.

Comment on lines +948 to +949
(FieldValue::Int64(number), builder) => builder.append_value(number as i16)?,
(FieldValue::UInt64(number), builder) => builder.append_value(number as i16)?,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integer cast semantics change: saturate → truncate/wrap.

Previously, this column path went through FieldValue::Number(f64)number.round() as i16. Rust's f64 as i16 saturates (since Rust 1.45), so e.g. 1_000_000.0 → i16::MAX.

With Int64(n) as i16 / UInt64(n) as i16, the cast truncates/wraps: 1_000_000_i64 as i16 == 16960. Same concern for the Int32 branch below, and for Int64 when fed UInt64i64::MAX.

If saturation is the intended semantics for "value too wide for column type" (it usually is, otherwise the warning row produced by the String arm gets silently better treatment than typed-int rows), replace as with a checked path, e.g.:

Suggested change
(FieldValue::Int64(number), builder) => builder.append_value(number as i16)?,
(FieldValue::UInt64(number), builder) => builder.append_value(number as i16)?,
(FieldValue::Int64(number), builder) => builder.append_value(i16::try_from(number).unwrap_or(if number > 0 { i16::MAX } else { i16::MIN }))?,
(FieldValue::UInt64(number), builder) => builder.append_value(i16::try_from(number).unwrap_or(i16::MAX))?,

Ok(FieldValue::String(Cow::Owned(s.value(&mut self.cx))))
} else if let Ok(n) = value.downcast::<JsNumber, _>(&mut self.cx) {
Ok(FieldValue::Number(n.value(&mut self.cx)))
Ok(FieldValue::Float64(n.value(&mut self.cx)))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JS streaming path collapses every JsNumber to Float64, so the new Int64/UInt64 variants don't help here even when the JS value is integer-valued. If the goal is to preserve int-vs-float through the whole result path, this should detect integer-valued doubles:

let f = n.value(&mut self.cx);
if f.is_finite() && f.fract() == 0.0
    && f >= i64::MIN as f64 && f <= i64::MAX as f64 {
    Ok(FieldValue::Int64(f as i64))
} else {
    Ok(FieldValue::Float64(f))
}

(Note: integers > 2^53 from JS already arrive as floats with lost precision — this only matters for values inside JS's safe integer range, which is most real-world cases.)

If keeping this path Float64-only is intentional, a comment explaining why would help future readers.

Comment on lines +1209 to +1217
Ok(DBResponsePrimitive::Int64(v))
}

fn visit_i128<E: de::Error>(self, v: i128) -> Result<Self::Value, E> {
Ok(DBResponsePrimitive::Number(v as f64))
Ok(DBResponsePrimitive::Int128(v.to_string()))
}

fn visit_u64<E: de::Error>(self, v: u64) -> Result<Self::Value, E> {
Ok(DBResponsePrimitive::Number(v as f64))
Ok(DBResponsePrimitive::UInt64(v))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

visit_i128 / visit_u128 are likely unreachable from serde_json by default.

serde_json parses all numbers through visit_u64 / visit_i64 / visit_f64 unless the arbitrary_precision feature is enabled. With the current setup, the Int128 / UInt128 variants will essentially never be produced from JSON input — only from hand-constructed values.

If 128-bit JSON input is a real concern (e.g. bigint measures from CubeStore), consider either enabling serde_json/arbitrary_precision (and handling Number accordingly) or adding a visit_str fallback that tries i128::from_str / u128::from_str for digit-only strings before falling back to String. Otherwise it's worth a comment noting the limitation.

@claude

claude Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ovr's task in 49s —— View job


Code Review — no changes since prior review

Verdict: 1 high, 3 medium, 1 low — all from the prior pass; the diff is unchanged on this push, so no new inline comments. Re-affirmed 4 prior threads still apply.

Tasks & re-affirmed findings (click to expand)
  • Diff vs origin/master — identical to previous review (1 commit, 5 files)
  • List existing review threads via gh list-review-threads
  • Verify each prior thread's line is still present in the current source
  • Skip duplicate inline comments (dedupe by path+line+root cause)

Re-affirmed (still unresolved, still applicable)

  1. [high] Wire-format change + asymmetric round-trip — rust/cube/cubeorchestrator/src/query_result_transform.rs:1178-1180 (collect_str for every numeric variant) vs :1208-1226 (deserializer accepts numbers only). External payload shifts 42"42", and Int64(5) → serialize → deserialize lands as String("5"). Also Float64(2.0) and Int64(2) both serialize to "2", indistinguishable on the wire — defeats the variant split's stated goal.
  2. [medium] Integer cast as truncates/wraps — rust/cubesql/cubesql/src/compile/engine/df/scan.rs:956-957 (and the Int32 / Int64 arms below). Previously f64.round() as iN saturated; now Int64(n) as i16 wraps (1_000_000 → 16960). The String arm still warns-and-nulls on parse failure; typed-int arms silently corrupt — inconsistent.
  3. [medium] JS streaming path always emits Float64packages/cubejs-backend-native/src/stream.rs:283. Integer-valued JS numbers (every COUNT(*) from a JS data source) land as Float64, so the new int variants don't reach this entry point at all. Either detect fract() == 0.0 within safe-int range, or document why this path is an exception.
  4. [low] visit_i128 / visit_u128 are unreachable from default serde_jsonrust/cube/cubeorchestrator/src/query_result_transform.rs:1212-1222. Without arbitrary_precision, those variants only appear from hand-constructed values. Either enable that feature, add a visit_str digit-parse fallback, or comment the limitation.

Other note (not posted inline)

  • db_primitive_to_field_value in packages/cubejs-backend-native/src/orchestrator.rs:135-137 collapses Int128/UInt128 into FieldValue::String. Consumers downstream lose the "this was a number that overflowed u64/i64" signal — adding Int128/UInt128 arms to FieldValue would carry the type through consistently.
— branch: `feat/numeric-query-result`

@codecov

codecov Bot commented Jun 8, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 62.50000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.32%. Comparing base (44bd455) to head (9a3f471).

Files with missing lines Patch % Lines
rust/cubesql/cubesql/src/compile/engine/df/scan.rs 62.50% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11036      +/-   ##
==========================================
+ Coverage   78.78%   83.32%   +4.53%     
==========================================
  Files         471      255     -216     
  Lines       94509    77274   -17235     
  Branches     3520        0    -3520     
==========================================
- Hits        74463    64391   -10072     
+ Misses      19538    12883    -6655     
+ Partials      508        0     -508     
Flag Coverage Δ
cube-backend ?
cubesql 83.32% <62.50%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Replace the single f64 `DBResponsePrimitive::Number` / `FieldValue::Number`
variant with precise `Int64`/`UInt64`/`Float64` variants so integer-vs-float
type is preserved through the query result path and large 64-bit integers no
longer lose precision via f64.
@ovr ovr force-pushed the feat/numeric-query-result branch from cc5cad1 to 9a3f471 Compare June 8, 2026 15:15
@claude

claude Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ovr's task in 33s —— View job


Code Review — no changes since prior review

Verdict: 1 high, 3 medium, 1 low — all from prior passes; the diff is unchanged on this trigger, so no new inline comments. Re-affirmed 4 prior threads still apply.

Tasks & re-affirmed findings (click to expand)
  • Diff vs origin/master — identical to previous review (1 commit, 5 files, +197/-22)
  • List existing review threads via gh list-review-threads — 4 unresolved Claude threads
  • Verify each prior thread's line is still present in the current source
  • Skip duplicate inline comments (dedupe by path+line+root cause)

Re-affirmed (still unresolved, still applicable)

  1. [high] Wire-format change + asymmetric round-trip — rust/cube/cubeorchestrator/src/query_result_transform.rs:1187. collect_str for every numeric variant vs deserializer accepting numbers only. External payload shifts 42"42"; Int64(5) → serialize → deserialize lands as String("5"); Float64(2.0) and Int64(2) both serialize to "2", indistinguishable on the wire — defeats the variant split's stated goal.
  2. [medium] Integer cast as truncates/wraps — rust/cubesql/cubesql/src/compile/engine/df/scan.rs:960 (and the Int32 / Int64 arms below). Previously f64.round() as iN saturated; now Int64(n) as i16 wraps (1_000_000 → 16960). The String arm still warns-and-nulls on parse failure; typed-int arms silently corrupt — inconsistent.
  3. [medium] JS streaming path always emits Float64packages/cubejs-backend-native/src/stream.rs:283. Integer-valued JS numbers (every COUNT(*) from a JS data source) land as Float64, so the new int variants don't reach this entry point at all. Either detect fract() == 0.0 within safe-int range, or document why this path is an exception.
  4. [low] visit_i128 / visit_u128 likely unreachable from default serde_jsonrust/cube/cubeorchestrator/src/query_result_transform.rs:1217. Without arbitrary_precision, those variants only appear from hand-constructed values.

Other note (not posted inline)

  • db_primitive_to_field_value in packages/cubejs-backend-native/src/orchestrator.rs:135-137 collapses Int128/UInt128 into FieldValue::String. Consumers downstream lose the "overflowed u64/i64" signal — adding Int128/UInt128 arms to FieldValue would carry the type through consistently.

— branch: feat/numeric-query-result ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant