Skip to content

impl(bigquery): basic row parsing#5916

Draft
alvarowolfx wants to merge 4 commits into
googleapis:mainfrom
alvarowolfx:impl-bq-row-basic-parse
Draft

impl(bigquery): basic row parsing#5916
alvarowolfx wants to merge 4 commits into
googleapis:mainfrom
alvarowolfx:impl-bq-row-basic-parse

Conversation

@alvarowolfx

Copy link
Copy Markdown
Contributor

Towards #5844

@product-auto-label product-auto-label Bot added the api: bigquery Issues related to the BigQuery API. label Jun 18, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces robust error handling and type conversion capabilities for BigQuery query results, including the FromSql trait for mapping database values to Rust types and a schema-aware Row parser. The reviewer feedback highlights critical correctness and performance improvements: addressing unnecessary clones and allocations during row parsing, preventing silent data corruption when handling special float values (like NaN and Infinity), ensuring consistent row-to-schema length validation, and improving error diagnostics by propagating actual field names instead of hardcoding 'unknown' in conversion errors.

Comment on lines +70 to +76
match schema.get_field_by_index(i) {
Some(f) => {
let field_type = f.r#type.clone();
let schema = Arc::new(Schema::new_from_field(f.clone()));
let value = convert_value(value, field_type, &schema)?;
values.push(value);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation clones f.r#type (a String), clones f (a potentially large TableFieldSchema), and allocates a new Arc<Schema> for every single cell in every row. Furthermore, the _schema parameter in convert_value is completely unused.

To avoid these expensive allocations and clones, and to allow passing the actual column name for better error diagnostics, we can pass &f.r#type and &f.name to convert_value.

            match schema.get_field_by_index(i) {
                Some(f) => {
                    let value = convert_value(value, &f.r#type, &f.name)?;
                    values.push(value);
                }
                None => continue,
            }

Comment on lines +180 to +188
"FLOAT" | "FLOAT64" => {
let num = value.parse::<f64>().map_err(|e| RowError::TypeConversion {
column: "unknown".to_string(),
source: ConvertError::Convert(Box::new(e)),
})?;
Ok(Value::Number(
serde_json::Number::from_f64(num).unwrap_or_else(|| serde_json::Number::from(0)),
))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

BigQuery FLOAT64 columns can return special values like "Infinity", "-Infinity", or "NaN". While f64::parse successfully parses these, serde_json::Number::from_f64 returns None for non-finite floats. The current unwrap_or_else silently defaults these values to 0, causing silent data corruption.

Instead of defaulting to 0, we can fall back to storing the original string representation as Value::String(value). Since FromSql for f64 already supports parsing from Value::String, this will correctly preserve and parse special float values.

        "FLOAT" | "FLOAT64" => {
            let num = value.parse::<f64>().map_err(|e| RowError::TypeConversion {
                column: field_name.to_string(),
                source: ConvertError::Convert(Box::new(e)),
            })?;
            match serde_json::Number::from_f64(num) {
                Some(n) => Ok(Value::Number(n)),
                None => Ok(Value::String(value)),
            }
        }

Comment thread src/bigquery/src/query/row.rs Outdated
Comment on lines +64 to +87
pub(crate) fn try_new(row: &Struct, schema: &Arc<Schema>) -> Result<Self> {
let field_list = get_field_list(row)?;

let mut values = ListValue::new();
for (i, cell) in field_list.iter().enumerate() {
let value = get_field_value(cell)?;
match schema.get_field_by_index(i) {
Some(f) => {
let field_type = f.r#type.clone();
let schema = Arc::new(Schema::new_from_field(f.clone()));
let value = convert_value(value, field_type, &schema)?;
values.push(value);
}
None => continue,
}
}

if values.len() != schema.len() {
return Err(RowError::InvalidRowFormat(format!(
"schema and row cell mismatch (expected {}, got {})",
schema.len(),
values.len()
)));
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Checking the length mismatch at the end of the loop is inconsistent because if field_list has more elements than schema, the extra elements are silently ignored (since schema.get_field_by_index(i) returns None and the loop continues), resulting in values.len() == schema.len() and succeeding.

Performing an early length check at the beginning of try_new ensures consistent error behavior for any mismatch (more or fewer cells) and avoids unnecessary processing.

Comment on lines +146 to +157
fn convert_value(value: Value, field_type: String, _schema: &Arc<Schema>) -> Result<Value> {
match value {
Value::Null => Ok(Value::Null),
Value::String(v) => convert_basic_type(v, field_type),
Value::Object(_) => unimplemented!("TODO(#5592): nested records not implemented"),
Value::Array(_) => unimplemented!("TODO(#5592): repeated fields not implemented"),
_ => Err(RowError::InvalidRowFormat(format!(
"cell value is not an object: value={:?}, field_type={:?}",
value, field_type
))),
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Update convert_value to accept field_type as &str and field_name as &str to avoid unnecessary clones and provide better error context.

Suggested change
fn convert_value(value: Value, field_type: String, _schema: &Arc<Schema>) -> Result<Value> {
match value {
Value::Null => Ok(Value::Null),
Value::String(v) => convert_basic_type(v, field_type),
Value::Object(_) => unimplemented!("TODO(#5592): nested records not implemented"),
Value::Array(_) => unimplemented!("TODO(#5592): repeated fields not implemented"),
_ => Err(RowError::InvalidRowFormat(format!(
"cell value is not an object: value={:?}, field_type={:?}",
value, field_type
))),
}
}
fn convert_value(value: Value, field_type: &str, field_name: &str) -> Result<Value> {
match value {
Value::Null => Ok(Value::Null),
Value::String(v) => convert_basic_type(v, field_type, field_name),
Value::Object(_) => unimplemented!("TODO(#5592): nested records not implemented"),
Value::Array(_) => unimplemented!("TODO(#5592): repeated fields not implemented"),
_ => Err(RowError::InvalidRowFormat(format!(
"cell value is not an object: value={:?}, field_type={:?}",
value, field_type
))),
}
}

Comment on lines +159 to +160
fn convert_basic_type(value: String, field_type: String) -> Result<Value> {
match field_type.as_str() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Update convert_basic_type signature to accept field_type and field_name as &str.

Suggested change
fn convert_basic_type(value: String, field_type: String) -> Result<Value> {
match field_type.as_str() {
fn convert_basic_type(value: String, field_type: &str, field_name: &str) -> Result<Value> {

Comment on lines +173 to +179
"INTEGER" | "INT64" => {
let num = value.parse::<i64>().map_err(|e| RowError::TypeConversion {
column: "unknown".to_string(),
source: ConvertError::Convert(Box::new(e)),
})?;
Ok(Value::Number(serde_json::Number::from(num)))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use the actual field_name instead of the hardcoded "unknown" for better error diagnostics.

Suggested change
"INTEGER" | "INT64" => {
let num = value.parse::<i64>().map_err(|e| RowError::TypeConversion {
column: "unknown".to_string(),
source: ConvertError::Convert(Box::new(e)),
})?;
Ok(Value::Number(serde_json::Number::from(num)))
}
"INTEGER" | "INT64" => {
let num = value.parse::<i64>().map_err(|e| RowError::TypeConversion {
column: field_name.to_string(),
source: ConvertError::Convert(Box::new(e)),
})?;
Ok(Value::Number(serde_json::Number::from(num)))
}

Comment on lines +189 to +198
"BOOLEAN" | "BOOL" => {
let b = value
.to_lowercase()
.parse::<bool>()
.map_err(|e| RowError::TypeConversion {
column: "unknown".to_string(),
source: ConvertError::Convert(Box::new(e)),
})?;
Ok(Value::Bool(b))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use the actual field_name instead of the hardcoded "unknown" for better error diagnostics.

Suggested change
"BOOLEAN" | "BOOL" => {
let b = value
.to_lowercase()
.parse::<bool>()
.map_err(|e| RowError::TypeConversion {
column: "unknown".to_string(),
source: ConvertError::Convert(Box::new(e)),
})?;
Ok(Value::Bool(b))
}
"BOOLEAN" | "BOOL" => {
let b = value
.to_lowercase()
.parse::<bool>()
.map_err(|e| RowError::TypeConversion {
column: field_name.to_string(),
source: ConvertError::Convert(Box::new(e)),
})?;
Ok(Value::Bool(b))
}

@alvarowolfx

Copy link
Copy Markdown
Contributor Author

Splitting errors into #5917. Later will split FromSql initial impl.

@codecov

codecov Bot commented Jun 18, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 83.14607% with 45 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.84%. Comparing base (75c4db2) to head (3936289).

Files with missing lines Patch % Lines
src/bigquery/src/query/row.rs 73.68% 45 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5916      +/-   ##
==========================================
- Coverage   97.90%   97.84%   -0.07%     
==========================================
  Files         234      236       +2     
  Lines       59940    60202     +262     
==========================================
+ Hits        58683    58902     +219     
- Misses       1257     1300      +43     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@alvarowolfx

Copy link
Copy Markdown
Contributor Author

Split FromSql trait and some basic conversion into PR #5924

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: bigquery Issues related to the BigQuery API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant