feat: support geometry and geography types #2653

Draft
wirybeaver wants to merge 1 commit into apache:main from wirybeaver:support-geometry-geography-types

wirybeaver commented Jun 16, 2026

Summary

This draft PR adds Iceberg Geometry/Geography primitive type support by reusing arrow-rs/parquet-geospatial support instead of introducing a local geospatial model.

  • Adds GeometryType and GeographyType, using parquet_geospatial::WkbEdges for geography edge interpolation algorithms.
  • Converts Geometry/Geography to Arrow WKB extension metadata and enables Parquet geospatial logical type writing.
  • Maps Avro, Glue, and HMS representations to bytes/binary.
  • Rejects non-null JSON defaults, blocks unsupported partition transforms, and skips byte min/max statistics for spatial values.

Related issues

Related to #2411 and #1884.

paleolimbot left a comment

Member

Very cool!

I will take a closer look tomorrow, but note that there is a PR into parquet-geospatial (apache/arrow-rs#10065) that fixes the translation between GeoArrow and Parquet types. The translation was mostly incorrect for Geography, with a few subtle issues elsewhere. That PR has a number of nicely parameterized test cases at the bottom that you can reuse here.

Since I happen to be reviewing it, it may also be useful to reference the initial Go PR, which has a similar scope: apache/iceberg-go#1138

@wirybeaver

Author

@paleolimbot Thanks for pointing out those edge cases. I will take a closer look.

huan233usc left a comment

Contributor

Really nice to see this — the type model, Parquet geometry/geography logical types, and especially skipping byte min/max for spatial columns all line up with the spec. A couple of design-alignment points inline.


impl GeographyType {
    /// Creates a geography type with an optional coordinate reference system and edge interpolation algorithm.
    pub fn new(crs: Option<String>, algorithm: WkbEdges) -> Result<Self> {

The public iceberg::spec API exposes parquet_geospatial::WkbEdges (GeographyType::new/algorithm()), coupling the type layer to the parquet-geospatial crate. The set of edge algorithms is defined by the spec itself (edge-interpolation-algorithm) — could we define an Iceberg-owned enum mirroring the spec and convert to WkbEdges only at the Parquet boundary, keeping the type layer format-agnostic?
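One way the suggestion above could look, as a minimal sketch rather than the PR's actual code: the enum name `EdgeAlgorithm`, its derives, and the exact-match string parsing are all assumptions made for illustration. The variant list and the `spherical` default follow my reading of the spec's edge-interpolation-algorithm section and should be checked against it. The conversion to `WkbEdges` is deliberately left out; it would sit in the Parquet writer module so that `iceberg::spec` never names a `parquet_geospatial` type.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical Iceberg-owned edge-interpolation enum (name and shape are
/// assumptions, not the PR's API). Variants mirror the spec's algorithm names
/// as I understand them.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Default)]
pub enum EdgeAlgorithm {
    /// Assumed default when the type string omits the algorithm.
    #[default]
    Spherical,
    Vincenty,
    Thomas,
    Andoyer,
    Karney,
}

impl EdgeAlgorithm {
    /// Returns the lowercase name used in the type string.
    pub fn as_str(self) -> &'static str {
        match self {
            EdgeAlgorithm::Spherical => "spherical",
            EdgeAlgorithm::Vincenty => "vincenty",
            EdgeAlgorithm::Thomas => "thomas",
            EdgeAlgorithm::Andoyer => "andoyer",
            EdgeAlgorithm::Karney => "karney",
        }
    }
}

impl fmt::Display for EdgeAlgorithm {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for EdgeAlgorithm {
    type Err = String;

    /// Parses an exact lowercase name; anything else is rejected.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "spherical" => Ok(EdgeAlgorithm::Spherical),
            "vincenty" => Ok(EdgeAlgorithm::Vincenty),
            "thomas" => Ok(EdgeAlgorithm::Thomas),
            "andoyer" => Ok(EdgeAlgorithm::Andoyer),
            "karney" => Ok(EdgeAlgorithm::Karney),
            other => Err(format!("unknown edge-interpolation algorithm: {other}")),
        }
    }
}
```

With this shape, `GeographyType::new` would take `EdgeAlgorithm`, and only the Arrow/Parquet conversion code would need a `From<EdgeAlgorithm> for WkbEdges` impl.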

PrimitiveType::Fixed(_)
    | PrimitiveType::Binary
    | PrimitiveType::Geometry(_)
    | PrimitiveType::Geography(_) => PrimitiveLiteral::Binary(Vec::from(bytes)),

The single-value path decodes geo bytes as opaque WKB. Per the spec, geo bounds are a single point encoded as x:y[:z][:m] little-endian f64s, not WKB (bound-serialization, bounds-for-geometry-and-geography). Both implementations skip spatial bounds today so it's latent, but worth agreeing the bound codec follows the spec point encoding so they don't diverge later.
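To make the point encoding concrete, here is a rough sketch of a bound codec. `BoundPoint`, `encode_bound`, and `decode_bound` are hypothetical names, not part of this PR or of iceberg-rust. The layout follows the `x:y[:z][:m]` description above; writing `NaN` for `z` when only `m` is present is my reading of the spec's bound-serialization section and should be treated as an assumption to verify.

```rust
/// Hypothetical bound point: x and y are mandatory, z and m are optional.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BoundPoint {
    pub x: f64,
    pub y: f64,
    pub z: Option<f64>,
    pub m: Option<f64>,
}

/// Encodes a bound as concatenated 8-byte little-endian f64 values.
pub fn encode_bound(p: &BoundPoint) -> Vec<u8> {
    let mut coords = vec![p.x, p.y];
    match (p.z, p.m) {
        (None, None) => {}
        (Some(z), None) => coords.push(z),
        (z, Some(m)) => {
            // Assumption: a missing z is written as NaN so m keeps its slot.
            coords.push(z.unwrap_or(f64::NAN));
            coords.push(m);
        }
    }
    let mut out = Vec::with_capacity(coords.len() * 8);
    for c in coords {
        out.extend_from_slice(&c.to_le_bytes());
    }
    out
}

/// Decodes 16 (x:y), 24 (x:y:z), or 32 (x:y:z:m) bytes back into a point.
pub fn decode_bound(bytes: &[u8]) -> Result<BoundPoint, String> {
    if !matches!(bytes.len(), 16 | 24 | 32) {
        return Err(format!("expected 16, 24, or 32 bytes, got {}", bytes.len()));
    }
    let coords: Vec<f64> = bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(chunk);
            f64::from_le_bytes(buf)
        })
        .collect();
    // A NaN in the z slot is treated as "z not set".
    let z = coords.get(2).copied().filter(|z| !z.is_nan());
    let m = coords.get(3).copied();
    Ok(BoundPoint { x: coords[0], y: coords[1], z, m })
}
```

Because the length alone distinguishes the three layouts, a decoder like this never needs to inspect WKB headers, which is what keeps it separate from the opaque-binary path quoted above.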
