feat: support geometry and geography types#2653
Conversation
There was a problem hiding this comment.
Very cool!
I will take a closer look tomorrow, but note that there is a PR into parquet-geospatial to fix the translation between GeoArrow and Parquet types, which mostly was incorrect for Geography but there were a few subtle issues otherwise. There are a number of test cases parameterized nicely at the bottom that you can reuse here. apache/arrow-rs#10065
Just because I happen to be reviewing it, it may also be useful to reference the initial Go PR which has a similar scope: apache/iceberg-go#1138
|
@paleolimbot Thanks for pointing out those edge cases. I will take a closer look. |
huan233usc
left a comment
There was a problem hiding this comment.
Really nice to see this — the type model, Parquet geometry/geography logical types, and especially skipping byte min/max for spatial columns all line up with the spec. A couple of design-alignment points inline.
|
|
||
| impl GeographyType { | ||
| /// Creates a geography type with an optional coordinate reference system and edge interpolation algorithm. | ||
| pub fn new(crs: Option<String>, algorithm: WkbEdges) -> Result<Self> { |
There was a problem hiding this comment.
The public iceberg::spec API exposes parquet_geospatial::WkbEdges (GeographyType::new/algorithm()), coupling the type layer to the parquet-geospatial crate. The set of edge algorithms is defined by the spec itself (edge-interpolation-algorithm) — could we define an Iceberg-owned enum mirroring the spec and convert to WkbEdges only at the Parquet boundary, keeping the type layer format-agnostic?
| PrimitiveType::Fixed(_) | ||
| | PrimitiveType::Binary | ||
| | PrimitiveType::Geometry(_) | ||
| | PrimitiveType::Geography(_) => PrimitiveLiteral::Binary(Vec::from(bytes)), |
There was a problem hiding this comment.
The single-value path decodes geo bytes as opaque WKB. Per the spec, geo bounds are a single point encoded as x:y[:z][:m] little-endian f64s, not WKB (bound-serialization, bounds-for-geometry-and-geography). Both implementations skip spatial bounds today so it's latent, but worth agreeing the bound codec follows the spec point encoding so they don't diverge later.
Summary
This draft PR adds Iceberg Geometry/Geography primitive type support by reusing arrow-rs/parquet-geospatial support instead of introducing a local geospatial model.
Related issues
Related to #2411 and #1884.