-
Notifications
You must be signed in to change notification settings - Fork 8
feat: GeoArrow export — GEOGRAPHY/GEOMETRY as geoarrow.wkb #100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -548,7 +548,7 @@ func (st *statement) ExecuteQuery(ctx context.Context) (reader array.RecordReade | |
| return nil, err | ||
| } | ||
|
|
||
| reader, err = newRecordReader(ctx, st.alloc, loader, st.queueSize, st.prefetchConcurrency, st.useHighPrecision, st.maxTimestampPrecision) | ||
| reader, err = newRecordReader(ctx, st.alloc, loader, st.queueSize, st.prefetchConcurrency, st.useHighPrecision, st.maxTimestampPrecision, nil) | ||
| return reader, err | ||
| }, | ||
| currentBatch: st.bound, | ||
|
|
@@ -566,14 +566,20 @@ func (st *statement) ExecuteQuery(ctx context.Context) (reader array.RecordReade | |
| return | ||
| } | ||
|
|
||
| // Detect geo columns before executing the query. For table scans, | ||
| // try to extract the table name and run DESCRIBE TABLE to identify | ||
| // GEOGRAPHY/GEOMETRY columns (catalog metadata is unaffected by WKB output format). | ||
| // TODO: Support arbitrary queries — currently only table scans get geoarrow metadata. | ||
| geoCols := st.cnxn.detectGeoColumnsFromQuery(ctx, st.query) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should make this opt-in. It would also be good to document it in snowflake.md, and add validation cases for geometry/geography.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I don’t know. It really feels this should be fixed upstream, and I agree this is too much… yeah affecting the entire driver for this issue is too much. Maybe this whole thing should not be fixed until snowflake fixes upstream. The workaround is tooo bad. |
||
|
|
||
| var loader gosnowflake.ArrowStreamLoader | ||
| loader, err = st.cnxn.cn.QueryArrowStream(ctx, st.query) | ||
| if err != nil { | ||
| err = errToAdbcErr(adbc.StatusInternal, err) | ||
| return | ||
| } | ||
|
|
||
| reader, err = newRecordReader(ctx, st.alloc, loader, st.queueSize, st.prefetchConcurrency, st.useHighPrecision, st.maxTimestampPrecision) | ||
| reader, err = newRecordReader(ctx, st.alloc, loader, st.queueSize, st.prefetchConcurrency, st.useHighPrecision, st.maxTimestampPrecision, geoCols) | ||
| nRows = loader.TotalRows() | ||
| return | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if a user explicitly sets this to something that is not WKB, we should avoid adding the GeoArrow type and metadata, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is an interesting point. Snowflake I believe is the only engine that allows you to configure how the output of geo data types are encoded.
My feeling is that if the query or table is returning a native geometry or geography type we should always encode as WKB and mark it as geoarrow. It is not that you can have gearrow encoded on different formats (well you can with WKB vs native arrow objects but that’s another thing). The goal is to preserve the type so I don’t think it makes sense to change that.
But if the user does st_astext(geom) then it becomes text and it should be retuned as that. So essentially behaving like everybody else. You use functions to specify output formats (as_text, as_wkb, as_geojson,as_gml…).
So since this is adbc I would say no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words. We ignore because we override whatever output format the user has specified.