[GH-2230] Add GeoSeries.shortest_line and GeoSeries.offset_curve#2828
[GH-2230] Add GeoSeries.shortest_line and GeoSeries.offset_curve#2828jiayuasu wants to merge 2 commits intoapache:masterfrom
Conversation
Implement GeoPandas counterparts for ST_ShortestLine and ST_OffsetCurve: - shortest_line: binary operation returning the shortest line between two geometries, delegating to ST_ShortestLine - offset_curve: unary operation returning a line at a given offset distance from a linear geometry, delegating to ST_OffsetCurve. Handles empty geometries by returning LINESTRING EMPTY to match GeoPandas behavior. Add tests in both test_geoseries.py and test_match_geopandas_series.py.
There was a problem hiding this comment.
Pull request overview
Adds GeoPandas-compatible GeoSeries operations for computing (1) offset curves for linear geometries and (2) shortest connecting lines between geometries, implemented in the Sedona GeoPandas-on-Spark layer and validated against native GeoPandas behavior.
Changes:
- Implement
GeoSeries.offset_curve(...)backed byST_OffsetCurve, including special-casing empty inputs to returnLINESTRING EMPTY. - Implement
GeoSeries.shortest_line(other, align=None)backed byST_ShortestLine, supporting GeoSeries-to-GeoSeries and GeoSeries-to-scalar geometry usage. - Add unit and compatibility tests for both methods.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
python/sedona/spark/geopandas/geoseries.py |
Adds the concrete Spark-backed implementations for offset_curve and shortest_line. |
python/sedona/spark/geopandas/base.py |
Exposes the APIs on the base class with GeoPandas-style docstrings and delegation for GeoSeries/GeoDataFrame. |
python/tests/geopandas/test_geoseries.py |
Adds focused functional tests for the new methods, including GeoDataFrame delegation. |
python/tests/geopandas/test_match_geopandas_series.py |
Adds cross-implementation comparison tests vs native GeoPandas for both methods. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Addressed all review comments in 647a25c:
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # ST_OffsetCurve returns null for empty geometries, but GeoPandas returns LINESTRING EMPTY | ||
| empty_line = stc.ST_GeomFromText(F.lit("LINESTRING EMPTY")) | ||
| spark_col = F.when( | ||
| stf.ST_IsEmpty(self.spark.column), | ||
| empty_line, | ||
| ).otherwise(stf.ST_OffsetCurve(self.spark.column, distance, quad_segs)) |
There was a problem hiding this comment.
offset_curve replaces empty input geometries with ST_GeomFromText('LINESTRING EMPTY'), which will have SRID=0. If the input GeoSeries has CRS set (via SRID), this can silently drop CRS for results (e.g., if the first non-null row is empty, GeoSeries.crs will become None). Consider constructing the empty geometry with the input SRID (e.g., pass ST_SRID(self.spark.column) as the srid arg to ST_GeomFromText, or ST_SetSRID on the empty geometry) so CRS/SRID is preserved even for empty rows.
Did you read the Contributor Guide?
Is this PR related to a ticket?
[GH-XXX] my subject. Part of [EPIC] Implement More Geopandas Functions #2230What changes were proposed in this PR?
Implement GeoPandas counterparts for
ST_ShortestLineandST_OffsetCurve:GeoSeries.shortest_line(other, align=None): binary operation returning the shortest line between two geometries, delegating toST_ShortestLine. Supports GeoSeries-to-GeoSeries and GeoSeries-to-single-geometry operations.GeoSeries.offset_curve(distance, quad_segs=8, join_style="round", mitre_limit=5.0): unary operation returning a line at a given offset distance from a linear geometry, delegating toST_OffsetCurve. Handles empty geometries by returningLINESTRING EMPTYto match GeoPandas behavior (Sedona'sST_OffsetCurvereturns null for empty inputs).Both functions are implemented in
geoseries.py(actual logic) andbase.py(docstrings + delegation), following the existing patterns for binary and unary operations respectively.How was this patch tested?
test_geoseries.pyfor both functions, covering GeoSeries-vs-GeoSeries, GeoSeries-vs-single-geometry, and GeoDataFrame delegation.test_match_geopandas_series.pyfor both functions, comparing Sedona GeoSeries output against native GeoPandas output across all geometry types.Did this PR include necessary documentation updates?
base.pyimplementations.