Skip to content

[GH-2230] Add GeoSeries.shortest_line and GeoSeries.offset_curve#2828

Open
jiayuasu wants to merge 2 commits intoapache:masterfrom
jiayuasu:feature/geopandas-shortest-line-offset-curve
Open

[GH-2230] Add GeoSeries.shortest_line and GeoSeries.offset_curve#2828
jiayuasu wants to merge 2 commits intoapache:masterfrom
jiayuasu:feature/geopandas-shortest-line-offset-curve

Conversation

@jiayuasu
Copy link
Copy Markdown
Member

@jiayuasu jiayuasu commented Apr 7, 2026

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

Implement GeoPandas counterparts for ST_ShortestLine and ST_OffsetCurve:

  • GeoSeries.shortest_line(other, align=None): binary operation returning the shortest line between two geometries, delegating to ST_ShortestLine. Supports GeoSeries-to-GeoSeries and GeoSeries-to-single-geometry operations.
  • GeoSeries.offset_curve(distance, quad_segs=8, join_style="round", mitre_limit=5.0): unary operation returning a line at a given offset distance from a linear geometry, delegating to ST_OffsetCurve. Handles empty geometries by returning LINESTRING EMPTY to match GeoPandas behavior (Sedona's ST_OffsetCurve returns null for empty inputs).

Both functions are implemented in geoseries.py (actual logic) and base.py (docstrings + delegation), following the existing patterns for binary and unary operations respectively.

How was this patch tested?

  • Added tests in test_geoseries.py for both functions, covering GeoSeries-vs-GeoSeries, GeoSeries-vs-single-geometry, and GeoDataFrame delegation.
  • Added tests in test_match_geopandas_series.py for both functions, comparing Sedona GeoSeries output against native GeoPandas output across all geometry types.

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public API so no need to change the documentation. Docstrings are included in the base.py implementations.

Implement GeoPandas counterparts for ST_ShortestLine and ST_OffsetCurve:

- shortest_line: binary operation returning the shortest line between
  two geometries, delegating to ST_ShortestLine
- offset_curve: unary operation returning a line at a given offset
  distance from a linear geometry, delegating to ST_OffsetCurve.
  Handles empty geometries by returning LINESTRING EMPTY to match
  GeoPandas behavior.

Add tests in both test_geoseries.py and test_match_geopandas_series.py.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds GeoPandas-compatible GeoSeries operations for computing (1) offset curves for linear geometries and (2) shortest connecting lines between geometries, implemented in the Sedona GeoPandas-on-Spark layer and validated against native GeoPandas behavior.

Changes:

  • Implement GeoSeries.offset_curve(...) backed by ST_OffsetCurve, including special-casing empty inputs to return LINESTRING EMPTY.
  • Implement GeoSeries.shortest_line(other, align=None) backed by ST_ShortestLine, supporting GeoSeries-to-GeoSeries and GeoSeries-to-scalar geometry usage.
  • Add unit and compatibility tests for both methods.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
python/sedona/spark/geopandas/geoseries.py Adds the concrete Spark-backed implementations for offset_curve and shortest_line.
python/sedona/spark/geopandas/base.py Exposes the APIs on the base class with GeoPandas-style docstrings and delegation for GeoSeries/GeoDataFrame.
python/tests/geopandas/test_geoseries.py Adds focused functional tests for the new methods, including GeoDataFrame delegation.
python/tests/geopandas/test_match_geopandas_series.py Adds cross-implementation comparison tests vs native GeoPandas for both methods.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jiayuasu
Copy link
Copy Markdown
Member Author

jiayuasu commented Apr 7, 2026

Addressed all review comments in 647a25c:

  1. geoseries.py:1064 — Scalar-only distance is correct; GeoPandas offset_curve also only accepts scalar. Updated the docstring accordingly.
  2. base.py:1006 — Fixed docstring from float or array-like to float.
  3. base.py:1015 — Already documented: the join_style/mitre_limit compatibility note was present in the docstring.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1061 to +1066
# ST_OffsetCurve returns null for empty geometries, but GeoPandas returns LINESTRING EMPTY
empty_line = stc.ST_GeomFromText(F.lit("LINESTRING EMPTY"))
spark_col = F.when(
stf.ST_IsEmpty(self.spark.column),
empty_line,
).otherwise(stf.ST_OffsetCurve(self.spark.column, distance, quad_segs))
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

offset_curve replaces empty input geometries with ST_GeomFromText('LINESTRING EMPTY'), which will have SRID=0. If the input GeoSeries has CRS set (via SRID), this can silently drop CRS for results (e.g., if the first non-null row is empty, GeoSeries.crs will become None). Consider constructing the empty geometry with the input SRID (e.g., pass ST_SRID(self.spark.column) as the srid arg to ST_GeomFromText, or ST_SetSRID on the empty geometry) so CRS/SRID is preserved even for empty rows.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants