SedonaInfo is a Spark data source that reads raster file metadata without decoding pixel data, similar to GDAL's `gdalinfo`. It returns one row per file, with metadata including dimensions, coordinate system, band information, tiling, overviews, and compression.
This is useful for:
- Cataloging and inventorying large collections of raster files
- Detecting Cloud Optimized GeoTIFFs (COGs) by checking tiling and overview status
- Inspecting file properties before loading full raster data
- Building spatial indexes over raster file collections
Currently, only GeoTIFF files are supported.

To read metadata for every raster file in a directory, load the directory with the `sedonainfo` format:

=== "Scala"

    ```scala
    val df = sedona.read.format("sedonainfo").load("/path/to/rasters/")
    df.show()
    ```

=== "Java"

    ```java
    Dataset<Row> df = sedona.read().format("sedonainfo").load("/path/to/rasters/");
    df.show();
    ```

=== "Python"

    ```python
    df = sedona.read.format("sedonainfo").load("/path/to/rasters/")
    df.show()
    ```
You can also use glob patterns:

```python
df = sedona.read.format("sedonainfo").load("/path/to/rasters/*.tif")
```

Or load a single file:

```python
df = sedona.read.format("sedonainfo").load("/path/to/image.tiff")
```

Each row represents one raster file with the following columns:

| Column | Type | Description |
|---|---|---|
| `path` | String | File path |
| `driver` | String | Format driver (e.g., `"GTiff"`) |
| `fileSize` | Long | File size in bytes |
| `width` | Int | Image width in pixels |
| `height` | Int | Image height in pixels |
| `numBands` | Int | Number of bands |
| `srid` | Int | EPSG code (`0` if unknown) |
| `crs` | String | Coordinate reference system as WKT |
| `geoTransform` | Struct | Affine transform parameters |
| `cornerCoordinates` | Struct | Bounding box |
| `bands` | Array[Struct] | Per-band metadata |
| `overviews` | Array[Struct] | Overview (pyramid) levels |
| `metadata` | Map[String, String] | File-wide TIFF metadata tags |
| `isTiled` | Boolean | Whether the file uses internal tiling |
| `compression` | String | Compression type (e.g., `"Deflate"`) |

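The `geoTransform` struct holds the affine transform that maps pixel positions to world coordinates:

| Field | Type | Description |
|---|---|---|
| `upperLeftX` | Double | Origin X in world coordinates |
| `upperLeftY` | Double | Origin Y in world coordinates |
| `scaleX` | Double | Pixel size in the X direction |
| `scaleY` | Double | Pixel size in the Y direction (typically negative for north-up images) |
| `skewX` | Double | Rotation/shear in X |
| `skewY` | Double | Rotation/shear in Y |
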
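To show how the six `geoTransform` fields combine, here is a minimal plain-Python sketch that maps a pixel position to world coordinates. It assumes the usual GDAL/GeoTools affine convention (`x = upperLeftX + col * scaleX + row * skewX`, `y = upperLeftY + col * skewY + row * scaleY`). The `GeoTransform` class and `pixel_to_world` function are hypothetical helpers, not part of the SedonaInfo API, and the sample values are made up.

```python
from typing import NamedTuple, Tuple


class GeoTransform(NamedTuple):
    """Plain-Python stand-in for the geoTransform struct (same field names)."""

    upperLeftX: float
    upperLeftY: float
    scaleX: float
    scaleY: float
    skewX: float
    skewY: float


def pixel_to_world(gt: GeoTransform, col: float, row: float) -> Tuple[float, float]:
    """Map a pixel position (col, row) to world coordinates.

    Assumes the GDAL/GeoTools affine convention:
        x = upperLeftX + col * scaleX + row * skewX
        y = upperLeftY + col * skewY + row * scaleY
    (col, row) = (0, 0) is the upper-left corner of the upper-left pixel.
    """
    x = gt.upperLeftX + col * gt.scaleX + row * gt.skewX
    y = gt.upperLeftY + col * gt.skewY + row * gt.scaleY
    return x, y


# Hypothetical north-up raster with 30 m pixels and no rotation.
gt = GeoTransform(
    upperLeftX=500000.0, upperLeftY=4200000.0,
    scaleX=30.0, scaleY=-30.0, skewX=0.0, skewY=0.0,
)
print(pixel_to_world(gt, 0, 0))      # (500000.0, 4200000.0)
print(pixel_to_world(gt, 100, 200))  # (503000.0, 4194000.0)
```

Because `scaleY` is negative here, moving down the image (increasing `row`) decreases Y, which is the typical layout for north-up rasters.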
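The `cornerCoordinates` struct gives the raster's bounding box in world coordinates:

| Field | Type | Description |
|---|---|---|
| `minX` | Double | Minimum X (west) |
| `minY` | Double | Minimum Y (south) |
| `maxX` | Double | Maximum X (east) |
| `maxY` | Double | Maximum Y (north) |
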
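The following minimal sketch shows how a bounding box like `cornerCoordinates` can be derived from the `geoTransform` fields and the raster's `width` and `height`. It transforms all four image corners and takes the extremes, which also handles rotated rasters. Two things are assumptions: the sketch uses a GDAL-style affine convention, and SedonaInfo may not compute the box this way. `corner_bbox` is a hypothetical helper and the inputs are made up.

```python
from typing import Tuple


def corner_bbox(
    upper_left_x: float,
    upper_left_y: float,
    scale_x: float,
    scale_y: float,
    skew_x: float,
    skew_y: float,
    width: int,
    height: int,
) -> Tuple[float, float, float, float]:
    """Return (minX, minY, maxX, maxY) for a raster of width x height pixels.

    Assumes the GDAL-style affine transform:
        x = upper_left_x + col * scale_x + row * skew_x
        y = upper_left_y + col * skew_y + row * scale_y
    """

    def to_world(col: float, row: float) -> Tuple[float, float]:
        return (
            upper_left_x + col * scale_x + row * skew_x,
            upper_left_y + col * skew_y + row * scale_y,
        )

    # Transform all four image corners; taking the extremes also covers rotated rasters.
    corners = [to_world(col, row) for col in (0, width) for row in (0, height)]
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical 1000 x 500 north-up raster with 30 m pixels.
print(corner_bbox(500000.0, 4200000.0, 30.0, -30.0, 0.0, 0.0, 1000, 500))
# (500000.0, 4185000.0, 530000.0, 4200000.0)
```

For an unrotated raster, this reduces to taking the origin plus `width * scaleX` and `height * scaleY`.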
Each element of the `bands` array has the following fields:

| Field | Type | Description |
|---|---|---|
| `band` | Int | Band number (1-indexed) |
| `dataType` | String | Data type (e.g., `"REAL_32BITS"`) |
| `colorInterpretation` | String | Color interpretation (e.g., `"Gray"`) |
| `noDataValue` | Double | NoData value (`null` if not set) |
| `blockWidth` | Int | Internal tile/block width in pixels |
| `blockHeight` | Int | Internal tile/block height in pixels |
| `description` | String | Band description |
| `unit` | String | Unit type (e.g., `"meters"`) |

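`blockWidth` and `blockHeight` determine how many internal blocks cover a band. The sketch below counts them with integer ceiling division, because a partial block at the right or bottom edge still counts as one block. `block_grid` is a hypothetical helper, not part of SedonaInfo, and the sample sizes are made up.

```python
from typing import Tuple


def block_grid(
    width: int, height: int, block_width: int, block_height: int
) -> Tuple[int, int, int]:
    """Return (blocks_across, blocks_down, total_blocks) for one band.

    Uses integer ceiling division: a partial block at the right or bottom
    edge still counts as one block.
    """
    across = (width + block_width - 1) // block_width
    down = (height + block_height - 1) // block_height
    return across, down, across * down


# Hypothetical 10980 x 10980 band stored in 512 x 512 tiles.
print(block_grid(10980, 10980, 512, 512))  # (22, 22, 484)
```

For a non-tiled (strip-organized) TIFF, the block width usually equals the image width, so `blocks_across` is 1.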
Each element of the `overviews` array describes one reduced-resolution overview level:

| Field | Type | Description |
|---|---|---|
| `level` | Int | Overview level (1, 2, 3, ...) |
| `width` | Int | Overview width in pixels |
| `height` | Int | Overview height in pixels |

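Comparing an overview's size with the full-resolution `width` and `height` gives its downsampling factor. The sketch below works on plain dicts with the same field names as the `overviews` struct; in PySpark you could build these, for example, with `Row.asDict()`. The function name and sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def overview_factors(
    base_width: int, base_height: int, overviews: List[Dict[str, int]]
) -> List[Tuple[int, float, float]]:
    """Return (level, x_factor, y_factor) for each overview.

    A factor of 2.0 means the overview has half as many pixels as the
    full-resolution image along that axis.
    """
    return [
        (ovr["level"], base_width / ovr["width"], base_height / ovr["height"])
        for ovr in overviews
    ]


# Hypothetical overview entries for a 10000 x 5000 raster.
overviews = [
    {"level": 1, "width": 5000, "height": 2500},
    {"level": 2, "width": 2500, "height": 1250},
]
print(overview_factors(10000, 5000, overviews))  # [(1, 2.0, 2.0), (2, 4.0, 4.0)]
```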
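A COG is a GeoTIFF that is internally tiled and has overview levels, so filtering on `isTiled` and the number of overviews identifies COG candidates:

```python
df = sedona.read.format("sedonainfo").load("/path/to/rasters/")
cogs = df.filter("isTiled AND size(overviews) > 0")
cogs.select("path", "compression", "overviews").show(truncate=False)
```

To inspect per-band metadata, explode the `bands` array into one row per band:

```python
df = sedona.read.format("sedonainfo").load("/path/to/image.tif")
df.selectExpr("path", "explode(bands) as band").selectExpr(
    "path",
    "band.band",
    "band.dataType",
    "band.noDataValue",
    "band.blockWidth",
    "band.blockHeight",
).show()
```

To find rasters whose extent lies entirely between two X values, filter on `cornerCoordinates`. The bounds are expressed in each file's own CRS, so compare them only across files that share a CRS (the values below assume longitude in degrees):

```python
df = sedona.read.format("sedonainfo").load("/path/to/rasters/")
df.filter("cornerCoordinates.minX > -120 AND cornerCoordinates.maxX < -100").select(
    "path", "width", "height", "srid"
).show()
```

To list the overview levels of a file, explode the `overviews` array:

```python
df = sedona.read.format("sedonainfo").load("/path/to/image.tif")
df.selectExpr("path", "explode(overviews) as ovr").selectExpr(
    "path", "ovr.level", "ovr.width", "ovr.height"
).show()
```

Select only the columns you need. SedonaInfo uses column pruning to skip extracting metadata for columns that are not selected:
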
```python
df = (
    sedona.read.format("sedonainfo")
    .load("/path/to/rasters/")
    .select("path", "width", "height", "numBands")
)
df.show()
```

SedonaInfo currently supports the following formats:

| Format | Driver | Extensions |
|---|---|---|
| GeoTIFF | GTiff | .tif, .tiff |
Additional formats may be added in future releases.