Pathrex is a Rust library and CLI tool for benchmarking queries on edge-labeled graphs constrained by regular languages and context-free languages. It uses SuiteSparse:GraphBLAS (via LAGraph) for sparse Boolean matrix operations and decomposes a graph by edge label into one Boolean adjacency matrix per label.
pathrex/
├── Cargo.toml # Crate manifest (edition 2024)
├── build.rs # Links LAGraph + LAGraphX; optionally regenerates FFI bindings
├── src/
│ ├── lib.rs # Modules: eval, formats, graph, rpq, sparql, utils, lagraph_sys
│ ├── main.rs # Binary entry point (placeholder)
│ ├── lagraph_sys.rs # FFI module — includes generated bindings
│ ├── lagraph_sys_generated.rs# Bindgen output (checked in, regenerated in CI)
│ ├── utils.rs # Public helpers: CountingBuilder, CountOutput, VecSource,
│ │ # grb_ok! and la_ok! macros, build_graph
│ ├── graph/
│ │ ├── mod.rs # Core traits (GraphBuilder, GraphDecomposition, GraphSource,
│ │ │ # Backend, Graph<B>), error types, RAII wrappers, GrB init
│ │ └── inmemory.rs # InMemory marker, InMemoryBuilder, InMemoryGraph
│ ├── eval/
│ │ └── mod.rs # Evaluator, PreparedEvaluator, ResultCount traits
│ ├── rpq/
│ │ ├── mod.rs # RPQ query types, RpqError, RPQ marker subtraits
│ │ ├── nfarpq.rs # NfaRpqEvaluator (LAGraph_RegularPathQuery)
│ │ └── rpqmatrix.rs # Matrix-plan RPQ evaluator
│ ├── sparql/
│ │ └── mod.rs # parse_rpq / extract_rpq → RpqQuery (spargebra)
│ └── formats/
│ ├── mod.rs # FormatError enum, re-exports
│ ├── csv.rs # Csv<R> — CSV → Edge iterator (CsvConfig, ColumnSpec)
│ ├── mm.rs # MatrixMarket directory loader (vertices.txt, edges.txt, *.txt)
│ └── rdf.rs # Rdf — unified RDF parser (N-Triples, Turtle) → Edge iterator
├── tests/
│ ├── inmemory_tests.rs # Integration tests for InMemoryBuilder / InMemoryGraph
│ ├── mm_tests.rs # Integration tests for MatrixMarket format
│ ├── nfarpq_tests.rs # Integration tests for NfaRpqEvaluator
│ └── rpqmatrix_tests.rs # Integration tests for matrix-plan RPQ evaluator
├── pathrex-sys/deps/
│ └── LAGraph/ # Git submodule (SparseLinearAlgebra/LAGraph)
└── .github/workflows/ci.yml # CI: build GraphBLAS + LAGraph, cargo build & test
| Dependency | Purpose |
|---|---|
| cmake | Building GraphBLAS and LAGraph from source |
| git | Fetching pinned GraphBLAS source at build time |
| C/C++ toolchain | Compiling GraphBLAS and LAGraph (gcc or clang) |
| OpenMP runtime | Linked dynamically: libgomp on Linux, libomp on macOS, /openmp on MSVC |
| libclang-dev / clang | Required by bindgen when regenerate-bindings feature is active |
SuiteSparse:GraphBLAS no longer needs to be installed system-wide. It is
fetched, built statically, and linked into the binary by pathrex-sys/build.rs.
# Ensure the LAGraph submodule is present
git submodule update --init --recursive
# Build pathrex. First cold build clones GraphBLAS and runs cmake; takes
# ~2-10 minutes depending on core count. Subsequent builds reuse the
# GraphBLAS source tree under target/.../graphblas-src/ and the cmake
# build dir, so they are incremental.
cargo build
# Run tests (no LD_LIBRARY_PATH needed — everything but the OpenMP
# runtime is statically linked)
cargo test --workspacepathrex-sys/build.rs performs three jobs:
-
GraphBLAS fetch. Clones SuiteSparse:GraphBLAS at the pin defined by the
GRAPHBLAS_TAGconstant (currentlyv10.3.1) into$OUT_DIR/graphblas-src/viagit clone --depth=1 --branch <tag>. A sentinel file<dir>/.pathrex-fetchedcontaining the tag string marks a completed clone; if the pin is bumped, the sentinel mismatches and the clone is wiped and retried. If the directory exists without a sentinel (interrupted earlier clone), it is also wiped before retrying. -
Native build + linking. Drives cmake twice — once for GraphBLAS, once for the
pathrex-sys/deps/LAGraphsubmodule:- GraphBLAS flags:
BUILD_SHARED_LIBS=OFF,BUILD_STATIC_LIBS=ON,GRAPHBLAS_BUILD_STATIC_LIBS=ON(belt-and-braces),GRAPHBLAS_USE_JIT=OFF(no runtime C compiler required),GRAPHBLAS_COMPACT=OFF(full FactoryKernels for performance),GRAPHBLAS_USE_OPENMP=ON,GRAPHBLAS_USE_CUDA=OFF,SUITESPARSE_DEMOS=OFF,BUILD_TESTING=OFF,Releaseprofile. - LAGraph flags:
BUILD_SHARED_LIBS=OFF,BUILD_STATIC_LIBS=ON,BUILD_TESTING=OFF, plusCMAKE_PREFIX_PATH=<graphblas_install>andGRAPHBLAS_ROOT=<graphblas_install>so LAGraph'sfind_package(GraphBLAS)picks up our static build instead of any system one. - Static archives land in
$OUT_DIR/.../out/lib/(orlib64/on Fedora-family distros — both candidates are probed bypick_libdir). - Emits
cargo:rustc-link-lib=static=lagraphx,cargo:rustc-link-lib=static=lagraph,cargo:rustc-link-lib=static=graphblas. Order matters:lagraphxreferences symbols fromlagraph's utility module; both referencegraphblas. - Emits OS-specific runtime libraries:
gomp+pthread+dl+mon Linux,omp+pthreadon macOS, nothing explicit on MSVC. Override viaRUSTFLAGSif your toolchain ships a different OpenMP runtime (e.g.libompon Linux + clang).
- GraphBLAS flags:
-
docs.rs guard. If the
DOCS_RSenvironment variable is set, the entire native build is skipped. docs.rs sandboxes block all network access (so thegit clonewould fail) and have strict time/memory limits; rustdoc only needs to compile Rust code, not link or execute it. -
Optional FFI binding regeneration (feature
regenerate-bindings). When the feature is active,regenerate_bindings()runsbindgenagainstdeps/LAGraph/include/LAGraph.h,deps/LAGraph/include/LAGraphX.h, and the GraphBLAS install tree'sinclude/suitesparse/GraphBLAS.h. The generated Rust file is written topathrex-sys/src/lagraph_sys_generated.rs. Only a curated allowlist of GraphBLAS/LAGraph types and functions is exposed (see theallowlist_*calls inpathrex-sys/build.rs).
| Feature | Effect |
|---|---|
regenerate-bindings |
Runs bindgen at build time to regenerate src/lagraph_sys_generated.rs from LAGraph.h, LAGraphX.h (both from deps/LAGraph/include) and GraphBLAS.h. Without this feature the checked-in bindings are used as-is. |
The file src/lagraph_sys_generated.rs is checked into version control. CI
regenerates it with --features regenerate-bindings. Do not hand-edit this file.
Edge is the universal currency between format parsers and graph
builders: { source: String, target: String, label: String }.
GraphSource<B> is implemented by any data source that knows how to
feed itself into a specific [GraphBuilder]:
apply_to(self, builder: B) -> Result<B, B::Error>— consumes the source and returns the populated builder.
Csv<R>, MatrixMarket, and Rdf
implement GraphSource<InMemoryBuilder> (see src/graph/inmemory.rs), so they
can be passed to [GraphBuilder::load] and [Graph::try_from].
GraphBuilder accumulates edges and produces a
GraphDecomposition:
load<S: GraphSource<Self>>(self, source: S)— primary entry point; delegates toGraphSource::apply_to.build(self)— finalise into an immutable graph.
InMemoryBuilder also exposes lower-level helpers outside the trait:
push_edge(&mut self, edge: Edge)— ingest one edge.with_stream<I, E>(self, stream: I)— consume anIntoIterator<Item = Result<Edge, E>>.push_grb_matrix(&mut self, label, matrix: GrB_Matrix)— accept a pre-builtGrB_Matrixfor a label, wrapping it in anLAGraph_Graphimmediately.
Backend associates a marker type with a concrete builder/graph pair:
pub trait Backend {
type Graph: GraphDecomposition;
type Builder: GraphBuilder<Graph = Self::Graph>;
}Graph<B> is a zero-sized handle parameterised by a Backend:
Graph::<InMemory>::builder()— returns a freshInMemoryBuilder.Graph::<InMemory>::try_from(source)— builds a graph from a single source in one call.
InMemory is the concrete backend marker type.
GraphDecomposition is the read-only query interface:
get_graph(label)— returnsArc<LagraphGraph>for a given edge label.get_node_id(string_id)/get_node_name(mapped_id)— bidirectional string ↔ integer dictionary.num_nodes()— total unique nodes.
src/eval/mod.rs defines query-language-agnostic evaluator traits:
Evaluatoruses associated types forQuery,Result,Error, andPrepared. The graph backend stays a method-level generic (G: GraphDecomposition) so one evaluator type can run against any graph backend selected at the call site.PreparedEvaluatorrepresents prepared(query, graph)state that can be executed repeatedly, which is used by benchmark timing loops.ResultCountis separate fromEvaluator::Result; only CLI runners that need counts require this bound, leaving room for future evaluators with richer result types.
InMemoryBuilder is the primary GraphBuilder implementation.
It collects edges in RAM, then build() calls
GraphBLAS to create one GrB_Matrix per label via COO format, wraps each in an
LAGraph_Graph, and returns an InMemoryGraph.
Multiple CSV sources can be chained with repeated .load() calls; all edges are merged
into a single graph.
Node ID representation: Internally, InMemoryBuilder uses HashMap<usize, String> for
id_to_node (changed from Vec<String> to support sparse/pre-assigned IDs from MatrixMarket).
The set_node_map() method allows bulk-installing a node mapping,
which is used by the MatrixMarket loader.
Three built-in parsers are available, each yielding
Iterator<Item = Result<Edge, FormatError>> and pluggable into
GraphBuilder::load() via GraphSource<InMemoryBuilder> (see src/graph/inmemory.rs).
Csv<R> parses delimiter-separated edge files.
Configuration is via CsvConfig:
| Field | Default | Description |
|---|---|---|
source_column |
Index(0) |
Column for the source node (by index or name) |
target_column |
Index(1) |
Column for the target node |
label_column |
Index(2) |
Column for the edge label |
has_header |
true |
Whether the first row is a header |
delimiter |
b',' |
Field delimiter byte |
ColumnSpec is either Index(usize) or Name(String).
Name-based lookup requires has_header: true.
MatrixMarket loads an edge-labeled graph from a directory with:
vertices.txt— one line per node:<node_name> <1-based-index>on disk;get_node_idreturns the matching 0-based matrix indexedges.txt— one line per label:<label_name> <1-based-index>(selectsn.txt)<n>.txt— MatrixMarket adjacency matrix for label with indexn
Names in mapping files may be written with SPARQL-style angle brackets (e.g. <Article1>).
parse_index_map strips a single pair of surrounding </> so
dictionary keys match short labels (Article1), aligning with IRIs after
RpqQuery::strip_base on SPARQL-derived queries.
The loader uses LAGraph_MMRead to parse each .txt file into a
GrB_Matrix, then wraps it in an LAGraph_Graph. Vertex indices from vertices.txt are
converted to 0-based and installed via InMemoryBuilder::set_node_map().
Helper functions:
load_mm_file(path)— reads a single MatrixMarket file into aGrB_Matrix.parse_index_map(path)— parses<name> <index>lines; indices must be >= 1 and unique within the file.
MatrixMarket implements GraphSource<InMemoryBuilder> in src/graph/inmemory.rs (see the impl at line 215): vertices.txt maps are converted from 1-based file indices to 0-based matrix ids before set_node_map; edges.txt indices are unchanged for n.txt lookup.
Rdf is a unified parser for RDF formats using oxttl and oxrdf.
It supports both N-Triples (.nt) and Turtle (.ttl) formats via the RdfFormat enum.
Each triple (subject, predicate, object) becomes an Edge where:
source— subject IRI or blank-node ID (_:label).target— object IRI or blank-node ID; triples whose object is an RDF literal yieldErr(FormatError::LiteralAsNode)(callers may filter these out).label— full predicate IRI string (including fragment#…when present).
Constructor:
Rdf::from_path(path)— auto-detects format from file extension (.nt→ N-Triples,.ttl→ Turtle). Parses in parallel using memory-mapping and rayon.
Format detection via RdfFormat::from_path(path):
| Extension | Format |
|---|---|
.nt, .ntriples |
RdfFormat::NTriples |
.ttl, .turtle |
RdfFormat::Turtle |
Example usage:
use pathrex::formats::Rdf;
use pathrex::graph::{Graph, InMemory};
// Auto-detect from extension
let graph = Graph::<InMemory>::try_from(
Rdf::from_path("data.ttl")?
)?;The sparql module uses the spargebra
crate to parse SPARQL 1.1 query strings and build a pathrex-native RpqQuery
for RPQ evaluators.
Supported query form: SELECT queries with exactly one triple or property
path pattern in the WHERE clause. Relative IRIs such as <knows> require a
BASE declaration (or PREFIX / full IRIs). Example:
BASE <http://example.org/>
SELECT ?x ?y WHERE { ?x <knows>/<likes>* ?y . }Key public items:
parse_rpq(sparql)— parses a SPARQL string withSparqlParserand returns anRpqQuery.extract_rpq(query)— validates a parsed [spargebra::Query] is aSELECTwith a single path pattern and returns anRpqQuery. Use this when you construct a customSparqlParser(e.g. with prefix declarations) and callparse_queryyourself.ExtractError— error enum for extraction failures (NotSelect,NotSinglePath,UnsupportedSubject,UnsupportedObject,VariablePredicate). Converts toRpqError::Extractvia#[from].
Call RpqQuery::strip_base when graph edge labels are short names
and the parsed query contains full IRIs sharing a common prefix.
The module handles spargebra's desugaring of sequence paths (?x <a>/<b>/<c> ?y)
from a chain of BGP triples back into a single path expression.
The rpq module provides an abstraction for evaluating
Regular Path Queries (RPQs) over edge-labeled graphs using GraphBLAS/LAGraph.
Key public items:
Endpoint—Variable(String)orNamed(String)(IRI string).PathExpr—Label,Sequence,Alternative,ZeroOrMore,OneOrMore,ZeroOrOne.RpqQuery—{ subject, path, object }using the types above;strip_base(&mut self, base)removes a shared IRI prefix from named endpoints and labels.RpqEvaluator— marker subtrait overEvaluator<Query = RpqQuery, Error = RpqError>, preserving the RPQ-facing trait name while the generic evaluator hierarchy lives insrc/eval/.PreparedRpq— marker subtrait overPreparedEvaluator<Error = RpqError>.RpqError— unified error type for RPQ parsing and evaluation:Parse(SPARQL syntax),Extract(query extraction),UnsupportedPath,VertexNotFound, andGraph(wrapsGraphErrorfor label-not-found and GraphBLAS/LAGraph failures).
NfaRpqResult wraps a [GraphblasVector] of reachable target
vertices. When the subject is a variable, every vertex is used as a source and
LAGraph_RegularPathQuery returns the union of targets — individual (source, target)
pairs are not preserved.
RpqMatrixEvaluator compiles [PathExpr] into a Boolean matrix plan
over label adjacency matrices and runs [LAGraph_RPQMatrix]. It returns
RpqMatrixResult: the path-relation nnz plus a
[GraphblasMatrix] duplicate of the result matrix (full reachability relation for the path).
Subject/object do not filter the matrix; a named subject is only validated to exist.
Bound objects are not supported yet ([RpqError::UnsupportedPath]).
NTriples<R> parses W3C N-Triples
RDF files using oxttl and oxrdf. Each triple (subject, predicate, object) becomes an
Edge where:
source— subject IRI or blank-node ID (_:label).target— object IRI or blank-node ID; triples whose object is an RDF literal yieldErr(FormatError::LiteralAsNode)(callers may filter these out).label— full predicate IRI string (including fragment#…when present).
Constructor:
NTriples::new(reader)— parses the stream; each predicate IRI is copied verbatim to the edge label.
The rpq module provides an abstraction for evaluating
Regular Path Queries (RPQs) over edge-labeled graphs using GraphBLAS/LAGraph.
Key public items:
Endpoint—Variable(String)orNamed(String)(IRI string).PathExpr—Label,Sequence,Alternative,ZeroOrMore,OneOrMore,ZeroOrOne.RpqQuery—{ subject, path, object }using the types above;strip_base(&mut self, base)removes a shared IRI prefix from named endpoints and labels.RpqEvaluator— marker subtrait overEvaluator<Query = RpqQuery, Error = RpqError>, preserving the RPQ-facing trait name while the generic evaluator hierarchy lives insrc/eval/.PreparedRpq— marker subtrait overPreparedEvaluator<Error = RpqError>.RpqError— unified error type for RPQ parsing and evaluation:Parse(SPARQL syntax),Extract(query extraction),UnsupportedPath,VertexNotFound, andGraph(wrapsGraphErrorfor label-not-found and GraphBLAS/LAGraph failures).
NfaRpqEvaluator implements [RpqEvaluator] by:
- Converting a [
PathExpr] into anNfavia Thompson's construction (Nfa::from_path_expr()). - Eliminating ε-transitions via epsilon closure (
NfaBuilder::epsilon_closure()). - Building one
LAGraph_Graphper NFA label transition (Nfa::build_lagraph_matrices()). - Calling [
LAGraph_RegularPathQuery] with the NFA matrices, data-graph matrices, start/final states, and source vertices.
type Result = NfaRpqResult ([GraphblasVector] of reachable targets).
Supported path operators match [PathExpr] variants above. Reverse and
NegatedPropertySet from SPARQL map to [RpqError::UnsupportedPath] when they
appear in extracted paths.
Subject/object resolution: [Endpoint::Variable] means "all vertices";
[Endpoint::Named] resolves to a single vertex via
GraphDecomposition::get_node_id().
NfaRpqResult wraps a [GraphblasVector] of reachable target
vertices. When the subject is a variable, every vertex is used as a source and
LAGraph_RegularPathQuery returns the union of targets — individual (source, target)
pairs are not preserved.
RpqMatrixEvaluator compiles [PathExpr] into a Boolean matrix plan
over label adjacency matrices and runs [LAGraph_RPQMatrix]. It returns
RpqMatrixResult: the path-relation nnz plus a
[GraphblasMatrix] duplicate of the result matrix (full reachability relation for the path).
Subject/object do not filter the matrix; a named subject is only validated to exist.
Bound objects are not supported yet ([RpqError::UnsupportedPath]).
With the bench feature enabled, src/cli/dispatch.rs is the single
mapping from Algo variants to concrete evaluator types. dispatch_query
and dispatch_bench each perform one exhaustive match per requested algorithm, then call
generic runners (run_query_for_evaluator<E> and run_bench_for_evaluator<E>) that are
monomorphized for the selected evaluator.
Adding a new algorithm requires a new Algo variant, its Display arm, one dispatch_query
arm, one dispatch_bench arm, an impl Evaluator for the evaluator type, and an
impl ResultCount for any result type used by CLI count reporting.
lagraph_sys exposes raw C bindings for GraphBLAS and
LAGraph. Safe Rust wrappers live in graph::mod:
LagraphGraph— RAII wrapper aroundLAGraph_Graph(callsLAGraph_Deleteon drop). Also providesLagraphGraph::from_coo()to build directly from COO arrays.GraphblasVector— RAII wrapper aroundGrB_Vector(derivesDebug).GraphblasMatrix— RAII wrapper aroundGrB_Matrix(dup+freeon drop).ensure_grb_init()— internal one-timeLAGraph_Initviastd::sync::Once. Called automatically by RAII-wrapped constructors (LagraphGraph::from_coo,LagraphGraph::from_matrix,ThreadScope::enter) and byload_mm_file. Crate-private; no other code should call it.
Two #[macro_export] macros handle FFI error mapping:
grb_ok!(expr)— evaluates a GraphBLAS call insideunsafe, maps thei32return toResult<(), GraphError::GraphBlas(info)>.la_ok!(fn::path(args…))— evaluates a LAGraph call, automatically appending the required*mut i8message buffer, and maps failure toGraphError::LAGraph(info, msg).
A convenience function is also provided:
build_graph(edges)— builds anInMemoryGraphfrom a slice of(&str, &str, &str)triples (source, target, label). Used by integration tests.
- Rust edition 2024.
- Error handling via
thiserrorderive macros; three main error enums:GraphError,FormatError, andRpqError. FormatErrorconverts intoGraphErrorvia#[from] FormatErroron theGraphError::Formatvariant.GraphErrorconverts intoRpqErrorvia#[from] GraphErroron theRpqError::Graphvariant, enabling?propagation in evaluators.- Unsafe FFI calls are confined to
lagraph_sys,graph/mod.rs,graph/inmemory.rs,rpq/nfarpq.rs. All raw pointers are wrapped in RAII types that free resources on drop. unsafe impl Send + Syncis provided forLagraphGraph,GraphblasVector, andGraphblasMatrixbecause GraphBLAS handles are thread-safe after init.- Unit tests live in
#[cfg(test)] mod testsblocks inside each module. Integration tests that need GraphBLAS live intests/inmemory_tests.rs,tests/mm_tests.rs,tests/nfarpq_tests.rs.
# Run all tests (LAGraph installed system-wide)
LD_LIBRARY_PATH=/usr/local/lib cargo test --verbose
# If LAGraph is NOT installed system-wide (only built in the submodule):
LD_LIBRARY_PATH=deps/LAGraph/build/src:deps/LAGraph/build/experimental:/usr/local/lib cargo test --verboseTests in src/graph/mod.rs use CountingBuilder / CountOutput / VecSource from
src/utils.rs — these do not call into GraphBLAS and run without
native libraries.
Tests in src/formats/csv.rs and src/formats/rdf.rs are pure Rust and need no native dependencies.
Tests in src/sparql/mod.rs are pure Rust and need no native dependencies.
Tests in src/rpq/nfarpq.rs (NFA construction unit tests) are pure Rust and need no
native dependencies.
Tests in src/graph/inmemory.rs, tests/inmemory_tests.rs,
tests/mm_tests.rs, tests/nfarpq_tests.rs,
and tests/rpqmatrix_tests.rs call real GraphBLAS/LAGraph and
require the native libraries to be present.
The GitHub Actions workflow (.github/workflows/ci.yml)
runs on every push and PR across stable, beta, and nightly toolchains:
- Checks out with
submodules: recursiveandlfs: true. - Installs
cmake,libclang-dev,clangvia apt. cargo build --workspace --features pathrex-sys/regenerate-bindings—pathrex-sys/build.rsclones GraphBLAS at the pinned tag, builds it statically, builds LAGraph statically against it, and regenerates FFI bindings.cargo test --workspace --verbose— runs the full test suite. NoLD_LIBRARY_PATHis needed because GraphBLAS and LAGraph are linked statically; only the OpenMP runtime (libgomp) is dynamic and is already on the default loader path.
Pushing a v*.*.* tag triggers .github/workflows/release.yml:
dockerjob — builds the image once and pushes the same tags ({version},{major}.{minor},{major},latest) to two registries:ghcr.io/sparselinearalgebra/pathrex(authenticated viaGITHUB_TOKEN, no extra setup needed).docker.io/vanyaglazunov/pathrex(authenticated viaDOCKERHUB_USERNAME+DOCKERHUB_TOKENrepository secrets).
publish-cratesjob — publishes both crates to crates.io in the correct order:- Reads
pathrex-sysversion viacargo metadata. - Skips publishing
pathrex-sysif that version is already on the registry (allows re-tagging when onlypathrexchanged). - Polls
cargo searchuntil the newpathrex-sysis indexed (up to 5 minutes), then publishespathrexagainst it. - Requires the
CARGO_REGISTRY_TOKENrepository secret.
- Reads
To verify the publish path without uploading:
cargo publish --dry-run -p pathrex-sys
# (pathrex dry-run requires pathrex-sys to be on crates.io first)