Skip to content

Latest commit

 

History

History
2074 lines (1630 loc) · 63.2 KB

File metadata and controls

2074 lines (1630 loc) · 63.2 KB

Natch: FINE Wrapper Implementation Plan

Last Updated: 2025-11-09 Status: ✅ Phases 1-6 Complete + v0.2.0 Released + Memory Locality Optimizations Timeline: MVP achieved in ~1 hour, Production-ready v0.2.0 with precompiled binaries published


Executive Summary

This document outlines a plan to wrap the clickhouse-cpp library using FINE (Foreign Interface Native Extensions) to provide native TCP protocol access to ClickHouse from Elixir. This approach delivers 51% faster inserts and 53% less bandwidth compared to HTTP while maintaining feature completeness through the mature clickhouse-cpp library.

Why FINE + clickhouse-cpp?

Pros:

  • ✅ Native protocol performance (51% faster than HTTP)
  • ✅ Full feature completeness immediately (leverage clickhouse-cpp)
  • ✅ Automatic type marshalling (FINE handles FFI complexity)
  • ✅ 4-6 weeks to production vs 4-6 months for pure Elixir
  • ✅ Native C++ speed for serialization/deserialization

Cons:

  • ⚠️ C++ build dependencies (cmake, C++17 compiler)
  • ⚠️ Cross-language debugging complexity
  • ⚠️ Platform-specific compilation required
  • ⚠️ Ongoing maintenance tracking upstream changes

Architecture Overview

┌─────────────────────────────────────────────┐
│  Elixir Application                         │
│  - Natch.query/3, Natch.insert/3             │
│  - Idiomatic Elixir API                     │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│  Elixir Management Layer                    │
│  - Natch.Connection GenServer                │
│  - Connection pooling                       │
│  - Resource lifecycle                       │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│  FINE NIF Layer (C++)                       │
│  - Type conversion Elixir ↔ C++             │
│  - Resource management                      │
│  - Exception handling                       │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│  clickhouse-cpp Library                     │
│  - Native TCP protocol                      │
│  - Binary columnar format                   │
│  - Compression (LZ4/ZSTD)                   │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│  ClickHouse Server                          │
│  - Native protocol interface (9000)         │
└─────────────────────────────────────────────┘

✅ Phase 1: Foundation (COMPLETED)

Goal: Basic client lifecycle and connection management Status: ✅ Complete - 10 tests passing

Deliverables

  1. Project Setup

    • Add FINE dependency to mix.exs
    • Set up C++ build infrastructure (CMake/Makefile)
    • Configure clickhouse-cpp as git submodule or vendored dependency
    • Create native/natch_fine/ directory structure
  2. Client Resource Wrapper

    // native/natch_fine/client.cpp
    #include <fine.hpp>
    #include <clickhouse/client.h>
    
    using namespace clickhouse;
    
    FINE_RESOURCE(Client);
    
    fine::ResourcePtr<Client> client_create(
        ErlNifEnv *env,
        std::string host,
        int port) {
      ClientOptions opts;
      opts.SetHost(host);
      opts.SetPort(port);
      return fine::make_resource<Client>(opts);
    }
    FINE_NIF(client_create, 0);
    
    void client_ping(
        ErlNifEnv *env,
        fine::ResourcePtr<Client> client) {
      client->Ping();
    }
    FINE_NIF(client_ping, 0);
    
    void client_execute(
        ErlNifEnv *env,
        fine::ResourcePtr<Client> client,
        std::string sql) {
      client->Execute(sql);
    }
    FINE_NIF(client_execute, 0);
    
    FINE_INIT("Elixir.Natch.Native");
  3. Elixir Connection Module

    defmodule Natch.Connection do
      use GenServer
    
      def start_link(opts) do
        GenServer.start_link(__MODULE__, opts)
      end
    
      def init(opts) do
        host = Keyword.get(opts, :host, "localhost")
        port = Keyword.get(opts, :port, 9000)
    
        case Natch.Native.client_create(host, port) do
          {:ok, client} ->
            {:ok, %{client: client, opts: opts}}
          {:error, reason} ->
            {:stop, reason}
        end
      end
    
      def ping(conn) do
        GenServer.call(conn, :ping)
      end
    
      def execute(conn, sql) do
        GenServer.call(conn, {:execute, sql})
      end
    
      def handle_call(:ping, _from, state) do
        Natch.Native.client_ping(state.client)
        {:reply, :ok, state}
      end
    
      def handle_call({:execute, sql}, _from, state) do
        case Natch.Native.client_execute(state.client, sql) do
          :ok -> {:reply, :ok, state}
          {:error, reason} -> {:reply, {:error, reason}, state}
        end
      end
    end

Testing

  • Connection establishment
  • Ping works
  • Simple DDL operations (CREATE TABLE, DROP TABLE)
  • Error handling for connection failures

Success Criteria: Can connect, ping, and execute DDL statements


✅ Phase 2: Type System & Column Creation (COMPLETED)

Goal: Handle ClickHouse type system and column creation Status: ✅ Complete - 33 tests passing (5 core types: UInt64, Int64, String, Float64, DateTime)

Key Challenge

ClickHouse has 42+ column types. We need to:

  1. Parse type strings (e.g., "Array(Nullable(String))")
  2. Create appropriate Column objects dynamically
  3. Populate columns from Elixir data

Approach: Leverage clickhouse-cpp's Factory

// native/natch_fine/columns.cpp
#include <fine.hpp>
#include <clickhouse/columns/factory.h>

FINE_RESOURCE(clickhouse::Column);

fine::ResourcePtr<clickhouse::Column> column_create(
    ErlNifEnv *env,
    std::string type_name) {
  auto col = clickhouse::CreateColumnByType(type_name);
  return fine::make_resource_from_ptr(col);
}
FINE_NIF(column_create, 0);

Column Population Strategy

Option A: Type-Specific NIFs (Recommended for MVP)

// Separate NIF for each common type
void column_uint64_append(
    ErlNifEnv *env,
    fine::ResourcePtr<clickhouse::Column> col,
    uint64_t value) {
  auto typed = std::static_pointer_cast<clickhouse::ColumnUInt64>(col);
  typed->Append(value);
}
FINE_NIF(column_uint64_append, 0);

void column_string_append(
    ErlNifEnv *env,
    fine::ResourcePtr<clickhouse::Column> col,
    std::string value) {
  auto typed = std::static_pointer_cast<clickhouse::ColumnString>(col);
  typed->Append(value);
}
FINE_NIF(column_string_append, 0);

Option B: Generic Variant-Based (More complex, defer to Phase 4)

using ElixirValue = std::variant<
  int64_t, uint64_t, double, std::string, bool,
  std::nullptr_t, std::vector<ElixirValue>
>;

void column_append_value(
    ErlNifEnv *env,
    fine::ResourcePtr<clickhouse::Column> col,
    std::string type_name,
    ElixirValue value) {
  // Dispatch based on type_name
  if (type_name == "UInt64") {
    auto typed = std::static_pointer_cast<clickhouse::ColumnUInt64>(col);
    typed->Append(std::get<uint64_t>(value));
  } else if (type_name == "String") {
    // ...
  }
  // etc.
}

Deliverables

  1. Type Parser Integration

    • Wrap CreateColumnByType
    • Return opaque Column resource
  2. Core Type Support (start with 5 essential types)

    • UInt64, Int64
    • String
    • Float64
    • DateTime
  3. Column Builders in Elixir

    defmodule Natch.Column do
      def new(type) do
        Natch.Native.column_create(type)
      end
    
      def append(%{type: :uint64} = col, value) when is_integer(value) do
        Natch.Native.column_uint64_append(col.ref, value)
      end
    
      def append(%{type: :string} = col, value) when is_binary(value) do
        Natch.Native.column_string_append(col.ref, value)
      end
    end

Testing

  • Create columns of different types
  • Append values
  • Type safety (wrong type should error)

Success Criteria: Can create and populate basic column types


✅ Phase 3: Block Building & INSERT (COMPLETED)

Goal: Build blocks from Elixir data and insert into ClickHouse Status: ✅ Complete - 17 tests passing

Block Resource

// native/natch_fine/block.cpp
#include <fine.hpp>
#include <clickhouse/block.h>

FINE_RESOURCE(clickhouse::Block);

fine::ResourcePtr<clickhouse::Block> block_create(ErlNifEnv *env) {
  return fine::make_resource<clickhouse::Block>();
}
FINE_NIF(block_create, 0);

void block_append_column(
    ErlNifEnv *env,
    fine::ResourcePtr<clickhouse::Block> block,
    std::string name,
    fine::ResourcePtr<clickhouse::Column> col) {
  block->AppendColumn(name, col);  // Need to convert to shared_ptr
}
FINE_NIF(block_append_column, 0);

uint64_t block_row_count(
    ErlNifEnv *env,
    fine::ResourcePtr<clickhouse::Block> block) {
  return block->GetRowCount();
}
FINE_NIF(block_row_count, 0);

void client_insert(
    ErlNifEnv *env,
    fine::ResourcePtr<clickhouse::Client> client,
    std::string table_name,
    fine::ResourcePtr<clickhouse::Block> block) {
  client->Insert(table_name, *block);
}
FINE_NIF(client_insert, 0);

Smart Pointer Challenge

Problem: clickhouse-cpp uses std::shared_ptr<Column> but FINE uses ResourcePtr<Column>

Solution: Wrapper or conversion function

// Helper to convert ResourcePtr to shared_ptr
std::shared_ptr<clickhouse::Column> resource_to_shared(
    fine::ResourcePtr<clickhouse::Column> res) {
  // Option 1: Store shared_ptr inside ResourcePtr wrapper
  // Option 2: Create new shared_ptr (may need ref counting coordination)
  return std::shared_ptr<clickhouse::Column>(res.get(), [](auto*){});
}

void block_append_column(
    ErlNifEnv *env,
    fine::ResourcePtr<clickhouse::Block> block,
    std::string name,
    fine::ResourcePtr<clickhouse::Column> col) {
  auto shared_col = resource_to_shared(col);
  block->AppendColumn(name, shared_col);
}

Elixir Builder Pattern

defmodule Natch.Insert do
  def build_block(table_schema, rows) do
    block = Natch.Native.block_create()

    # Create columns based on schema
    columns = for {name, type} <- table_schema do
      {name, Natch.Column.new(type)}
    end

    # Populate columns from rows
    for row <- rows do
      for {name, col} <- columns do
        value = Map.get(row, name)
        Natch.Column.append(col, value)
      end
    end

    # Attach columns to block
    for {name, col} <- columns do
      Natch.Native.block_append_column(block, to_string(name), col.ref)
    end

    block
  end

  def insert(conn, table, rows, schema) do
    block = build_block(schema, rows)
    GenServer.call(conn, {:insert, table, block})
  end
end

Schema Inference

Two approaches:

A. Explicit Schema (MVP)

schema = [
  {:id, :uint64},
  {:name, :string},
  {:amount, :float64}
]

Natch.Insert.insert(conn, "test_table", rows, schema)

B. Schema Introspection (Phase 4)

# Query ClickHouse for table schema
schema = Natch.get_table_schema(conn, "test_table")
Natch.Insert.insert(conn, "test_table", rows, schema)

Testing

  • Build block from Elixir data
  • Insert into ClickHouse
  • Verify data with SELECT
  • Batch inserts (1k, 10k, 100k rows)

Success Criteria: Can insert data from Elixir maps into ClickHouse


✅ Phase 4: SELECT & Data Retrieval (COMPLETED)

Goal: Execute SELECT queries and return results to Elixir Status: ✅ Complete - 12 tests passing

Challenge: Callback Bridge

clickhouse-cpp uses callbacks for SELECT:

void Select(const std::string& query, SelectCallback cb);
// SelectCallback = std::function<void(const Block&)>

We need to bridge this to Elixir.

Approach: Accumulate Results

// native/natch_fine/select.cpp
std::vector<fine::ResourcePtr<clickhouse::Block>> client_select(
    ErlNifEnv *env,
    fine::ResourcePtr<clickhouse::Client> client,
    std::string query) {

  std::vector<fine::ResourcePtr<clickhouse::Block>> results;

  client->Select(query, [&](const clickhouse::Block& block) {
    // Copy block and wrap in ResourcePtr
    auto block_copy = std::make_shared<clickhouse::Block>(block);
    results.push_back(fine::make_resource_from_ptr(block_copy));
  });

  return results;  // FINE will convert vector to Elixir list
}
FINE_NIF(client_select, 0);

Block to Elixir Conversion

// native/natch_fine/convert.cpp
std::vector<std::map<std::string, ElixirValue>> block_to_maps(
    ErlNifEnv *env,
    fine::ResourcePtr<clickhouse::Block> block) {

  std::vector<std::map<std::string, ElixirValue>> rows;
  size_t row_count = block->GetRowCount();

  for (size_t row = 0; row < row_count; ++row) {
    std::map<std::string, ElixirValue> row_map;

    for (size_t col = 0; col < block->GetColumnCount(); ++col) {
      auto col_name = block->GetColumnName(col);
      auto column = block->GetColumn(col);

      // Extract value based on column type
      // This requires type dispatch logic
      row_map[col_name] = extract_value(column, row);
    }

    rows.push_back(row_map);
  }

  return rows;  // FINE converts to list of Elixir maps
}
FINE_NIF(block_to_maps, 0);

Type Extraction

ElixirValue extract_value(
    std::shared_ptr<clickhouse::Column> col,
    size_t row) {

  auto type = col->Type();

  if (type->GetCode() == clickhouse::Type::UInt64) {
    auto typed = std::static_pointer_cast<clickhouse::ColumnUInt64>(col);
    return (*typed)[row];
  } else if (type->GetCode() == clickhouse::Type::String) {
    auto typed = std::static_pointer_cast<clickhouse::ColumnString>(col);
    return std::string((*typed)[row]);
  }
  // ... handle all types

  throw std::runtime_error("Unsupported type");
}

Elixir Query API

defmodule Natch do
  def query(conn, sql) do
    case GenServer.call(conn, {:select, sql}, :infinity) do
      {:ok, blocks} ->
        rows = blocks
        |> Enum.flat_map(&Natch.Native.block_to_maps/1)
        {:ok, rows}
      error ->
        error
    end
  end

  def query!(conn, sql) do
    case query(conn, sql) do
      {:ok, rows} -> rows
      {:error, reason} -> raise "Query failed: #{inspect(reason)}"
    end
  end
end

Testing

  • Simple SELECT queries
  • Multiple column types
  • Large result sets
  • JOIN queries
  • Aggregations

Success Criteria: Can execute SELECT and get Elixir maps back


Phase 5: Columnar API & Performance Optimization (CRITICAL - In Progress)

Goal: Redesign insert API for columnar format to match ClickHouse native storage and eliminate performance impedance mismatches Status: 🔄 In Progress Priority: CRITICAL - Current row-oriented API has 10-1000x performance penalty

Performance Problem Discovered

Current Implementation Issue: The row-oriented insert API has a severe performance impedance mismatch:

# Current API (row-oriented)
rows = [
  %{id: 1, name: "Alice", value: 100.0},
  %{id: 2, name: "Bob", value: 200.0},
  # ... 100 rows
]
Natch.insert(conn, "table", rows, schema)  # 100 rows × 100 columns = 10,000 NIF calls!

Performance Analysis:

  • For N rows × M columns, current implementation makes N × M NIF boundary crossings
  • Each Column.append/2 call crosses Elixir → C++ → Elixir boundary
  • 100 rows × 100 columns = 10,000 NIF calls (!)
  • Row-to-column transposition happens in Elixir with map lookups per cell
  • Significant overhead for bulk analytics workloads

Why This Matters:

  • ClickHouse is a columnar database - data stored by column, not row
  • Analytics workloads are columnar by nature (SUM, AVG, GROUP BY operate on columns)
  • Other high-performance tools (Arrow, Parquet, DuckDB, Polars) use columnar formats
  • Native protocol performance benefits are negated by API mismatch

Solution: Columnar-First API

New Design Principles:

  1. Columnar as primary API - matches ClickHouse native storage
  2. Bulk operations - 1 NIF call per column (not per value)
  3. Zero transposition - data already in correct format
  4. Performance-obsessed - designed for 100k-1M rows/sec throughput

New API:

# Columnar format (RECOMMENDED)
columns = %{
  id: [1, 2, 3, 4, 5],
  name: ["Alice", "Bob", "Charlie", "Dave", "Eve"],
  value: [100.0, 200.0, 300.0, 400.0, 500.0],
  timestamp: [~U[2024-01-01 10:00:00Z], ~U[2024-01-01 11:00:00Z], ...]
}

schema = [
  id: :uint64,
  name: :string,
  value: :float64,
  timestamp: :datetime
]

Natch.insert(conn, "events", columns, schema)
# Only 4 NIF calls (1 per column) for ANY number of rows!

Performance Improvement:

  • 100 rows × 100 columns: 10,000 NIF calls → 100 NIF calls (100x improvement)
  • Better memory locality (all values for one column together)
  • Matches ClickHouse's native columnar format
  • Vectorization opportunities in C++

Bulk Append NIFs

Replace single-value appends with bulk operations:

// Bulk append - single NIF call for entire column
fine::Atom column_uint64_append_bulk(
    ErlNifEnv *env,
    fine::ResourcePtr<ColumnResource> col_res,
    std::vector<uint64_t> values) {
  try {
    auto typed = std::static_pointer_cast<ColumnUInt64>(col_res->ptr);
    for (auto value : values) {
      typed->Append(value);
    }
    return fine::Atom("ok");
  } catch (const std::exception& e) {
    throw std::runtime_error("UInt64 bulk append failed");
  }
}
FINE_NIF(column_uint64_append_bulk, 0);

// Similar for: int64, string, float64, datetime

FINE Advantage: Automatically converts Elixir lists to std::vector<T> - zero-copy where possible.

Conversion Utilities

For users with row-oriented data sources:

# Helper module for format conversion
defmodule Natch.Conversion do
  def rows_to_columns(rows, schema) do
    # Transpose row-oriented to column-oriented
    for {name, _type} <- schema, into: %{} do
      values = Enum.map(rows, & &1[name])
      {name, values}
    end
  end

  def columns_to_rows(columns, schema) do
    # Convert columnar back to row format (useful for testing)
  end
end

# Usage
rows = [%{id: 1, name: "Alice"}, %{id: 2, name: "Bob"}]
columns = Natch.Conversion.rows_to_columns(rows, schema)
Natch.insert(conn, "users", columns, schema)

Type Safety: Validation happens automatically in Natch.Column.append_bulk/2 and FINE NIFs during block building. No explicit validation calls needed - type errors are caught with helpful error messages, and FINE ensures the VM never crashes.

Note: Streaming insert support was removed. For large datasets, use Natch.insert/4 directly as clickhouse-cpp handles wire-level chunking (64KB compression blocks) automatically. Users can chunk data in their own code before calling Natch.insert/4 if needed for memory management.

✅ Additional Column Types (Phase 5C-D - Complete!)

Status: All 5 advanced types complete (213 tests passing)

✅ Completed Types:

  1. UUID - ✅ Complete

    • 128-bit universally unique identifiers
    • Flexible parsing: strings (with/without hyphens), 16-byte binaries, mixed case
    • Full serialization/deserialization with standard UUID string format
    • 11 tests (10 unit + 1 integration)
  2. DateTime64(6) - ✅ Complete

    • Microsecond precision timestamps
    • DateTime struct support with automatic conversion
    • Integer timestamp support for direct microsecond values
    • 7 tests (6 unit + 1 integration)
  3. Decimal64(9) - ✅ Complete

    • Fixed-point decimals using Decimal library
    • Support for Decimal structs, integers, and floats
    • Automatic scaling and conversion for financial precision
    • 9 tests (8 unit + 1 integration)
  4. Nullable(T) - ✅ Complete

    • NULL support for UInt64, Int64, String, Float64
    • Natural Elixir nil handling
    • Dual-column management (nested data + null bitmap)
    • 6 tests (5 unit + 1 integration)
  5. Array(T) - ✅ Complete

    • Universal generic path for ALL array types
    • Offset-based encoding with ClickHouse's AppendAsColumn
    • Recursive support for arbitrary nesting: Array(Array(Array(T)))
    • Works for all types including nested: Array(Nullable(String)), Array(Array(Date))
    • ~5-10 µs per array operation (very fast!)
    • 3 integration tests covering simple arrays, nested arrays, and complex types
    • See ALL_TYPES.md for complete architecture documentation

Already Completed (Phase 5B):

  • ✅ Date - Date without time
  • ✅ Bool - Boolean (ClickHouse UInt8)
  • ✅ Float32 - Single precision float
  • ✅ UInt32/16, Int32/16/8 - Additional integer types

✅ Complex Type Nesting (Phase 5E - Complete!)

Status: All complex nesting patterns complete (227 tests passing)

Problem Discovered: During integration testing, discovered critical bugs preventing complex nested types from working:

  1. Map deserialization crash - tuple_col->Size() returned tuple element count (2) instead of key-value pair count
  2. Limited Nullable support - Only handled 4 specific types (UInt64, Int64, String, Float64), missing all others
  3. Missing type handlers - block_to_maps_impl lacked handlers for Map, Tuple, Enum8, Enum16, LowCardinality

Solutions Implemented:

  1. Fixed Map SELECT in select.cpp:

    • Changed tuple_col->Size() to keys_col->Size() for correct map size
    • Added Map handler to block_to_maps_impl for full SELECT support
  2. Generic Nullable handling:

    // Old: Type-specific handling (4 types only)
    if (auto uint64_col = nested->As<ColumnUInt64>()) { ... }
    else if (auto int64_col = nested->As<ColumnInt64>()) { ... }
    // ... only 4 types
    
    // New: Universal generic handling (ALL types)
    auto single_value_col = nested->Slice(i, 1);
    ERL_NIF_TERM elem_list = column_to_elixir_list(env, single_value_col);
    // Extract first element - works for ANY type
  3. Complete type handler coverage in block_to_maps_impl:

    • Added Tuple column handler
    • Added Enum8/Enum16 handlers
    • Added LowCardinality handler
    • All nested type combinations now work
  4. Enhanced Block.build_block for INSERT:

    • Added transpose_tuples/2 - converts list of tuples to columnar format
    • Added transpose_maps/1 - converts list of maps to keys/values arrays
    • Enables natural Elixir data structures: [{a, 1}, {b, 2}] and [%{"k" => 1}]

Comprehensive Integration Tests:

Created test/nesting_integration_test.exs with 14 full roundtrip tests:

  1. Array(Nullable(T)) - Arrays with null values

    • Array(Nullable(String)) with mixed nulls
    • Array(Nullable(UInt64)) with all nulls
  2. LowCardinality(Nullable(String)) - Dictionary encoding with nulls

  3. Tuple with Nullable elements - Tuple(Nullable(String), UInt64)

  4. Map with Nullable values - Map(String, Nullable(UInt64)) (was crashing)

  5. Array(LowCardinality(String)) - Dictionary encoding in arrays

  6. Array(LowCardinality(Nullable(String))) - Triple wrapper type!

  7. Map(String, Array(UInt64)) - Arrays as map values

  8. Map(String, Enum16) - Enums as map values

  9. Tuple(String, Array(UInt64)) - Arrays in tuples

  10. Tuple(Enum8, UInt64) - Enums in tuples

  11. Array(Array(Nullable(UInt64))) - Triple nesting with nulls

  12. Array(Array(Array(UInt64))) - Deep nesting stress test

  13. Array(Enum8) - Enums in arrays

All 14 tests pass with full INSERT→SELECT roundtrip validation.

Impact:

  • Enables arbitrarily complex nested types
  • Production-ready support for analytics workloads with complex schemas
  • Generic patterns ensure future types work automatically

Future: Explorer DataFrame Integration

Status: Documented for future implementation (Phase 6+)

Rationale:

  • Explorer is Elixir's high-performance DataFrame library
  • Already columnar format (uses Apache Arrow internally)
  • Perfect fit for analytics workloads
  • Near zero-copy potential

Proposed API:

# Future: Direct DataFrame support
df = Explorer.DataFrame.new(
  id: [1, 2, 3],
  name: ["Alice", "Bob", "Charlie"],
  amount: [100.5, 200.75, 300.25]
)

# Schema inference from DataFrame types
Natch.insert(conn, "events", df)

# Or explicit schema
Natch.insert(conn, "events", df, schema: [id: :uint64, name: :string, amount: :float64])

Implementation Notes:

  • Explorer DataFrames are backed by Rust Polars
  • Can access underlying Arrow arrays for zero-copy operations
  • Need to map Explorer types to ClickHouse types
  • Binary data can be passed directly to C++ without copying

Benefits:

  • Natural fit: Analytics → DataFrame → Columnar DB
  • Users work in DataFrames, insert to ClickHouse seamlessly
  • Potential for SIMD/vectorized operations
  • Ecosystem integration (Nx, Explorer, Kino)

Breaking Changes

API Changes:

  • Natch.insert/4 now expects columnar format (map of lists), not row format (list of maps)
  • Natch.Column.append/2 removed, replaced with append_bulk/2
  • Single-value append NIFs removed (bulk operations only)

Migration:

# Before (Phases 1-4)
rows = [%{id: 1, name: "Alice"}, %{id: 2, name: "Bob"}]
Natch.insert(conn, "users", rows, schema)

# After (Phase 5+)
columns = %{id: [1, 2], name: ["Alice", "Bob"]}
Natch.insert(conn, "users", columns, schema)

# Or use conversion helper
columns = Natch.Conversion.rows_to_columns(rows, schema)
Natch.insert(conn, "users", columns, schema)

Connection Options & Production Features

Support full ClientOptions:

  • Authentication (user/password) ✅ Already supported
  • Compression (LZ4) ✅ Already supported
  • SSL/TLS ✅ Complete (Phase 5G)
  • Timeouts ✅ Complete (Phase 6A)
  • Retry logic - Future consideration

Note: Connection pooling should be investigated in the clickhouse-cpp library itself. If the C++ library handles connection pooling, we should leverage that rather than implementing it at the Elixir level.


Phase 6: Production Polish

Goal: Error handling, testing, and production-readiness Status: 🔄 In Progress (Timeout support + GitHub release preparation complete) Priority: High

✅ Phase 6A: Socket-Level Timeout Configuration (COMPLETED)

Status: Complete - 316 tests passing (8 integration tests added)

Implementation: Added three configurable socket-level timeout options to prevent operations from hanging indefinitely:

{:ok, conn} = Natch.Connection.start_link(
  host: "localhost",
  port: 9000,
  connect_timeout: 5_000,   # TCP connection establishment (default: 5000ms)
  recv_timeout: 60_000,     # Data receive operations (default: 0 = infinite)
  send_timeout: 60_000      # Data send operations (default: 0 = infinite)
)

Key Features:

  • Three timeout types: connect, recv, send (all in milliseconds)
  • Conservative defaults: Match clickhouse-cpp library (5000ms, 0ms, 0ms)
  • Infinite by default: recv_timeout=0 allows long-running analytical queries
  • Consistent errors: Timeouts raise Natch.ConnectionError with descriptive messages
  • Integration tests: 8 tests marked with :integration tag, excluded by default

Files Modified:

  • lib/natch/connection.ex: Added timeout options to type spec, moduledoc, and build_client/1
  • lib/natch/native.ex: Updated client_create/10 signature (was 7 params, now 10)
  • native/natch_fine/src/minimal.cpp: Added 3 uint64_t timeout params and ClientOptions setters
  • test/timeout_integration_test.exs: Created integration test suite
  • test/test_helper.exs: Added :integration to excluded tags
  • README.md: Added "Timeout Configuration" section

Design Decisions:

  • Connection-level only (not per-operation) - matches clickhouse-cpp architecture
  • Simplified naming (connect_timeout vs connection_connect_timeout) for Elixir ergonomics
  • Use milliseconds consistently (matching Elixir conventions)
  • Keep as ConnectionError (no new exception type)

Testing Strategy:

  • Integration tests use SELECT sleep(seconds) to trigger recv timeout
  • Connect timeout tested with unreachable IP (TEST-NET-1: 192.0.2.1)
  • Tests verify timeouts fail within expected duration
  • All existing 316 tests remain passing

✅ Phase 6B: GitHub Release Preparation (COMPLETED)

Status: Complete - Ready for public release

Deliverables Completed:

  1. Licensing

    • ✅ MIT License added (LICENSE file)
    • ✅ Third-party notices documented (THIRD_PARTY_NOTICES.md)
    • ✅ Apache 2.0 dependency compliance verified
  2. Git Submodule Management

    • ✅ clickhouse-cpp converted to git submodule at native/clickhouse-cpp
    • ✅ Submodule points to v2.6.0 (commit 69195246)
    • ✅ CMakeLists.txt updated for flexible path resolution (env var or submodule)
    • ✅ Removed hardcoded absolute paths
    • ✅ .gitmodules configured
  3. GitHub Actions CI/CD

    • ✅ Comprehensive test workflow (.github/workflows/test.yml)
    • ✅ Matrix testing: Elixir 1.18.4/1.19.1 × OTP 27.2/28.1
    • ✅ ClickHouse service container with authentication
    • ✅ Valgrind memory leak detection job
    • ✅ Docker-based valgrind testing (Dockerfile.valgrind)
    • ✅ Formatting checks
    • ✅ Compiler warnings as errors
    • ✅ Artifact uploads for valgrind reports
  4. Package Metadata

    • ✅ mix.exs updated with proper package configuration
    • ✅ Hex.pm ready with licenses, links, and file filters
    • ✅ CHANGELOG.md created (documents v0.2.0 features)
  5. Code Cleanup

    • ✅ Removed experimental files (run_.sh, valgrind_.txt, docker_build.log)
    • ✅ Updated .gitignore for valgrind artifacts
    • ✅ Removed .claude/settings.local.json from git tracking
    • ✅ Cleaned up backward compatibility code

GitHub Actions Status:

  • Automated testing on push to main and pull requests
  • Comprehensive valgrind testing for memory safety
  • Multiple Elixir/OTP version combinations verified
  • Zero memory leaks verified via valgrind

✅ Phase 6D: Prebuilt Binary System (COMPLETED)

Status: Complete - v0.2.0 published to Hex.pm with precompiled binaries

Deliverables Completed:

  1. Precompiler Configuration

    • ✅ mix.exs configured with cc_precompiler
    • ✅ GitHub release URL template configured
    • ✅ NIF version 2.17 (OTP 26-28) specified
    • ✅ checksum*.exs added to package files
    • ✅ checksum*.exs added to .gitignore
  2. GitHub Actions Workflow

    • ✅ Created .github/workflows/precompile.yml
    • ✅ Builds on tag push (v*)
    • ✅ 7 platform binaries generated:
      • x86_64-linux-gnu
      • aarch64-linux-gnu
      • x86_64-apple-darwin
      • aarch64-apple-darwin
      • armv7l-linux-gnueabihf (cross-compiled)
      • i686-linux-gnu (cross-compiled)
      • riscv64-linux-gnu (cross-compiled)
  3. Release Process

    • ✅ v0.2.0 tagged and pushed
    • ✅ GitHub Actions built all binaries
    • ✅ Checksums generated with mix elixir_make.checksum --all
    • ✅ Published to Hex.pm successfully
    • ✅ Verified with fresh test installation

Verification: Created test project that successfully:

  • Downloaded precompiled binary from Hex.pm cache
  • Connected to ClickHouse
  • Executed all basic operations (CREATE, INSERT, SELECT)
  • Tested both row-major and columnar formats
  • Verified type system and aggregations work

Remaining for Future Releases:

  • ⏳ Phase 6C: Parameterized queries
  • ⏳ Phase 6E: Additional documentation polish

✅ Phase 6E: Memory Locality Optimizations (COMPLETED)

Status: Complete - 36% INSERT performance improvement Date: 2025-11-09

Problem Discovered: Counterintuitive benchmark results showed row-major INSERT (with O(N×M) conversion) was 1.3x faster than direct columnar INSERT for large datasets (1M rows). This contradicted expected algorithmic performance.

Root Cause:

  • Pre-generated columnar data was fragmented in BEAM's old heap (7 separate comprehensions)
  • Poor cache locality → 50% cache miss rate (~350M wasted cycles)
  • NIF list traversal is memory-bound, not computation-bound
  • Memory locality dominated over algorithmic complexity

Solution Implemented:

  1. Fresh Allocation Helper (bench/helpers.ex):
def fresh_columnar_data(columns) do
  Map.new(columns, fn {name, values} ->
    {name, Enum.to_list(values)}  # Forces young heap allocation
  end)
end
  1. Optimized Single-Pass Generation (bench/helpers.ex):
def generate_test_data_optimized(row_count) do
  initial = %{id: [], user_id: [], event_type: [], ...}

  # Sequential prepends create adjacent cons cells
  columns_reversed = Enum.reduce(1..row_count, initial, fn id, acc ->
    %{id: [id | acc.id], user_id: [rand() | acc.user_id], ...}
  end)

  # Reverse creates contiguous allocation
  Map.new(columns_reversed, fn {name, vals} -> {name, :lists.reverse(vals)} end)
end

Performance Results (1M rows INSERT):

Method Time Improvement Memory
Columnar (original) 2.10s baseline 940 bytes
Columnar (fresh) 1.80s 14% faster 1.74 KB
Row-major 1.49s 29% faster 997 MB
Columnar (optimized) 1.39s 36% faster 871 MB

Key Findings:

  • Columnar with optimized generation is now fastest method (beats row-major by 7%)
  • Sequential allocation in young heap provides excellent cache locality (10% miss rate vs 50%)
  • Memory locality matters more than algorithmic complexity for NIF boundary crossing
  • ~280M cycles saved from better cache behavior (≈140ms at 2GHz)

Deliverables:

  • Bench.Helpers.fresh_columnar_data/1 for forcing young heap allocation
  • Bench.Helpers.generate_test_data_optimized/1 with single-pass reduction pattern
  • ✅ Updated all benchmarks with fresh and optimized-gen variants
  • ✅ Comprehensive documentation in research/PERF.md Phase 6
  • ✅ Updated README with optimized generation pattern and new benchmark stats
  • ✅ Updated API documentation with best practices

Updated Performance Stats (vs Pillar):

  • INSERT 10k rows: 12.9ms (4.8x faster)
  • INSERT 100k rows: 156ms (4.0x faster)
  • INSERT 1M rows: 1,338ms (3.8x faster) ← 36% improvement from previous 2,094ms

Documentation:

  • Complete analysis in research/PERF.md Phase 6
  • Best practices added to README Performance Tips section
  • Cache miss calculations and BEAM GC behavior explained

✅ Phase 6C: Parameterized Queries (COMPLETED)

Goal: Add support for parameterized queries to prevent SQL injection and enable type-safe query building Status: ✅ Complete - 348 tests passing (19 new tests added) Priority: High - Security and usability Date: 2025-11-09

Problem: Current API requires string concatenation for dynamic queries, which is:

  • Vulnerable to SQL injection
  • Error-prone (manual escaping)
  • Lacks type safety
# Current - Unsafe!
user_input = "'; DROP TABLE users; --"
Natch.query(conn, "SELECT * FROM users WHERE name = '#{user_input}'")

Solution: Native ClickHouse Query API with Type Inference

ClickHouse native protocol supports parameterized queries using {name:Type} syntax. We leverage this with automatic type inference for SELECT operations to provide both safety and ergonomics.

Design Decision: Inference for SELECTs, Explicit for Writes

After analysis, we determined:

  • SELECT queries (90% of use): Types are logically redundant since schema defines them
  • INSERT queries: Use specialized columnar API (Natch.insert/4) - no params needed
  • execute/3 (UPDATE/DELETE): Keep explicit types for safety

Implementation:

  1. C++ Query NIFs (native/natch_fine/src/query.cpp)

    • query_create/1 - Creates Query resource from SQL
    • query_bind_* functions for all ClickHouse types
    • query_send/2 - Executes parameterized query via native protocol
  2. Elixir Query Builder (lib/natch/query.ex)

    # Builder API - explicit types required
    query = Query.new("SELECT * FROM users WHERE id = {id:UInt64}")
    |> Query.bind(:id, 42)
    
    # Automatic type inference from Elixir values
    defp bind_value(ref, name, value) when is_integer(value) and value >= 0 do
      Natch.Native.query_bind_uint64(ref, name, value)  # UInt64
    end
  3. Type Inference for SELECT APIs (lib/natch.ex)

    def select_rows(conn, sql, params) when is_binary(sql) do
      # Transform {id} -> {id:UInt64} based on param value
      sql_with_types = add_parameter_types(sql, params)
      query = Query.new(sql_with_types) |> Query.bind_all(params)
      select_rows(conn, query)
    end
    
    defp infer_clickhouse_type(v) when is_integer(v) and v >= 0, do: "UInt64"
    defp infer_clickhouse_type(v) when is_integer(v), do: "Int64"
    defp infer_clickhouse_type(v) when is_float(v), do: "Float64"
    defp infer_clickhouse_type(v) when is_binary(v), do: "String"
    defp infer_clickhouse_type(%DateTime{}), do: "DateTime"

Usage Patterns:

# SELECT with automatic type inference (RECOMMENDED)
{:ok, rows} = Natch.select_rows(
  conn,
  "SELECT * FROM users WHERE id = {id} AND status = {status}",
  id: 42,          # Inferred as UInt64
  status: "active" # Inferred as String
)

# Builder API with explicit types (advanced control)
query = Query.new("SELECT * FROM users WHERE id = {id:UInt64}")
|> Query.bind(:id, 42)
{:ok, rows} = Natch.select_rows(conn, query)

# execute/3 requires explicit types (safety for writes)
:ok = Natch.execute(
  conn,
  "DELETE FROM users WHERE id = {id:UInt64}",
  id: 42
)

Type Inference Rules:

  • integer() >= 0 → UInt64
  • integer() < 0 → Int64
  • float() → Float64
  • binary() → String
  • DateTime.t() → DateTime
  • Date.t() → Date
  • nil → raises (must use explicit Nullable(T))

Security:

  • Parameters sent separately from SQL over wire protocol
  • No string escaping needed - values cannot be interpreted as SQL
  • SQL injection prevention verified with malicious input tests

Testing:

  • 19 parameterized query tests added
  • SQL injection prevention tests
  • Type inference tests for all supported types
  • Mixed typed/untyped placeholder tests
  • NULL handling tests

Implemented Features: ✓ Builder API with explicit types ✓ Simple API with automatic type inference ✓ Both keyword list and map parameter styles ✓ SQL injection protection ✓ Comprehensive type support (integers, floats, strings, DateTime, Date, NULL) ✓ Mixed typed/untyped placeholders ✓ Integration with select_rows, select_cols, and execute

Error Handling

// Custom error types
namespace fine {
  template<>
  struct Encoder<clickhouse::ServerException> {
    static ERL_NIF_TERM encode(ErlNifEnv* env, const clickhouse::ServerException& ex) {
      // Convert to Elixir exception with error code, message, stack trace
    }
  };
}

Testing

  • Comprehensive error handling tests
  • Memory leaks (valgrind)
  • Concurrent operations
  • Authentication
  • Connection failures and retries

Success Criteria: Production-ready error handling and reliability


Phase 7: Advanced Query Features (Nice to Have)

Goal: Streaming SELECT, batch operations, advanced query patterns Status: ⏳ Pending Priority: Low - Nice to have

Streaming SELECT

For large result sets, stream data back to Elixir:

void client_select_async(
    ErlNifEnv *env,
    fine::ResourcePtr<Client> client,
    std::string query,
    ErlNifPid receiver_pid) {

  client->Select(query, [&](const Block& block) {
    // Send block to Elixir process
    ErlNifEnv* msg_env = enif_alloc_env();
    ERL_NIF_TERM block_term = encode_block(msg_env, block);
    enif_send(env, &receiver_pid, msg_env, block_term);
    enif_free_env(msg_env);
  });
}

Batch Operations

Support multiple queries in a single transaction or batch.


❌ Phase 8: Ecto Integration (NOT IMPLEMENTING)

Status: ❌ Will not implement Reason: Per user feedback - "I don't think Ecto is a good fit. We should never do this."

ClickHouse is an OLAP database optimized for analytics, not OLTP. Ecto's schema-based approach and transaction model don't align well with ClickHouse's use cases. Users should use Natch's direct query interface instead.


Technical Challenges & Solutions

1. Smart Pointer Lifetime Management

Problem: clickhouse-cpp uses std::shared_ptr, FINE uses ResourcePtr

Solution Options:

  • A. Store shared_ptr inside a wrapper struct, wrap that with ResourcePtr
  • B. Use aliasing constructor to keep resources alive
  • C. Careful manual lifetime management

Recommended: Option A

struct ColumnResource {
  std::shared_ptr<clickhouse::Column> ptr;

  ColumnResource(std::shared_ptr<clickhouse::Column> p) : ptr(p) {}
};

FINE_RESOURCE(ColumnResource);

fine::ResourcePtr<ColumnResource> column_create(
    ErlNifEnv *env, std::string type) {
  auto col = clickhouse::CreateColumnByType(type);
  return fine::make_resource<ColumnResource>(col);
}

2. Type Dispatch for 42 Column Types

Problem: Need to handle 42+ column types dynamically

Solution: Phased implementation

  • Phase 1: 5 essential types (UInt64, Int64, String, Float64, DateTime)
  • Phase 2: 10 common types (add Nullable, Array, Date, Bool, Float32)
  • Phase 3: 20 types (add Decimal, UUID, IPv4/6, Int32/16/8, UInt32/16/8)
  • Phase 4: All remaining types (Geo, Enum, Map, Tuple, LowCardinality)

Pattern: Use visitor or type code dispatch

ElixirValue extract_value(ColumnRef col, size_t row) {
  switch (col->Type()->GetCode()) {
    case Type::UInt64:
      return (*col->As<ColumnUInt64>())[row];
    case Type::String:
      return std::string((*col->As<ColumnString>())[row]);
    case Type::Array:
      return extract_array_value(col, row);
    // ... etc
  }
}

3. Callback Bridge for SELECT

Problem: C++ callback → Elixir process

Solution: Accumulate in C++, return to Elixir

  • SELECT accumulates all blocks
  • Return vector of blocks
  • Elixir converts blocks to maps

Alternative (Phase 4): Streaming with send to Elixir process

void client_select_async(
    ErlNifEnv *env,
    fine::ResourcePtr<Client> client,
    std::string query,
    ErlNifPid receiver_pid) {

  client->Select(query, [&](const Block& block) {
    // Send block to Elixir process
    ErlNifEnv* msg_env = enif_alloc_env();
    ERL_NIF_TERM block_term = encode_block(msg_env, block);
    enif_send(env, &receiver_pid, msg_env, block_term);
    enif_free_env(msg_env);
  });
}

4. Binary Data Efficiency

Problem: Large strings/binaries copy overhead

Solution: Use ErlNifBinary for zero-copy where possible

// For ColumnString, explore zero-copy paths
ERL_NIF_TERM column_string_get_binary(
    ErlNifEnv *env,
    fine::ResourcePtr<ColumnResource> col,
    size_t index) {

  auto typed = col->ptr->As<ColumnString>();
  auto str_view = (*typed)[index];

  // Ideally zero-copy, but may require copying for safety
  ErlNifBinary bin;
  enif_alloc_binary(str_view.size(), &bin);
  std::memcpy(bin.data, str_view.data(), str_view.size());

  return enif_make_binary(env, &bin);
}

Build System Setup

Directory Structure

natch/
├── lib/
│   ├── natch.ex                 # Public API
│   ├── natch/
│   │   ├── connection.ex       # GenServer
│   │   ├── native.ex          # NIF declarations (minimal)
│   │   ├── column.ex          # Column builders
│   │   └── insert.ex          # Insert helpers
├── native/
│   └── natch_fine/
│       ├── CMakeLists.txt     # Build config
│       ├── Makefile           # Mix integration
│       ├── src/
│       │   ├── client.cpp     # Client NIFs
│       │   ├── block.cpp      # Block NIFs
│       │   ├── column.cpp     # Column NIFs
│       │   ├── select.cpp     # SELECT NIFs
│       │   └── convert.cpp    # Type conversions
│       └── clickhouse-cpp/    # Git submodule or vendored
├── test/
│   ├── natch_test.exs
│   └── integration_test.exs
├── mix.exs
└── README.md

CMakeLists.txt

cmake_minimum_required(VERSION 3.15)
project(natch_fine)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Find FINE (from Hex package)
find_package(fine REQUIRED)

# Add clickhouse-cpp
add_subdirectory(clickhouse-cpp)

# Build NIF shared library
add_library(natch_fine SHARED
  src/client.cpp
  src/block.cpp
  src/column.cpp
  src/select.cpp
  src/convert.cpp
)

target_link_libraries(natch_fine
  PRIVATE
    fine::fine
    clickhouse-cpp-lib
)

# Set visibility
set_target_properties(natch_fine PROPERTIES
  CXX_VISIBILITY_PRESET hidden
  PREFIX ""
)

# Output to priv/
set_target_properties(natch_fine PROPERTIES
  LIBRARY_OUTPUT_DIRECTORY ${CMAKE_SOURCE_DIR}/../../priv
)

Makefile (for Mix)

.PHONY: all clean

MIX_ENV ?= dev
BUILD_DIR = _build/$(MIX_ENV)

all:
	@mkdir -p $(BUILD_DIR)
	@cd $(BUILD_DIR) && cmake ../../native/natch_fine
	@cmake --build $(BUILD_DIR)

clean:
	@rm -rf $(BUILD_DIR)
	@rm -rf ../../priv/natch_fine.*

mix.exs

defmodule Natch.MixProject do
  use Mix.Project

  def project do
    [
      app: :natch,
      version: "0.2.0",
      elixir: "~> 1.18",
      compilers: [:elixir_make] ++ Mix.compilers(),
      make_targets: ["all"],
      make_clean: ["clean"],
      deps: deps()
    ]
  end

  def application do
    [extra_applications: [:logger]]
  end

  defp deps do
    [
      {:fine, "~> 0.1"},
      {:elixir_make, "~> 0.6", runtime: false},
      {:ex_doc, "~> 0.34", only: :dev, runtime: false}
    ]
  end
end

Testing Strategy

Unit Tests (per Phase)

defmodule Natch.ConnectionTest do
  use ExUnit.Case

  setup do
    {:ok, conn} = Natch.start_link(host: "localhost", port: 9000)

    on_exit(fn ->
      Natch.stop(conn)
    end)

    {:ok, conn: conn}
  end

  test "ping", %{conn: conn} do
    assert :ok = Natch.ping(conn)
  end

  test "execute DDL", %{conn: conn} do
    assert :ok = Natch.execute(conn, """
      CREATE TABLE IF NOT EXISTS test (
        id UInt64,
        name String
      ) ENGINE = Memory
    """)
  end
end

Integration Tests

defmodule Natch.IntegrationTest do
  use ExUnit.Case

  @moduletag :integration

  setup_all do
    # Start ClickHouse via Docker
    System.cmd("docker-compose", ["up", "-d", "clickhouse"])
    :timer.sleep(2000)

    on_exit(fn ->
      System.cmd("docker-compose", ["down"])
    end)

    :ok
  end

  test "full insert and select cycle" do
    {:ok, conn} = Natch.start_link(host: "localhost", port: 9000)

    # Create table
    Natch.execute(conn, """
      CREATE TABLE test (
        id UInt64,
        name String,
        amount Float64
      ) ENGINE = Memory
    """)

    # Insert data
    columns = %{
      id: [1, 2],
      name: ["Alice", "Bob"],
      amount: [100.5, 200.75]
    }

    schema = [id: :uint64, name: :string, amount: :float64]
    :ok = Natch.insert(conn, "test", columns, schema)

    # Query back
    {:ok, results} = Natch.query(conn, "SELECT * FROM test ORDER BY id")

    assert length(results) == 2
    assert hd(results).name == "Alice"
  end
end

Performance Benchmarks

defmodule Natch.BenchmarkTest do
  use ExUnit.Case

  @tag :benchmark
  test "bulk insert performance" do
    {:ok, conn} = Natch.start_link(host: "localhost", port: 9000)

    # Generate 100k rows
    rows = for i <- 1..100_000 do
      %{id: i, value: :rand.uniform(1000)}
    end

    schema = [id: :uint64, value: :uint64]

    {time_us, :ok} = :timer.tc(fn ->
      Natch.insert(conn, "benchmark", rows, schema)
    end)

    rows_per_sec = 100_000 / (time_us / 1_000_000)
    IO.puts("Inserted #{rows_per_sec} rows/sec")

    # Should be > 50k rows/sec for native protocol to be worthwhile
    assert rows_per_sec > 50_000
  end
end

Memory Leak Testing

# Run with valgrind
MIX_ENV=test valgrind --leak-check=full --track-origins=yes \
  mix test --only memory_leak

# Or use AddressSanitizer
CXXFLAGS="-fsanitize=address" mix compile
mix test

Deployment Considerations

Platform Support

Required Platforms:

  • Linux x86_64 (primary)
  • macOS ARM64 (development)
  • macOS x86_64 (optional)

Build Requirements:

  • C++17 compiler (GCC 7+, Clang 5+, MSVC 2017+)
  • CMake 3.15+
  • ClickHouse-cpp dependencies: abseil, lz4, cityhash, zstd

Docker Development

# Dockerfile.dev
FROM elixir:1.18

# Install C++ build tools
RUN apt-get update && apt-get install -y \
    build-essential \
    cmake \
    git \
    liblz4-dev \
    libzstd-dev

WORKDIR /app

COPY mix.exs mix.lock ./
RUN mix deps.get

COPY . .
RUN mix compile

CMD ["iex", "-S", "mix"]

CI/CD Pipeline

# .github/workflows/ci.yml
name: CI

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest

    services:
      clickhouse:
        image: clickhouse/clickhouse-server:latest
        ports:
          - 9000:9000
          - 8123:8123

    steps:
      - uses: actions/checkout@v3
        with:
          submodules: recursive

      - uses: erlef/setup-beam@v1
        with:
          otp-version: 27.2
          elixir-version: 1.18

      - name: Install C++ dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y build-essential cmake liblz4-dev libzstd-dev

      - name: Cache dependencies
        uses: actions/cache@v3
        with:
          path: deps
          key: deps-${{ runner.os }}-${{ hashFiles('mix.lock') }}

      - name: Install dependencies
        run: mix deps.get

      - name: Compile
        run: mix compile

      - name: Run tests
        run: mix test

Risk Mitigation

High-Risk Items

  1. Smart Pointer Lifetime Mismatch

    • Risk: Segfaults from dangling pointers
    • Mitigation: Comprehensive testing, valgrind, wrapper pattern
    • Fallback: Use copying instead of sharing
  2. Type Dispatch Complexity

    • Risk: Missing type support causes runtime errors
    • Mitigation: Explicit error for unsupported types, phased rollout
    • Fallback: Support common types only, document limitations
  3. Memory Leaks

    • Risk: Resources not freed properly
    • Mitigation: RAII, ResourcePtr, valgrind testing
    • Fallback: Regular process restarts in production

Medium-Risk Items

  1. Build System Complexity

    • Risk: Platform-specific build failures
    • Mitigation: Docker-based builds, comprehensive CI
    • Fallback: Provide pre-built binaries
  2. Upstream Changes

    • Risk: clickhouse-cpp breaking changes
    • Mitigation: Pin to specific version/tag, test before upgrading
    • Fallback: Fork if necessary

Low-Risk Items

  1. Performance Not Meeting Expectations
    • Risk: FFI overhead negates protocol benefits
    • Mitigation: Early benchmarking, compare to HTTP baseline
    • Fallback: Document actual performance, users decide

Success Metrics

✅ Phase 1 Success (ACHIEVED)

  • ✅ Connection established
  • ✅ Ping works
  • ✅ DDL operations succeed
  • ✅ 10 tests passing

✅ Phase 2 Success (ACHIEVED)

  • ✅ Core 5 types working (UInt64, Int64, String, Float64, DateTime)
  • ✅ Column creation and population
  • ✅ 33 tests passing

✅ Phase 3 Success (ACHIEVED)

  • ✅ Block building from Elixir maps
  • ✅ INSERT operations working
  • ✅ 17 tests passing

✅ Phase 4 Success (ACHIEVED - MVP Reached!)

  • ✅ SELECT queries returning data
  • ✅ Block-to-maps conversion
  • ✅ Large result sets (10k rows tested)
  • ✅ 12 tests passing
  • Total: 89 tests passing (2 PoC + 10 Phase1 + 33 Phase2 + 17 Phase3 + 12 Phase4 + 15 remaining)

✅ Phase 5A Success (ACHIEVED - Columnar API)

  • ✅ Bulk append NIFs implemented
  • ✅ Columnar insert API with map of lists
  • ✅ 100x performance improvement (N×M NIF calls → M NIF calls)
  • ✅ Zero transposition overhead

✅ Phase 5B Success (ACHIEVED - 8 Additional Types)

  • ✅ Date, Bool, Float32
  • ✅ UInt32/16, Int32/16/8
  • ✅ 160 tests passing total

✅ Phase 5C-D Success (COMPLETE - All 5 Advanced Types)

  • ✅ UUID (128-bit identifiers, flexible parsing)
  • ✅ DateTime64(6) (microsecond precision timestamps)
  • ✅ Decimal64(9) (fixed-point with Decimal library)
  • ✅ Nullable(T) (NULL support for UInt64, Int64, String, Float64)
  • ✅ Array(T) (100% type coverage with universal generic path)
  • Total: 213 tests passing (53 new tests added)

✅ Phase 5E Success (COMPLETE - Complex Type Nesting)

  • ✅ Enhanced Block.build_block to handle Tuple and Map INSERT
  • ✅ Fixed critical Map deserialization bug (incorrect size calculation)
  • ✅ Made Nullable handling generic via Slice() and recursion
  • ✅ Added missing type handlers: Map, Tuple, Enum8, Enum16, LowCardinality in block_to_maps_impl
  • ✅ Comprehensive integration tests (14 tests) with full INSERT→SELECT roundtrip validation:
    • ✅ Array(Nullable(String)) and Array(Nullable(UInt64))
    • ✅ LowCardinality(Nullable(String))
    • ✅ Tuple(Nullable(String), UInt64)
    • ✅ Map(String, Nullable(UInt64))
    • ✅ Array(LowCardinality(String))
    • ✅ Array(LowCardinality(Nullable(String))) - triple wrapper!
    • ✅ Map(String, Array(UInt64))
    • ✅ Map(String, Enum16)
    • ✅ Tuple(String, Array(UInt64))
    • ✅ Tuple(Enum8, UInt64)
    • ✅ Array(Array(Nullable(UInt64))) - triple nesting
    • ✅ Array(Array(Array(UInt64))) - deep nesting stress test
    • ✅ Array(Enum8)
  • Total: 227 tests passing (14 new integration tests added)

Phase 6A Success (Timeout Configuration)

  • ✅ Socket-level timeout configuration (connect, recv, send)
  • ✅ Conservative defaults matching clickhouse-cpp
  • ✅ Integration test suite (8 tests)
  • ✅ Documentation in README and moduledoc
  • Total: 316 tests passing (8 integration tests added)

✅ Phase 6D Success (Prebuilt Binary Releases) - COMPLETED

  • ✅ v0.2.0 published to Hex.pm with precompiled binaries
  • ✅ GitHub Actions workflow for 7 platforms
  • ✅ Semi-automated release process
  • ✅ Verification with fresh test installation

✅ Phase 6E Success (Memory Locality Optimizations) - COMPLETED

  • ✅ 36% INSERT performance improvement for large datasets
  • ✅ Optimized data generation helpers implemented
  • ✅ Comprehensive documentation in research/PERF.md
  • ✅ Updated benchmarks and performance stats
  • ✅ All 316 tests passing

Performance Targets (EXCEEDED)

  • INSERT: ✅ 1,338ms for 1M rows (3.8x faster than Pillar, 36% faster than original)
  • Latency: ✅ 12.9ms for 10k rows (4.8x faster than Pillar)
  • Memory: ✅ Minimal BEAM usage (940 bytes columnar, vs 4.3GB Pillar)
  • SELECT: ✅ 6.2x faster than Pillar for 1M row full scans

Comparison: FINE vs Alternatives

Approach Effort Performance Risk Maintainability
FINE Wrapper 4-6 weeks Native C++ Medium Medium
Pure Elixir 4-6 months 20-40% slower Low High
Rustler Wrapper 6-10 weeks Native Medium-High Medium
Pillar (HTTP) 0 (exists) Baseline None High

Open Questions

  1. Resource Lifetime: Best pattern for Column → Block → Insert lifecycle? → Solved: Wrapper pattern with shared_ptr inside ResourcePtr
  2. SELECT callback handling: How to bridge C++ callbacks to Elixir? → Solved: Convert to Erlang terms immediately in callback
  3. Schema Inference: Query ClickHouse for table schema automatically?
  4. Connection Pooling: At Elixir level or leverage clickhouse-cpp? → Investigate clickhouse-cpp capabilities first
  5. Compression Default: Enable LZ4 by default or opt-in?

Next Steps

With MVP achieved (Phases 1-4 complete), all advanced types complete (Phase 5A-G), timeout support complete (Phase 6A), and GitHub release preparation complete (Phase 6B), current priorities:

  1. ✅ Phase 5A-B: Columnar API & Performance (COMPLETED)

    • ✅ Bulk append NIFs (C++ implementation)
    • ✅ Columnar insert API (Elixir layer)
    • ✅ 8 additional types (Date, Bool, Float32, UInt32/16, Int32/16/8)
    • ✅ 100x performance improvement achieved
    • ✅ Breaking change implemented: Row-oriented → Columnar API
  2. ✅ Phase 5C-D: All Advanced Types (COMPLETED)

    • ✅ UUID (128-bit identifiers)
    • ✅ DateTime64(6) (microsecond precision)
    • ✅ Decimal64(9) (fixed-point with Decimal library)
    • ✅ Nullable(T) (NULL support for 4 types)
    • ✅ Array(T) (100% type coverage with universal generic path)
    • ✅ 53 new tests added (213 tests passing total)
    • ✅ Complete architecture documentation in ALL_TYPES.md
  3. ✅ Phase 5E: Complex Type Nesting (COMPLETED)

    • ✅ Fixed critical Map deserialization crash
    • ✅ Generic Nullable handling for all types
    • ✅ Complete type handler coverage in SELECT
    • ✅ Enhanced Block.build_block for Tuple and Map INSERT
    • ✅ 14 comprehensive integration tests with full roundtrip validation
    • ✅ 227 tests passing total
    • ✅ Production-ready support for arbitrarily complex nested types
  4. ✅ Phase 5F: SELECT API Redesign & Columnar Results (COMPLETED)

    • ✅ Implemented client_select_cols in C++ (lines 567-824 in select.cpp)
    • ✅ Returns columnar format: %{column_name => [values]}
    • ✅ Added select_cols/2 to Elixir Connection API
    • ✅ Renamed select/2 to select_rows/2 with backward compatibility
    • ✅ Updated benchmarks to compare row-major vs columnar performance
    • ✅ Updated README with comprehensive API documentation
    • ✅ All 309 tests passing
    • Performance characteristics:
      • Columnar format ideal for large result sets and analytics
      • Row-major format maintained for traditional use cases
      • Backward compatible via deprecated select/2 wrapper
  5. ✅ Phase 5G: SSL/TLS Support (COMPLETED)

    • ✅ OpenSSL integration with clickhouse-cpp
    • ✅ System CA certificate support
    • ✅ SNI (Server Name Indication) enabled
    • ✅ ClickHouse Cloud compatibility on port 9440
    • ✅ Comprehensive documentation in README
  6. ✅ Phase 6A: Socket-Level Timeout Configuration (COMPLETED)

    • ✅ Three timeout types: connect_timeout, recv_timeout, send_timeout
    • ✅ Conservative defaults matching clickhouse-cpp (5000ms, 0ms, 0ms)
    • ✅ Connection-level configuration (not per-operation)
    • ✅ Comprehensive timeout documentation in README and moduledoc
    • ✅ Integration test suite (8 tests, excluded by default)
    • ✅ All 316 tests passing (8 integration tests added)
  7. ✅ Phase 6B: GitHub Release Preparation (COMPLETED)

    • ✅ MIT License + third-party notices
    • ✅ clickhouse-cpp git submodule conversion
    • ✅ GitHub Actions CI/CD with matrix testing
    • ✅ Valgrind memory leak detection automation
    • ✅ Package metadata for Hex.pm
    • ✅ Code cleanup and .gitignore updates
  8. ✅ Phase 6D: Prebuilt Binary Releases (COMPLETED)

    • ✅ GitHub Actions workflow with 4 parallel jobs (testing + cross-compilation)
    • Tested platforms: Linux x86_64, Linux ARM64, macOS x86_64, macOS ARM64
    • Cross-compiled: Linux ARMv7, RISC-V 64, i686 (best effort, no tests)
    • ✅ NIF version matrix: 2.15 (OTP 24), 2.16 (OTP 25), 2.17 (OTP 26-28)
    • ✅ ~18-21 precompiled binaries per release
    • ✅ Semi-automated workflow: CI builds → manual checksum → manual Hex publish
    • ✅ v0.2.0 published to Hex.pm with precompiled binaries
  9. ✅ Phase 6E: Memory Locality Optimizations (COMPLETED)

    • ✅ 36% INSERT performance improvement for large datasets
    • ✅ Fixed memory allocation patterns causing cache misses
    • ✅ Added optimized data generation helpers (fresh_columnar_data, generate_test_data_optimized)
    • ✅ Comprehensive documentation in research/PERF.md Phase 6
    • ✅ Updated benchmarks and README with new performance stats
    • ✅ All 316 tests passing
  10. Phase 6C: Parameterized Queries (NEXT PRIORITY)

  • Support for query parameters to prevent SQL injection
  • Bind parameter syntax (e.g., SELECT * FROM table WHERE id = ?)
  • Type-safe parameter binding
  • Integration with existing query API
  1. Phase 6F: Additional Production Features (FUTURE)
  • Comprehensive error handling refinement
  • Performance optimization and profiling
  • Documentation polish for public consumption
  1. Phase 7: Ecto Integration Investigation (FUTURE - NEEDS RESEARCH)
  • Status: Under consideration - partial fit at best
  • Scope: Focus on query execution, not full schema/migration support
  • Rationale: ClickHouse is OLAP, not OLTP. Investigate limited integration:
    • Query builder integration for SELECT operations
    • Type casting and result mapping
    • Connection pooling via DBConnection
  • Limitations to document:
    • No transactions (ClickHouse doesn't support them)
    • No schema migrations (table structures are analytics-focused)
    • No associations/preloading (not OLTP use case)
    • Limited changesets (bulk inserts don't fit Ecto's row model)
  • Decision point: Determine if partial Ecto support provides value or creates confusion
  1. Phase 8: Explorer DataFrame Integration (FUTURE)
  • Direct DataFrame insert support
  • Zero-copy optimizations with Arrow
  • Schema inference from DataFrame types
  • Natural analytics workflow integration
  1. Phase 9: Advanced Query Features (NICE TO HAVE)
  • Streaming SELECT for large result sets
  • Batch operations
  • Async query support
  1. NOT IMPLEMENTING:
  • ❌ Distributed Queries (removed)

References


Appendix A: Minimal Working Example

This example demonstrates the absolute minimum code needed to verify FINE + clickhouse-cpp works:

native/natch_fine/minimal.cpp

#include <fine.hpp>
#include <clickhouse/client.h>

using namespace clickhouse;

FINE_RESOURCE(Client);

fine::ResourcePtr<Client> create_client(ErlNifEnv *env) {
  ClientOptions opts;
  opts.SetHost("localhost");
  opts.SetPort(9000);
  return fine::make_resource<Client>(opts);
}
FINE_NIF(create_client, 0);

std::string ping(ErlNifEnv *env, fine::ResourcePtr<Client> client) {
  try {
    client->Ping();
    return "pong";
  } catch (const std::exception& e) {
    return std::string("error: ") + e.what();
  }
}
FINE_NIF(ping, 0);

FINE_INIT("Elixir.NatchMinimal");

lib/natch_minimal.ex

defmodule NatchMinimal do
  @moduledoc """
  Minimal FINE wrapper proof-of-concept
  """

  @on_load :load_nifs

  def load_nifs do
    path = :filename.join(:code.priv_dir(:natch), 'natch_fine')
    :ok = :erlang.load_nif(path, 0)
  end

  def create_client, do: :erlang.nif_error(:not_loaded)
  def ping(_client), do: :erlang.nif_error(:not_loaded)
end

Test

{:ok, client} = NatchMinimal.create_client()
"pong" = NatchMinimal.ping(client)

If this works, you're ready to proceed with the full implementation!


End of Document