diff --git a/README.md b/README.md index 0e2e0ea..ea684b6 100644 --- a/README.md +++ b/README.md @@ -13,6 +13,7 @@ You're welcome. - **Finds root causes** — Hypothesis-driven investigation. No hunches. No vibes. Data. - **Systematic triage** — Golden signals, USE/RED methods. The stuff you should already know. - **Remembers everything** — Persistent memory for patterns, queries, and incidents. Unlike you, I learn. +- **Metrics querying** — OTel metrics via MPL. Logs via APL. One agent, both engines. - **Unified observability** — One config, all your tools. Because having four config files is amateur hour. ## Installation @@ -67,9 +68,12 @@ Auth options per deployment: ## Usage ```bash -# Query logs +# Query logs (APL) scripts/axiom-query prod "['dataset'] | where _time > ago(1h) | where status >= 500 | project _time, message, status | take 10" +# Query metrics (MPL) +scripts/axiom-metrics-query prod --range 1h <<< "otel-metrics:http.server.request.duration | align to 5m using avg | group by service.name" + # Check what's on fire scripts/grafana-alerts prod firing @@ -84,7 +88,7 @@ scripts/slack default chat.postMessage channel=incidents text="Fixed. You're wel | Category | Scripts | |----------|---------| -| **Axiom** | `axiom-query`, `axiom-api`, `axiom-link`, `axiom-deployments` | +| **Axiom** | `axiom-query`, `axiom-metrics-query`, `axiom-api`, `axiom-link`, `axiom-deployments` | | **Grafana** | `grafana-query`, `grafana-alerts`, `grafana-datasources`, `grafana-api` | | **Pyroscope** | `pyroscope-flamegraph`, `pyroscope-diff`, `pyroscope-services`, `pyroscope-api` | | **Slack** | `slack` | @@ -105,7 +109,7 @@ scripts/test-config-toml # TOML parsing with indented sections 2. **State facts.** "The logs show X" not "this is probably X." 3. **Disprove, don't confirm.** Design queries to falsify your hypothesis. 4. **Time filter first.** Always. No exceptions. -5. **Discover schema.** Run `getschema` before querying unfamiliar datasets. +5. 
**Discover schema.** Run `getschema` (APL) or `--spec` (MPL) before querying unfamiliar datasets. ## Memory diff --git a/SKILL.core.md b/SKILL.core.md index e69c7ff..325b70d 100644 --- a/SKILL.core.md +++ b/SKILL.core.md @@ -155,8 +155,10 @@ Follow this loop strictly. ### D. EXECUTE (Query) - **Select methodology:** Golden Signals (customer-facing health), RED (request-driven services), USE (infrastructure resources) -- **Select telemetry:** Use whatever's available—metrics, logs, traces, profiles -- **Run query:** `scripts/axiom-query` (logs), `scripts/grafana-query` (metrics), `scripts/pyroscope-diff` (profiles) +- **Metrics:** Axiom MetricsDB (`[MPL]` datasets from `scripts/init`), Grafana/PromQL, alerts/dashboards via Grafana +- **Discover metrics:** `scripts/axiom-metrics-discover` (list metrics, tags, tag values in MetricsDB datasets) +- **Alerts & dashboards:** Grafana only — `scripts/grafana-alerts`, `scripts/grafana-dashboards` +- **Run query:** `scripts/axiom-query` (logs/APL), `scripts/axiom-metrics-query` (metrics/MPL), `scripts/grafana-query` (PromQL), `scripts/pyroscope-diff` (profiles) ### E. VERIFY & REFLECT - **Methodology check:** Service → RED. Resource → USE. @@ -309,7 +311,7 @@ For request-driven services. Measures the *work* the service does. | **Errors** | Error rate (5xx / total) | | **Duration** | Latency percentiles (p50, p95, p99) | -Measure via logs (APL — see `reference/apl.md`) or metrics (PromQL — see `reference/grafana.md`). +Measure via logs (APL — see `reference/apl.md`), OTel metrics (MPL — see `reference/metrics.md`), or PromQL fallback (see `reference/grafana.md`). ### C. USE METHOD (Resources) @@ -321,7 +323,7 @@ For infrastructure resources (CPU, memory, disk, network). Measures the *capacit | **Saturation** | Queue depth, load average, waiting threads | | **Errors** | Hardware/network errors | -Typically measured via metrics. See `reference/grafana.md` for PromQL patterns. +Check Axiom MetricsDB first (OTel resource metrics). 
Fall back to Grafana/PromQL if not available. See `reference/grafana.md` for PromQL patterns. ### D. DIFFERENTIAL ANALYSIS @@ -358,6 +360,8 @@ See `reference/apl.md` for full operator, function, and pattern reference. - **Avoid `search`**—scans ALL fields. Last resort only. - **Field escaping**—dots need `\\.`: `['kubernetes.node_labels.nodepool\\.axiom\\.co/name']` +**MetricsDB/MPL:** For OTel metrics (`[MPL]` datasets), discover with `scripts/axiom-metrics-discover`, query with `scripts/axiom-metrics-query`. See `reference/metrics.md`. + **Need more?** Open `reference/apl.md` for operators/functions, `reference/query-patterns.md` for ready-to-use investigation queries. --- @@ -374,15 +378,16 @@ Every finding must link to its source — dashboards, queries, error reports, PR 5. **Data responses**—Any answer citing tool-derived numbers (e.g. burn rates, error counts, usage stats, etc). Questions don't require investigation, but if you cite numbers from a query, include the source link. **Rule: If you ran a query and cite its results, generate a permalink.** Run the appropriate link tool for every query whose results appear in your response: -- **Axiom:** `scripts/axiom-link` +- **Axiom:** `scripts/axiom-link` (works for both APL and MPL queries) - **Grafana:** `scripts/grafana-link` - **Pyroscope:** `scripts/pyroscope-link` - **Sentry:** `scripts/sentry-link` **Permalinks:** ```bash -# Axiom +# Axiom (APL or MPL — same script handles both) scripts/axiom-link "['logs'] | where status >= 500 | take 100" "1h" +scripts/axiom-link "dataset:metric.name | align to 5m using avg" "1h" # Grafana (metrics) scripts/grafana-link "rate(http_requests_total[5m])" "1h" # Pyroscope (profiling) @@ -480,20 +485,21 @@ See `reference/postmortem-template.md` for retrospective format. ## 15. 
TOOL REFERENCE -### Axiom (Logs & Events) +### Axiom (Logs & Events — APL) ```bash scripts/axiom-query <<< "['dataset'] | getschema" scripts/axiom-query <<< "['dataset'] | where _time > ago(1h) | project _time, message, level | take 5" -scripts/axiom-query --ndjson <<< "['dataset'] | where _time > ago(1h) | project _time, message | take 1" ``` -### Grafana (Metrics) +### Axiom (MetricsDB — MPL) ```bash -scripts/grafana-query prometheus 'rate(http_requests_total[5m])' +scripts/axiom-metrics-discover <deployment> <dataset> metrics|tags|tag-values|search +scripts/axiom-metrics-query <deployment> --range 1h <<< "dataset:metric.name | align to 5m using avg" ``` -### Pyroscope (Profiling) +### Grafana (PromQL fallback) / Pyroscope / Slack ```bash +scripts/grafana-query prometheus 'rate(http_requests_total[5m])' scripts/pyroscope-diff -2h -1h -1h now ``` @@ -518,6 +524,7 @@ scripts/slack-upload ./file.png --comment "Description" --thread - `reference/apl.md`—APL operators, functions, and spotlight analysis - `reference/axiom.md`—Axiom API endpoints (70+) +- `reference/metrics.md`—MetricsDB MPL querying, discovery, and patterns - `reference/blocks.md`—Slack Block Kit formatting - `reference/failure-modes.md`—Common failure patterns - `reference/grafana.md`—Grafana queries and PromQL patterns diff --git a/scripts/axiom-metrics-discover b/scripts/axiom-metrics-discover new file mode 100755 index 0000000..ab4ca14 --- /dev/null +++ b/scripts/axiom-metrics-discover @@ -0,0 +1,163 @@ +#!/usr/bin/env bash +# Axiom MetricsDB info endpoint helper - discover metrics, tags, and tag values +# +# Usage: axiom-metrics-discover <deployment> <dataset> [options] <command> [args...] +# +# Commands: +# metrics List all metrics in dataset +# tags List all tags in dataset +# tag-values <tag> List values for a tag +# metric-tags <metric> List tags for a metric +# metric-tag-values <metric> <tag> List tag values for metric+tag +# search <value> Find metrics matching a tag value (POST) +# +# Options: +# --range <range> Time range from now (e.g. 1h, 24h, 7d).
Default: 1h +# --start <rfc3339> Start time (RFC3339) +# --end <rfc3339> End time (RFC3339) +# +# Examples: +# axiom-metrics-discover prod otel-metrics metrics +# axiom-metrics-discover prod otel-metrics --range 24h tags +# axiom-metrics-discover prod otel-metrics tag-values service.name +# axiom-metrics-discover prod otel-metrics metric-tags http.server.request.duration +# axiom-metrics-discover prod otel-metrics metric-tag-values http.server.request.duration service.name +# axiom-metrics-discover prod otel-metrics search "api-gateway" + +set -euo pipefail + +if [[ $# -lt 3 ]]; then + echo "Usage: axiom-metrics-discover <deployment> <dataset> [options] <command> [args...]" >&2 + exit 1 +fi + +DEPLOYMENT="$1" +DATASET="$2" +shift 2 + +START_TIME="${START_TIME:-}" +END_TIME="${END_TIME:-}" +RANGE="${RANGE:-}" + +# Parse options before command +while [[ $# -gt 0 ]]; do + case "$1" in + --start) + START_TIME="$2" + shift 2 + ;; + --end) + END_TIME="$2" + shift 2 + ;; + --range) + RANGE="$2" + shift 2 + ;; + -*) + echo "Error: Unknown option '$1'." >&2 + exit 1 + ;; + *) + break + ;; + esac +done + +if [[ $# -lt 1 ]]; then + echo "Error: No command specified. Use: metrics, tags, tag-values, metric-tags, metric-tag-values, search." >&2 + exit 1 +fi + +COMMAND="$1" +shift + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# shellcheck disable=SC1091 +source "$SCRIPT_DIR/lib-time" + +# Validate time arguments +if [[ -n "$RANGE" && ( -n "$START_TIME" || -n "$END_TIME" ) ]]; then + echo "Error: --range cannot be combined with --start/--end." >&2 + exit 1 +fi + +if [[ -n "$RANGE" ]]; then + START_TIME=$(range_to_rfc3339 "$RANGE") || exit 1 + END_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ) || exit 1 + if [[ -z "$START_TIME" || -z "$END_TIME" ]]; then + echo "Error: Failed to compute time range from '$RANGE'."
>&2 + exit 1 + fi +elif [[ -n "$START_TIME" && -n "$END_TIME" ]]; then + : # explicit start/end provided +elif [[ -n "$START_TIME" || -n "$END_TIME" ]]; then + echo "Error: Both --start and --end are required when specifying explicit times." >&2 + exit 1 +else + # Default to 1h + START_TIME=$(range_to_rfc3339 "1h") || exit 1 + END_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ) || exit 1 + if [[ -z "$START_TIME" || -z "$END_TIME" ]]; then + echo "Error: Failed to compute default time range." >&2 + exit 1 + fi +fi + +# URL-encode a path segment +uriencode() { + jq -rn --arg x "$1" '$x|@uri' +} + +DATASET_ENC=$(uriencode "$DATASET") +START_ENC=$(uriencode "$START_TIME") +END_ENC=$(uriencode "$END_TIME") +BASE="/v1/query/metrics/info/datasets/${DATASET_ENC}" +QS="start=${START_ENC}&end=${END_ENC}" + +case "$COMMAND" in + metrics) + "$SCRIPT_DIR/axiom-api" "$DEPLOYMENT" GET "${BASE}/metrics?${QS}" | jq . + ;; + tags) + "$SCRIPT_DIR/axiom-api" "$DEPLOYMENT" GET "${BASE}/tags?${QS}" | jq . + ;; + tag-values) + if [[ $# -lt 1 ]]; then + echo "Error: tag-values requires a <tag> argument." >&2 + exit 1 + fi + TAG_ENC=$(uriencode "$1") + "$SCRIPT_DIR/axiom-api" "$DEPLOYMENT" GET "${BASE}/tags/${TAG_ENC}/values?${QS}" | jq . + ;; + metric-tags) + if [[ $# -lt 1 ]]; then + echo "Error: metric-tags requires a <metric> argument." >&2 + exit 1 + fi + METRIC_ENC=$(uriencode "$1") + "$SCRIPT_DIR/axiom-api" "$DEPLOYMENT" GET "${BASE}/metrics/${METRIC_ENC}/tags?${QS}" | jq . + ;; + metric-tag-values) + if [[ $# -lt 2 ]]; then + echo "Error: metric-tag-values requires <metric> and <tag> arguments." >&2 + exit 1 + fi + METRIC_ENC=$(uriencode "$1") + TAG_ENC=$(uriencode "$2") + "$SCRIPT_DIR/axiom-api" "$DEPLOYMENT" GET "${BASE}/metrics/${METRIC_ENC}/tags/${TAG_ENC}/values?${QS}" | jq . + ;; + search) + if [[ $# -lt 1 ]]; then + echo "Error: search requires a <value> argument." >&2 + exit 1 + fi + BODY=$(jq -nc --arg v "$1" '{"value": $v}') + "$SCRIPT_DIR/axiom-api" "$DEPLOYMENT" POST "${BASE}/metrics?${QS}" "$BODY" | jq .
+ ;; + *) + echo "Error: Unknown command '$COMMAND'. Use: metrics, tags, tag-values, metric-tags, metric-tag-values, search." >&2 + exit 1 + ;; +esac diff --git a/scripts/axiom-metrics-query b/scripts/axiom-metrics-query new file mode 100755 index 0000000..1318575 --- /dev/null +++ b/scripts/axiom-metrics-query @@ -0,0 +1,159 @@ +#!/usr/bin/env bash +# Axiom MetricsDB MPL query helper - reads query from stdin +# +# Usage: axiom-metrics-query <deployment> [options] <<< "mpl query" +# +# Options: +# --start <rfc3339> Start time (RFC3339, e.g. 2025-01-01T00:00:00Z) +# --end <rfc3339> End time (RFC3339, e.g. 2025-01-02T00:00:00Z) +# --range <range> Convenience range from now (e.g. 1h, 24h, 7d) +# --trace Print x-axiom-trace-id on success +# --spec Fetch MPL language specification (no query needed) +# +# Time: Either (--start + --end) or --range is required (not both). +# MPL does NOT support relative time expressions — RFC3339 only. +# +# Examples: +# axiom-metrics-query prod --range 1h <<< "dataset:metric.name | align to 5m using avg" +# axiom-metrics-query prod --start 2025-01-01T00:00:00Z --end 2025-01-02T00:00:00Z <<< "dataset:cpu.usage" +# axiom-metrics-query prod --spec + +set -euo pipefail + +if [[ $# -lt 1 ]]; then + echo "Usage: axiom-metrics-query <deployment> [options] <<< 'mpl query'" >&2 + exit 1 +fi + +DEPLOYMENT="$1" +shift + +PRINT_TRACE=false +FETCH_SPEC=false +START_TIME="${START_TIME:-}" +END_TIME="${END_TIME:-}" +RANGE="${RANGE:-}" + +while [[ $# -gt 0 ]]; do + case "$1" in + --start) + START_TIME="$2" + shift 2 + ;; + --end) + END_TIME="$2" + shift 2 + ;; + --range) + RANGE="$2" + shift 2 + ;; + --trace) + PRINT_TRACE=true + shift + ;; + --spec) + FETCH_SPEC=true + shift + ;; + *) + echo "Error: Unknown argument '$1'. Queries must be passed via stdin."
>&2 + exit 1 + ;; + esac +done + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# Load config from unified config file +# shellcheck disable=SC1090 +eval "$("$SCRIPT_DIR/config" axiom "$DEPLOYMENT")" + +RESP_HEADERS=$(mktemp) +RESP_BODY=$(mktemp) +cleanup() { + rm -f "$RESP_HEADERS" "$RESP_BODY" +} +trap cleanup EXIT + +# --spec: fetch MPL language specification via OPTIONS and exit +if [[ "$FETCH_SPEC" == true ]]; then + HTTP_CODE=$(curl -sS -o "$RESP_BODY" -D "$RESP_HEADERS" -w "%{http_code}" \ + -X OPTIONS "$AXIOM_URL/v1/query/_metrics" \ + -H "Authorization: Bearer $AXIOM_TOKEN" \ + -H "X-Axiom-Org-Id: $AXIOM_ORG_ID") + + if [[ "$HTTP_CODE" -lt 200 || "$HTTP_CODE" -ge 300 ]]; then + msg=$(jq -r '.message // empty' "$RESP_BODY" 2>/dev/null) + trace=$(grep -i '^x-axiom-trace-id:' "$RESP_HEADERS" | tail -1 | awk '{print $2}' | tr -d '\r') + echo "error: ${msg:-http $HTTP_CODE}" >&2 + if [[ -n "$trace" ]]; then + echo "trace_id: $trace" >&2 + fi + exit 1 + fi + + cat "$RESP_BODY" + exit 0 +fi + +# Require query from stdin +if [[ -t 0 ]]; then + echo "Error: No query provided. Pipe a query to stdin." >&2 + echo "" >&2 + echo "Examples:" >&2 + echo " axiom-metrics-query $DEPLOYMENT --range 1h <<< \"dataset:metric.name | align to 5m using avg\"" >&2 + exit 1 +fi + +# shellcheck disable=SC1091 +source "$SCRIPT_DIR/lib-time" + +# Validate time arguments +if [[ -n "$RANGE" && ( -n "$START_TIME" || -n "$END_TIME" ) ]]; then + echo "Error: --range cannot be combined with --start/--end." >&2 + exit 1 +fi + +if [[ -n "$RANGE" ]]; then + START_TIME=$(range_to_rfc3339 "$RANGE") || exit 1 + END_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ) || exit 1 + if [[ -z "$START_TIME" || -z "$END_TIME" ]]; then + echo "Error: Failed to compute time range from '$RANGE'." >&2 + exit 1 + fi +elif [[ -z "$START_TIME" || -z "$END_TIME" ]]; then + echo "Error: Either (--start + --end) or --range is required." >&2 + exit 1 +fi + +APL=$(cat) +APL_JSON=$(printf '%s' "$APL" | jq -Rs .) 
+START_JSON=$(printf '%s' "$START_TIME" | jq -Rs .) +END_JSON=$(printf '%s' "$END_TIME" | jq -Rs .) + +HTTP_CODE=$(curl -sS -o "$RESP_BODY" -D "$RESP_HEADERS" -w "%{http_code}" \ + -X POST "$AXIOM_URL/v1/query/_metrics?format=metrics-v1" \ + -H "Authorization: Bearer $AXIOM_TOKEN" \ + -H "X-Axiom-Org-Id: $AXIOM_ORG_ID" \ + -H "Content-Type: application/json" \ + -d "{\"apl\": $APL_JSON, \"startTime\": $START_JSON, \"endTime\": $END_JSON}") + +if [[ "$HTTP_CODE" -lt 200 || "$HTTP_CODE" -ge 300 ]]; then + msg=$(jq -r '.message // empty' "$RESP_BODY" 2>/dev/null) + trace=$(grep -i '^x-axiom-trace-id:' "$RESP_HEADERS" | tail -1 | awk '{print $2}' | tr -d '\r') + echo "error: ${msg:-http $HTTP_CODE}" >&2 + if [[ -n "$trace" ]]; then + echo "trace_id: $trace" >&2 + fi + exit 1 +fi + +if [[ "$PRINT_TRACE" == true ]]; then + trace=$(grep -i '^x-axiom-trace-id:' "$RESP_HEADERS" | tail -1 | awk '{print $2}' | tr -d '\r') + if [[ -n "$trace" ]]; then + echo "trace_id: $trace" >&2 + fi +fi + +cat "$RESP_BODY" diff --git a/scripts/lib-time b/scripts/lib-time new file mode 100755 index 0000000..3d2001a --- /dev/null +++ b/scripts/lib-time @@ -0,0 +1,48 @@ +#!/usr/bin/env bash +# Shared time utilities for Gilfoyle scripts +# Source this file: source "$SCRIPT_DIR/lib-time" + +# range_to_rfc3339 converts a human range (e.g. 1h, 24h, 7d) to an RFC3339 timestamp that many seconds ago +range_to_rfc3339() { + local range="$1" + local value="${range%[smhd]}" + local suffix="${range: -1}" + + if ! [[ "$value" =~ ^[0-9]+$ ]]; then + echo "Error: Invalid range value '$range'. Expected number + suffix (s/m/h/d)." >&2 + return 1 + fi + + local label + case "$suffix" in + s) label="second" ;; + h) label="hour" ;; + d) label="day" ;; + m) label="minute" ;; + *) + echo "Error: Invalid range suffix '$suffix'. Use s (seconds), m (minutes), h (hours), or d (days)." 
>&2 + return 1 + ;; + esac + + # Pluralize for values other than 1 + if [[ "$value" -ne 1 ]]; then + label="${label}s" + fi + + # Try GNU date first (linux, or gdate on macOS), then fall back to macOS date + if date -u -d "1 hour ago" +%Y-%m-%dT%H:%M:%SZ &>/dev/null; then + # GNU date + date -u -d "$value $label ago" +%Y-%m-%dT%H:%M:%SZ + else + # macOS date: -v flag with uppercase suffix + local date_flag + case "$suffix" in + s) date_flag="-v-${value}S" ;; + h) date_flag="-v-${value}H" ;; + d) date_flag="-v-${value}d" ;; + m) date_flag="-v-${value}M" ;; + esac + date -u "$date_flag" +%Y-%m-%dT%H:%M:%SZ + fi +} diff --git a/scripts/sync-to-skills b/scripts/sync-to-skills index f83997e..cadca98 100755 --- a/scripts/sync-to-skills +++ b/scripts/sync-to-skills @@ -88,6 +88,7 @@ find "$TARGET/scripts" "$TARGET/reference" "$TARGET/templates" -type f | while r # Skip binary files if file "$file" | grep -q "text"; then sedi \ + -e 's|GILFOYLE_NO_CACHE|SRE_NO_CACHE|g' \ -e 's|GILFOYLE_INIT_TIMEOUT|SRE_INIT_TIMEOUT|g' \ -e 's|GILFOYLE_CONFIG_DIR|SRE_CONFIG_DIR|g' \ -e 's|GILFOYLE_CONFIG|SRE_CONFIG|g' \ diff --git a/scripts/test-build b/scripts/test-build index 88903c4..14ab47f 100755 --- a/scripts/test-build +++ b/scripts/test-build @@ -41,7 +41,7 @@ else fi # Test 4: correct frontmatter name -if echo "$OUTPUT" | head -3 | grep -q "name: axiom-sre"; then +if echo "$OUTPUT" | grep -m1 -q "name: axiom-sre"; then pass "axiom-sre frontmatter name correct" else fail "axiom-sre frontmatter name wrong" diff --git a/skill/SKILL.md b/skill/SKILL.md index aaf9f95..c58a780 100644 --- a/skill/SKILL.md +++ b/skill/SKILL.md @@ -181,8 +181,10 @@ Follow this loop strictly. ### D. 
EXECUTE (Query) - **Select methodology:** Golden Signals (customer-facing health), RED (request-driven services), USE (infrastructure resources) -- **Select telemetry:** Use whatever's available—metrics, logs, traces, profiles -- **Run query:** `scripts/axiom-query` (logs), `scripts/grafana-query` (metrics), `scripts/pyroscope-diff` (profiles) +- **Metrics:** Axiom MetricsDB (`[MPL]` datasets from `scripts/init`), Grafana/PromQL, alerts/dashboards via Grafana +- **Discover metrics:** `scripts/axiom-metrics-discover` (list metrics, tags, tag values in MetricsDB datasets) +- **Alerts & dashboards:** Grafana only — `scripts/grafana-alerts`, `scripts/grafana-dashboards` +- **Run query:** `scripts/axiom-query` (logs/APL), `scripts/axiom-metrics-query` (metrics/MPL), `scripts/grafana-query` (PromQL), `scripts/pyroscope-diff` (profiles) ### E. VERIFY & REFLECT - **Methodology check:** Service → RED. Resource → USE. @@ -335,7 +337,7 @@ For request-driven services. Measures the *work* the service does. | **Errors** | Error rate (5xx / total) | | **Duration** | Latency percentiles (p50, p95, p99) | -Measure via logs (APL — see `reference/apl.md`) or metrics (PromQL — see `reference/grafana.md`). +Measure via logs (APL — see `reference/apl.md`), OTel metrics (MPL — see `reference/metrics.md`), or PromQL fallback (see `reference/grafana.md`). ### C. USE METHOD (Resources) @@ -347,7 +349,7 @@ For infrastructure resources (CPU, memory, disk, network). Measures the *capacit | **Saturation** | Queue depth, load average, waiting threads | | **Errors** | Hardware/network errors | -Typically measured via metrics. See `reference/grafana.md` for PromQL patterns. +Check Axiom MetricsDB first (OTel resource metrics). Fall back to Grafana/PromQL if not available. See `reference/grafana.md` for PromQL patterns. ### D. DIFFERENTIAL ANALYSIS @@ -384,6 +386,8 @@ See `reference/apl.md` for full operator, function, and pattern reference. - **Avoid `search`**—scans ALL fields. Last resort only. 
- **Field escaping**—dots need `\\.`: `['kubernetes.node_labels.nodepool\\.axiom\\.co/name']` +**MetricsDB/MPL:** For OTel metrics (`[MPL]` datasets), discover with `scripts/axiom-metrics-discover`, query with `scripts/axiom-metrics-query`. See `reference/metrics.md`. + **Need more?** Open `reference/apl.md` for operators/functions, `reference/query-patterns.md` for ready-to-use investigation queries. --- @@ -400,15 +404,16 @@ Every finding must link to its source — dashboards, queries, error reports, PR 5. **Data responses**—Any answer citing tool-derived numbers (e.g. burn rates, error counts, usage stats, etc). Questions don't require investigation, but if you cite numbers from a query, include the source link. **Rule: If you ran a query and cite its results, generate a permalink.** Run the appropriate link tool for every query whose results appear in your response: -- **Axiom:** `scripts/axiom-link` +- **Axiom:** `scripts/axiom-link` (works for both APL and MPL queries) - **Grafana:** `scripts/grafana-link` - **Pyroscope:** `scripts/pyroscope-link` - **Sentry:** `scripts/sentry-link` **Permalinks:** ```bash -# Axiom +# Axiom (APL or MPL — same script handles both) scripts/axiom-link "['logs'] | where status >= 500 | take 100" "1h" +scripts/axiom-link "dataset:metric.name | align to 5m using avg" "1h" # Grafana (metrics) scripts/grafana-link "rate(http_requests_total[5m])" "1h" # Pyroscope (profiling) @@ -506,20 +511,21 @@ See `reference/postmortem-template.md` for retrospective format. ## 15. 
TOOL REFERENCE -### Axiom (Logs & Events) +### Axiom (Logs & Events — APL) ```bash scripts/axiom-query <<< "['dataset'] | getschema" scripts/axiom-query <<< "['dataset'] | where _time > ago(1h) | project _time, message, level | take 5" -scripts/axiom-query --ndjson <<< "['dataset'] | where _time > ago(1h) | project _time, message | take 1" ``` -### Grafana (Metrics) +### Axiom (MetricsDB — MPL) ```bash -scripts/grafana-query prometheus 'rate(http_requests_total[5m])' +scripts/axiom-metrics-discover <deployment> <dataset> metrics|tags|tag-values|search +scripts/axiom-metrics-query <deployment> --range 1h <<< "dataset:metric.name | align to 5m using avg" ``` -### Pyroscope (Profiling) +### Grafana (PromQL fallback) / Pyroscope / Slack ```bash +scripts/grafana-query prometheus 'rate(http_requests_total[5m])' scripts/pyroscope-diff -2h -1h -1h now ``` @@ -544,6 +550,7 @@ scripts/slack-upload ./file.png --comment "Description" --thread - `reference/apl.md`—APL operators, functions, and spotlight analysis - `reference/axiom.md`—Axiom API endpoints (70+) +- `reference/metrics.md`—MetricsDB MPL querying, discovery, and patterns - `reference/blocks.md`—Slack Block Kit formatting - `reference/failure-modes.md`—Common failure patterns - `reference/grafana.md`—Grafana queries and PromQL patterns diff --git a/skill/reference/grafana.md b/skill/reference/grafana.md index a6c2914..d657ec3 100644 --- a/skill/reference/grafana.md +++ b/skill/reference/grafana.md @@ -67,14 +67,14 @@ Summary view shows: Samples, Range, **Min/Max with timestamps**, Avg ## Integration with Axiom -Use Grafana alongside Axiom queries for complete incident investigation. Axiom provides logs, Grafana provides infrastructure metrics. +Grafana covers Prometheus-native metrics not shipped to Axiom and provides alerts/dashboards. For OTel metrics (application and infrastructure), Axiom MetricsDB (`[MPL]` datasets) is available. -### Typical Workflow +### Available Data Sources -1. **Axiom**: Find errors/anomalies in application logs -2.
**Grafana**: Correlate with infrastructure metrics from Prometheus -3. **Grafana**: Check what alerts fired during the incident window -4. **Pyroscope**: If CPU/memory issue, get flame graphs +- **Axiom MetricsDB**: OTel metrics — application and infrastructure (MPL) +- **Axiom EventDB**: Logs, traces, error events (APL) +- **Grafana**: Prometheus-native metrics, alerts, dashboards +- **Pyroscope**: CPU and memory flame graphs ### Example: Investigating High Latency diff --git a/skill/reference/metrics.md b/skill/reference/metrics.md new file mode 100644 index 0000000..7055d66 --- /dev/null +++ b/skill/reference/metrics.md @@ -0,0 +1,178 @@ +# MetricsDB Reference + +## MetricsDB vs EventDB + +Axiom has two query engines with distinct query languages and endpoints. + +| | EventDB | MetricsDB | +|--|---------|-----------| +| **Data** | Logs, traces, spans | OTel metrics (counters, gauges, histograms) | +| **Datasets** | Standard datasets | `otel-metrics-v1` datasets | +| **Query language** | APL | MPL | +| **Query script** | `scripts/axiom-query` | `scripts/axiom-metrics-query` | +| **API endpoint** | `POST /v1/datasets/_apl` | `POST /v1/query/_metrics` | +| **Time expressions** | `ago()`, `now()`, absolute | RFC3339 timestamps only — no relative expressions | + +EventDB is general-purpose event storage. MetricsDB is purpose-built for time-series metrics — optimized for aggregation, alignment, and high-cardinality tag queries on counter/gauge/histogram data. + +Do not query MetricsDB datasets with APL. Do not query EventDB datasets with MPL. They are separate systems. + +--- + +## MPL Basics + +### Self-Describing Spec + +MPL's query endpoint documents itself. Always fetch the spec before writing queries: + +```bash +scripts/axiom-metrics-query prod --spec +``` + +This calls `OPTIONS /v1/query/_metrics` and returns the complete MPL language specification — syntax, operators, and examples. + +### Query Format + +``` +DATASET_NAME:METRIC_NAME | operator1 | operator2 | ...
+``` + +The dataset and metric are specified as a single identifier separated by `:`, followed by a pipeline of operators. + +### Key Operators + +| Operator | Purpose | Example | +|----------|---------|---------| +| `align` | Align data to time buckets | `align to 5m using avg` | +| `group` | Group by tag values | `group by service.name` | +| `filter` | Filter by tag values | `filter service.name == "api"` | +| `map` | Transform values | `map value * 100` | +| `bucket` | Histogram bucket operations | `bucket percentile(0.99)` | + +### Time Constraint (CRITICAL) + +MPL requires RFC3339 timestamps. Relative expressions like `ago()`, `now()`, or `now-1h` are **not supported**. + +```bash +# Correct: RFC3339 timestamps +scripts/axiom-metrics-query prod --start "2025-06-01T00:00:00Z" --end "2025-06-01T01:00:00Z" <<< "my-dataset:cpu.usage | align to 5m using avg" + +# Wrong: relative time (will fail) +scripts/axiom-metrics-query prod --start "now-1h" <<< "my-dataset:cpu.usage | align to 5m using avg" +``` + +Always use `--range` or explicit `--start`/`--end` with the query script. + +--- + +## Discovery + +Use `scripts/axiom-metrics-discover` to explore metrics, tags, and tag values. Defaults to last 1 hour. 
+ +```bash +# List all metrics +scripts/axiom-metrics-discover prod otel-metrics metrics + +# List all tags +scripts/axiom-metrics-discover prod otel-metrics tags + +# List values for a tag +scripts/axiom-metrics-discover prod otel-metrics tag-values service.name + +# List tags for a specific metric +scripts/axiom-metrics-discover prod otel-metrics metric-tags http.server.request.duration + +# List tag values for a specific metric+tag +scripts/axiom-metrics-discover prod otel-metrics metric-tag-values http.server.request.duration service.name + +# Find metrics matching a tag value (fastest path from "I know the service" to "what metrics exist") +scripts/axiom-metrics-discover prod otel-metrics search "api-gateway" + +# Custom time range +scripts/axiom-metrics-discover prod otel-metrics --range 24h metrics +scripts/axiom-metrics-discover prod otel-metrics --start 2025-06-01T00:00:00Z --end 2025-06-02T00:00:00Z tags +``` + +Under the hood this calls `/v1/query/metrics/info/` endpoints via `scripts/axiom-api`. For raw access, see the API paths in the script header. + +--- + +## Query Patterns + +### CPU usage by service + +```mpl +otel-metrics:system.cpu.utilization | align to 5m using avg | group by service.name +``` + +### Request rate + +```mpl +otel-metrics:http.server.request.duration | align to 1m using count | group by service.name +``` + +### Error rate from metrics + +```mpl +otel-metrics:http.server.request.duration | filter http.status_code >= 500 | align to 5m using count | group by service.name +``` + +### Memory utilization + +```mpl +otel-metrics:process.runtime.go.mem.heap_alloc | align to 5m using avg | group by service.name +``` + +### Histogram percentiles (p99 latency) + +```mpl +otel-metrics:http.server.request.duration | align to 5m using avg | bucket percentile(0.99) | group by service.name +``` + +### Filter by service.name + +```mpl +otel-metrics:http.server.request.duration | filter service.name == "api-gateway" | align to 1m using avg +``` + +### Combine filter and group + +```mpl +otel-metrics:http.server.request.duration | filter service.namespace == "production" | align to 5m
using count | group by service.name, http.method +``` + +Note: Metric and tag names depend on the OTel instrumentation. Use the discovery endpoints to find the actual names in your datasets. + +--- + +## Error Handling + +| Code | Meaning | Action | +|------|---------|--------| +| 400 | Bad query syntax or invalid dataset | Check MPL syntax via `--spec` flag | +| 401 | Missing or invalid authentication | Verify `AXIOM_TOKEN` is set and valid | +| 403 | No permission to query this dataset | Check token scopes | +| 404 | Dataset not found | Verify dataset name via `scripts/init` | +| 429 | Rate limited | Back off and retry | +| 500 | Internal server error | Report `x-axiom-trace-id` to backend team | + +On **500 errors**: the query script captures the `x-axiom-trace-id` response header automatically. Report this trace ID — it is essential for backend debugging. + +On **400 errors**: the most common cause is invalid MPL syntax. Fetch the spec (`--spec`) and compare your query against it. Common mistakes: +- Using relative time expressions (`ago()`, `now()`) +- Missing `align` operator (most queries need one) +- Wrong metric or tag names (use discovery endpoints to verify) + +--- + +## Workflow + +1. **Identify metrics datasets.** Run `scripts/init` — Axiom deployments list their datasets, including `otel-metrics-v1` types. + +2. **Learn MPL syntax.** Run `scripts/axiom-metrics-query <deployment> --spec` to get the full language specification. Read it before writing queries. + +3. **Discover available metrics.** Use `scripts/axiom-metrics-discover` to list metrics and tags in the target dataset. If you know a service name, use its `search` command to find matching metrics. + +4. **Compose and execute MPL query.** Build the query incrementally — start with the metric, add `align`, then `filter`/`group` as needed. + +5. **Iterate.** Refine filters, aggregations, and time ranges based on results. Narrow the time window for faster responses.
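The "compose incrementally" workflow above can be sketched in plain shell: build the pipeline string one operator at a time, inspect it, then hand it to the query script. Dataset, metric, and tag names here are illustrative, and `prod` is a hypothetical deployment name — verify yours via discovery first.

```shell
#!/usr/bin/env bash
# Sketch: incrementally compose an MPL query (hypothetical names throughout).
set -euo pipefail

q="otel-metrics:http.server.request.duration"   # 1. bare metric selector
q="$q | align to 5m using avg"                  # 2. most queries need align
q="$q | filter service.name == \"api-gateway\"" # 3. narrow by tag
q="$q | group by http.method"                   # 4. split the series

printf '%s\n' "$q"
# Once the string looks right, pipe it to the query script:
# scripts/axiom-metrics-query prod --range 1h <<< "$q"
```

Building the string in a variable keeps each refinement cheap to test: rerun after every appended operator and widen the pipeline only when the previous step returns sensible data.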
diff --git a/skill/scripts/axiom-metrics-discover b/skill/scripts/axiom-metrics-discover new file mode 100755 index 0000000..ab4ca14 --- /dev/null +++ b/skill/scripts/axiom-metrics-discover @@ -0,0 +1,163 @@ +#!/usr/bin/env bash +# Axiom MetricsDB info endpoint helper - discover metrics, tags, and tag values +# +# Usage: axiom-metrics-discover <deployment> <dataset> [options] <command> [args...] +# +# Commands: +# metrics List all metrics in dataset +# tags List all tags in dataset +# tag-values <tag> List values for a tag +# metric-tags <metric> List tags for a metric +# metric-tag-values <metric> <tag> List tag values for metric+tag +# search <value> Find metrics matching a tag value (POST) +# +# Options: +# --range <range> Time range from now (e.g. 1h, 24h, 7d). Default: 1h +# --start <rfc3339> Start time (RFC3339) +# --end <rfc3339> End time (RFC3339) +# +# Examples: +# axiom-metrics-discover prod otel-metrics metrics +# axiom-metrics-discover prod otel-metrics --range 24h tags +# axiom-metrics-discover prod otel-metrics tag-values service.name +# axiom-metrics-discover prod otel-metrics metric-tags http.server.request.duration +# axiom-metrics-discover prod otel-metrics metric-tag-values http.server.request.duration service.name +# axiom-metrics-discover prod otel-metrics search "api-gateway" + +set -euo pipefail + +if [[ $# -lt 3 ]]; then + echo "Usage: axiom-metrics-discover <deployment> <dataset> [options] <command> [args...]" >&2 + exit 1 +fi + +DEPLOYMENT="$1" +DATASET="$2" +shift 2 + +START_TIME="${START_TIME:-}" +END_TIME="${END_TIME:-}" +RANGE="${RANGE:-}" + +# Parse options before command +while [[ $# -gt 0 ]]; do + case "$1" in + --start) + START_TIME="$2" + shift 2 + ;; + --end) + END_TIME="$2" + shift 2 + ;; + --range) + RANGE="$2" + shift 2 + ;; + -*) + echo "Error: Unknown option '$1'." >&2 + exit 1 + ;; + *) + break + ;; + esac +done + +if [[ $# -lt 1 ]]; then + echo "Error: No command specified. Use: metrics, tags, tag-values, metric-tags, metric-tag-values, search."
>&2 + exit 1 +fi + +COMMAND="$1" +shift + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# shellcheck disable=SC1091 +source "$SCRIPT_DIR/lib-time" + +# Validate time arguments +if [[ -n "$RANGE" && ( -n "$START_TIME" || -n "$END_TIME" ) ]]; then + echo "Error: --range cannot be combined with --start/--end." >&2 + exit 1 +fi + +if [[ -n "$RANGE" ]]; then + START_TIME=$(range_to_rfc3339 "$RANGE") || exit 1 + END_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ) || exit 1 + if [[ -z "$START_TIME" || -z "$END_TIME" ]]; then + echo "Error: Failed to compute time range from '$RANGE'." >&2 + exit 1 + fi +elif [[ -n "$START_TIME" && -n "$END_TIME" ]]; then + : # explicit start/end provided +elif [[ -n "$START_TIME" || -n "$END_TIME" ]]; then + echo "Error: Both --start and --end are required when specifying explicit times." >&2 + exit 1 +else + # Default to 1h + START_TIME=$(range_to_rfc3339 "1h") || exit 1 + END_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ) || exit 1 + if [[ -z "$START_TIME" || -z "$END_TIME" ]]; then + echo "Error: Failed to compute default time range." >&2 + exit 1 + fi +fi + +# URL-encode a path segment +uriencode() { + jq -rn --arg x "$1" '$x|@uri' +} + +DATASET_ENC=$(uriencode "$DATASET") +START_ENC=$(uriencode "$START_TIME") +END_ENC=$(uriencode "$END_TIME") +BASE="/v1/query/metrics/info/datasets/${DATASET_ENC}" +QS="start=${START_ENC}&end=${END_ENC}" + +case "$COMMAND" in + metrics) + "$SCRIPT_DIR/axiom-api" "$DEPLOYMENT" GET "${BASE}/metrics?${QS}" | jq . + ;; + tags) + "$SCRIPT_DIR/axiom-api" "$DEPLOYMENT" GET "${BASE}/tags?${QS}" | jq . + ;; + tag-values) + if [[ $# -lt 1 ]]; then + echo "Error: tag-values requires a argument." >&2 + exit 1 + fi + TAG_ENC=$(uriencode "$1") + "$SCRIPT_DIR/axiom-api" "$DEPLOYMENT" GET "${BASE}/tags/${TAG_ENC}/values?${QS}" | jq . + ;; + metric-tags) + if [[ $# -lt 1 ]]; then + echo "Error: metric-tags requires a argument." 
>&2 + exit 1 + fi + METRIC_ENC=$(uriencode "$1") + "$SCRIPT_DIR/axiom-api" "$DEPLOYMENT" GET "${BASE}/metrics/${METRIC_ENC}/tags?${QS}" | jq . + ;; + metric-tag-values) + if [[ $# -lt 2 ]]; then + echo "Error: metric-tag-values requires and arguments." >&2 + exit 1 + fi + METRIC_ENC=$(uriencode "$1") + TAG_ENC=$(uriencode "$2") + "$SCRIPT_DIR/axiom-api" "$DEPLOYMENT" GET "${BASE}/metrics/${METRIC_ENC}/tags/${TAG_ENC}/values?${QS}" | jq . + ;; + search) + if [[ $# -lt 1 ]]; then + echo "Error: search requires a argument." >&2 + exit 1 + fi + BODY=$(jq -nc --arg v "$1" '{"value": $v}') + "$SCRIPT_DIR/axiom-api" "$DEPLOYMENT" POST "${BASE}/metrics?${QS}" "$BODY" | jq . + ;; + *) + echo "Error: Unknown command '$COMMAND'. Use: metrics, tags, tag-values, metric-tags, metric-tag-values, search." >&2 + exit 1 + ;; +esac diff --git a/skill/scripts/axiom-metrics-query b/skill/scripts/axiom-metrics-query new file mode 100755 index 0000000..1318575 --- /dev/null +++ b/skill/scripts/axiom-metrics-query @@ -0,0 +1,159 @@ +#!/usr/bin/env bash +# Axiom MetricsDB MPL query helper - reads query from stdin +# +# Usage: axiom-metrics-query [options] <<< "mpl query" +# +# Options: +# --start Start time (RFC3339, e.g. 2025-01-01T00:00:00Z) +# --end End time (RFC3339, e.g. 2025-01-02T00:00:00Z) +# --range Convenience range from now (e.g. 1h, 24h, 7d) +# --trace Print x-axiom-trace-id on success +# --spec Fetch MPL language specification (no query needed) +# +# Time: Either (--start + --end) or --range is required (not both). +# MPL does NOT support relative time expressions — RFC3339 only. 
+#
+# Examples:
+#   axiom-metrics-query prod --range 1h <<< "dataset:metric.name | align to 5m using avg"
+#   axiom-metrics-query prod --start 2025-01-01T00:00:00Z --end 2025-01-02T00:00:00Z <<< "dataset:cpu.usage"
+#   axiom-metrics-query prod --spec
+
+set -euo pipefail
+
+if [[ $# -lt 1 ]]; then
+  echo "Usage: axiom-metrics-query <deployment> [options] <<< 'mpl query'" >&2
+  exit 1
+fi
+
+DEPLOYMENT="$1"
+shift
+
+PRINT_TRACE=false
+FETCH_SPEC=false
+START_TIME="${START_TIME:-}"
+END_TIME="${END_TIME:-}"
+RANGE="${RANGE:-}"
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --start)
+      START_TIME="$2"
+      shift 2
+      ;;
+    --end)
+      END_TIME="$2"
+      shift 2
+      ;;
+    --range)
+      RANGE="$2"
+      shift 2
+      ;;
+    --trace)
+      PRINT_TRACE=true
+      shift
+      ;;
+    --spec)
+      FETCH_SPEC=true
+      shift
+      ;;
+    *)
+      echo "Error: Unknown argument '$1'. Queries must be passed via stdin." >&2
+      exit 1
+      ;;
+  esac
+done
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+
+# Load config from unified config file
+# shellcheck disable=SC1090
+eval "$("$SCRIPT_DIR/config" axiom "$DEPLOYMENT")"
+
+RESP_HEADERS=$(mktemp)
+RESP_BODY=$(mktemp)
+cleanup() {
+  rm -f "$RESP_HEADERS" "$RESP_BODY"
+}
+trap cleanup EXIT
+
+# --spec: fetch MPL language specification via OPTIONS and exit
+if [[ "$FETCH_SPEC" == true ]]; then
+  HTTP_CODE=$(curl -sS -o "$RESP_BODY" -D "$RESP_HEADERS" -w "%{http_code}" \
+    -X OPTIONS "$AXIOM_URL/v1/query/_metrics" \
+    -H "Authorization: Bearer $AXIOM_TOKEN" \
+    -H "X-Axiom-Org-Id: $AXIOM_ORG_ID")
+
+  if [[ "$HTTP_CODE" -lt 200 || "$HTTP_CODE" -ge 300 ]]; then
+    msg=$(jq -r '.message // empty' "$RESP_BODY" 2>/dev/null)
+    trace=$(grep -i '^x-axiom-trace-id:' "$RESP_HEADERS" | tail -1 | awk '{print $2}' | tr -d '\r')
+    echo "error: ${msg:-http $HTTP_CODE}" >&2
+    if [[ -n "$trace" ]]; then
+      echo "trace_id: $trace" >&2
+    fi
+    exit 1
+  fi
+
+  cat "$RESP_BODY"
+  exit 0
+fi
+
+# Require query from stdin
+if [[ -t 0 ]]; then
+  echo "Error: No query provided. Pipe a query to stdin."
>&2 + echo "" >&2 + echo "Examples:" >&2 + echo " axiom-metrics-query $DEPLOYMENT --range 1h <<< \"dataset:metric.name | align to 5m using avg\"" >&2 + exit 1 +fi + +# shellcheck disable=SC1091 +source "$SCRIPT_DIR/lib-time" + +# Validate time arguments +if [[ -n "$RANGE" && ( -n "$START_TIME" || -n "$END_TIME" ) ]]; then + echo "Error: --range cannot be combined with --start/--end." >&2 + exit 1 +fi + +if [[ -n "$RANGE" ]]; then + START_TIME=$(range_to_rfc3339 "$RANGE") || exit 1 + END_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ) || exit 1 + if [[ -z "$START_TIME" || -z "$END_TIME" ]]; then + echo "Error: Failed to compute time range from '$RANGE'." >&2 + exit 1 + fi +elif [[ -z "$START_TIME" || -z "$END_TIME" ]]; then + echo "Error: Either (--start + --end) or --range is required." >&2 + exit 1 +fi + +APL=$(cat) +APL_JSON=$(printf '%s' "$APL" | jq -Rs .) +START_JSON=$(printf '%s' "$START_TIME" | jq -Rs .) +END_JSON=$(printf '%s' "$END_TIME" | jq -Rs .) + +HTTP_CODE=$(curl -sS -o "$RESP_BODY" -D "$RESP_HEADERS" -w "%{http_code}" \ + -X POST "$AXIOM_URL/v1/query/_metrics?format=metrics-v1" \ + -H "Authorization: Bearer $AXIOM_TOKEN" \ + -H "X-Axiom-Org-Id: $AXIOM_ORG_ID" \ + -H "Content-Type: application/json" \ + -d "{\"apl\": $APL_JSON, \"startTime\": $START_JSON, \"endTime\": $END_JSON}") + +if [[ "$HTTP_CODE" -lt 200 || "$HTTP_CODE" -ge 300 ]]; then + msg=$(jq -r '.message // empty' "$RESP_BODY" 2>/dev/null) + trace=$(grep -i '^x-axiom-trace-id:' "$RESP_HEADERS" | tail -1 | awk '{print $2}' | tr -d '\r') + echo "error: ${msg:-http $HTTP_CODE}" >&2 + if [[ -n "$trace" ]]; then + echo "trace_id: $trace" >&2 + fi + exit 1 +fi + +if [[ "$PRINT_TRACE" == true ]]; then + trace=$(grep -i '^x-axiom-trace-id:' "$RESP_HEADERS" | tail -1 | awk '{print $2}' | tr -d '\r') + if [[ -n "$trace" ]]; then + echo "trace_id: $trace" >&2 + fi +fi + +cat "$RESP_BODY" diff --git a/skill/scripts/discover-axiom b/skill/scripts/discover-axiom index b92f060..f324b22 100755 --- 
a/skill/scripts/discover-axiom +++ b/skill/scripts/discover-axiom @@ -30,6 +30,48 @@ echo -e "${BLUE}=== Axiom Deployments ===${NC}" TMP_DIR=$(mktemp -d) trap 'rm -rf "$TMP_DIR"' EXIT +# Cache config +CACHE_DIR="${GILFOYLE_CONFIG_DIR:-$HOME/.config/gilfoyle}/cache/axiom" +CACHE_TTL=600 # 10 minutes +mkdir -p "$CACHE_DIR" + +# Get file mtime as epoch seconds (Linux first, then macOS) +# GNU stat -f means --file-system, not format — must try GNU form first +file_mtime() { + local f="$1" + stat -c %Y "$f" 2>/dev/null || stat -f %m "$f" 2>/dev/null +} + +# Fetch /v1/datasets with per-deployment caching +get_catalog() { + local dep="$1" + local cache_file="$CACHE_DIR/$dep/datasets.json" + + if [[ "${GILFOYLE_NO_CACHE:-}" != "1" && -f "$cache_file" ]]; then + local now mtime age + now=$(date +%s) + mtime=$(file_mtime "$cache_file") + age=$(( now - mtime )) + if [[ "$age" -lt "$CACHE_TTL" ]]; then + cat "$cache_file" + return + fi + fi + + local data + data=$("$SCRIPT_DIR/axiom-api" "$dep" GET "/v1/datasets" 2>/dev/null || echo "") + + if [[ -n "$data" ]]; then + mkdir -p "$CACHE_DIR/$dep" + local tmp_file="$cache_file.tmp.$$" + printf '%s' "$data" > "$tmp_file" + chmod 600 "$tmp_file" + mv "$tmp_file" "$cache_file" + fi + + printf '%s' "$data" +} + # Helper for millisecond timestamp using Bash built-in current_time_ms() { # EPOCHREALTIME is available in Bash 5.0+ @@ -60,11 +102,47 @@ discover_dep() { if [[ -n "$POPULAR_DATASETS" ]]; then count=$(echo "$POPULAR_DATASETS" | grep -c .) 
- echo -e " ${GREEN}Top datasets found ($count)${NC} (${DURATION_QUERY}ms)" - echo "$POPULAR_DATASETS" | sed 's/^/ - /' + + # Fetch dataset catalog to identify MetricsDB datasets + catalog=$(get_catalog "$dep") + metrics_set=$(echo "$catalog" | jq -r '.[] | select(.kind == "otel:metrics:v1") | .name' 2>/dev/null || echo "") + + END_CATALOG=$(current_time_ms) + DURATION_CATALOG=$(( END_CATALOG - END_QUERY )) + + echo -e " ${GREEN}Top datasets found ($count)${NC} (query: ${DURATION_QUERY}ms, catalog: ${DURATION_CATALOG}ms)" + + # Tag popular datasets: [MPL] for MetricsDB, plain for EventDB + while IFS= read -r ds; do + if echo "$metrics_set" | grep -qxF "$ds"; then + echo " - [MPL] $ds" + else + echo " - $ds" + fi + done <<< "$POPULAR_DATASETS" + + # Surface MetricsDB datasets not in the popular list + if [[ -n "$metrics_set" ]]; then + unlisted="" + while IFS= read -r mds; do + if ! echo "$POPULAR_DATASETS" | grep -qxF "$mds"; then + unlisted="${unlisted:+$unlisted +}$mds" + fi + done <<< "$metrics_set" + + metrics_total=$(echo "$metrics_set" | grep -c .) + if [[ -n "$unlisted" ]]; then + unlisted_count=$(echo "$unlisted" | grep -c .) 
+ echo -e " ${GREEN}MetricsDB datasets ($metrics_total total, $unlisted_count not in top):${NC}" + echo "$unlisted" | sort | head -n 10 | sed 's/^/ - [MPL] /' || true + else + echo -e " ${GREEN}MetricsDB datasets ($metrics_total total, all in top list)${NC}" + fi + fi else # Strategy 2: Fallback - response=$("$SCRIPT_DIR/axiom-api" "$dep" GET "/v1/datasets" 2>/dev/null || echo "") + response=$(get_catalog "$dep") END_FALLBACK=$(current_time_ms) DURATION_FALLBACK=$(( END_FALLBACK - END_QUERY )) @@ -72,11 +150,27 @@ discover_dep() { if [[ "$count" -gt 0 ]]; then echo -e " ${GREEN}$count datasets found${NC} (query: ${DURATION_QUERY}ms, fallback: ${DURATION_FALLBACK}ms)" - echo "$response" | jq -r '.[] | " - " + .name' | sort | head -n 10 + + # Identify MetricsDB datasets (otel-metrics-v1) + metrics_datasets=$(echo "$response" | jq -r '.[] | select(.kind == "otel:metrics:v1") | .name' 2>/dev/null || echo "") + + # Tag MetricsDB datasets inline, consistent with Strategy 1 + echo "$response" | jq -r '.[] | .name' | sort | while IFS= read -r ds; do + if [[ -n "$metrics_datasets" ]] && echo "$metrics_datasets" | grep -qxF "$ds"; then + echo " - [MPL] $ds" + else + echo " - $ds" + fi + done | head -n 10 || true if [[ "$count" -gt 10 ]]; then echo " - ... (and $((count - 10)) more)" echo -e " ${BOLD}To search:${NC} scripts/axiom-api $dep GET \"/v1/datasets\" | jq -r '.[].name' | grep \"pattern\"" fi + + if [[ -n "$metrics_datasets" ]]; then + metrics_count=$(echo "$metrics_datasets" | grep -c .) 
+ echo -e " ${GREEN}MetricsDB datasets ($metrics_count total)${NC}" + fi else echo -e " ${RED}No datasets found or auth failed${NC} (total: $((DURATION_QUERY + DURATION_FALLBACK))ms)" fi diff --git a/skill/scripts/lib-time b/skill/scripts/lib-time new file mode 100755 index 0000000..3d2001a --- /dev/null +++ b/skill/scripts/lib-time @@ -0,0 +1,48 @@ +#!/usr/bin/env bash +# Shared time utilities for Gilfoyle scripts +# Source this file: source "$SCRIPT_DIR/lib-time" + +# range_to_rfc3339 converts a human range (e.g. 1h, 24h, 7d) to an RFC3339 timestamp that many seconds ago +range_to_rfc3339() { + local range="$1" + local value="${range%[smhd]}" + local suffix="${range: -1}" + + if ! [[ "$value" =~ ^[0-9]+$ ]]; then + echo "Error: Invalid range value '$range'. Expected number + suffix (s/m/h/d)." >&2 + return 1 + fi + + local label + case "$suffix" in + s) label="second" ;; + h) label="hour" ;; + d) label="day" ;; + m) label="minute" ;; + *) + echo "Error: Invalid range suffix '$suffix'. Use s (seconds), m (minutes), h (hours), or d (days)." >&2 + return 1 + ;; + esac + + # Pluralize for values other than 1 + if [[ "$value" -ne 1 ]]; then + label="${label}s" + fi + + # Try GNU date first (linux, or gdate on macOS), then fall back to macOS date + if date -u -d "1 hour ago" +%Y-%m-%dT%H:%M:%SZ &>/dev/null; then + # GNU date + date -u -d "$value $label ago" +%Y-%m-%dT%H:%M:%SZ + else + # macOS date: -v flag with uppercase suffix + local date_flag + case "$suffix" in + s) date_flag="-v-${value}S" ;; + h) date_flag="-v-${value}H" ;; + d) date_flag="-v-${value}d" ;; + m) date_flag="-v-${value}M" ;; + esac + date -u "$date_flag" +%Y-%m-%dT%H:%M:%SZ + fi +}
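The caller pattern for `lib-time` is the same in both query scripts: convert a `--range` value to an RFC3339 start time, pair it with "now" as the end time. A minimal sketch of that pattern, restating only the GNU-date branch of `range_to_rfc3339` for illustration (the real helper above also handles macOS `date -v`):

```shell
# Illustrative re-statement of the GNU-date branch; assumes GNU date.
range_to_rfc3339() {
  local value="${1%[smhd]}" suffix="${1: -1}" label
  [[ "$value" =~ ^[0-9]+$ ]] || return 1
  case "$suffix" in
    s) label=seconds ;; m) label=minutes ;; h) label=hours ;; d) label=days ;;
    *) return 1 ;;
  esac
  date -u -d "$value $label ago" +%Y-%m-%dT%H:%M:%SZ
}

# Caller pattern used by axiom-metrics-query and axiom-metrics-discover:
START_TIME=$(range_to_rfc3339 24h) || exit 1
END_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
echo "window: $START_TIME .. $END_TIME"
```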