Draft
33 commits
74f4365
feat(grep): integrate VikingDB bm25 keyword search for grep engine
ByteDanceLiuYang May 20, 2026
af0423e
fix(grep): address CI review feedback: max-size eviction to _count_ca…
ByteDanceLiuYang May 20, 2026
0a4f7c3
fix(schema): use dynamic __version__ for schema_version and handle de…
ByteDanceLiuYang May 21, 2026
416aced
fix(schema): upsert data to vikingdb lack of content
ByteDanceLiuYang May 21, 2026
84bdd49
chore: add benchmark for retrieval
ByteDanceLiuYang May 22, 2026
df5a376
fix(grep): vikingdb return 200 and no results means no matching conte…
ByteDanceLiuYang May 25, 2026
9d07865
fix(benchmark): sub uri args; add report
ByteDanceLiuYang May 25, 2026
7be7697
refactor: code format by ruff
ByteDanceLiuYang May 25, 2026
261aaf2
optimize: move grep config (engine and switch_to_remote_threshold) to…
ByteDanceLiuYang May 26, 2026
3ea7b3f
optimize: auto adapt remote_return_limit by agg API; rm unnecessary p…
ByteDanceLiuYang May 26, 2026
1c65d64
fix: adjust benchmark scripts
ByteDanceLiuYang May 27, 2026
f9b4065
fix(grep): store full content for BM25; use PathScope depth; reduce r…
ByteDanceLiuYang May 27, 2026
240fd27
refactor: new benchmark
ByteDanceLiuYang May 28, 2026
7337653
fix: step1 add resource by real code data
ByteDanceLiuYang May 29, 2026
a03c61b
feat(benchmark): split grep benchmark into effectiveness/performance …
ByteDanceLiuYang May 29, 2026
6a80745
optimize (benchmark): adjust keywords and ground truth for testing
ByteDanceLiuYang Jun 1, 2026
599ae64
fix: truncate 64KB for content field
ByteDanceLiuYang Jun 1, 2026
ae9c078
Merge branch 'main' into grep_vikingdb
ByteDanceLiuYang Jun 1, 2026
f13617b
optimize: effectiveness add resource plainly
ByteDanceLiuYang Jun 2, 2026
0fe3c1e
optimize: change param use of SearchByKeywords from "keywords" to "qu…
ByteDanceLiuYang Jun 3, 2026
02d18d8
optimize(benchmark): refactor effectiveness scripts
ByteDanceLiuYang Jun 4, 2026
a6aefef
Merge branch 'main' into grep_vikingdb
ByteDanceLiuYang Jun 10, 2026
0c8eef0
optimize: ensure raw data for content field
ByteDanceLiuYang Jun 11, 2026
a570241
Merge branch 'main' into grep_vikingdb
ByteDanceLiuYang Jun 11, 2026
4b8481a
optimize: fulltext analyzer's stop-words only use symbols
ByteDanceLiuYang Jun 11, 2026
3ae5dbf
fix: adapt to new ov cli for benchmark
ByteDanceLiuYang Jun 12, 2026
211d04b
optimize: reuse file content to avoid re-read AGFS file
ByteDanceLiuYang Jun 12, 2026
8f0a7dc
Merge branch 'main' into grep_vikingdb
ByteDanceLiuYang Jun 12, 2026
b132249
optimize: tune grep vikingdb defaults and refresh bm25 benchmark scripts
ByteDanceLiuYang Jun 15, 2026
f135133
optimize: benchmark client timeout
ByteDanceLiuYang Jun 17, 2026
4ec3ec9
update README
ByteDanceLiuYang Jun 17, 2026
75a5483
Merge branch 'main' into grep_vikingdb
ByteDanceLiuYang Jun 23, 2026
98424e1
fix: rm unused param
ByteDanceLiuYang Jun 23, 2026
4 changes: 3 additions & 1 deletion benchmark/locomo/vikingbot/preflight_eval_config.py
```diff
@@ -64,7 +64,9 @@ def _resolve_ov_conf_path() -> Path:
         return Path(configured_path).expanduser()
 
     resolved = resolve_config_path(None, OPENVIKING_CONFIG_ENV, DEFAULT_OV_CONF)
-    default_path = str(resolved) if resolved is not None else str(Path.home() / ".openviking" / "ov.conf")
+    default_path = (
+        str(resolved) if resolved is not None else str(Path.home() / ".openviking" / "ov.conf")
+    )
 
     if _is_interactive():
         _log_info(f"OpenViking 配置默认路径: {default_path}")
```
118 changes: 118 additions & 0 deletions benchmark/retrieval/grep/vikingdb_bm25/README.md
@@ -0,0 +1,118 @@
# VikingDB BM25 Grep Benchmark

Benchmark suite for evaluating OpenViking's grep retrieval with the VikingDB BM25 engine.

## Directory Structure

```
vikingdb_bm25/
├── ai_wiki.txt              # Source text for synthetic data generation
├── effectiveness/           # Retrieval effectiveness (recall/precision/F1)
│   ├── step1_add_resource.py
│   └── step2_quality.py
└── performance/             # Retrieval performance (latency + returned match count at scale)
    ├── step0_prepare_data.py
    ├── step1_add_resource.py
    ├── step2_reindex.py
    └── step3_benchmark.py
```

## Effectiveness — Retrieval Quality

Tests whether grep can find **all** matching files in real code repositories.

**Data source:** Real code repos (download manually, place under `~/.openviking/data/benchmark/`).

| Step | Script | Description |
|------|--------|-------------|
| 1 | `step1_add_resource.py` | Import code repos (with indexing, single import) |
| 2 | `step2_quality.py` | Compare grep results vs ground truth (fs engine, cached) |
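
Step 2 needs engine=fs only once because the ground truth is cached. A minimal sketch of such a read-through cache, assuming a JSON file keyed by keyword; `load_ground_truth` and its `compute` callback are hypothetical names, not the actual code in `step2_quality.py`:

```python
import json
from pathlib import Path
from typing import Callable


def load_ground_truth(
    cache_file: Path,
    keyword: str,
    compute: Callable[[str], list[str]],
    regenerate: bool = False,
) -> list[str]:
    """Return cached ground-truth file URIs for `keyword`, computing them only on a miss.

    Hypothetical helper: `compute` stands in for a grep call made while the
    server runs with "grep": {"engine": "fs"}.
    """
    cache: dict[str, list[str]] = {}
    if cache_file.exists():
        cache = json.loads(cache_file.read_text(encoding="utf-8"))
    if regenerate or keyword not in cache:
        # Sort so the cache file is stable across runs and easy to diff.
        cache[keyword] = sorted(compute(keyword))
        cache_file.write_text(json.dumps(cache, indent=2), encoding="utf-8")
    return cache[keyword]
```

Passing `regenerate=True` mirrors the intent of `--regenerate-ground-truth`; as in the documented flow, the callback must run against a server configured with engine=fs, otherwise the "ground truth" would just echo the engine under test.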

### Usage

```bash
# Step 1: Import repos (with VLM/embedding, single import)
cd effectiveness/
python3 step1_add_resource.py --source ~/.openviking/data/benchmark/OpenViking-main

# Step 2: Evaluate retrieval quality
# First run MUST use engine=fs in ov.conf to generate ground truth cache:
# 1. Set ov.conf: "grep": {"engine": "fs"}
# 2. Restart server
python3 step2_quality.py --keywords grep reindex SyncHTTPClient

# Subsequent runs can use any engine (ground truth is read from cache):
# 1. Set ov.conf: "grep": {"engine": "auto", "switch_to_remote_threshold": 0}
# 2. Restart server
python3 step2_quality.py --keywords grep reindex SyncHTTPClient

# Optional: --regenerate-ground-truth (force recompute, requires engine=fs)
```
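
The effectiveness metrics reduce to set arithmetic over matching files. A minimal sketch, assuming both the grep output and the ground truth are collected as sets of file URIs; `score` is a hypothetical helper, and its empty-set conventions are choices made here rather than taken from `step2_quality.py`:

```python
from typing import Iterable


def score(retrieved: Iterable[str], ground_truth: Iterable[str]) -> dict[str, float]:
    """Precision, recall and F1 of a grep result set against ground-truth file URIs."""
    got, truth = set(retrieved), set(ground_truth)
    hits = len(got & truth)
    # Convention (assumption): an empty result is perfectly precise only if nothing was expected.
    precision = hits / len(got) if got else (1.0 if not truth else 0.0)
    # Convention (assumption): with no expected files, recall is trivially complete.
    recall = hits / len(truth) if truth else 1.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Because the goal stated above is finding **all** matching files, recall is the headline number; precision catches an engine that over-matches (for example, tokenization that splits identifiers such as `SyncHTTPClient` differently from a literal substring search).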

## Performance — Latency at Scale

Tests grep speed and returned match count on a large synthetic dataset (default: 200K files).

**Data source:** Generated from `ai_wiki.txt` with target words injected at known probabilities.

| Step | Script | Description |
|------|--------|-------------|
| 0 | `step0_prepare_data.py` | Generate synthetic dataset (dir_xxx/wiki_xxx.txt) |
| 1 | `step1_add_resource.py` | Import data (no VLM/embedding, fast) |
| 2 | `step2_reindex.py` | Async reindex via openviking-server (concurrency=16, polling) |
| 3 | `step3_benchmark.py` | Measure latency and returned match count with `node_limit=256` |
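
Step 2's "concurrency=16, polling" pattern can be sketched with an `asyncio.Semaphore` that caps in-flight tasks plus a loop that polls each task until it finishes. The `submit` and `is_done` callables are hypothetical stand-ins for the openviking-server reindex request and status check; the real script's API calls are not shown in this PR page:

```python
import asyncio
from typing import Awaitable, Callable


async def reindex_all(
    uris: list[str],
    submit: Callable[[str], Awaitable[str]],
    is_done: Callable[[str], Awaitable[bool]],
    concurrency: int = 16,
    poll_interval: float = 1.0,
) -> list[str]:
    """Reindex every URI with at most `concurrency` tasks in flight.

    Hypothetical sketch: `submit` starts a reindex and returns a task id,
    `is_done` reports whether that task has finished.
    """
    sem = asyncio.Semaphore(concurrency)

    async def one(uri: str) -> str:
        async with sem:  # the slot is held until the task is confirmed done
            task_id = await submit(uri)
            while not await is_done(task_id):
                await asyncio.sleep(poll_interval)
            return uri

    # gather() preserves input order in its result list.
    return list(await asyncio.gather(*(one(u) for u in uris)))
```

Holding the semaphore through the polling loop (not just the submit call) is what bounds server-side work to `concurrency` simultaneous reindex jobs.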

### Target Words

15 words in 5 groups of three. Two groups share the 0.1% probability, so the table covers 4 distinct probabilities.

These word groups are defined in `performance/step0_prepare_data.py` and reused by `performance/step3_benchmark.py`.

| Probability | Words | Expected hits (per 200K files) |
|-------------|-------|-------------------------------|
| 1% | heliofract, prismcache, fluxkernel | ~2,000 |
| 0.1% | auroracode, kiteshade, glyphvector | ~200 |
| 0.1% | cortexmint, latticewave, spiralsync | ~200 |
| 0.05% | ripplehash, embertrace, novaframe | ~100 |
| 0.01% | zephyrloom, quartzrelay, nebulaindex | ~20 |
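
The expected-hit column is probability × file count (for example, 0.01 × 200,000 = 2,000). A minimal sketch of the generation side, assuming each target word is injected into each file independently with its group's probability; `TARGET_GROUPS`, `expected_hits`, and `inject` are hypothetical, and appending words to the end of the text may differ from how `step0_prepare_data.py` actually places them:

```python
import random

# (per-file, per-word injection probability, words), copied from the table above.
# Assumption: each word is injected into each file independently.
TARGET_GROUPS = [
    (0.01, ["heliofract", "prismcache", "fluxkernel"]),
    (0.001, ["auroracode", "kiteshade", "glyphvector"]),
    (0.001, ["cortexmint", "latticewave", "spiralsync"]),
    (0.0005, ["ripplehash", "embertrace", "novaframe"]),
    (0.0001, ["zephyrloom", "quartzrelay", "nebulaindex"]),
]


def expected_hits(probability: float, num_files: int = 200_000) -> float:
    """Expected number of files containing a word injected with `probability`."""
    return probability * num_files


def inject(text: str, rng: random.Random) -> str:
    """Append each target word to `text` with its group's probability (hypothetical placement)."""
    extra = [w for p, words in TARGET_GROUPS for w in words if rng.random() < p]
    return f"{text}\n{' '.join(extra)}" if extra else text
```

Using a seeded `random.Random` keeps the generated dataset reproducible, so observed hit counts can be checked against the expected values rather than drifting between runs.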

### Usage

```bash
cd performance/

# Step 0: Generate data (default: 200 dirs x 1000 files = 200K files)
python3 step0_prepare_data.py

# Optional: append more data for scale-out without overwriting existing dirs
python3 step0_prepare_data.py --start-dir 200 --num-dirs 100

# Step 1: Import without indexing (fast)
python3 step1_add_resource.py

# Step 2: Build vector indexes (requires a running openviking-server)
python3 step2_reindex.py
# Optional: --concurrency N (default: 16)

# Step 3: Benchmark — run with different engine configs
# Run A: fs engine
# 1. Set ov.conf: "grep": {"engine": "fs"}
# 2. Restart server
python3 step3_benchmark.py --engine-label fs

# Run B: auto engine (bm25)
# 1. Set ov.conf: "grep": {"engine": "auto", "switch_to_remote_threshold": 0}
# 2. Restart server
python3 step3_benchmark.py --engine-label auto --compare step3_result_fs.json
```
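
`--compare` contrasts two runs. A minimal sketch of such a comparison, assuming a hypothetical result-file schema of `{keyword: {"latency_ms": [samples...], "match_count": n}}`; the actual format written by `step3_benchmark.py` is not shown in this PR page:

```python
import json
import statistics
from pathlib import Path


def summarize(path: Path) -> dict[str, dict[str, float]]:
    """Median latency (ms) and returned match count per keyword.

    Assumes the hypothetical schema {keyword: {"latency_ms": [...], "match_count": n}}.
    """
    raw = json.loads(path.read_text(encoding="utf-8"))
    return {
        kw: {"p50_ms": statistics.median(r["latency_ms"]), "matches": r["match_count"]}
        for kw, r in raw.items()
    }


def compare(baseline: Path, candidate: Path) -> dict[str, float]:
    """Speedup per shared keyword: baseline median latency / candidate median latency."""
    base, cand = summarize(baseline), summarize(candidate)
    return {
        kw: base[kw]["p50_ms"] / cand[kw]["p50_ms"]
        for kw in sorted(base.keys() & cand.keys())
        if cand[kw]["p50_ms"] > 0
    }
```

A value above 1.0 means the candidate run (e.g. auto/bm25) had a lower median latency than the fs baseline for that keyword. The median is used rather than the mean so that a single slow request does not dominate the comparison.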

## Key Concepts

- **Effectiveness** tests compare grep results against ground truth from fs-engine grep (cached locally)
- **Performance** tests compare grep latency and returned match counts between engine configs; no ground truth is generated
- **Effectiveness** imports real repos with indexing in a single step, then evaluates quality
- **Performance** imports synthetic data without indexing, builds vector indexes asynchronously, then benchmarks latency
- **Performance** import/reindex steps support resumable execution via progress files
- Change grep engine via `ov.conf` and restart the server between benchmark runs
- To horizontally scale the synthetic dataset, run Step 0 again with a new `--start-dir`, then rerun Step 1 and Step 2.
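
Resumable execution via a progress file can be sketched as follows, assuming the progress file is a JSON list of completed item IDs rewritten after each item; `run_resumable` is a hypothetical helper, not the code in `step1_add_resource.py` or `step2_reindex.py`:

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def run_resumable(items: Iterable[str], work: Callable[[str], None], progress_file: Path) -> int:
    """Run `work` on each item not yet recorded in `progress_file`.

    Returns the number of items processed in this run. Hypothetical sketch.
    """
    done: set[str] = set()
    if progress_file.exists():
        done = set(json.loads(progress_file.read_text(encoding="utf-8")))
    processed = 0
    for item in items:
        if item in done:
            continue  # finished in an earlier, possibly interrupted, run
        work(item)
        done.add(item)
        # Persist after every item so an interruption loses at most the in-flight item.
        progress_file.write_text(json.dumps(sorted(done)), encoding="utf-8")
        processed += 1
    return processed
```

Rewriting the whole file after every item keeps the sketch simple but costs O(n) per write; for very large imports an append-only log would be cheaper.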
117 changes: 117 additions & 0 deletions benchmark/retrieval/grep/vikingdb_bm25/README_CN.md
@@ -0,0 +1,117 @@
# VikingDB BM25 Grep Benchmark

Benchmark suite for evaluating OpenViking grep retrieval with the VikingDB BM25 engine.

## Directory Structure

```
vikingdb_bm25/
├── ai_wiki.txt              # Source text for synthetic data generation
├── effectiveness/           # Retrieval effectiveness tests (recall/precision/F1)
│   ├── step1_add_resource.py
│   └── step2_quality.py
└── performance/             # Retrieval performance tests (latency + returned match count at scale)
    ├── step0_prepare_data.py
    ├── step1_add_resource.py
    ├── step2_reindex.py
    └── step3_benchmark.py
```

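## Effectiveness: Retrieval Quality

Tests whether grep can find **all** matching files in real code repositories.

**Data source:** Real code repositories (download manually and place under `~/.openviking/data/benchmark/`).

| Step | Script | Description |
|------|--------|-------------|
| 1 | `step1_add_resource.py` | Import code repositories (with indexing, single import) |
| 2 | `step2_quality.py` | Compare SDK grep results against fs-engine ground truth (cached) |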

### Usage

```bash
# Step 1: Import code repositories (with indexing, single import)
cd effectiveness/
python3 step1_add_resource.py --source ~/.openviking/data/benchmark/OpenViking-main

# Step 2: Evaluate retrieval quality
# The first run MUST use engine=fs to generate the ground-truth cache:
# 1. Set ov.conf: "grep": {"engine": "fs"}
# 2. Restart the server
python3 step2_quality.py --keywords grep reindex SyncHTTPClient

# Subsequent runs can use any engine (ground truth is read from the cache):
# 1. Set ov.conf: "grep": {"engine": "auto", "switch_to_remote_threshold": 0}
# 2. Restart the server
python3 step2_quality.py --keywords grep reindex SyncHTTPClient

# Optional: --regenerate-ground-truth (force recompute, requires engine=fs)
```

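## Performance: Retrieval Latency

Tests grep speed and returned match count on a large synthetic dataset (default: 200K files).

**Data source:** Generated from `ai_wiki.txt`, with target words injected at known probabilities.

| Step | Script | Description |
|------|--------|-------------|
| 0 | `step0_prepare_data.py` | Generate synthetic dataset (dir_xxx/wiki_xxx.txt) |
| 1 | `step1_add_resource.py` | Import data (no indexing, fast) |
| 2 | `step2_reindex.py` | Build indexes asynchronously via openviking-server (concurrency=16, polling) |
| 3 | `step3_benchmark.py` | Measure latency and returned match count with `node_limit=256` |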

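### Target Words

15 words in 5 groups of three. Two groups share the 0.1% probability, so the table covers 4 distinct probabilities.

These word groups are defined in `performance/step0_prepare_data.py` and reused by `performance/step3_benchmark.py`.

| Probability | Words | Expected hits (per 200K files) |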
|------|------|---------------------------|
| 1% | heliofract, prismcache, fluxkernel | ~2,000 |
| 0.1% | auroracode, kiteshade, glyphvector | ~200 |
| 0.1% | cortexmint, latticewave, spiralsync | ~200 |
| 0.05% | ripplehash, embertrace, novaframe | ~100 |
| 0.01% | zephyrloom, quartzrelay, nebulaindex | ~20 |

### Usage

```bash
cd performance/

# Step 0: Generate data (default: 200 dirs x 1000 files = 200K files)
python3 step0_prepare_data.py

# Optional: append more data for horizontal scale-out without overwriting existing dirs
python3 step0_prepare_data.py --start-dir 200 --num-dirs 100

# Step 1: Import data (no indexing, fast)
python3 step1_add_resource.py

# Step 2: Build vector indexes (requires a running openviking-server)
python3 step2_reindex.py
# Optional: --concurrency N (default: 16)

# Step 3: Benchmark, running once per engine config
# Run A: fs engine
# 1. Set ov.conf: "grep": {"engine": "fs"}
# 2. Restart the server
python3 step3_benchmark.py --engine-label fs

# Run B: auto engine (bm25)
# 1. Set ov.conf: "grep": {"engine": "auto", "switch_to_remote_threshold": 0}
# 2. Restart the server
python3 step3_benchmark.py --engine-label auto --compare step3_result_fs.json
```

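## Key Concepts

- **Effectiveness** tests compare grep results against fs-engine ground truth (cached locally)
- **Performance** tests compare latency and returned match counts across engines; no ground truth is generated
- **Effectiveness** imports real code repositories and builds indexes in a single step, then evaluates quality
- **Performance** first imports synthetic data (without indexing), then builds vector indexes asynchronously, and finally runs the latency benchmark
- **Performance** import and reindex steps support **resumable execution** (each step keeps its own progress file)
- To switch the grep engine, edit `ov.conf` and restart the server, then compare results across runs
- To horizontally scale the synthetic dataset, run Step 0 again with a new `--start-dir`, then rerun Step 1 and Step 2.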