added architecture diagram in README #90
|
@Siddhant-K-code I was playing around with Distill and found it an interesting project. It seemed to be missing a high-level architecture diagram, which would make the product quicker to understand.
|
Hey @AmitKarnam, thanks for the PR! I've just pushed a couple of PRs and releases, so you might need to update your diagram!
|
@Siddhant-K-code I have updated the architecture diagram. PTAL
|
@Siddhant-K-code PTAL |
Diagram review

The core dedup flow is accurate: the node labels, field names, and logic all match the code. A few things worth fixing before merging.

❌ The write-time dedup arrow is wrong

The dashed red box ("Write-time dedupe / semantic dedup & conflict detection") has an arrow pointing back into the main pipeline. That connection doesn't exist.

Fix: Remove the arrow. If you want to show the memory subsystem, draw it as a standalone box with a note that it's a separate endpoint.
|
|
|
||
| **Result:** Deterministic, diverse context. No LLM calls. Fully auditable. | ||
|
|
||
| ### Dedup pipeline architecture diagram |
Remove this heading — it duplicates the section it's inside.
The parent section is already `### Dedup pipeline`. Adding another heading called `### Dedup pipeline architecture diagram` immediately inside it is redundant.
Pick one:
- Dedup-only diagram → drop the heading entirely, the image is self-explanatory.
- Full system architecture → move the image above `## Installation` and give it its own `### Architecture` section.
|
|
||
| ### Dedup pipeline architecture diagram | ||
|  | ||
|
|
Extra blank line — one is enough here. Delete this line.
|
Thanks for adding the diagram; visuals help here. Two things to sort out before this merges:

1. What does the diagram actually show?
2. The heading is redundant (see inline comment on line 63).

Once those two are resolved this should be good to go.
|
Diagram accuracy review, checked against the implementation.

The overall flow is correct and the diagram is a useful addition. A few inaccuracies to fix:

❌ Input fields: The diagram shows

❌ Selection node: "by score" is hardcoded, not a general rule. The diagram says "pick representative per cluster (by score)". This is accurate for the current HTTP handler (it hardcodes

❌ MMR node: trigger condition is wrong. The diagram says "Optional MMR — if If Also:

❌ Output: missing fields. The output box shows Also,

✅ Things that are correct
|
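For readers following the review, the two nodes it discusses ("pick representative per cluster (by score)" and the optional MMR step) can be sketched roughly as below. This is only an illustration in Python under stated assumptions, not Distill's implementation: the `Chunk` type, its fields, the function names, the `lam` trade-off parameter, and the use of cosine similarity are all hypothetical and chosen for the example.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Hypothetical chunk record; field names are assumptions for this sketch."""
    id: str
    score: float
    cluster: int
    embedding: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pick_representatives(chunks: List[Chunk]) -> List[Chunk]:
    """Keep the highest-scoring chunk of each cluster (one possible strategy)."""
    best = {}
    for c in chunks:
        current = best.get(c.cluster)
        if current is None or c.score > current.score:
            best[c.cluster] = c
    # Sort by cluster id so the output order is deterministic.
    return [best[k] for k in sorted(best)]


def mmr(candidates: List[Chunk], k: int, lam: float = 0.5) -> List[Chunk]:
    """Greedy maximal marginal relevance over the candidates.

    Each step picks the chunk maximising
    lam * score - (1 - lam) * (max similarity to anything already selected).
    """
    selected: List[Chunk] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(c: Chunk) -> float:
            redundancy = max(
                (cosine(c.embedding, s.embedding) for s in selected),
                default=0.0,
            )
            return lam * c.score - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch, a lower `lam` favours diversity over raw score, while `lam=1.0` reduces the MMR step to plain score ordering. Neither function calls an LLM, which matches the "deterministic, no LLM calls" claim quoted from the README.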
Added an architecture diagram to the README to help readers understand the product.