Like zipping a project, but keeping it searchable by meaning. Ask questions, then open only the files that matter.
Install.rag is a binary file format. It stores original file bytes (zstd-compressed), per-file metadata, and a semantic embedding table in a single archive. Files are preserved byte-for-byte and extractable individually.
rag pack walks a directory, extracts text from each file, computes 384-dimensional embeddings using a local ONNX model, compresses everything, and writes a single .rag file.
rag query reads the manifest and embeddings (typically a few hundred KB), classifies the query, and routes to the cheapest retrieval path. Metadata queries never decompress blobs. Semantic queries decompress only the top-K matching files.
dotRAG is like zipping a project, but making it searchable by meaning.
The original files are still inside the archive. Nothing gets flattened into a vague summary.
When you ask a question, dotRAG opens only the relevant files, so the model sees the same code and docs without scanning the whole folder.
179 MB → 456 KB is a storage reduction: 171 MB was node_modules, excluded by default smart packing. Compression ratio on included source files is 2.69×.
Six retrieval tasks on the same codebase. Compared against a zero-context agent using filesystem tools (Glob, Grep, Read). Token counts measured from response byte sizes.
| Task | dotRAG | Filesystem | Ratio |
|---|---|---|---|
| File discovery | 1,914 | 68,073 | 35.6× |
| Semantic search | 592 | 5,418 | 9.2× |
| File read | 343 | 332 | 1.0× |
| Conceptual query | 503 | 1,429 | 2.8× |
| Exact symbol lookup | 432 | 200 | 0.5× |
| Multi-step workflow | 593 | 3,052 | 5.1× |
Lower token counts on discovery, semantic, and conceptual tasks. Grep is cheaper for exact symbol lookup. The two are complementary.
Cosine similarity over 384-dim embeddings. Queries match files by meaning, not string overlap. Embedding model runs locally via ONNX runtime.
Respects .gitignore. Default exclusions for node_modules, .git, build artifacts, vendor dirs. Configurable include/exclude patterns and max file size.
Per-file blob addressing via byte offsets. Metadata queries read only the manifest. Semantic queries decompress only top-K blobs.
Original file bytes stored with zstd compression. Extracted files are identical to the source. SHA-256 content hashes for verification.
Local embedding model (~80 MB, cached after first download). No network, no API keys, no external services at query time.
Exposes six tools over stdio transport: search, read_file, list_files, list_archives, inspect, extract. Compatible with Claude Code, Cursor, Windsurf.
dotRAG implements the Model Context Protocol over stdio. It exposes six tools: dotrag_search, dotrag_read_file, dotrag_list_files, dotrag_list_archives, dotrag_inspect, dotrag_extract. Configuration for each client below.
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Workspace: .vscode/mcp.json
Global: ~/.cursor/mcp.json | Project: .cursor/mcp.json
~/.codeium/windsurf/mcp_config.json
Add to ~/.config/zed/settings.json
VS Code: ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json
Project: .roo/mcp.json
Workspace: .continue/mcpServers/dotrag.json
Settings → Tools → AI Assistant → Model Context Protocol (MCP) → Add
rag must be on $PATH. If not, use the absolute path (e.g. /usr/local/bin/rag) in the command field.
The .rag format is a binary archive with five sections laid out sequentially. All section offsets are stored in a fixed 512-byte header. No external indexes, no sidecar files.
| Decision | Rationale |
|---|---|
| Embeddings in binary float32, not JSON | 3× smaller archives. A 384-dim vector is 1,536 bytes in binary vs ~4,600 bytes in JSON. |
| Manifest separate from embeddings | Metadata queries (file counts, types, summaries) load only the manifest. Fast, no vector math needed. |
| Per-file blob compression | Random access. Decompress one file without touching the rest. Critical for selective query routing. |
| Fixed 512-byte header | All section offsets readable in a single seek. No scanning, no variable-length preamble. |
| Zstd compression throughout | Best ratio-to-speed tradeoff for mixed content. Decompression is ~1 GB/s on modern hardware. |
| Content hash in header | Integrity verification without reading the full archive. Detect corruption or tampering early. |
When you run rag query, the engine classifies your question and picks the cheapest path:
| Route | What loads | When used |
|---|---|---|
| MANIFEST_ONLY | Header + manifest | "How many files?" / "What types?" / "When was this packed?" |
| SELECTIVE_DECOMPRESS | Header + manifest + embeddings + top-K blobs | "How does auth work?" / "Find the database layer" |
| FULL_SCAN | Everything | Broad questions that need cross-file context |
The format is open. The archive layout and routing model are documented in this section so the file structure is inspectable without any external service.
No runtime dependencies. The embedding model (~80 MB ONNX) downloads on first use and caches at ~/.cache/huggingface/.
| rag pack <dir> | Create a .rag archive from a directory |
| rag query <archive> <q> | Semantic search within an archive |
| rag query-all <q> | Search across all registered archives |
| rag inspect <archive> | View metadata and verify integrity |
| rag extract <archive> | Extract files to disk |
| rag list-files <archive> | List files with summaries |
| rag index | Rebuild the machine-wide archive registry |
| rag mcp | Launch MCP server for AI agent integration |
Not a grep replacement. Exact symbol lookup with line numbers is faster with grep. Semantic search returns top-K approximations, not exhaustive results.
Archives are point-in-time snapshots. Source changes make the archive stale. Repack when the directory changes.
--include-everything exists for forensic use. On real codebases it degrades search quality because generated files dilute the embedding space.