dotRAG
.rag

Like zipping a project, but keeping it searchable by meaning. Ask questions, then open only the files that matter.

What it does

A compressed archive format with built-in search

.rag is a binary file format. It stores original file bytes (zstd-compressed), per-file metadata, and a semantic embedding table in a single archive. Files are preserved byte-for-byte and extractable individually.

rag pack walks a directory, extracts text from each file, computes 384-dimensional embeddings using a local ONNX model, compresses everything, and writes a single .rag file.

rag query reads the manifest and embeddings (typically a few hundred KB), classifies the query, and routes to the cheapest retrieval path. Metadata queries never decompress blobs. Semantic queries decompress only the top-K matching files.

What This Means

dotRAG is like zipping a project, but making it searchable by meaning.

The original files are still inside the archive. Nothing gets flattened into a vague summary.

When you ask a question, dotRAG opens only the relevant files, so the model sees the same code and docs without scanning the whole folder.

Usage
# Pack a project
$ rag pack ./loopsy -o loopsy.rag
Files: 124
Original: 668 KB (179 MB dir, node_modules excluded)
Compressed: 249 KB
Archive: 456 KB

# Search by meaning
$ rag query loopsy.rag "how does peer discovery work?"
Route: SELECTIVE_DECOMPRESS (5 files)
Time: 20ms
packages/discovery/src/mdns.ts 2.33
packages/discovery/src/peer-registry.ts 1.89
packages/discovery/src/health-checker.ts 1.74

# Extract, build, test
$ rag extract loopsy.rag -o /tmp/loopsy
$ cd /tmp/loopsy && pnpm install && pnpm test
26/26 tests passed

Most of the 179 MB → 456 KB reduction comes from exclusion, not compression: 171 MB of the directory was node_modules, which smart packing excludes by default. The compression ratio on the included source files is 2.69×.
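The archive (456 KB) is larger than the compressed file data (249 KB) because it also carries the embedding table, manifest, and index. A back-of-envelope check of that gap, assuming one 384-dimensional float32 vector per file (as the format section describes) and assuming the vectors compress very little (an assumption, not a measured figure):

```python
# Rough size check using the figures printed by `rag pack` in the Usage example.
FILES = 124
DIMS = 384
BYTES_PER_FLOAT32 = 4

# Raw embedding table size before any compression.
embedding_bytes = FILES * DIMS * BYTES_PER_FLOAT32
embedding_kib = embedding_bytes / 1024

compressed_blobs_kb = 249  # "Compressed" line
archive_kb = 456           # "Archive" line

# Assumption: the embedding table is stored at close to its raw size,
# so the remainder approximates header + manifest + blob index.
remainder_kb = archive_kb - compressed_blobs_kb - embedding_kib

print(f"embedding table (raw): {embedding_bytes} bytes, about {embedding_kib:.0f} KiB")
print(f"left for header, manifest, blob index: about {remainder_kb:.0f} KB")
```

The numbers mix KB and KiB loosely, so treat the result as an order-of-magnitude sanity check only.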

Benchmarks

Token cost comparison

Six retrieval tasks on the same codebase, each run with dotRAG and with a zero-context agent using filesystem tools (Glob, Grep, Read). Token counts are derived from response byte sizes.

456 KB archive from a 179 MB directory
16.6× fewer tokens overall
20 ms semantic query time
Task                 dotRAG (tokens)  Filesystem (tokens)  Ratio
File discovery       1,914            68,073               35.6×
Semantic search      592              5,418                9.2×
File read            343              332                  1.0×
Conceptual query     503              1,429                2.8×
Exact symbol lookup  432              200                  0.5×
Multi-step workflow  593              3,052                5.1×

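dotRAG uses fewer tokens on file discovery, semantic search, conceptual queries, and multi-step workflows. Plain file reads cost about the same either way, and grep is cheaper for exact symbol lookup. The two approaches are complementary.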

Properties

What the format provides

Semantic retrieval

Cosine similarity over 384-dim embeddings. Queries match files by meaning, not string overlap. Embedding model runs locally via ONNX runtime.
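As a rough illustration of ranking by cosine similarity, here is a minimal sketch in plain Python. The 3-dimensional toy vectors stand in for real 384-dimensional embeddings, and the function names are hypothetical, not dotRAG's API. Raw cosine values fall in [-1, 1], so the scores printed by rag query in the Usage example are evidently on a different scale; this sketch covers only the similarity ranking itself.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def top_k(query_vec, file_vecs, k=5):
    """Return the k (path, score) pairs with the highest similarity."""
    scored = [(path, cosine_similarity(query_vec, vec))
              for path, vec in file_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Toy embeddings (hypothetical values, 3 dims instead of 384).
file_vecs = {
    "mdns.ts": [0.9, 0.1, 0.0],
    "README.md": [0.1, 0.9, 0.1],
    "health-checker.ts": [0.7, 0.2, 0.1],
}
results = top_k([1.0, 0.0, 0.0], file_vecs, k=2)
for path, score in results:
    print(f"{path} {score:.3f}")
```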

Filtered packing

Respects .gitignore. Default exclusions for node_modules, .git, build artifacts, vendor dirs. Configurable include/exclude patterns and max file size.
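A simplified sketch of the directory-exclusion and size-cap part of filtered packing. The exclusion set and the size limit below are illustrative assumptions, not dotRAG's actual defaults, and .gitignore handling is left out entirely.

```python
import os
import tempfile

# Illustrative values only; the real exclusion list and size cap may differ.
EXCLUDED_DIRS = {"node_modules", ".git", "dist", "build", "vendor"}
MAX_FILE_BYTES = 1_000_000  # hypothetical cap

def collect_files(root, excluded_dirs=EXCLUDED_DIRS, max_bytes=MAX_FILE_BYTES):
    """Return sorted relative paths that survive the directory and size filters."""
    selected = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in excluded_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable entry or broken symlink
            if size <= max_bytes:
                selected.append(os.path.relpath(path, root))
    return sorted(selected)

# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "src"))
    os.makedirs(os.path.join(root, "node_modules", "left-pad"))
    with open(os.path.join(root, "src", "index.ts"), "w") as fh:
        fh.write("export {};\n")
    with open(os.path.join(root, "node_modules", "left-pad", "index.js"), "w") as fh:
        fh.write("module.exports = {};\n")
    found = collect_files(root)

print(found)
```

Pruning the directory list during the walk, instead of filtering paths afterwards, means a large node_modules tree is never traversed at all.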

Selective decompression

Per-file blob addressing via byte offsets. Metadata queries read only the manifest. Semantic queries decompress only top-K blobs.
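The idea of per-file blob addressing can be sketched as follows. This is not the .rag layout itself: zlib stands in for zstd only because it ships with Python's standard library, and the index shape (path mapped to offset and length) is an assumption made for illustration.

```python
import io
import zlib  # stand-in for zstd in this sketch

def build_blob_section(files):
    """Compress each file independently; return (payload, index).

    index maps path -> (offset, compressed_length) within the payload.
    """
    payload = bytearray()
    index = {}
    for path, data in files.items():
        blob = zlib.compress(data)
        index[path] = (len(payload), len(blob))
        payload += blob
    return bytes(payload), index

def read_one(archive, section_start, index, path):
    """Seek to a single blob and decompress only that blob."""
    offset, length = index[path]
    archive.seek(section_start + offset)
    return zlib.decompress(archive.read(length))

files = {
    "src/mdns.ts": b"export const MDNS_PORT = 5353;\n",
    "src/peer-registry.ts": b"export const peers = new Map();\n" * 40,
}
payload, index = build_blob_section(files)

# Pretend the blob section begins right after a 512-byte header.
SECTION_START = 512
archive = io.BytesIO(b"\x00" * SECTION_START + payload)

restored = read_one(archive, SECTION_START, index, "src/peer-registry.ts")
print(restored == files["src/peer-registry.ts"])  # True
```

Because every blob is compressed on its own, reading one file costs one seek and one decompression regardless of how many other files the archive holds.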

Lossless extraction

Original file bytes stored with zstd compression. Extracted files are identical to the source. SHA-256 content hashes for verification.
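Checking an extracted file against a recorded SHA-256 hash takes only a few lines. The sketch below assumes the hash is available as a hex string; where and how the archive stores it is not specified here, and the function names are hypothetical.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

def matches_recorded_hash(data: bytes, recorded_hex: str) -> bool:
    """True if the bytes hash to the recorded value."""
    return sha256_hex(data) == recorded_hex

original = b"export const MDNS_PORT = 5353;\n"
recorded = sha256_hex(original)  # stands in for the hash stored at pack time

print(matches_recorded_hash(original, recorded))            # True
print(matches_recorded_hash(original + b"\n", recorded))    # False
```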

Offline operation

Local embedding model (~80 MB, cached after first download). No network, no API keys, no external services at query time.

MCP server

Exposes six tools over stdio transport: search, read_file, list_files, list_archives, inspect, extract. Compatible with Claude Code, Cursor, Windsurf.

MCP integration

Agent integration via MCP

dotRAG implements the Model Context Protocol over stdio. It exposes six tools: dotrag_search, dotrag_read_file, dotrag_list_files, dotrag_list_archives, dotrag_inspect, dotrag_extract. Configuration for each client below.
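For a sense of what travels over stdio, MCP messages are JSON-RPC 2.0 objects sent one per line, and tools are invoked with the tools/call method. The sketch below builds such a request for dotrag_search. The argument names (archive, query) are hypothetical; the real input schema is whatever the server reports via tools/list. A real session also begins with an initialize handshake, which is omitted here.

```python
import json

def make_tool_call(request_id, tool_name, arguments):
    """Serialize an MCP tools/call request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # The stdio transport expects one message per line with no embedded newlines.
    return json.dumps(message, separators=(",", ":")) + "\n"

line = make_tool_call(
    1,
    "dotrag_search",
    # Hypothetical argument names, for illustration only.
    {"archive": "loopsy.rag", "query": "how does peer discovery work?"},
)
print(line, end="")
```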

Claude Code

$ claude mcp add dotrag -- rag mcp

# project scope (creates .mcp.json, shareable via git)
$ claude mcp add --scope project dotrag -- rag mcp

# user scope (available across all projects)
$ claude mcp add --scope user dotrag -- rag mcp

Claude Desktop

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "dotrag": {
      "command": "rag",
      "args": ["mcp"]
    }
  }
}

VS Code / GitHub Copilot

Workspace: .vscode/mcp.json

{
  "servers": {
    "dotrag": {
      "type": "stdio",
      "command": "rag",
      "args": ["mcp"]
    }
  }
}

Cursor

Global: ~/.cursor/mcp.json  |  Project: .cursor/mcp.json

{
  "mcpServers": {
    "dotrag": {
      "command": "rag",
      "args": ["mcp"]
    }
  }
}

Windsurf

~/.codeium/windsurf/mcp_config.json

{
  "mcpServers": {
    "dotrag": {
      "command": "rag",
      "args": ["mcp"]
    }
  }
}

Zed

Add to ~/.config/zed/settings.json

{
  "context_servers": {
    "dotrag": {
      "command": {
        "path": "rag",
        "args": ["mcp"]
      }
    }
  }
}

Cline

VS Code (macOS): ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json

{
  "mcpServers": {
    "dotrag": {
      "command": "rag",
      "args": ["mcp"],
      "disabled": false
    }
  }
}

Roo Code

Project: .roo/mcp.json

{
  "mcpServers": {
    "dotrag": {
      "command": "rag",
      "args": ["mcp"],
      "disabled": false
    }
  }
}

Continue

Workspace: .continue/mcpServers/dotrag.json

{
  "mcpServers": {
    "dotrag": {
      "type": "stdio",
      "command": "rag",
      "args": ["mcp"]
    }
  }
}

JetBrains (IntelliJ, WebStorm, PyCharm, etc.)

Settings → Tools → AI Assistant → Model Context Protocol (MCP) → Add

Transport: STDIO
Name: dotrag
Command: rag
Arguments: mcp

rag must be on $PATH. If not, use the absolute path (e.g. /usr/local/bin/rag) in the command field.
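One way to apply the PATH advice when generating a client config is to resolve the binary first and fall back to the bare name if it is not found. This is a small sketch using Python's shutil.which; the helper name is hypothetical.

```python
import json
import shutil

def mcp_server_entry(binary="rag"):
    """Build a server entry, preferring the absolute path when the binary is on PATH."""
    resolved = shutil.which(binary)  # None if the binary is not on PATH
    return {"command": resolved or binary, "args": ["mcp"]}

entry = mcp_server_entry()
print(json.dumps({"mcpServers": {"dotrag": entry}}, indent=2))
```

If the printed command is still the bare name, the binary was not found on PATH and the absolute path needs to be filled in by hand.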

.rag file format v1.0

Five contiguous sections

The .rag format is a binary archive with five sections laid out sequentially. All section offsets are stored in a fixed 512-byte header. No external indexes, no sidecar files.

1. Fixed header: 512 bytes. Magic (RAG\x01), version, offsets to all sections, embedding model ID, content hash, creation timestamp.
2. Manifest: Zstd-compressed JSON. Per-file metadata, summaries, entities, categories. No embeddings.
3. Blob index: Zstd-compressed JSON. File IDs mapped to byte offsets for random access into the blob section.
4. Embedding table: Zstd-compressed float32 vectors. 384 dimensions per file, packed binary, not JSON.
5. Blob payloads: Zstd-compressed original file bytes. Independently addressable, decompressed on demand.
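To make the fixed-header idea concrete, here is a sketch that writes and parses a 512-byte header with Python's struct module. Only the 512-byte size and the RAG\x01 magic come from the description above. The field order, integer widths, little-endian encoding, and the omission of the model ID, content hash, and timestamp are all assumptions made for illustration, not the real v1.0 layout.

```python
import struct

HEADER_SIZE = 512
MAGIC = b"RAG\x01"

# Hypothetical layout: magic, major/minor version (u16 each),
# then four u64 section offsets. The rest of the header is zero padding here.
_PREFIX = struct.Struct("<4sHH4Q")
_SECTIONS = ("manifest", "blob_index", "embeddings", "blobs")

def build_header(version, offsets):
    """Pack a header prefix and pad it to the fixed size."""
    major, minor = version
    prefix = _PREFIX.pack(MAGIC, major, minor, *offsets)
    return prefix.ljust(HEADER_SIZE, b"\x00")

def parse_header(raw):
    """Validate the magic and return the version and section offsets."""
    if len(raw) < HEADER_SIZE:
        raise ValueError("truncated header")
    magic, major, minor, *offsets = _PREFIX.unpack_from(raw, 0)
    if magic != MAGIC:
        raise ValueError("not a .rag archive")
    return {"version": (major, minor), "offsets": dict(zip(_SECTIONS, offsets))}

header = build_header((1, 0), (512, 9000, 12000, 200000))
info = parse_header(header)
print(len(header), info["version"], info["offsets"]["blobs"])
```

With a fixed-size header, a reader needs exactly one 512-byte read before it can seek straight to any section.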

Design decisions

Decision                                Rationale
Embeddings in binary float32, not JSON  Roughly 3× smaller embedding table. A 384-dim vector is 1,536 bytes in binary vs ~4,600 bytes in JSON.
Manifest separate from embeddings       Metadata queries (file counts, types, summaries) load only the manifest. Fast, no vector math needed.
Per-file blob compression               Random access. Decompress one file without touching the rest. Critical for selective query routing.
Fixed 512-byte header                   All section offsets readable in a single seek. No scanning, no variable-length preamble.
Zstd compression throughout             Best ratio-to-speed tradeoff for mixed content. Decompression is ~1 GB/s on modern hardware.
Content hash in header                  Integrity verification without reading the full archive. Detect corruption or tampering early.
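The binary-versus-JSON size difference for one vector is easy to reproduce. The sketch below packs 384 pseudo-random values as float32 and serializes the same values as JSON text. The values and their 8-decimal rounding are arbitrary choices, so the JSON size will vary with how the numbers are printed; only the 1,536-byte binary size is fixed.

```python
import json
import random
import struct

DIMS = 384
rng = random.Random(0)  # fixed seed so the run is repeatable

# Arbitrary stand-in values, rounded to 8 decimals to mimic printed float32s.
vector = [round(rng.uniform(-1.0, 1.0), 8) for _ in range(DIMS)]

binary = struct.pack(f"<{DIMS}f", *vector)  # 4 bytes per value
as_json = json.dumps(vector, separators=(",", ":")).encode("utf-8")

print(len(binary))   # 1536
print(len(as_json))  # several times larger; depends on the digits printed
```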

Query routing

When you run rag query, the engine classifies your question and picks the cheapest path:

Route                 What loads                                    When used
MANIFEST_ONLY         Header + manifest                             "How many files?" / "What types?" / "When was this packed?"
SELECTIVE_DECOMPRESS  Header + manifest + embeddings + top-K blobs  "How does auth work?" / "Find the database layer"
FULL_SCAN             Everything                                    Broad questions that need cross-file context
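The routing table can be pictured as a small classifier that maps a query to one of the three routes. How dotRAG actually classifies queries is not described here, so the keyword cues below are purely hypothetical placeholders; only the route names and what each route loads come from the table.

```python
from enum import Enum

class Route(Enum):
    MANIFEST_ONLY = "MANIFEST_ONLY"
    SELECTIVE_DECOMPRESS = "SELECTIVE_DECOMPRESS"
    FULL_SCAN = "FULL_SCAN"

# What each route loads, taken from the routing table.
SECTIONS_LOADED = {
    Route.MANIFEST_ONLY: ["header", "manifest"],
    Route.SELECTIVE_DECOMPRESS: ["header", "manifest", "embeddings", "top-K blobs"],
    Route.FULL_SCAN: ["everything"],
}

# Hypothetical cue phrases; the real classifier is not documented here.
METADATA_CUES = ("how many", "what types", "when was")
BROAD_CUES = ("overall", "across the whole", "entire codebase")

def classify(query):
    """Pick the cheapest route whose cues match, defaulting to semantic search."""
    q = query.lower()
    if any(cue in q for cue in METADATA_CUES):
        return Route.MANIFEST_ONLY
    if any(cue in q for cue in BROAD_CUES):
        return Route.FULL_SCAN
    return Route.SELECTIVE_DECOMPRESS

for question in ("How many files?", "How does auth work?", "Summarize the entire codebase"):
    route = classify(question)
    print(f"{question!r} -> {route.value}: {SECTIONS_LOADED[route]}")
```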

The format is open. The archive layout and routing model are documented in this section so the file structure is inspectable without any external service.

Install

Single static binary

# macOS (Apple Silicon)
curl -L https://github.com/todience/dotrag-releases/releases/latest/download/rag-macos-arm64.tar.gz | tar xz
sudo mv rag /usr/local/bin/

# macOS (Intel)
curl -L https://github.com/todience/dotrag-releases/releases/latest/download/rag-macos-x86_64.tar.gz | tar xz
sudo mv rag /usr/local/bin/

# Linux (x86_64)
curl -L https://github.com/todience/dotrag-releases/releases/latest/download/rag-linux-x86_64.tar.gz | tar xz
sudo mv rag /usr/local/bin/

# Windows (x86_64, PowerShell)
Invoke-WebRequest -Uri https://github.com/todience/dotrag-releases/releases/latest/download/rag-windows-x86_64.zip -OutFile rag.zip
Expand-Archive rag.zip -DestinationPath .
Move-Item rag.exe C:\Windows\System32\ # or add to $env:PATH

No runtime dependencies. The embedding model (~80 MB ONNX) downloads on first use and caches at ~/.cache/huggingface/.

CLI

rag pack <dir>            Create a .rag archive from a directory
rag query <archive> <q>   Semantic search within an archive
rag query-all <q>         Search across all registered archives
rag inspect <archive>     View metadata and verify integrity
rag extract <archive>     Extract files to disk
rag list-files <archive>  List files with summaries
rag index                 Rebuild the machine-wide archive registry
rag mcp                   Launch MCP server for AI agent integration

Limitations

Known tradeoffs

Not a grep replacement. Exact symbol lookup with line numbers is faster with grep. Semantic search returns top-K approximations, not exhaustive results.

Archives are point-in-time snapshots. Source changes make the archive stale. Repack when the directory changes.

--include-everything exists for forensic use. On real codebases it degrades search quality because generated files dilute the embedding space.
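Because archives are point-in-time snapshots, it can help to detect when a repack is due. One possible approach, sketched below, compares SHA-256 hashes recorded at pack time against the current directory. How to read the recorded hashes out of an archive is not covered here, so the recorded mapping is a hypothetical stand-in built with the same helper.

```python
import hashlib
import os
import tempfile

def snapshot(root):
    """Map each relative file path under root to its SHA-256 hex digest."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            hashes[os.path.relpath(path, root)] = digest
    return hashes

def changed_paths(recorded, current):
    """Paths that were added, removed, or modified since the recorded snapshot."""
    all_paths = recorded.keys() | current.keys()
    return sorted(p for p in all_paths if recorded.get(p) != current.get(p))

# Demo: record hashes, edit a file, then compare.
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "mdns.ts")
    with open(target, "w") as fh:
        fh.write("export const port = 5353;\n")
    recorded = snapshot(root)  # stands in for hashes stored at pack time
    with open(target, "a") as fh:
        fh.write("// edited after packing\n")
    stale = changed_paths(recorded, snapshot(root))

print(stale)  # ['mdns.ts']
```

A non-empty result means the archive no longer matches the directory and should be repacked.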