dotRAG - compressed archives with semantic search

What it does

A compressed archive format with built-in search

.rag is a binary file format. It stores original file bytes (zstd-compressed), per-file metadata, and a semantic embedding table in a single archive. Files are preserved byte-for-byte and extractable individually.

rag pack walks a directory, extracts text from each file, computes 384-dimensional embeddings using a local ONNX model, compresses everything, and writes a single .rag file.

rag query reads the manifest and embeddings (typically a few hundred KB), classifies the query, and routes to the cheapest retrieval path. Metadata queries never decompress blobs. Semantic queries decompress only the top-K matching files.

What This Means

dotRAG is like zipping a project, but making it searchable by meaning.

The original files are still inside the archive. Nothing gets flattened into a vague summary.

When you ask a question, dotRAG opens only the relevant files, so the model sees the same code and docs without scanning the whole folder.

Usage

    # Pack a project
    $ rag pack ./loopsy -o loopsy.rag
    Files:       124
    Original:    668 KB (179 MB dir, node_modules excluded)
    Compressed:  249 KB
    Archive:     440 KB

    # Semantic search: find files by meaning
    $ rag query loopsy.rag "how does peer discovery work?"
    Route: SELECTIVE_DECOMPRESS (5 files)
    Time: 20ms
    packages/discovery/src/mdns.ts              2.33
    packages/discovery/src/peer-registry.ts     1.89
    packages/discovery/src/health-checker.ts    1.74

    # Literal search: find exact symbols with line numbers
    $ rag query loopsy.rag "class PeerRegistry" --mode literal
    Route: LITERAL_MATCH (124 files scanned)
    packages/discovery/src/peer-registry.ts:3       export class PeerRegistry {
    packages/discovery/tests/registry.test.ts:8     const registry = new PeerRegistry();

    # Extract, build, test
    $ rag extract loopsy.rag -o /tmp/loopsy
    $ cd /tmp/loopsy && pnpm install && pnpm test
    26/26 tests passed

179 MB → 440 KB. Of the 179 MB directory, 171 MB was node_modules, which is excluded by default. The 440 KB archive is larger than the 249 KB of compressed file data because it also carries the semantic embeddings and metadata; that overhead is the cost of searchability.

Benchmarks

Independently verified retrieval

Tested across 4 real-world repositories (Django, rust-analyzer, Rails, Loopsy) with 14 retrieval queries and 6 MCP agent tasks. All results independently verified — not self-reported.

4.5× fewer tokens vs filesystem

79% Recall@5 across 14 queries

0/24 bytes corrupted on extract

Retrieval accuracy by query category

Category	Queries	Recall@5	MRR
Architecture	4	100%	0.75
Symbol lookup (literal)	4	100%	1.00
Semantic concept	3	67%	0.33
Metadata	3	0%	0.00
Overall (14 queries)	14	79%	0.57
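
For readers unfamiliar with the two metrics in the table, the sketch below shows one common way to compute them. It assumes each query has a single expected file and treats Recall@5 as the fraction of queries whose expected file appears in the top five results; the benchmark's exact scoring rules are not stated here, and the query names and file names are made up for illustration.

```python
from typing import Dict, List


def recall_at_k(ranked: Dict[str, List[str]], expected: Dict[str, str], k: int = 5) -> float:
    """Fraction of queries whose expected file appears in the top-k results."""
    hits = sum(1 for q, want in expected.items() if want in ranked.get(q, [])[:k])
    return hits / len(expected)


def mean_reciprocal_rank(ranked: Dict[str, List[str]], expected: Dict[str, str]) -> float:
    """Average of 1/rank of the expected file; a miss contributes 0."""
    total = 0.0
    for q, want in expected.items():
        results = ranked.get(q, [])
        if want in results:
            total += 1.0 / (results.index(want) + 1)
    return total / len(expected)


# Hypothetical results for three queries (illustrative data only).
ranked = {
    "q1": ["a.py", "b.py", "c.py"],  # expected file at rank 1
    "q2": ["x.py", "b.py", "y.py"],  # expected file at rank 2
    "q3": ["x.py", "y.py", "z.py"],  # expected file missing
}
expected = {"q1": "a.py", "q2": "b.py", "q3": "c.py"}

print(recall_at_k(ranked, expected))           # 2 of 3 queries hit
print(mean_reciprocal_rank(ranked, expected))  # (1 + 0.5 + 0) / 3
```

An MRR of 1.00, as in the symbol lookup row, means the expected file was ranked first for every query in that category.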

Agent token efficiency (MCP tasks)

Task	dotRAG (tokens)	Filesystem (tokens)	Ratio
File discovery	1,914	68,073	35.6×
Semantic search	592	5,418	9.2×
File read	343	332	1.0×
Conceptual query	503	1,429	2.8×
Exact symbol lookup	432	525	1.2×
Multi-step workflow	593	3,052	5.1×

An early beta lost symbol lookup to grep. With literal mode added, dotRAG now wins or ties all 6 MCP tasks. Average 4.5× fewer tokens.

Properties

What the format provides

Semantic + literal search

Semantic mode finds files by meaning via cosine similarity over 384-dim embeddings. Literal mode finds exact symbols with line numbers. Both run locally — no network, no API keys.
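
As a rough illustration of semantic ranking, the sketch below scores files by cosine similarity and keeps the top K. It is not dotRAG's implementation: the function names are hypothetical, the vectors are 3-dimensional stand-ins for the 384-dimensional embeddings, and any extra weighting the real scorer applies is not modelled.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: List[float], table: Dict[str, List[float]], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k (path, score) pairs with the highest similarity."""
    scored = [(path, cosine(query_vec, vec)) for path, vec in table.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors stand in for the 384-dimensional embeddings.
table = {
    "mdns.ts": [1.0, 0.0, 0.0],
    "peer-registry.ts": [0.7, 0.7, 0.0],
    "README.md": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.1, 0.0], table, k=2))
```

Only the files returned by the top-K step would then need to be decompressed.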

Filtered packing

Respects .gitignore. Default exclusions for node_modules, .git, build artifacts, vendor dirs. Configurable include/exclude patterns and max file size.
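
A minimal sketch of this kind of filtered directory walk, under stated assumptions: the exclusion set, the parameter names, and the 1 MB default are invented for illustration, and .gitignore parsing is left out because the Python standard library has no parser for it.

```python
import fnmatch
import os
from typing import Iterator, Sequence

# Assumed defaults for illustration; the real exclusion list may differ.
DEFAULT_EXCLUDED_DIRS = {"node_modules", ".git", "dist", "build", "vendor"}


def iter_packable_files(
    root: str,
    exclude_patterns: Sequence[str] = (),
    max_bytes: int = 1_000_000,
) -> Iterator[str]:
    """Yield paths relative to root that pass the directory, pattern, and size filters."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in DEFAULT_EXCLUDED_DIRS]
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            if any(fnmatch.fnmatch(rel, pat) for pat in exclude_patterns):
                continue
            if os.path.getsize(full) > max_bytes:
                continue
            yield rel
```

Pruning `dirnames` in place matters for a tree like the Loopsy example, where most of the bytes sit under node_modules and should never be visited.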

Selective decompression

Per-file blob addressing via byte offsets. Metadata queries read only the manifest. Semantic queries decompress only top-K blobs.
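
The idea of per-file addressing can be sketched as follows. This is an illustration, not the real on-disk layout: zlib from the Python standard library is swapped in for zstd, and the index shape (path mapped to offset and length) and function names are assumptions.

```python
import zlib
from typing import Dict, Tuple


def build_blob_section(files: Dict[str, bytes]) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Compress each file separately and record (offset, length) per file."""
    section = bytearray()
    index: Dict[str, Tuple[int, int]] = {}
    for path, data in files.items():
        blob = zlib.compress(data)  # stand-in for zstd
        index[path] = (len(section), len(blob))
        section += blob
    return bytes(section), index


def read_one(section: bytes, index: Dict[str, Tuple[int, int]], path: str) -> bytes:
    """Decompress a single file without touching the other blobs."""
    offset, length = index[path]
    return zlib.decompress(section[offset:offset + length])


files = {"a.ts": b"export const a = 1;\n", "b.ts": b"export const b = 2;\n" * 50}
section, index = build_blob_section(files)
print(read_one(section, index, "b.ts") == files["b.ts"])
```

Because every blob is compressed on its own, reading one file costs one slice and one decompression call, regardless of archive size.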

Lossless extraction

Original file bytes stored with zstd compression. Extracted files are identical to the source. SHA-256 content hashes for verification.
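
A short sketch of hash-based verification after extraction. Where dotRAG stores the per-file hashes and how it reports mismatches are not described here, so the dictionary shapes and function names below are hypothetical.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def find_mismatches(extracted: Dict[str, bytes], recorded: Dict[str, str]) -> List[str]:
    """Return paths that are missing or whose bytes do not match the recorded hash."""
    return [
        path for path, digest in recorded.items()
        if path not in extracted or sha256_hex(extracted[path]) != digest
    ]


source = {"src/main.ts": b"console.log('hi');\n"}
recorded = {path: sha256_hex(data) for path, data in source.items()}

print(find_mismatches(source, recorded))                        # intact copy
print(find_mismatches({"src/main.ts": b"tampered"}, recorded))  # altered copy
```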

Offline operation

Local embedding model (~80 MB, cached after first download). No network, no API keys, no external services at query time.

MCP server

Exposes six tools over stdio transport: search, read_file, list_files, list_archives, inspect, extract. Compatible with Claude Code, Cursor, Windsurf.

MCP integration

Agent integration via MCP

dotRAG implements the Model Context Protocol over stdio. It exposes six tools: dotrag_search, dotrag_read_file, dotrag_list_files, dotrag_list_archives, dotrag_inspect, dotrag_extract. The commands below register it with Claude Code.

    $ claude mcp add dotrag -- rag mcp

    # project scope (creates .mcp.json, shareable via git)
    $ claude mcp add --scope project dotrag -- rag mcp

    # user scope (available across all projects)
    $ claude mcp add --scope user dotrag -- rag mcp

rag must be on $PATH. If not, use the absolute path (e.g. /usr/local/bin/rag) in the command field.

.rag file format v1.0

Five contiguous sections

The .rag format is a binary archive with five sections laid out sequentially. All section offsets are stored in a fixed 512-byte header. No external indexes, no sidecar files.

1. Fixed header: 512 bytes. Magic (RAG\x01), version, offsets to all sections, embedding model ID, content hash, creation timestamp.

2. Manifest: Zstd-compressed JSON. Per-file metadata, summaries, entities, categories. No embeddings.

3. Blob index: Zstd-compressed JSON. File IDs mapped to byte offsets for random access into the blob section.

4. Embedding table: Zstd-compressed float32 vectors. 384 dimensions per file, packed binary, not JSON.

5. Blob payloads: Zstd-compressed original file bytes. Independently addressable, decompressed on demand.
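
To make the fixed-header idea concrete, here is a sketch of reading a 512-byte header with Python's struct module. Only the magic bytes and the total size come from the description above; the field order, field widths, and little-endian encoding are hypothetical, and the model ID, content hash, and timestamp fields are left unparsed.

```python
import struct
from typing import NamedTuple, Tuple

HEADER_SIZE = 512
MAGIC = b"RAG\x01"
# Hypothetical field layout: magic, major, minor, then four section offsets.
PREFIX = struct.Struct("<4sBB4Q")


class Header(NamedTuple):
    version: Tuple[int, int]
    manifest_offset: int
    blob_index_offset: int
    embeddings_offset: int
    blobs_offset: int


def parse_header(raw: bytes) -> Header:
    """Validate the magic bytes and unpack the assumed offset fields."""
    if len(raw) < HEADER_SIZE:
        raise ValueError("header is shorter than 512 bytes")
    magic, major, minor, *offsets = PREFIX.unpack_from(raw, 0)
    if magic != MAGIC:
        raise ValueError("not a .rag archive")
    return Header((major, minor), *offsets)


# Build a fake header to exercise the parser.
fake = PREFIX.pack(MAGIC, 1, 0, 512, 4096, 8192, 200_000).ljust(HEADER_SIZE, b"\x00")
print(parse_header(fake))
```

With a fixed-size header, a reader can fetch the first 512 bytes once and then seek directly to whichever section a query needs.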

Design decisions

Decision	Rationale
Embeddings in binary float32, not JSON	Roughly 3× smaller embedding table before compression. A 384-dim vector is 1,536 bytes in binary (384 × 4) vs ~4,600 bytes in JSON.
Manifest separate from embeddings	Metadata queries (file counts, types, summaries) load only the manifest. Fast, no vector math needed.
Per-file blob compression	Random access. Decompress one file without touching the rest. Critical for selective query routing.
Fixed 512-byte header	All section offsets readable in a single seek. No scanning, no variable-length preamble.
Zstd compression throughout	Best ratio-to-speed tradeoff for mixed content. Decompression is ~1 GB/s on modern hardware.
Content hash in header	Integrity verification without reading the full archive. Detect corruption or tampering early.
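
The binary-versus-JSON size argument in the first row can be checked with a few lines of arithmetic. The sketch below uses random values and assumes 8 decimal places in the JSON form; the real JSON size depends on how many digits are printed per value.

```python
import json
import random
import struct

DIM = 384
random.seed(0)
vector = [random.uniform(-1.0, 1.0) for _ in range(DIM)]

# Packed little-endian float32: 4 bytes per dimension.
binary = struct.pack(f"<{DIM}f", *vector)

# JSON text with 8 decimal places per value (an assumed precision).
as_json = json.dumps([round(v, 8) for v in vector]).encode("utf-8")

print(len(binary))   # 384 * 4 = 1536
print(len(as_json))  # larger; exact size depends on the printed precision
```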

Query routing

When you run rag query, the engine classifies your question and picks the cheapest path:

Route	What loads	When used
MANIFEST_ONLY	Header + manifest	"How many files?" / "What types?" / "When was this packed?"
SELECTIVE_DECOMPRESS	Header + manifest + embeddings + top-K blobs	"How does auth work?" / "Find the database layer"
LITERAL_MATCH	Header + all blobs (text scan)	"class PeerRegistry" / exact symbol names / code identifiers
FULL_SCAN	Everything	Broad questions that need cross-file context
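
The routing table can be read as a small decision function. The sketch below is a toy version: the route names come from the table, but the cue phrases, the identifier regex, and the order of checks are invented for illustration, since the real classifier's rules are not described here.

```python
import re
from enum import Enum
from typing import Optional


class Route(Enum):
    MANIFEST_ONLY = "manifest_only"
    SELECTIVE_DECOMPRESS = "selective_decompress"
    LITERAL_MATCH = "literal_match"
    FULL_SCAN = "full_scan"


# Illustrative cue lists only; the real classifier is not described here.
METADATA_CUES = ("how many files", "what types", "when was this packed")
BROAD_CUES = ("overall", "entire codebase", "across all files")
IDENTIFIER = re.compile(
    r"^(class|def|fn|function|struct|interface)\s+\w+$|^\w+(\.\w+|_\w+)+$"
)


def classify(query: str, mode: Optional[str] = None) -> Route:
    """Pick the cheapest route that can plausibly answer the query."""
    q = query.strip()
    lowered = q.lower()
    if mode == "literal" or IDENTIFIER.match(q):
        return Route.LITERAL_MATCH
    if any(cue in lowered for cue in METADATA_CUES):
        return Route.MANIFEST_ONLY
    if any(cue in lowered for cue in BROAD_CUES):
        return Route.FULL_SCAN
    return Route.SELECTIVE_DECOMPRESS


for q in ("How many files?", "How does auth work?", "class PeerRegistry"):
    print(q, "->", classify(q).name)
```

Defaulting to SELECTIVE_DECOMPRESS keeps the common natural-language case on the path that opens only the top-K blobs.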

The format is open. The archive layout and routing model are documented in this section, so the file structure is inspectable without any external service. The full spec is on GitHub.

Desktop App

Browse, search, and manage archives

A native desktop app for creating, searching, and extracting .rag archives. Drag-and-drop folders to compress, search across all archives, and browse file contents with syntax highlighting.

Available for macOS (Apple Silicon), macOS (Intel), Windows (x86_64), and Linux (x86_64 AppImage).

Double-click any .rag file to open it directly. The desktop app registers as a file handler during install.

CLI

Single static binary

    # macOS (Apple Silicon)
    curl -L https://github.com/todience/dotrag-releases/releases/latest/download/rag-macos-arm64.tar.gz | tar xz
    sudo mv rag /usr/local/bin/

    # macOS (Intel)
    curl -L https://github.com/todience/dotrag-releases/releases/latest/download/rag-macos-x86_64.tar.gz | tar xz
    sudo mv rag /usr/local/bin/

    # Linux (x86_64)
    curl -L https://github.com/todience/dotrag-releases/releases/latest/download/rag-linux-x86_64.tar.gz | tar xz
    sudo mv rag /usr/local/bin/

    # Windows (x86_64, PowerShell)
    Invoke-WebRequest -Uri https://github.com/todience/dotrag-releases/releases/latest/download/rag-windows-x86_64.zip -OutFile rag.zip
    Expand-Archive rag.zip -DestinationPath .
    Move-Item rag.exe C:\Windows\System32\  # or add to $env:PATH

No runtime dependencies. The embedding model (~80 MB ONNX) downloads on first use and caches at ~/.cache/huggingface/.

CLI

rag pack <dir>	Create a .rag archive from a directory
rag query <archive> <q>	Search within an archive (semantic, or exact match with --mode literal)
rag query-all <q>	Search across all registered archives
rag inspect <archive>	View metadata and verify integrity
rag extract <archive>	Extract files to disk
rag list-files <archive>	List files with summaries
rag index	Rebuild the machine-wide archive registry
rag mcp	Launch MCP server for AI agent integration

Limitations

Known tradeoffs

Not a compression tool. A .rag archive is larger than a .tar.zst of the same files because it includes per-file embeddings and metadata. The overhead is the cost of searchability.

Archives are point-in-time snapshots. Source changes make the archive stale. Repack when the directory changes.

Semantic search works best on natural-language queries ("how does auth work?"). For exact identifiers, use --mode literal. Auto mode now falls back to literal when semantic returns no results.
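
The fallback behaviour described above can be sketched as a literal line scan that runs only when semantic search comes back empty. The function names and the hit tuple shape are hypothetical, and the semantic step is passed in as a stand-in callable.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, int, str]  # (path, 1-based line number, line text)


def literal_search(files: Dict[str, str], needle: str) -> List[Hit]:
    """Scan every file for an exact substring and report line numbers."""
    hits: List[Hit] = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path, lineno, line.strip()))
    return hits


def auto_search(
    files: Dict[str, str],
    query: str,
    semantic: Callable[[str], List[Hit]],
) -> List[Hit]:
    """Try semantic search first; fall back to a literal scan when it finds nothing."""
    return semantic(query) or literal_search(files, query)


files = {"peer-registry.ts": "import x\n\nexport class PeerRegistry {\n}\n"}
no_semantic_hits = lambda query: []  # stand-in for a semantic search that misses
print(auto_search(files, "class PeerRegistry", no_semantic_hits))
```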

--include-everything exists for forensic use. On real codebases it degrades search quality because generated files dilute the embedding space.