Reduces token consumption by compressing web content through local LLMs before feeding it to Claude Code.
```sh
curl -fsSL https://ai-summary.agent-tools.org/install.sh | sh
```
## How It Works

Pipeline:

```
Web Search → Fetch Pages → Readability Extract → LLM Summary → Compressed output (60–98% smaller)
```
Instead of sending raw 50K+ page content to Claude, ai-summary returns a focused 1–4K summary.
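The savings claim is simple arithmetic; a quick sketch with illustrative numbers (a 50K-char page reduced to a 2K-char summary — these are example figures, not a real run):

```shell
# Back-of-envelope compression ratio for the pipeline above.
# raw/summary sizes are illustrative, not measured.
raw=50000
summary=2000
echo "$(( (raw - summary) * 100 / raw ))% smaller"
```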
## Features
- Search + Summarize — Gemini (Google Search grounding), DuckDuckGo, or Brave
- Fetch + Summarize — Fetch any URL, extract article content, summarize with LLM
- Stdin Summarize — Pipe any text through for compression
- Fast Compress — No-LLM text extraction for instant compression
- JS-heavy Pages — agent-browser and Cloudflare Browser Rendering
- Pipe-friendly — `cat urls.txt | ai-summary fetch`, `--json` output, standard exit codes
- GitHub Code Search — Search code and read files from GitHub repos via `gh` CLI + LLM summarization
- Repo Summarize — Pack remote GitHub repos with repomix and summarize via LLM
- Test Output Compression — `wrap` subcommand compresses passing test output (cargo test, npm test, pytest, etc.)
- Claude Code Integration — Prompt injection for subagent coverage + PreToolUse hooks for real token savings
- Rich Statistics — Time-period breakdown, ROI tracking
- Multiple LLM Backends — opencode (free), MLX (local), OpenAI, Groq, DeepSeek
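The pipe-friendly bullet can be sketched as a per-line loop. Here `summarize` is a hypothetical stand-in for `ai-summary fetch` so the sketch runs without the binary installed:

```shell
# Stream a URL list through a per-line summarizer, one result per input line.
# `summarize` is a placeholder, NOT part of ai-summary.
summarize() { printf 'summary of %s\n' "$1"; }

printf '%s\n' https://example.com/a https://example.com/b |
while read -r url; do
  summarize "$url"
done
```

One line in, one line out is what makes standard exit codes and `--json` output composable with other tools.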
## Quick Start

```sh
ai-summary "what is the latest Rust version"
ai-summary fetch https://example.com/article -p "key points"
echo "large text" | ai-summary compress -m 4000
ai-summary github "error handling" -r tokio-rs/tokio -l rust
ai-summary github owner/repo src/main.rs -p "explain this"
ai-summary repo user/repo -p "explain the architecture"
ai-summary stats
```
## Subcommands

- `ai-summary <query>` — Search + summarize
- `ai-summary fetch <urls> -p ...` — Fetch + summarize
- `ai-summary sum <prompt>` — Summarize stdin via LLM
- `ai-summary compress -m <chars>` — Fast compression (no LLM)
- `ai-summary wrap <command>` — Run command, compress passing test output
- `ai-summary github <query> [-r repo] [-l lang]` — Search GitHub code
- `ai-summary github <owner/repo> [path]` — Read file / browse repo
- `ai-summary repo <owner/repo> -p ...` — Pack remote repo + summarize
- `ai-summary crawl <url>` — Crawl via CF Browser Rendering
- `ai-summary stats` — Token savings stats
- `ai-summary init` — Install Claude Code integration
- `ai-summary config` — Show/create config

Flags: `--deep`, `--raw`, `--json`, `--browser`, `--cf`, `--api-url`, `--api-key`, `--model`
## Claude Code Integration

### One-command setup

```sh
ai-summary init                 # Install prompt + hook
ai-summary init --with-repomix  # Also install repomix (for repo command)
ai-summary init --uninstall     # Remove
```
Installs prompt injection (so Claude and its subagents use ai-summary), a Bash hook (test commands get compressed), and WebFetch/WebSearch hooks (one-time education per session).
```
Without hook: cargo test → 3000 tokens (raw)
With hook:    cargo test → 15 tokens (compressed)
```
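The wrap behavior can be approximated in a few lines of shell. This is a sketch of the idea (assumed behavior, not the real implementation): swallow output when the wrapped command passes, replay it in full when it fails:

```shell
# run_wrapped is a hypothetical illustration of the `wrap` subcommand's idea:
# passing commands print one short line; failing commands replay full output.
run_wrapped() {
  if out=$("$@" 2>&1); then
    echo "passed: $1 (output suppressed)"
  else
    status=$?
    printf '%s\n' "$out"
    return "$status"
  fi
}

run_wrapped true    # a passing command prints one short line
```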
### Tee mode
Failed commands save their full output to `/tmp/ai-summary-tee/`, so the AI can read the raw log if the summary isn't enough.
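The failure path looks roughly like this (the `/tmp/ai-summary-tee/` directory is from the docs; the file naming here is an assumption):

```shell
# Sketch of tee mode's failure path: the full log survives on disk,
# and a reader can fall back to it when the summary is insufficient.
mkdir -p /tmp/ai-summary-tee
log=/tmp/ai-summary-tee/demo.log

# A failing command writes its output to the log...
{ echo "assertion failed: left == right"; false; } > "$log" 2>&1 ||
  cat "$log"   # ...and on failure the raw log is still readable in full
```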
## Benchmarks (v2.6.0)
Real-world evaluation run on 2026-03-14. All tests use default config with free LLM backends.
| Scenario | Input | Output | Compression | Time |
|---|---|---|---|---|
| Repo (ai-summary, 15 .rs files) | 59.1K chars | 1.6K chars | 97% | 59s |
| Repo (repomix, src/**/*.ts) | 182.3K chars | 1.7K chars | 99% | 36s |
| Fetch (docs.rs/reqwest) | ~1K tokens | ~516 tokens | 51% | 10s |
| Fetch (react.dev/learn) | ~1K tokens | ~308 tokens | 69% | 10s |
| Fetch (Wikipedia/Rust) | ~1K tokens | ~378 tokens | 62% | 10s |
| Fetch (Hacker News) | ~938 tokens | ~262 tokens | 72% | 10s |
### Cumulative Stats (166 queries)
| Metric | Value |
|---|---|
| Total tokens saved | 300,700 |
| Overall compression | 84% |
| Estimated Claude cost saved | $0.90 (at $3/M input tokens) |
| LLM cost | $0.18 (mostly free backends) |
| ROI | 5x return |
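The ROI row follows directly from the two cost rows (values taken from the table above, expressed in cents to stay in integer arithmetic):

```shell
# ROI check: $0.90 saved vs $0.18 spent on LLM calls.
saved_cents=90
cost_cents=18
echo "ROI: $(( saved_cents / cost_cents ))x"
```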
### By Mode
| Mode | Queries | Tokens Saved | Avg Compression |
|---|---|---|---|
| gemini-cli (search) | 58 | 171,600 | 85% |
| repo (repomix) | 2 | 59,500 | 99% |
| fetch (single URL) | 93 | 48,700 | 76% |
| compress (no LLM) | 4 | 10,100 | 83% |
| gemini API | 2 | 6,100 | 88% |
| stdin pipe | 5 | 1,700 | 71% |
| hook-bash (auto) | 2 | 867 | 30% |
Repo mode achieves 97–99% compression using repomix Tree-sitter extraction + LLM summarization. Compress mode uses no LLM — pure text extraction at near-instant speed.
## Installation

### Quick install (recommended)

```sh
curl -fsSL https://ai-summary.agent-tools.org/install.sh | sh
```
Downloads a prebuilt binary for your platform (macOS/Linux, x86_64/aarch64). Set `AI_SUMMARY_INSTALL_DIR` to customize the install path (default: `~/.local/bin`).
### From crates.io

```sh
cargo install ai-summary
```
### From source

```sh
git clone https://github.com/agent-tools-org/ai-summary
cd ai-summary
cargo install --path .
```
Releases: GitHub Releases