Route expensive AI inference to your users' (or your own) existing Claude subscriptions. Your app maintains the orchestration logic to protect your proprietary software and IP. Their subscription handles the compute.
import 'inference-relay/auto'; // Done. Relay is active.
Data stays within your company's existing, approved Claude subscription. No new vendor approvals. No data processing reviews. No security overhead. Your app becomes compliant by default — skip the 6-month procurement cycle.
We never see your prompts. The library is a dumb pipe — metadata only.promptContent: false
Stop paying high monthly API credits for private research engines or personal agentic pipelines. inference-relay routes heavy workloads to an existing flat-rate subscription. This enables enterprise-grade performance for the cost of a single seat.
| Workflow | Traditional | Relay | Savings |
|---|---|---|---|
| High-Context Analysis Large document processing | $1.20 | $0.02 | 98.3% |
| Iterative Research Multi-step chained queries | $0.50 | $0.01 | 98.0% |
| Multi-Step Audit Fact-checking & cross-referencing | $0.07 | $0.005 | 92.9% |
| Standard Chat Single-turn responses | $0.04 | $0.003 | 92.5% |
| Comparative Synthesis 50+ Sonnet calls per run | $1.20 | $0.02 | 98.3% |
AI margins are notoriously thin. By moving execution costs to the user's flat-rate subscription, your gross margin moves from ~20% to 95%+.
DATASET: LEGAL_MEMO ≈ 15,000 WORDS| Metric | Direct API | inference-relay |
|---|---|---|
| Orchestration Cost | $0.0009 | $0.0009 |
| Execution Cost | $0.0856 | $0.0000 |
| Extraction Quality | Flat lists | Structured tables |
| Output Volume | 3,721 chars | 6,707 chars (+80%) |
Direct API calls often truncate or over-summarize large documents to manage compute. Because inference-relay utilizes the official Claude Code binary, it inherits native prompt caching and optimized context management. In our benchmarks, the relay produced 80% more detailed extractions with structured cross-references — not flat summaries.
We stress-tested the Two-Envelope protocol by processing a document with six proprietary trade-secret terms embedded in the orchestration layer.
Anthropic recently restricted third-party “harnesses” that utilized subscription tokens for direct API access. inference-relay maintains a stable, compliant architecture by utilizing official binary protocols rather than deprecated scraping methods.
claude CLI binary.