Decouple intelligence from inference costs.

Route expensive AI inference to your users' (or your own) existing Claude subscriptions. Your app maintains the orchestration logic to protect your proprietary software and IP. Their subscription handles the compute.

import 'inference-relay/auto';

// Done. Relay is active.
Get StartedDocumentation →
The Billing Boundary
App Pays — Intelligence Routing
Triage, classification, instruction compilation. Haiku-class models classify queries and compile instructions. ~$0.003 per workflow. The app owns the logic.
User Pays — Execution
Heavy workflows like deep research and data extraction are powered by your own AI subscription. The application provides the logic; you provide the processing power.
Both parties pay Anthropic. Compliant with April 2026 third-party policy.
Providers
Claude CLI
Subscription
Desktop
Anthropic API
API Key
Any
OpenAI
API Key
Any
Ollama
None
Desktop
Enterprise: Safe by Design

Data stays within your company's existing, approved Claude subscription. No new vendor approvals. No data processing reviews. No security overhead. Your app becomes compliant by default — skip the 6-month procurement cycle.

We never see your prompts. The library is a dumb pipe — metadata only.promptContent: false

Self-Hosted Automation

Stop paying high monthly API credits for private research engines or personal agentic pipelines. inference-relay routes heavy workloads to an existing flat-rate subscription. This enables enterprise-grade performance for the cost of a single seat.

Internal tools at zero marginal cost.
Claude Max ($100/mo) + inference-relay ($50/mo) = unlimited private automation for $150/mo
Financial Orchestration
WorkflowTraditionalRelaySavings
High-Context Analysis
Large document processing
$1.20$0.0298.3%
Iterative Research
Multi-step chained queries
$0.50$0.0198.0%
Multi-Step Audit
Fact-checking & cross-referencing
$0.07$0.00592.9%
Standard Chat
Single-turn responses
$0.04$0.00392.5%
Comparative Synthesis
50+ Sonnet calls per run
$1.20$0.0298.3%
Break-even: 2–3 active usersAnnual savings (100 users): $10,000+
Break-even: 2–3 active users
Annual savings (100 users): $10,000+
The SaaS Economics

AI margins are notoriously thin. By moving execution costs to the user's flat-rate subscription, your gross margin moves from ~20% to 95%+.

Benchmark: High-Context Document Analysis
DATASET: LEGAL_MEMO ≈ 15,000 WORDS
MetricDirect APIinference-relay
Orchestration Cost$0.0009$0.0009
Execution Cost$0.0856$0.0000
Extraction QualityFlat listsStructured tables
Output Volume3,721 chars6,707 chars (+80%)
Gross Margin: ~15%98.9%
High-Context Superiority

Direct API calls often truncate or over-summarize large documents to manage compute. Because inference-relay utilizes the official Claude Code binary, it inherits native prompt caching and optimized context management. In our benchmarks, the relay produced 80% more detailed extractions with structured cross-references — not flat summaries.

IP Protection Verified

We stress-tested the Two-Envelope protocol by processing a document with six proprietary trade-secret terms embedded in the orchestration layer.

Proprietary terms in relay logs: 0 / 6
The logic stayed in the app. The cost moved to the user.
Pricing
Solo
$50/mo
1 developer seat. All providers, fallback cascade, streaming, auto-patch, analytics. 1,000 calls/month.
Pro
$100/mo
5 developer seats. Unlimited calls. Warm process pool, advanced routing DSL, tamper-evident audit trail.
Enterprise
Custom
Fleet policy via MDM. Org-wide key management. SSO/SCIM. SLA. Dedicated support.
The Post-April 4 Billing Boundary

Anthropic recently restricted third-party “harnesses” that utilized subscription tokens for direct API access. inference-relay maintains a stable, compliant architecture by utilizing official binary protocols rather than deprecated scraping methods.

Orchestration — App Key
The developer's API key funds triage, classification, and instruction compilation.
Execution — User Subscription
The user's subscription funds high-volume inference via the official claude CLI binary.
Revenue Alignment: Anthropic receives revenue from both the developer (API) and the end-user (Subscription). This architecture preserves platform integrity and utilizes official caching infrastructure.
Decouple intelligence from inference costs.
Get StartedRead the Docs →