Enterprise Security Whitepaper
This document describes the Data Sovereignty Architecture of inference-relay for enterprise security review. It is intended for CTOs, CISOs, and security auditors evaluating inference-relay as a dependency. All claims are verifiable under NDA via our 48-hour read-only repository access program.
I. Executive Summary
inference-relay implements a Data Sovereignty Architecture that fundamentally changes the security posture of AI-assisted applications. The core guarantee: inference-relay eliminates the “Third-Party Data Processor” risk by utilizing a Zero-Interception Routing model.
The library routes inference calls from a developer's application to the end user's existing AI subscription (Claude Pro, Claude Max, Claude Team, Claude Enterprise, or OpenAI equivalents). No prompt content, no completion content, and no conversation context touches inference-relay servers at any point in the request lifecycle. The developer pays inference-relay a small fee for orchestration metadata only. The user's own subscription handles model execution through the provider's own infrastructure.
The practical consequence for your organization: adopting inference-relay reduces your attack surface relative to any architecture that routes prompts through a third-party API gateway. There is no new data processor to vet, no new DPA to negotiate, no new vector for prompt exfiltration. The inference path runs entirely on the user's machine, through the user's credentials, to the user's provider.
inference-relay is not a proxy. It is a local binary dependency that orchestrates execution on infrastructure your organization already trusts.
II. Hardware-Bound Credential Isolation
OS-Mediated Consent
inference-relay maintains no persistent credential storage of its own. It does not write tokens to disk, embed them in configuration files, or cache them in application memory beyond the lifetime of a single task. Instead, it requests a transient session token from the platform's secure storage at the moment of use. The storage backend and access mechanism differ by platform:
| Platform | Secure Storage | Mechanism |
|---|---|---|
| macOS | Hardware-Authorized Secure Enclave | OS-mediated secure storage API |
| Linux | libsecret / credential file | D-Bus Secret Service |
| Windows | PasswordVault | Windows Credential Manager |
| Electron | safeStorage | OS-level encryption via Chromium |
| Browser | AES-GCM via Web Crypto | Non-extractable CryptoKey |
Volatile Memory Injection
Credentials are injected into the Native Subscription Gateway as environment variables scoped to an isolated execution context. They exist only in volatile memory for the duration of the task and are purged when the process terminates. At no point are credentials written to disk, logged to stdout, or made accessible to sibling processes.
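The injection pattern described above can be sketched as follows. This is a minimal illustration, not inference-relay's implementation: the helper name buildScopedEnv, the allowlist contents, and the variable name GATEWAY_SESSION_TOKEN are all hypothetical.

```typescript
// Hypothetical helper: builds the environment for a single gateway task.
// Only allowlisted variables are copied from the parent. The credential is
// added to the child's environment object and never to the parent's.
type Env = Record<string, string | undefined>;

const ALLOWLIST = ["PATH", "HOME", "LANG"]; // assumed list, adjust per platform

function buildScopedEnv(parentEnv: Env, token: string): Record<string, string> {
  const scoped: Record<string, string> = {};
  for (const key of ALLOWLIST) {
    const value = parentEnv[key];
    if (value !== undefined) scoped[key] = value;
  }
  scoped["GATEWAY_SESSION_TOKEN"] = token; // hypothetical variable name
  return scoped;
}

// Usage with illustrative values: unrelated parent secrets are not forwarded.
const parentEnv: Env = { PATH: "/usr/bin", HOME: "/home/u", AWS_SECRET_ACCESS_KEY: "x" };
const childEnv = buildScopedEnv(parentEnv, "transient-token");
```

The resulting object would be passed as the env option when the gateway process is started, so the token lives only in that process's environment and in the short-lived object that built it.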
The user explicitly consents to credential access through the operating system's native permission dialog. inference-relay cannot silently acquire credentials—the OS mediates every access request, and the user sees exactly which application is requesting access to the Hardware-Authorized Secure Enclave.
III. Process Sandboxing Architecture
Shell-Free Invocation
inference-relay invokes the Native Subscription Gateway using parameterized argument arrays passed directly to the binary. The system shell is never involved: no /bin/sh on Unix, no cmd.exe on Windows. Injection attacks that rely on shell metacharacter interpretation cannot occur, because no shell ever parses the arguments.
Shell injection requires a shell. inference-relay never invokes one. Arguments are parameterized arrays, never string-concatenated commands.
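A minimal sketch of shell-free invocation, assuming a Node.js host and the standard child_process module. The function name runBinary is hypothetical, and the Node.js executable stands in for the gateway binary, whose real name and flags are not given in this document.

```typescript
import { spawnSync } from "node:child_process";

// Arguments travel as an array straight to the OS process-creation call.
// With `shell: false` (the Node.js default for spawn and spawnSync), no shell
// parses them, so metacharacters such as `;` or `$(...)` stay literal.
function runBinary(binary: string, args: string[]): string {
  const result = spawnSync(binary, args, { shell: false, encoding: "utf8" });
  if (result.error) throw result.error;
  return result.stdout;
}

// Demo: the child process writes its first argument back unchanged.
const hostile = "$(whoami); echo injected";
const echoed = runBinary(process.execPath, [
  "-e",
  "process.stdout.write(process.argv[1])",
  hostile,
]);
```

The hostile string reaches the child as a single argument. Note that this protects against shell interpretation only; the invoked binary must still treat its arguments as untrusted input.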
Standard I/O Pipe Isolation
The Native Gateway binary runs as an isolated process with a scoped environment. stdin, stdout, and stderr are discrete pipes—no shared memory, no shared state, no file descriptor leakage between the parent application and the execution context. The gateway cannot read the parent's memory space, and the parent consumes only the structured output written to its stdout pipe.
Process Lifecycle Management
Orphan prevention is enforced through deterministic process tracking. On parent exit, every tracked execution context receives a termination signal. There is no scenario in which an orphaned gateway process continues executing inference after the parent application terminates.
- Arguments: Parameterized arrays passed directly to the binary. No string concatenation, no template interpolation.
- Environment: Scoped per execution context. Credentials are injected as environment variables of the gateway process only; they are not added to the parent's environment or passed to any other child process.
- Teardown: Deterministic. Process tracking guarantees no orphans survive parent termination.
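The teardown property can be illustrated with a small registry. This is a sketch under assumptions: the class name ProcessRegistry and the Killable type are hypothetical, and the default signal SIGTERM is one plausible choice rather than a documented one.

```typescript
// Minimal structural type so the registry works with any process handle
// that exposes a kill() method.
interface Killable {
  kill(signal?: string): boolean;
}

class ProcessRegistry {
  private tracked: Killable[] = [];

  track(child: Killable): void {
    this.tracked.push(child);
  }

  // Call when a child exits on its own so it is not signalled later.
  untrack(child: Killable): void {
    this.tracked = this.tracked.filter((c) => c !== child);
  }

  // Signals every tracked handle, clears the list, and returns how many
  // handles accepted the signal.
  terminateAll(signal: string = "SIGTERM"): number {
    let signalled = 0;
    for (const child of this.tracked) {
      if (child.kill(signal)) signalled++;
    }
    this.tracked = [];
    return signalled;
  }
}

// Demo with fake handles that record the signal they receive.
const killed: string[] = [];
const fakeHandle = (id: string): Killable => ({
  kill: (signal) => {
    killed.push(id + ":" + signal);
    return true;
  },
});

const registry = new ProcessRegistry();
const handleA = fakeHandle("a");
const handleB = fakeHandle("b");
registry.track(handleA);
registry.track(handleB);
registry.untrack(handleB); // b finished normally
const signalledCount = registry.terminateAll();
```

In a Node.js host, terminateAll would typically be wired to the parent's exit and signal handlers. Handlers do not run if the parent is killed with an uncatchable signal, so a production design needs a platform-specific fallback for that case.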
IV. Asymmetric Logic Verification (RS256)
Signed Trust Chain
The inference-relay server signs all manifests and license validation payloads with an RSA-2048 private key using RS256 (RSASSA-PKCS1-v1_5 with SHA-256). The client library verifies these signatures using an embedded public key. This creates an Asymmetric Decoupling: the library can verify that a payload originated from the legitimate server, but it can never sign payloads itself. The private key is air-gapped on secure infrastructure and never distributed.
This architecture provides Logic Tampering protection. A forged manifest—whether injected by a man-in-the-middle, a compromised CDN, or a malicious package registry—cannot silently alter library behavior. The RSA-2048 signature verification will reject it before any instructions are parsed.
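A minimal sketch of the verification step, assuming a Node.js client and the standard crypto module. The function name verifyManifest and the manifest shape are hypothetical; the demo generates a throwaway key pair in place of the real server key, which is never distributed.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Returns true only if `signature` is a valid RS256 signature
// (RSASSA-PKCS1-v1_5 with SHA-256) over `manifest` for the given public key.
function verifyManifest(manifest: Buffer, signature: Buffer, publicKeyPem: string): boolean {
  try {
    return verify("RSA-SHA256", manifest, publicKeyPem, signature);
  } catch {
    return false; // malformed key or signature: treat as unverified
  }
}

// Demo only: a throwaway RSA-2048 key pair stands in for the server's key.
const { publicKey, privateKey } = generateKeyPairSync("rsa", {
  modulusLength: 2048,
  publicKeyEncoding: { type: "spki", format: "pem" },
  privateKeyEncoding: { type: "pkcs8", format: "pem" },
});

const manifest = Buffer.from(JSON.stringify({ version: 1 }));
const signature = sign("RSA-SHA256", manifest, privateKey);

const accepted = verifyManifest(manifest, signature, publicKey);

// A forged manifest carrying the old signature must fail verification.
const forged = Buffer.from(JSON.stringify({ version: 2 }));
const forgedAccepted = verifyManifest(forged, signature, publicKey);
```

Verification runs over the raw bytes before any parsing, which matches the ordering described above: a payload that fails the check is never interpreted.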
Protocol Integrity Enforcement
The library maintains four security states that govern its behavior depending on the outcome of signature verification:
| State | Trigger | Behavior |
|---|---|---|
| SEC_NOMINAL | Signature verified | Full operation, cache refreshed |
| SEC_CACHE | Verification unsuccessful | Operates from last-known-good cached configuration |
| SEC_RECOVERY | 1-2 consecutive verification failures | Staggered retry with 50-100ms jitter |
| SEC_DEGRADED | 3 consecutive verification failures | High-entropy recovery state, restricted operation |
SEC_DEGRADED is the terminal protection state: after three consecutive verification failures, the library assumes the trust chain has been compromised and restricts operation. Retries in SEC_RECOVERY are staggered with randomized jitter (50-100ms), which is intended to make timing-based attacks against the recovery mechanism harder. License validation responses are cached with a 7-day hard maximum staleness, so that even during a prolonged network partition the library does not operate on arbitrarily stale authorization.
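The thresholds above can be sketched as three small functions. The function names are hypothetical, and one simplification is an assumption of this sketch: SEC_CACHE is treated as the configuration source used while verification is failing, not as a separate step in the failure counter, because the table does not specify how it interleaves with SEC_RECOVERY.

```typescript
type SecurityState = "SEC_NOMINAL" | "SEC_CACHE" | "SEC_RECOVERY" | "SEC_DEGRADED";

const DEGRADED_THRESHOLD = 3; // consecutive failures, from the table
const MAX_STALENESS_MS = 7 * 24 * 60 * 60 * 1000; // 7-day hard maximum

// Maps the consecutive-failure count to a state. SEC_CACHE is deliberately
// not returned here (see the assumption stated in the lead-in).
function stateFor(consecutiveFailures: number): SecurityState {
  if (consecutiveFailures <= 0) return "SEC_NOMINAL";
  if (consecutiveFailures < DEGRADED_THRESHOLD) return "SEC_RECOVERY";
  return "SEC_DEGRADED";
}

// Retry delay in the half-open range [50, 100) ms. The random source is
// injectable so the function can be tested deterministically.
function retryDelayMs(random: () => number = Math.random): number {
  return 50 + random() * 50;
}

// A cached license response is usable only within the staleness window.
function isCacheFresh(cachedAtMs: number, nowMs: number): boolean {
  return nowMs - cachedAtMs <= MAX_STALENESS_MS;
}
```

A successful verification would reset the failure counter to zero, returning the library to SEC_NOMINAL.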
V. The Dumb Pipe Guarantee
Type-Level Enforcement of Privacy
inference-relay's telemetry schema contains a Static Analysis Guardrail that makes accidental content logging a compile-time error, not a runtime oversight. The AuditEvent interface declares two content fields, promptContent and completionContent.
Both fields are typed as the literal type false, not as boolean or string. A developer cannot “accidentally” assign prompt text to these fields because the TypeScript compiler will reject the assignment. The compiler acts as an automated security auditor: the type system makes content leakage a static analysis failure, not a code review finding.
Telemetry transmits only: provider name, model identifier, token counts, estimated cost, request duration, and fallback status. The schema contains no field that can hold content. Attaching prompt or completion data to a telemetry event requires either modifying the type definition itself, which is a tracked, reviewable change, or bypassing the compiler with an explicit type assertion.
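A sketch of what such an interface could look like. Only the names AuditEvent, promptContent, and completionContent come from this document; every other field name and all sample values are hypothetical, chosen to match the telemetry categories listed above.

```typescript
interface AuditEvent {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  estimatedCostUsd: number;
  durationMs: number;
  usedFallback: boolean;
  promptContent: false; // literal type: the only assignable value is `false`
  completionContent: false;
}

const sampleEvent: AuditEvent = {
  provider: "anthropic",
  model: "example-model",
  inputTokens: 120,
  outputTokens: 48,
  estimatedCostUsd: 0.0012,
  durationMs: 830,
  usedFallback: false,
  promptContent: false,
  completionContent: false,
};

// Kept only to show the compiler error that the directive below suppresses.
// @ts-expect-error: a string is not assignable to the literal type `false`.
const leaky: AuditEvent = { ...sampleEvent, promptContent: "user prompt text" };
void leaky;
```

Without the @ts-expect-error directive, the last assignment fails type checking. The guardrail is static only: it constrains code that passes through the compiler and does not inspect values at runtime.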
Binary String Entropy Scan
The CI pipeline includes an entropy scanner that examines compiled JavaScript output for high-entropy strings that could indicate inadvertently embedded content, API keys, or credential material. This provides a secondary verification layer beyond the type system. The type guardrail constrains what telemetry code can express; the entropy scan checks the built artifact itself, so material embedded in the output is flagged before reaching production even where a type assertion has bypassed the compiler.
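An entropy scan of this kind can be sketched with Shannon entropy over string literals. This is an illustration, not the scanner used in the pipeline: the function names, the 20-character minimum, and the 4.0 bits-per-character threshold are all assumptions, and the literal-matching regular expression is deliberately simple.

```typescript
// Shannon entropy of a string, in bits per character.
function shannonEntropy(text: string): number {
  if (text.length === 0) return 0;
  const counts: Record<string, number> = {};
  for (const ch of text.split("")) counts[ch] = (counts[ch] || 0) + 1;
  let entropy = 0;
  for (const ch of Object.keys(counts)) {
    const p = counts[ch] / text.length;
    entropy -= (p * Math.log(p)) / Math.LN2;
  }
  return entropy;
}

// Returns quoted string literals that are long enough and random-looking
// enough to warrant review. Thresholds are assumed defaults.
function findHighEntropyStrings(source: string, minLength = 20, threshold = 4.0): string[] {
  const literal = /["']([^"'\\\n]+)["']/g;
  const hits: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = literal.exec(source)) !== null) {
    const body = match[1];
    if (body.length >= minLength && shannonEntropy(body) > threshold) hits.push(body);
  }
  return hits;
}

// Demo: natural-language text stays under the threshold, while a string of
// 32 distinct characters (5 bits per character) is flagged.
const sampleBundle =
  'const a = "hello world hello world"; const k = "abcdefghijklmnopqrstuvwxyz012345";';
const flagged = findHighEntropyStrings(sampleBundle);
```

Entropy thresholds trade false positives against misses: ordinary prose scores low and is not flagged, so a scan like this is best suited to keys and tokens.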
VI. Compliance Positioning
Direct Subscription Utilization (DSU)
inference-relay implements what we term Direct Subscription Utilization: the library enables employees to use AI capabilities through the organization's existing Anthropic or OpenAI data processing agreements. Rather than introducing a new data processor into your supply chain, inference-relay brings Shadow AI (the use of personal AI subscriptions for work tasks, outside corporate visibility and governance) into compliance by routing it through subscriptions your legal team has already vetted.
Client-Side Software Classification
Because inference-relay operates as a Local Binary Dependency—executing entirely on the end user's machine with no server-side prompt processing—it qualifies as Client-Side Software rather than a Cloud Service under most procurement frameworks. This distinction simplifies vendor onboarding significantly:
- No new DPA required. inference-relay does not process, store, or have access to prompt or completion content. The data processing relationship remains exclusively between the user and their AI provider.
- No new data processor introduced. Your existing Anthropic or OpenAI DPA governs the inference. inference-relay is a local tool, not a data intermediary.
- Simplified SOC 2 scoping. Client-side dependencies that do not process customer data fall outside the trust boundary for most audit frameworks.
- Platform Integrity. inference-relay utilizes official binary protocols, ensuring that the AI provider's internal safety filters and prompt caching remain active and effective.
Risk Transfer
The security model creates a clean Risk Transfer boundary. inference-relay is responsible for: orchestration correctness, credential isolation, process sandboxing, and manifest integrity. inference-relay is not responsible for—and has no access to—prompt content, completion content, model behavior, or data retention. Those responsibilities remain with the AI provider under your existing agreements.
Enterprise Audit Access
For organizations requiring source code review prior to adoption, inference-relay offers a structured audit pathway:
- NDA workflow: Standard mutual NDA executed prior to access grant.
- 48-hour read-only repository access: Your security team receives time-boxed access to the full source repository for independent verification of every claim in this document.
- Reproducible builds: Published packages can be verified against source through deterministic build output.
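The reproducible-build check in the last item amounts to comparing digests of two artifacts. A minimal sketch, assuming a Node.js environment; the function names are hypothetical, and how the published and locally built artifacts are obtained is not specified in this document.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of an artifact's bytes.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// True when a locally built artifact matches the published digest.
function matchesPublished(localBuild: Uint8Array, publishedDigestHex: string): boolean {
  return sha256Hex(localBuild) === publishedDigestHex.toLowerCase();
}

// Demo with the well-known SHA-256 test vector for the ASCII string "abc".
const abcDigest = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";
const sameBytes = matchesPublished(Buffer.from("abc"), abcDigest);
const differentBytes = matchesPublished(Buffer.from("abd"), abcDigest);
```

A match is only meaningful if the build is deterministic, meaning the same source and toolchain always yield byte-identical output.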
VII. Summary of Guarantees
| Property | Guarantee | Enforcement |
|---|---|---|
| Prompt privacy | Zero content transmission | Type-level literal false fields |
| Credential storage | Zero persistent storage | OS secure enclave, volatile injection |
| Shell injection | Not possible by construction | Parameterized binary invocation bypasses shell entirely |
| Manifest integrity | RSA-2048 signed trust chain | Asymmetric verification, air-gapped key |
| Process isolation | Discrete I/O channels, scoped env | No shared memory or state |
| Orphan prevention | Deterministic teardown | Process registry with platform-appropriate termination signals |
| License cache staleness | 7-day hard maximum | SEC_DEGRADED after 3 failures |
| Content leakage in CI | Binary entropy scanning | Automated high-entropy string detection |
Every guarantee in this document is a verifiable architectural property, not a policy promise. The type system enforces privacy. The process model enforces isolation. The cryptographic chain enforces integrity. These properties hold regardless of developer intent, configuration errors, or runtime conditions.