Enterprise Security Whitepaper

Classification

This document describes the Data Sovereignty Architecture of inference-relay for enterprise security review. It is intended for CTOs, CISOs, and security auditors evaluating inference-relay as a dependency. All claims are verifiable under NDA via our 48-hour read-only repository access program.

Prepared: April 6, 2026
Author: inference-relay security team
Scope: Credential handling, process isolation, cryptographic verification, telemetry boundaries, compliance posture
Version: 1.0

I. Executive Summary

inference-relay implements a Data Sovereignty Architecture that fundamentally changes the security posture of AI-assisted applications. The core guarantee: inference-relay eliminates the “Third-Party Data Processor” risk by utilizing a Zero-Interception Routing model.

The library routes inference calls from a developer's application to the end user's existing AI subscription (Claude Pro, Claude Max, Claude Team, Claude Enterprise, or OpenAI equivalents). No prompt content, no completion content, and no conversation context touches inference-relay servers at any point in the request lifecycle. The developer pays inference-relay pennies for orchestration metadata. The user's own subscription handles model execution through the provider's own infrastructure.

The practical consequence for your organization: adopting inference-relay reduces your attack surface relative to any architecture that routes prompts through a third-party API gateway. There is no new data processor to vet, no new DPA to negotiate, no new vector for prompt exfiltration. The inference path runs entirely on the user's machine, through the user's credentials, to the user's provider.

inference-relay is not a proxy. It is a local binary dependency that orchestrates execution on infrastructure your organization already trusts.

II. Hardware-Bound Credential Isolation

OS-Mediated Consent

inference-relay maintains zero persistent credential storage. It does not write tokens to disk, embed them in configuration files, or cache them in application memory beyond the lifetime of a single task. Instead, it requests a transient session token from the operating system's secure credential store at the moment of use. The backing store differs by platform:

Platform	Secure Storage	Mechanism
macOS	Hardware-Authorized Secure Enclave	OS-mediated secure storage API
Linux	libsecret / credential file	D-Bus Secret Service
Windows	PasswordVault	Windows Credential Manager
Electron	safeStorage	OS-level encryption via Chromium
Browser	AES-GCM via Web Crypto	Non-extractable CryptoKey

Volatile Memory Injection

Credentials are injected into the Native Subscription Gateway as transient environment states within an isolated execution context. They exist only in volatile memory for the duration of the task and are purged upon process termination. There is no window during which credentials are written to disk, logged to stdout, or accessible to sibling processes.

The user explicitly consents to credential access through the operating system's native permission dialog. inference-relay cannot silently acquire credentials—the OS mediates every access request, and the user sees exactly which application is requesting access to the Hardware-Authorized Secure Enclave.
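
The volatile injection pattern described in this section can be sketched as follows, assuming a Node.js host. The function name runWithTransientToken, the variable name RELAY_SESSION_TOKEN, and the use of the Node binary as a stand-in for the gateway are illustrative assumptions, not inference-relay's actual API.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical variable name; the real gateway's contract is not documented here.
const TOKEN_VAR = "RELAY_SESSION_TOKEN";

/**
 * Runs one child process with the token present only in that child's
 * environment. The parent's process.env is never mutated.
 */
function runWithTransientToken(token: string, binary: string, args: string[]): string {
  // Build a fresh, minimal environment instead of inheriting the parent's.
  const scopedEnv: Record<string, string> = { [TOKEN_VAR]: token };
  if (process.env.PATH !== undefined) scopedEnv.PATH = process.env.PATH;

  const result = spawnSync(binary, args, { env: scopedEnv, encoding: "utf8", shell: false });
  if (result.error) throw result.error;
  return result.stdout;
}

// Stand-in for the gateway: a Node child that reports whether it can see the token.
const childReport = runWithTransientToken("demo-token", process.execPath, [
  "-e",
  `process.stdout.write(String(process.env.${TOKEN_VAR} === "demo-token"))`,
]);
const parentHasToken = TOKEN_VAR in process.env;
console.log(childReport, parentHasToken);
```

The token lives in a local variable and in the child's environment block only. Once the child exits and the function returns, nothing in the parent still references it.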

III. Process Sandboxing Architecture

Shell-Free Invocation

inference-relay invokes the Native Subscription Gateway using parameterized argument arrays passed directly to the binary. The system shell is never involved: no /bin/sh on Unix, no cmd.exe on Windows. Injection attacks that rely on shell metacharacter interpretation are ruled out by construction rather than by input sanitization.

Shell injection requires a shell. inference-relay never invokes one. Arguments are parameterized arrays, never string-concatenated commands.
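
A small sketch of what shell-free invocation looks like in Node.js. The running Node binary stands in for the gateway here, and hostileArg is an invented example; the point is only that an array element containing shell metacharacters reaches the child byte for byte.

```typescript
import { spawnSync } from "node:child_process";

// An argument full of shell metacharacters. Under a shell, `;`, `$()` and `|`
// would be interpreted; passed as an array element, it is just a string.
const hostileArg = "report.txt; echo INJECTED $(whoami) | cat";

// process.execPath (the running Node binary) stands in for the gateway binary.
// The child simply echoes back the first argument it received.
const result = spawnSync(
  process.execPath,
  ["-e", "process.stdout.write(process.argv[1])", hostileArg],
  { encoding: "utf8", shell: false },
);

const echoed = result.stdout;
console.log(echoed === hostileArg);
```

Setting shell: false is the default for spawnSync; it is spelled out here so the intent is visible in review.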

Standard I/O Pipe Isolation

The Native Gateway binary runs as an isolated process with a scoped environment. stdin, stdout, and stderr are discrete pipes—no shared memory, no shared state, no file descriptor leakage between the parent application and the execution context. The gateway cannot read the parent's memory space, and the parent consumes only the structured output written to its stdout pipe.
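
The pipe arrangement above might look like this in a Node.js host. The GatewayResult shape and the child script are hypothetical stand-ins, since the gateway's real output schema is not given in this document.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical output shape; the gateway's real schema is not specified here.
interface GatewayResult {
  status: string;
  outputTokens: number;
}

// Stand-in child: writes diagnostics to stderr and one JSON document to stdout.
const childScript = `
  process.stderr.write("diagnostic noise\\n");
  process.stdout.write(JSON.stringify({ status: "ok", outputTokens: 42 }));
`;

const run = spawnSync(process.execPath, ["-e", childScript], {
  stdio: ["pipe", "pipe", "pipe"], // three separate pipes; no parent descriptors are inherited
  encoding: "utf8",
  shell: false,
});

// The parent consumes only stdout; stderr stays on its own channel.
const parsed = JSON.parse(run.stdout) as GatewayResult;
const stderrText = run.stderr;
console.log(parsed.status, parsed.outputTokens, stderrText.trim());
```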

Process Lifecycle Management

Orphan prevention is enforced through deterministic process tracking. On parent exit, every tracked execution context receives a termination signal. There is no scenario in which an orphaned gateway process continues executing inference after the parent application terminates.

Arguments: Parameterized arrays passed directly to the binary. No string concatenation, no template interpolation.
Environment: Scoped per execution context. Credentials are injected as environment variables of the gateway process only; they are never exported to the parent's environment or inherited by sibling processes.
Teardown: Deterministic. Process tracking guarantees no orphans survive parent termination.
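
One way to picture the process tracking described above is a small registry that signals every tracked child when the parent exits. ProcessRegistry and Terminable are hypothetical names for this sketch. A handler-based approach like this cannot run if the parent itself is killed with SIGKILL; how inference-relay handles that case is not described in this document.

```typescript
// Minimal structural type so the registry can be exercised without real processes.
interface Terminable {
  kill(signal?: string): boolean;
}

class ProcessRegistry {
  private readonly tracked = new Set<Terminable>();

  track(child: Terminable): void { this.tracked.add(child); }
  untrack(child: Terminable): void { this.tracked.delete(child); }
  get size(): number { return this.tracked.size; }

  // Signal every tracked child and forget it. Returns how many were signalled.
  terminateAll(signal = "SIGTERM"): number {
    let count = 0;
    this.tracked.forEach((child) => {
      child.kill(signal);
      count += 1;
    });
    this.tracked.clear();
    return count;
  }
}

const registry = new ProcessRegistry();

// Run teardown on normal exit and on the common termination signals.
process.on("exit", () => { registry.terminateAll(); });
for (const sig of ["SIGINT", "SIGTERM"] as const) {
  process.on(sig, () => {
    registry.terminateAll();
    process.exit(1);
  });
}
```

In use, each spawned child would be passed to registry.track and removed again with registry.untrack from its own exit handler, so the set only ever holds live processes.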

IV. Asymmetric Logic Verification (RS256)

Signed Trust Chain

The inference-relay server signs all manifests and license validation payloads with an RSA-2048 private key using RS256 (RSASSA-PKCS1-v1_5 with SHA-256). The client library verifies these signatures using an embedded public key. This creates an Asymmetric Decoupling: the library can verify that a payload originated from the legitimate server, but it can never sign payloads itself. The private key is air-gapped on secure infrastructure and never distributed.

This architecture provides Logic Tampering protection. A forged manifest—whether injected by a man-in-the-middle, a compromised CDN, or a malicious package registry—cannot silently alter library behavior. The RSA-2048 signature verification will reject it before any instructions are parsed.
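
The sign-then-verify flow can be illustrated with Node's built-in crypto module. This is a sketch with a throwaway key pair and an invented manifest payload; the real manifest format and key distribution are not shown in this document.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Throwaway key pair for the demo. In the architecture described above, only
// the public half would ship with the client.
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

// Hypothetical manifest payload.
const manifest = Buffer.from(JSON.stringify({ version: "1.0", features: ["relay"] }));

// Server side: RSA with SHA-256. For RSA keys, Node defaults to PKCS#1 v1.5
// padding, which is the scheme RS256 names.
const signature = sign("sha256", manifest, privateKey);

// Client side: verify the raw bytes before parsing anything.
function verifyManifest(payload: Buffer, sig: Buffer): boolean {
  return verify("sha256", payload, publicKey, sig);
}

// Simulate tampering in transit by flipping one bit of the payload.
const tampered = Buffer.from(manifest);
tampered[0] ^= 0x01;

console.log(verifyManifest(manifest, signature), verifyManifest(tampered, signature));
```

Verifying the raw bytes before JSON parsing matters: a parser never sees input that has not already passed the signature check.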

Protocol Integrity Enforcement

The library maintains four security states that govern its behavior when verification fails:

State	Trigger	Behavior
SEC_NOMINAL	Signature verified	Full operation, cache refreshed
SEC_CACHE	Verification unsuccessful	Operates from last-known-good cached configuration
SEC_RECOVERY	1-2 consecutive verification failures	Staggered retry with 50-100ms jitter
SEC_DEGRADED	3 consecutive verification failures	High-entropy recovery state, restricted operation

The SEC_DEGRADED state is the terminal protection: after three consecutive verification failures, the library assumes the trust chain has been compromised. Staggered delays with randomized jitter (50-100ms) prevent timing-based attacks against the recovery mechanism. License validation responses are cached with a 7-day hard maximum staleness, ensuring that even in prolonged network partition scenarios, the library will not operate on arbitrarily stale authorization.
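
The thresholds in this section can be written down as a few pure functions. This is a sketch under assumptions: the function names are invented, and SEC_CACHE is left out because the table does not spell out how it relates to SEC_RECOVERY.

```typescript
type SecurityState = "SEC_NOMINAL" | "SEC_RECOVERY" | "SEC_DEGRADED";

const DEGRADED_THRESHOLD = 3;                     // consecutive failures, from the table above
const MAX_CACHE_AGE_MS = 7 * 24 * 60 * 60 * 1000; // 7-day hard maximum staleness

// Maps a consecutive-failure count to a state.
function stateFor(consecutiveFailures: number): SecurityState {
  if (consecutiveFailures <= 0) return "SEC_NOMINAL";
  if (consecutiveFailures < DEGRADED_THRESHOLD) return "SEC_RECOVERY";
  return "SEC_DEGRADED";
}

// Retry delay in the 50-100 ms band. `random` is injectable so tests are deterministic.
function retryDelayMs(random: () => number = Math.random): number {
  return 50 + random() * 50;
}

// A cached license response is usable only inside the staleness window.
function cacheIsFresh(cachedAtMs: number, nowMs: number = Date.now()): boolean {
  return nowMs - cachedAtMs <= MAX_CACHE_AGE_MS;
}
```

Keeping these as pure functions of a counter and a timestamp makes the failure policy easy to audit and to unit test in isolation from the network code.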

V. The Dumb Pipe Guarantee

Type-Level Enforcement of Privacy

inference-relay's telemetry schema contains a Static Analysis Guardrail that makes accidental content logging a compile-time error, not a runtime oversight. The AuditEvent interface declares:

// TypeScript literal types as security enforcement
interface AuditEvent {
  promptContent: false;     // literal type: assigning anything other than `false` is a compile error
  completionContent: false; // literal type: assigning anything other than `false` is a compile error
  // ... operational metadata fields (no content fields exist)
}

promptContent and completionContent are typed as the literal value false, not as boolean or string. A developer cannot “accidentally” assign prompt text to these fields because the TypeScript compiler will reject it. The compiler acts as an automated security auditor—the type system makes content leakage a static analysis failure, not a code review finding.

Telemetry transmits only: provider name, model identifier, token counts, estimated cost, request duration, and fallback status. The schema has no field that can hold content. Attaching prompt or completion data to a telemetry event requires either an explicit type assertion or a change to the type definition itself, and both are tracked, reviewable changes.
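
To make the compile-time behaviour concrete, here is a fuller sketch of the interface. The metadata field names are invented for illustration; the source lists only the categories of data transmitted. The @ts-expect-error directive marks the one line the compiler rejects.

```typescript
interface AuditEvent {
  promptContent: false;
  completionContent: false;
  // Field names below are illustrative; the source names the categories only.
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  estimatedCostUsd: number;
  durationMs: number;
  usedFallback: boolean;
}

// A well-formed event: both content flags carry the only value their type allows.
const okEvent: AuditEvent = {
  promptContent: false,
  completionContent: false,
  provider: "anthropic",
  model: "example-model",
  inputTokens: 120,
  outputTokens: 48,
  estimatedCostUsd: 0.0012,
  durationMs: 850,
  usedFallback: false,
};

// Never called; it exists only to show the line the type checker refuses.
function attemptLeak(userPrompt: string): AuditEvent {
  return {
    ...okEvent,
    // @ts-expect-error Type 'string' is not assignable to type 'false'.
    promptContent: userPrompt,
  };
}
```

Because TypeScript types are erased when the code is compiled, this guard applies at type-check time rather than at runtime.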

Binary String Entropy Scan

The CI pipeline includes an entropy scanner that examines compiled JavaScript output for high-entropy strings that could indicate inadvertently embedded content, API keys, or credential material. This provides a secondary verification layer beyond the type system: even if a developer circumvented the TypeScript guardrail through type assertions, the compiled output would trigger the entropy scan before reaching production.
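
An entropy scan of this kind can be approximated with Shannon entropy over string literals. The thresholds and the literal-extraction regex below are illustrative assumptions, not the project's actual CI configuration.

```typescript
// Shannon entropy in bits per character (UTF-16 code unit).
function shannonEntropy(text: string): number {
  if (text.length === 0) return 0;
  const counts = new Map<string, number>();
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
  let bits = 0;
  counts.forEach((n) => {
    const p = n / text.length;
    bits -= p * Math.log2(p);
  });
  return bits;
}

// Illustrative thresholds, not the project's actual settings.
const MIN_LENGTH = 20;
const MIN_BITS_PER_CHAR = 4.0;

// Extracts single- and double-quoted literals from compiled JavaScript and
// returns the ones that are both long and high-entropy.
function findHighEntropyStrings(compiledJs: string): string[] {
  const literals = compiledJs.match(/"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'/g) ?? [];
  return literals
    .map((lit) => lit.slice(1, -1))
    .filter((s) => s.length >= MIN_LENGTH && shannonEntropy(s) >= MIN_BITS_PER_CHAR);
}
```

A CI step would run findHighEntropyStrings over each emitted file and fail the build when the result is non-empty, with an allowlist for known-safe values.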

VI. Compliance Positioning

Direct Subscription Utilization (DSU)

inference-relay implements what we term Direct Subscription Utilization: the library enables employees to use AI capabilities through the organization's existing Anthropic or OpenAI data processing agreements. Rather than introducing a new data processor into your supply chain, inference-relay brings Shadow AI (the use of personal AI subscriptions for work tasks, outside corporate visibility and governance) into compliance by routing it through subscriptions your legal team has already vetted.

Client-Side Software Classification

Because inference-relay operates as a Local Binary Dependency—executing entirely on the end user's machine with no server-side prompt processing—it qualifies as Client-Side Software rather than a Cloud Service under most procurement frameworks. This distinction simplifies vendor onboarding significantly:

No new DPA required. inference-relay does not process, store, or have access to prompt or completion content. The data processing relationship remains exclusively between the user and their AI provider.
No new data processor introduced. Your existing Anthropic DPA governs the inference. inference-relay is a local tool, not a data intermediary.
Simplified SOC 2 scoping. Client-side dependencies that do not process customer data fall outside the trust boundary for most audit frameworks.
Platform Integrity. inference-relay utilizes official binary protocols, ensuring that the AI provider's internal safety filters and prompt caching remain active and effective.

Risk Transfer

The security model creates a clean Risk Transfer boundary. inference-relay is responsible for: orchestration correctness, credential isolation, process sandboxing, and manifest integrity. inference-relay is not responsible for—and has no access to—prompt content, completion content, model behavior, or data retention. Those responsibilities remain with the AI provider under your existing agreements.

Enterprise Audit Access

For organizations requiring source code review prior to adoption, inference-relay offers a structured audit pathway:

NDA workflow: Standard mutual NDA executed prior to access grant.
48-hour read-only repository access: Your security team receives time-boxed access to the full source repository for independent verification of every claim in this document.
Reproducible builds: Published packages can be verified against source through deterministic build output.
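
For the reproducible-build check, an auditor's comparison step might reduce to hashing two artifacts. This is a generic sketch: the function names are invented, and the actual build and packaging commands are not described in this document.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of an artifact's bytes.
function sha256Hex(data: Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

// True when a locally rebuilt artifact is byte-identical to the published one.
function matchesPublished(localBuild: Buffer, published: Buffer): boolean {
  return sha256Hex(localBuild) === sha256Hex(published);
}
```

In practice both buffers would come from reading the rebuilt and the downloaded package files from disk.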

VII. Summary of Guarantees

Property	Guarantee	Enforcement
Prompt privacy	Zero content transmission	Type-level literal false fields
Credential storage	Zero persistent storage	OS secure enclave, volatile injection
Shell injection	Ruled out by construction	Parameterized binary invocation bypasses shell entirely
Manifest integrity	RSA-2048 signed trust chain	Asymmetric verification, air-gapped key
Process isolation	Discrete I/O channels, scoped env	No shared memory or state
Zombie prevention	Deterministic teardown	Process registry with platform-appropriate termination signals
License cache staleness	7-day hard maximum	SEC_DEGRADED after 3 failures
Content leakage in CI	Binary entropy scanning	Automated high-entropy string detection

Every guarantee in this document is a verifiable architectural property, not a policy promise. The type system enforces privacy. The process model enforces isolation. The cryptographic chain enforces integrity. These properties hold regardless of developer intent, configuration errors, or runtime conditions.

For enterprise audit requests, contact security@inference-relay.com. 48-hour read-only repository access is available under mutual NDA. All architectural claims in this document are verifiable against source.