Local Inference

What local inference provides

Local inference engines run model inference on the operator's own hardware, keeping data entirely on-device. Syndicate Code supports any local inference engine that exposes an OpenAI-compatible endpoint, such as Ollama, llama.cpp, or LM Studio (provider_routing_and_model_abstraction_spec.md §4.4).

Local providers are assigned provider_class: local by default.

Provider class semantics

The local provider class means the process is on the same host as the control plane (open_decisions_log.md OD-022). Any inference endpoint on a different host, even if operator-controlled, is treated as hosted or requires a custom class.

Class	Meaning
`local`	Process on the same host as the control plane
`approved_hosted`	Cloud or remote provider approved by policy
`restricted_hosted`	Hosted provider with additional policy restrictions
`blocked`	Provider explicitly blocked by policy

Registering a local inference engine

Local inference engines that expose an OpenAI-compatible endpoint are registered as custom providers with wire format openai:

syndicate provider register ./providers/ollama-local.yaml

The registration config must include:

base URL (e.g., http://localhost:11434)
wire format declaration (openai)
declared model list
declared capability deviations
provider class assignment (local by default)

Alternatively, enable a catalog-discovered provider:

syndicate provider add ollama

Wire format compatibility

Syndicate Code supports two API wire formats (provider_routing_and_model_abstraction_spec.md §4.4):

Wire Format	Notes
`openai`	Widely implemented by local inference engines; used by Ollama, llama.cpp, LM Studio
`anthropic`	Anthropic Messages API; supports extended features

Local inference engines typically use the openai wire format.

Evidence fidelity

Local inference runtimes may have weak or absent token accounting. Evidence fidelity is dimensioned — missing token accounting is represented as not_available without collapsing all fidelity dimensions (open_decisions_log.md OD-023).

Evidence fidelity dimensions:

Dimension	Description
Token accounting	Prompt and completion token counts; may be `not_available` for local runtimes
Verbatim capture	Raw request and response envelopes
Tool-call fidelity	Tool call raw payloads and result capture
Non-determinism markers	Timeouts, rate limits, partial responses

Capability deviations

Where a local provider does not fully conform to the canonical capability model, the adapter must declare explicit deviations. Common deviations for local runtimes include:

approximate token counts
partial streaming reliability
weak JSON adherence
unsupported seed control
non-standard tool-call serialization

Undeclared deviations are a registration defect. If discovered at invocation time, the deviation is recorded and the provider's capability record is updated.

Routing and fallback

Routing decisions consider local vs hosted providers based on policy (provider_routing_and_model_abstraction_spec.md §7):

trust tier
workflow class
data sensitivity
required capability fidelity
cost constraints
latency constraints

Syndicate Code does not assume that local is always preferable or that hosted is always preferable. Default preference is policy-driven.

Fallback from a hosted provider to a local provider is permitted only if it remains within the same or stricter provider class and does not widen data exposure.

Mid-session model switching

Operators may switch to a local model during an active session:

syndicate provider switch local-ollama:llama3

Session context is preserved. If the local model has a smaller context window, the control plane applies context compaction before the switch. The operator is notified if compaction occurs.

If the incoming model has a higher capability class, the active envelope is re-evaluated. If the envelope's approved maximum is exceeded, the envelope is invalidated and a new checkpoint is triggered.

Provider testing

Verify connectivity and capability:

syndicate provider test local-ollama
syndicate provider capabilities local-ollama

The test command sends a minimal well-formed request and validates the response against the provider's declared capability descriptor, reporting any deviations detected.

Security considerations

Local providers are not exempt from governed execution. Every invocation requires a valid permit.
The base URL and port are part of the registration record; changes require a new registration version.
Network access by local inference processes must be subject to egress policy enforcement.
Evidence capture requirements are identical to hosted providers — verbatim request and response envelopes must be preserved.