Local Inference
What local inference provides
Local inference engines run model inference on the operator's own hardware, keeping data entirely on-device. Syndicate Code supports any local inference engine that exposes an OpenAI-compatible endpoint, such as Ollama, llama.cpp, or LM Studio (provider_routing_and_model_abstraction_spec.md §4.4).
Local providers are assigned provider_class: local by default.
Provider class semantics
The local provider class means the process is on the same host as the control plane (open_decisions_log.md OD-022). Any inference endpoint on a different host, even if operator-controlled, is treated as hosted or requires a custom class.
| Class | Meaning |
|---|---|
local | Process on the same host as the control plane |
approved_hosted | Cloud or remote provider approved by policy |
restricted_hosted | Hosted provider with additional policy restrictions |
blocked | Provider explicitly blocked by policy |
Registering a local inference engine
Local inference engines that expose an OpenAI-compatible endpoint are registered as custom providers with wire format openai:
syndicate provider register ./providers/ollama-local.yaml
The registration config must include:
- base URL (e.g.,
http://localhost:11434) - wire format declaration (
openai) - declared model list
- declared capability deviations
- provider class assignment (
localby default)
Alternatively, enable a catalog-discovered provider:
syndicate provider add ollama
Wire format compatibility
Syndicate Code supports two API wire formats (provider_routing_and_model_abstraction_spec.md §4.4):
| Wire Format | Notes |
|---|---|
openai | Widely implemented by local inference engines; used by Ollama, llama.cpp, LM Studio |
anthropic | Anthropic Messages API; supports extended features |
Local inference engines typically use the openai wire format.
Evidence fidelity
Local inference runtimes may have weak or absent token accounting. Evidence fidelity is dimensioned — missing token accounting is represented as not_available without collapsing all fidelity dimensions (open_decisions_log.md OD-023).
Evidence fidelity dimensions:
| Dimension | Description |
|---|---|
| Token accounting | Prompt and completion token counts; may be not_available for local runtimes |
| Verbatim capture | Raw request and response envelopes |
| Tool-call fidelity | Tool call raw payloads and result capture |
| Non-determinism markers | Timeouts, rate limits, partial responses |
Capability deviations
Where a local provider does not fully conform to the canonical capability model, the adapter must declare explicit deviations. Common deviations for local runtimes include:
- approximate token counts
- partial streaming reliability
- weak JSON adherence
- unsupported seed control
- non-standard tool-call serialization
Undeclared deviations are a registration defect. If discovered at invocation time, the deviation is recorded and the provider's capability record is updated.
Routing and fallback
Routing decisions consider local vs hosted providers based on policy (provider_routing_and_model_abstraction_spec.md §7):
- trust tier
- workflow class
- data sensitivity
- required capability fidelity
- cost constraints
- latency constraints
Syndicate Code does not assume that local is always preferable or that hosted is always preferable. Default preference is policy-driven.
Fallback from a hosted provider to a local provider is permitted only if it remains within the same or stricter provider class and does not widen data exposure.
Mid-session model switching
Operators may switch to a local model during an active session:
syndicate provider switch local-ollama:llama3
Session context is preserved. If the local model has a smaller context window, the control plane applies context compaction before the switch. The operator is notified if compaction occurs.
If the incoming model has a higher capability class, the active envelope is re-evaluated. If the envelope's approved maximum is exceeded, the envelope is invalidated and a new checkpoint is triggered.
Provider testing
Verify connectivity and capability:
syndicate provider test local-ollama
syndicate provider capabilities local-ollama
The test command sends a minimal well-formed request and validates the response against the provider's declared capability descriptor, reporting any deviations detected.
Security considerations
- Local providers are not exempt from governed execution. Every invocation requires a valid permit.
- The base URL and port are part of the registration record; changes require a new registration version.
- Network access by local inference processes must be subject to egress policy enforcement.
- Evidence capture requirements are identical to hosted providers — verbatim request and response envelopes must be preserved.