Skip to main content

Local Inference

What local inference provides

Local inference engines run model inference on the operator's own hardware, keeping data entirely on-device. Syndicate Code supports any local inference engine that exposes an OpenAI-compatible endpoint, such as Ollama, llama.cpp, or LM Studio (provider_routing_and_model_abstraction_spec.md §4.4).

Local providers are assigned provider_class: local by default.

Provider class semantics

The local provider class means the process is on the same host as the control plane (open_decisions_log.md OD-022). Any inference endpoint on a different host, even if operator-controlled, is treated as hosted or requires a custom class.

ClassMeaning
localProcess on the same host as the control plane
approved_hostedCloud or remote provider approved by policy
restricted_hostedHosted provider with additional policy restrictions
blockedProvider explicitly blocked by policy

Registering a local inference engine

Local inference engines that expose an OpenAI-compatible endpoint are registered as custom providers with wire format openai:

syndicate provider register ./providers/ollama-local.yaml

The registration config must include:

  • base URL (e.g., http://localhost:11434)
  • wire format declaration (openai)
  • declared model list
  • declared capability deviations
  • provider class assignment (local by default)

Alternatively, enable a catalog-discovered provider:

syndicate provider add ollama

Wire format compatibility

Syndicate Code supports two API wire formats (provider_routing_and_model_abstraction_spec.md §4.4):

Wire FormatNotes
openaiWidely implemented by local inference engines; used by Ollama, llama.cpp, LM Studio
anthropicAnthropic Messages API; supports extended features

Local inference engines typically use the openai wire format.

Evidence fidelity

Local inference runtimes may have weak or absent token accounting. Evidence fidelity is dimensioned — missing token accounting is represented as not_available without collapsing all fidelity dimensions (open_decisions_log.md OD-023).

Evidence fidelity dimensions:

DimensionDescription
Token accountingPrompt and completion token counts; may be not_available for local runtimes
Verbatim captureRaw request and response envelopes
Tool-call fidelityTool call raw payloads and result capture
Non-determinism markersTimeouts, rate limits, partial responses

Capability deviations

Where a local provider does not fully conform to the canonical capability model, the adapter must declare explicit deviations. Common deviations for local runtimes include:

  • approximate token counts
  • partial streaming reliability
  • weak JSON adherence
  • unsupported seed control
  • non-standard tool-call serialization

Undeclared deviations are a registration defect. If discovered at invocation time, the deviation is recorded and the provider's capability record is updated.

Routing and fallback

Routing decisions consider local vs hosted providers based on policy (provider_routing_and_model_abstraction_spec.md §7):

  • trust tier
  • workflow class
  • data sensitivity
  • required capability fidelity
  • cost constraints
  • latency constraints

Syndicate Code does not assume that local is always preferable or that hosted is always preferable. Default preference is policy-driven.

Fallback from a hosted provider to a local provider is permitted only if it remains within the same or stricter provider class and does not widen data exposure.

Mid-session model switching

Operators may switch to a local model during an active session:

syndicate provider switch local-ollama:llama3

Session context is preserved. If the local model has a smaller context window, the control plane applies context compaction before the switch. The operator is notified if compaction occurs.

If the incoming model has a higher capability class, the active envelope is re-evaluated. If the envelope's approved maximum is exceeded, the envelope is invalidated and a new checkpoint is triggered.

Provider testing

Verify connectivity and capability:

syndicate provider test local-ollama
syndicate provider capabilities local-ollama

The test command sends a minimal well-formed request and validates the response against the provider's declared capability descriptor, reporting any deviations detected.

Security considerations

  • Local providers are not exempt from governed execution. Every invocation requires a valid permit.
  • The base URL and port are part of the registration record; changes require a new registration version.
  • Network access by local inference processes must be subject to egress policy enforcement.
  • Evidence capture requirements are identical to hosted providers — verbatim request and response envelopes must be preserved.