Tools & agents
The gateway supports tool-calling in three modes. You choose by the format of the request you send.
OpenAI tools format (recommended)
Send tools in the standard OpenAI schema. We forward them to the model and
parse the response back into canonical tool_calls, regardless of whether the
upstream model emits OpenAI-style or Hermes-style output.
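import os
from openai import OpenAI

# Point the OpenAI SDK at the gateway; the key is read from MINGO_API_KEY.
client = OpenAI(
    base_url="https://mingo.mingles.ai/v1",
    api_key=os.environ["MINGO_API_KEY"],
)

resp = client.chat.completions.create(
    model="moonshotai/Kimi-K2.6",
    messages=[{"role": "user", "content": "Read /etc/hosts"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "read_file",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }],
)
print(resp.choices[0].message.tool_calls)
# e.g. [ChatCompletionMessageToolCall(id='call_1', function=Function(arguments='{"path":"/etc/hosts"}', name='read_file'), type='function')]
# (the exact class name depends on the SDK version)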
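Once the gateway returns canonical tool_calls, the client still has to run the tool and send the result back. The following is a minimal sketch of that step, assuming the standard OpenAI follow-up message shape (role "tool" plus tool_call_id). The names run_tool_call and TOOLS and the local read_file implementation are hypothetical helpers for illustration, not part of the gateway.

```python
import json

def read_file(path: str) -> str:
    # Hypothetical local implementation of the read_file tool declared above.
    with open(path, "r", encoding="utf-8") as fh:
        return fh.read()

# Hypothetical registry mapping tool names to local callables.
TOOLS = {"read_file": read_file}

def run_tool_call(call_id: str, name: str, arguments: str, tools=TOOLS) -> dict:
    """Execute one tool call and build the follow-up "tool" message."""
    try:
        args = json.loads(arguments)  # arguments arrive as a JSON string
        result = tools[name](**args)
    except Exception as exc:
        # Report failures back to the model instead of crashing the loop.
        result = f"error: {type(exc).__name__}: {exc}"
    return {"role": "tool", "tool_call_id": call_id, "content": str(result)}
```

In a real loop you would append the assistant message that contained the tool call to messages, then append run_tool_call(call.id, call.function.name, call.function.arguments) for each call, and send the conversation back to the gateway.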
Raw Hermes (for OpenClaw and Hermes-native runtimes)
If your runtime already produces and consumes raw Hermes XML
(<tool_call>{...}</tool_call>), set the X-Mingo-Tool-Mode: raw header.
We will not rewrite the prompt or parse the output — you get exactly what the
model emits.
curl https://mingo.mingles.ai/v1/chat/completions \
  -H "Authorization: Bearer $MINGO_API_KEY" \
  -H "X-Mingo-Tool-Mode: raw" \
  -H "Content-Type: application/json" \
  -d '{ "model": "moonshotai/Kimi-K2.6", "messages": [...] }'
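In raw mode the parsing is left to your runtime. As a rough sketch of what that involves, the function below pulls the JSON payloads out of <tool_call>...</tool_call> blocks in the returned text. It assumes each block wraps a single JSON value (commonly an object with "name" and "arguments" keys, though that layout is an assumption here); extract_tool_calls is a hypothetical helper name.

```python
import json
import re

# Matches each <tool_call>...</tool_call> span; DOTALL lets the body span lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list[dict]:
    """Return the JSON payload of every well-formed <tool_call> block in text."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(body))
        except json.JSONDecodeError:
            continue  # skip malformed payloads rather than fail the whole turn
    return calls
```

Skipping malformed blocks is one possible policy; a stricter runtime might prefer to raise so that a truncated generation is noticed.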
Streaming tool-calls
Streaming returns standard OpenAI delta.tool_calls chunks. The gateway
buffers Hermes XML internally and emits incremental JSON to your client, so
runtimes like Cline and Cursor can show live tool-call progress indicators.
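On the client side, the streamed fragments have to be stitched back together before the arguments can be parsed. The sketch below assumes the usual OpenAI chunk layout (choices[0].delta.tool_calls, each entry carrying an index, an optional id, and partial function.name and function.arguments strings). The names accumulate_tool_calls and fake_chunk are hypothetical; fake_chunk only simulates a stream so the example runs offline.

```python
from types import SimpleNamespace

def accumulate_tool_calls(chunks) -> list[dict]:
    """Merge streamed delta.tool_calls fragments into complete tool calls."""
    calls: dict[int, dict] = {}
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices
        delta = chunk.choices[0].delta
        for tc in getattr(delta, "tool_calls", None) or []:
            slot = calls.setdefault(tc.index, {"id": None, "name": "", "arguments": ""})
            if tc.id:
                slot["id"] = tc.id
            if tc.function is not None:
                slot["name"] += tc.function.name or ""
                slot["arguments"] += tc.function.arguments or ""
    return [calls[i] for i in sorted(calls)]

def fake_chunk(index, id=None, name=None, arguments=None):
    # Offline stand-in for one streamed chunk (not a gateway or SDK type).
    fn = SimpleNamespace(name=name, arguments=arguments)
    tc = SimpleNamespace(index=index, id=id, function=fn)
    delta = SimpleNamespace(tool_calls=[tc])
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

stream = [
    fake_chunk(0, id="call_1", name="read_file", arguments=""),
    fake_chunk(0, arguments='{"path":'),
    fake_chunk(0, arguments='"/etc/hosts"}'),
]
print(accumulate_tool_calls(stream))
# [{'id': 'call_1', 'name': 'read_file', 'arguments': '{"path":"/etc/hosts"}'}]
```

With a real stream you would pass the iterator returned by client.chat.completions.create(..., stream=True) in place of the simulated list, and call json.loads on the arguments only after the stream ends.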
Known limits
- Parallel tool-calls: at most 1 tool-call per assistant message today.
- Streaming tool-calls: best-effort; we emit chunks as JSON segments arrive.
- Tool schemas: keep the combined tool schemas under 8 KB per request for parsing reliability.
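The schema size limit is easy to check before sending a request. This is a minimal sketch that assumes the size is measured on the tools array serialized as compact UTF-8 JSON and that 8 KB means 8192 bytes; the gateway may measure it differently. The names tool_schema_size and check_tool_schemas are hypothetical.

```python
import json

MAX_TOOL_SCHEMA_BYTES = 8 * 1024  # assumption: "8 KB" means 8192 bytes

def tool_schema_size(tools: list[dict]) -> int:
    """Size in bytes of the tools array serialized as compact UTF-8 JSON."""
    return len(json.dumps(tools, separators=(",", ":")).encode("utf-8"))

def check_tool_schemas(tools: list[dict]) -> None:
    """Raise ValueError if the serialized tools are not under the limit."""
    size = tool_schema_size(tools)
    if size >= MAX_TOOL_SCHEMA_BYTES:
        raise ValueError(
            f"tool schemas are {size} bytes; keep them under {MAX_TOOL_SCHEMA_BYTES}"
        )
```

Calling check_tool_schemas(tools) right before client.chat.completions.create(...) turns an oversized schema into a local error instead of an unreliable parse upstream.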
Runtime configs
For ready-to-paste configs for OpenClaw, Cline, Continue.dev, Cursor, Aider and n8n, use the Agents Hub generator.