Tools & agents

The gateway supports tool calling in three modes. The request you send determines the mode.

Send tools in the standard OpenAI schema. We forward them to the model and parse the response back into canonical tool_calls, regardless of whether the upstream model emits OpenAI-style or Hermes-style output.

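import os

from openai import OpenAI

# Point the OpenAI SDK at the gateway; the host and key variable match the curl example below.
client = OpenAI(
    base_url="https://mingo.mingles.ai/v1",
    api_key=os.environ["MINGO_API_KEY"],
)

resp = client.chat.completions.create(
    model="moonshotai/Kimi-K2.6",
    messages=[{"role": "user", "content": "Read /etc/hosts"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "read_file",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }],
)
# tool_calls is None when the model answers in plain text instead of calling a tool.
print(resp.choices[0].message.tool_calls)
# e.g. [ChatCompletionMessageToolCall(id='call_1', function=Function(name='read_file', arguments='{"path":"/etc/hosts"}'), type='function')]
# (the exact class name depends on the SDK version)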
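
Once a tool call comes back, the client usually runs the tool and returns the result as a tool message. The following is a minimal sketch of that step, not part of the gateway: read_file's body, the TOOLS registry, run_tool_call and fake_call are hypothetical names introduced for illustration, and fake_call stands in for resp.choices[0].message.tool_calls[0] so the sketch runs offline.

```python
import json
from types import SimpleNamespace


def read_file(path: str) -> str:
    """Hypothetical local tool: return the text of a file."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


# Hypothetical registry mapping tool names to local callables.
TOOLS = {"read_file": read_file}


def run_tool_call(tool_call, registry=TOOLS) -> dict:
    """Execute one tool call and build the follow-up message with role "tool"."""
    fn = registry[tool_call.function.name]
    # The arguments field arrives as a JSON string, not a dict.
    args = json.loads(tool_call.function.arguments)
    result = fn(**args)
    return {"role": "tool", "tool_call_id": tool_call.id, "content": str(result)}


# Stand-in for resp.choices[0].message.tool_calls[0] (assumed shape: id + function).
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="read_file", arguments='{"path": "/etc/hosts"}'),
)
```

In a real loop you would append the assistant message and the returned tool message to messages, then call client.chat.completions.create again so the model can use the result.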

Raw Hermes (for OpenClaw and Hermes-native runtimes)

If your runtime already produces and consumes raw Hermes XML (<tool_call>{...}</tool_call>), set the X-Mingo-Tool-Mode: raw header. The gateway then neither rewrites the prompt nor parses the output; you receive exactly what the model emits.

curl https://mingo.mingles.ai/v1/chat/completions \
  -H "Authorization: Bearer $MINGO_API_KEY" \
  -H "X-Mingo-Tool-Mode: raw" \
  -H "Content-Type: application/json" \
  -d '{ "model": "moonshotai/Kimi-K2.6", "messages": [...] }'
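
In raw mode the parsing is left to your runtime. As a rough illustration of what that involves, the sketch below pulls the JSON payloads out of <tool_call> blocks in the returned message content. It assumes the common Hermes payload shape with name and arguments keys; extract_tool_calls and TOOL_CALL_RE are hypothetical names, and the sample string is made up rather than captured from the gateway.

```python
import json
import re

# Matches one <tool_call>...</tool_call> block; DOTALL lets the JSON span several lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Return the decoded JSON payload of every well-formed <tool_call> block."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            # Skip malformed payloads instead of failing the whole turn.
            continue
    return calls


# Made-up sample of raw model output (assumed payload shape: name + arguments).
raw = (
    "Let me check.\n"
    "<tool_call>\n"
    '{"name": "read_file", "arguments": {"path": "/etc/hosts"}}\n'
    "</tool_call>"
)
calls = extract_tool_calls(raw)
```

Skipping malformed blocks is one possible policy; a stricter runtime might instead surface the error back to the model.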

Streaming tool-calls

Streaming returns standard OpenAI delta.tool_calls chunks. The gateway buffers Hermes XML internally and emits incremental JSON to your client, so runtimes like Cline and Cursor can show live tool-call progress indicators.
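
On the client side, streamed fragments have to be stitched back together before the arguments can be parsed. The sketch below shows one way to do that, following the standard OpenAI delta layout (index, id, function.name, function.arguments). The names accumulate_tool_calls and _frag are hypothetical, and the simulated list stands in for the delta objects of a real stream so the example runs offline.

```python
from types import SimpleNamespace


def accumulate_tool_calls(deltas) -> list[dict]:
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    calls: dict[int, dict] = {}
    for delta in deltas:
        for frag in delta.tool_calls or []:
            call = calls.setdefault(frag.index, {"id": None, "name": "", "arguments": ""})
            if frag.id:
                call["id"] = frag.id
            if frag.function and frag.function.name:
                call["name"] += frag.function.name
            if frag.function and frag.function.arguments:
                # Argument JSON arrives in pieces; concatenate, parse only at the end.
                call["arguments"] += frag.function.arguments
    return [calls[i] for i in sorted(calls)]


def _frag(index, id=None, name=None, arguments=None):
    """Build a stand-in for one streamed tool-call fragment."""
    return SimpleNamespace(
        index=index, id=id, function=SimpleNamespace(name=name, arguments=arguments)
    )


# Simulated deltas; a text-only chunk carries tool_calls=None.
simulated = [
    SimpleNamespace(tool_calls=[_frag(0, id="call_1", name="read_file", arguments="")]),
    SimpleNamespace(tool_calls=[_frag(0, arguments='{"path":')]),
    SimpleNamespace(tool_calls=None),
    SimpleNamespace(tool_calls=[_frag(0, arguments='"/etc/hosts"}')]),
]
merged = accumulate_tool_calls(simulated)
```

With the OpenAI SDK you would pass stream=True and feed chunk.choices[0].delta for each chunk into the same function, calling json.loads on the arguments only once the stream has finished.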

Known limits

  • Parallel tool-calls: at most 1 tool-call per assistant message today.
  • Streaming tool-calls: best-effort; we emit chunks as JSON segments arrive.
  • Tool schemas: keep the combined tool schemas under 8 KB per request for reliable parsing.
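
A client can check the schema-size limit above before sending a request. This is a sketch under stated assumptions: how the gateway counts bytes is not documented here, so measuring the compact JSON serialization of the tools array in UTF-8 is a guess, and MAX_SCHEMA_BYTES, tool_schema_bytes and check_tool_schemas are hypothetical names.

```python
import json

# 8 KB limit from the list above; the exact byte accounting is an assumption.
MAX_SCHEMA_BYTES = 8 * 1024


def tool_schema_bytes(tools: list[dict]) -> int:
    """Size of the compactly serialized tools array in UTF-8 bytes."""
    return len(json.dumps(tools, separators=(",", ":")).encode("utf-8"))


def check_tool_schemas(tools: list[dict]) -> int:
    """Return the serialized size, or raise if it is not under the limit."""
    size = tool_schema_bytes(tools)
    if size >= MAX_SCHEMA_BYTES:
        raise ValueError(
            f"tool schemas are {size} bytes; keep them under {MAX_SCHEMA_BYTES}"
        )
    return size


tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]
size = check_tool_schemas(tools)
```

Because at most one tool call is returned per assistant message today, an agent loop can handle a single call per turn and request the next one in the following turn.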

Runtime configs

For ready-to-paste configs for OpenClaw, Cline, Continue.dev, Cursor, Aider and n8n, use the Agents Hub generator.