API Reference

Chat Completions

Create AI-powered chat completions with low-latency responses.

POST https://api.infe.io/v1/chat/completions

Basic Example

curl https://api.infe.io/v1/chat/completions \
  -H "Authorization: Bearer $INFE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "infe-pulse",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Request Body

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g., infe-pulse) |
| messages | array | Yes | List of messages in the conversation |
| stream | boolean | No | Enable streaming responses. Default: false |
| temperature | number | No | Sampling temperature (0-2). Default: 1 |
| max_tokens | integer | No | Maximum tokens to generate |
| top_p | number | No | Nucleus sampling probability. Default: 1 |
| stop | string \| array | No | Stop sequences |
| tools | array | No | List of tools/functions the model can call |
| response_format | object | No | Force JSON output with {"type": "json_object"} |
| seed | integer | No | Seed for deterministic generation |
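The optional parameters above combine into a single request body. A minimal sketch of building one (the parameter names mirror the table; the prompt text and values here are illustrative):

```python
import json

# Build a request body using parameters from the table above.
payload = {
    "model": "infe-pulse",
    "messages": [
        {"role": "user", "content": "Write a haiku about the sea."}
    ],
    "temperature": 0.7,  # lower values are more deterministic (range 0-2)
    "max_tokens": 64,    # cap on generated tokens
    "stop": ["\n\n"],    # stop at the first blank line
    "seed": 42,          # best-effort reproducibility
}

# This is the JSON string sent as the POST body.
body = json.dumps(payload)
print(body)
```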

Message Object

Each message in the messages array has this structure:

| Field | Type | Description |
|---|---|---|
| role | string | system, user, assistant, or tool |
| content | string | The message content |
| name | string | Optional name for the participant |
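A multi-turn conversation is just an ordered array of these message objects, alternating roles after the optional system message. A short sketch (the content strings are illustrative):

```python
# Each turn in the conversation is one message object with a role.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "And its population?"},
]

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant', 'user']
```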

Streaming (Recommended)

Streaming delivers tokens as they are generated rather than waiting for the full completion. This is where Infe's speed advantage is most visible: the first tokens arrive almost immediately.

from openai import OpenAI

client = OpenAI(
    api_key="your-infe-api-key",
    base_url="https://api.infe.io/v1",
)

stream = client.chat.completions.create(
    model="infe-pulse",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
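Each streamed chunk carries only an incremental delta; the client assembles the full reply by concatenating the non-empty `delta.content` values, as the loop above does. A local sketch with simulated chunks (plain dicts standing in for the SDK's chunk objects; no API call):

```python
# Simulated stream chunks mirroring the delta shape used in the loop above.
chunks = [
    {"choices": [{"delta": {"content": "Once"}}]},
    {"choices": [{"delta": {"content": " upon"}}]},
    {"choices": [{"delta": {"content": " a time"}}]},
    {"choices": [{"delta": {}}]},  # final chunk may carry no content
]

text = ""
for chunk in chunks:
    content = chunk["choices"][0]["delta"].get("content")
    if content:
        text += content

print(text)  # Once upon a time
```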

Tool Calling

Let the model call functions in your application:

from openai import OpenAI

client = OpenAI(
    api_key="your-infe-api-key",
    base_url="https://api.infe.io/v1",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="infe-titan",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# Check whether the model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
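After executing the function yourself, send the result back in a message with role tool so the model can produce its final answer. A sketch of that round trip using a locally constructed tool call (get_weather here is a local stub, and the tool message shape with tool_call_id follows the OpenAI-compatible convention assumed by this API):

```python
import json

def get_weather(location):
    # Local stub standing in for a real weather lookup.
    return {"location": location, "temp_c": 21, "conditions": "clear"}

# Stand-in for response.choices[0].message.tool_calls[0], serialized as a dict.
tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"location": "Tokyo"}'},
}

# The model returns arguments as a JSON string; parse before calling.
args = json.loads(tool_call["function"]["arguments"])
result = get_weather(**args)

# Append the assistant's tool call and the tool result, then call the API again
# with these messages to get the model's final answer.
followup_messages = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {"role": "assistant", "tool_calls": [tool_call]},
    {"role": "tool", "tool_call_id": tool_call["id"], "content": json.dumps(result)},
]
print(followup_messages[-1]["role"])  # tool
```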

JSON Mode

Force the model to output valid JSON:

import json

response = client.chat.completions.create(
    model="infe-pulse",
    messages=[
        {"role": "system", "content": "Output valid JSON only."},
        {"role": "user", "content": "List 3 colors with hex codes"}
    ],
    response_format={"type": "json_object"},
)

data = json.loads(response.choices[0].message.content)
print(data)

Response

JSON
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705369200,
  "model": "infe-pulse",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
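The fields above can be read directly once the body is parsed. A quick sketch using the sample response shown here:

```python
import json

# The sample response body from above, as a raw JSON string.
sample = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705369200,
  "model": "infe-pulse",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of France is Paris."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33}
}
"""

resp = json.loads(sample)
answer = resp["choices"][0]["message"]["content"]
tokens = resp["usage"]["total_tokens"]
print(answer)  # The capital of France is Paris.
print(tokens)  # 33
```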