API Reference

Chat Completions

Create AI-powered chat completions with low-latency responses.

POST https://api.infe.io/v1/chat/completions

Basic Example

curl https://api.infe.io/v1/chat/completions \
  -H "Authorization: Bearer $INFE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "infe-pulse",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Request Body

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g., infe-pulse) |
| messages | array | Yes | List of messages in the conversation |
| stream | boolean | No | Enable streaming responses. Default: false |
| temperature | number | No | Sampling temperature (0-2). Default: 1 |
| max_tokens | integer | No | Maximum tokens to generate |
| top_p | number | No | Nucleus sampling probability. Default: 1 |
| stop | string \| array | No | Stop sequences |
| tools | array | No | List of tools/functions the model can call |
| response_format | object | No | Force JSON output with {"type": "json_object"} |
| seed | integer | No | Seed for deterministic generation |
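The optional parameters above combine into a single request body. A minimal sketch of building one (the parameter names mirror the table; the prompt text and values here are illustrative):

```python
import json

# Build a request body using parameters from the table above.
payload = {
    "model": "infe-pulse",
    "messages": [
        {"role": "user", "content": "Write a haiku about the sea."}
    ],
    "temperature": 0.7,  # lower values are more deterministic (range 0-2)
    "max_tokens": 64,    # cap on generated tokens
    "stop": ["\n\n"],    # stop at the first blank line
    "seed": 42,          # best-effort reproducibility
}

# This is the JSON string sent as the POST body.
body = json.dumps(payload)
print(body)
```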

Message Object

Each message in the messages array has this structure:

| Field | Type | Description |
|---|---|---|
| role | string | system, user, assistant, or tool |
| content | string | The message content |
| name | string | Optional name for the participant |
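A multi-turn conversation is just an ordered array of these message objects, alternating roles after the optional system message. A short sketch (the content strings are illustrative):

```python
# Each turn in the conversation is one message object with a role.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "And its population?"},
]

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant', 'user']
```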

Streaming (Recommended)

Streaming delivers tokens as they are generated rather than waiting for the full completion. This is where Infe's speed advantage is most visible: the first tokens arrive almost immediately.

from openai import OpenAI

client = OpenAI(
    api_key="your-infe-api-key",
    base_url="https://api.infe.io/v1",
)

stream = client.chat.completions.create(
    model="infe-pulse",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
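Each streamed chunk carries only an incremental delta; the client assembles the full reply by concatenating the non-empty `delta.content` values, as the loop above does. A local sketch with simulated chunks (plain dicts standing in for the SDK's chunk objects; no API call):

```python
# Simulated stream chunks mirroring the delta shape used in the loop above.
chunks = [
    {"choices": [{"delta": {"content": "Once"}}]},
    {"choices": [{"delta": {"content": " upon"}}]},
    {"choices": [{"delta": {"content": " a time"}}]},
    {"choices": [{"delta": {}}]},  # final chunk may carry no content
]

text = ""
for chunk in chunks:
    content = chunk["choices"][0]["delta"].get("content")
    if content:
        text += content

print(text)  # Once upon a time
```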

Tool Calling

Let the model call functions in your application:

from openai import OpenAI

client = OpenAI(
    api_key="your-infe-api-key",
    base_url="https://api.infe.io/v1",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="infe-titan",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# Check whether the model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
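After executing the function yourself, send the result back in a message with role tool so the model can produce its final answer. A sketch of that round trip using a locally constructed tool call (get_weather here is a local stub, and the tool message shape with tool_call_id follows the OpenAI-compatible convention assumed by this API):

```python
import json

def get_weather(location):
    # Local stub standing in for a real weather lookup.
    return {"location": location, "temp_c": 21, "conditions": "clear"}

# Stand-in for response.choices[0].message.tool_calls[0], serialized as a dict.
tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"location": "Tokyo"}'},
}

# The model returns arguments as a JSON string; parse before calling.
args = json.loads(tool_call["function"]["arguments"])
result = get_weather(**args)

# Append the assistant's tool call and the tool result, then call the API again
# with these messages to get the model's final answer.
followup_messages = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {"role": "assistant", "tool_calls": [tool_call]},
    {"role": "tool", "tool_call_id": tool_call["id"], "content": json.dumps(result)},
]
print(followup_messages[-1]["role"])  # tool
```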

JSON Mode

Force the model to output valid JSON:

import json

response = client.chat.completions.create(
    model="infe-pulse",
    messages=[
        {"role": "system", "content": "Output valid JSON only."},
        {"role": "user", "content": "List 3 colors with hex codes"}
    ],
    response_format={"type": "json_object"},
)

data = json.loads(response.choices[0].message.content)
print(data)

Response

JSON
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705369200,
  "model": "infe-pulse",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
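The fields above can be read directly once the body is parsed. A quick sketch using the sample response shown here:

```python
import json

# The sample response body from above, as a raw JSON string.
sample = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705369200,
  "model": "infe-pulse",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of France is Paris."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33}
}
"""

resp = json.loads(sample)
answer = resp["choices"][0]["message"]["content"]
tokens = resp["usage"]["total_tokens"]
print(answer)  # The capital of France is Paris.
print(tokens)  # 33
```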