
OpenAI

LiteLLM supports OpenAI Chat + Embedding calls.

tip

We recommend using litellm.responses() / Responses API for the latest OpenAI models (GPT-5, gpt-5-codex, o3-mini, etc.)
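For example, a minimal sketch of a Responses API call through LiteLLM (assuming litellm.responses() accepts an input string, as in the OpenAI Responses API):

import os
import litellm

os.environ["OPENAI_API_KEY"] = "your-api-key"

# minimal Responses API sketch - model name and prompt are illustrative
response = litellm.responses(
    model="openai/gpt-5",
    input="Hello, how are you?",
)
print(response)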

Required API Keys

import os 
os.environ["OPENAI_API_KEY"] = "your-api-key"

Usage

import os 
from litellm import completion

os.environ["OPENAI_API_KEY"] = "your-api-key"

# openai call
response = completion(
    model="gpt-4o",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)

Usage - LiteLLM Proxy Server

Here's how to call OpenAI models with the LiteLLM Proxy Server

1. Save key in your environment

export OPENAI_API_KEY=""

2. Start the proxy

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo # The `openai/` prefix will call openai.chat.completions.create
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gpt-3.5-turbo-instruct
    litellm_params:
      model: text-completion-openai/gpt-3.5-turbo-instruct # The `text-completion-openai/` prefix will call openai.completions.create
      api_key: os.environ/OPENAI_API_KEY
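
Then start the proxy against this config (it serves on port 4000 by default):

litellm --config config.yaml

# RUNNING on http://0.0.0.0:4000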

3. Test it

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'
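
You can also point the official OpenAI SDK at the proxy; a minimal sketch, assuming the proxy is running locally on port 4000 and sk-1234 is a virtual key you have configured:

from openai import OpenAI

# point the OpenAI client at the LiteLLM proxy
client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)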

Optional Keys - OpenAI Organization, OpenAI API Base

import os 
os.environ["OPENAI_ORGANIZATION"] = "your-org-id" # OPTIONAL
os.environ["OPENAI_BASE_URL"] = "https://your_host/v1" # OPTIONAL

OpenAI Chat Completion Models

Model Name | Function Call
gpt-5 | response = completion(model="gpt-5", messages=messages)
gpt-5-mini | response = completion(model="gpt-5-mini", messages=messages)
gpt-5-nano | response = completion(model="gpt-5-nano", messages=messages)
gpt-5-chat | response = completion(model="gpt-5-chat", messages=messages)
gpt-5-chat-latest | response = completion(model="gpt-5-chat-latest", messages=messages)
gpt-5-2025-08-07 | response = completion(model="gpt-5-2025-08-07", messages=messages)
gpt-5-mini-2025-08-07 | response = completion(model="gpt-5-mini-2025-08-07", messages=messages)
gpt-5-nano-2025-08-07 | response = completion(model="gpt-5-nano-2025-08-07", messages=messages)
gpt-5-pro | response = completion(model="gpt-5-pro", messages=messages)
gpt-4.1 | response = completion(model="gpt-4.1", messages=messages)
gpt-4.1-mini | response = completion(model="gpt-4.1-mini", messages=messages)
gpt-4.1-nano | response = completion(model="gpt-4.1-nano", messages=messages)
o4-mini | response = completion(model="o4-mini", messages=messages)
o3-mini | response = completion(model="o3-mini", messages=messages)
o3 | response = completion(model="o3", messages=messages)
o1-mini | response = completion(model="o1-mini", messages=messages)
o1-preview | response = completion(model="o1-preview", messages=messages)
gpt-4o-mini | response = completion(model="gpt-4o-mini", messages=messages)
gpt-4o-mini-2024-07-18 | response = completion(model="gpt-4o-mini-2024-07-18", messages=messages)
gpt-4o | response = completion(model="gpt-4o", messages=messages)
gpt-4o-2024-08-06 | response = completion(model="gpt-4o-2024-08-06", messages=messages)
gpt-4o-2024-05-13 | response = completion(model="gpt-4o-2024-05-13", messages=messages)
gpt-4-turbo | response = completion(model="gpt-4-turbo", messages=messages)
gpt-4-turbo-preview | response = completion(model="gpt-4-0125-preview", messages=messages)
gpt-4-0125-preview | response = completion(model="gpt-4-0125-preview", messages=messages)
gpt-4-1106-preview | response = completion(model="gpt-4-1106-preview", messages=messages)
gpt-3.5-turbo-1106 | response = completion(model="gpt-3.5-turbo-1106", messages=messages)
gpt-3.5-turbo | response = completion(model="gpt-3.5-turbo", messages=messages)
gpt-3.5-turbo-0301 | response = completion(model="gpt-3.5-turbo-0301", messages=messages)
gpt-3.5-turbo-0613 | response = completion(model="gpt-3.5-turbo-0613", messages=messages)
gpt-3.5-turbo-16k | response = completion(model="gpt-3.5-turbo-16k", messages=messages)
gpt-3.5-turbo-16k-0613 | response = completion(model="gpt-3.5-turbo-16k-0613", messages=messages)
gpt-4 | response = completion(model="gpt-4", messages=messages)
gpt-4-0314 | response = completion(model="gpt-4-0314", messages=messages)
gpt-4-0613 | response = completion(model="gpt-4-0613", messages=messages)
gpt-4-32k | response = completion(model="gpt-4-32k", messages=messages)
gpt-4-32k-0314 | response = completion(model="gpt-4-32k-0314", messages=messages)
gpt-4-32k-0613 | response = completion(model="gpt-4-32k-0613", messages=messages)

These also support the OPENAI_BASE_URL environment variable, which can be used to specify a custom API endpoint.

OpenAI Vision Models

Model Name | Function Call
gpt-4o | response = completion(model="gpt-4o", messages=messages)
gpt-4-turbo | response = completion(model="gpt-4-turbo", messages=messages)
gpt-4-vision-preview | response = completion(model="gpt-4-vision-preview", messages=messages)

Usage

import os 
from litellm import completion

os.environ["OPENAI_API_KEY"] = "your-api-key"

# openai call
response = completion(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    }
                }
            ]
        }
    ],
)

PDF File Parsing

OpenAI has a new file message type that allows you to pass in a PDF file and have it parsed into a structured output. Read more

import base64
from litellm import completion

with open("draconomicon.pdf", "rb") as f:
    data = f.read()

base64_string = base64.b64encode(data).decode("utf-8")

response = completion(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "file",
                    "file": {
                        "filename": "draconomicon.pdf",
                        "file_data": f"data:application/pdf;base64,{base64_string}",
                    }
                },
                {
                    "type": "text",
                    "text": "What is the first dragon in the book?",
                }
            ],
        },
    ],
)

print(response.choices[0].message.content)

OpenAI Fine Tuned Models

Model Name | Function Call
fine tuned gpt-4-0613 | response = completion(model="ft:gpt-4-0613", messages=messages)
fine tuned gpt-4o-2024-05-13 | response = completion(model="ft:gpt-4o-2024-05-13", messages=messages)
fine tuned gpt-3.5-turbo-0125 | response = completion(model="ft:gpt-3.5-turbo-0125", messages=messages)
fine tuned gpt-3.5-turbo-1106 | response = completion(model="ft:gpt-3.5-turbo-1106", messages=messages)
fine tuned gpt-3.5-turbo-0613 | response = completion(model="ft:gpt-3.5-turbo-0613", messages=messages)
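
For example, a sketch of calling a fine-tuned model with a hypothetical full model id (use the id returned by your fine-tuning job):

import os
from litellm import completion

os.environ["OPENAI_API_KEY"] = "your-api-key"

# hypothetical fine-tuned model id - replace with your own
response = completion(
    model="ft:gpt-3.5-turbo-0125:my-org:custom-suffix:abc123",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)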

Getting Reasoning Content in /chat/completions

GPT-5 models return reasoning content when called via the Responses API. You can call these models via the /chat/completions endpoint by using the openai/responses/ prefix.

import litellm

response = litellm.completion(
    model="openai/responses/gpt-5-mini",  # tells litellm to call the model via the Responses API
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="low",
)

Expected Response:

{
  "id": "chatcmpl-6382a222-43c9-40c4-856b-22e105d88075",
  "created": 1760146746,
  "model": "gpt-5-mini",
  "object": "chat.completion",
  "system_fingerprint": null,
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Paris",
        "role": "assistant",
        "tool_calls": null,
        "function_call": null,
        "reasoning_content": "**Identifying the capital**\n\nThe user wants me to think of the capital of France and write it down. That's pretty straightforward: it's Paris. There aren't any safety issues to consider here. I think it would be best to keep it concise, so maybe just \"Paris\" would suffice. I feel confident that I should just stick to that without adding anything else. So, let's write it down!",
        "provider_specific_fields": null
      }
    }
  ],
  "usage": {
    "completion_tokens": 7,
    "prompt_tokens": 18,
    "total_tokens": 25,
    "completion_tokens_details": null,
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": 0,
      "text_tokens": null,
      "image_tokens": null
    }
  }
}

Advanced: Using reasoning_effort with summary field

By default, reasoning_effort accepts a string value ("low", "medium", "high", "minimal") and only sets the effort level without including a reasoning summary.

To opt-in to the summary feature, you can pass reasoning_effort as a dictionary. Note: The summary field requires your OpenAI organization to have verification status. Using summary without verification will result in a 400 error from OpenAI.

# Option 1: String format (default - no summary)
response = litellm.completion(
    model="openai/responses/gpt-5-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="high"  # Only sets effort level
)

# Option 2: Dict format (with optional summary - requires org verification)
response = litellm.completion(
    model="openai/responses/gpt-5-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort={"effort": "high", "summary": "auto"}  # "auto", "detailed", or "concise" (not all supported by all models)
)

Summary field options:

  • "auto": System automatically determines the appropriate summary level based on the model
  • "concise": Provides a shorter summary (not supported by GPT-5 series models)
  • "detailed": Offers a comprehensive reasoning summary

Note: GPT-5 series models support "auto" and "detailed", but do not support "concise". O-series models (o3-pro, o4-mini, o3) support all three options. Some models like o3-mini and o1 do not support reasoning summaries at all.

Supported reasoning_effort values by model:

Model | Default (when not set) | Supported Values
gpt-5 | medium | minimal, low, medium, high
gpt-5-mini | medium | minimal, low, medium, high
gpt-5-codex | adaptive | low, medium, high (no minimal)
gpt-5-pro | high | high only

Note: gpt-5-pro only accepts reasoning_effort="high". Other values will return an error. When reasoning_effort is not set (None), OpenAI defaults to the value shown in the "Default" column.

See OpenAI Reasoning documentation for more details on organization verification requirements.

OpenAI Chat Completion to Responses API Bridge

Call any Responses API model from OpenAI's /chat/completions endpoint.

import litellm
import os

os.environ["OPENAI_API_KEY"] = "sk-1234"

response = litellm.completion(
    model="o3-deep-research-2025-06-26",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    tools=[
        {"type": "web_search_preview"},
        {"type": "code_interpreter", "container": {"type": "auto"}},
    ],
)
print(response)

OpenAI Audio Transcription

LiteLLM supports the OpenAI Audio Transcription endpoint.

Supported models:

Model Name | Function Call
whisper-1 | response = transcription(model="whisper-1", file=audio_file)
gpt-4o-transcribe | response = transcription(model="gpt-4o-transcribe", file=audio_file)
gpt-4o-mini-transcribe | response = transcription(model="gpt-4o-mini-transcribe", file=audio_file)

from litellm import transcription
import os

# set api keys
os.environ["OPENAI_API_KEY"] = ""
audio_file = open("/path/to/audio.mp3", "rb")

response = transcription(model="gpt-4o-transcribe", file=audio_file)

print(f"response: {response}")

Advanced

Getting OpenAI API Response Headers

Set litellm.return_response_headers = True to get raw response headers from OpenAI

The _response_headers field is always returned on responses from litellm.completion() and litellm.embedding().

import litellm
from litellm import completion

litellm.return_response_headers = True

# /chat/completions
response = completion(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "hi",
        }
    ],
)
print(f"response: {response}")
print("_response_headers=", response._response_headers)

Expected Response Headers from OpenAI

{
  "date": "Sat, 20 Jul 2024 22:05:23 GMT",
  "content-type": "application/json",
  "transfer-encoding": "chunked",
  "connection": "keep-alive",
  "access-control-allow-origin": "*",
  "openai-model": "text-embedding-ada-002",
  "openai-organization": "*****",
  "openai-processing-ms": "20",
  "openai-version": "2020-10-01",
  "strict-transport-security": "max-age=15552000; includeSubDomains; preload",
  "x-ratelimit-limit-requests": "5000",
  "x-ratelimit-limit-tokens": "5000000",
  "x-ratelimit-remaining-requests": "4999",
  "x-ratelimit-remaining-tokens": "4999999",
  "x-ratelimit-reset-requests": "12ms",
  "x-ratelimit-reset-tokens": "0s",
  "x-request-id": "req_cc37487bfd336358231a17034bcfb4d9",
  "cf-cache-status": "DYNAMIC",
  "set-cookie": "__cf_bm=E_FJY8fdAIMBzBE2RZI2.OkMIO3lf8Hz.ydBQJ9m3q8-1721513123-1.0.1.1-6OK0zXvtd5s9Jgqfz66cU9gzQYpcuh_RLaUZ9dOgxR9Qeq4oJlu.04C09hOTCFn7Hg.k.2tiKLOX24szUE2shw; path=/; expires=Sat, 20-Jul-24 22:35:23 GMT; domain=.api.openai.com; HttpOnly; Secure; SameSite=None, *cfuvid=SDndIImxiO3U0aBcVtoy1TBQqYeQtVDo1L6*Nlpp7EU-1721513123215-0.0.1.1-604800000; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None",
  "x-content-type-options": "nosniff",
  "server": "cloudflare",
  "cf-ray": "8a66409b4f8acee9-SJC",
  "content-encoding": "br",
  "alt-svc": "h3=\":443\"; ma=86400"
}

Parallel Function Calling

See a detailed walkthrough of parallel function calling with litellm here

import litellm
import json
import os

# set openai api key
os.environ['OPENAI_API_KEY'] = ""  # litellm reads OPENAI_API_KEY from .env and sends the request

# Example dummy function hard coded to return the same weather
# In production, this could be your backend API or an external API
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    if "tokyo" in location.lower():
        return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
    elif "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"})
    elif "paris" in location.lower():
        return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

response = litellm.completion(
    model="gpt-3.5-turbo-1106",
    messages=messages,
    tools=tools,
    tool_choice="auto",  # auto is default, but we'll be explicit
)
print("\nLLM Response1:\n", response)
response_message = response.choices[0].message
tool_calls = response.choices[0].message.tool_calls
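
To complete the loop, you can execute each returned tool call and send the results back to the model; a sketch continuing the example above:

# Step 2 (sketch): run each tool call locally and return the results to the model
available_functions = {"get_current_weather": get_current_weather}

messages.append(response_message)  # extend the conversation with the assistant's tool calls
for tool_call in tool_calls:
    function_args = json.loads(tool_call.function.arguments)
    function_response = available_functions[tool_call.function.name](
        location=function_args.get("location"),
        unit=function_args.get("unit", "fahrenheit"),
    )
    messages.append(
        {
            "tool_call_id": tool_call.id,
            "role": "tool",
            "name": tool_call.function.name,
            "content": function_response,
        }
    )

second_response = litellm.completion(model="gpt-3.5-turbo-1106", messages=messages)
print("\nLLM Response2:\n", second_response)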

Setting extra_headers for completion calls

import os 
from litellm import completion

os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    extra_headers={"AI-Resource Group": "ishaan-resource"}
)

Setting Organization-ID for completion calls

This can be set in one of the following ways:

  • Environment Variable OPENAI_ORGANIZATION
  • Params to litellm.completion(model=model, organization="your-organization-id")
  • Set as litellm.organization="your-organization-id"

import os 
from litellm import completion

os.environ["OPENAI_API_KEY"] = "your-api-key"
os.environ["OPENAI_ORGANIZATION"] = "your-org-id" # OPTIONAL

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)
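
Alternatively, a sketch of passing the organization id per call:

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    organization="your-organization-id",
)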

Set ssl_verify=False

This is done by setting your own httpx.Client

  • For litellm.completion set litellm.client_session=httpx.Client(verify=False)
  • For litellm.acompletion set litellm.aclient_session=httpx.AsyncClient(verify=False)

import litellm, httpx

# for completion
litellm.client_session = httpx.Client(verify=False)
response = litellm.completion(
model="gpt-3.5-turbo",
messages=messages,
)

# for acompletion
litellm.aclient_session = httpx.AsyncClient(verify=False)
response = litellm.acompletion(
model="gpt-3.5-turbo",
messages=messages,
)

Using OpenAI Proxy with LiteLLM

import os 
import litellm
from litellm import completion

os.environ["OPENAI_API_KEY"] = ""

# set custom api base to your proxy
# either set .env or litellm.api_base
# os.environ["OPENAI_BASE_URL"] = "https://your_host/v1"
litellm.api_base = "https://your_host/v1"


messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion("openai/your-model-name", messages)

If you need to set api_base dynamically, just pass it in the completion call instead: completion(..., api_base="your-proxy-api-base")
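
For example, a per-call override might look like:

response = completion(
    model="openai/your-model-name",
    messages=messages,
    api_base="https://your_host/v1",
)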

For more check out setting API Base/Keys

Forwarding Org ID for Proxy requests

Forward the client's OpenAI organization ID to OpenAI with the forward_openai_org_id param.

  1. Setup config.yaml

model_list:
  - model_name: "gpt-3.5-turbo"
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

general_settings:
  forward_openai_org_id: true # 👈 KEY CHANGE

  2. Start Proxy

litellm --config config.yaml --detailed_debug

# RUNNING on http://0.0.0.0:4000
  3. Make OpenAI call

from openai import OpenAI

client = OpenAI(
    api_key="sk-1234",
    organization="my-special-org",
    base_url="http://0.0.0.0:4000"
)

client.chat.completions.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hello world"}])

In your logs you should see the forwarded org id

LiteLLM:DEBUG: utils.py:255 - Request to litellm:
LiteLLM:DEBUG: utils.py:255 - litellm.acompletion(... organization='my-special-org',)

GPT-5 Pro Special Notes

GPT-5 Pro is OpenAI's most advanced reasoning model with unique characteristics:

  • Responses API Only: GPT-5 Pro is only available through the /v1/responses endpoint
  • No Streaming: Does not support streaming responses
  • High Reasoning: Designed for complex reasoning tasks with highest effort reasoning
  • Context Window: 400,000 tokens input, 272,000 tokens output
  • Pricing: $15.00 input / $120.00 output per 1M tokens (Standard), $7.50 input / $60.00 output (Batch); see the cost sketch after this list
  • Tools: Supports Web Search, File Search, Image Generation, MCP (but not Code Interpreter or Computer Use)
  • Modalities: Text and Image input, Text output only
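
As a rough sketch of the Standard pricing above (token counts are hypothetical):

# rough cost estimate at Standard pricing: $15.00 / 1M input tokens, $120.00 / 1M output tokens
input_tokens = 10_000   # hypothetical
output_tokens = 2_000   # hypothetical

cost = input_tokens * 15.00 / 1_000_000 + output_tokens * 120.00 / 1_000_000
print(f"estimated cost: ${cost:.2f}")  # $0.15 + $0.24 = $0.39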
# GPT-5 Pro usage example
response = completion(
    model="gpt-5-pro",
    messages=[{"role": "user", "content": "Solve this complex reasoning problem..."}]
)

Video Generation

LiteLLM supports OpenAI's video generation models including Sora.

For detailed documentation on video generation, see OpenAI Video Generation →