
Serverless Inference

Gradient provides access to serverless inference models. You access a model by providing an inference key.

You can generate a new model access key for serverless inference in the console.
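The SDK examples below read the key from the `GRADIENTAI_INFERENCE_KEY` environment variable, so export it in your shell first. The key value shown here is a placeholder; substitute the key you generated in the console:

```shell
# Placeholder value -- replace with the inference key from the console
export GRADIENTAI_INFERENCE_KEY="your-inference-key"
```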

For example, access serverless inference using the SDK:

Python
import os

from do_gradientai import GradientAI

inference_client = GradientAI(
    inference_key=os.environ.get("GRADIENTAI_INFERENCE_KEY"),
)

inference_response = inference_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ],
    model="llama3.3-70b-instruct",
)

print(inference_response.choices[0].message.content)

The async client uses the same interface; the only difference is that requests must be awaited inside a coroutine.

Python
import asyncio
import os

from do_gradientai import AsyncGradientAI

inference_client = AsyncGradientAI(
    inference_key=os.environ.get("GRADIENTAI_INFERENCE_KEY"),
)

async def main():
    inference_response = await inference_client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "What is the capital of France?",
            }
        ],
        model="llama3.3-70b-instruct",
    )
    print(inference_response.choices[0].message.content)

asyncio.run(main())