
Serverless Inference

Gradient provides access to serverless inference models. You access a model by providing an inference key.

You can generate a new model access key for serverless inference in the console.
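The SDK examples below read the key from the `GRADIENTAI_INFERENCE_KEY` environment variable, so export it in your shell first. The key value shown here is a placeholder; substitute the key you generated in the console:

```shell
# Placeholder value -- replace with the inference key from the console
export GRADIENTAI_INFERENCE_KEY="your-inference-key"
```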

For example, access serverless inference using the SDK:

Python
import os

from do_gradientai import GradientAI

inference_client = GradientAI(
    inference_key=os.environ.get("GRADIENTAI_INFERENCE_KEY"),
)

inference_response = inference_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ],
    model="llama3.3-70b-instruct",
)

print(inference_response.choices[0].message.content)

The async client uses the same interface; the only difference is that requests must be awaited inside a coroutine.

Python
import asyncio
import os

from do_gradientai import AsyncGradientAI

inference_client = AsyncGradientAI(
    inference_key=os.environ.get("GRADIENTAI_INFERENCE_KEY"),
)

async def main():
    inference_response = await inference_client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "What is the capital of France?",
            }
        ],
        model="llama3.3-70b-instruct",
    )
    print(inference_response.choices[0].message.content)

asyncio.run(main())