Serverless Inference
Gradient provides access to serverless inference models. You can access models by providing an inference key.
Model Access Key
You can generate a new model access key for serverless inference in the console.
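The examples below read the key from the GRADIENTAI_INFERENCE_KEY environment variable. A small helper like the following (hypothetical, not part of the SDK) can fail fast with a clear message when the key is missing, instead of surfacing an opaque authentication error later:

```python
import os


def get_inference_key(env_var: str = "GRADIENTAI_INFERENCE_KEY") -> str:
    """Return the model access key from the environment, failing fast if unset."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; generate a model access key in the console "
            "and export it before running the examples below."
        )
    return key
```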
Examples
Sync Client
For example, access serverless inference using the SDK:
```python
import os

from do_gradientai import GradientAI

inference_client = GradientAI(
    inference_key=os.environ.get("GRADIENTAI_INFERENCE_KEY"),
)

inference_response = inference_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ],
    model="llama3.3-70b-instruct",
)

print(inference_response.choices[0].message.content)
```
Async Client
The async client uses the exact same interface.
```python
import asyncio
import os

from do_gradientai import AsyncGradientAI

inference_client = AsyncGradientAI(
    inference_key=os.environ.get("GRADIENTAI_INFERENCE_KEY"),
)


async def main() -> None:
    # Identical to the sync client, except each call is awaited.
    inference_response = await inference_client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "What is the capital of France?",
            }
        ],
        model="llama3.3-70b-instruct",
    )
    print(inference_response.choices[0].message.content)


asyncio.run(main())
```