Evaluation Runs

Run an Evaluation Test Case

agents.evaluation_runs.create(**kwargs: EvaluationRunCreateParams) -> EvaluationRunCreateResponse
POST /v2/gen-ai/evaluation_runs

Returns an EvaluationRunCreateResponse containing evaluation_run_uuids: list.
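
A minimal sketch of starting a run with the method above. The client import and constructor are assumptions (any client object exposing the agents.evaluation_runs resource documented here will do), and the create params shown are illustrative names rather than confirmed fields of EvaluationRunCreateParams; only the method and the evaluation_run_uuids response field come from this page.

```python
from do_gradientai import GradientAI  # hypothetical import; use your SDK's client

client = GradientAI()  # assumed to read the API token from the environment

response = client.agents.evaluation_runs.create(
    test_case_uuid="example-test-case-uuid",  # illustrative param name
    agent_uuids=["example-agent-uuid"],       # illustrative param name
    run_name="nightly-regression",            # illustrative param name
)
print(response.evaluation_run_uuids)  # documented response field: list of run UUIDs
```
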
Retrieve Results of an Evaluation Run

agents.evaluation_runs.list_results(evaluation_run_uuid: str, **kwargs: EvaluationRunListResultsParams) -> EvaluationRunListResultsResponse
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results

Returns an EvaluationRunListResultsResponse containing evaluation_run: APIEvaluationRun, links: APILinks, meta: APIMeta, and prompts: list.
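
A sketch of pulling a run's results. The response fields used (evaluation_run, prompts) are the ones named in the signature above; the client object is the assumed one from the previous example.

```python
results = client.agents.evaluation_runs.list_results(
    "example-evaluation-run-uuid",
)
run = results.evaluation_run          # APIEvaluationRun, documented below
for prompt in results.prompts or []:  # per-prompt results; guard against None
    print(prompt.prompt_id, prompt.input_tokens, prompt.output_tokens)
```
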
Retrieve Information About an Existing Evaluation Run

agents.evaluation_runs.retrieve(evaluation_run_uuid: str) -> EvaluationRunRetrieveResponse
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}

Returns an EvaluationRunRetrieveResponse containing evaluation_run: APIEvaluationRun.
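
A sketch of checking on a single run, again reusing the assumed client; status and pass_status are documented under APIEvaluationRun below.

```python
run = client.agents.evaluation_runs.retrieve(
    "example-evaluation-run-uuid",
).evaluation_run
print(run.status, run.pass_status)
```
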
Retrieve Results of an Evaluation Run Prompt

agents.evaluation_runs.retrieve_results(prompt_id: int, **kwargs: EvaluationRunRetrieveResultsParams) -> EvaluationRunRetrieveResultsResponse
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results/{prompt_id}

Returns an EvaluationRunRetrieveResultsResponse containing prompt: APIEvaluationPrompt.
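
A sketch of fetching one prompt's result. Because evaluation_run_uuid appears in the URL path alongside prompt_id, it is assumed here to travel through the params kwargs.

```python
result = client.agents.evaluation_runs.retrieve_results(
    prompt_id=1,
    evaluation_run_uuid="example-evaluation-run-uuid",  # assumed kwarg
)
prompt = result.prompt  # APIEvaluationPrompt, documented below
print(prompt.ground_truth, prompt.output)
```
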
API Evaluation Metric

APIEvaluationMetric class

description: str (optional)

inverted: bool (optional)
  If true, the metric is inverted, meaning that a lower value is better.

metric_name: str (optional)

metric_type: Optional[Literal["METRIC_TYPE_UNSPECIFIED", "METRIC_TYPE_GENERAL_QUALITY", "METRIC_TYPE_RAG_AND_TOOL"]]

metric_uuid: str (optional)

metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]

range_max: float (optional, format: float)
  The maximum value for the metric.

range_min: float (optional, format: float)
  The minimum value for the metric.
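
Because inverted flips a metric's direction, code that ranks or thresholds scores should account for it. A sketch using only the fields above, under the assumption that raw values fall within [range_min, range_max]:

```python
def normalized_score(metric, value: float) -> float:
    """Map a raw metric value into [0, 1] with higher always better."""
    lo = metric.range_min if metric.range_min is not None else 0.0
    hi = metric.range_max if metric.range_max is not None else 1.0
    if hi == lo:  # degenerate range; avoid division by zero
        return 0.0
    score = (value - lo) / (hi - lo)
    return 1.0 - score if metric.inverted else score
```
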
API Evaluation Metric Result

APIEvaluationMetricResult class

error_description: str (optional)
  Error description if the metric could not be calculated.

metric_name: str (optional)
  Metric name.

metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]

number_value: float (optional, format: double)
  The value of the metric as a number.

reasoning: str (optional)
  Reasoning of the metric result.

string_value: str (optional)
  The value of the metric as a string.
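
A result carries its value in number_value or string_value depending on metric_value_type. A small helper to normalize access; the assumption that percentage results also ride in number_value is not confirmed by this page.

```python
from typing import Optional, Union

def metric_value(result) -> Optional[Union[float, str]]:
    """Return the value of an APIEvaluationMetricResult, or None on error."""
    if result.error_description:  # metric could not be calculated
        return None
    if result.metric_value_type == "METRIC_VALUE_TYPE_STRING":
        return result.string_value
    # NUMBER (and, assumed, PERCENTAGE) values are carried in number_value.
    return result.number_value
```
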

API Evaluation Prompt

APIEvaluationPrompt class

ground_truth: str (optional)
  The ground truth for the prompt.

input: str (optional)

input_tokens: str (optional, format: uint64)
  The number of input tokens used in the prompt.

output: str (optional)

output_tokens: str (optional, format: uint64)
  The number of output tokens used in the prompt.

prompt_chunks: Optional[List[PromptChunk]]
  The list of prompt chunks. Each PromptChunk has the following fields:

  chunk_usage_pct: float (optional, format: double)
    The usage percentage of the chunk.

  chunk_used: bool (optional)
    Indicates if the chunk was used in the prompt.

  index_uuid: str (optional)
    The index UUID (Knowledge Base) of the chunk.

  source_name: str (optional)
    The source name for the chunk, e.g., the file name or document title.

  text: str (optional)
    Text content of the chunk.

prompt_id: int (optional, format: int64)
  Prompt ID.

prompt_level_metric_results: Optional[List[APIEvaluationMetricResult]]
  The metric results for the prompt. Each item has the APIEvaluationMetricResult fields documented above.
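
For RAG debugging, the prompt_chunks fields above make it straightforward to see which retrieved chunks the agent actually drew on. A sketch using only documented fields:

```python
def used_chunks(prompt):
    """Yield (source_name, chunk_usage_pct) for each chunk marked as used."""
    for chunk in prompt.prompt_chunks or []:
        if chunk.chunk_used:
            yield chunk.source_name, chunk.chunk_usage_pct

# e.g.: for prompt in results.prompts: print(list(used_chunks(prompt)))
```
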

API Evaluation Run

APIEvaluationRun class

agent_deleted: bool (optional)
  Whether the agent is deleted.

agent_name: str (optional)
  Agent name.

agent_uuid: str (optional)
  Agent UUID.

agent_version_hash: str (optional)
  Version hash.

agent_workspace_uuid: str (optional)
  Agent workspace UUID.

created_by_user_email: str (optional)

created_by_user_id: str (optional, format: uint64)

error_description: str (optional)
  The error description.

evaluation_run_uuid: str (optional)
  Evaluation run UUID.

evaluation_test_case_workspace_uuid: str (optional)
  Evaluation test case workspace UUID.

finished_at: datetime (optional, format: date-time)
  Run end time.

pass_status: bool (optional)
  The pass status of the evaluation run based on the star metric.

queued_at: datetime (optional, format: date-time)
  Run queued time.

run_level_metric_results: Optional[List[APIEvaluationMetricResult]]
  Each item has the APIEvaluationMetricResult fields documented above.

run_name: str (optional)
  Run name.

star_metric_result: APIEvaluationMetricResult (optional)

started_at: datetime (optional, format: date-time)
  Run start time.

status: Optional[Literal["EVALUATION_RUN_STATUS_UNSPECIFIED", "EVALUATION_RUN_QUEUED", "EVALUATION_RUN_RUNNING_DATASET", "EVALUATION_RUN_EVALUATING_RESULTS", "EVALUATION_RUN_CANCELLING", "EVALUATION_RUN_CANCELLED", "EVALUATION_RUN_SUCCESSFUL", "EVALUATION_RUN_PARTIALLY_SUCCESSFUL", "EVALUATION_RUN_FAILED"]]
  Evaluation run statuses.

test_case_description: str (optional)
  Test case description.

test_case_name: str (optional)
  Test case name.

test_case_uuid: str (optional)
  Test case UUID.

test_case_version: int (optional, format: int64)
  Test case version.
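
A run moves through the queued and running statuses before settling in a terminal one, so callers typically poll retrieve until it does. A sketch; the terminal set is inferred from the status names above, and the interval and client are assumptions.

```python
import time

# Statuses after which the run should not change again (inferred from the names).
TERMINAL_STATUSES = {
    "EVALUATION_RUN_CANCELLED",
    "EVALUATION_RUN_SUCCESSFUL",
    "EVALUATION_RUN_PARTIALLY_SUCCESSFUL",
    "EVALUATION_RUN_FAILED",
}

def wait_for_run(client, evaluation_run_uuid: str, interval: float = 10.0):
    """Poll the run until it reaches a terminal status, then return it."""
    while True:
        run = client.agents.evaluation_runs.retrieve(
            evaluation_run_uuid,
        ).evaluation_run
        if run.status in TERMINAL_STATUSES:
            return run
        time.sleep(interval)

# run = wait_for_run(client, "example-evaluation-run-uuid")
# print(run.pass_status, run.star_metric_result)
```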