
Evaluation Runs

Run an Evaluation Test Case
client.agents.evaluationRuns.create(EvaluationRunCreateParams { agent_uuids, run_name, test_case_uuid } body?, RequestOptions options?): EvaluationRunCreateResponse { evaluation_run_uuids }
POST /v2/gen-ai/evaluation_runs
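
A minimal usage sketch for starting a run. It assumes `client` is an already-initialized SDK client (construction and authentication are omitted; see the SDK's setup guide) and that the UUIDs are placeholders:

```ts
// Start an evaluation run of one or more agents against a test case.
// `client` is assumed to be an initialized SDK client; the UUIDs are placeholders.
const { evaluation_run_uuids } = await client.agents.evaluationRuns.create({
  agent_uuids: ['replace-with-agent-uuid'],
  run_name: 'nightly-regression',
  test_case_uuid: 'replace-with-test-case-uuid',
});

console.log('Started evaluation runs:', evaluation_run_uuids);
```
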
Retrieve Results of an Evaluation Run
client.agents.evaluationRuns.listResults(string evaluationRunUuid, EvaluationRunListResultsParams { page, per_page } query?, RequestOptions options?): EvaluationRunListResultsResponse { evaluation_run, links, meta, prompts }
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results
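
A sketch of fetching one page of results, again assuming an initialized `client`; `page` and `per_page` are the documented query parameters:

```ts
// Fetch one page of prompt-level results for an evaluation run.
// `client` is assumed to be an initialized SDK client.
const runUuid = 'replace-with-evaluation-run-uuid';

const { evaluation_run, prompts, meta } = await client.agents.evaluationRuns.listResults(runUuid, {
  page: 1,
  per_page: 20,
});

console.log(`Run "${evaluation_run?.run_name}" returned ${prompts?.length ?? 0} prompts`, meta);
```
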
Retrieve Information About an Existing Evaluation Run
client.agents.evaluationRuns.retrieve(string evaluationRunUuid, RequestOptions options?): EvaluationRunRetrieveResponse { evaluation_run }
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}
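
Because a run moves through the statuses documented under APIEvaluationRun below, a common pattern is to poll retrieve until a terminal status is reached. A sketch, assuming an initialized `client` (typed loosely here for brevity):

```ts
// Poll an evaluation run until it reaches a terminal status.
// Terminal statuses follow the status enum documented under APIEvaluationRun.
const TERMINAL_STATUSES = new Set([
  'EVALUATION_RUN_CANCELLED',
  'EVALUATION_RUN_SUCCESSFUL',
  'EVALUATION_RUN_PARTIALLY_SUCCESSFUL',
  'EVALUATION_RUN_FAILED',
]);

async function waitForRun(client: any, evaluationRunUuid: string, intervalMs = 5_000) {
  for (;;) {
    const { evaluation_run } = await client.agents.evaluationRuns.retrieve(evaluationRunUuid);
    if (evaluation_run?.status && TERMINAL_STATUSES.has(evaluation_run.status)) {
      return evaluation_run;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```
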
Retrieve Results of an Evaluation Run Prompt
client.agents.evaluationRuns.retrieveResults(number promptID, EvaluationRunRetrieveResultsParams { evaluation_run_uuid } params, RequestOptions options?): EvaluationRunRetrieveResultsResponse { prompt }
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results/{prompt_id}
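
A sketch of fetching a single prompt's detailed result. Note the argument order: the prompt ID is positional, while the run UUID is passed in the params object; `client` is again assumed to be initialized:

```ts
// Fetch the detailed result for one prompt of an evaluation run.
// The prompt ID is positional; the run UUID goes in the params object.
const { prompt } = await client.agents.evaluationRuns.retrieveResults(42, {
  evaluation_run_uuid: 'replace-with-evaluation-run-uuid',
});

console.log('Input :', prompt?.input);
console.log('Output:', prompt?.output);
console.log('Metrics:', prompt?.prompt_level_metric_results);
```
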
Models
APIEvaluationMetric { description, inverted, metric_name, 5 more }
description?: string
inverted?: boolean

If true, the metric is inverted, meaning that a lower value is better.

metric_name?: string
metric_type?: "METRIC_TYPE_UNSPECIFIED" | "METRIC_TYPE_GENERAL_QUALITY" | "METRIC_TYPE_RAG_AND_TOOL"
Accepts one of the following:
"METRIC_TYPE_UNSPECIFIED"
"METRIC_TYPE_GENERAL_QUALITY"
"METRIC_TYPE_RAG_AND_TOOL"
metric_uuid?: string
metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"
Accepts one of the following:
"METRIC_VALUE_TYPE_UNSPECIFIED"
"METRIC_VALUE_TYPE_NUMBER"
"METRIC_VALUE_TYPE_STRING"
"METRIC_VALUE_TYPE_PERCENTAGE"
range_max?: number

The maximum value for the metric.

format: float
range_min?: number

The minimum value for the metric.

format: float
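
One way these fields can be used on the client side is to map a raw metric value onto a common 0..1 scale, respecting inverted and the documented range. This helper is purely illustrative and not part of the SDK:

```ts
// Illustrative helper (not part of the SDK): scale a raw metric value into [0, 1]
// using range_min/range_max, flipping the scale when the metric is inverted
// (for inverted metrics a lower raw value is better).
interface EvaluationMetric {
  inverted?: boolean;
  range_min?: number;
  range_max?: number;
}

function normalizedScore(metric: EvaluationMetric, value: number): number {
  const min = metric.range_min ?? 0;
  const max = metric.range_max ?? 1;
  if (max === min) return 0;
  const score = Math.min(Math.max((value - min) / (max - min), 0), 1);
  return metric.inverted ? 1 - score : score;
}
```
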
APIEvaluationMetricResult { error_description, metric_name, metric_value_type, 3 more }
error_description?: string

Error description if the metric could not be calculated.

metric_name?: string

Metric name

metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"
Accepts one of the following:
"METRIC_VALUE_TYPE_UNSPECIFIED"
"METRIC_VALUE_TYPE_NUMBER"
"METRIC_VALUE_TYPE_STRING"
"METRIC_VALUE_TYPE_PERCENTAGE"
number_value?: number

The value of the metric as a number.

format: double
reasoning?: string

Reasoning of the metric result.

string_value?: string

The value of the metric as a string.
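
Because a result carries either a numeric or a string value depending on metric_value_type, a small formatter can make results easier to log. The interface below mirrors the documented fields and is illustrative rather than imported from the SDK:

```ts
// Illustrative shape mirroring APIEvaluationMetricResult; not imported from the SDK.
interface MetricResult {
  error_description?: string;
  metric_name?: string;
  metric_value_type?: string;
  number_value?: number;
  string_value?: string;
  reasoning?: string;
}

// Pick the value field that matches the declared value type.
function formatMetric(result: MetricResult): string {
  if (result.error_description) {
    return `${result.metric_name}: error (${result.error_description})`;
  }
  switch (result.metric_value_type) {
    case 'METRIC_VALUE_TYPE_STRING':
      return `${result.metric_name}: ${result.string_value}`;
    case 'METRIC_VALUE_TYPE_NUMBER':
    case 'METRIC_VALUE_TYPE_PERCENTAGE':
      return `${result.metric_name}: ${result.number_value}`;
    default:
      return `${result.metric_name}: ${result.number_value ?? result.string_value ?? 'n/a'}`;
  }
}
```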

APIEvaluationPrompt { ground_truth, input, input_tokens, 5 more }
ground_truth?: string

The ground truth for the prompt.

input?: string
input_tokens?: string

The number of input tokens used in the prompt.

format: uint64
output?: string
output_tokens?: string

The number of output tokens used in the prompt.

format: uint64
prompt_chunks?: Array<PromptChunk>

The list of prompt chunks.

chunk_usage_pct?: number

The usage percentage of the chunk.

format: double
chunk_used?: boolean

Indicates if the chunk was used in the prompt.

index_uuid?: string

The index uuid (Knowledge Base) of the chunk.

source_name?: string

The source name for the chunk, e.g., the file name or document title.

text?: string

Text content of the chunk.

prompt_id?: number

Prompt ID

format: int64
prompt_level_metric_results?: Array<APIEvaluationMetricResult { error_description, metric_name, metric_value_type, 3 more } >

The metric results for the prompt.

error_description?: string

Error description if the metric could not be calculated.

metric_name?: string

Metric name

metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"
Accepts one of the following:
"METRIC_VALUE_TYPE_UNSPECIFIED"
"METRIC_VALUE_TYPE_NUMBER"
"METRIC_VALUE_TYPE_STRING"
"METRIC_VALUE_TYPE_PERCENTAGE"
number_value?: number

The value of the metric as a number.

format: double
reasoning?: string

Reasoning of the metric result.

string_value?: string

The value of the metric as a string.
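
Prompt chunks record which retrieved knowledge-base content fed the prompt, so a quick way to inspect a result is to list only the chunks that were actually used. The shape below mirrors the documented PromptChunk fields and is illustrative:

```ts
// Illustrative shape mirroring the documented PromptChunk fields; not imported from the SDK.
interface PromptChunk {
  chunk_usage_pct?: number;
  chunk_used?: boolean;
  index_uuid?: string;
  source_name?: string;
  text?: string;
}

// List the chunks that were actually used, with their source and usage percentage.
function usedChunks(chunks: PromptChunk[] = []): string[] {
  return chunks
    .filter((chunk) => chunk.chunk_used)
    .map((chunk) => `${chunk.source_name ?? 'unknown source'} (${chunk.chunk_usage_pct ?? 0}% used)`);
}
```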

APIEvaluationRun { agent_deleted, agent_name, agent_uuid, 19 more }
agent_deleted?: boolean

Whether the agent is deleted.

agent_name?: string

Agent name

agent_uuid?: string

Agent UUID.

agent_version_hash?: string

Agent version hash.

agent_workspace_uuid?: string

Agent workspace UUID.

created_by_user_email?: string
created_by_user_id?: string
error_description?: string

The error description

evaluation_run_uuid?: string

Evaluation run UUID.

evaluation_test_case_workspace_uuid?: string

Evaluation test case workspace UUID.

finished_at?: string

Run end time.

format: date-time
pass_status?: boolean

The pass status of the evaluation run based on the star metric.

queued_at?: string

Run queued time.

format: date-time
run_level_metric_results?: Array<APIEvaluationMetricResult { error_description, metric_name, metric_value_type, 3 more } >
error_description?: string

Error description if the metric could not be calculated.

metric_name?: string

Metric name

metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"
Accepts one of the following:
"METRIC_VALUE_TYPE_UNSPECIFIED"
"METRIC_VALUE_TYPE_NUMBER"
"METRIC_VALUE_TYPE_STRING"
"METRIC_VALUE_TYPE_PERCENTAGE"
number_value?: number

The value of the metric as a number.

format: double
reasoning?: string

Reasoning of the metric result.

string_value?: string

The value of the metric as a string.

run_name?: string

Run name.

star_metric_result?: APIEvaluationMetricResult { error_description, metric_name, metric_value_type, 3 more }
error_description?: string

Error description if the metric could not be calculated.

metric_name?: string

Metric name

metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"
Accepts one of the following:
"METRIC_VALUE_TYPE_UNSPECIFIED"
"METRIC_VALUE_TYPE_NUMBER"
"METRIC_VALUE_TYPE_STRING"
"METRIC_VALUE_TYPE_PERCENTAGE"
number_value?: number

The value of the metric as a number.

format: double
reasoning?: string

Reasoning of the metric result.

string_value?: string

The value of the metric as a string.

started_at?: string

Run start time.

format: date-time
status?: "EVALUATION_RUN_STATUS_UNSPECIFIED" | "EVALUATION_RUN_QUEUED" | "EVALUATION_RUN_RUNNING_DATASET" | 6 more

Evaluation Run Statuses

Accepts one of the following:
"EVALUATION_RUN_STATUS_UNSPECIFIED"
"EVALUATION_RUN_QUEUED"
"EVALUATION_RUN_RUNNING_DATASET"
"EVALUATION_RUN_EVALUATING_RESULTS"
"EVALUATION_RUN_CANCELLING"
"EVALUATION_RUN_CANCELLED"
"EVALUATION_RUN_SUCCESSFUL"
"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"
"EVALUATION_RUN_FAILED"
test_case_description?: string

Test case description.

test_case_name?: string

Test case name.

test_case_uuid?: string

Test case UUID.

test_case_version?: number

Test case version.

format: int64
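
To tie the model together, a small reporting helper can print the fields most callers check after a run finishes: status, pass_status against the star metric, and duration. Field names mirror the model above; the helper itself is illustrative:

```ts
// Illustrative summary of an APIEvaluationRun; field names mirror the model above.
function summarizeRun(run: {
  run_name?: string;
  status?: string;
  pass_status?: boolean;
  star_metric_result?: { metric_name?: string; number_value?: number; string_value?: string };
  started_at?: string;
  finished_at?: string;
}): void {
  console.log(`Run:    ${run.run_name}`);
  console.log(`Status: ${run.status}`);
  console.log(`Passed: ${run.pass_status ? 'yes' : 'no'}`);

  const star = run.star_metric_result;
  if (star) {
    console.log(`Star metric ${star.metric_name}: ${star.number_value ?? star.string_value ?? 'n/a'}`);
  }

  // started_at / finished_at are date-time strings.
  if (run.started_at && run.finished_at) {
    const seconds = (Date.parse(run.finished_at) - Date.parse(run.started_at)) / 1000;
    console.log(`Duration: ${seconds.toFixed(1)}s`);
  }
}
```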