Evaluation Runs

Run an Evaluation Test Case

agents.evaluation_runs.create(**kwargs: EvaluationRunCreateParams) -> EvaluationRunCreateResponse
POST /v2/gen-ai/evaluation_runs

Returns an EvaluationRunCreateResponse containing evaluation_run_uuids: list.
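
A minimal sketch of starting a run with the method above. The client import and constructor are assumptions (any client object exposing the agents.evaluation_runs resource documented here will do), and the create params shown are illustrative names rather than confirmed fields of EvaluationRunCreateParams; only the method and the evaluation_run_uuids response field come from this page.

```python
from do_gradientai import GradientAI  # hypothetical import; use your SDK's client

client = GradientAI()  # assumed to read the API token from the environment

response = client.agents.evaluation_runs.create(
    test_case_uuid="example-test-case-uuid",  # illustrative param name
    agent_uuids=["example-agent-uuid"],       # illustrative param name
    run_name="nightly-regression",            # illustrative param name
)
print(response.evaluation_run_uuids)  # documented response field: list of run UUIDs
```
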
Retrieve Results of an Evaluation Run

agents.evaluation_runs.list_results(evaluation_run_uuid: str, **kwargs: EvaluationRunListResultsParams) -> EvaluationRunListResultsResponse
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results

Returns an EvaluationRunListResultsResponse containing evaluation_run: APIEvaluationRun, links: APILinks, meta: APIMeta, and prompts: list.
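
A sketch of pulling a run's results. The response fields used (evaluation_run, prompts) are the ones named in the signature above; the client object is the assumed one from the previous example.

```python
results = client.agents.evaluation_runs.list_results(
    "example-evaluation-run-uuid",
)
run = results.evaluation_run          # APIEvaluationRun, documented below
for prompt in results.prompts or []:  # per-prompt results; guard against None
    print(prompt.prompt_id, prompt.input_tokens, prompt.output_tokens)
```
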
Retrieve Information About an Existing Evaluation Run

agents.evaluation_runs.retrieve(evaluation_run_uuid: str) -> EvaluationRunRetrieveResponse
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}

Returns an EvaluationRunRetrieveResponse containing evaluation_run: APIEvaluationRun.
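
A sketch of checking on a single run, again reusing the assumed client; status and pass_status are documented under APIEvaluationRun below.

```python
run = client.agents.evaluation_runs.retrieve(
    "example-evaluation-run-uuid",
).evaluation_run
print(run.status, run.pass_status)
```
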
Retrieve Results of an Evaluation Run Prompt

agents.evaluation_runs.retrieve_results(prompt_id: int, **kwargs: EvaluationRunRetrieveResultsParams) -> EvaluationRunRetrieveResultsResponse
GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results/{prompt_id}

Returns an EvaluationRunRetrieveResultsResponse containing prompt: APIEvaluationPrompt.
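
A sketch of fetching one prompt's result. Because evaluation_run_uuid appears in the URL path alongside prompt_id, it is assumed here to travel through the params kwargs.

```python
result = client.agents.evaluation_runs.retrieve_results(
    prompt_id=1,
    evaluation_run_uuid="example-evaluation-run-uuid",  # assumed kwarg
)
prompt = result.prompt  # APIEvaluationPrompt, documented below
print(prompt.ground_truth, prompt.output)
```
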
API Evaluation Metric

APIEvaluationMetric class

description: str (optional)

inverted: bool (optional)
  If true, the metric is inverted, meaning that a lower value is better.

metric_name: str (optional)

metric_type: Optional[Literal["METRIC_TYPE_UNSPECIFIED", "METRIC_TYPE_GENERAL_QUALITY", "METRIC_TYPE_RAG_AND_TOOL"]]

metric_uuid: str (optional)

metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]

range_max: float (optional, format: float)
  The maximum value for the metric.

range_min: float (optional, format: float)
  The minimum value for the metric.
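
Because inverted flips a metric's direction, code that ranks or thresholds scores should account for it. A sketch using only the fields above, under the assumption that raw values fall within [range_min, range_max]:

```python
def normalized_score(metric, value: float) -> float:
    """Map a raw metric value into [0, 1] with higher always better."""
    lo = metric.range_min if metric.range_min is not None else 0.0
    hi = metric.range_max if metric.range_max is not None else 1.0
    if hi == lo:  # degenerate range; avoid division by zero
        return 0.0
    score = (value - lo) / (hi - lo)
    return 1.0 - score if metric.inverted else score
```
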
API Evaluation Metric Result

APIEvaluationMetricResult class

error_description: str (optional)
  Error description if the metric could not be calculated.

metric_name: str (optional)
  Metric name.

metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]

number_value: float (optional, format: double)
  The value of the metric as a number.

reasoning: str (optional)
  Reasoning of the metric result.

string_value: str (optional)
  The value of the metric as a string.
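
A result carries its value in number_value or string_value depending on metric_value_type. A small helper to normalize access; the assumption that percentage results also ride in number_value is not confirmed by this page.

```python
from typing import Optional, Union

def metric_value(result) -> Optional[Union[float, str]]:
    """Return the value of an APIEvaluationMetricResult, or None on error."""
    if result.error_description:  # metric could not be calculated
        return None
    if result.metric_value_type == "METRIC_VALUE_TYPE_STRING":
        return result.string_value
    # NUMBER (and, assumed, PERCENTAGE) values are carried in number_value.
    return result.number_value
```
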

API Evaluation Prompt

APIEvaluationPrompt class

ground_truth: str (optional)
  The ground truth for the prompt.

input: str (optional)

input_tokens: str (optional, format: uint64)
  The number of input tokens used in the prompt.

output: str (optional)

output_tokens: str (optional, format: uint64)
  The number of output tokens used in the prompt.

prompt_chunks: Optional[List[PromptChunk]]
  The list of prompt chunks. Each PromptChunk has the following fields:

  chunk_usage_pct: float (optional, format: double)
    The usage percentage of the chunk.

  chunk_used: bool (optional)
    Indicates if the chunk was used in the prompt.

  index_uuid: str (optional)
    The index UUID (Knowledge Base) of the chunk.

  source_name: str (optional)
    The source name for the chunk, e.g., the file name or document title.

  text: str (optional)
    Text content of the chunk.

prompt_id: int (optional, format: int64)
  Prompt ID.

prompt_level_metric_results: Optional[List[APIEvaluationMetricResult]]
  The metric results for the prompt. Each item has the APIEvaluationMetricResult fields documented above.
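
For RAG debugging, the prompt_chunks fields above make it straightforward to see which retrieved chunks the agent actually drew on. A sketch using only documented fields:

```python
def used_chunks(prompt):
    """Yield (source_name, chunk_usage_pct) for each chunk marked as used."""
    for chunk in prompt.prompt_chunks or []:
        if chunk.chunk_used:
            yield chunk.source_name, chunk.chunk_usage_pct

# e.g.: for prompt in results.prompts: print(list(used_chunks(prompt)))
```
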

API Evaluation Run

APIEvaluationRun class

agent_deleted: bool (optional)
  Whether the agent is deleted.

agent_name: str (optional)
  Agent name.

agent_uuid: str (optional)
  Agent UUID.

agent_version_hash: str (optional)
  Version hash.

agent_workspace_uuid: str (optional)
  Agent workspace UUID.

created_by_user_email: str (optional)

created_by_user_id: str (optional, format: uint64)

error_description: str (optional)
  The error description.

evaluation_run_uuid: str (optional)
  Evaluation run UUID.

evaluation_test_case_workspace_uuid: str (optional)
  Evaluation test case workspace UUID.

finished_at: datetime (optional, format: date-time)
  Run end time.

pass_status: bool (optional)
  The pass status of the evaluation run based on the star metric.

queued_at: datetime (optional, format: date-time)
  Run queued time.

run_level_metric_results: Optional[List[APIEvaluationMetricResult]]
  Each item has the APIEvaluationMetricResult fields documented above.

run_name: str (optional)
  Run name.

star_metric_result: APIEvaluationMetricResult (optional)

started_at: datetime (optional, format: date-time)
  Run start time.

status: Optional[Literal["EVALUATION_RUN_STATUS_UNSPECIFIED", "EVALUATION_RUN_QUEUED", "EVALUATION_RUN_RUNNING_DATASET", "EVALUATION_RUN_EVALUATING_RESULTS", "EVALUATION_RUN_CANCELLING", "EVALUATION_RUN_CANCELLED", "EVALUATION_RUN_SUCCESSFUL", "EVALUATION_RUN_PARTIALLY_SUCCESSFUL", "EVALUATION_RUN_FAILED"]]
  Evaluation run statuses.

test_case_description: str (optional)
  Test case description.

test_case_name: str (optional)
  Test case name.

test_case_uuid: str (optional)
  Test case UUID.

test_case_version: int (optional, format: int64)
  Test case version.
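
A run moves through the queued and running statuses before settling in a terminal one, so callers typically poll retrieve until it does. A sketch; the terminal set is inferred from the status names above, and the interval and client are assumptions.

```python
import time

# Statuses after which the run should not change again (inferred from the names).
TERMINAL_STATUSES = {
    "EVALUATION_RUN_CANCELLED",
    "EVALUATION_RUN_SUCCESSFUL",
    "EVALUATION_RUN_PARTIALLY_SUCCESSFUL",
    "EVALUATION_RUN_FAILED",
}

def wait_for_run(client, evaluation_run_uuid: str, interval: float = 10.0):
    """Poll the run until it reaches a terminal status, then return it."""
    while True:
        run = client.agents.evaluation_runs.retrieve(
            evaluation_run_uuid,
        ).evaluation_run
        if run.status in TERMINAL_STATUSES:
            return run
        time.sleep(interval)

# run = wait_for_run(client, "example-evaluation-run-uuid")
# print(run.pass_status, run.star_metric_result)
```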