Evaluation Runs
Run an Evaluation Test Case
Retrieve Results of an Evaluation Run
Retrieve Information About an Existing Evaluation Run
Retrieve Results of an Evaluation Run Prompt
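The sketch below shows one way these endpoints might be driven from Python with the requests library: start a run for a test case, poll it until it leaves the in-progress statuses, then fetch its results. The base URL, endpoint paths, request body fields, and response shapes are assumptions for illustration only; consult the full API reference for the exact routes and payloads.

```python
# Hypothetical sketch: start an evaluation run and poll it to completion.
# The base URL, endpoint paths, request fields, and response shapes below are
# assumptions for illustration; check the API reference for the real ones.
import os
import time

import requests

API_BASE = "https://api.example.com/v2/gen-ai"  # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['API_TOKEN']}"}

# Run an Evaluation Test Case (assumed path and body).
resp = requests.post(
    f"{API_BASE}/evaluation_runs",
    headers=HEADERS,
    json={"test_case_uuid": "YOUR-TEST-CASE-UUID", "run_name": "nightly-run"},
)
resp.raise_for_status()
run_uuid = resp.json()["evaluation_run_uuid"]  # assumed response field

# Retrieve Information About an Existing Evaluation Run until it leaves the
# in-progress statuses listed in this reference (the enum has more values
# that are not shown here).
while True:
    run = requests.get(f"{API_BASE}/evaluation_runs/{run_uuid}", headers=HEADERS).json()
    if run.get("status") not in ("EVALUATION_RUN_QUEUED", "EVALUATION_RUN_RUNNING_DATASET"):
        break
    time.sleep(5)

# Retrieve Results of an Evaluation Run (assumed path).
results = requests.get(f"{API_BASE}/evaluation_runs/{run_uuid}/results", headers=HEADERS).json()
print(results)
```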
Models
APIEvaluationMetric = object { description, inverted, metric_name, 5 more }
inverted: optional boolean
If true, the metric is inverted, meaning that a lower value is better.
metric_type: optional "METRIC_TYPE_UNSPECIFIED" or "METRIC_TYPE_GENERAL_QUALITY" or "METRIC_TYPE_RAG_AND_TOOL"
metric_value_type: optional "METRIC_VALUE_TYPE_UNSPECIFIED" or "METRIC_VALUE_TYPE_NUMBER" or "METRIC_VALUE_TYPE_STRING" or "METRIC_VALUE_TYPE_PERCENTAGE"
range_max: optional number
The maximum value for the metric.
range_min: optional number
The minimum value for the metric.
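As a hedged sketch of how range_min, range_max, and inverted fit together, the helper below normalizes a raw metric value into a 0-to-1 "higher is better" score. The function name and the normalization itself are illustrative assumptions, not part of the API.

```python
# Hedged sketch: normalize a raw metric value to a 0..1 "goodness" score using
# the range_min, range_max, and inverted fields of an APIEvaluationMetric-like
# dict. The helper and its defaults are illustrative assumptions.
def metric_goodness(metric: dict, value: float) -> float:
    lo = metric.get("range_min", 0.0)
    hi = metric.get("range_max", 1.0)
    span = (hi - lo) or 1.0              # guard against a zero-width range
    score = (value - lo) / span          # scale into 0..1
    if metric.get("inverted"):           # inverted means a lower raw value is better
        score = 1.0 - score
    return max(0.0, min(1.0, score))
```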
APIEvaluationMetricResult = object { error_description, metric_name, metric_value_type, 3 more }
error_description: optional string
Error description if the metric could not be calculated.
metric_name: optional string
Metric name
metric_value_type: optional "METRIC_VALUE_TYPE_UNSPECIFIED" or "METRIC_VALUE_TYPE_NUMBER" or "METRIC_VALUE_TYPE_STRING" or "METRIC_VALUE_TYPE_PERCENTAGE"
number_value: optional number
The value of the metric as a number.
reasoning: optional string
Reasoning of the metric result.
string_value: optional string
The value of the metric as a string.
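A minimal sketch of consuming an APIEvaluationMetricResult: pick string_value or number_value according to metric_value_type, and fall back to error_description when the metric could not be calculated. The formatting choices, including treating percentage values as fractions, are assumptions.

```python
# Hedged sketch: turn an APIEvaluationMetricResult-like dict into a display
# string. Field names follow the schema above; the formatting is illustrative,
# and percentages are assumed to arrive as fractions (0..1).
def format_metric_result(result: dict) -> str:
    name = result.get("metric_name", "unknown metric")
    if result.get("error_description"):
        return f"{name}: error ({result['error_description']})"
    vtype = result.get("metric_value_type")
    if vtype == "METRIC_VALUE_TYPE_STRING":
        value = result.get("string_value")
    elif vtype == "METRIC_VALUE_TYPE_PERCENTAGE" and result.get("number_value") is not None:
        value = f"{result['number_value']:.1%}"
    else:  # METRIC_VALUE_TYPE_NUMBER or unspecified
        value = result.get("number_value")
    return f"{name}: {value}"
```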
APIEvaluationPrompt = object { ground_truth, input, input_tokens, 5 more }
ground_truth: optional string
The ground truth for the prompt.
input_tokens: optional string
The number of input tokens used in the prompt.
output_tokens: optional string
The number of output tokens used in the prompt.
prompt_chunks: optional array of object { chunk_usage_pct, chunk_used, index_uuid, 2 more }
The list of prompt chunks.
chunk_usage_pct: optional number
The usage percentage of the chunk.
chunk_used: optional boolean
Indicates if the chunk was used in the prompt.
index_uuid: optional string
The UUID of the knowledge base index the chunk was retrieved from.
source_name: optional string
The source name for the chunk, e.g., the file name or document title.
text: optional string
Text content of the chunk.
prompt_id: optional number
Prompt ID
The metric results for the prompt.
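The sketch below walks the prompt_chunks array of an APIEvaluationPrompt to list the knowledge base chunks that were actually used. Dict keys follow the schema above; whether chunk_usage_pct is a 0-1 fraction or a 0-100 percentage is not stated here, so it is reported as-is.

```python
# Hedged sketch: list the knowledge base chunks that were used for a prompt.
# Keys follow the prompt_chunks schema above; the summary format is illustrative,
# and chunk_usage_pct is printed unchanged because its scale is not documented here.
def used_chunks(prompt: dict) -> list[str]:
    summary = []
    for chunk in prompt.get("prompt_chunks", []):
        if chunk.get("chunk_used"):
            summary.append(
                f"{chunk.get('source_name')} "
                f"(usage {chunk.get('chunk_usage_pct')}, index {chunk.get('index_uuid')})"
            )
    return summary
```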
APIEvaluationRun = object { agent_deleted, agent_name, agent_uuid, 19 more }
agent_deleted: optional boolean
Whether the agent has been deleted.
agent_name: optional string
Agent name
agent_uuid: optional string
Agent UUID.
agent_version_hash: optional string
Agent version hash.
agent_workspace_uuid: optional string
Agent workspace UUID.
error_description: optional string
The error description, if any.
evaluation_run_uuid: optional string
Evaluation run UUID.
evaluation_test_case_workspace_uuid: optional string
Evaluation test case workspace UUID.
finished_at: optional string
Run end time.
pass_status: optional boolean
The pass status of the evaluation run based on the star metric.
queued_at: optional string
Run queued time.
run_name: optional string
Run name.
started_at: optional string
Run start time.
status: optional "EVALUATION_RUN_STATUS_UNSPECIFIED" or "EVALUATION_RUN_QUEUED" or "EVALUATION_RUN_RUNNING_DATASET" or 6 more
The status of the evaluation run.
test_case_description: optional string
Test case description.
test_case_name: optional string
Test case name.
test_case_uuid: optional string
Test case UUID.
test_case_version: optional number
Test case version.
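To close, a hedged sketch of interpreting an APIEvaluationRun: the in-progress status values come from the enum above, but treating every other status as terminal, and the summary wording, are assumptions.

```python
# Hedged sketch: summarize an APIEvaluationRun-like dict. Only the in-progress
# statuses listed in this reference are checked; the enum has more values that
# are not shown here, so the "terminal" assumption may need adjusting.
def summarize_run(run: dict) -> str:
    name = run.get("run_name") or run.get("evaluation_run_uuid", "unknown run")
    status = run.get("status", "EVALUATION_RUN_STATUS_UNSPECIFIED")
    if status in ("EVALUATION_RUN_QUEUED", "EVALUATION_RUN_RUNNING_DATASET"):
        return f"{name}: still in progress ({status})"
    if run.get("error_description"):
        return f"{name}: failed ({run['error_description']})"
    verdict = "passed" if run.get("pass_status") else "failed"
    return f"{name}: {verdict} on the star metric"
```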