Evaluation Runs
Run an Evaluation Test Case
Retrieve Results of an Evaluation Run
Retrieve Information About an Existing Evaluation Run
Retrieve Results of an Evaluation Run Prompt
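Taken together, these operations cover the lifecycle of a run: start it from a test case, inspect its state, and pull run-level and per-prompt results. The sketch below shows that flow in Python; the client object and every method name on it (evaluation_runs.create, retrieve, retrieve_results, retrieve_prompt_results) are hypothetical placeholders for illustration, not the documented SDK surface.

def run_and_collect(client, test_case_uuid, prompt_id=None):
    # HYPOTHETICAL method names: the real client's resource and method names
    # may differ; only the overall flow of the four operations is sketched.
    run = client.evaluation_runs.create(test_case_uuid=test_case_uuid)          # Run an Evaluation Test Case
    info = client.evaluation_runs.retrieve(run.evaluation_run_uuid)             # Retrieve Information About an Existing Evaluation Run
    results = client.evaluation_runs.retrieve_results(run.evaluation_run_uuid)  # Retrieve Results of an Evaluation Run
    prompt_results = None
    if prompt_id is not None:                                                   # Retrieve Results of an Evaluation Run Prompt
        prompt_results = client.evaluation_runs.retrieve_prompt_results(
            run.evaluation_run_uuid, prompt_id=prompt_id
        )
    return info, results, prompt_results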
Models
class APIEvaluationMetric: …
inverted: Optional[bool]
If true, the metric is inverted, meaning that a lower value is better.
metric_type: Optional[Literal["METRIC_TYPE_UNSPECIFIED", "METRIC_TYPE_GENERAL_QUALITY", "METRIC_TYPE_RAG_AND_TOOL"]]
metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]
range_max: Optional[float]
The maximum value for the metric.
range_min: Optional[float]
The minimum value for the metric.
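As a reading aid for these fields, here is a minimal sketch that folds range_min, range_max, and inverted into a single score in [0, 1]. The helper name and the fallback bounds used when range values are missing are assumptions for illustration, not part of the API.

def normalized_score(metric, raw_value):
    # `metric` is any object exposing the APIEvaluationMetric fields above.
    # Assumption: missing bounds default to a 0..1 range.
    lo = metric.range_min if metric.range_min is not None else 0.0
    hi = metric.range_max if metric.range_max is not None else 1.0
    if hi <= lo:
        return 0.0
    # Clamp into [range_min, range_max], then scale to [0, 1].
    clamped = min(max(raw_value, lo), hi)
    score = (clamped - lo) / (hi - lo)
    # inverted=True means a lower raw value is better, so flip the score.
    return 1.0 - score if metric.inverted else score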
class APIEvaluationMetricResult: …
error_description: Optional[str]
Error description if the metric could not be calculated.
metric_name: Optional[str]
Metric name.
metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]
number_value: Optional[float]
The value of the metric as a number.
reasoning: Optional[str]
Reasoning of the metric result.
string_value: Optional[str]
The value of the metric as a string.
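A small formatting helper makes the value-type switch concrete. The function name, and the assumption that percentage results arrive in number_value, are illustrative only.

def format_metric_result(result):
    # `result` is any object exposing the APIEvaluationMetricResult fields above.
    if result.error_description:
        return f"{result.metric_name}: error ({result.error_description})"
    if result.metric_value_type == "METRIC_VALUE_TYPE_STRING":
        return f"{result.metric_name}: {result.string_value}"
    # Assumption: number and percentage results both carry their value in
    # number_value; the percentage scale (0-1 vs 0-100) is not specified here.
    suffix = "%" if result.metric_value_type == "METRIC_VALUE_TYPE_PERCENTAGE" else ""
    return f"{result.metric_name}: {result.number_value}{suffix}"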
class APIEvaluationPrompt: …
ground_truth: Optional[str]
The ground truth for the prompt.
input_tokens: Optional[str]
The number of input tokens used in the prompt.
output_tokens: Optional[str]
The number of output tokens used in the prompt.
prompt_chunks: Optional[List[PromptChunk]]
The list of prompt chunks. Each PromptChunk carries:
    chunk_usage_pct: Optional[float]
    The usage percentage of the chunk.
    chunk_used: Optional[bool]
    Indicates if the chunk was used in the prompt.
    index_uuid: Optional[str]
    The index UUID (Knowledge Base) of the chunk.
    source_name: Optional[str]
    The source name for the chunk, e.g., the file name or document title.
    text: Optional[str]
    Text content of the chunk.
prompt_id: Optional[int]
Prompt ID.
The metric results for the prompt; each result carries the APIEvaluationMetricResult fields:
    error_description: Optional[str]
    Error description if the metric could not be calculated.
    metric_name: Optional[str]
    Metric name.
    metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]
    number_value: Optional[float]
    The value of the metric as a number.
    reasoning: Optional[str]
    Reasoning of the metric result.
    string_value: Optional[str]
    The value of the metric as a string.
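The sketch below summarizes how retrieved chunks were used for a single prompt. The helper name and the keys of the returned dictionary are illustrative assumptions.

def summarize_chunk_usage(prompt):
    # `prompt` is any object exposing the APIEvaluationPrompt fields above.
    chunks = prompt.prompt_chunks or []
    used = [c for c in chunks if c.chunk_used]
    return {
        "prompt_id": prompt.prompt_id,
        "total_chunks": len(chunks),
        "used_chunks": len(used),
        "avg_usage_pct": sum(c.chunk_usage_pct or 0.0 for c in used) / len(used) if used else 0.0,
        "used_sources": sorted({c.source_name for c in used if c.source_name}),
    }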
class APIEvaluationRun: …
agent_deleted: Optional[bool]
Whether the agent is deleted.
agent_name: Optional[str]
Agent name.
agent_uuid: Optional[str]
Agent UUID.
agent_version_hash: Optional[str]
Agent version hash.
agent_workspace_uuid: Optional[str]
Agent workspace UUID.
error_description: Optional[str]
The error description.
evaluation_run_uuid: Optional[str]
Evaluation run UUID.
evaluation_test_case_workspace_uuid: Optional[str]
Evaluation test case workspace UUID.
finished_at: Optional[datetime]
Run end time.
pass_status: Optional[bool]
The pass status of the evaluation run based on the star metric.
queued_at: Optional[datetime]
Run queued time.
The metric results for the run; each result carries the APIEvaluationMetricResult fields:
    error_description: Optional[str]
    Error description if the metric could not be calculated.
    metric_name: Optional[str]
    Metric name.
    metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]
    number_value: Optional[float]
    The value of the metric as a number.
    reasoning: Optional[str]
    Reasoning of the metric result.
    string_value: Optional[str]
    The value of the metric as a string.
run_name: Optional[str]
Run name.
star_metric_result: Optional[APIEvaluationMetricResult]
    error_description: Optional[str]
    Error description if the metric could not be calculated.
    metric_name: Optional[str]
    Metric name.
    metric_value_type: Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]
    number_value: Optional[float]
    The value of the metric as a number.
    reasoning: Optional[str]
    Reasoning of the metric result.
    string_value: Optional[str]
    The value of the metric as a string.
started_at: Optional[datetime]
Run start time.
status: Optional[Literal["EVALUATION_RUN_STATUS_UNSPECIFIED", "EVALUATION_RUN_QUEUED", "EVALUATION_RUN_RUNNING_DATASET", 6 more]]
The evaluation run status.
test_case_description: Optional[str]
Test case description.
test_case_name: Optional[str]
Test case name.
test_case_uuid: Optional[str]
Test case UUID.
test_case_version: Optional[int]
Test case version.
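Finally, a one-line summary of a run can be assembled from the timing, status, and star-metric fields. The helper below is a sketch; treating a missing pass_status as "not passed" is an assumption, not documented behavior.

def summarize_run(run):
    # `run` is any object exposing the APIEvaluationRun fields above.
    duration = None
    if run.started_at and run.finished_at:
        duration = (run.finished_at - run.started_at).total_seconds()
    star = run.star_metric_result
    star_text = (
        f"{star.metric_name}={star.string_value if star.string_value is not None else star.number_value}"
        if star else "no star metric"
    )
    # Assumption: a missing pass_status is reported as not passed.
    verdict = "passed" if run.pass_status else "not passed"
    duration_text = f"{duration:.1f}s" if duration is not None else "n/a"
    return f"{run.run_name} [{run.status}] {verdict}; {star_text}; duration={duration_text}"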