Evaluation Test Cases

Create Evaluation Test Case
client.agents.evaluationTestCases.create(body?: EvaluationTestCaseCreateParams { dataset_uuid, description, metrics, 3 more }, options?: RequestOptions): EvaluationTestCaseCreateResponse { test_case_uuid }
POST /v2/gen-ai/evaluation_test_cases
List Evaluation Test Cases
client.agents.evaluationTestCases.list(options?: RequestOptions): EvaluationTestCaseListResponse { evaluation_test_cases }
GET /v2/gen-ai/evaluation_test_cases
List Evaluation Runs by Test Case
client.agents.evaluationTestCases.listEvaluationRuns(evaluationTestCaseUuid: string, query?: EvaluationTestCaseListEvaluationRunsParams { evaluation_test_case_version }, options?: RequestOptions): EvaluationTestCaseListEvaluationRunsResponse { evaluation_runs }
GET /v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs
Retrieve Information About an Existing Evaluation Test Case
client.agents.evaluationTestCases.retrieve(testCaseUuid: string, query?: EvaluationTestCaseRetrieveParams { evaluation_test_case_version }, options?: RequestOptions): EvaluationTestCaseRetrieveResponse { evaluation_test_case }
GET /v2/gen-ai/evaluation_test_cases/{test_case_uuid}
Update an Evaluation Test Case
client.agents.evaluationTestCases.update(testCaseUuid: string, body?: EvaluationTestCaseUpdateParams { dataset_uuid, description, metrics, 3 more }, options?: RequestOptions): EvaluationTestCaseUpdateResponse { test_case_uuid, version }
PUT /v2/gen-ai/evaluation_test_cases/{test_case_uuid}
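Each operation above corresponds to a plain REST call, which can be handy when debugging requests outside the SDK. The sketch below builds only the HTTP method and URL for each operation from the documented paths and the documented `evaluation_test_case_version` query parameter. The function and type names are hypothetical, the base URL is left to the caller as an assumption, and authentication and request bodies are deliberately out of scope.

```typescript
// The five documented operations on /v2/gen-ai/evaluation_test_cases.
// Names here are illustrative, not the SDK's exported types.
type Operation = "create" | "list" | "listEvaluationRuns" | "retrieve" | "update";

interface RequestSpec {
  method: "GET" | "POST" | "PUT";
  url: string;
}

// Builds the HTTP method and URL for one operation. `uuid` is the test case
// UUID used in the path; `version` becomes the documented
// `evaluation_test_case_version` query parameter (retrieve and
// listEvaluationRuns only). `baseUrl` is supplied by the caller.
function buildRequest(
  baseUrl: string,
  op: Operation,
  uuid?: string,
  version?: number,
): RequestSpec {
  const root = baseUrl.replace(/\/+$/, "") + "/v2/gen-ai/evaluation_test_cases";

  // Percent-encoded path segment for the test case; throws if the UUID is missing.
  const idSegment = (): string => {
    if (!uuid) throw new Error(`operation "${op}" requires a test case UUID`);
    return "/" + encodeURIComponent(uuid);
  };

  // Appends the optional version query parameter when one is given.
  const withVersion = (url: string): string =>
    version === undefined ? url : `${url}?evaluation_test_case_version=${version}`;

  switch (op) {
    case "create":
      return { method: "POST", url: root };
    case "list":
      return { method: "GET", url: root };
    case "listEvaluationRuns":
      return { method: "GET", url: withVersion(root + idSegment() + "/evaluation_runs") };
    case "retrieve":
      return { method: "GET", url: withVersion(root + idSegment()) };
    case "update":
      return { method: "PUT", url: root + idSegment() };
    default:
      throw new Error("unknown operation");
  }
}
```

For example, `buildRequest(base, "update", id)` yields a `PUT` to `{base}/v2/gen-ai/evaluation_test_cases/{id}`. The caller still has to attach credentials and, for create and update, a JSON body.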
Models
APIEvaluationTestCase { archived_at, created_at, created_by_user_email, 15 more }
archived_at?: string
created_at?: string
created_by_user_email?: string
created_by_user_id?: string
dataset?: Dataset { created_at, dataset_name, dataset_uuid, 3 more }
created_at?: string

Time the dataset was created.

format: date-time
dataset_name?: string

Name of the dataset.

dataset_uuid?: string

UUID of the dataset.

file_size?: string

Size of the uploaded dataset file, in bytes.

format: uint64
has_ground_truth?: boolean

Whether the dataset has a ground truth column.

row_count?: number

Number of rows in the dataset.

format: int64
dataset_name?: string
dataset_uuid?: string
description?: string
latest_version_number_of_runs?: number
metrics?: Array<APIEvaluationMetric { description, inverted, metric_name, 5 more } >
description?: string
inverted?: boolean

If true, the metric is inverted, meaning that a lower value is better.

metric_name?: string
metric_type?: "METRIC_TYPE_UNSPECIFIED" | "METRIC_TYPE_GENERAL_QUALITY" | "METRIC_TYPE_RAG_AND_TOOL"
Accepts one of the following:
"METRIC_TYPE_UNSPECIFIED"
"METRIC_TYPE_GENERAL_QUALITY"
"METRIC_TYPE_RAG_AND_TOOL"
metric_uuid?: string
metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"
Accepts one of the following:
"METRIC_VALUE_TYPE_UNSPECIFIED"
"METRIC_VALUE_TYPE_NUMBER"
"METRIC_VALUE_TYPE_STRING"
"METRIC_VALUE_TYPE_PERCENTAGE"
range_max?: number

The maximum value for the metric.

format: float
range_min?: number

The minimum value for the metric.

format: float
name?: string
star_metric?: APIStarMetric { metric_uuid, name, success_threshold, success_threshold_pct }
metric_uuid?: string
name?: string
success_threshold?: number

The success threshold for the star metric: the value the metric must reach to be considered successful.

format: float
success_threshold_pct?: number

The success threshold for the star metric, expressed as a percentage between 0 and 100.

format: int32
test_case_uuid?: string
total_runs?: number
updated_at?: string
updated_by_user_email?: string
updated_by_user_id?: string
version?: number
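Every field on these models is optional, and `file_size` is a `uint64` carried as a string, so client code has to guard for missing values and parse sizes explicitly. The sketch below uses a hand-written subset of the documented fields (hypothetical interfaces, not the SDK's exported types) and three illustrative helpers: parsing `file_size` while refusing values above 2^53 - 1 rather than silently rounding them, comparing two scores while honoring `inverted`, and normalizing a score into [0, 1] using `range_min` and `range_max`. The normalization scheme is an assumption for illustration, not documented API behavior.

```typescript
// Hand-written subset of the documented Dataset fields; all optional.
interface Dataset {
  dataset_name?: string;
  dataset_uuid?: string;
  file_size?: string; // uint64 serialized as a string
  has_ground_truth?: boolean;
  row_count?: number;
}

// Hand-written subset of the documented APIEvaluationMetric fields.
interface EvaluationMetric {
  metric_name?: string;
  inverted?: boolean; // true means lower values are better
  range_min?: number;
  range_max?: number;
}

const MAX_SAFE = 9007199254740991; // 2^53 - 1, largest exactly representable integer

// Parses file_size into a number of bytes. Returns undefined if the field is
// absent, not a base-10 integer, or too large to represent exactly.
function datasetFileSizeBytes(dataset: Dataset): number | undefined {
  const raw = dataset.file_size;
  if (raw === undefined || !/^\d+$/.test(raw)) return undefined;
  const n = Number(raw);
  return n <= MAX_SAFE ? n : undefined;
}

// True if score `a` is better than score `b` for this metric.
// Inverted metrics treat the lower value as better.
function isBetter(metric: EvaluationMetric, a: number, b: number): boolean {
  return metric.inverted ? a < b : a > b;
}

// Maps a value into [0, 1] using the metric's declared range, flipping the
// scale for inverted metrics so that 1 is always the best end. Values outside
// the range are clamped. Returns undefined if the range is missing or empty.
function normalizedScore(metric: EvaluationMetric, value: number): number | undefined {
  const lo = metric.range_min;
  const hi = metric.range_max;
  if (lo === undefined || hi === undefined || hi <= lo) return undefined;
  const clamped = Math.min(Math.max(value, lo), hi);
  const t = (clamped - lo) / (hi - lo);
  return metric.inverted ? 1 - t : t;
}
```

Flipping inverted metrics during normalization lets scores from different metrics be ranked in the same direction, which is one reasonable design choice among several.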
APIStarMetric { metric_uuid, name, success_threshold, success_threshold_pct }
metric_uuid?: string
name?: string
success_threshold?: number

The success threshold for the star metric: the value the metric must reach to be considered successful.

format: float
success_threshold_pct?: number

The success threshold for the star metric, expressed as a percentage between 0 and 100.

format: int32
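The two threshold fields above have documented types and ranges but no documented comparison rule, so any client-side check involves assumptions. The sketch below validates a star metric against the documented constraints (`success_threshold_pct` is an integer from 0 to 100, `success_threshold` is a float) and checks whether a score "reaches" `success_threshold`. It assumes, hypothetically, that reaching means `>=` for normal metrics and `<=` for metrics whose linked `APIEvaluationMetric` has `inverted: true`; the helper names are illustrative only.

```typescript
// Hand-written mirror of the documented APIStarMetric model.
interface StarMetric {
  metric_uuid?: string;
  name?: string;
  success_threshold?: number; // float
  success_threshold_pct?: number; // int32, documented range 0..100
}

// Returns a list of problems with a star metric definition; empty means valid.
function validateStarMetric(star: StarMetric): string[] {
  const problems: string[] = [];
  const pct = star.success_threshold_pct;
  if (pct !== undefined && (Math.floor(pct) !== pct || pct < 0 || pct > 100)) {
    problems.push("success_threshold_pct must be an integer between 0 and 100");
  }
  const threshold = star.success_threshold;
  if (threshold !== undefined && !isFinite(threshold)) {
    problems.push("success_threshold must be a finite number");
  }
  return problems;
}

// Assumed rule: a score reaches the threshold when it is >= the threshold,
// or <= the threshold for an inverted metric. Returns undefined when no
// success_threshold is set.
function meetsThreshold(star: StarMetric, score: number, inverted = false): boolean | undefined {
  const threshold = star.success_threshold;
  if (threshold === undefined) return undefined;
  return inverted ? score <= threshold : score >= threshold;
}
```

Returning `undefined` instead of `false` when no threshold is configured keeps "not configured" distinct from "failed".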