Evaluation Test Cases

List Evaluation Test Cases

client.agents.evaluationTestCases.list(?): EvaluationTestCaseListResponse { evaluation_test_cases }

GET /v2/gen-ai/evaluation_test_cases
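
As a rough sketch of how the list response might be consumed, the snippet below drops archived test cases and formats a one-line summary for each remaining one. The `TestCaseSummary` and `ListResponse` interfaces only mirror a few fields of the `APIEvaluationTestCase` model documented on this page, and `summarizeActive` is a hypothetical helper, not part of the SDK. It also assumes that a test case counts as archived whenever `archived_at` is set.

```typescript
// Subset of APIEvaluationTestCase fields used here (all optional, as in the model).
interface TestCaseSummary {
  test_case_uuid?: string;
  name?: string;
  archived_at?: string;
  total_runs?: number;
}

// Shape of the list response as documented: { evaluation_test_cases }.
interface ListResponse {
  evaluation_test_cases?: TestCaseSummary[];
}

// Hypothetical helper: keep test cases without an archived_at timestamp
// (assumption: a set archived_at means "archived") and render one line each.
function summarizeActive(resp: ListResponse): string[] {
  return (resp.evaluation_test_cases ?? [])
    .filter((tc) => !tc.archived_at)
    .map(
      (tc) =>
        `${tc.name ?? "(unnamed)"} (${tc.test_case_uuid ?? "?"}): ${tc.total_runs ?? 0} runs`,
    );
}
```

With a configured client, this could be called as `summarizeActive(await client.agents.evaluationTestCases.list())`.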

Create Evaluation Test Case

client.agents.evaluationTestCases.create(?, ?): EvaluationTestCaseCreateResponse { test_case_uuid }

POST /v2/gen-ai/evaluation_test_cases

List Evaluation Runs by Test Case

client.agents.evaluationTestCases.listEvaluationRuns(, ?, ?): EvaluationTestCaseListEvaluationRunsResponse { evaluation_runs }

GET /v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs

Retrieve Information About an Existing Evaluation Test Case

client.agents.evaluationTestCases.retrieve(, ?, ?): EvaluationTestCaseRetrieveResponse { evaluation_test_case }

GET /v2/gen-ai/evaluation_test_cases/{test_case_uuid}

Update an Evaluation Test Case

client.agents.evaluationTestCases.update(, ?, ?): EvaluationTestCaseUpdateResponse { test_case_uuid, version }

PUT /v2/gen-ai/evaluation_test_cases/{test_case_uuid}
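
The five operations above map onto three HTTP methods and two path parameters. The following sketch collects them in a lookup table and fills in the `{...}` placeholders; it is meant as an illustration of the routes listed here, not as the SDK's internals. `ROUTES` and `resolveRoute` are hypothetical names introduced for this example.

```typescript
type Operation = "list" | "create" | "listEvaluationRuns" | "retrieve" | "update";

interface Route {
  method: "GET" | "POST" | "PUT";
  path: string;
}

// Method and path for each operation, copied from the listing above.
const ROUTES: Record<Operation, Route> = {
  list: { method: "GET", path: "/v2/gen-ai/evaluation_test_cases" },
  create: { method: "POST", path: "/v2/gen-ai/evaluation_test_cases" },
  listEvaluationRuns: {
    method: "GET",
    path: "/v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs",
  },
  retrieve: { method: "GET", path: "/v2/gen-ai/evaluation_test_cases/{test_case_uuid}" },
  update: { method: "PUT", path: "/v2/gen-ai/evaluation_test_cases/{test_case_uuid}" },
};

// Hypothetical helper: substitute {placeholders} with URL-encoded values,
// throwing if a required path parameter was not supplied.
function resolveRoute(op: Operation, params: Record<string, string> = {}): Route {
  const route = ROUTES[op];
  const path = route.path.replace(/\{(\w+)\}/g, (_match, key: string) => {
    const value = params[key];
    if (value === undefined) throw new Error(`missing path parameter: ${key}`);
    return encodeURIComponent(value);
  });
  return { method: route.method, path };
}
```

Note that `listEvaluationRuns` uses `evaluation_test_case_uuid` as its placeholder name, while `retrieve` and `update` use `test_case_uuid`.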

Models

APIEvaluationTestCase { archived_at, created_at, created_by_user_email, 15 more }

archived_at?: string

format: date-time

created_at?: string

format: date-time

created_by_user_email?: string

created_by_user_id?: string

format: uint64

dataset?: Dataset { created_at, dataset_name, dataset_uuid, 3 more }

created_at?: string

Time created at.

format: date-time

dataset_name?: string

Name of the dataset.

dataset_uuid?: string

UUID of the dataset.

file_size?: string

The size of the dataset uploaded file in bytes.

format: uint64

has_ground_truth?: boolean

Does the dataset have a ground truth column?

row_count?: number

Number of rows in the dataset.

format: int64

dataset_name?: string

dataset_uuid?: string

description?: string

latest_version_number_of_runs?: number

format: int32

metrics?: Array<APIEvaluationMetric { category, description, inverted, 8 more } >

category?: "METRIC_CATEGORY_UNSPECIFIED" | "METRIC_CATEGORY_CORRECTNESS" | "METRIC_CATEGORY_USER_OUTCOMES" | 3 more

Accepts one of the following:

"METRIC_CATEGORY_UNSPECIFIED"

"METRIC_CATEGORY_CORRECTNESS"

"METRIC_CATEGORY_USER_OUTCOMES"

"METRIC_CATEGORY_SAFETY_AND_SECURITY"

"METRIC_CATEGORY_CONTEXT_QUALITY"

"METRIC_CATEGORY_MODEL_FIT"

description?: string

inverted?: boolean

If true, the metric is inverted, meaning that a lower value is better.

is_metric_goal?: boolean

metric_name?: string

metric_rank?: number

format: int64

metric_type?: "METRIC_TYPE_UNSPECIFIED" | "METRIC_TYPE_GENERAL_QUALITY" | "METRIC_TYPE_RAG_AND_TOOL"

Accepts one of the following:

"METRIC_TYPE_UNSPECIFIED"

"METRIC_TYPE_GENERAL_QUALITY"

"METRIC_TYPE_RAG_AND_TOOL"

metric_uuid?: string

metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"

Accepts one of the following:

"METRIC_VALUE_TYPE_UNSPECIFIED"

"METRIC_VALUE_TYPE_NUMBER"

"METRIC_VALUE_TYPE_STRING"

"METRIC_VALUE_TYPE_PERCENTAGE"

range_max?: number

The maximum value for the metric.

format: float

range_min?: number

The minimum value for the metric.

format: float

name?: string

star_metric?: APIStarMetric { metric_uuid, name, success_threshold, success_threshold_pct }

metric_uuid?: string

name?: string

success_threshold?: number

The success threshold for the star metric. This is a value that the metric must reach to be considered successful.

format: float

success_threshold_pct?: number

The success threshold for the star metric. This is a percentage value between 0 and 100.

format: int32

test_case_uuid?: string

total_runs?: number

format: int32

updated_at?: string

format: date-time

updated_by_user_email?: string

updated_by_user_id?: string

format: uint64

version?: number

format: int64
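
The `inverted`, `range_min`, and `range_max` fields of `APIEvaluationMetric` together determine how a raw metric value should be read. The sketch below shows one possible interpretation: `isBetter` compares two values while respecting `inverted`, and `normalizedScore` maps a value onto a 0 to 1 scale where 1 is best. Both helpers and the `MetricInfo` interface are hypothetical, and the clamping and normalization choices are assumptions rather than documented API behavior.

```typescript
// Subset of APIEvaluationMetric fields used by the helpers below.
interface MetricInfo {
  inverted?: boolean;
  range_min?: number;
  range_max?: number;
}

// True when `candidate` is strictly better than `baseline`.
// For inverted metrics a lower value is better, as documented above.
function isBetter(metric: MetricInfo, candidate: number, baseline: number): boolean {
  return metric.inverted ? candidate < baseline : candidate > baseline;
}

// Assumption: clamp the value into [range_min, range_max] and scale it to 0..1,
// flipping the scale for inverted metrics so that 1 always means "best".
// Returns undefined when the range is missing or empty.
function normalizedScore(metric: MetricInfo, value: number): number | undefined {
  const { range_min: min, range_max: max } = metric;
  if (min === undefined || max === undefined || max <= min) return undefined;
  const clamped = Math.min(Math.max(value, min), max);
  const fraction = (clamped - min) / (max - min);
  return metric.inverted ? 1 - fraction : fraction;
}
```

For example, a metric with `range_min: 0`, `range_max: 10`, and `inverted: true` would map a raw value of 2.5 to a normalized score of 0.75 under these assumptions.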

APIStarMetric { metric_uuid, name, success_threshold, success_threshold_pct }

metric_uuid?: string

name?: string

success_threshold?: number

The success threshold for the star metric. This is a value that the metric must reach to be considered successful.

format: float

success_threshold_pct?: number

The success threshold for the star metric. This is a percentage value between 0 and 100.

format: int32
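
To illustrate how the two `APIStarMetric` thresholds might be checked on the client side, the sketch below validates the documented 0 to 100 bound on `success_threshold_pct` and tests a value against `success_threshold`. The helper names are hypothetical. Reading "reach" as greater than or equal to (or less than or equal to for an inverted metric) is an assumption, and the page does not state how the two threshold fields relate to each other, so they are handled separately here.

```typescript
// Mirrors the APIStarMetric model documented above.
interface StarMetric {
  metric_uuid?: string;
  name?: string;
  success_threshold?: number;
  success_threshold_pct?: number;
}

// Checks the documented bounds of success_threshold_pct: an integer
// (format int32) between 0 and 100. An unset value is treated as valid.
function hasValidPct(star: StarMetric): boolean {
  const pct = star.success_threshold_pct;
  return pct === undefined || (Number.isInteger(pct) && pct >= 0 && pct <= 100);
}

// Assumption: a value "reaches" the threshold when value >= success_threshold,
// or value <= success_threshold for an inverted (lower is better) metric.
// Returns undefined when no success_threshold is set.
function meetsThreshold(
  star: StarMetric,
  value: number,
  inverted = false,
): boolean | undefined {
  const threshold = star.success_threshold;
  if (threshold === undefined) return undefined;
  return inverted ? value <= threshold : value >= threshold;
}
```

The `inverted` argument would typically come from the matching `APIEvaluationMetric` entry (the one whose `metric_uuid` equals the star metric's `metric_uuid`), though that pairing is also an assumption of this sketch.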