# Evaluation Test Cases

## Create

`agents.evaluation_test_cases.create(**kwargs: EvaluationTestCaseCreateParams) -> EvaluationTestCaseCreateResponse`

**post** `/v2/gen-ai/evaluation_test_cases`

To create an evaluation test case, send a POST request to `/v2/gen-ai/evaluation_test_cases`.

### Parameters

- **dataset\_uuid:** `str` Dataset against which the test case is executed.
- **description:** `str` Description of the test case.
- **metrics:** `List[str]` Full metric list to use for the evaluation test case.
- **name:** `str` Name of the test case.
- **star\_metric:** `APIStarMetricParam`
- **workspace\_uuid:** `str` The workspace UUID.

### Returns

- `class EvaluationTestCaseCreateResponse`
  - **test\_case\_uuid:** `Optional[str]` Test case UUID.

### Example

```python
from gradient import Gradient

client = Gradient()

evaluation_test_case = client.agents.evaluation_test_cases.create()
print(evaluation_test_case.test_case_uuid)
```

## Retrieve

`agents.evaluation_test_cases.retrieve(test_case_uuid: str, **kwargs: EvaluationTestCaseRetrieveParams) -> EvaluationTestCaseRetrieveResponse`

**get** `/v2/gen-ai/evaluation_test_cases/{test_case_uuid}`

To retrieve information about an existing evaluation test case, send a GET request to `/v2/gen-ai/evaluation_test_cases/{test_case_uuid}`.

### Parameters

- **test\_case\_uuid:** `str`
- **evaluation\_test\_case\_version:** `int` Version of the test case.

### Returns

- `class EvaluationTestCaseRetrieveResponse`
  - **evaluation\_test\_case:** `Optional[APIEvaluationTestCase]`

### Example

```python
from gradient import Gradient

client = Gradient()

evaluation_test_case = client.agents.evaluation_test_cases.retrieve(
    test_case_uuid="123e4567-e89b-12d3-a456-426614174000",
)
print(evaluation_test_case.evaluation_test_case)
```

## Update

`agents.evaluation_test_cases.update(path_test_case_uuid: str, **kwargs: EvaluationTestCaseUpdateParams) -> EvaluationTestCaseUpdateResponse`

**put** `/v2/gen-ai/evaluation_test_cases/{test_case_uuid}`

To update an evaluation test case, send a PUT request to `/v2/gen-ai/evaluation_test_cases/{test_case_uuid}`.

### Parameters

- **path\_test\_case\_uuid:** `str`
- **dataset\_uuid:** `str` Dataset against which the test case is executed.
- **description:** `str` Description of the test case.
- **metrics:** `Metrics`
  - **metric\_uuids:** `List[str]`
- **name:** `str` Name of the test case.
- **star\_metric:** `APIStarMetricParam`
- **test\_case\_uuid:** `str`

### Returns

- `class EvaluationTestCaseUpdateResponse`
  - **test\_case\_uuid:** `Optional[str]`
  - **version:** `Optional[int]` The new version of the test case.

### Example

```python
from gradient import Gradient

client = Gradient()

evaluation_test_case = client.agents.evaluation_test_cases.update(
    path_test_case_uuid="123e4567-e89b-12d3-a456-426614174000",
)
print(evaluation_test_case.test_case_uuid)
```
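The generated examples above pass only the path argument. As a fuller illustration, here is a minimal sketch of creating and then updating a test case with the documented parameters. The UUID values are placeholders, and the assumptions that `metrics` takes metric UUIDs and that `star_metric` accepts the `APIStarMetric` fields documented under Domain Types below are inferences, not confirmed behavior.

```python
from gradient import Gradient

client = Gradient()

# Create a test case (placeholder UUIDs; star_metric keys assumed to
# mirror the APIStarMetric domain type documented below).
created = client.agents.evaluation_test_cases.create(
    name="support-bot regression suite",
    description="Checks answer quality against the golden dataset.",
    dataset_uuid="11111111-2222-3333-4444-555555555555",       # placeholder
    metrics=["aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"],           # assumed: metric UUIDs
    star_metric={
        "metric_uuid": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",  # placeholder
        "success_threshold_pct": 80,
    },
    workspace_uuid="99999999-8888-7777-6666-555555555555",      # placeholder
)
print(created.test_case_uuid)

# Update the description; the response carries the new version number.
updated = client.agents.evaluation_test_cases.update(
    path_test_case_uuid=created.test_case_uuid,
    description="Checks answer quality against the golden dataset (v2).",
)
print(updated.version)
```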
## List

`agents.evaluation_test_cases.list() -> EvaluationTestCaseListResponse`

**get** `/v2/gen-ai/evaluation_test_cases`

To list all evaluation test cases, send a GET request to `/v2/gen-ai/evaluation_test_cases`.

### Returns

- `class EvaluationTestCaseListResponse`
  - **evaluation\_test\_cases:** `Optional[List[APIEvaluationTestCase]]`
    - **archived\_at:** `Optional[datetime]`
    - **created\_at:** `Optional[datetime]`
    - **created\_by\_user\_email:** `Optional[str]`
    - **created\_by\_user\_id:** `Optional[str]`
    - **dataset:** `Optional[Dataset]`
      - **created\_at:** `Optional[datetime]` Time created at.
      - **dataset\_name:** `Optional[str]` Name of the dataset.
      - **dataset\_uuid:** `Optional[str]` UUID of the dataset.
      - **file\_size:** `Optional[str]` The size of the dataset's uploaded file in bytes.
      - **has\_ground\_truth:** `Optional[bool]` Does the dataset have a ground truth column?
      - **row\_count:** `Optional[int]` Number of rows in the dataset.
    - **dataset\_name:** `Optional[str]`
    - **dataset\_uuid:** `Optional[str]`
    - **description:** `Optional[str]`
    - **latest\_version\_number\_of\_runs:** `Optional[int]`
    - **metrics:** `Optional[List[APIEvaluationMetric]]`
      - **description:** `Optional[str]`
      - **inverted:** `Optional[bool]` If true, the metric is inverted, meaning that a lower value is better.
      - **metric\_name:** `Optional[str]`
      - **metric\_type:** `Optional[Literal["METRIC_TYPE_UNSPECIFIED", "METRIC_TYPE_GENERAL_QUALITY", "METRIC_TYPE_RAG_AND_TOOL"]]`
        - `"METRIC_TYPE_UNSPECIFIED"`
        - `"METRIC_TYPE_GENERAL_QUALITY"`
        - `"METRIC_TYPE_RAG_AND_TOOL"`
      - **metric\_uuid:** `Optional[str]`
      - **metric\_value\_type:** `Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - **range\_max:** `Optional[float]` The maximum value for the metric.
      - **range\_min:** `Optional[float]` The minimum value for the metric.
    - **name:** `Optional[str]`
    - **star\_metric:** `Optional[APIStarMetric]`
    - **test\_case\_uuid:** `Optional[str]`
    - **total\_runs:** `Optional[int]`
    - **updated\_at:** `Optional[datetime]`
    - **updated\_by\_user\_email:** `Optional[str]`
    - **updated\_by\_user\_id:** `Optional[str]`
    - **version:** `Optional[int]`

### Example

```python
from gradient import Gradient

client = Gradient()

evaluation_test_cases = client.agents.evaluation_test_cases.list()
print(evaluation_test_cases.evaluation_test_cases)
```
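When you need one specific test case rather than the whole collection, a simple approach is to scan the list response. This is a sketch that assumes the documented `name`, `test_case_uuid`, and `version` fields are populated; the target name is a placeholder.

```python
from gradient import Gradient

client = Gradient()

target_name = "support-bot regression suite"  # placeholder
listing = client.agents.evaluation_test_cases.list()

# evaluation_test_cases is Optional, so fall back to an empty list.
for test_case in listing.evaluation_test_cases or []:
    if test_case.name == target_name:
        print(test_case.test_case_uuid, test_case.version)
        break
else:
    print(f"No test case named {target_name!r}")
```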
## List Evaluation Runs

`agents.evaluation_test_cases.list_evaluation_runs(evaluation_test_case_uuid: str, **kwargs: EvaluationTestCaseListEvaluationRunsParams) -> EvaluationTestCaseListEvaluationRunsResponse`

**get** `/v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs`

To list all evaluation runs by test case, send a GET request to `/v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs`.

### Parameters

- **evaluation\_test\_case\_uuid:** `str`
- **evaluation\_test\_case\_version:** `int` Version of the test case.

### Returns

- `class EvaluationTestCaseListEvaluationRunsResponse`
  - **evaluation\_runs:** `Optional[List[APIEvaluationRun]]` List of evaluation runs.
    - **agent\_deleted:** `Optional[bool]` Whether the agent is deleted.
    - **agent\_name:** `Optional[str]` Agent name.
    - **agent\_uuid:** `Optional[str]` Agent UUID.
    - **agent\_version\_hash:** `Optional[str]` Version hash.
    - **agent\_workspace\_uuid:** `Optional[str]` Agent workspace UUID.
    - **created\_by\_user\_email:** `Optional[str]`
    - **created\_by\_user\_id:** `Optional[str]`
    - **error\_description:** `Optional[str]` The error description.
    - **evaluation\_run\_uuid:** `Optional[str]` Evaluation run UUID.
    - **evaluation\_test\_case\_workspace\_uuid:** `Optional[str]` Evaluation test case workspace UUID.
    - **finished\_at:** `Optional[datetime]` Run end time.
    - **pass\_status:** `Optional[bool]` The pass status of the evaluation run based on the star metric.
    - **queued\_at:** `Optional[datetime]` Run queued time.
    - **run\_level\_metric\_results:** `Optional[List[APIEvaluationMetricResult]]`
      - **error\_description:** `Optional[str]` Error description if the metric could not be calculated.
      - **metric\_name:** `Optional[str]` Metric name.
      - **metric\_value\_type:** `Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - **number\_value:** `Optional[float]` The value of the metric as a number.
      - **reasoning:** `Optional[str]` Reasoning of the metric result.
      - **string\_value:** `Optional[str]` The value of the metric as a string.
    - **run\_name:** `Optional[str]` Run name.
    - **star\_metric\_result:** `Optional[APIEvaluationMetricResult]`
    - **started\_at:** `Optional[datetime]` Run start time.
    - **status:** `Optional[Literal["EVALUATION_RUN_STATUS_UNSPECIFIED", "EVALUATION_RUN_QUEUED", "EVALUATION_RUN_RUNNING_DATASET", "EVALUATION_RUN_EVALUATING_RESULTS", "EVALUATION_RUN_CANCELLING", "EVALUATION_RUN_CANCELLED", "EVALUATION_RUN_SUCCESSFUL", "EVALUATION_RUN_PARTIALLY_SUCCESSFUL", "EVALUATION_RUN_FAILED"]]` Evaluation run statuses.
      - `"EVALUATION_RUN_STATUS_UNSPECIFIED"`
      - `"EVALUATION_RUN_QUEUED"`
      - `"EVALUATION_RUN_RUNNING_DATASET"`
      - `"EVALUATION_RUN_EVALUATING_RESULTS"`
      - `"EVALUATION_RUN_CANCELLING"`
      - `"EVALUATION_RUN_CANCELLED"`
      - `"EVALUATION_RUN_SUCCESSFUL"`
      - `"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"`
      - `"EVALUATION_RUN_FAILED"`
    - **test\_case\_description:** `Optional[str]` Test case description.
    - **test\_case\_name:** `Optional[str]` Test case name.
    - **test\_case\_uuid:** `Optional[str]` Test case UUID.
    - **test\_case\_version:** `Optional[int]` Test case version.

### Example

```python
from gradient import Gradient

client = Gradient()

response = client.agents.evaluation_test_cases.list_evaluation_runs(
    evaluation_test_case_uuid="123e4567-e89b-12d3-a456-426614174000",
)
print(response.evaluation_runs)
```
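A run's outcome can be read from the documented `status`, `pass_status`, and `run_level_metric_results` fields. The sketch below summarizes finished runs under the assumption that the three terminal status values listed above are the ones worth reporting; the test case UUID is a placeholder.

```python
from gradient import Gradient

client = Gradient()

response = client.agents.evaluation_test_cases.list_evaluation_runs(
    evaluation_test_case_uuid="123e4567-e89b-12d3-a456-426614174000",  # placeholder
)

TERMINAL_STATUSES = (
    "EVALUATION_RUN_SUCCESSFUL",
    "EVALUATION_RUN_PARTIALLY_SUCCESSFUL",
    "EVALUATION_RUN_FAILED",
)

for run in response.evaluation_runs or []:
    # Skip runs that have not reached a terminal state yet.
    if run.status not in TERMINAL_STATUSES:
        continue
    print(f"{run.run_name}: status={run.status}, passed={run.pass_status}")
    for result in run.run_level_metric_results or []:
        # number_value is set for numeric metrics, string_value otherwise.
        print(f"  {result.metric_name}: {result.number_value or result.string_value}")
```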
## Domain Types

### API Evaluation Test Case

- `class APIEvaluationTestCase`
  - **archived\_at:** `Optional[datetime]`
  - **created\_at:** `Optional[datetime]`
  - **created\_by\_user\_email:** `Optional[str]`
  - **created\_by\_user\_id:** `Optional[str]`
  - **dataset:** `Optional[Dataset]`
    - **created\_at:** `Optional[datetime]` Time created at.
    - **dataset\_name:** `Optional[str]` Name of the dataset.
    - **dataset\_uuid:** `Optional[str]` UUID of the dataset.
    - **file\_size:** `Optional[str]` The size of the dataset's uploaded file in bytes.
    - **has\_ground\_truth:** `Optional[bool]` Does the dataset have a ground truth column?
    - **row\_count:** `Optional[int]` Number of rows in the dataset.
  - **dataset\_name:** `Optional[str]`
  - **dataset\_uuid:** `Optional[str]`
  - **description:** `Optional[str]`
  - **latest\_version\_number\_of\_runs:** `Optional[int]`
  - **metrics:** `Optional[List[APIEvaluationMetric]]`
    - **description:** `Optional[str]`
    - **inverted:** `Optional[bool]` If true, the metric is inverted, meaning that a lower value is better.
    - **metric\_name:** `Optional[str]`
    - **metric\_type:** `Optional[Literal["METRIC_TYPE_UNSPECIFIED", "METRIC_TYPE_GENERAL_QUALITY", "METRIC_TYPE_RAG_AND_TOOL"]]`
      - `"METRIC_TYPE_UNSPECIFIED"`
      - `"METRIC_TYPE_GENERAL_QUALITY"`
      - `"METRIC_TYPE_RAG_AND_TOOL"`
    - **metric\_uuid:** `Optional[str]`
    - **metric\_value\_type:** `Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]`
      - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
      - `"METRIC_VALUE_TYPE_NUMBER"`
      - `"METRIC_VALUE_TYPE_STRING"`
      - `"METRIC_VALUE_TYPE_PERCENTAGE"`
    - **range\_max:** `Optional[float]` The maximum value for the metric.
    - **range\_min:** `Optional[float]` The minimum value for the metric.
  - **name:** `Optional[str]`
  - **star\_metric:** `Optional[APIStarMetric]`
  - **test\_case\_uuid:** `Optional[str]`
  - **total\_runs:** `Optional[int]`
  - **updated\_at:** `Optional[datetime]`
  - **updated\_by\_user\_email:** `Optional[str]`
  - **updated\_by\_user\_id:** `Optional[str]`
  - **version:** `Optional[int]`

### API Star Metric

- `class APIStarMetric`
  - **metric\_uuid:** `Optional[str]`
  - **name:** `Optional[str]`
  - **success\_threshold:** `Optional[float]` The success threshold for the star metric. This is a value that the metric must reach to be considered successful.
  - **success\_threshold\_pct:** `Optional[int]` The success threshold for the star metric. This is a percentage value between 0 and 100.
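To see how these domain types surface in responses, here is a short sketch (placeholder UUID, client configured as in the examples above) that reads the star metric off a retrieved test case; it relies only on the fields documented above.

```python
from gradient import Gradient

client = Gradient()

# Retrieve a test case (placeholder UUID) and inspect its star metric.
test_case = client.agents.evaluation_test_cases.retrieve(
    test_case_uuid="123e4567-e89b-12d3-a456-426614174000",
).evaluation_test_case

if test_case and test_case.star_metric:
    star = test_case.star_metric
    print(star.name, star.metric_uuid)
    # success_threshold_pct is a 0-100 percentage; success_threshold is the
    # raw value the metric must reach to be considered successful.
    print(star.success_threshold_pct, star.success_threshold)
```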