# Evaluation Test Cases

## Create

`agents.evaluation_test_cases.create(**kwargs: EvaluationTestCaseCreateParams) -> EvaluationTestCaseCreateResponse`

**post** `/v2/gen-ai/evaluation_test_cases`

To create an evaluation test case, send a POST request to `/v2/gen-ai/evaluation_test_cases`.

### Parameters

- **dataset\_uuid:** `str` Dataset against which the test case is executed.
- **description:** `str` Description of the test case.
- **metrics:** `List[str]` Full metric list to use for the evaluation test case.
- **name:** `str` Name of the test case.
- **star\_metric:** `APIStarMetricParam`
- **workspace\_uuid:** `str` The workspace UUID.

### Returns

- `class EvaluationTestCaseCreateResponse`
  - **test\_case\_uuid:** `Optional[str]` Test case UUID.

### Example

```python
from gradient import Gradient

client = Gradient()

evaluation_test_case = client.agents.evaluation_test_cases.create()
print(evaluation_test_case.test_case_uuid)
```

## Retrieve

`agents.evaluation_test_cases.retrieve(test_case_uuid: str, **kwargs: EvaluationTestCaseRetrieveParams) -> EvaluationTestCaseRetrieveResponse`

**get** `/v2/gen-ai/evaluation_test_cases/{test_case_uuid}`

To retrieve information about an existing evaluation test case, send a GET request to `/v2/gen-ai/evaluation_test_cases/{test_case_uuid}`.

### Parameters

- **test\_case\_uuid:** `str`
- **evaluation\_test\_case\_version:** `int` Version of the test case.

### Returns

- `class EvaluationTestCaseRetrieveResponse`
  - **evaluation\_test\_case:** `Optional[APIEvaluationTestCase]`

### Example

```python
from gradient import Gradient

client = Gradient()

evaluation_test_case = client.agents.evaluation_test_cases.retrieve(
    test_case_uuid="123e4567-e89b-12d3-a456-426614174000",
)
print(evaluation_test_case.evaluation_test_case)
```

## Update

`agents.evaluation_test_cases.update(path_test_case_uuid: str, **kwargs: EvaluationTestCaseUpdateParams) -> EvaluationTestCaseUpdateResponse`

**put** `/v2/gen-ai/evaluation_test_cases/{test_case_uuid}`

To update an evaluation test case, send a PUT request to `/v2/gen-ai/evaluation_test_cases/{test_case_uuid}`.

### Parameters

- **path\_test\_case\_uuid:** `str`
- **dataset\_uuid:** `str` Dataset against which the test case is executed.
- **description:** `str` Description of the test case.
- **metrics:** `Metrics`
  - **metric\_uuids:** `List[str]`
- **name:** `str` Name of the test case.
- **star\_metric:** `APIStarMetricParam`
- **test\_case\_uuid:** `str`

### Returns

- `class EvaluationTestCaseUpdateResponse`
  - **test\_case\_uuid:** `Optional[str]`
  - **version:** `Optional[int]` The new version of the test case.

### Example

```python
from gradient import Gradient

client = Gradient()

evaluation_test_case = client.agents.evaluation_test_cases.update(
    path_test_case_uuid="123e4567-e89b-12d3-a456-426614174000",
)
print(evaluation_test_case.test_case_uuid)
```
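The generated examples above pass only the path argument. As a fuller illustration, here is a minimal sketch of creating and then updating a test case with the documented parameters. The UUID values are placeholders, and the assumptions that `metrics` takes metric UUIDs and that `star_metric` accepts the `APIStarMetric` fields documented under Domain Types below are inferences, not confirmed behavior.

```python
from gradient import Gradient

client = Gradient()

# Create a test case (placeholder UUIDs; star_metric keys assumed to
# mirror the APIStarMetric domain type documented below).
created = client.agents.evaluation_test_cases.create(
    name="support-bot regression suite",
    description="Checks answer quality against the golden dataset.",
    dataset_uuid="11111111-2222-3333-4444-555555555555",       # placeholder
    metrics=["aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"],           # assumed: metric UUIDs
    star_metric={
        "metric_uuid": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",  # placeholder
        "success_threshold_pct": 80,
    },
    workspace_uuid="99999999-8888-7777-6666-555555555555",      # placeholder
)
print(created.test_case_uuid)

# Update the description; the response carries the new version number.
updated = client.agents.evaluation_test_cases.update(
    path_test_case_uuid=created.test_case_uuid,
    description="Checks answer quality against the golden dataset (v2).",
)
print(updated.version)
```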
## List

`agents.evaluation_test_cases.list() -> EvaluationTestCaseListResponse`

**get** `/v2/gen-ai/evaluation_test_cases`

To list all evaluation test cases, send a GET request to `/v2/gen-ai/evaluation_test_cases`.

### Returns

- `class EvaluationTestCaseListResponse`
  - **evaluation\_test\_cases:** `Optional[List[APIEvaluationTestCase]]`
    - **archived\_at:** `Optional[datetime]`
    - **created\_at:** `Optional[datetime]`
    - **created\_by\_user\_email:** `Optional[str]`
    - **created\_by\_user\_id:** `Optional[str]`
    - **dataset:** `Optional[Dataset]`
      - **created\_at:** `Optional[datetime]` Time created at.
      - **dataset\_name:** `Optional[str]` Name of the dataset.
      - **dataset\_uuid:** `Optional[str]` UUID of the dataset.
      - **file\_size:** `Optional[str]` The size of the dataset's uploaded file in bytes.
      - **has\_ground\_truth:** `Optional[bool]` Does the dataset have a ground truth column?
      - **row\_count:** `Optional[int]` Number of rows in the dataset.
    - **dataset\_name:** `Optional[str]`
    - **dataset\_uuid:** `Optional[str]`
    - **description:** `Optional[str]`
    - **latest\_version\_number\_of\_runs:** `Optional[int]`
    - **metrics:** `Optional[List[APIEvaluationMetric]]`
      - **description:** `Optional[str]`
      - **inverted:** `Optional[bool]` If true, the metric is inverted, meaning that a lower value is better.
      - **metric\_name:** `Optional[str]`
      - **metric\_type:** `Optional[Literal["METRIC_TYPE_UNSPECIFIED", "METRIC_TYPE_GENERAL_QUALITY", "METRIC_TYPE_RAG_AND_TOOL"]]`
        - `"METRIC_TYPE_UNSPECIFIED"`
        - `"METRIC_TYPE_GENERAL_QUALITY"`
        - `"METRIC_TYPE_RAG_AND_TOOL"`
      - **metric\_uuid:** `Optional[str]`
      - **metric\_value\_type:** `Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - **range\_max:** `Optional[float]` The maximum value for the metric.
      - **range\_min:** `Optional[float]` The minimum value for the metric.
    - **name:** `Optional[str]`
    - **star\_metric:** `Optional[APIStarMetric]`
    - **test\_case\_uuid:** `Optional[str]`
    - **total\_runs:** `Optional[int]`
    - **updated\_at:** `Optional[datetime]`
    - **updated\_by\_user\_email:** `Optional[str]`
    - **updated\_by\_user\_id:** `Optional[str]`
    - **version:** `Optional[int]`

### Example

```python
from gradient import Gradient

client = Gradient()

evaluation_test_cases = client.agents.evaluation_test_cases.list()
print(evaluation_test_cases.evaluation_test_cases)
```
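When you need one specific test case rather than the whole collection, a simple approach is to scan the list response. This is a sketch that assumes the documented `name`, `test_case_uuid`, and `version` fields are populated; the target name is a placeholder.

```python
from gradient import Gradient

client = Gradient()

target_name = "support-bot regression suite"  # placeholder
listing = client.agents.evaluation_test_cases.list()

# evaluation_test_cases is Optional, so fall back to an empty list.
for test_case in listing.evaluation_test_cases or []:
    if test_case.name == target_name:
        print(test_case.test_case_uuid, test_case.version)
        break
else:
    print(f"No test case named {target_name!r}")
```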
## List Evaluation Runs

`agents.evaluation_test_cases.list_evaluation_runs(evaluation_test_case_uuid: str, **kwargs: EvaluationTestCaseListEvaluationRunsParams) -> EvaluationTestCaseListEvaluationRunsResponse`

**get** `/v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs`

To list all evaluation runs by test case, send a GET request to `/v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs`.

### Parameters

- **evaluation\_test\_case\_uuid:** `str`
- **evaluation\_test\_case\_version:** `int` Version of the test case.

### Returns

- `class EvaluationTestCaseListEvaluationRunsResponse`
  - **evaluation\_runs:** `Optional[List[APIEvaluationRun]]` List of evaluation runs.
    - **agent\_deleted:** `Optional[bool]` Whether the agent is deleted.
    - **agent\_name:** `Optional[str]` Agent name.
    - **agent\_uuid:** `Optional[str]` Agent UUID.
    - **agent\_version\_hash:** `Optional[str]` Version hash.
    - **agent\_workspace\_uuid:** `Optional[str]` Agent workspace UUID.
    - **created\_by\_user\_email:** `Optional[str]`
    - **created\_by\_user\_id:** `Optional[str]`
    - **error\_description:** `Optional[str]` The error description.
    - **evaluation\_run\_uuid:** `Optional[str]` Evaluation run UUID.
    - **evaluation\_test\_case\_workspace\_uuid:** `Optional[str]` Evaluation test case workspace UUID.
    - **finished\_at:** `Optional[datetime]` Run end time.
    - **pass\_status:** `Optional[bool]` The pass status of the evaluation run based on the star metric.
    - **queued\_at:** `Optional[datetime]` Run queued time.
    - **run\_level\_metric\_results:** `Optional[List[APIEvaluationMetricResult]]`
      - **error\_description:** `Optional[str]` Error description if the metric could not be calculated.
      - **metric\_name:** `Optional[str]` Metric name.
      - **metric\_value\_type:** `Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - **number\_value:** `Optional[float]` The value of the metric as a number.
      - **reasoning:** `Optional[str]` Reasoning of the metric result.
      - **string\_value:** `Optional[str]` The value of the metric as a string.
    - **run\_name:** `Optional[str]` Run name.
    - **star\_metric\_result:** `Optional[APIEvaluationMetricResult]`
    - **started\_at:** `Optional[datetime]` Run start time.
    - **status:** `Optional[Literal["EVALUATION_RUN_STATUS_UNSPECIFIED", "EVALUATION_RUN_QUEUED", "EVALUATION_RUN_RUNNING_DATASET", "EVALUATION_RUN_EVALUATING_RESULTS", "EVALUATION_RUN_CANCELLING", "EVALUATION_RUN_CANCELLED", "EVALUATION_RUN_SUCCESSFUL", "EVALUATION_RUN_PARTIALLY_SUCCESSFUL", "EVALUATION_RUN_FAILED"]]` Evaluation run statuses.
      - `"EVALUATION_RUN_STATUS_UNSPECIFIED"`
      - `"EVALUATION_RUN_QUEUED"`
      - `"EVALUATION_RUN_RUNNING_DATASET"`
      - `"EVALUATION_RUN_EVALUATING_RESULTS"`
      - `"EVALUATION_RUN_CANCELLING"`
      - `"EVALUATION_RUN_CANCELLED"`
      - `"EVALUATION_RUN_SUCCESSFUL"`
      - `"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"`
      - `"EVALUATION_RUN_FAILED"`
    - **test\_case\_description:** `Optional[str]` Test case description.
    - **test\_case\_name:** `Optional[str]` Test case name.
    - **test\_case\_uuid:** `Optional[str]` Test case UUID.
    - **test\_case\_version:** `Optional[int]` Test case version.

### Example

```python
from gradient import Gradient

client = Gradient()

response = client.agents.evaluation_test_cases.list_evaluation_runs(
    evaluation_test_case_uuid="123e4567-e89b-12d3-a456-426614174000",
)
print(response.evaluation_runs)
```
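A run's outcome can be read from the documented `status`, `pass_status`, and `run_level_metric_results` fields. The sketch below summarizes finished runs under the assumption that the three terminal status values listed above are the ones worth reporting; the test case UUID is a placeholder.

```python
from gradient import Gradient

client = Gradient()

response = client.agents.evaluation_test_cases.list_evaluation_runs(
    evaluation_test_case_uuid="123e4567-e89b-12d3-a456-426614174000",  # placeholder
)

TERMINAL_STATUSES = (
    "EVALUATION_RUN_SUCCESSFUL",
    "EVALUATION_RUN_PARTIALLY_SUCCESSFUL",
    "EVALUATION_RUN_FAILED",
)

for run in response.evaluation_runs or []:
    # Skip runs that have not reached a terminal state yet.
    if run.status not in TERMINAL_STATUSES:
        continue
    print(f"{run.run_name}: status={run.status}, passed={run.pass_status}")
    for result in run.run_level_metric_results or []:
        # number_value is set for numeric metrics, string_value otherwise.
        print(f"  {result.metric_name}: {result.number_value or result.string_value}")
```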
## Domain Types

### API Evaluation Test Case

- `class APIEvaluationTestCase`
  - **archived\_at:** `Optional[datetime]`
  - **created\_at:** `Optional[datetime]`
  - **created\_by\_user\_email:** `Optional[str]`
  - **created\_by\_user\_id:** `Optional[str]`
  - **dataset:** `Optional[Dataset]`
    - **created\_at:** `Optional[datetime]` Time created at.
    - **dataset\_name:** `Optional[str]` Name of the dataset.
    - **dataset\_uuid:** `Optional[str]` UUID of the dataset.
    - **file\_size:** `Optional[str]` The size of the dataset's uploaded file in bytes.
    - **has\_ground\_truth:** `Optional[bool]` Does the dataset have a ground truth column?
    - **row\_count:** `Optional[int]` Number of rows in the dataset.
  - **dataset\_name:** `Optional[str]`
  - **dataset\_uuid:** `Optional[str]`
  - **description:** `Optional[str]`
  - **latest\_version\_number\_of\_runs:** `Optional[int]`
  - **metrics:** `Optional[List[APIEvaluationMetric]]`
    - **description:** `Optional[str]`
    - **inverted:** `Optional[bool]` If true, the metric is inverted, meaning that a lower value is better.
    - **metric\_name:** `Optional[str]`
    - **metric\_type:** `Optional[Literal["METRIC_TYPE_UNSPECIFIED", "METRIC_TYPE_GENERAL_QUALITY", "METRIC_TYPE_RAG_AND_TOOL"]]`
      - `"METRIC_TYPE_UNSPECIFIED"`
      - `"METRIC_TYPE_GENERAL_QUALITY"`
      - `"METRIC_TYPE_RAG_AND_TOOL"`
    - **metric\_uuid:** `Optional[str]`
    - **metric\_value\_type:** `Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]`
      - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
      - `"METRIC_VALUE_TYPE_NUMBER"`
      - `"METRIC_VALUE_TYPE_STRING"`
      - `"METRIC_VALUE_TYPE_PERCENTAGE"`
    - **range\_max:** `Optional[float]` The maximum value for the metric.
    - **range\_min:** `Optional[float]` The minimum value for the metric.
  - **name:** `Optional[str]`
  - **star\_metric:** `Optional[APIStarMetric]`
  - **test\_case\_uuid:** `Optional[str]`
  - **total\_runs:** `Optional[int]`
  - **updated\_at:** `Optional[datetime]`
  - **updated\_by\_user\_email:** `Optional[str]`
  - **updated\_by\_user\_id:** `Optional[str]`
  - **version:** `Optional[int]`

### API Star Metric

- `class APIStarMetric`
  - **metric\_uuid:** `Optional[str]`
  - **name:** `Optional[str]`
  - **success\_threshold:** `Optional[float]` The success threshold for the star metric. This is a value that the metric must reach to be considered successful.
  - **success\_threshold\_pct:** `Optional[int]` The success threshold for the star metric. This is a percentage value between 0 and 100.
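To see how these domain types surface in responses, here is a short sketch (placeholder UUID, client configured as in the examples above) that reads the star metric off a retrieved test case; it relies only on the fields documented above.

```python
from gradient import Gradient

client = Gradient()

# Retrieve a test case (placeholder UUID) and inspect its star metric.
test_case = client.agents.evaluation_test_cases.retrieve(
    test_case_uuid="123e4567-e89b-12d3-a456-426614174000",
).evaluation_test_case

if test_case and test_case.star_metric:
    star = test_case.star_metric
    print(star.name, star.metric_uuid)
    # success_threshold_pct is a 0-100 percentage; success_threshold is the
    # raw value the metric must reach to be considered successful.
    print(star.success_threshold_pct, star.success_threshold)
```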