# Evaluation Runs

## Create

`agents.evaluation_runs.create(EvaluationRunCreateParams **kwargs) -> EvaluationRunCreateResponse`

**post** `/v2/gen-ai/evaluation_runs`

To run an evaluation test case, send a POST request to `/v2/gen-ai/evaluation_runs`.

### Parameters

- **agent\_uuids:** `List[str]` Agent UUIDs to run the test case against.
- **run\_name:** `str` The name of the run.
- **test\_case\_uuid:** `str` Test-case UUID to run.

### Returns

- `class EvaluationRunCreateResponse`
  - **evaluation\_run\_uuids:** `Optional[List[str]]`

### Example

```python
from gradient import Gradient

client = Gradient()

evaluation_run = client.agents.evaluation_runs.create()
print(evaluation_run.evaluation_run_uuids)
```

## Retrieve

`agents.evaluation_runs.retrieve(str evaluation_run_uuid) -> EvaluationRunRetrieveResponse`

**get** `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}`

To retrieve information about an existing evaluation run, send a GET request to `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}`.

### Parameters

- **evaluation\_run\_uuid:** `str`

### Returns

- `class EvaluationRunRetrieveResponse`
  - **evaluation\_run:** `Optional[APIEvaluationRun]`

### Example

```python
from gradient import Gradient

client = Gradient()

evaluation_run = client.agents.evaluation_runs.retrieve(
    "evaluation_run_uuid",
)
print(evaluation_run.evaluation_run)
```
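As a minimal sketch of how the Create and Retrieve calls fit together, the example below starts a run with the documented parameters and then fetches each returned run. The agent and test-case UUIDs and the run name are placeholders, not values from this reference.

```python
from gradient import Gradient

client = Gradient()

# Placeholder identifiers -- replace with real agent and test-case UUIDs.
agent_uuids = ["00000000-0000-0000-0000-000000000000"]
test_case_uuid = "11111111-1111-1111-1111-111111111111"

# Start an evaluation run using the parameters documented under "Create".
created = client.agents.evaluation_runs.create(
    agent_uuids=agent_uuids,
    run_name="nightly-regression",
    test_case_uuid=test_case_uuid,
)

# The response carries a list of run UUIDs; fetch the details of each one.
for run_uuid in created.evaluation_run_uuids or []:
    run = client.agents.evaluation_runs.retrieve(run_uuid).evaluation_run
    print(run_uuid, run)
```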
## List Results

`agents.evaluation_runs.list_results(str evaluation_run_uuid, EvaluationRunListResultsParams **kwargs) -> EvaluationRunListResultsResponse`

**get** `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results`

To retrieve results of an evaluation run, send a GET request to `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results`.

### Parameters

- **evaluation\_run\_uuid:** `str`
- **page:** `int` Page number.
- **per\_page:** `int` Items per page.

### Returns

- `class EvaluationRunListResultsResponse` Gets the full results of an evaluation run with all prompts.
  - **evaluation\_run:** `Optional[APIEvaluationRun]`
  - **links:** `Optional[APILinks]` Links to other pages.
  - **meta:** `Optional[APIMeta]` Meta information about the data set.
  - **prompts:** `Optional[List[APIEvaluationPrompt]]` The prompt-level results.
    - **ground\_truth:** `Optional[str]` The ground truth for the prompt.
    - **input:** `Optional[str]`
    - **input\_tokens:** `Optional[str]` The number of input tokens used in the prompt.
    - **output:** `Optional[str]`
    - **output\_tokens:** `Optional[str]` The number of output tokens used in the prompt.
    - **prompt\_chunks:** `Optional[List[PromptChunk]]` The list of prompt chunks.
      - **chunk\_usage\_pct:** `Optional[float]` The usage percentage of the chunk.
      - **chunk\_used:** `Optional[bool]` Indicates if the chunk was used in the prompt.
      - **index\_uuid:** `Optional[str]` The index UUID (Knowledge Base) of the chunk.
      - **source\_name:** `Optional[str]` The source name for the chunk, e.g., the file name or document title.
      - **text:** `Optional[str]` Text content of the chunk.
    - **prompt\_id:** `Optional[int]` Prompt ID.
    - **prompt\_level\_metric\_results:** `Optional[List[APIEvaluationMetricResult]]` The metric results for the prompt.
      - **error\_description:** `Optional[str]` Error description if the metric could not be calculated.
      - **metric\_name:** `Optional[str]` Metric name.
      - **metric\_value\_type:** `Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - **number\_value:** `Optional[float]` The value of the metric as a number.
      - **reasoning:** `Optional[str]` Reasoning of the metric result.
      - **string\_value:** `Optional[str]` The value of the metric as a string.

### Example

```python
from gradient import Gradient

client = Gradient()

response = client.agents.evaluation_runs.list_results(
    evaluation_run_uuid="123e4567-e89b-12d3-a456-426614174000",
)
print(response.evaluation_run)
```

## Retrieve Results

`agents.evaluation_runs.retrieve_results(int prompt_id, EvaluationRunRetrieveResultsParams **kwargs) -> EvaluationRunRetrieveResultsResponse`

**get** `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results/{prompt_id}`

To retrieve results of an evaluation run, send a GET request to `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results/{prompt_id}`.

### Parameters

- **evaluation\_run\_uuid:** `str`
- **prompt\_id:** `int`

### Returns

- `class EvaluationRunRetrieveResultsResponse`
  - **prompt:** `Optional[APIEvaluationPrompt]`

### Example

```python
from gradient import Gradient

client = Gradient()

response = client.agents.evaluation_runs.retrieve_results(
    prompt_id=1,
    evaluation_run_uuid="123e4567-e89b-12d3-a456-426614174000",
)
print(response.prompt)
```
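The two results endpoints can be combined: page through `list_results` for prompt-level summaries, then fetch one prompt in full with `retrieve_results`. The sketch below assumes that an empty `prompts` page means the listing is exhausted (the pagination details of `APIMeta` and `APILinks` are not spelled out in this reference), and the run UUID is a placeholder.

```python
from gradient import Gradient

client = Gradient()

RUN_UUID = "123e4567-e89b-12d3-a456-426614174000"  # placeholder

# Collect prompt-level results page by page. Stopping on an empty page is an
# assumption; APIMeta/APILinks may expose a total count or next-page link.
prompts = []
page = 1
while True:
    resp = client.agents.evaluation_runs.list_results(
        evaluation_run_uuid=RUN_UUID,
        page=page,
        per_page=50,
    )
    batch = resp.prompts or []
    if not batch:
        break
    prompts.extend(batch)
    page += 1

# Inspect one prompt in detail, including which retrieved chunks were used.
if prompts:
    detail = client.agents.evaluation_runs.retrieve_results(
        prompt_id=prompts[0].prompt_id,
        evaluation_run_uuid=RUN_UUID,
    )
    prompt = detail.prompt
    for chunk in prompt.prompt_chunks or []:
        print(chunk.source_name, chunk.chunk_used, chunk.chunk_usage_pct)
```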
## Domain Types

### API Evaluation Metric

- `class APIEvaluationMetric`
  - **description:** `Optional[str]`
  - **inverted:** `Optional[bool]` If true, the metric is inverted, meaning that a lower value is better.
  - **metric\_name:** `Optional[str]`
  - **metric\_type:** `Optional[Literal["METRIC_TYPE_UNSPECIFIED", "METRIC_TYPE_GENERAL_QUALITY", "METRIC_TYPE_RAG_AND_TOOL"]]`
    - `"METRIC_TYPE_UNSPECIFIED"`
    - `"METRIC_TYPE_GENERAL_QUALITY"`
    - `"METRIC_TYPE_RAG_AND_TOOL"`
  - **metric\_uuid:** `Optional[str]`
  - **metric\_value\_type:** `Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]`
    - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
    - `"METRIC_VALUE_TYPE_NUMBER"`
    - `"METRIC_VALUE_TYPE_STRING"`
    - `"METRIC_VALUE_TYPE_PERCENTAGE"`
  - **range\_max:** `Optional[float]` The maximum value for the metric.
  - **range\_min:** `Optional[float]` The minimum value for the metric.

### API Evaluation Metric Result

- `class APIEvaluationMetricResult`
  - **error\_description:** `Optional[str]` Error description if the metric could not be calculated.
  - **metric\_name:** `Optional[str]` Metric name.
  - **metric\_value\_type:** `Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]`
    - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
    - `"METRIC_VALUE_TYPE_NUMBER"`
    - `"METRIC_VALUE_TYPE_STRING"`
    - `"METRIC_VALUE_TYPE_PERCENTAGE"`
  - **number\_value:** `Optional[float]` The value of the metric as a number.
  - **reasoning:** `Optional[str]` Reasoning of the metric result.
  - **string\_value:** `Optional[str]` The value of the metric as a string.

### API Evaluation Prompt

- `class APIEvaluationPrompt`
  - **ground\_truth:** `Optional[str]` The ground truth for the prompt.
  - **input:** `Optional[str]`
  - **input\_tokens:** `Optional[str]` The number of input tokens used in the prompt.
  - **output:** `Optional[str]`
  - **output\_tokens:** `Optional[str]` The number of output tokens used in the prompt.
  - **prompt\_chunks:** `Optional[List[PromptChunk]]` The list of prompt chunks.
    - **chunk\_usage\_pct:** `Optional[float]` The usage percentage of the chunk.
    - **chunk\_used:** `Optional[bool]` Indicates if the chunk was used in the prompt.
    - **index\_uuid:** `Optional[str]` The index UUID (Knowledge Base) of the chunk.
    - **source\_name:** `Optional[str]` The source name for the chunk, e.g., the file name or document title.
    - **text:** `Optional[str]` Text content of the chunk.
  - **prompt\_id:** `Optional[int]` Prompt ID.
  - **prompt\_level\_metric\_results:** `Optional[List[APIEvaluationMetricResult]]` The metric results for the prompt.
    - **error\_description:** `Optional[str]` Error description if the metric could not be calculated.
    - **metric\_name:** `Optional[str]` Metric name.
    - **metric\_value\_type:** `Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]`
      - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
      - `"METRIC_VALUE_TYPE_NUMBER"`
      - `"METRIC_VALUE_TYPE_STRING"`
      - `"METRIC_VALUE_TYPE_PERCENTAGE"`
    - **number\_value:** `Optional[float]` The value of the metric as a number.
    - **reasoning:** `Optional[str]` Reasoning of the metric result.
    - **string\_value:** `Optional[str]` The value of the metric as a string.
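`APIEvaluationMetricResult` carries its value in either `number_value` or `string_value`, depending on `metric_value_type`. The helper below is one way to normalize that for display; treating percentage metrics as numbers held in `number_value` is an assumption inferred from the field descriptions above.

```python
def format_metric(result) -> str:
    """Render an APIEvaluationMetricResult-like object as a single line.

    Assumption: numeric and percentage metrics live in ``number_value``,
    string metrics in ``string_value``.
    """
    if result.error_description:
        return f"{result.metric_name}: error ({result.error_description})"
    if result.metric_value_type == "METRIC_VALUE_TYPE_NUMBER":
        return f"{result.metric_name}: {result.number_value}"
    if result.metric_value_type == "METRIC_VALUE_TYPE_PERCENTAGE":
        return f"{result.metric_name}: {result.number_value}%"
    if result.metric_value_type == "METRIC_VALUE_TYPE_STRING":
        return f"{result.metric_name}: {result.string_value}"
    return f"{result.metric_name}: (unspecified value type)"
```

The same helper applies to `prompt_level_metric_results`, `run_level_metric_results`, and `star_metric_result`, since all of them are `APIEvaluationMetricResult` values.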
### API Evaluation Run

- `class APIEvaluationRun`
  - **agent\_deleted:** `Optional[bool]` Whether the agent is deleted.
  - **agent\_name:** `Optional[str]` Agent name.
  - **agent\_uuid:** `Optional[str]` Agent UUID.
  - **agent\_version\_hash:** `Optional[str]` Version hash.
  - **agent\_workspace\_uuid:** `Optional[str]` Agent workspace UUID.
  - **created\_by\_user\_email:** `Optional[str]`
  - **created\_by\_user\_id:** `Optional[str]`
  - **error\_description:** `Optional[str]` The error description.
  - **evaluation\_run\_uuid:** `Optional[str]` Evaluation run UUID.
  - **evaluation\_test\_case\_workspace\_uuid:** `Optional[str]` Evaluation test case workspace UUID.
  - **finished\_at:** `Optional[datetime]` Run end time.
  - **pass\_status:** `Optional[bool]` The pass status of the evaluation run based on the star metric.
  - **queued\_at:** `Optional[datetime]` Run queued time.
  - **run\_level\_metric\_results:** `Optional[List[APIEvaluationMetricResult]]`
    - **error\_description:** `Optional[str]` Error description if the metric could not be calculated.
    - **metric\_name:** `Optional[str]` Metric name.
    - **metric\_value\_type:** `Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]`
      - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
      - `"METRIC_VALUE_TYPE_NUMBER"`
      - `"METRIC_VALUE_TYPE_STRING"`
      - `"METRIC_VALUE_TYPE_PERCENTAGE"`
    - **number\_value:** `Optional[float]` The value of the metric as a number.
    - **reasoning:** `Optional[str]` Reasoning of the metric result.
    - **string\_value:** `Optional[str]` The value of the metric as a string.
  - **run\_name:** `Optional[str]` Run name.
  - **star\_metric\_result:** `Optional[APIEvaluationMetricResult]`
  - **started\_at:** `Optional[datetime]` Run start time.
  - **status:** `Optional[Literal["EVALUATION_RUN_STATUS_UNSPECIFIED", "EVALUATION_RUN_QUEUED", "EVALUATION_RUN_RUNNING_DATASET", 6 more]]` Evaluation Run Statuses.
    - `"EVALUATION_RUN_STATUS_UNSPECIFIED"`
    - `"EVALUATION_RUN_QUEUED"`
    - `"EVALUATION_RUN_RUNNING_DATASET"`
    - `"EVALUATION_RUN_EVALUATING_RESULTS"`
    - `"EVALUATION_RUN_CANCELLING"`
    - `"EVALUATION_RUN_CANCELLED"`
    - `"EVALUATION_RUN_SUCCESSFUL"`
    - `"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"`
    - `"EVALUATION_RUN_FAILED"`
  - **test\_case\_description:** `Optional[str]` Test case description.
  - **test\_case\_name:** `Optional[str]` Test case name.
  - **test\_case\_uuid:** `Optional[str]` Test-case UUID.
  - **test\_case\_version:** `Optional[int]` Test-case version.
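Given the queued/running statuses above, one pattern is to poll `retrieve` until the run settles and then read `pass_status` and the star metric. In the sketch below, which statuses count as terminal is inferred from their names; the run UUID and poll interval are placeholders.

```python
import time

from gradient import Gradient

client = Gradient()

RUN_UUID = "123e4567-e89b-12d3-a456-426614174000"  # placeholder

# Statuses taken from the APIEvaluationRun literal above; treating these four
# as terminal is an assumption based on their names.
TERMINAL = {
    "EVALUATION_RUN_CANCELLED",
    "EVALUATION_RUN_SUCCESSFUL",
    "EVALUATION_RUN_PARTIALLY_SUCCESSFUL",
    "EVALUATION_RUN_FAILED",
}

while True:
    run = client.agents.evaluation_runs.retrieve(RUN_UUID).evaluation_run
    if run is None or run.status in TERMINAL:
        break
    time.sleep(10)  # arbitrary poll interval

if run is not None:
    print("status:", run.status)
    print("passed:", run.pass_status)
    if run.star_metric_result is not None:
        print(
            "star metric:",
            run.star_metric_result.metric_name,
            run.star_metric_result.number_value,
        )
```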