# Evaluation Runs

## Create

`client.agents.evaluationRuns.create(body?: EvaluationRunCreateParams, options?: RequestOptions): EvaluationRunCreateResponse`

**post** `/v2/gen-ai/evaluation_runs`

To run an evaluation test case, send a POST request to `/v2/gen-ai/evaluation_runs`.

### Parameters

- `body: EvaluationRunCreateParams`
  - `agent_uuids?: Array` Agent UUIDs to run the test case against.
  - `run_name?: string` The name of the run.
  - `test_case_uuid?: string` Test case UUID to run.

### Returns

- `EvaluationRunCreateResponse`
  - `evaluation_run_uuids?: Array`

### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const evaluationRun = await client.agents.evaluationRuns.create();

console.log(evaluationRun.evaluation_run_uuids);
```
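Although the request body is optional in the signature, a typical call names the test case to run and the agents to evaluate. A minimal sketch using the documented parameters; the UUIDs and run name below are placeholders, not real resources:

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

// Run an existing test case against one or more agents.
// The UUIDs here are placeholders; substitute your own test case and agent UUIDs.
const evaluationRun = await client.agents.evaluationRuns.create({
  test_case_uuid: '123e4567-e89b-12d3-a456-426614174000',
  agent_uuids: ['123e4567-e89b-12d3-a456-426614174001'],
  run_name: 'nightly-regression-run',
});

// One evaluation run UUID is returned per run that was started.
console.log(evaluationRun.evaluation_run_uuids);
```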
## Retrieve

`client.agents.evaluationRuns.retrieve(evaluationRunUuid: string, options?: RequestOptions): EvaluationRunRetrieveResponse`

**get** `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}`

To retrieve information about an existing evaluation run, send a GET request to `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}`.

### Parameters

- `evaluationRunUuid: string`

### Returns

- `EvaluationRunRetrieveResponse`
  - `evaluation_run?: APIEvaluationRun`
    - `agent_deleted?: boolean` Whether agent is deleted
    - `agent_name?: string` Agent name
    - `agent_uuid?: string` Agent UUID.
    - `agent_version_hash?: string` Version hash
    - `agent_workspace_uuid?: string` Agent workspace UUID
    - `created_by_user_email?: string`
    - `created_by_user_id?: string`
    - `error_description?: string` The error description
    - `evaluation_run_uuid?: string` Evaluation run UUID.
    - `evaluation_test_case_workspace_uuid?: string` Evaluation test case workspace UUID
    - `finished_at?: string` Run end time.
    - `pass_status?: boolean` The pass status of the evaluation run based on the star metric.
    - `queued_at?: string` Run queued time.
    - `run_level_metric_results?: Array`
      - `error_description?: string` Error description if the metric could not be calculated.
      - `metric_name?: string` Metric name
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `number_value?: number` The value of the metric as a number.
      - `reasoning?: string` Reasoning of the metric result.
      - `string_value?: string` The value of the metric as a string.
    - `run_name?: string` Run name.
    - `star_metric_result?: APIEvaluationMetricResult`
      - `error_description?: string` Error description if the metric could not be calculated.
      - `metric_name?: string` Metric name
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `number_value?: number` The value of the metric as a number.
      - `reasoning?: string` Reasoning of the metric result.
      - `string_value?: string` The value of the metric as a string.
    - `started_at?: string` Run start time.
    - `status?: "EVALUATION_RUN_STATUS_UNSPECIFIED" | "EVALUATION_RUN_QUEUED" | "EVALUATION_RUN_RUNNING_DATASET" | 6 more` Evaluation Run Statuses
      - `"EVALUATION_RUN_STATUS_UNSPECIFIED"`
      - `"EVALUATION_RUN_QUEUED"`
      - `"EVALUATION_RUN_RUNNING_DATASET"`
      - `"EVALUATION_RUN_EVALUATING_RESULTS"`
      - `"EVALUATION_RUN_CANCELLING"`
      - `"EVALUATION_RUN_CANCELLED"`
      - `"EVALUATION_RUN_SUCCESSFUL"`
      - `"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"`
      - `"EVALUATION_RUN_FAILED"`
    - `test_case_description?: string` Test case description.
    - `test_case_name?: string` Test case name.
    - `test_case_uuid?: string` Test case UUID.
    - `test_case_version?: number` Test case version.

### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const evaluationRun = await client.agents.evaluationRuns.retrieve('123e4567-e89b-12d3-a456-426614174000');

console.log(evaluationRun.evaluation_run);
```
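An evaluation run passes through the queued and running statuses before settling on a final value. A minimal polling sketch, assuming the cancelled, successful, partially successful, and failed statuses are the terminal ones and that a fixed delay between polls is acceptable; the run UUID is a placeholder:

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

// Statuses after which the run is assumed not to change further.
const TERMINAL_STATUSES = new Set<string>([
  'EVALUATION_RUN_CANCELLED',
  'EVALUATION_RUN_SUCCESSFUL',
  'EVALUATION_RUN_PARTIALLY_SUCCESSFUL',
  'EVALUATION_RUN_FAILED',
]);

async function waitForRun(evaluationRunUuid: string) {
  while (true) {
    const { evaluation_run } = await client.agents.evaluationRuns.retrieve(evaluationRunUuid);
    if (evaluation_run?.status && TERMINAL_STATUSES.has(evaluation_run.status)) {
      return evaluation_run;
    }
    // Poll every five seconds; adjust the interval to taste.
    await new Promise((resolve) => setTimeout(resolve, 5_000));
  }
}

const run = await waitForRun('123e4567-e89b-12d3-a456-426614174000');
console.log(run.status, run.pass_status);
```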
## List Results

`client.agents.evaluationRuns.listResults(evaluationRunUuid: string, query?: EvaluationRunListResultsParams, options?: RequestOptions): EvaluationRunListResultsResponse`

**get** `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results`

To retrieve results of an evaluation run, send a GET request to `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results`.

### Parameters

- `evaluationRunUuid: string`
- `query: EvaluationRunListResultsParams`
  - `page?: number` Page number.
  - `per_page?: number` Items per page.

### Returns

- `EvaluationRunListResultsResponse` Gets the full results of an evaluation run with all prompts.
  - `evaluation_run?: APIEvaluationRun`
    - `agent_deleted?: boolean` Whether agent is deleted
    - `agent_name?: string` Agent name
    - `agent_uuid?: string` Agent UUID.
    - `agent_version_hash?: string` Version hash
    - `agent_workspace_uuid?: string` Agent workspace UUID
    - `created_by_user_email?: string`
    - `created_by_user_id?: string`
    - `error_description?: string` The error description
    - `evaluation_run_uuid?: string` Evaluation run UUID.
    - `evaluation_test_case_workspace_uuid?: string` Evaluation test case workspace UUID
    - `finished_at?: string` Run end time.
    - `pass_status?: boolean` The pass status of the evaluation run based on the star metric.
    - `queued_at?: string` Run queued time.
    - `run_level_metric_results?: Array`
      - `error_description?: string` Error description if the metric could not be calculated.
      - `metric_name?: string` Metric name
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `number_value?: number` The value of the metric as a number.
      - `reasoning?: string` Reasoning of the metric result.
      - `string_value?: string` The value of the metric as a string.
    - `run_name?: string` Run name.
    - `star_metric_result?: APIEvaluationMetricResult`
      - `error_description?: string` Error description if the metric could not be calculated.
      - `metric_name?: string` Metric name
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `number_value?: number` The value of the metric as a number.
      - `reasoning?: string` Reasoning of the metric result.
      - `string_value?: string` The value of the metric as a string.
    - `started_at?: string` Run start time.
    - `status?: "EVALUATION_RUN_STATUS_UNSPECIFIED" | "EVALUATION_RUN_QUEUED" | "EVALUATION_RUN_RUNNING_DATASET" | 6 more` Evaluation Run Statuses
      - `"EVALUATION_RUN_STATUS_UNSPECIFIED"`
      - `"EVALUATION_RUN_QUEUED"`
      - `"EVALUATION_RUN_RUNNING_DATASET"`
      - `"EVALUATION_RUN_EVALUATING_RESULTS"`
      - `"EVALUATION_RUN_CANCELLING"`
      - `"EVALUATION_RUN_CANCELLED"`
      - `"EVALUATION_RUN_SUCCESSFUL"`
      - `"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"`
      - `"EVALUATION_RUN_FAILED"`
    - `test_case_description?: string` Test case description.
    - `test_case_name?: string` Test case name.
    - `test_case_uuid?: string` Test case UUID.
    - `test_case_version?: number` Test case version.
  - `links?: APILinks` Links to other pages
    - `pages?: Pages` Information about how to reach other pages
      - `first?: string` First page
      - `last?: string` Last page
      - `next?: string` Next page
      - `previous?: string` Previous page
  - `meta?: APIMeta` Meta information about the data set
    - `page?: number` The current page
    - `pages?: number` Total number of pages
    - `total?: number` Total amount of items over all pages
  - `prompts?: Array` The prompt level results.
    - `ground_truth?: string` The ground truth for the prompt.
    - `input?: string`
    - `input_tokens?: string` The number of input tokens used in the prompt.
    - `output?: string`
    - `output_tokens?: string` The number of output tokens used in the prompt.
    - `prompt_chunks?: Array` The list of prompt chunks.
      - `chunk_usage_pct?: number` The usage percentage of the chunk.
      - `chunk_used?: boolean` Indicates if the chunk was used in the prompt.
      - `index_uuid?: string` The index UUID (Knowledge Base) of the chunk.
      - `source_name?: string` The source name for the chunk, e.g., the file name or document title.
      - `text?: string` Text content of the chunk.
    - `prompt_id?: number` Prompt ID
    - `prompt_level_metric_results?: Array` The metric results for the prompt.
      - `error_description?: string` Error description if the metric could not be calculated.
      - `metric_name?: string` Metric name
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `number_value?: number` The value of the metric as a number.
      - `reasoning?: string` Reasoning of the metric result.
      - `string_value?: string` The value of the metric as a string.

### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const response = await client.agents.evaluationRuns.listResults('123e4567-e89b-12d3-a456-426614174000');

console.log(response.evaluation_run);
```
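Prompt-level results are paginated: `page` and `per_page` select the window, and `meta.pages` reports how many pages exist. A short sketch that walks every page of a run; the run UUID is a placeholder:

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const evaluationRunUuid = '123e4567-e89b-12d3-a456-426614174000'; // placeholder

let page = 1;
let totalPages = 1;

do {
  const response = await client.agents.evaluationRuns.listResults(evaluationRunUuid, {
    page,
    per_page: 50,
  });

  // Print each prompt-level result on this page.
  for (const prompt of response.prompts ?? []) {
    console.log(prompt.prompt_id, prompt.input, prompt.output);
  }

  totalPages = response.meta?.pages ?? 1;
  page += 1;
} while (page <= totalPages);
```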
## Retrieve Results

`client.agents.evaluationRuns.retrieveResults(promptID: number, params: EvaluationRunRetrieveResultsParams, options?: RequestOptions): EvaluationRunRetrieveResultsResponse`

**get** `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results/{prompt_id}`

To retrieve results of an evaluation run, send a GET request to `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results/{prompt_id}`.

### Parameters

- `promptID: number`
- `params: EvaluationRunRetrieveResultsParams`
  - `evaluation_run_uuid: string` Evaluation run UUID.

### Returns

- `EvaluationRunRetrieveResultsResponse`
  - `prompt?: APIEvaluationPrompt`
    - `ground_truth?: string` The ground truth for the prompt.
    - `input?: string`
    - `input_tokens?: string` The number of input tokens used in the prompt.
    - `output?: string`
    - `output_tokens?: string` The number of output tokens used in the prompt.
    - `prompt_chunks?: Array` The list of prompt chunks.
      - `chunk_usage_pct?: number` The usage percentage of the chunk.
      - `chunk_used?: boolean` Indicates if the chunk was used in the prompt.
      - `index_uuid?: string` The index UUID (Knowledge Base) of the chunk.
      - `source_name?: string` The source name for the chunk, e.g., the file name or document title.
      - `text?: string` Text content of the chunk.
    - `prompt_id?: number` Prompt ID
    - `prompt_level_metric_results?: Array` The metric results for the prompt.
      - `error_description?: string` Error description if the metric could not be calculated.
      - `metric_name?: string` Metric name
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `number_value?: number` The value of the metric as a number.
      - `reasoning?: string` Reasoning of the metric result.
      - `string_value?: string` The value of the metric as a string.

### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const response = await client.agents.evaluationRuns.retrieveResults(1, {
  evaluation_run_uuid: '123e4567-e89b-12d3-a456-426614174000',
});

console.log(response.prompt);
```
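The prompt-level response is a convenient place to check which knowledge base chunks the agent actually drew on. A minimal sketch that prints chunk usage for a single prompt, assuming prompt ID 1 and a placeholder run UUID:

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const response = await client.agents.evaluationRuns.retrieveResults(1, {
  evaluation_run_uuid: '123e4567-e89b-12d3-a456-426614174000', // placeholder
});

// Report which retrieved chunks were actually used in the prompt, and how heavily.
for (const chunk of response.prompt?.prompt_chunks ?? []) {
  if (chunk.chunk_used) {
    console.log(`${chunk.source_name}: ${chunk.chunk_usage_pct}% of chunk used`);
  }
}
```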
## Domain Types

### API Evaluation Metric

- `APIEvaluationMetric`
  - `description?: string`
  - `inverted?: boolean` If true, the metric is inverted, meaning that a lower value is better.
  - `metric_name?: string`
  - `metric_type?: "METRIC_TYPE_UNSPECIFIED" | "METRIC_TYPE_GENERAL_QUALITY" | "METRIC_TYPE_RAG_AND_TOOL"`
    - `"METRIC_TYPE_UNSPECIFIED"`
    - `"METRIC_TYPE_GENERAL_QUALITY"`
    - `"METRIC_TYPE_RAG_AND_TOOL"`
  - `metric_uuid?: string`
  - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
    - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
    - `"METRIC_VALUE_TYPE_NUMBER"`
    - `"METRIC_VALUE_TYPE_STRING"`
    - `"METRIC_VALUE_TYPE_PERCENTAGE"`
  - `range_max?: number` The maximum value for the metric.
  - `range_min?: number` The minimum value for the metric.

### API Evaluation Metric Result

- `APIEvaluationMetricResult`
  - `error_description?: string` Error description if the metric could not be calculated.
  - `metric_name?: string` Metric name
  - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
    - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
    - `"METRIC_VALUE_TYPE_NUMBER"`
    - `"METRIC_VALUE_TYPE_STRING"`
    - `"METRIC_VALUE_TYPE_PERCENTAGE"`
  - `number_value?: number` The value of the metric as a number.
  - `reasoning?: string` Reasoning of the metric result.
  - `string_value?: string` The value of the metric as a string.

### API Evaluation Prompt

- `APIEvaluationPrompt`
  - `ground_truth?: string` The ground truth for the prompt.
  - `input?: string`
  - `input_tokens?: string` The number of input tokens used in the prompt.
  - `output?: string`
  - `output_tokens?: string` The number of output tokens used in the prompt.
  - `prompt_chunks?: Array` The list of prompt chunks.
    - `chunk_usage_pct?: number` The usage percentage of the chunk.
    - `chunk_used?: boolean` Indicates if the chunk was used in the prompt.
    - `index_uuid?: string` The index UUID (Knowledge Base) of the chunk.
    - `source_name?: string` The source name for the chunk, e.g., the file name or document title.
    - `text?: string` Text content of the chunk.
  - `prompt_id?: number` Prompt ID
  - `prompt_level_metric_results?: Array` The metric results for the prompt.
    - `error_description?: string` Error description if the metric could not be calculated.
    - `metric_name?: string` Metric name
    - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
      - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
      - `"METRIC_VALUE_TYPE_NUMBER"`
      - `"METRIC_VALUE_TYPE_STRING"`
      - `"METRIC_VALUE_TYPE_PERCENTAGE"`
    - `number_value?: number` The value of the metric as a number.
    - `reasoning?: string` Reasoning of the metric result.
    - `string_value?: string` The value of the metric as a string.

### API Evaluation Run

- `APIEvaluationRun`
  - `agent_deleted?: boolean` Whether agent is deleted
  - `agent_name?: string` Agent name
  - `agent_uuid?: string` Agent UUID.
  - `agent_version_hash?: string` Version hash
  - `agent_workspace_uuid?: string` Agent workspace UUID
  - `created_by_user_email?: string`
  - `created_by_user_id?: string`
  - `error_description?: string` The error description
  - `evaluation_run_uuid?: string` Evaluation run UUID.
  - `evaluation_test_case_workspace_uuid?: string` Evaluation test case workspace UUID
  - `finished_at?: string` Run end time.
  - `pass_status?: boolean` The pass status of the evaluation run based on the star metric.
  - `queued_at?: string` Run queued time.
  - `run_level_metric_results?: Array`
    - `error_description?: string` Error description if the metric could not be calculated.
    - `metric_name?: string` Metric name
    - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
      - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
      - `"METRIC_VALUE_TYPE_NUMBER"`
      - `"METRIC_VALUE_TYPE_STRING"`
      - `"METRIC_VALUE_TYPE_PERCENTAGE"`
    - `number_value?: number` The value of the metric as a number.
    - `reasoning?: string` Reasoning of the metric result.
    - `string_value?: string` The value of the metric as a string.
  - `run_name?: string` Run name.
  - `star_metric_result?: APIEvaluationMetricResult`
    - `error_description?: string` Error description if the metric could not be calculated.
    - `metric_name?: string` Metric name
    - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
      - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
      - `"METRIC_VALUE_TYPE_NUMBER"`
      - `"METRIC_VALUE_TYPE_STRING"`
      - `"METRIC_VALUE_TYPE_PERCENTAGE"`
    - `number_value?: number` The value of the metric as a number.
    - `reasoning?: string` Reasoning of the metric result.
    - `string_value?: string` The value of the metric as a string.
  - `started_at?: string` Run start time.
  - `status?: "EVALUATION_RUN_STATUS_UNSPECIFIED" | "EVALUATION_RUN_QUEUED" | "EVALUATION_RUN_RUNNING_DATASET" | 6 more` Evaluation Run Statuses
    - `"EVALUATION_RUN_STATUS_UNSPECIFIED"`
    - `"EVALUATION_RUN_QUEUED"`
    - `"EVALUATION_RUN_RUNNING_DATASET"`
    - `"EVALUATION_RUN_EVALUATING_RESULTS"`
    - `"EVALUATION_RUN_CANCELLING"`
    - `"EVALUATION_RUN_CANCELLED"`
    - `"EVALUATION_RUN_SUCCESSFUL"`
    - `"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"`
    - `"EVALUATION_RUN_FAILED"`
  - `test_case_description?: string` Test case description.
  - `test_case_name?: string` Test case name.
  - `test_case_uuid?: string` Test case UUID.
  - `test_case_version?: number` Test case version.
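`APIEvaluationMetricResult` carries its value in either `number_value` or `string_value`, and `metric_value_type` tells you which one to read. A small sketch of that dispatch; the local interface below is an assumption that mirrors the documented fields (prefer the SDK's exported type if available), and percentage values are assumed to arrive in `number_value`:

```typescript
// Local stand-in mirroring the documented APIEvaluationMetricResult fields (assumption).
interface MetricResult {
  metric_name?: string;
  metric_value_type?:
    | 'METRIC_VALUE_TYPE_UNSPECIFIED'
    | 'METRIC_VALUE_TYPE_NUMBER'
    | 'METRIC_VALUE_TYPE_STRING'
    | 'METRIC_VALUE_TYPE_PERCENTAGE';
  number_value?: number;
  string_value?: string;
  error_description?: string;
}

// Format a metric result according to its declared value type.
function formatMetricResult(result: MetricResult): string {
  if (result.error_description) {
    return `${result.metric_name}: error (${result.error_description})`;
  }
  switch (result.metric_value_type) {
    case 'METRIC_VALUE_TYPE_NUMBER':
      return `${result.metric_name}: ${result.number_value}`;
    case 'METRIC_VALUE_TYPE_PERCENTAGE':
      // Assumes percentage metrics are reported via number_value.
      return `${result.metric_name}: ${result.number_value}%`;
    case 'METRIC_VALUE_TYPE_STRING':
      return `${result.metric_name}: ${result.string_value}`;
    default:
      return `${result.metric_name}: (no value)`;
  }
}
```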