# Evaluation Runs

## Create

`client.agents.evaluationRuns.create(body?: EvaluationRunCreateParams, options?: RequestOptions): EvaluationRunCreateResponse`

**post** `/v2/gen-ai/evaluation_runs`

To run an evaluation test case, send a POST request to `/v2/gen-ai/evaluation_runs`.

### Parameters

- `body: EvaluationRunCreateParams`
  - `agent_uuids?: Array` Agent UUIDs to run the test case against.
  - `run_name?: string` The name of the run.
  - `test_case_uuid?: string` Test case UUID to run.

### Returns

- `EvaluationRunCreateResponse`
  - `evaluation_run_uuids?: Array`

### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const evaluationRun = await client.agents.evaluationRuns.create();

console.log(evaluationRun.evaluation_run_uuids);
```
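Although the request body is optional in the signature, a typical call names the test case to run and the agents to evaluate. A minimal sketch using the documented parameters; the UUIDs and run name below are placeholders, not real resources:

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

// Run an existing test case against one or more agents.
// The UUIDs here are placeholders; substitute your own test case and agent UUIDs.
const evaluationRun = await client.agents.evaluationRuns.create({
  test_case_uuid: '123e4567-e89b-12d3-a456-426614174000',
  agent_uuids: ['123e4567-e89b-12d3-a456-426614174001'],
  run_name: 'nightly-regression-run',
});

// One evaluation run UUID is returned per run that was started.
console.log(evaluationRun.evaluation_run_uuids);
```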
## Retrieve

`client.agents.evaluationRuns.retrieve(evaluationRunUuid: string, options?: RequestOptions): EvaluationRunRetrieveResponse`

**get** `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}`

To retrieve information about an existing evaluation run, send a GET request to `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}`.

### Parameters

- `evaluationRunUuid: string`

### Returns

- `EvaluationRunRetrieveResponse`
  - `evaluation_run?: APIEvaluationRun`
    - `agent_deleted?: boolean` Whether agent is deleted
    - `agent_name?: string` Agent name
    - `agent_uuid?: string` Agent UUID.
    - `agent_version_hash?: string` Version hash
    - `agent_workspace_uuid?: string` Agent workspace UUID
    - `created_by_user_email?: string`
    - `created_by_user_id?: string`
    - `error_description?: string` The error description
    - `evaluation_run_uuid?: string` Evaluation run UUID.
    - `evaluation_test_case_workspace_uuid?: string` Evaluation test case workspace UUID
    - `finished_at?: string` Run end time.
    - `pass_status?: boolean` The pass status of the evaluation run based on the star metric.
    - `queued_at?: string` Run queued time.
    - `run_level_metric_results?: Array`
      - `error_description?: string` Error description if the metric could not be calculated.
      - `metric_name?: string` Metric name
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `number_value?: number` The value of the metric as a number.
      - `reasoning?: string` Reasoning of the metric result.
      - `string_value?: string` The value of the metric as a string.
    - `run_name?: string` Run name.
    - `star_metric_result?: APIEvaluationMetricResult`
      - `error_description?: string` Error description if the metric could not be calculated.
      - `metric_name?: string` Metric name
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `number_value?: number` The value of the metric as a number.
      - `reasoning?: string` Reasoning of the metric result.
      - `string_value?: string` The value of the metric as a string.
    - `started_at?: string` Run start time.
    - `status?: "EVALUATION_RUN_STATUS_UNSPECIFIED" | "EVALUATION_RUN_QUEUED" | "EVALUATION_RUN_RUNNING_DATASET" | 6 more` Evaluation Run Statuses
      - `"EVALUATION_RUN_STATUS_UNSPECIFIED"`
      - `"EVALUATION_RUN_QUEUED"`
      - `"EVALUATION_RUN_RUNNING_DATASET"`
      - `"EVALUATION_RUN_EVALUATING_RESULTS"`
      - `"EVALUATION_RUN_CANCELLING"`
      - `"EVALUATION_RUN_CANCELLED"`
      - `"EVALUATION_RUN_SUCCESSFUL"`
      - `"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"`
      - `"EVALUATION_RUN_FAILED"`
    - `test_case_description?: string` Test case description.
    - `test_case_name?: string` Test case name.
    - `test_case_uuid?: string` Test case UUID.
    - `test_case_version?: number` Test case version.

### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const evaluationRun = await client.agents.evaluationRuns.retrieve('123e4567-e89b-12d3-a456-426614174000');

console.log(evaluationRun.evaluation_run);
```
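An evaluation run passes through the queued and running statuses before settling on a final value. A minimal polling sketch, assuming the cancelled, successful, partially successful, and failed statuses are the terminal ones and that a fixed delay between polls is acceptable; the run UUID is a placeholder:

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

// Statuses after which the run is assumed not to change further.
const TERMINAL_STATUSES = new Set<string>([
  'EVALUATION_RUN_CANCELLED',
  'EVALUATION_RUN_SUCCESSFUL',
  'EVALUATION_RUN_PARTIALLY_SUCCESSFUL',
  'EVALUATION_RUN_FAILED',
]);

async function waitForRun(evaluationRunUuid: string) {
  while (true) {
    const { evaluation_run } = await client.agents.evaluationRuns.retrieve(evaluationRunUuid);
    if (evaluation_run?.status && TERMINAL_STATUSES.has(evaluation_run.status)) {
      return evaluation_run;
    }
    // Poll every five seconds; adjust the interval to taste.
    await new Promise((resolve) => setTimeout(resolve, 5_000));
  }
}

const run = await waitForRun('123e4567-e89b-12d3-a456-426614174000');
console.log(run.status, run.pass_status);
```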
## List Results

`client.agents.evaluationRuns.listResults(evaluationRunUuid: string, query?: EvaluationRunListResultsParams, options?: RequestOptions): EvaluationRunListResultsResponse`

**get** `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results`

To retrieve results of an evaluation run, send a GET request to `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results`.

### Parameters

- `evaluationRunUuid: string`
- `query: EvaluationRunListResultsParams`
  - `page?: number` Page number.
  - `per_page?: number` Items per page.

### Returns

- `EvaluationRunListResultsResponse` Gets the full results of an evaluation run with all prompts.
  - `evaluation_run?: APIEvaluationRun`
    - `agent_deleted?: boolean` Whether agent is deleted
    - `agent_name?: string` Agent name
    - `agent_uuid?: string` Agent UUID.
    - `agent_version_hash?: string` Version hash
    - `agent_workspace_uuid?: string` Agent workspace UUID
    - `created_by_user_email?: string`
    - `created_by_user_id?: string`
    - `error_description?: string` The error description
    - `evaluation_run_uuid?: string` Evaluation run UUID.
    - `evaluation_test_case_workspace_uuid?: string` Evaluation test case workspace UUID
    - `finished_at?: string` Run end time.
    - `pass_status?: boolean` The pass status of the evaluation run based on the star metric.
    - `queued_at?: string` Run queued time.
    - `run_level_metric_results?: Array`
      - `error_description?: string` Error description if the metric could not be calculated.
      - `metric_name?: string` Metric name
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `number_value?: number` The value of the metric as a number.
      - `reasoning?: string` Reasoning of the metric result.
      - `string_value?: string` The value of the metric as a string.
    - `run_name?: string` Run name.
    - `star_metric_result?: APIEvaluationMetricResult`
      - `error_description?: string` Error description if the metric could not be calculated.
      - `metric_name?: string` Metric name
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `number_value?: number` The value of the metric as a number.
      - `reasoning?: string` Reasoning of the metric result.
      - `string_value?: string` The value of the metric as a string.
    - `started_at?: string` Run start time.
    - `status?: "EVALUATION_RUN_STATUS_UNSPECIFIED" | "EVALUATION_RUN_QUEUED" | "EVALUATION_RUN_RUNNING_DATASET" | 6 more` Evaluation Run Statuses
      - `"EVALUATION_RUN_STATUS_UNSPECIFIED"`
      - `"EVALUATION_RUN_QUEUED"`
      - `"EVALUATION_RUN_RUNNING_DATASET"`
      - `"EVALUATION_RUN_EVALUATING_RESULTS"`
      - `"EVALUATION_RUN_CANCELLING"`
      - `"EVALUATION_RUN_CANCELLED"`
      - `"EVALUATION_RUN_SUCCESSFUL"`
      - `"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"`
      - `"EVALUATION_RUN_FAILED"`
    - `test_case_description?: string` Test case description.
    - `test_case_name?: string` Test case name.
    - `test_case_uuid?: string` Test case UUID.
    - `test_case_version?: number` Test case version.
  - `links?: APILinks` Links to other pages
    - `pages?: Pages` Information about how to reach other pages
      - `first?: string` First page
      - `last?: string` Last page
      - `next?: string` Next page
      - `previous?: string` Previous page
  - `meta?: APIMeta` Meta information about the data set
    - `page?: number` The current page
    - `pages?: number` Total number of pages
    - `total?: number` Total amount of items over all pages
  - `prompts?: Array` The prompt level results.
    - `ground_truth?: string` The ground truth for the prompt.
    - `input?: string`
    - `input_tokens?: string` The number of input tokens used in the prompt.
    - `output?: string`
    - `output_tokens?: string` The number of output tokens used in the prompt.
    - `prompt_chunks?: Array` The list of prompt chunks.
      - `chunk_usage_pct?: number` The usage percentage of the chunk.
      - `chunk_used?: boolean` Indicates if the chunk was used in the prompt.
      - `index_uuid?: string` The index UUID (Knowledge Base) of the chunk.
      - `source_name?: string` The source name for the chunk, e.g., the file name or document title.
      - `text?: string` Text content of the chunk.
    - `prompt_id?: number` Prompt ID
    - `prompt_level_metric_results?: Array` The metric results for the prompt.
      - `error_description?: string` Error description if the metric could not be calculated.
      - `metric_name?: string` Metric name
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `number_value?: number` The value of the metric as a number.
      - `reasoning?: string` Reasoning of the metric result.
      - `string_value?: string` The value of the metric as a string.

### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const response = await client.agents.evaluationRuns.listResults('123e4567-e89b-12d3-a456-426614174000');

console.log(response.evaluation_run);
```
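Prompt-level results are paginated: `page` and `per_page` select the window, and `meta.pages` reports how many pages exist. A short sketch that walks every page of a run; the run UUID is a placeholder:

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const evaluationRunUuid = '123e4567-e89b-12d3-a456-426614174000'; // placeholder

let page = 1;
let totalPages = 1;

do {
  const response = await client.agents.evaluationRuns.listResults(evaluationRunUuid, {
    page,
    per_page: 50,
  });

  // Print each prompt-level result on this page.
  for (const prompt of response.prompts ?? []) {
    console.log(prompt.prompt_id, prompt.input, prompt.output);
  }

  totalPages = response.meta?.pages ?? 1;
  page += 1;
} while (page <= totalPages);
```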
## Retrieve Results

`client.agents.evaluationRuns.retrieveResults(promptID: number, params: EvaluationRunRetrieveResultsParams, options?: RequestOptions): EvaluationRunRetrieveResultsResponse`

**get** `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results/{prompt_id}`

To retrieve results of an evaluation run, send a GET request to `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results/{prompt_id}`.

### Parameters

- `promptID: number`
- `params: EvaluationRunRetrieveResultsParams`
  - `evaluation_run_uuid: string` Evaluation run UUID.

### Returns

- `EvaluationRunRetrieveResultsResponse`
  - `prompt?: APIEvaluationPrompt`
    - `ground_truth?: string` The ground truth for the prompt.
    - `input?: string`
    - `input_tokens?: string` The number of input tokens used in the prompt.
    - `output?: string`
    - `output_tokens?: string` The number of output tokens used in the prompt.
    - `prompt_chunks?: Array` The list of prompt chunks.
      - `chunk_usage_pct?: number` The usage percentage of the chunk.
      - `chunk_used?: boolean` Indicates if the chunk was used in the prompt.
      - `index_uuid?: string` The index UUID (Knowledge Base) of the chunk.
      - `source_name?: string` The source name for the chunk, e.g., the file name or document title.
      - `text?: string` Text content of the chunk.
    - `prompt_id?: number` Prompt ID
    - `prompt_level_metric_results?: Array` The metric results for the prompt.
      - `error_description?: string` Error description if the metric could not be calculated.
      - `metric_name?: string` Metric name
      - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - `number_value?: number` The value of the metric as a number.
      - `reasoning?: string` Reasoning of the metric result.
      - `string_value?: string` The value of the metric as a string.

### Example

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const response = await client.agents.evaluationRuns.retrieveResults(1, {
  evaluation_run_uuid: '123e4567-e89b-12d3-a456-426614174000',
});

console.log(response.prompt);
```
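The prompt-level response is a convenient place to check which knowledge base chunks the agent actually drew on. A minimal sketch that prints chunk usage for a single prompt, assuming prompt ID 1 and a placeholder run UUID:

```typescript
import Gradient from '@digitalocean/gradient';

const client = new Gradient();

const response = await client.agents.evaluationRuns.retrieveResults(1, {
  evaluation_run_uuid: '123e4567-e89b-12d3-a456-426614174000', // placeholder
});

// Report which retrieved chunks were actually used in the prompt, and how heavily.
for (const chunk of response.prompt?.prompt_chunks ?? []) {
  if (chunk.chunk_used) {
    console.log(`${chunk.source_name}: ${chunk.chunk_usage_pct}% of chunk used`);
  }
}
```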
## Domain Types

### API Evaluation Metric

- `APIEvaluationMetric`
  - `description?: string`
  - `inverted?: boolean` If true, the metric is inverted, meaning that a lower value is better.
  - `metric_name?: string`
  - `metric_type?: "METRIC_TYPE_UNSPECIFIED" | "METRIC_TYPE_GENERAL_QUALITY" | "METRIC_TYPE_RAG_AND_TOOL"`
    - `"METRIC_TYPE_UNSPECIFIED"`
    - `"METRIC_TYPE_GENERAL_QUALITY"`
    - `"METRIC_TYPE_RAG_AND_TOOL"`
  - `metric_uuid?: string`
  - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
    - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
    - `"METRIC_VALUE_TYPE_NUMBER"`
    - `"METRIC_VALUE_TYPE_STRING"`
    - `"METRIC_VALUE_TYPE_PERCENTAGE"`
  - `range_max?: number` The maximum value for the metric.
  - `range_min?: number` The minimum value for the metric.

### API Evaluation Metric Result

- `APIEvaluationMetricResult`
  - `error_description?: string` Error description if the metric could not be calculated.
  - `metric_name?: string` Metric name
  - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
    - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
    - `"METRIC_VALUE_TYPE_NUMBER"`
    - `"METRIC_VALUE_TYPE_STRING"`
    - `"METRIC_VALUE_TYPE_PERCENTAGE"`
  - `number_value?: number` The value of the metric as a number.
  - `reasoning?: string` Reasoning of the metric result.
  - `string_value?: string` The value of the metric as a string.

### API Evaluation Prompt

- `APIEvaluationPrompt`
  - `ground_truth?: string` The ground truth for the prompt.
  - `input?: string`
  - `input_tokens?: string` The number of input tokens used in the prompt.
  - `output?: string`
  - `output_tokens?: string` The number of output tokens used in the prompt.
  - `prompt_chunks?: Array` The list of prompt chunks.
    - `chunk_usage_pct?: number` The usage percentage of the chunk.
    - `chunk_used?: boolean` Indicates if the chunk was used in the prompt.
    - `index_uuid?: string` The index UUID (Knowledge Base) of the chunk.
    - `source_name?: string` The source name for the chunk, e.g., the file name or document title.
    - `text?: string` Text content of the chunk.
  - `prompt_id?: number` Prompt ID
  - `prompt_level_metric_results?: Array` The metric results for the prompt.
    - `error_description?: string` Error description if the metric could not be calculated.
    - `metric_name?: string` Metric name
    - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
      - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
      - `"METRIC_VALUE_TYPE_NUMBER"`
      - `"METRIC_VALUE_TYPE_STRING"`
      - `"METRIC_VALUE_TYPE_PERCENTAGE"`
    - `number_value?: number` The value of the metric as a number.
    - `reasoning?: string` Reasoning of the metric result.
    - `string_value?: string` The value of the metric as a string.

### API Evaluation Run

- `APIEvaluationRun`
  - `agent_deleted?: boolean` Whether agent is deleted
  - `agent_name?: string` Agent name
  - `agent_uuid?: string` Agent UUID.
  - `agent_version_hash?: string` Version hash
  - `agent_workspace_uuid?: string` Agent workspace UUID
  - `created_by_user_email?: string`
  - `created_by_user_id?: string`
  - `error_description?: string` The error description
  - `evaluation_run_uuid?: string` Evaluation run UUID.
  - `evaluation_test_case_workspace_uuid?: string` Evaluation test case workspace UUID
  - `finished_at?: string` Run end time.
  - `pass_status?: boolean` The pass status of the evaluation run based on the star metric.
  - `queued_at?: string` Run queued time.
  - `run_level_metric_results?: Array`
    - `error_description?: string` Error description if the metric could not be calculated.
    - `metric_name?: string` Metric name
    - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
      - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
      - `"METRIC_VALUE_TYPE_NUMBER"`
      - `"METRIC_VALUE_TYPE_STRING"`
      - `"METRIC_VALUE_TYPE_PERCENTAGE"`
    - `number_value?: number` The value of the metric as a number.
    - `reasoning?: string` Reasoning of the metric result.
    - `string_value?: string` The value of the metric as a string.
  - `run_name?: string` Run name.
  - `star_metric_result?: APIEvaluationMetricResult`
    - `error_description?: string` Error description if the metric could not be calculated.
    - `metric_name?: string` Metric name
    - `metric_value_type?: "METRIC_VALUE_TYPE_UNSPECIFIED" | "METRIC_VALUE_TYPE_NUMBER" | "METRIC_VALUE_TYPE_STRING" | "METRIC_VALUE_TYPE_PERCENTAGE"`
      - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
      - `"METRIC_VALUE_TYPE_NUMBER"`
      - `"METRIC_VALUE_TYPE_STRING"`
      - `"METRIC_VALUE_TYPE_PERCENTAGE"`
    - `number_value?: number` The value of the metric as a number.
    - `reasoning?: string` Reasoning of the metric result.
    - `string_value?: string` The value of the metric as a string.
  - `started_at?: string` Run start time.
  - `status?: "EVALUATION_RUN_STATUS_UNSPECIFIED" | "EVALUATION_RUN_QUEUED" | "EVALUATION_RUN_RUNNING_DATASET" | 6 more` Evaluation Run Statuses
    - `"EVALUATION_RUN_STATUS_UNSPECIFIED"`
    - `"EVALUATION_RUN_QUEUED"`
    - `"EVALUATION_RUN_RUNNING_DATASET"`
    - `"EVALUATION_RUN_EVALUATING_RESULTS"`
    - `"EVALUATION_RUN_CANCELLING"`
    - `"EVALUATION_RUN_CANCELLED"`
    - `"EVALUATION_RUN_SUCCESSFUL"`
    - `"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"`
    - `"EVALUATION_RUN_FAILED"`
  - `test_case_description?: string` Test case description.
  - `test_case_name?: string` Test case name.
  - `test_case_uuid?: string` Test case UUID.
  - `test_case_version?: number` Test case version.
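`APIEvaluationMetricResult` carries its value in either `number_value` or `string_value`, and `metric_value_type` tells you which one to read. A small sketch of that dispatch; the local interface below is an assumption that mirrors the documented fields (prefer the SDK's exported type if available), and percentage values are assumed to arrive in `number_value`:

```typescript
// Local stand-in mirroring the documented APIEvaluationMetricResult fields (assumption).
interface MetricResult {
  metric_name?: string;
  metric_value_type?:
    | 'METRIC_VALUE_TYPE_UNSPECIFIED'
    | 'METRIC_VALUE_TYPE_NUMBER'
    | 'METRIC_VALUE_TYPE_STRING'
    | 'METRIC_VALUE_TYPE_PERCENTAGE';
  number_value?: number;
  string_value?: string;
  error_description?: string;
}

// Format a metric result according to its declared value type.
function formatMetricResult(result: MetricResult): string {
  if (result.error_description) {
    return `${result.metric_name}: error (${result.error_description})`;
  }
  switch (result.metric_value_type) {
    case 'METRIC_VALUE_TYPE_NUMBER':
      return `${result.metric_name}: ${result.number_value}`;
    case 'METRIC_VALUE_TYPE_PERCENTAGE':
      // Assumes percentage metrics are reported via number_value.
      return `${result.metric_name}: ${result.number_value}%`;
    case 'METRIC_VALUE_TYPE_STRING':
      return `${result.metric_name}: ${result.string_value}`;
    default:
      return `${result.metric_name}: (no value)`;
  }
}
```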