# Evaluation Runs

## Create

**post** `/v2/gen-ai/evaluation_runs`

To run an evaluation test case, send a POST request to `/v2/gen-ai/evaluation_runs`.

### Returns

- **evaluation\_run\_uuids:** `array of string`

## Retrieve

**get** `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}`

To retrieve information about an existing evaluation run, send a GET request to `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}`.

### Returns

- **evaluation\_run:** `APIEvaluationRun`

## List Results

**get** `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results`

To retrieve the results of an evaluation run, send a GET request to `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results`.

### Returns

- **evaluation\_run:** `APIEvaluationRun`
- **links:** `APILinks` Links to other pages.
- **meta:** `APIMeta` Meta information about the data set.
- **prompts:** `array of APIEvaluationPrompt` The prompt-level results.

## Retrieve Results

**get** `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results/{prompt_id}`

To retrieve the results for a single prompt in an evaluation run, send a GET request to `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results/{prompt_id}`.

### Returns

- **prompt:** `APIEvaluationPrompt`
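Together, the four endpoints above form a typical workflow: create a run, poll its status, then fetch prompt-level results. The following is a minimal Python sketch of that workflow using `requests`. The API host, the bearer-token header, and the request-body fields passed to the create call (`test_case_uuid`, `agent_uuids`, `run_name`) are illustrative assumptions and are not documented in this section; only the paths and response fields come from the reference above.

```python
"""Minimal sketch of driving the evaluation-run endpoints with requests.

Assumptions (not part of the reference above): the API host, the bearer
token in DIGITALOCEAN_TOKEN, and the create request-body fields.
"""
import os
import time

import requests

API = "https://api.digitalocean.com/v2/gen-ai"
HEADERS = {"Authorization": f"Bearer {os.environ['DIGITALOCEAN_TOKEN']}"}


def create_run(payload: dict) -> list[str]:
    # POST /v2/gen-ai/evaluation_runs -> {"evaluation_run_uuids": [...]}
    resp = requests.post(f"{API}/evaluation_runs", json=payload, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["evaluation_run_uuids"]


def get_run(run_uuid: str) -> dict:
    # GET /v2/gen-ai/evaluation_runs/{uuid} -> {"evaluation_run": {...}}
    resp = requests.get(f"{API}/evaluation_runs/{run_uuid}", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["evaluation_run"]


def list_results(run_uuid: str) -> dict:
    # GET .../results -> {"evaluation_run", "prompts", "links", "meta"}
    resp = requests.get(f"{API}/evaluation_runs/{run_uuid}/results", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()


def get_prompt_result(run_uuid: str, prompt_id: int) -> dict:
    # GET .../results/{prompt_id} -> {"prompt": {...}}
    resp = requests.get(
        f"{API}/evaluation_runs/{run_uuid}/results/{prompt_id}", headers=HEADERS
    )
    resp.raise_for_status()
    return resp.json()["prompt"]


if __name__ == "__main__":
    # Request-body fields here are hypothetical placeholders.
    run_uuid = create_run(
        {"test_case_uuid": "...", "agent_uuids": ["..."], "run_name": "nightly"}
    )[0]

    # Poll until the run reaches a terminal status.
    TERMINAL = {
        "EVALUATION_RUN_SUCCESSFUL",
        "EVALUATION_RUN_PARTIALLY_SUCCESSFUL",
        "EVALUATION_RUN_FAILED",
        "EVALUATION_RUN_CANCELLED",
    }
    while get_run(run_uuid).get("status") not in TERMINAL:
        time.sleep(10)

    for prompt in list_results(run_uuid).get("prompts", []):
        print(prompt["prompt_id"], prompt.get("prompt_level_metric_results"))
```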
## Domain Types

### API Evaluation Metric

- **APIEvaluationMetric:** `object { description, inverted, metric_name, 5 more }`
  - **description:** `string`
  - **inverted:** `boolean` If true, the metric is inverted, meaning that a lower value is better.
  - **metric\_name:** `string`
  - **metric\_type:** `"METRIC_TYPE_UNSPECIFIED" or "METRIC_TYPE_GENERAL_QUALITY" or "METRIC_TYPE_RAG_AND_TOOL"`
    - `"METRIC_TYPE_UNSPECIFIED"`
    - `"METRIC_TYPE_GENERAL_QUALITY"`
    - `"METRIC_TYPE_RAG_AND_TOOL"`
  - **metric\_uuid:** `string`
  - **metric\_value\_type:** `"METRIC_VALUE_TYPE_UNSPECIFIED" or "METRIC_VALUE_TYPE_NUMBER" or "METRIC_VALUE_TYPE_STRING" or "METRIC_VALUE_TYPE_PERCENTAGE"`
    - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
    - `"METRIC_VALUE_TYPE_NUMBER"`
    - `"METRIC_VALUE_TYPE_STRING"`
    - `"METRIC_VALUE_TYPE_PERCENTAGE"`
  - **range\_max:** `number` The maximum value for the metric.
  - **range\_min:** `number` The minimum value for the metric.

### API Evaluation Metric Result

- **APIEvaluationMetricResult:** `object { error_description, metric_name, metric_value_type, 3 more }`
  - **error\_description:** `string` Error description if the metric could not be calculated.
  - **metric\_name:** `string` Metric name.
  - **metric\_value\_type:** `"METRIC_VALUE_TYPE_UNSPECIFIED" or "METRIC_VALUE_TYPE_NUMBER" or "METRIC_VALUE_TYPE_STRING" or "METRIC_VALUE_TYPE_PERCENTAGE"`
    - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
    - `"METRIC_VALUE_TYPE_NUMBER"`
    - `"METRIC_VALUE_TYPE_STRING"`
    - `"METRIC_VALUE_TYPE_PERCENTAGE"`
  - **number\_value:** `number` The value of the metric as a number.
  - **reasoning:** `string` Reasoning for the metric result.
  - **string\_value:** `string` The value of the metric as a string.

### API Evaluation Prompt

- **APIEvaluationPrompt:** `object { ground_truth, input, input_tokens, 5 more }`
  - **ground\_truth:** `string` The ground truth for the prompt.
  - **input:** `string`
  - **input\_tokens:** `string` The number of input tokens used in the prompt.
  - **output:** `string`
  - **output\_tokens:** `string` The number of output tokens used in the prompt.
  - **prompt\_chunks:** `array of object { chunk_usage_pct, chunk_used, index_uuid, 2 more }` The list of prompt chunks.
    - **chunk\_usage\_pct:** `number` The usage percentage of the chunk.
    - **chunk\_used:** `boolean` Indicates if the chunk was used in the prompt.
    - **index\_uuid:** `string` The index UUID (knowledge base) of the chunk.
    - **source\_name:** `string` The source name for the chunk, e.g., the file name or document title.
    - **text:** `string` Text content of the chunk.
  - **prompt\_id:** `number` Prompt ID.
  - **prompt\_level\_metric\_results:** `array of APIEvaluationMetricResult` The metric results for the prompt.

### API Evaluation Run

- **APIEvaluationRun:** `object { agent_deleted, agent_name, agent_uuid, 19 more }`
  - **agent\_deleted:** `boolean` Whether the agent has been deleted.
  - **agent\_name:** `string` Agent name.
  - **agent\_uuid:** `string` Agent UUID.
  - **agent\_version\_hash:** `string` Agent version hash.
  - **agent\_workspace\_uuid:** `string` Agent workspace UUID.
  - **created\_by\_user\_email:** `string`
  - **created\_by\_user\_id:** `string`
  - **error\_description:** `string` The error description.
  - **evaluation\_run\_uuid:** `string` Evaluation run UUID.
  - **evaluation\_test\_case\_workspace\_uuid:** `string` Evaluation test case workspace UUID.
  - **finished\_at:** `string` Run end time.
  - **pass\_status:** `boolean` The pass status of the evaluation run based on the star metric.
  - **queued\_at:** `string` Run queued time.
  - **run\_level\_metric\_results:** `array of APIEvaluationMetricResult`
  - **run\_name:** `string` Run name.
  - **star\_metric\_result:** `APIEvaluationMetricResult`
  - **started\_at:** `string` Run start time.
  - **status:** `"EVALUATION_RUN_STATUS_UNSPECIFIED" or "EVALUATION_RUN_QUEUED" or "EVALUATION_RUN_RUNNING_DATASET" or 6 more` Evaluation run statuses.
    - `"EVALUATION_RUN_STATUS_UNSPECIFIED"`
    - `"EVALUATION_RUN_QUEUED"`
    - `"EVALUATION_RUN_RUNNING_DATASET"`
    - `"EVALUATION_RUN_EVALUATING_RESULTS"`
    - `"EVALUATION_RUN_CANCELLING"`
    - `"EVALUATION_RUN_CANCELLED"`
    - `"EVALUATION_RUN_SUCCESSFUL"`
    - `"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"`
    - `"EVALUATION_RUN_FAILED"`
  - **test\_case\_description:** `string` Test case description.
  - **test\_case\_name:** `string` Test case name.
  - **test\_case\_uuid:** `string` Test case UUID.
  - **test\_case\_version:** `number` Test case version.
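As a worked example of how the `APIEvaluationMetricResult` fields fit together, the sketch below picks the value field that matches `metric_value_type` and then summarizes a run's star metric and prompt-level results. It is an illustrative reading of the schemas above, not an official helper; in particular, treating `METRIC_VALUE_TYPE_PERCENTAGE` as a `number_value` is an assumption.

```python
def metric_value(result: dict):
    """Return the usable value from an APIEvaluationMetricResult dict, or None.

    NUMBER (and, by assumption, PERCENTAGE) metrics carry number_value;
    STRING metrics carry string_value; error_description is set when the
    metric could not be calculated.
    """
    if result.get("error_description"):
        return None  # metric could not be calculated
    value_type = result.get("metric_value_type", "METRIC_VALUE_TYPE_UNSPECIFIED")
    if value_type in ("METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_PERCENTAGE"):
        return result.get("number_value")
    if value_type == "METRIC_VALUE_TYPE_STRING":
        return result.get("string_value")
    return None  # METRIC_VALUE_TYPE_UNSPECIFIED or unknown


def summarize(run: dict, prompts: list[dict]) -> None:
    """Print the star metric for a run plus per-prompt metric results.

    `run` is an APIEvaluationRun dict (Retrieve endpoint) and `prompts` is
    the `prompts` array returned by the List Results endpoint.
    """
    star = run.get("star_metric_result") or {}
    print(f"run {run.get('run_name')}: pass_status={run.get('pass_status')}, "
          f"{star.get('metric_name')}={metric_value(star)}")
    for prompt in prompts:
        for result in prompt.get("prompt_level_metric_results", []):
            print(f"  prompt {prompt.get('prompt_id')}: "
                  f"{result.get('metric_name')}={metric_value(result)}")
```

Remember that a metric's meaning also depends on the corresponding `APIEvaluationMetric` definition: when `inverted` is true, a lower value is better, and `range_min`/`range_max` bound the expected values.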