# Evaluation Runs

## Create

`agents.evaluation_runs.create(EvaluationRunCreateParams **kwargs) -> EvaluationRunCreateResponse`

**post** `/v2/gen-ai/evaluation_runs`

To run an evaluation test case, send a POST request to `/v2/gen-ai/evaluation_runs`.

### Parameters

- **agent\_uuids:** `List[str]` Agent UUIDs to run the test case against.
- **run\_name:** `str` The name of the run.
- **test\_case\_uuid:** `str` Test-case UUID to run.

### Returns

- `class EvaluationRunCreateResponse`
  - **evaluation\_run\_uuids:** `Optional[List[str]]`

### Example

```python
from gradient import Gradient

client = Gradient()

evaluation_run = client.agents.evaluation_runs.create()
print(evaluation_run.evaluation_run_uuids)
```

## Retrieve

`agents.evaluation_runs.retrieve(str evaluation_run_uuid) -> EvaluationRunRetrieveResponse`

**get** `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}`

To retrieve information about an existing evaluation run, send a GET request to `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}`.

### Parameters

- **evaluation\_run\_uuid:** `str`

### Returns

- `class EvaluationRunRetrieveResponse`
  - **evaluation\_run:** `Optional[APIEvaluationRun]`

### Example

```python
from gradient import Gradient

client = Gradient()

evaluation_run = client.agents.evaluation_runs.retrieve(
    "evaluation_run_uuid",
)
print(evaluation_run.evaluation_run)
```
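As a minimal sketch of how the Create and Retrieve calls fit together, the example below starts a run with the documented parameters and then fetches each returned run. The agent and test-case UUIDs and the run name are placeholders, not values from this reference.

```python
from gradient import Gradient

client = Gradient()

# Placeholder identifiers -- replace with real agent and test-case UUIDs.
agent_uuids = ["00000000-0000-0000-0000-000000000000"]
test_case_uuid = "11111111-1111-1111-1111-111111111111"

# Start an evaluation run using the parameters documented under "Create".
created = client.agents.evaluation_runs.create(
    agent_uuids=agent_uuids,
    run_name="nightly-regression",
    test_case_uuid=test_case_uuid,
)

# The response carries a list of run UUIDs; fetch the details of each one.
for run_uuid in created.evaluation_run_uuids or []:
    run = client.agents.evaluation_runs.retrieve(run_uuid).evaluation_run
    print(run_uuid, run)
```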
## List Results

`agents.evaluation_runs.list_results(str evaluation_run_uuid, EvaluationRunListResultsParams **kwargs) -> EvaluationRunListResultsResponse`

**get** `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results`

To retrieve results of an evaluation run, send a GET request to `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results`.

### Parameters

- **evaluation\_run\_uuid:** `str`
- **page:** `int` Page number.
- **per\_page:** `int` Items per page.

### Returns

- `class EvaluationRunListResultsResponse` Gets the full results of an evaluation run with all prompts.
  - **evaluation\_run:** `Optional[APIEvaluationRun]`
  - **links:** `Optional[APILinks]` Links to other pages.
  - **meta:** `Optional[APIMeta]` Meta information about the data set.
  - **prompts:** `Optional[List[APIEvaluationPrompt]]` The prompt-level results.
    - **ground\_truth:** `Optional[str]` The ground truth for the prompt.
    - **input:** `Optional[str]`
    - **input\_tokens:** `Optional[str]` The number of input tokens used in the prompt.
    - **output:** `Optional[str]`
    - **output\_tokens:** `Optional[str]` The number of output tokens used in the prompt.
    - **prompt\_chunks:** `Optional[List[PromptChunk]]` The list of prompt chunks.
      - **chunk\_usage\_pct:** `Optional[float]` The usage percentage of the chunk.
      - **chunk\_used:** `Optional[bool]` Indicates if the chunk was used in the prompt.
      - **index\_uuid:** `Optional[str]` The index UUID (Knowledge Base) of the chunk.
      - **source\_name:** `Optional[str]` The source name for the chunk, e.g., the file name or document title.
      - **text:** `Optional[str]` Text content of the chunk.
    - **prompt\_id:** `Optional[int]` Prompt ID.
    - **prompt\_level\_metric\_results:** `Optional[List[APIEvaluationMetricResult]]` The metric results for the prompt.
      - **error\_description:** `Optional[str]` Error description if the metric could not be calculated.
      - **metric\_name:** `Optional[str]` Metric name.
      - **metric\_value\_type:** `Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]`
        - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
        - `"METRIC_VALUE_TYPE_NUMBER"`
        - `"METRIC_VALUE_TYPE_STRING"`
        - `"METRIC_VALUE_TYPE_PERCENTAGE"`
      - **number\_value:** `Optional[float]` The value of the metric as a number.
      - **reasoning:** `Optional[str]` Reasoning of the metric result.
      - **string\_value:** `Optional[str]` The value of the metric as a string.

### Example

```python
from gradient import Gradient

client = Gradient()

response = client.agents.evaluation_runs.list_results(
    evaluation_run_uuid="123e4567-e89b-12d3-a456-426614174000",
)
print(response.evaluation_run)
```

## Retrieve Results

`agents.evaluation_runs.retrieve_results(int prompt_id, EvaluationRunRetrieveResultsParams **kwargs) -> EvaluationRunRetrieveResultsResponse`

**get** `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results/{prompt_id}`

To retrieve results of an evaluation run, send a GET request to `/v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results/{prompt_id}`.

### Parameters

- **evaluation\_run\_uuid:** `str`
- **prompt\_id:** `int`

### Returns

- `class EvaluationRunRetrieveResultsResponse`
  - **prompt:** `Optional[APIEvaluationPrompt]`

### Example

```python
from gradient import Gradient

client = Gradient()

response = client.agents.evaluation_runs.retrieve_results(
    prompt_id=1,
    evaluation_run_uuid="123e4567-e89b-12d3-a456-426614174000",
)
print(response.prompt)
```
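The two results endpoints can be combined: page through `list_results` for prompt-level summaries, then fetch one prompt in full with `retrieve_results`. The sketch below assumes that an empty `prompts` page means the listing is exhausted (the pagination details of `APIMeta` and `APILinks` are not spelled out in this reference), and the run UUID is a placeholder.

```python
from gradient import Gradient

client = Gradient()

RUN_UUID = "123e4567-e89b-12d3-a456-426614174000"  # placeholder

# Collect prompt-level results page by page. Stopping on an empty page is an
# assumption; APIMeta/APILinks may expose a total count or next-page link.
prompts = []
page = 1
while True:
    resp = client.agents.evaluation_runs.list_results(
        evaluation_run_uuid=RUN_UUID,
        page=page,
        per_page=50,
    )
    batch = resp.prompts or []
    if not batch:
        break
    prompts.extend(batch)
    page += 1

# Inspect one prompt in detail, including which retrieved chunks were used.
if prompts:
    detail = client.agents.evaluation_runs.retrieve_results(
        prompt_id=prompts[0].prompt_id,
        evaluation_run_uuid=RUN_UUID,
    )
    prompt = detail.prompt
    for chunk in prompt.prompt_chunks or []:
        print(chunk.source_name, chunk.chunk_used, chunk.chunk_usage_pct)
```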
## Domain Types

### API Evaluation Metric

- `class APIEvaluationMetric`
  - **description:** `Optional[str]`
  - **inverted:** `Optional[bool]` If true, the metric is inverted, meaning that a lower value is better.
  - **metric\_name:** `Optional[str]`
  - **metric\_type:** `Optional[Literal["METRIC_TYPE_UNSPECIFIED", "METRIC_TYPE_GENERAL_QUALITY", "METRIC_TYPE_RAG_AND_TOOL"]]`
    - `"METRIC_TYPE_UNSPECIFIED"`
    - `"METRIC_TYPE_GENERAL_QUALITY"`
    - `"METRIC_TYPE_RAG_AND_TOOL"`
  - **metric\_uuid:** `Optional[str]`
  - **metric\_value\_type:** `Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]`
    - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
    - `"METRIC_VALUE_TYPE_NUMBER"`
    - `"METRIC_VALUE_TYPE_STRING"`
    - `"METRIC_VALUE_TYPE_PERCENTAGE"`
  - **range\_max:** `Optional[float]` The maximum value for the metric.
  - **range\_min:** `Optional[float]` The minimum value for the metric.

### API Evaluation Metric Result

- `class APIEvaluationMetricResult`
  - **error\_description:** `Optional[str]` Error description if the metric could not be calculated.
  - **metric\_name:** `Optional[str]` Metric name.
  - **metric\_value\_type:** `Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]`
    - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
    - `"METRIC_VALUE_TYPE_NUMBER"`
    - `"METRIC_VALUE_TYPE_STRING"`
    - `"METRIC_VALUE_TYPE_PERCENTAGE"`
  - **number\_value:** `Optional[float]` The value of the metric as a number.
  - **reasoning:** `Optional[str]` Reasoning of the metric result.
  - **string\_value:** `Optional[str]` The value of the metric as a string.

### API Evaluation Prompt

- `class APIEvaluationPrompt`
  - **ground\_truth:** `Optional[str]` The ground truth for the prompt.
  - **input:** `Optional[str]`
  - **input\_tokens:** `Optional[str]` The number of input tokens used in the prompt.
  - **output:** `Optional[str]`
  - **output\_tokens:** `Optional[str]` The number of output tokens used in the prompt.
  - **prompt\_chunks:** `Optional[List[PromptChunk]]` The list of prompt chunks.
    - **chunk\_usage\_pct:** `Optional[float]` The usage percentage of the chunk.
    - **chunk\_used:** `Optional[bool]` Indicates if the chunk was used in the prompt.
    - **index\_uuid:** `Optional[str]` The index UUID (Knowledge Base) of the chunk.
    - **source\_name:** `Optional[str]` The source name for the chunk, e.g., the file name or document title.
    - **text:** `Optional[str]` Text content of the chunk.
  - **prompt\_id:** `Optional[int]` Prompt ID.
  - **prompt\_level\_metric\_results:** `Optional[List[APIEvaluationMetricResult]]` The metric results for the prompt.
    - **error\_description:** `Optional[str]` Error description if the metric could not be calculated.
    - **metric\_name:** `Optional[str]` Metric name.
    - **metric\_value\_type:** `Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]`
      - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
      - `"METRIC_VALUE_TYPE_NUMBER"`
      - `"METRIC_VALUE_TYPE_STRING"`
      - `"METRIC_VALUE_TYPE_PERCENTAGE"`
    - **number\_value:** `Optional[float]` The value of the metric as a number.
    - **reasoning:** `Optional[str]` Reasoning of the metric result.
    - **string\_value:** `Optional[str]` The value of the metric as a string.
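`APIEvaluationMetricResult` carries its value in either `number_value` or `string_value`, depending on `metric_value_type`. The helper below is one way to normalize that for display; treating percentage metrics as numbers held in `number_value` is an assumption inferred from the field descriptions above.

```python
def format_metric(result) -> str:
    """Render an APIEvaluationMetricResult-like object as a single line.

    Assumption: numeric and percentage metrics live in ``number_value``,
    string metrics in ``string_value``.
    """
    if result.error_description:
        return f"{result.metric_name}: error ({result.error_description})"
    if result.metric_value_type == "METRIC_VALUE_TYPE_NUMBER":
        return f"{result.metric_name}: {result.number_value}"
    if result.metric_value_type == "METRIC_VALUE_TYPE_PERCENTAGE":
        return f"{result.metric_name}: {result.number_value}%"
    if result.metric_value_type == "METRIC_VALUE_TYPE_STRING":
        return f"{result.metric_name}: {result.string_value}"
    return f"{result.metric_name}: (unspecified value type)"
```

The same helper applies to `prompt_level_metric_results`, `run_level_metric_results`, and `star_metric_result`, since all of them are `APIEvaluationMetricResult` values.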
### API Evaluation Run

- `class APIEvaluationRun`
  - **agent\_deleted:** `Optional[bool]` Whether the agent is deleted.
  - **agent\_name:** `Optional[str]` Agent name.
  - **agent\_uuid:** `Optional[str]` Agent UUID.
  - **agent\_version\_hash:** `Optional[str]` Version hash.
  - **agent\_workspace\_uuid:** `Optional[str]` Agent workspace UUID.
  - **created\_by\_user\_email:** `Optional[str]`
  - **created\_by\_user\_id:** `Optional[str]`
  - **error\_description:** `Optional[str]` The error description.
  - **evaluation\_run\_uuid:** `Optional[str]` Evaluation run UUID.
  - **evaluation\_test\_case\_workspace\_uuid:** `Optional[str]` Evaluation test case workspace UUID.
  - **finished\_at:** `Optional[datetime]` Run end time.
  - **pass\_status:** `Optional[bool]` The pass status of the evaluation run based on the star metric.
  - **queued\_at:** `Optional[datetime]` Run queued time.
  - **run\_level\_metric\_results:** `Optional[List[APIEvaluationMetricResult]]`
    - **error\_description:** `Optional[str]` Error description if the metric could not be calculated.
    - **metric\_name:** `Optional[str]` Metric name.
    - **metric\_value\_type:** `Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]`
      - `"METRIC_VALUE_TYPE_UNSPECIFIED"`
      - `"METRIC_VALUE_TYPE_NUMBER"`
      - `"METRIC_VALUE_TYPE_STRING"`
      - `"METRIC_VALUE_TYPE_PERCENTAGE"`
    - **number\_value:** `Optional[float]` The value of the metric as a number.
    - **reasoning:** `Optional[str]` Reasoning of the metric result.
    - **string\_value:** `Optional[str]` The value of the metric as a string.
  - **run\_name:** `Optional[str]` Run name.
  - **star\_metric\_result:** `Optional[APIEvaluationMetricResult]`
  - **started\_at:** `Optional[datetime]` Run start time.
  - **status:** `Optional[Literal["EVALUATION_RUN_STATUS_UNSPECIFIED", "EVALUATION_RUN_QUEUED", "EVALUATION_RUN_RUNNING_DATASET", 6 more]]` Evaluation Run Statuses.
    - `"EVALUATION_RUN_STATUS_UNSPECIFIED"`
    - `"EVALUATION_RUN_QUEUED"`
    - `"EVALUATION_RUN_RUNNING_DATASET"`
    - `"EVALUATION_RUN_EVALUATING_RESULTS"`
    - `"EVALUATION_RUN_CANCELLING"`
    - `"EVALUATION_RUN_CANCELLED"`
    - `"EVALUATION_RUN_SUCCESSFUL"`
    - `"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"`
    - `"EVALUATION_RUN_FAILED"`
  - **test\_case\_description:** `Optional[str]` Test case description.
  - **test\_case\_name:** `Optional[str]` Test case name.
  - **test\_case\_uuid:** `Optional[str]` Test-case UUID.
  - **test\_case\_version:** `Optional[int]` Test-case version.
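Given the queued/running statuses above, one pattern is to poll `retrieve` until the run settles and then read `pass_status` and the star metric. In the sketch below, which statuses count as terminal is inferred from their names; the run UUID and poll interval are placeholders.

```python
import time

from gradient import Gradient

client = Gradient()

RUN_UUID = "123e4567-e89b-12d3-a456-426614174000"  # placeholder

# Statuses taken from the APIEvaluationRun literal above; treating these four
# as terminal is an assumption based on their names.
TERMINAL = {
    "EVALUATION_RUN_CANCELLED",
    "EVALUATION_RUN_SUCCESSFUL",
    "EVALUATION_RUN_PARTIALLY_SUCCESSFUL",
    "EVALUATION_RUN_FAILED",
}

while True:
    run = client.agents.evaluation_runs.retrieve(RUN_UUID).evaluation_run
    if run is None or run.status in TERMINAL:
        break
    time.sleep(10)  # arbitrary poll interval

if run is not None:
    print("status:", run.status)
    print("passed:", run.pass_status)
    if run.star_metric_result is not None:
        print(
            "star metric:",
            run.star_metric_result.metric_name,
            run.star_metric_result.number_value,
        )
```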