List Evaluation Runs

List Evaluation Runs by Test Case
agents.evaluation_test_cases.list_evaluation_runs(evaluation_test_case_uuid: str, **kwargs: EvaluationTestCaseListEvaluationRunsParams) -> EvaluationTestCaseListEvaluationRunsResponse
get/v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs

To list all evaluation runs by test case, send a GET request to /v2/gen-ai/evaluation_test_cases/{evaluation_test_case_uuid}/evaluation_runs.
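
Outside the SDK, the same listing can be fetched by issuing the GET request directly. The sketch below is a minimal illustration, not taken from this page: the base URL https://api.digitalocean.com, bearer-token authentication, the DIGITALOCEAN_TOKEN environment variable, and passing the optional version as a query parameter are all assumptions.

import os

import requests

# Assumed base URL and auth scheme; only the path comes from this reference.
test_case_uuid = "123e4567-e89b-12d3-a456-426614174000"
resp = requests.get(
    f"https://api.digitalocean.com/v2/gen-ai/evaluation_test_cases/{test_case_uuid}/evaluation_runs",
    headers={"Authorization": f"Bearer {os.environ['DIGITALOCEAN_TOKEN']}"},
    params={"evaluation_test_case_version": 1},  # optional; see Parameters below
)
resp.raise_for_status()
print(resp.json()["evaluation_runs"])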

Parameters
evaluation_test_case_uuid: str
evaluation_test_case_version: int
optional

Version of the test case.

Returns
EvaluationTestCaseListEvaluationRunsResponse: class
evaluation_runs: list
optional
Optional[List[APIEvaluationRun]]

List of evaluation runs.

agent_deleted: bool
optional

Whether the agent is deleted.

agent_name: str
optional

Agent name.

agent_uuid: str
optional

Agent UUID.

agent_version_hash: str
optional

Version hash.

agent_workspace_uuid: str
optional

Agent workspace UUID.

created_by_user_email: str
optional
created_by_user_id: str
optional
format: uint64
error_description: str
optional

The error description.

evaluation_run_uuid: str
optional

Evaluation run UUID.

evaluation_test_case_workspace_uuid: str
optional

Evaluation test case workspace UUID.

finished_at: datetime
optional

Run end time.

format: date-time
pass_status: bool
optional

The pass status of the evaluation run based on the star metric.

queued_at: datetime
optional

Run queued time.

format: date-time
run_level_metric_results: list
optional
Optional[List[APIEvaluationMetricResult]]
error_description: str
optional

Error description if the metric could not be calculated.

metric_name: str
optional

Metric name.

metric_value_type: literal
optional
Optional[Literal["METRIC_VALUE_TYPE_UNSPECIFIED", "METRIC_VALUE_TYPE_NUMBER", "METRIC_VALUE_TYPE_STRING", "METRIC_VALUE_TYPE_PERCENTAGE"]]
"METRIC_VALUE_TYPE_UNSPECIFIED"
"METRIC_VALUE_TYPE_NUMBER"
"METRIC_VALUE_TYPE_STRING"
"METRIC_VALUE_TYPE_PERCENTAGE"
number_value: float
optional

The value of the metric as a number.

format: double
reasoning: str
optional

Reasoning of the metric result.

string_value: str
optional

The value of the metric as a string.

run_name: str
optional

Run name.

star_metric_result: APIEvaluationMetricResult
optional
started_at: datetime
optional

Run start time.

format: date-time
status: literal
optional
Optional[Literal["EVALUATION_RUN_STATUS_UNSPECIFIED", "EVALUATION_RUN_QUEUED", "EVALUATION_RUN_RUNNING_DATASET", "EVALUATION_RUN_EVALUATING_RESULTS", "EVALUATION_RUN_CANCELLING", "EVALUATION_RUN_CANCELLED", "EVALUATION_RUN_SUCCESSFUL", "EVALUATION_RUN_PARTIALLY_SUCCESSFUL", "EVALUATION_RUN_FAILED"]]

Evaluation Run Statuses

"EVALUATION_RUN_STATUS_UNSPECIFIED"
"EVALUATION_RUN_QUEUED"
"EVALUATION_RUN_RUNNING_DATASET"
"EVALUATION_RUN_EVALUATING_RESULTS"
"EVALUATION_RUN_CANCELLING"
"EVALUATION_RUN_CANCELLED"
"EVALUATION_RUN_SUCCESSFUL"
"EVALUATION_RUN_PARTIALLY_SUCCESSFUL"
"EVALUATION_RUN_FAILED"
test_case_description: str
optional

Test case description.

test_case_name: str
optional

Test case name.

test_case_uuid: str
optional

Test case UUID.

test_case_version: int
optional

Test case version.

format: int64
from do_gradientai import GradientAI

client = GradientAI()

# List all evaluation runs for the given test case.
response = client.agents.evaluation_test_cases.list_evaluation_runs(
    evaluation_test_case_uuid="123e4567-e89b-12d3-a456-426614174000",
)
print(response.evaluation_runs)
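
The optional evaluation_test_case_version parameter documented above can be supplied as a keyword argument to scope the listing to a specific version of the test case; the version number below is illustrative:

response = client.agents.evaluation_test_cases.list_evaluation_runs(
    evaluation_test_case_uuid="123e4567-e89b-12d3-a456-426614174000",
    evaluation_test_case_version=1,  # illustrative version number
)
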
200 Example
{
  "evaluation_runs": [
    {
      "agent_deleted": true,
      "agent_name": "example name",
      "agent_uuid": "123e4567-e89b-12d3-a456-426614174000",
      "agent_version_hash": "example string",
      "agent_workspace_uuid": "123e4567-e89b-12d3-a456-426614174000",
      "created_by_user_email": "[email protected]",
      "created_by_user_id": "12345",
      "error_description": "example string",
      "evaluation_run_uuid": "123e4567-e89b-12d3-a456-426614174000",
      "evaluation_test_case_workspace_uuid": "123e4567-e89b-12d3-a456-426614174000",
      "finished_at": "2023-01-01T00:00:00Z",
      "pass_status": true,
      "queued_at": "2023-01-01T00:00:00Z",
      "run_level_metric_results": [
        {
          "error_description": "example string",
          "metric_name": "example name",
          "metric_value_type": "METRIC_VALUE_TYPE_UNSPECIFIED",
          "number_value": 123,
          "reasoning": "example string",
          "string_value": "example string"
        }
      ],
      "run_name": "example name",
      "star_metric_result": {
        "error_description": "example string",
        "metric_name": "example name",
        "metric_value_type": "METRIC_VALUE_TYPE_UNSPECIFIED",
        "number_value": 123,
        "reasoning": "example string",
        "string_value": "example string"
      },
      "started_at": "2023-01-01T00:00:00Z",
      "status": "EVALUATION_RUN_STATUS_UNSPECIFIED",
      "test_case_description": "example string",
      "test_case_name": "example name",
      "test_case_uuid": "123e4567-e89b-12d3-a456-426614174000",
      "test_case_version": 123
    }
  ]
}
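
A sketch of consuming the response with the fields documented above. Field names and enum values come from this reference; attribute-style access on the SDK response objects is an assumption.

# Summarize each run, treating the four end states from the status enum
# above as terminal.
TERMINAL_STATUSES = {
    "EVALUATION_RUN_CANCELLED",
    "EVALUATION_RUN_SUCCESSFUL",
    "EVALUATION_RUN_PARTIALLY_SUCCESSFUL",
    "EVALUATION_RUN_FAILED",
}

def metric_value(result):
    # Dispatch on metric_value_type per the literals documented above.
    # Treating PERCENTAGE as numeric is an assumption; this page only
    # documents number_value as "the value of the metric as a number".
    if result.metric_value_type == "METRIC_VALUE_TYPE_STRING":
        return result.string_value
    return result.number_value

for run in response.evaluation_runs or []:
    finished = run.status in TERMINAL_STATUSES
    print(f"{run.run_name}: status={run.status}, finished={finished}, pass={run.pass_status}")
    for result in run.run_level_metric_results or []:
        print(f"  {result.metric_name} = {metric_value(result)}")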