
Retrieve Results of an Evaluation Run

client.Agents.EvaluationRuns.ListResults(ctx, evaluationRunUuid, query) (*AgentEvaluationRunListResultsResponse, error)

GET /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results

To retrieve results of an evaluation run, send a GET request to /v2/gen-ai/evaluation_runs/{evaluation_run_uuid}/results.
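If you are calling the endpoint directly rather than through the SDK, the request can be built as in the sketch below. The `api.digitalocean.com` host and the Bearer auth header are assumptions for illustration, not taken from this page; the Go SDK shown later handles both for you.

```go
package main

import (
	"fmt"
	"net/http"
)

// buildResultsRequest builds the GET request for an evaluation run's results.
// Host and auth scheme are assumptions; prefer the SDK for real use.
func buildResultsRequest(token, runUUID string) (*http.Request, error) {
	url := fmt.Sprintf("https://api.digitalocean.com/v2/gen-ai/evaluation_runs/%s/results", runUUID)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	req, err := buildResultsRequest("my-token", "123e4567-e89b-12d3-a456-426614174000")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
}
```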

Parameters

evaluationRunUuid string

query AgentEvaluationRunListResultsParams

Page param.Field[int64] (optional)

Page number.

PerPage param.Field[int64] (optional)

Items per page.
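The SDK's `param.Field` values end up as query-string parameters. A minimal sketch of the encoding, assuming the wire names are `page` and `per_page` (an assumption here; the SDK sets the real names for you):

```go
package main

import (
	"fmt"
	"net/url"
)

// resultsQuery encodes the optional Page and PerPage parameters as a
// query string. Wire names "page" and "per_page" are assumptions.
func resultsQuery(page, perPage int64) string {
	v := url.Values{}
	if page > 0 {
		v.Set("page", fmt.Sprint(page))
	}
	if perPage > 0 {
		v.Set("per_page", fmt.Sprint(perPage))
	}
	return v.Encode()
}

func main() {
	fmt.Println(resultsQuery(2, 50))
}
```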

Returns

type AgentEvaluationRunListResultsResponse struct{…}

Gets the full results of an evaluation run with all prompts.

EvaluationRun APIEvaluationRun (optional)

AgentDeleted bool (optional)

Whether the agent is deleted.

AgentDeploymentName string (optional)

The agent deployment name.

AgentName string (optional)

Agent name.

AgentUuid string (optional)

Agent UUID.

AgentVersionHash string (optional)

Agent version hash.

AgentWorkspaceUuid string (optional)

Agent workspace UUID.

CreatedByUserEmail string (optional)

CreatedByUserID string (optional)

format: uint64
ErrorDescription string (optional)

The error description.

EvaluationRunUuid string (optional)

Evaluation run UUID.

EvaluationTestCaseWorkspaceUuid string (optional)

Evaluation test case workspace UUID.

FinishedAt Time (optional)

Run end time.

format: date-time

PassStatus bool (optional)

The pass status of the evaluation run based on the star metric.

QueuedAt Time (optional)

Run queued time.

format: date-time

RunLevelMetricResults []APIEvaluationMetricResult (optional)
ErrorDescription string (optional)

Error description if the metric could not be calculated.

MetricName string (optional)

Metric name.

MetricValueType APIEvaluationMetricResultMetricValueType (optional)

Accepts one of the following:

const APIEvaluationMetricResultMetricValueTypeMetricValueTypeUnspecified APIEvaluationMetricResultMetricValueType = "METRIC_VALUE_TYPE_UNSPECIFIED"
const APIEvaluationMetricResultMetricValueTypeMetricValueTypeNumber APIEvaluationMetricResultMetricValueType = "METRIC_VALUE_TYPE_NUMBER"
const APIEvaluationMetricResultMetricValueTypeMetricValueTypeString APIEvaluationMetricResultMetricValueType = "METRIC_VALUE_TYPE_STRING"
const APIEvaluationMetricResultMetricValueTypeMetricValueTypePercentage APIEvaluationMetricResultMetricValueType = "METRIC_VALUE_TYPE_PERCENTAGE"

NumberValue float64 (optional)

The value of the metric as a number.

format: double

Reasoning string (optional)

Reasoning of the metric result.

StringValue string (optional)

The value of the metric as a string.
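Because a metric result carries both NumberValue and StringValue, the declared MetricValueType tells you which field to read. A minimal sketch, using a local struct that only mirrors the fields documented above (whether percentage values are on a 0–1 or 0–100 scale is not stated here, so the formatting below is an assumption):

```go
package main

import "fmt"

// metricResult is a minimal local mirror of APIEvaluationMetricResult,
// for illustration only.
type metricResult struct {
	MetricName  string
	ValueType   string // one of the METRIC_VALUE_TYPE_* constants
	NumberValue float64
	StringValue string
}

// displayValue picks the right field based on the declared value type.
func displayValue(m metricResult) string {
	switch m.ValueType {
	case "METRIC_VALUE_TYPE_NUMBER":
		return fmt.Sprintf("%g", m.NumberValue)
	case "METRIC_VALUE_TYPE_PERCENTAGE":
		return fmt.Sprintf("%g%%", m.NumberValue)
	case "METRIC_VALUE_TYPE_STRING":
		return m.StringValue
	default: // METRIC_VALUE_TYPE_UNSPECIFIED or unknown
		return "n/a"
	}
}

func main() {
	m := metricResult{MetricName: "accuracy", ValueType: "METRIC_VALUE_TYPE_PERCENTAGE", NumberValue: 87.5}
	fmt.Println(m.MetricName, displayValue(m))
}
```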

RunName string (optional)

Run name.

StarMetricResult APIEvaluationMetricResult (optional)

ErrorDescription string (optional)

Error description if the metric could not be calculated.

MetricName string (optional)

Metric name.

MetricValueType APIEvaluationMetricResultMetricValueType (optional)

Accepts one of the following:

const APIEvaluationMetricResultMetricValueTypeMetricValueTypeUnspecified APIEvaluationMetricResultMetricValueType = "METRIC_VALUE_TYPE_UNSPECIFIED"
const APIEvaluationMetricResultMetricValueTypeMetricValueTypeNumber APIEvaluationMetricResultMetricValueType = "METRIC_VALUE_TYPE_NUMBER"
const APIEvaluationMetricResultMetricValueTypeMetricValueTypeString APIEvaluationMetricResultMetricValueType = "METRIC_VALUE_TYPE_STRING"
const APIEvaluationMetricResultMetricValueTypeMetricValueTypePercentage APIEvaluationMetricResultMetricValueType = "METRIC_VALUE_TYPE_PERCENTAGE"

NumberValue float64 (optional)

The value of the metric as a number.

format: double

Reasoning string (optional)

Reasoning of the metric result.

StringValue string (optional)

The value of the metric as a string.

StartedAt Time (optional)

Run start time.

format: date-time
Status APIEvaluationRunStatus (optional)

Evaluation run statuses.

Accepts one of the following:

const APIEvaluationRunStatusEvaluationRunStatusUnspecified APIEvaluationRunStatus = "EVALUATION_RUN_STATUS_UNSPECIFIED"
const APIEvaluationRunStatusEvaluationRunQueued APIEvaluationRunStatus = "EVALUATION_RUN_QUEUED"
const APIEvaluationRunStatusEvaluationRunRunningDataset APIEvaluationRunStatus = "EVALUATION_RUN_RUNNING_DATASET"
const APIEvaluationRunStatusEvaluationRunEvaluatingResults APIEvaluationRunStatus = "EVALUATION_RUN_EVALUATING_RESULTS"
const APIEvaluationRunStatusEvaluationRunCancelling APIEvaluationRunStatus = "EVALUATION_RUN_CANCELLING"
const APIEvaluationRunStatusEvaluationRunCancelled APIEvaluationRunStatus = "EVALUATION_RUN_CANCELLED"
const APIEvaluationRunStatusEvaluationRunSuccessful APIEvaluationRunStatus = "EVALUATION_RUN_SUCCESSFUL"
const APIEvaluationRunStatusEvaluationRunPartiallySuccessful APIEvaluationRunStatus = "EVALUATION_RUN_PARTIALLY_SUCCESSFUL"
const APIEvaluationRunStatusEvaluationRunFailed APIEvaluationRunStatus = "EVALUATION_RUN_FAILED"

TestCaseDescription string (optional)

Test case description.

TestCaseName string (optional)

Test case name.

TestCaseUuid string (optional)

Test case UUID.

TestCaseVersion int64 (optional)

Test case version.

format: int64
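When polling for results, it helps to distinguish terminal statuses from in-progress ones. A sketch; the grouping below is an assumption drawn only from the status names above, not stated by the API:

```go
package main

import "fmt"

// isTerminal reports whether an evaluation run status looks final, i.e.
// the run has finished (successfully or not) and results should no
// longer change. Grouping is inferred from the status names.
func isTerminal(status string) bool {
	switch status {
	case "EVALUATION_RUN_CANCELLED",
		"EVALUATION_RUN_SUCCESSFUL",
		"EVALUATION_RUN_PARTIALLY_SUCCESSFUL",
		"EVALUATION_RUN_FAILED":
		return true
	}
	return false
}

func main() {
	fmt.Println(isTerminal("EVALUATION_RUN_QUEUED"), isTerminal("EVALUATION_RUN_FAILED"))
}
```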
Meta APIMeta (optional)

Meta information about the data set.

Page int64 (optional)

The current page.

format: int64

Pages int64 (optional)

Total number of pages.

format: int64

Total int64 (optional)

Total number of items across all pages.

format: int64
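The Meta fields are enough to drive pagination: request the next page while the current page number is below the total page count. A minimal sketch:

```go
package main

import "fmt"

// hasNextPage derives whether another page of results exists from the
// Page and Pages fields returned in Meta.
func hasNextPage(page, pages int64) bool {
	return pages > 0 && page < pages
}

func main() {
	fmt.Println(hasNextPage(1, 3), hasNextPage(3, 3))
}
```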
Prompts []APIEvaluationPrompt (optional)

The prompt-level results.

EvaluationTraceSpans []APIEvaluationPromptEvaluationTraceSpan (optional)

The evaluated trace spans.

CreatedAt Time (optional)

When the span was created.

format: date-time

Input unknown (optional)

Input data for the span (flexible structure; can be a messages array, a string, etc.).

Name string (optional)

Name/identifier for the span.

Output unknown (optional)

Output data from the span (flexible structure; can be a message, a string, etc.).

RetrieverChunks []APIEvaluationPromptEvaluationTraceSpansRetrieverChunk (optional)

Any retriever span chunks that were included as part of the span.

ChunkUsagePct float64 (optional)

The usage percentage of the chunk.

format: double

ChunkUsed bool (optional)

Indicates whether the chunk was used in the prompt.

IndexUuid string (optional)

The index UUID (knowledge base) of the chunk.

SourceName string (optional)

The source name for the chunk, e.g., the file name or document title.

Text string (optional)

Text content of the chunk.
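The per-chunk ChunkUsed flag makes it easy to summarize how much of the retrieved context a prompt actually used. A sketch over a local struct that mirrors only the chunk fields documented above:

```go
package main

import "fmt"

// chunk is a minimal local mirror of a retriever/prompt chunk,
// for illustration only.
type chunk struct {
	ChunkUsed     bool
	ChunkUsagePct float64
}

// usedFraction reports what share of the retrieved chunks were
// actually used in the prompt.
func usedFraction(chunks []chunk) float64 {
	if len(chunks) == 0 {
		return 0
	}
	used := 0
	for _, c := range chunks {
		if c.ChunkUsed {
			used++
		}
	}
	return float64(used) / float64(len(chunks))
}

func main() {
	fmt.Println(usedFraction([]chunk{{ChunkUsed: true}, {ChunkUsed: false}}))
}
```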

SpanLevelMetricResults []APIEvaluationMetricResult (optional)

The span-level metric results.

ErrorDescription string (optional)

Error description if the metric could not be calculated.

MetricName string (optional)

Metric name.

MetricValueType APIEvaluationMetricResultMetricValueType (optional)

Accepts one of the following:

const APIEvaluationMetricResultMetricValueTypeMetricValueTypeUnspecified APIEvaluationMetricResultMetricValueType = "METRIC_VALUE_TYPE_UNSPECIFIED"
const APIEvaluationMetricResultMetricValueTypeMetricValueTypeNumber APIEvaluationMetricResultMetricValueType = "METRIC_VALUE_TYPE_NUMBER"
const APIEvaluationMetricResultMetricValueTypeMetricValueTypeString APIEvaluationMetricResultMetricValueType = "METRIC_VALUE_TYPE_STRING"
const APIEvaluationMetricResultMetricValueTypeMetricValueTypePercentage APIEvaluationMetricResultMetricValueType = "METRIC_VALUE_TYPE_PERCENTAGE"

NumberValue float64 (optional)

The value of the metric as a number.

format: double

Reasoning string (optional)

Reasoning of the metric result.

StringValue string (optional)

The value of the metric as a string.

Type APIEvaluationPromptEvaluationTraceSpansType (optional)

Types of spans in a trace.

Accepts one of the following:

const APIEvaluationPromptEvaluationTraceSpansTypeTraceSpanTypeUnknown APIEvaluationPromptEvaluationTraceSpansType = "TRACE_SPAN_TYPE_UNKNOWN"
const APIEvaluationPromptEvaluationTraceSpansTypeTraceSpanTypeLlm APIEvaluationPromptEvaluationTraceSpansType = "TRACE_SPAN_TYPE_LLM"
const APIEvaluationPromptEvaluationTraceSpansTypeTraceSpanTypeRetriever APIEvaluationPromptEvaluationTraceSpansType = "TRACE_SPAN_TYPE_RETRIEVER"
const APIEvaluationPromptEvaluationTraceSpansTypeTraceSpanTypeTool APIEvaluationPromptEvaluationTraceSpansType = "TRACE_SPAN_TYPE_TOOL"
GroundTruth string (optional)

The ground truth for the prompt.

Input string (optional)

InputTokens string (optional)

The number of input tokens used in the prompt.

format: uint64

Output string (optional)

OutputTokens string (optional)

The number of output tokens used in the prompt.

format: uint64

PromptChunks []APIEvaluationPromptPromptChunk (optional)

The list of prompt chunks.

ChunkUsagePct float64 (optional)

The usage percentage of the chunk.

format: double

ChunkUsed bool (optional)

Indicates whether the chunk was used in the prompt.

IndexUuid string (optional)

The index UUID (knowledge base) of the chunk.

SourceName string (optional)

The source name for the chunk, e.g., the file name or document title.

Text string (optional)

Text content of the chunk.

PromptID int64 (optional)

Prompt ID.

format: int64
PromptLevelMetricResults []APIEvaluationMetricResult (optional)

The metric results for the prompt.

ErrorDescription string (optional)

Error description if the metric could not be calculated.

MetricName string (optional)

Metric name.

MetricValueType APIEvaluationMetricResultMetricValueType (optional)

Accepts one of the following:

const APIEvaluationMetricResultMetricValueTypeMetricValueTypeUnspecified APIEvaluationMetricResultMetricValueType = "METRIC_VALUE_TYPE_UNSPECIFIED"
const APIEvaluationMetricResultMetricValueTypeMetricValueTypeNumber APIEvaluationMetricResultMetricValueType = "METRIC_VALUE_TYPE_NUMBER"
const APIEvaluationMetricResultMetricValueTypeMetricValueTypeString APIEvaluationMetricResultMetricValueType = "METRIC_VALUE_TYPE_STRING"
const APIEvaluationMetricResultMetricValueTypeMetricValueTypePercentage APIEvaluationMetricResultMetricValueType = "METRIC_VALUE_TYPE_PERCENTAGE"

NumberValue float64 (optional)

The value of the metric as a number.

format: double

Reasoning string (optional)

Reasoning of the metric result.

StringValue string (optional)

The value of the metric as a string.

TraceID string (optional)

The trace ID for the prompt.

package main

import (
  "context"
  "fmt"

  "github.com/stainless-sdks/-go"
  "github.com/stainless-sdks/-go/option"
)

func main() {
  client := gradient.NewClient(
    option.WithAccessToken("My Access Token"),
  )
  response, err := client.Agents.EvaluationRuns.ListResults(
    context.TODO(),
    "123e4567-e89b-12d3-a456-426614174000",
    gradient.AgentEvaluationRunListResultsParams{},
  )
  if err != nil {
    panic(err.Error())
  }
  fmt.Printf("%+v\n", response.EvaluationRun)
}
{
  "evaluation_run": {
    "agent_deleted": true,
    "agent_deployment_name": "example name",
    "agent_name": "example name",
    "agent_uuid": "123e4567-e89b-12d3-a456-426614174000",
    "agent_version_hash": "example string",
    "agent_workspace_uuid": "123e4567-e89b-12d3-a456-426614174000",
    "created_by_user_email": "example@example.com",
    "created_by_user_id": "12345",
    "error_description": "example string",
    "evaluation_run_uuid": "123e4567-e89b-12d3-a456-426614174000",
    "evaluation_test_case_workspace_uuid": "123e4567-e89b-12d3-a456-426614174000",
    "finished_at": "2023-01-01T00:00:00Z",
    "pass_status": true,
    "queued_at": "2023-01-01T00:00:00Z",
    "run_level_metric_results": [
      {
        "error_description": "example string",
        "metric_name": "example name",
        "metric_value_type": "METRIC_VALUE_TYPE_UNSPECIFIED",
        "number_value": 123,
        "reasoning": "example string",
        "string_value": "example string"
      }
    ],
    "run_name": "example name",
    "star_metric_result": {
      "error_description": "example string",
      "metric_name": "example name",
      "metric_value_type": "METRIC_VALUE_TYPE_UNSPECIFIED",
      "number_value": 123,
      "reasoning": "example string",
      "string_value": "example string"
    },
    "started_at": "2023-01-01T00:00:00Z",
    "status": "EVALUATION_RUN_STATUS_UNSPECIFIED",
    "test_case_description": "example string",
    "test_case_name": "example name",
    "test_case_uuid": "123e4567-e89b-12d3-a456-426614174000",
    "test_case_version": 123
  },
  "links": {
    "pages": {
      "first": "example string",
      "last": "example string",
      "next": "example string",
      "previous": "example string"
    }
  },
  "meta": {
    "page": 123,
    "pages": 123,
    "total": 123
  },
  "prompts": [
    {
      "evaluation_trace_spans": [
        {
          "created_at": "2023-01-01T00:00:00Z",
          "input": {},
          "name": "example name",
          "output": {},
          "retriever_chunks": [
            {
              "chunk_usage_pct": 123,
              "chunk_used": true,
              "index_uuid": "123e4567-e89b-12d3-a456-426614174000",
              "source_name": "example name",
              "text": "example string"
            }
          ],
          "span_level_metric_results": [
            {
              "error_description": "example string",
              "metric_name": "example name",
              "metric_value_type": "METRIC_VALUE_TYPE_UNSPECIFIED",
              "number_value": 123,
              "reasoning": "example string",
              "string_value": "example string"
            }
          ],
          "type": "TRACE_SPAN_TYPE_UNKNOWN"
        }
      ],
      "ground_truth": "example string",
      "input": "example string",
      "input_tokens": "12345",
      "output": "example string",
      "output_tokens": "12345",
      "prompt_chunks": [
        {
          "chunk_usage_pct": 123,
          "chunk_used": true,
          "index_uuid": "123e4567-e89b-12d3-a456-426614174000",
          "source_name": "example name",
          "text": "example string"
        }
      ],
      "prompt_id": 123,
      "prompt_level_metric_results": [
        {
          "error_description": "example string",
          "metric_name": "example name",
          "metric_value_type": "METRIC_VALUE_TYPE_UNSPECIFIED",
          "number_value": 123,
          "reasoning": "example string",
          "string_value": "example string"
        }
      ],
      "trace_id": "123e4567-e89b-12d3-a456-426614174000"
    }
  ]
}