Evaluation Runs
Run an Evaluation Test Case
Retrieve Information About an Existing Evaluation Run
Retrieve Results of an Evaluation Run
Retrieve Results of an Evaluation Run Prompt
Models
type APIEvaluationMetric struct{…}
Category APIEvaluationMetricCategory (optional)
If true, the metric is inverted, meaning that a lower value is better.
MetricType APIEvaluationMetricMetricType (optional)
MetricValueType APIEvaluationMetricMetricValueType (optional)
The maximum value for the metric.
The minimum value for the metric.
type APIEvaluationMetricResult struct{…}
Error description if the metric could not be calculated.
Metric name.
MetricValueType APIEvaluationMetricResultMetricValueType (optional)
The value of the metric as a number.
Reasoning of the metric result.
The value of the metric as a string.
type APIEvaluationPrompt struct{…}
EvaluationTraceSpans []APIEvaluationPromptEvaluationTraceSpan (optional)
The evaluated trace spans.
When the span was created.
Input data for the span (flexible structure: can be a messages array, a string, etc.).
Name/identifier for the span.
Output data from the span (flexible structure: can be a message, a string, etc.).
RetrieverChunks []APIEvaluationPromptEvaluationTraceSpansRetrieverChunk (optional)
Any retriever span chunks that were included as part of the span.
The usage percentage of the chunk.
Indicates if the chunk was used in the prompt.
The index UUID (Knowledge Base) of the chunk.
The source name for the chunk, e.g., the file name or document title.
Text content of the chunk.
The span-level metric results.
Error description if the metric could not be calculated.
Metric name.
MetricValueType APIEvaluationMetricResultMetricValueType (optional)
The value of the metric as a number.
Reasoning of the metric result.
The value of the metric as a string.
Type APIEvaluationPromptEvaluationTraceSpansType (optional)
Types of spans in a trace.
The ground truth for the prompt.
The number of input tokens used in the prompt.
The number of output tokens used in the prompt.
PromptChunks []APIEvaluationPromptPromptChunk (optional)
The list of prompt chunks.
The usage percentage of the chunk.
Indicates if the chunk was used in the prompt.
The index UUID (Knowledge Base) of the chunk.
The source name for the chunk, e.g., the file name or document title.
Text content of the chunk.
Prompt ID.
The metric results for the prompt.
Error description if the metric could not be calculated.
Metric name.
MetricValueType APIEvaluationMetricResultMetricValueType (optional)
The value of the metric as a number.
Reasoning of the metric result.
The value of the metric as a string.
The trace ID for the prompt.
type APIEvaluationRun struct{…}
Whether the agent is deleted.
The agent deployment name.
Agent name.
Agent UUID.
Version hash.
Agent workspace UUID.
The error description.
Evaluation run UUID.
Evaluation test case workspace UUID.
Run end time.
The pass status of the evaluation run based on the star metric.
Run queued time.
Error description if the metric could not be calculated.
Metric name.
MetricValueType APIEvaluationMetricResultMetricValueType (optional)
The value of the metric as a number.
Reasoning of the metric result.
The value of the metric as a string.
Run name.
Error description if the metric could not be calculated.
Metric name.
MetricValueType APIEvaluationMetricResultMetricValueType (optional)
The value of the metric as a number.
Reasoning of the metric result.
The value of the metric as a string.
Run start time.
Status APIEvaluationRunStatus (optional)
Evaluation run statuses.
Test case description.
Test case name.
Test case UUID.
Test case version.