Hanzo

Scores

Evaluation scores from user feedback, automated graders, and human review, attached to traces, observations, and sessions.

A score is a judgment attached to a trace, observation, or session: a thumbs-up from a user, a grade from an automated evaluator, or a rating from a human reviewer. Scores are the measurable output of evaluation. They are served by Hanzo O11y and are tenant-scoped by org.

Anatomy of a score

Every score has a data type and a source, and points at the trace, observation, or session it grades. Its shape is governed by a score config, which the score references through configId.

Field                                Description
name                                 The metric, e.g. helpfulness, hallucination
value                                Numeric value (also used for boolean 0/1 and category codes)
stringValue                          The category label or free-text value
dataType                             NUMERIC, CATEGORICAL, BOOLEAN, or TEXT
source                               API (feedback), EVAL (graders), ANNOTATION (review)
traceId / observationId / sessionId  What the score is attached to
comment                              Optional reviewer note
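
As a rough illustration of how these fields fit together, the sketch below mirrors them in a small Python dataclass and checks a score against the enumerated dataType and source values. The Score class and its problems() method are hypothetical helpers, not part of any Hanzo SDK. The two rules marked as assumptions (a score needs at least one target, and a BOOLEAN score carries 0 or 1 in value) are inferred from the field descriptions rather than stated by the API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values, taken from the field table.
DATA_TYPES = {"NUMERIC", "CATEGORICAL", "BOOLEAN", "TEXT"}
SOURCES = {"API", "EVAL", "ANNOTATION"}


@dataclass
class Score:
    """Local mirror of the documented score fields (hypothetical class)."""

    name: str
    dataType: str
    source: str
    value: Optional[float] = None
    stringValue: Optional[str] = None
    traceId: Optional[str] = None
    observationId: Optional[str] = None
    sessionId: Optional[str] = None
    comment: Optional[str] = None
    configId: Optional[str] = None

    def problems(self) -> List[str]:
        """Return readable problems; an empty list means the score looks well-formed."""
        found = []
        if self.dataType not in DATA_TYPES:
            found.append(f"unknown dataType: {self.dataType}")
        if self.source not in SOURCES:
            found.append(f"unknown source: {self.source}")
        # Assumption: a score must point at at least one target.
        if not (self.traceId or self.observationId or self.sessionId):
            found.append("no traceId, observationId, or sessionId")
        # Assumption: BOOLEAN scores store 0 or 1 in value.
        if self.dataType == "BOOLEAN" and self.value not in (0, 1):
            found.append("BOOLEAN score needs value 0 or 1")
        return found
```

Calling problems() on a score built from the listing example (name helpfulness, dataType NUMERIC, source ANNOTATION, traceId tr_abc123) returns an empty list under these assumptions.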

Listing scores

curl "https://api.hanzo.ai/v1/o11y/scores?page=1&limit=50" \
  -H "Authorization: Bearer hk-..."

Example response:

{
  "data": [
    {
      "id": "sc_1",
      "name": "helpfulness",
      "value": 4,
      "dataType": "NUMERIC",
      "source": "ANNOTATION",
      "traceId": "tr_abc123",
      "configId": "cfg_help"
    }
  ],
  "meta": { "page": 1, "limit": 50, "totalItems": 1, "totalPages": 1 }
}
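
The meta block suggests that results are paginated, so a client would typically walk pages until it reaches totalPages. The following is a minimal sketch of that loop using only the Python standard library. The function names fetch_page and iter_scores are hypothetical, and the sketch assumes that every page returns the same data and meta shape as the example response and that totalPages is accurate on each page.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator

BASE_URL = "https://api.hanzo.ai/v1/o11y/scores"


def fetch_page(page: int, limit: int, api_key: str) -> Dict:
    """GET one page of scores and decode the JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}?page={page}&limit={limit}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_scores(get_page: Callable[[int], Dict]) -> Iterator[Dict]:
    """Yield every score, page by page, stopping at meta.totalPages."""
    page = 1
    while True:
        body = get_page(page)
        yield from body["data"]
        # Assumption: totalPages is present and correct on every page.
        if page >= body["meta"]["totalPages"]:
            break
        page += 1
```

Passing the page fetcher in as a callable keeps the loop independent of the HTTP layer, for example iter_scores(lambda p: fetch_page(p, 50, api_key)), where api_key holds your own key.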

Where scores come from

  • API — end-user feedback (thumbs up/down, star ratings) sent from your app.
  • EVAL — automated graders, including LLM-as-judge, produced by experiment runs.
  • ANNOTATION — human review through annotation queues.
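
Because each score records its source, the three origins can be compared side by side. As a hedged sketch, the hypothetical helper below averages NUMERIC scores per (name, source) pair. It assumes scores are plain dictionaries shaped like the entries in the listing response and that averaging numeric values is meaningful for the metric in question.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def mean_by_source(scores: Iterable[Dict]) -> Dict[Tuple[str, str], float]:
    """Average NUMERIC scores for each (name, source) pair."""
    buckets = defaultdict(list)
    for score in scores:
        # Skip non-numeric scores and scores without a value.
        if score.get("dataType") == "NUMERIC" and score.get("value") is not None:
            buckets[(score["name"], score["source"])].append(score["value"])
    return {key: mean(values) for key, values in buckets.items()}
```

A gap between, say, the API average and the ANNOTATION average for the same metric is a signal worth investigating, though this helper only reports the numbers and draws no conclusions from them.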
