AST (AI Structured Training)

Structured AI training data collection SDK for capturing interactions, feedback, and model outputs.

Hanzo AST

Hanzo AST (@hanzo/ast) is a structured AI training data collection library. It captures interactions, feedback, and model outputs in a normalized format for fine-tuning, evaluation, and RLHF (reinforcement learning from human feedback) pipelines. AST runs in both browser and server environments; on web pages, a single script tag initializes it automatically.

Package: @hanzo/ast on npm
Format: ESM + CJS dual format (built with tsup)
Source: github.com/hanzoai/js-sdk

Features

| Feature | Description |
| --- | --- |
| Interaction Capture | Record user-model interactions with structured metadata |
| Feedback Collection | Collect thumbs-up/down, ratings, corrections, and free-text feedback |
| Model Output Logging | Capture completions, tool calls, and chain-of-thought traces |
| Browser Auto-Init | Drop-in script tag for automatic browser instrumentation |
| Server SDK | Full programmatic API for backend and pipeline integration |
| Dataset Routing | Route collected data to specific datasets by ID |
| Organization Scoping | Multi-tenant data isolation via organization ID |
| Batched Uploads | Automatic batching and retry for efficient data transport |
| Dual Format | ESM and CJS builds for universal compatibility |

Quick Start

Install

npm install @hanzo/ast

Browser Auto-Init

Add a single script tag to any page to start collecting interaction data automatically. No code changes required.

<script
  data-api-key="your-api-key"
  data-dataset-id="web-interactions"
  data-organization-id="org-hanzo"
  src="https://unpkg.com/@hanzo/ast/browser"
></script>

The browser auto-init script captures:

  • Page views and navigation events
  • Form interactions (with PII redaction)
  • Click targets and UI interaction paths
  • Session duration and engagement signals
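
The page does not describe how form values are redacted. As a rough illustration of the idea only, the sketch below masks email addresses and phone-like digit runs before a value would be stored. The function name and both patterns are assumptions made for this example, not the rules @hanzo/ast applies.

```typescript
// Hypothetical sketch: these patterns are illustrative, not the ones @hanzo/ast uses.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Loose phone pattern: optional "+", a digit, then at least five more digits or
// separators (spaces, dots, dashes, parentheses), ending on a digit.
const PHONE_PATTERN = /\+?\d[\d\s().-]{5,}\d/g;

// Replace anything that looks like an email or phone number with a placeholder.
function redactPii(text: string): string {
  return text
    .replace(EMAIL_PATTERN, "[EMAIL]")
    .replace(PHONE_PATTERN, "[PHONE]");
}
```

A pattern this loose also masks other long digit runs, such as order numbers. Over-redacting is usually the safer failure mode for training data, but it is a trade-off worth checking against real form content.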

Programmatic Usage

import { AstClient } from '@hanzo/ast';

const ast = new AstClient({
  apiKey: process.env.HANZO_API_KEY,
  baseUrl: 'https://api.hanzo.ai',
  datasetId: 'training-v3',
  organizationId: 'org-hanzo',
});

Recording Interactions

An interaction represents a single user-model exchange: a prompt, a completion, and associated metadata.

import { AstClient, AstInteractionType } from '@hanzo/ast';
import type { AstInteractionParams } from '@hanzo/ast';

const ast = new AstClient({
  apiKey: process.env.HANZO_API_KEY,
  baseUrl: 'https://api.hanzo.ai',
  datasetId: 'chat-training',
  organizationId: 'org-hanzo',
});

// Record a chat interaction
await ast.recordInteraction({
  type: AstInteractionType.CHAT,
  input: {
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Explain quantum computing in simple terms.' },
    ],
  },
  output: {
    message: {
      role: 'assistant',
      content: 'Quantum computing uses quantum bits (qubits) that can be 0, 1, or both at the same time...',
    },
    model: 'zen-coder-32b-instruct',
    tokens: { input: 42, output: 156 },
    latencyMs: 1240,
  },
  metadata: {
    sessionId: 'sess_abc123',
    userId: 'user_456',
    environment: 'production',
  },
});
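
The latencyMs value in the example above has to be measured by the caller. One way to do that is to wrap the model call in a small timing helper, sketched below. The helper name withLatency and its injectable clock are assumptions for illustration and are not part of @hanzo/ast.

```typescript
// Hypothetical helper, not part of @hanzo/ast: time any async model call.
async function withLatency<T>(
  call: () => Promise<T>,
  now: () => number = Date.now, // injectable clock, handy for deterministic tests
): Promise<{ result: T; latencyMs: number }> {
  const start = now();
  const result = await call();
  return { result, latencyMs: now() - start };
}
```

The returned latencyMs can then be passed as output.latencyMs when recording the interaction.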

Interaction Types

| Type | Description |
| --- | --- |
| AstInteractionType.CHAT | Conversational exchange (messages array) |
| AstInteractionType.COMPLETION | Single-turn text completion |
| AstInteractionType.EMBEDDING | Embedding generation request |
| AstInteractionType.TOOL_CALL | Function/tool invocation and result |
| AstInteractionType.CHAIN | Multi-step chain-of-thought trace |

Collecting Feedback

Feedback links human judgments to recorded interactions for RLHF and evaluation.

import type { AstFeedbackParams } from '@hanzo/ast';

// Binary feedback (thumbs up/down)
await ast.recordFeedback({
  interactionId: 'int_abc123',
  type: 'binary',
  value: 'positive',
  userId: 'user_456',
});

// Numeric rating
await ast.recordFeedback({
  interactionId: 'int_abc123',
  type: 'rating',
  value: 4,
  scale: 5,
  userId: 'user_456',
});

// Correction (preferred output)
await ast.recordFeedback({
  interactionId: 'int_abc123',
  type: 'correction',
  value: 'Quantum computing uses qubits instead of classical bits. Unlike regular bits that are either 0 or 1, qubits can exist in a superposition of both states simultaneously.',
  userId: 'user_456',
  metadata: {
    reason: 'Original response was too vague',
  },
});

// Free-text comment
await ast.recordFeedback({
  interactionId: 'int_abc123',
  type: 'comment',
  value: 'Good explanation but could include a real-world analogy.',
  userId: 'user_456',
});
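
Correction feedback lends itself to preference-style training data, where the corrected text is the preferred answer and the original completion is the rejected one. The sketch below shows that pairing on plain objects. All type and function names here are hypothetical, and the prompt/chosen/rejected shape is a common convention in preference-tuning datasets, not something this page specifies.

```typescript
// Hypothetical shapes for illustration; they are not types exported by @hanzo/ast.
interface RecordedExchange {
  interactionId: string;
  prompt: string;
  completion: string;
}

interface FeedbackEntry {
  interactionId: string;
  type: "binary" | "rating" | "correction" | "comment";
  value: string | number;
}

interface PreferencePair {
  prompt: string;
  chosen: string; // the human-corrected output
  rejected: string; // the original model output
}

// Pair each correction with the interaction it refers to.
function toPreferencePairs(
  exchanges: RecordedExchange[],
  feedback: FeedbackEntry[],
): PreferencePair[] {
  const byId = new Map<string, RecordedExchange>();
  for (const exchange of exchanges) byId.set(exchange.interactionId, exchange);

  const pairs: PreferencePair[] = [];
  for (const entry of feedback) {
    if (entry.type !== "correction" || typeof entry.value !== "string") continue;
    const original = byId.get(entry.interactionId);
    if (!original) continue; // skip feedback that points at an unknown interaction
    pairs.push({
      prompt: original.prompt,
      chosen: entry.value,
      rejected: original.completion,
    });
  }
  return pairs;
}
```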

Configuration

AstClient Options

| Option | Type | Required | Description |
| --- | --- | --- | --- |
| apiKey | string | Yes | Hanzo API key for authentication |
| baseUrl | string | No | API base URL (default: https://api.hanzo.ai) |
| datasetId | string | Yes | Target dataset for collected data |
| organizationId | string | Yes | Organization ID for multi-tenant isolation |
| batchSize | number | No | Records per upload batch (default: 50) |
| flushIntervalMs | number | No | Auto-flush interval in ms (default: 5000) |
| maxRetries | number | No | Retry count for failed uploads (default: 3) |
| debug | boolean | No | Enable debug logging (default: false) |

Browser Script Attributes

| Attribute | Required | Description |
| --- | --- | --- |
| data-api-key | Yes | Hanzo API key |
| data-dataset-id | No | Target dataset (default: web-auto) |
| data-organization-id | No | Organization ID |
| data-capture-clicks | No | Enable click tracking (default: true) |
| data-capture-forms | No | Enable form tracking (default: true) |
| data-capture-navigation | No | Enable navigation tracking (default: true) |
| data-redact-pii | No | Redact emails, phones, etc. (default: true) |
| data-debug | No | Enable console logging (default: false) |
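
To make the defaults above concrete, here is a minimal sketch of how a script could turn its data-* attributes into an options object. It takes a plain record keyed the way the browser's dataset property exposes attributes (data-api-key becomes apiKey). The function names, and the rule that only the literal string "false" disables a flag, are assumptions for this example. The actual parsing in @hanzo/ast may differ.

```typescript
// Hypothetical sketch of reading the documented data-* attributes.
interface BrowserOptions {
  apiKey: string;
  datasetId: string;
  organizationId: string | undefined;
  captureClicks: boolean;
  captureForms: boolean;
  captureNavigation: boolean;
  redactPii: boolean;
  debug: boolean;
}

// A missing attribute falls back to the default; otherwise only "false" disables it.
function flag(value: string | undefined, fallback: boolean): boolean {
  return value === undefined ? fallback : value !== "false";
}

function readScriptOptions(data: Record<string, string | undefined>): BrowserOptions {
  const apiKey = data["apiKey"];
  if (!apiKey) throw new Error("data-api-key is required");
  return {
    apiKey,
    datasetId: data["datasetId"] ?? "web-auto",
    organizationId: data["organizationId"],
    captureClicks: flag(data["captureClicks"], true),
    captureForms: flag(data["captureForms"], true),
    captureNavigation: flag(data["captureNavigation"], true),
    redactPii: flag(data["redactPii"], true),
    debug: flag(data["debug"], false),
  };
}
```

In a browser, the dataset of the currently executing script element could be passed in as the record.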

Environment Variables

HANZO_API_KEY=your-api-key
HANZO_AST_DATASET_ID=training-v3
HANZO_AST_ORGANIZATION_ID=org-hanzo
HANZO_AST_BASE_URL=https://api.hanzo.ai  # optional
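
The page does not say whether AstClient reads these variables on its own. If they need to be mapped by hand, a small loader like the one sketched below fails fast on missing values and applies the documented base URL default. The function and interface names are assumptions for illustration.

```typescript
// Hypothetical loader; the names below are not part of @hanzo/ast.
interface AstEnvConfig {
  apiKey: string;
  datasetId: string;
  organizationId: string;
  baseUrl: string;
}

function configFromEnv(env: Record<string, string | undefined>): AstEnvConfig {
  // Read a variable and fail fast if it is missing or empty.
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing required environment variable ${name}`);
    return value;
  };
  return {
    apiKey: required("HANZO_API_KEY"),
    datasetId: required("HANZO_AST_DATASET_ID"),
    organizationId: required("HANZO_AST_ORGANIZATION_ID"),
    baseUrl: env["HANZO_AST_BASE_URL"] ?? "https://api.hanzo.ai", // optional
  };
}
```

On a server, process.env can be passed as the argument and the result spread into the AstClient constructor options.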

Server-Side Integration

Express Route Handler

import { AstClient, AstInteractionType } from '@hanzo/ast';
import express from 'express';

const app = express();
app.use(express.json()); // parse JSON bodies so req.body is populated

const ast = new AstClient({
  apiKey: process.env.HANZO_API_KEY,
  datasetId: 'api-interactions',
  organizationId: 'org-hanzo',
});

// `llm` is assumed to be an OpenAI-compatible client created elsewhere.
// `req.user` and `req.sessionID` assume auth and session middleware are installed.
app.post('/api/chat', async (req, res) => {
  const { messages } = req.body;

  const completion = await llm.chat.completions.create({
    model: 'zen-coder-32b-instruct',
    messages,
  });

  // Record the interaction for training data
  await ast.recordInteraction({
    type: AstInteractionType.CHAT,
    input: { messages },
    output: {
      message: completion.choices[0].message,
      model: completion.model,
      tokens: {
        // usage can be absent on some responses, so read it defensively
        input: completion.usage?.prompt_tokens,
        output: completion.usage?.completion_tokens,
      },
    },
    metadata: {
      userId: req.user.id,
      sessionId: req.sessionID,
      requestId: req.headers['x-request-id'],
    },
  });

  res.json(completion);
});

Batch Processing

// Flush pending records manually
await ast.flush();

// Graceful shutdown
process.on('SIGTERM', async () => {
  await ast.flush();
  await ast.close();
  process.exit(0);
});
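
For intuition about what batching with retry involves, the sketch below implements a tiny queue driven by the same two knobs as the batchSize and maxRetries options. It is a standalone illustration under assumed names (BatchQueue, SendBatch), not the SDK's implementation. It leaves out the timer that flushIntervalMs implies and retries immediately without backoff.

```typescript
// Hypothetical sketch: none of these names come from @hanzo/ast.
type SendBatch<T> = (batch: T[]) => Promise<void>;

class BatchQueue<T> {
  private pending: T[] = [];
  private readonly send: SendBatch<T>;
  private readonly batchSize: number;
  private readonly maxRetries: number;

  constructor(send: SendBatch<T>, batchSize = 50, maxRetries = 3) {
    this.send = send;
    this.batchSize = batchSize;
    this.maxRetries = maxRetries;
  }

  // Queue a record and flush once a full batch has accumulated.
  async add(record: T): Promise<void> {
    this.pending.push(record);
    if (this.pending.length >= this.batchSize) {
      await this.flush();
    }
  }

  // Send everything queued so far, in chunks of at most batchSize.
  async flush(): Promise<void> {
    while (this.pending.length > 0) {
      const batch = this.pending.splice(0, this.batchSize);
      await this.sendWithRetry(batch);
    }
  }

  // Try once, then retry up to maxRetries more times before giving up.
  private async sendWithRetry(batch: T[]): Promise<void> {
    let lastError: unknown;
    for (let attempt = 0; attempt <= this.maxRetries; attempt++) {
      try {
        await this.send(batch);
        return;
      } catch (err) {
        lastError = err;
      }
    }
    // Put the records back so a later flush can try again.
    this.pending.unshift(...batch);
    throw lastError;
  }
}
```

Re-queueing a failed batch keeps records from being dropped silently, at the cost of unbounded growth if the endpoint stays down. A production queue would normally cap its size and add a delay between attempts.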

Data Format

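AST stores all records in a normalized JSON format compatible with common fine-tuning pipelines (OpenAI JSONL, Hugging Face datasets, Axolotl). In the example record below, field names are snake_case (for example, latency_ms and session_id) rather than the camelCase names passed to the SDK (latencyMs, sessionId).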

{
  "id": "int_abc123",
  "type": "chat",
  "dataset_id": "training-v3",
  "organization_id": "org-hanzo",
  "timestamp": "2026-03-03T12:00:00.000Z",
  "input": {
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing."}
    ]
  },
  "output": {
    "message": {"role": "assistant", "content": "..."},
    "model": "zen-coder-32b-instruct",
    "tokens": {"input": 42, "output": 156},
    "latency_ms": 1240
  },
  "feedback": [
    {"type": "binary", "value": "positive", "user_id": "user_456"}
  ],
  "metadata": {
    "session_id": "sess_abc123",
    "user_id": "user_456",
    "environment": "production"
  }
}
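
Chat fine-tuning tools that follow the OpenAI JSONL convention expect one JSON object per line with a single messages array. The page does not describe an export API, so the sketch below shows how a stored chat record could be reshaped by hand. The interface and function names are assumptions, and only the fields visible in the record above are used.

```typescript
// Hypothetical types covering only the fields this conversion needs.
interface ChatMessage {
  role: string;
  content: string;
}

interface AstChatRecord {
  type: string;
  input: { messages: ChatMessage[] };
  output: { message: ChatMessage };
}

// One JSONL line: the input messages followed by the assistant reply.
function toChatJsonlLine(record: AstChatRecord): string {
  return JSON.stringify({
    messages: [...record.input.messages, record.output.message],
  });
}

// Convert every chat record to a line and skip other interaction types.
function toChatJsonl(records: AstChatRecord[]): string {
  return records
    .filter((record) => record.type === "chat")
    .map(toChatJsonlLine)
    .join("\n");
}
```

Feedback and metadata are dropped here on purpose. Filtering records by feedback before conversion, for example keeping only positively rated ones, would be a separate step.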
