Hanzo

GPUs

On-demand H100/H200/A100/L40S GPU compute — metered by the hour, provisioned from the console or API, billed to your org's cloud-usage ledger.

On-demand accelerated compute for training, fine-tuning, and inference. Provision GPU clusters, watch live utilization, and meter spend from the console or the API. GPUs bill by the hour to the same per-org cloud-usage ledger as the rest of your fleet.

How it works

Hanzo Cloud GPUs are real accelerators resold and metered by the hour. Capacity is sourced from cloud GPU providers — DigitalOcean GPU Droplets are the primary pool (H100, A100, L40S), with Paperspace and AWS as secondary pools for burst and specialized instances — and reconciled by the operator alongside your machines. You get the accelerator; Hanzo handles provisioning, scheduling, telemetry, and a single unified bill.

  • One key, one bill — GPU-hours meter to your org's cloud-usage ledger through commerce billing, priced through api.hanzo.ai/v1/gpu.
  • No idle lock-in — hourly metering starts when a node comes online and stops when you tear the cluster down.
  • Honest telemetry — utilization, memory, temperature, and power come straight from the machine; fields the provider does not report render as a placeholder, never a fabricated value.
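
The metering rule in the second bullet can be sketched as a small calculation. This is an illustration only, not Hanzo's billing code: the function name, the per-second proration, and the timestamps are assumptions. The source states only that metering starts when a node comes online and stops at teardown.

```python
from datetime import datetime, timezone


def metered_gpu_hours(online_at: datetime, torn_down_at: datetime, gpus: int) -> float:
    """GPU-hours accrued between a node coming online and its teardown.

    Assumption: usage is prorated by the second. The real ledger may round
    differently (for example, up to the next full hour).
    """
    if torn_down_at < online_at:
        raise ValueError("teardown precedes online time")
    elapsed_hours = (torn_down_at - online_at).total_seconds() / 3600
    return elapsed_hours * gpus


# Hypothetical 8-GPU node that ran from 09:00 to 15:30 UTC (6.5 hours).
online = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
teardown = datetime(2025, 1, 1, 15, 30, tzinfo=timezone.utc)
print(metered_gpu_hours(online, teardown, gpus=8))  # 6.5 h x 8 GPUs = 52.0
```

Time before the node reports online and after teardown contributes nothing, which is the "no idle lock-in" property described above.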

Accelerators

| Model | Memory | Typical use |
| --- | --- | --- |
| H200 | 141 GB | Frontier-model training, long-context and low-latency inference |
| H100 | 80 GB | Large-model training and high-throughput inference |
| A100 | 40 / 80 GB | Training, fine-tuning, batched inference |
| L40S / L40 | 48 GB | Inference, rendering, mixed media workloads |
| A6000 / A5000 | 24–48 GB | Cost-efficient inference and development |
| A4000 | 16 GB | Entry-level inference and dev boxes |

Accelerators are grouped into clusters (a node pool of one GPU size) and scheduled into pools. Node sizes map to real accelerator counts — a gpu-h100x8-640gb node genuinely holds eight H100s:

| Node size | GPUs / node |
| --- | --- |
| gpu-h100x1-80gb | 1× H100 |
| gpu-h100x8-640gb | 8× H100 |
| gpu-a100x1-80gb | 1× A100 |
| gpu-a100x8-640gb | 8× A100 |
| gpu-l40sx1-48gb | 1× L40S |
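
The node-size names listed above appear to follow a `gpu-<model>x<count>-<total memory>gb` pattern. The parser below is a sketch that assumes this pattern, which is inferred from the listed sizes rather than documented as a contract. `NodeSize` and `parse_node_size` are hypothetical names.

```python
import re
from typing import NamedTuple


class NodeSize(NamedTuple):
    model: str            # accelerator model, upper-cased (e.g. "H100")
    gpus: int             # accelerators per node
    total_memory_gb: int  # combined GPU memory across the node


# Assumed naming pattern: gpu-<model>x<count>-<memory>gb
_SIZE_RE = re.compile(r"^gpu-(?P<model>[a-z0-9]+?)x(?P<gpus>\d+)-(?P<mem>\d+)gb$")


def parse_node_size(size: str) -> NodeSize:
    match = _SIZE_RE.match(size)
    if match is None:
        raise ValueError(f"unrecognized node size: {size!r}")
    return NodeSize(match["model"].upper(), int(match["gpus"]), int(match["mem"]))


node = parse_node_size("gpu-h100x8-640gb")
print(node.gpus, node.total_memory_gb // node.gpus)  # 8 GPUs, 80 GB each
```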

Pricing

GPUs are metered per GPU-hour. Representative on-demand rates:

| GPU | VRAM | On-demand |
| --- | --- | --- |
| NVIDIA T4 | 16 GB | $0.50 / hr |
| NVIDIA A100 40GB | 40 GB | $2.50 / hr |
| NVIDIA A100 80GB | 80 GB | $3.80 / hr |
| NVIDIA H100 | 80 GB | $5.50 / hr |

Live pricing is served from GET /v1/gpu and mirrored under Compute → GPUs → Pricing in the console. Sustained workloads qualify for reserved discounts (10% for a 1-month commitment, 20% for 3 months) — see compute pricing for the full rate card, regions, and reserved terms.
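
As a rough worked example, the representative rates and reserved discounts can be combined into a cost estimate. This sketch assumes the discount is applied as a simple percentage off the on-demand GPU-hour rate. The dictionary keys and the `estimate_cost` helper are hypothetical, and the billed rate is whatever the API returns.

```python
from decimal import Decimal

# Representative on-demand rates from this page, in USD per GPU-hour.
ON_DEMAND_USD_PER_GPU_HOUR = {
    "T4": Decimal("0.50"),
    "A100-40GB": Decimal("2.50"),
    "A100-80GB": Decimal("3.80"),
    "H100": Decimal("5.50"),
}

# Reserved discounts keyed by commitment length in months (0 = on-demand).
RESERVED_DISCOUNT = {0: Decimal("0"), 1: Decimal("0.10"), 3: Decimal("0.20")}


def estimate_cost(gpu: str, gpus: int, hours: int, commitment_months: int = 0) -> Decimal:
    """Estimated spend in USD, assuming a flat percentage discount."""
    rate = ON_DEMAND_USD_PER_GPU_HOUR[gpu]
    discount = RESERVED_DISCOUNT[commitment_months]
    total = rate * gpus * Decimal(hours) * (1 - discount)
    return total.quantize(Decimal("0.01"))


print(estimate_cost("H100", gpus=8, hours=24))                       # 1056.00
print(estimate_cost("H100", gpus=8, hours=24, commitment_months=3))  # 844.80
```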

Rates are the source of truth in the API

Prices shown here are representative. The billed rate is always the value returned by GET /v1/gpu at provision time, which already reflects any account-level pricing.

Provision a cluster

Create a GPU cluster from Compute → GPUs → Clusters in the console, or from the CLI / API.

# Launch an 8× H100 training cluster
hanzo vm launch \
  --size gpu-h100x8-640gb \
  --region sfo3 \
  --name train-sfo3

# Or launch a single-H100 node straight against the control plane
curl -X POST https://api.hanzo.ai/v1/visor/machines \
  -H "Authorization: Bearer hk-..." \
  -H "Content-Type: application/json" \
  -d '{"size":"gpu-h100x1-80gb","region":"sfo3","name":"infer-0"}'

Each cluster's nodes become your machines; the operator reconciles them into the running fleet and starts metering when they report online.
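
The same control-plane call can be built from Python with only the standard library. This is a sketch that mirrors the curl example above. The `build_launch_request` helper is a hypothetical name, and the request is constructed but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.hanzo.ai"


def build_launch_request(api_key: str, size: str, region: str, name: str) -> urllib.request.Request:
    """Build (but do not send) the POST that provisions one machine."""
    body = json.dumps({"size": size, "region": region, "name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/visor/machines",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# "hk-..." is the elided key placeholder from the curl example; use a real key.
req = build_launch_request("hk-...", "gpu-h100x1-80gb", "sfo3", "infer-0")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```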

Inventory & telemetry

List every GPU with live telemetry — model, cluster, region, utilization, memory, temperature, and power — through the control plane:

curl https://api.hanzo.ai/v1/gpu \
  -H "Authorization: Bearer hk-..."

{
  "gpus": [
    {
      "id": "gpu-0",
      "model": "H100",
      "cluster": "train-sfo3",
      "region": "sfo3",
      "status": "online",
      "utilization": 87,
      "memoryUtil": 74,
      "temperature": 61,
      "power": 640
    }
  ]
}

Fields the platform does not report render as a placeholder in the console rather than a fabricated value. Alerts and scheduling pools are available at /v1/gpu/alerts and /v1/gpu/pools.
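
A client consuming this response can apply the same rule of never inventing a value. The sketch below assumes the response shape shown in the example above and that unreported fields are either absent or null. The `PLACEHOLDER` string and the `summarize` helper are hypothetical choices, not the console's actual rendering.

```python
import json

PLACEHOLDER = "n/a"  # hypothetical stand-in for an unreported field
FIELDS = ("utilization", "memoryUtil", "temperature", "power")


def summarize(payload: str) -> list[str]:
    """One line per GPU; unreported telemetry becomes PLACEHOLDER, never a guess."""
    rows = []
    for gpu in json.loads(payload).get("gpus", []):
        cells = [str(gpu[f]) if gpu.get(f) is not None else PLACEHOLDER for f in FIELDS]
        rows.append(f"{gpu['id']} {gpu['model']} " + " ".join(cells))
    return rows


# Sample payload in which the provider did not report power draw.
sample = '{"gpus": [{"id": "gpu-0", "model": "H100", "utilization": 87, "memoryUtil": 74, "temperature": 61}]}'
print(summarize(sample))  # ['gpu-0 H100 87 74 61 n/a']
```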

Billing

GPU-hours are metered into your org's cloud-usage ledger and settled through commerce billing. Billing is prepaid: top up credits at console.hanzo.ai and spend draws down as clusters run. When the balance is exhausted, new provisioning is refused with 402 Payment Required ("Add credits at console.hanzo.ai") — running clusters are never silently overdrawn.
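
Client code can treat the 402 response as a distinct, recoverable condition instead of a generic failure. The sketch below uses the standard library. `InsufficientCredits` and `send_provision` are hypothetical names, and the `opener` parameter exists only so the call can be exercised without touching the network.

```python
import urllib.error
import urllib.request


class InsufficientCredits(Exception):
    """Raised when provisioning is refused with 402 Payment Required."""


def send_provision(req, opener=urllib.request.urlopen) -> bytes:
    """Send a provisioning request and surface an exhausted balance explicitly."""
    try:
        with opener(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 402:
            # The platform refused new provisioning; running clusters are unaffected.
            raise InsufficientCredits("Add credits at console.hanzo.ai") from err
        raise  # any other HTTP error is left for the caller to handle
```

A caller might catch `InsufficientCredits` to prompt for a top-up and retry, while letting other errors propagate.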

Card billing

Automatic card billing is being rolled out: attach a card in Billing → Payment methods and GPU-hours are charged to it as they accrue, so long-running clusters keep running without manual top-ups. Prepaid credits remain fully supported and are always drawn down first.
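
The "credits first" ordering can be illustrated with a small allocation function. This sketches the rule as described here, not the billing implementation. `split_charge` and the amounts are hypothetical.

```python
from decimal import Decimal


def split_charge(charge: Decimal, credit_balance: Decimal) -> tuple[Decimal, Decimal]:
    """Return (amount drawn from prepaid credits, amount charged to the card)."""
    from_credits = min(charge, max(credit_balance, Decimal("0")))
    return from_credits, charge - from_credits


# A $44.00 charge against a $30.00 credit balance: credits cover $30.00, the card $14.00.
print(split_charge(Decimal("44.00"), Decimal("30.00")))
```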

Track spend under Compute → GPUs → Pricing in the console, or pull usage from the billing API. See API Keys for auth and the API Reference for the full surface.

  • Machines — the cluster nodes your GPUs run on
  • Pipelines — build and deploy onto GPU clusters
  • Billing — credits, invoices, and usage metering
  • Edge — on-device inference when a cloud GPU is overkill
