GPUs
On-demand H100/H200/A100/L40S GPU compute — metered by the hour, provisioned from the console or API, billed to your org's cloud-usage ledger.
On-demand accelerated compute for training, fine-tuning, and inference. Provision GPU clusters, watch live utilization, and meter spend from the console or the API. GPUs bill by the hour to the same per-org cloud-usage ledger as the rest of your fleet.
How it works
Hanzo Cloud GPUs are real accelerators resold and metered by the hour. Capacity is sourced from cloud GPU providers — DigitalOcean GPU Droplets are the primary pool (H100, A100, L40S), with Paperspace and AWS as secondary pools for burst and specialized instances — and reconciled by the operator alongside your machines. You get the accelerator; Hanzo handles provisioning, scheduling, telemetry, and a single unified bill.
- One key, one bill: GPU-hours meter to your org's cloud-usage ledger through commerce billing, priced through api.hanzo.ai/v1/gpu.
- No idle lock-in: hourly metering starts when a node comes online and stops when you tear the cluster down.
- Honest telemetry: utilization, memory, temperature, and power come straight from the machine; fields the provider does not report render as —, never a fabricated value.
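
The metering rule above (start when a node comes online, stop at teardown) can be sketched as a small calculation. This is an illustration only: the function name is hypothetical, and it assumes partial hours are prorated linearly, which this page does not confirm. The platform may round differently.

```python
from datetime import datetime, timezone


def gpu_hours(online_at: datetime, torn_down_at: datetime, gpus_per_node: int = 1) -> float:
    """GPU-hours accrued between a node coming online and its teardown.

    Assumption: fractional hours are prorated linearly.
    """
    if torn_down_at < online_at:
        raise ValueError("teardown precedes online time")
    elapsed_hours = (torn_down_at - online_at).total_seconds() / 3600
    return elapsed_hours * gpus_per_node


# An 8-GPU node that ran from 09:00 to 15:30 UTC: 6.5 hours x 8 GPUs.
start = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
stop = datetime(2025, 1, 1, 15, 30, tzinfo=timezone.utc)
print(gpu_hours(start, stop, gpus_per_node=8))  # 52.0
```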
Accelerators
| Model | Memory | Typical use |
|---|---|---|
| H200 | 141 GB | Frontier-model training, long-context and low-latency inference |
| H100 | 80 GB | Large-model training and high-throughput inference |
| A100 | 40 / 80 GB | Training, fine-tuning, batched inference |
| L40S / L40 | 48 GB | Inference, rendering, mixed media workloads |
| A6000 / A5000 | 24–48 GB | Cost-efficient inference and development |
| A4000 | 16 GB | Entry-level inference and dev boxes |
Accelerators are grouped into clusters (a node pool of one GPU size) and scheduled
into pools. Node sizes map to real accelerator counts — a gpu-h100x8-640gb node
genuinely holds eight H100s:
| Node size | GPUs / node |
|---|---|
| gpu-h100x1-80gb | 1× H100 |
| gpu-h100x8-640gb | 8× H100 |
| gpu-a100x1-80gb | 1× A100 |
| gpu-a100x8-640gb | 8× A100 |
| gpu-l40sx1-48gb | 1× L40S |
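
The node size names in the table appear to follow a `gpu-<model>x<count>-<total memory>gb` pattern. The sketch below parses a name on that assumption. The pattern is inferred from the five sizes listed, not from a documented grammar, and `parse_node_size` is a hypothetical helper.

```python
import re
from typing import NamedTuple


class NodeSize(NamedTuple):
    model: str
    gpus: int
    total_memory_gb: int

    @property
    def memory_per_gpu_gb(self) -> int:
        return self.total_memory_gb // self.gpus


# Assumed shape: gpu-<model>x<gpu count>-<total memory>gb
_NODE_SIZE = re.compile(r"^gpu-(?P<model>[a-z0-9]+?)x(?P<gpus>\d+)-(?P<mem>\d+)gb$")


def parse_node_size(name: str) -> NodeSize:
    match = _NODE_SIZE.match(name)
    if match is None:
        raise ValueError(f"unrecognized node size: {name!r}")
    return NodeSize(match["model"].upper(), int(match["gpus"]), int(match["mem"]))


size = parse_node_size("gpu-h100x8-640gb")
print(size.model, size.gpus, size.memory_per_gpu_gb)  # H100 8 80
```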
Pricing
GPUs are metered per GPU-hour. Representative on-demand rates:
| GPU | VRAM | On-demand |
|---|---|---|
| NVIDIA T4 | 16 GB | $0.50 / hr |
| NVIDIA A100 40GB | 40 GB | $2.50 / hr |
| NVIDIA A100 80GB | 80 GB | $3.80 / hr |
| NVIDIA H100 | 80 GB | $5.50 / hr |
Live pricing is served from GET /v1/gpu and mirrored under Compute → GPUs → Pricing
in the console. Sustained workloads qualify for reserved discounts (10% for a 1-month
commitment, 20% for 3 months) — see
compute pricing for the full rate card,
regions, and reserved terms.
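
As a rough worked example, the estimate below multiplies a representative rate by GPU count and hours, then applies a reserved discount. The dictionary keys and function name are hypothetical, and the sketch assumes the discount is a flat percentage off the on-demand rate. The rates are the representative ones from the table, not live prices.

```python
# Representative on-demand rates from the table above, in USD per GPU-hour.
RATES = {"T4": 0.50, "A100-40GB": 2.50, "A100-80GB": 3.80, "H100": 5.50}

# Reserved discounts keyed by commitment length in months (0 = on-demand).
RESERVED_DISCOUNT = {0: 0.0, 1: 0.10, 3: 0.20}


def estimate_cost(gpu: str, gpus: int, hours: float, commitment_months: int = 0) -> float:
    """Estimated spend in USD, assuming a flat discount on the on-demand rate."""
    rate = RATES[gpu]
    discount = RESERVED_DISCOUNT[commitment_months]
    return round(rate * gpus * hours * (1 - discount), 2)


# One day on an 8x H100 node, on-demand and with a 3-month commitment.
print(estimate_cost("H100", 8, 24))     # 1056.0
print(estimate_cost("H100", 8, 24, 3))  # 844.8
```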
The API is the source of truth for rates
Prices shown here are representative. The billed rate is always the value returned by
GET /v1/gpu at provision time, which already reflects any account-level pricing.
Provision a cluster
Create a GPU cluster from Compute → GPUs → Clusters in the console, or from the CLI / API.

    # Launch an 8× H100 training cluster
    hanzo vm launch \
      --size gpu-h100x8-640gb \
      --region sfo3 \
      --name train-sfo3

    # Or straight against the control plane
    curl -X POST https://api.hanzo.ai/v1/visor/machines \
      -H "Authorization: Bearer hk-..." \
      -H "Content-Type: application/json" \
      -d '{"size":"gpu-h100x1-80gb","region":"sfo3","name":"infer-0"}'

Each cluster's nodes become your machines; the operator reconciles them into the running fleet and starts metering when they report online.
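
For callers working from Python rather than the shell, the same control-plane request can be built with the standard library. This is a minimal sketch mirroring the curl call above. `build_provision_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_URL = "https://api.hanzo.ai/v1/visor/machines"


def build_provision_request(api_key: str, size: str, region: str, name: str) -> urllib.request.Request:
    """Build (but do not send) the POST that provisions one machine."""
    body = json.dumps({"size": size, "region": region, "name": name}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_provision_request("hk-...", "gpu-h100x1-80gb", "sfo3", "infer-0")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
```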
Inventory & telemetry
List every GPU with live telemetry (model, cluster, region, utilization, memory, temperature, and power) through the control plane:

    curl https://api.hanzo.ai/v1/gpu \
      -H "Authorization: Bearer hk-..."

    {
      "gpus": [
        {
          "id": "gpu-0",
          "model": "H100",
          "cluster": "train-sfo3",
          "region": "sfo3",
          "status": "online",
          "utilization": 87,
          "memoryUtil": 74,
          "temperature": 61,
          "power": 640
        }
      ]
    }

Fields the platform does not report render as — in the console rather than a fabricated value. Alerts and scheduling pools are available at /v1/gpu/alerts and /v1/gpu/pools.
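
A client can apply the same rule as the console when it renders telemetry: show a placeholder for anything unreported instead of inventing a number. The sketch below works on the sample response above plus a second, hypothetical GPU that is missing two readings. Whether the API omits unreported fields or sends them as null is an assumption; the code handles both.

```python
import json

# First entry is the sample response; the second is added for illustration.
RESPONSE = """
{
  "gpus": [
    {"id": "gpu-0", "model": "H100", "cluster": "train-sfo3", "region": "sfo3",
     "status": "online", "utilization": 87, "memoryUtil": 74,
     "temperature": 61, "power": 640},
    {"id": "gpu-1", "model": "H100", "cluster": "train-sfo3", "region": "sfo3",
     "status": "online", "utilization": 12, "memoryUtil": 5}
  ]
}
"""

TELEMETRY_FIELDS = ("utilization", "memoryUtil", "temperature", "power")
PLACEHOLDER = "—"  # what the console shows for an unreported field


def render_row(gpu: dict) -> str:
    """One text row per GPU; missing or null readings become the placeholder."""
    cells = [gpu["id"], gpu["model"]]
    for field in TELEMETRY_FIELDS:
        value = gpu.get(field)
        cells.append(PLACEHOLDER if value is None else str(value))
    return " | ".join(cells)


rows = [render_row(gpu) for gpu in json.loads(RESPONSE)["gpus"]]
for row in rows:
    print(row)
```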
Billing
GPU-hours are metered into your org's cloud-usage ledger and settled through
commerce billing. Billing is prepaid: top up credits at
console.hanzo.ai and spend draws down as clusters run. When
the balance is exhausted, new provisioning is refused with 402 Payment Required
("Add credits at console.hanzo.ai") — running clusters are never silently overdrawn.
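
Client code can treat the 402 refusal as a distinct, recoverable condition instead of a generic failure. The following is a sketch under assumptions: `send_provision` and `OutOfCredits` are hypothetical names, and a stub opener stands in for the network so the example runs offline.

```python
import email.message
import io
import urllib.error
import urllib.request


class OutOfCredits(Exception):
    """Provisioning was refused because the prepaid balance is exhausted."""


def send_provision(req, opener=urllib.request.urlopen):
    """Send a provisioning request, mapping HTTP 402 to OutOfCredits."""
    try:
        with opener(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        if err.code == 402:
            raise OutOfCredits("Add credits at console.hanzo.ai") from err
        raise


def _refusing_opener(req):
    # Stub that imitates the control plane refusing a request with 402.
    raise urllib.error.HTTPError(
        "https://api.hanzo.ai/v1/visor/machines",
        402,
        "Payment Required",
        email.message.Message(),
        io.BytesIO(b""),
    )


try:
    send_provision(None, opener=_refusing_opener)
    outcome = "provisioned"
except OutOfCredits as exc:
    outcome = f"refused: {exc}"
print(outcome)  # refused: Add credits at console.hanzo.ai
```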
Card billing
Automatic card billing is being rolled out: attach a card in Billing → Payment methods and GPU-hours are charged to it as they accrue, so long-running clusters keep running without manual top-ups. Prepaid credits remain fully supported and are always drawn down first.
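
The credits-first rule can be illustrated with a small split function. This is a sketch, not the billing implementation: `split_charge` is a hypothetical name, amounts are integer cents, and it assumes whatever credits do not cover goes to the attached card.

```python
def split_charge(amount_cents: int, credit_balance_cents: int) -> tuple[int, int]:
    """Return (from_credits, from_card) for one accrued charge.

    Credits are drawn down first; the card covers only the remainder.
    """
    if amount_cents < 0 or credit_balance_cents < 0:
        raise ValueError("amounts must be non-negative")
    from_credits = min(amount_cents, credit_balance_cents)
    return from_credits, amount_cents - from_credits


# A $44.00 charge against a $10.00 credit balance.
print(split_charge(4400, 1000))  # (1000, 3400)
```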
Track spend under Compute → GPUs → Pricing in the console, or pull usage from the billing API. See API Keys for auth and the API Reference for the full surface.