Install the CLI with `uv tool install hud-python --python 3.12`. Authenticate once with `hud set HUD_API_KEY=...`.
Build & iterate
hud init
Scaffold a new environment package: env.py (tasks + capabilities), tasks.py, Dockerfile.hud, and pyproject.toml. Purely local — no network, no API key.
| Option | Description |
|---|---|
| `--dir`, `-d` | Parent directory (default `.`). |
| `--force`, `-f` | Overwrite existing files. |
hud serve
Serve an environment’s control channel locally over TCP (JSON-RPC). `hud dev` is a deprecated alias.
| Option | Default | Description |
|---|---|---|
| `--port`, `-p` | `8765` | Port to serve on. |
| `--host` | `127.0.0.1` | Interface to bind (use `0.0.0.0` inside containers). |
| `--verbose`, `-v` | off | Detailed logs. |
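The reference only says the control channel speaks JSON-RPC over TCP. For orientation, here is a minimal, generic JSON-RPC 2.0 client sketch. It assumes newline-delimited message framing, and the method name in the usage note is a placeholder; the real methods and framing are defined by `hud`, and the `hud task` commands (see Run a packaged image) attach to a served env for you.

```python
import itertools
import json
import socket

# Monotonic request ids for this client process.
_next_id = itertools.count(1)


def encode_request(method: str, params: dict, req_id: int) -> bytes:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line.

    Newline-delimited framing is an assumption, not documented hud behavior.
    """
    message = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    return (json.dumps(message) + "\n").encode("utf-8")


def decode_response(line: bytes, req_id: int):
    """Parse one JSON-RPC 2.0 response line and return its result (or raise its error)."""
    response = json.loads(line)
    if response.get("id") != req_id:
        raise ValueError(f"response id {response.get('id')!r} does not match request {req_id}")
    if "error" in response:
        raise RuntimeError(f"JSON-RPC error: {response['error']}")
    return response.get("result")


def rpc_call(method: str, params: dict, host: str = "127.0.0.1", port: int = 8765,
             timeout: float = 10.0):
    """Send one request to a served control channel and wait for one response line.

    host/port default to the documented `hud serve` defaults.
    """
    req_id = next(_next_id)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_request(method, params, req_id))
        with sock.makefile("rb") as reader:
            line = reader.readline()
    if not line:
        raise ConnectionError("server closed the connection without replying")
    return decode_response(line, req_id)
```

With `hud serve` running on its defaults, a call would look like `rpc_call("ping", {})`, where `ping` is a hypothetical method name rather than a documented one.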
hud deploy
Build and publish to HUD infra in one step. The environment’s name comes from the `Environment(...)` declaration in code; deploying the same name again rebuilds that environment.
| Option | Description |
|---|---|
| `--all`, `-a` | Deploy all environments in the directory. |
| `--env`, `-e` | Env var `KEY=VALUE` (repeatable). |
| `--env-file` | Path to a `.env` file. |
Evaluate
hud eval
The primary local iteration loop: run an agent over a task source (`.py` file, directory, or JSON/JSONL), grade the result, and print the reward. Each rollout gets a fresh env subprocess, so no state is shared between tasks.
For a local run you don't need:
- A HUD API key: local evals don’t hit the platform.
- `hud serve` running: `hud eval` spawns the env subprocess for you.
- Docker, unless your env explicitly uses `DockerRuntime`.
- An SSH connection: the gateway timeout only applies when `env.workspace()` is declared.
To evaluate a platform taskset instead, pass its name, e.g. `hud eval "My Tasks" claude`. The tasks are fetched from the platform, and because the env source is not on disk, the rollouts run remotely by default.
Single-task runs show step-by-step progress (step number + tool calls). Multi-task batches are silent unless --verbose is passed.
| Option | Description |
|---|---|
| `--full` | Run the whole dataset (`--all --auto-respond --max-steps 100`). |
| `--all` | Run every task instead of just the first. |
| `--model`, `-m` | Model id. |
| `--gateway`, `-g` | Force LLM calls through the HUD gateway. Implied when only `HUD_API_KEY` is set (no provider key); pass it to force the gateway when a provider key is also present. |
| `--group` (alias `--group-size`) | Runs per task: a group of repeats whose reward spread you can inspect. |
| `--max-concurrent` | Cap parallel rollouts. |
| `--max-steps` | Cap steps per task (default 10). |
| `--task-ids` | Comma-separated slugs or 0-based indices. |
| `--config`, `-c` | Agent config `key=value` (repeatable). |
| `--verbose`, `-v` | Show agent logs (step progress, tool calls) for batch runs too. |
| `--very-verbose`, `-vv` | Debug-level logs. |
| `--runtime` | Placement: `local`, `hud` (HUD runtime tunnel), or `tcp://host:port`. Defaults to `local` for a tasks file; platform tasksets default to remote hosted execution. |
| `--remote` | Run the whole rollout remotely on the HUD platform. |
| `--yes`, `-y` | Skip confirmation prompt. |
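`--task-ids` mixes slugs and 0-based indices in one comma-separated list. A minimal sketch of how such a selector could be resolved against tasks in source order; the rule that an all-digit token is an index (anything else a slug), the de-duplication, the function name, and the example slugs are all assumptions for illustration.

```python
def resolve_task_ids(spec: str, slugs: list[str]) -> list[str]:
    """Resolve a comma-separated --task-ids value against task slugs in source order.

    Assumption: an all-digit token is a 0-based index; anything else is a slug.
    """
    selected: list[str] = []
    for token in (t.strip() for t in spec.split(",")):
        if not token:
            continue  # tolerate trailing or doubled commas
        if token.isdecimal():
            index = int(token)
            if index >= len(slugs):
                raise IndexError(f"index {index} out of range for {len(slugs)} tasks")
            slug = slugs[index]
        elif token in slugs:
            slug = token
        else:
            raise KeyError(f"unknown task slug {token!r}")
        if slug not in selected:  # assumed: keep first occurrence, drop duplicates
            selected.append(slug)
    return selected
```

With hypothetical tasks `["login", "checkout", "search"]`, the value `0,search` selects `login` and `search`, while `1` selects `checkout`, the second task, because indices start at 0.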
Run a packaged image
`hud task start` and `hud task grade` attach to an env that is already serving locally (e.g. inside a built image, or alongside `hud serve`), or load one from source with `--source`. `hud task list` always reads from source (default `.`) and never attaches.
| Command | Key options |
|---|---|
| `hud task start <task>` | `--source`/`-s`, `--args` (JSON), `--url`/`-u`, `--out`/`-o` |
| `hud task grade <task>` | `--answer`, `--answer-file`, `--source`, `--args`, `--url`, `--out` |
| `hud task list` | `--source`/`-s` |
Platform
Tasksets — no conversion step. See Harbor interop.
Other commands
| Command | Description |
|---|---|
hud set KEY=VALUE | Persist credentials/vars to ~/.hud/.env. |
hud login | Authenticate with HUD. |
hud models list | List gateway models. |
hud models fork <model> --name <slug> | Fork a trainable model from an existing one. |
hud models checkpoints <model> | List a model’s checkpoint tree. |
hud models head <model> [--set <checkpoint-id>] | Show — or set (rollback/select) — a model’s active checkpoint. |
hud cancel | Cancel a running job. |
hud version | Show the CLI version. |