Skip to main content
Install the CLI with uv tool install hud-python --python 3.12. Authenticate once with hud set HUD_API_KEY=....

Build & iterate

hud init

Scaffold a new environment package: env.py (tasks + capabilities), tasks.py, Dockerfile.hud, and pyproject.toml. Purely local — no network, no API key.
hud init my-env                 # create ./my-env
hud init my-env --dir envs      # create ./envs/my-env
OptionDescription
--dir, -dParent directory (default .).
--force, -fOverwrite existing files.

hud serve

Serve an environment’s control channel locally (tcp JSON-RPC). hud dev is a deprecated alias.
hud serve               # auto-detect env.py
hud serve env:env       # explicit module:attribute
hud serve env.py -p 9000
OptionDefaultDescription
--port, -p8765Port to serve on.
--host127.0.0.1Interface to bind (use 0.0.0.0 inside containers).
--verbose, -vDetailed logs.

hud deploy

Build and publish to HUD infra in one step. The environment’s name comes from the Environment(...) declaration in code; deploying the same name again rebuilds that environment.
hud deploy
OptionDescription
--all, -aDeploy all environments in the directory.
--env, -eEnv var KEY=VALUE (repeatable).
--env-filePath to a .env file.

Evaluate

hud eval

The primary local iteration loop: run an agent over a task source (.py, directory, or JSON/JSONL), grade the result, and print the reward. Each rollout gets a fresh subprocess for the env — no shared state between tasks.
hud eval env.py claude              # one task, one rollout
hud eval env.py haiku               # cheaper model for fast iteration
hud eval env.py claude --max-steps 30
hud eval env.py claude --all        # every task, not just the first
hud eval env.py claude --full       # every task, auto-respond, 100 steps
What you don’t need for a local run:
  • A HUD API key — local evals don’t hit the platform
  • hud serve running — hud eval spawns the env subprocess for you
  • Docker — unless your env explicitly uses DockerRuntime
  • An SSH connection — the gateway timeout only applies when env.workspace() is declared
For a platform taskset, pass its name or id directly: hud eval "My Tasks" claude. The tasks are fetched from the platform and the rollouts run remotely by default, since the env source is not on disk. Single-task runs show step-by-step progress (step number + tool calls). Multi-task batches are silent unless --verbose is passed.
OptionDescription
--fullRun the whole dataset (--all --auto-respond --max-steps 100).
--allRun every task instead of just the first.
--model, -mModel id.
--gateway, -gForce LLM calls through the HUD gateway. Implied when only HUD_API_KEY is set (no provider key); pass it to force the gateway when a provider key is also present.
--group (alias --group-size)Runs per task — a group of repeats whose reward spread you can inspect.
--max-concurrentCap parallel rollouts.
--max-stepsCap steps per task (default 10).
--task-idsComma-separated slugs or 0-based indices.
--config, -cAgent config key=value (repeatable).
--verbose, -vShow agent logs (step progress, tool calls) for batch runs too.
--very-verbose, -vvDebug-level logs.
--runtimePlacement: local, hud (HUD runtime tunnel), or tcp://host:port. Defaults to local for a tasks file; platform tasksets default to remote hosted execution.
--remoteRun the whole rollout remotely on the HUD platform.
--yes, -ySkip confirmation prompt.

Run a packaged image

hud task start / hud task grade attach to an env already serving locally (e.g. inside a built image, or alongside hud serve), or load one from source with --source. hud task list always reads from source (default .) — it doesn’t attach.
hud task list                          # what tasks are exposed
hud task start fix_bug                 # -> the prompt (stdout)
hud task grade fix_bug --answer "…"    # -> the reward (stdout)
CommandKey options
hud task start <task>--source/-s, --args (JSON), --url/-u, --out/-o
hud task grade <task>--answer, --answer-file, --source, --args, --url, --out
hud task list--source/-s

Platform

hud sync tasks my-taskset      # publish tasks as a named taskset
hud sync env                   # sync environment metadata
External benchmark formats (currently Harbor) load directly into the runtime as Tasksets — no conversion step. See Harbor interop.

Other commands

CommandDescription
hud set KEY=VALUEPersist credentials/vars to ~/.hud/.env.
hud loginAuthenticate with HUD.
hud models listList gateway models.
hud models fork <model> --name <slug>Fork a trainable model from an existing one.
hud models checkpoints <model>List a model’s checkpoint tree.
hud models head <model> [--set <checkpoint-id>]Show — or set (rollback/select) — a model’s active checkpoint.
hud cancelCancel a running job.
hud versionShow the CLI version.

See also

Quickstart

Package & deploy