HUD Documentation — Evaluations and RL Environments.

Install the CLI with uv tool install hud-python --python 3.12. Authenticate once with hud set HUD_API_KEY=....

Build & iterate

`hud init`

Scaffold a new environment package: env.py (tasks + capabilities), tasks.py, Dockerfile.hud, and pyproject.toml. Purely local — no network, no API key.

hud init my-env                 # create ./my-env
hud init my-env --dir envs      # create ./envs/my-env

Option	Description
`--dir`, `-d`	Parent directory (default `.`).
`--force`, `-f`	Overwrite existing files.

`hud serve`

Serve an environment’s control channel locally (tcp JSON-RPC). hud dev is a deprecated alias.

hud serve               # auto-detect env.py
hud serve env:env       # explicit module:attribute
hud serve env.py -p 9000

Option	Default	Description
`--port`, `-p`	`8765`	Port to serve on.
`--host`	`127.0.0.1`	Interface to bind (use `0.0.0.0` inside containers).
`--verbose`, `-v`	—	Detailed logs.

`hud deploy`

Build and publish to HUD infra in one step. The environment’s name comes from the Environment(...) declaration in code; deploying the same name again rebuilds that environment.

hud deploy

Option	Description
`--all`, `-a`	Deploy all environments in the directory.
`--env`, `-e`	Env var `KEY=VALUE` (repeatable).
`--env-file`	Path to a `.env` file.

Evaluate

`hud eval`

The primary local iteration loop: run an agent over a task source (.py, directory, or JSON/JSONL), grade the result, and print the reward. Each rollout gets a fresh subprocess for the env — no shared state between tasks.

hud eval env.py claude              # one task, one rollout
hud eval env.py haiku               # cheaper model for fast iteration
hud eval env.py claude --max-steps 30
hud eval env.py claude --all        # every task, not just the first
hud eval env.py claude --full       # every task, auto-respond, 100 steps

What you don’t need for a local run:

A HUD API key — local evals don’t hit the platform
hud serve running — hud eval spawns the env subprocess for you
Docker — unless your env explicitly uses DockerRuntime
An SSH connection — the gateway timeout only applies when env.workspace() is declared

For a platform taskset, pass its name or id directly: hud eval "My Tasks" claude. The tasks are fetched from the platform and the rollouts run remotely by default, since the env source is not on disk. Single-task runs show step-by-step progress (step number + tool calls). Multi-task batches are silent unless --verbose is passed.

Option	Description
`--full`	Run the whole dataset (`--all --auto-respond --max-steps 100`).
`--all`	Run every task instead of just the first.
`--model`, `-m`	Model id.
`--gateway`, `-g`	Force LLM calls through the HUD gateway. Implied when only `HUD_API_KEY` is set (no provider key); pass it to force the gateway when a provider key is also present.
`--group` (alias `--group-size`)	Runs per task — a group of repeats whose reward spread you can inspect.
`--max-concurrent`	Cap parallel rollouts.
`--max-steps`	Cap steps per task (default 10).
`--task-ids`	Comma-separated slugs or 0-based indices.
`--config`, `-c`	Agent config `key=value` (repeatable).
`--verbose`, `-v`	Show agent logs (step progress, tool calls) for batch runs too.
`--very-verbose`, `-vv`	Debug-level logs.
`--runtime`	Placement: `local`, `hud` (HUD runtime tunnel), or `tcp://host:port`. Defaults to `local` for a tasks file; platform tasksets default to remote hosted execution.
`--remote`	Run the whole rollout remotely on the HUD platform.
`--yes`, `-y`	Skip confirmation prompt.

Run a packaged image

hud task start / hud task grade attach to an env already serving locally (e.g. inside a built image, or alongside hud serve), or load one from source with --source. hud task list always reads from source (default .) — it doesn’t attach.

hud task list                          # what tasks are exposed
hud task start fix_bug                 # -> the prompt (stdout)
hud task grade fix_bug --answer "…"    # -> the reward (stdout)

Command	Key options
`hud task start <task>`	`--source`/`-s`, `--args` (JSON), `--url`/`-u`, `--out`/`-o`
`hud task grade <task>`	`--answer`, `--answer-file`, `--source`, `--args`, `--url`, `--out`
`hud task list`	`--source`/`-s`

Platform

hud sync tasks my-taskset      # publish tasks as a named taskset
hud sync env                   # sync environment metadata

External benchmark formats (currently Harbor) load directly into the runtime as Tasksets — no conversion step. See Harbor interop.

Other commands

Command	Description
`hud set KEY=VALUE`	Persist credentials/vars to `~/.hud/.env`.
`hud login`	Authenticate with HUD.
`hud models list`	List gateway models.
`hud models fork <model> --name <slug>`	Fork a trainable model from an existing one.
`hud models checkpoints <model>`	List a model’s checkpoint tree.
`hud models head <model> [--set <checkpoint-id>]`	Show — or set (rollback/select) — a model’s active checkpoint.
`hud cancel`	Cancel a running job.
`hud version`	Show the CLI version.

CLI

Build & iterate

`hud init`

`hud serve`

`hud deploy`

Evaluate

`hud eval`

Run a packaged image

Platform

Other commands

See also

Quickstart

Package & deploy

​Build & iterate

​hud init

​hud serve

​hud deploy

​Evaluate

​hud eval

​Run a packaged image

​Platform

​Other commands

​See also

Quickstart

Package & deploy

Build & iterate

`hud init`

`hud serve`

`hud deploy`

Evaluate

`hud eval`

Run a packaged image

Platform

Other commands

See also