
Goals

  • Understand what Harbor is and how its tasks map to OpenReward environments.
  • Install and use the harbor2or CLI tool.
  • Convert a Harbor task source into a deployable ORS environment.
  • Build, test, and deploy the generated environment to OpenReward.

Prerequisites

What is Harbor?

Harbor is a framework for defining agent tasks using a standardized directory structure. Each Harbor task includes instructions, a container environment (Dockerfile or pre-built image), verification tests, and optional oracle solutions. The harbor2or CLI tool converts Harbor task repositories into ORS environments that can be deployed on OpenReward. It handles downloading tasks, generating environment server code, building Docker images, and validating tasks with oracle agents.

Installation

Install harbor2or from GitHub:
pip install git+https://github.com/OpenRewardAI/harbor2or.git
If you plan to use HuggingFace datasets as a source, install the optional dependencies:
pip install huggingface_hub pyarrow

Quick Start

1. Create the environment

Point harbor2or create at a Harbor task source. This can be a GitHub repository, a HuggingFace dataset, or a local directory:
harbor2or create <source> <name>
For example, to convert Terminal-Bench into an OpenReward environment:
# From GitHub
harbor2or create https://github.com/laude-institute/terminal-bench-2 terminal-bench

# Or from HuggingFace
harbor2or create zai-org/terminal-bench-2-verified terminal-bench
This generates a complete ORS environment in ./terminal-bench/ containing:
  • server.py — the environment server with tools and task logic
  • Dockerfile and requirements.txt — for containerised deployment
  • splits.json and tasks.txt — task metadata and split mappings
You can customise the output with flags:
harbor2or create <source> <name> \
  --output ./custom-dir \
  --image-prefix myorg/env-name \
  --environment MyOrg/MyEnvironment

2. Build Docker images

Build the Docker images for each task locally:
harbor2or build terminal-bench --local
This builds and pushes an image per task, then caches the image digest in each task’s sha.txt file.
For production workloads, harbor2or build also supports Google Cloud Build for parallel, remote image building. See the harbor2or README for details.

3. Test with the oracle agent

Run the oracle agent on tasks that include a solution/solve.sh file:
export OPENREWARD_API_KEY='your-api-key'
harbor2or test terminal-bench --concurrency 4 --verbose
This starts a sandbox for each task, uploads and runs the oracle solution, then reports pass/fail results with a summary.
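The effect of a concurrency limit such as --concurrency 4 can be sketched with a semaphore that caps how many task runs are in flight at once. This is only an illustration of the general pattern, not harbor2or's implementation: run_all, summarise, and the run_one callback (which stands in for "start a sandbox, run the oracle solution, return pass/fail") are hypothetical names.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def run_all(
    task_names: list[str],
    run_one: Callable[[str], Awaitable[bool]],  # hypothetical per-task runner
    concurrency: int = 4,
) -> dict[str, bool]:
    """Run `run_one` for every task with at most `concurrency` in flight."""
    gate = asyncio.Semaphore(concurrency)

    async def guarded(name: str) -> tuple[str, bool]:
        async with gate:
            try:
                return name, await run_one(name)
            except Exception:
                # Count a crashed run as a failure instead of aborting the batch.
                return name, False

    return dict(await asyncio.gather(*(guarded(n) for n in task_names)))


def summarise(results: dict[str, bool]) -> str:
    """Build a one-line pass/fail summary."""
    passed = sum(results.values())
    return f"{passed}/{len(results)} tasks passed"
```

Catching exceptions per task keeps one broken task from hiding the results of the others, which is usually what you want from a batch report.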

4. Deploy to OpenReward

Push the generated environment to GitHub and connect it via the OpenReward dashboard:
cd terminal-bench
git init && git add -A && git commit -m "Initial environment"
# Make sure the branch is named main; older Git defaults to master
git branch -M main
git remote add origin https://github.com/yourorg/terminal-bench.git
git push -u origin main
Then follow the GitHub deployment guide to connect the repository and trigger your first build on OpenReward.

Generated Tools

The generated environment exposes the following tools to the agent:
Tool           Description
bash           Run shell commands in the sandbox
str_replace    Edit files via unique string replacement
view           View file contents or directory listings
create_file    Create new files with content
submit_answer  Run verification and return reward
When an agent calls submit_answer, the environment executes the tests/test.sh script from the original Harbor task to verify the result. The reward is read from /logs/verifier/reward.txt in the sandbox.
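As a rough sketch of the last step, the contents of the reward file could be turned into a number as follows. It assumes the file holds a single numeric value and that anything unparseable should count as zero reward. Both are assumptions made for illustration, and parse_reward is a hypothetical helper, not part of the generated server.py.

```python
import math

REWARD_PATH = "/logs/verifier/reward.txt"  # path stated in this guide


def parse_reward(raw: str, default: float = 0.0) -> float:
    """Parse reward file contents, assuming a single numeric value.

    Falls back to `default` when the text is empty, non-numeric, or
    not finite (assumed behaviour, chosen to fail closed).
    """
    try:
        value = float(raw.strip())
    except ValueError:
        return default
    return value if math.isfinite(value) else default
```

Falling back to a default rather than raising means a test script that crashes before writing a reward yields a failed attempt instead of a server error.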

Supported Sources

The harbor2or create command accepts several source formats:
  • GitHub repository: https://github.com/org/repo
  • HuggingFace dataset URL: https://huggingface.co/datasets/org/name
  • HuggingFace short ID: org/name
  • Local directory: ./path/to/tasks
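To make the four formats concrete, here is a minimal sketch of how a source string might be classified. It is not harbor2or's actual detection logic: classify_source and the short-ID pattern are hypothetical, and the rule that an existing directory wins over a HuggingFace short ID is an assumption.

```python
import re
from pathlib import Path

# Hypothetical pattern for "org/name" short IDs; the real rules may differ.
_HF_SHORT_ID = re.compile(r"^[A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*$")


def classify_source(source: str) -> str:
    """Return 'github', 'huggingface', or 'local' for a source string."""
    if source.startswith("https://github.com/"):
        return "github"
    if source.startswith("https://huggingface.co/datasets/"):
        return "huggingface"
    # Explicit path prefixes or an existing directory are treated as local.
    if source.startswith(("./", "../", "/", "~")) or Path(source).is_dir():
        return "local"
    if _HF_SHORT_ID.match(source):
        return "huggingface"
    raise ValueError(f"Unrecognised source: {source!r}")
```

Writing local paths with a leading ./ avoids any ambiguity between a relative path such as org/name and a HuggingFace short ID of the same shape.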
Harbor tasks are auto-detected within the source. Each valid task must contain an instruction.md, task.toml, tests/test.sh, and either an environment/Dockerfile or a docker_image reference in task.toml.

Next Steps