Goals
- Understand what Verifiers is and how its environments map to OpenReward.
- Install and use the verifiers2or CLI tools.
- Convert a Verifiers environment into a deployable ORS environment.
- Build, test, and deploy the generated environment to OpenReward.
Prerequisites
- An OpenReward account and API key
- Python 3.11+
- A Verifiers environment package to convert (e.g., wordle)
- Familiarity with environments and deployment
What is Verifiers?
Verifiers is Prime Intellect's library for creating environments to train and evaluate LLMs. Each environment is a self-contained Python module that packages a dataset of task inputs, a harness for tools and context management, and a reward function for scoring. Verifiers environments support reinforcement learning training, capability evaluation, synthetic data generation, and agent experimentation.

The key architectural difference from ORS is Cartesian separation. In Verifiers, the environment and agent are entangled: the environment owns the agent loop, calls the model, parses its raw text output, and scores the episode. Environment and agent are one fused unit, so changing either means rewriting the other. ORS decouples these completely: the environment is an HTTP server that exposes tools and returns structured feedback, while the agent is an independent process that calls those tools. Neither knows the other's internals, so any agent can be paired with any environment; they are orthogonal axes, not coupled components.

- Verifiers: Environment ↔ Agent are entangled. The environment drives the loop, calls the model, and parses raw text.
- ORS: Environment ⊥ Agent are separated. The environment exposes tools via HTTP; the agent drives the loop and calls tools.
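To make the contrast concrete, here is a minimal runnable sketch in plain Python. All the names below (fake_model, OrsStyleEnv, guess_tool) are toy stand-ins invented for illustration, not the real Verifiers or ORS APIs: the Verifiers-style function drives the loop and parses raw model text, while the ORS-style class only exposes a tool that receives structured input from an agent it knows nothing about.

```python
# Toy illustration of the two coupling styles; none of these names
# come from the real Verifiers or ORS libraries.

def fake_model(prompt: str) -> str:
    # Stand-in for an LLM call: always answers with the same raw text.
    return "<guess>crane</guess>"

# --- Verifiers-style: the environment owns the loop and parses text ---
def verifiers_style_episode() -> float:
    raw = fake_model("Guess a 5-letter word.")  # env calls the model itself
    guess = raw.removeprefix("<guess>").removesuffix("</guess>")  # env parses raw text
    return 1.0 if guess == "crane" else 0.0     # env scores the episode

# --- ORS-style: the environment only exposes a tool; the agent drives ---
class OrsStyleEnv:
    def guess_tool(self, word: str) -> dict:
        # Structured input in, structured feedback out; no model calls here.
        correct = word == "crane"
        return {"feedback": "correct" if correct else "wrong",
                "reward": 1.0 if correct else 0.0,
                "finished": True}

def agent_drives(env: OrsStyleEnv) -> dict:
    # The agent (a separate process in real ORS) decides when to call tools.
    return env.guess_tool("crane")

print(verifiers_style_episode())    # 1.0
print(agent_drives(OrsStyleEnv()))  # {'feedback': 'correct', 'reward': 1.0, 'finished': True}
```

Swapping the agent in the second style means replacing agent_drives; the environment is untouched, which is the orthogonality the bullets above describe.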
The verifiers2or toolkit bridges these paradigms, letting you convert existing Verifiers environments into ORS-compatible servers.
Installation
Install verifiers2or from its GitHub repository.
Quick Start
Analyze the environment
Use the analyzer to inspect a Verifiers environment before converting. This outputs the environment type, system prompt, parser fields, reward functions, dataset structure, and more. The analyzer accepts installed packages (wordle, primeintellect/gsm8k) or local files (./path/to/my_env.py).
Convert the environment
You have two conversion options:

Option A: Quick wrap (zero-code)
Wrap the environment as an ORS server with one command. This starts an ORS server on http://localhost:8080 with tools, tasks, and splits automatically extracted from the Verifiers environment. Arguments can also be passed through to the underlying environment.

Option B: Generate scaffold (full control)
Generate ORS environment code for customisation. This creates:
- wordle_env.py — ORS Environment subclass with TODOs for game logic
- server.py — Server entry point
- test_wordle.py — Client test script
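For orientation, a scaffolded wordle_env.py might resemble the sketch below. This is an assumption about the general shape, not the actual scaffold output, and Environment, tool, and ToolOutput are defined locally as stand-ins so the snippet is self-contained; the real ORS SDK's API may differ.

```python
# Self-contained sketch; Environment, tool, and ToolOutput below are
# local stand-ins, NOT the real ORS SDK.
from dataclasses import dataclass

@dataclass
class ToolOutput:          # stand-in for the ORS tool-output type
    content: str
    reward: float = 0.0
    finished: bool = False

def tool(fn):              # stand-in for the ORS @tool decorator
    fn.is_tool = True
    return fn

class Environment:         # stand-in for the ORS Environment base class
    def setup(self, task: dict) -> None: ...

class WordleEnv(Environment):
    def setup(self, task: dict) -> None:
        # TODO in the generated scaffold: load the secret word from the task.
        self.secret = task["answer"]
        self.turns = 0

    @tool
    def guess(self, word: str) -> ToolOutput:
        # TODO in the generated scaffold: port the Verifiers feedback logic.
        self.turns += 1
        if word == self.secret:
            return ToolOutput("correct", reward=1.0, finished=True)
        return ToolOutput("try again", reward=0.0, finished=self.turns >= 6)

env = WordleEnv()
env.setup({"answer": "crane"})
print(env.guess("crane"))  # ToolOutput(content='correct', reward=1.0, finished=True)
```

The scaffold's TODO markers sit exactly where game-specific logic goes: setup() for per-session state and the tool body for feedback and scoring.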
Test locally
Start the server and run the test script. Verify that splits and tasks are listed, tools appear with correct schemas, and episodes complete with finished=True and valid rewards.
Deploy to OpenReward
Push the generated environment to GitHub and connect it via the OpenReward dashboard. Then follow the GitHub deployment guide to connect the repository and trigger your first build on OpenReward.
Conversion Methods
| Method | Use Case | Output |
|---|---|---|
| wrap | Quick prototyping, testing | ORS server at runtime (no code generated) |
| scaffold | Production, customisation | Python files with TODOs for manual completion |
The wrap command reuses all original Verifiers logic (game state, feedback, scoring) at runtime. Use scaffold when you need full control over the ORS implementation.
Concept Mapping
This table shows how Verifiers concepts translate to ORS:
| Verifiers | ORS | Notes |
|---|---|---|
| load_environment() | __init__() + setup() | ORS creates one instance per session |
| Dataset (HuggingFace) | list_tasks(split) | Each row becomes a task JSON object |
| Rubric + reward funcs | ToolOutput.reward | Reward returned on terminal tool call |
| Parser (XMLParser) | Pydantic BaseModel | Structured tool input replaces text parsing |
| env_response() | @tool method | Agent calls tools instead of env parsing text |
| system_prompt | get_prompt() | Prompt delivered via API |
| State dict | Instance attributes | ORS env is instantiated per session |
| @vf.stop conditions | ToolOutput(finished=True) | Termination signaled via tool output |
| train/eval datasets | list_splits() + Split | Named splits with type annotation |
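The dataset and reward rows of the table can be sketched in a few lines of plain Python. Field names like "id", "input", and "answer" are illustrative assumptions, not the ORS task schema, and the dict returned by submit only mimics the shape of a terminal tool output.

```python
# How a HuggingFace-style dataset row might become an ORS task object,
# and how a rubric score becomes a reward on the terminal tool call.
# All field names here are illustrative assumptions.

rows = [  # e.g. what a Verifiers train split could contain
    {"question": "2 + 2 = ?", "answer": "4"},
    {"question": "3 * 3 = ?", "answer": "9"},
]

def list_tasks(split_rows):
    # Each dataset row becomes one task JSON object with a stable id.
    return [{"id": f"task-{i}", "input": r["question"], "answer": r["answer"]}
            for i, r in enumerate(split_rows)]

def reward_fn(answer: str, submitted: str) -> float:
    # Stand-in for a Verifiers rubric / reward function.
    return 1.0 if submitted.strip() == answer else 0.0

def submit(task: dict, submitted: str) -> dict:
    # Terminal tool call: the reward rides back in the tool output,
    # mirroring ToolOutput(finished=True) in the table above.
    return {"finished": True, "reward": reward_fn(task["answer"], submitted)}

tasks = list_tasks(rows)
print(tasks[0])               # {'id': 'task-0', 'input': '2 + 2 = ?', 'answer': '4'}
print(submit(tasks[0], "4"))  # {'finished': True, 'reward': 1.0}
```

Note the per-session framing from the table: in real ORS the task state would live on the environment instance rather than in a shared state dict.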
Supported Environment Types
| Type | Wrapper Support | Scaffold Support | Notes |
|---|---|---|---|
| SingleTurnEnv | Full | Full | One submit tool |
| MultiTurnEnv | Full | Full | Tool from parser fields |
| ToolEnv | Full | Full | One ORS tool per Verifiers tool |
| StatefulToolEnv | Full | Full | Same as ToolEnv |
| SandboxEnv | Partial | Template + TODOs | Requires manual sandbox config |

