
Goals

  • Understand what Verifiers is and how its environments map to OpenReward.
  • Install and use the verifiers2or CLI tools.
  • Convert a Verifiers environment into a deployable ORS environment.
  • Build, test, and deploy the generated environment to OpenReward.

Prerequisites

What is Verifiers?

Verifiers is Prime Intellect’s library for creating environments to train and evaluate LLMs. Each environment is a self-contained Python module that packages a dataset of task inputs, a harness for tools and context management, and a reward function for scoring. Verifiers environments support reinforcement learning training, capability evaluation, synthetic data generation, and agent experimentation.

The key architectural difference with ORS is Cartesian separation. In Verifiers, the environment and agent are entangled: the environment owns the agent loop, calls the model, parses its raw text output, and scores the episode. Environment and agent are one fused unit, so changing either means rewriting the other.

ORS decouples these completely. The environment is an HTTP server that exposes tools and returns structured feedback. The agent is an independent process that calls those tools. Neither knows the other’s internals. This means any agent can be paired with any environment - they are orthogonal axes, not coupled components.
  • Verifiers: Environment ↔ Agent are entangled. The environment drives the loop, calls the model, and parses raw text.
  • ORS: Environment ⊥ Agent are separated. The environment exposes tools via HTTP; the agent drives the loop and calls tools.
The verifiers2or toolkit bridges these paradigms, letting you convert existing Verifiers environments into ORS-compatible servers.
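The separation can be sketched in plain Python. The classes below are a toy in-process stand-in for the HTTP boundary (this is not the ORS SDK and none of these names come from it): the environment exposes a single tool and returns structured feedback, while the agent drives the loop and only ever sees tool results.

```python
# Toy model of the ORS contract: the environment exposes tools and
# structured feedback; the agent drives the loop. Neither side reads
# the other's internals.
from dataclasses import dataclass

@dataclass
class ToolResult:
    feedback: str
    reward: float
    finished: bool

class GuessEnv:
    """Stands in for an ORS environment server: one tool, structured output."""
    def __init__(self, answer: int):
        self._answer = answer  # hidden state; the agent never reads it

    def guess(self, value: int) -> ToolResult:
        # The only surface the agent sees: a tool call with typed input.
        if value == self._answer:
            return ToolResult("correct", reward=1.0, finished=True)
        hint = "higher" if value < self._answer else "lower"
        return ToolResult(hint, reward=0.0, finished=False)

def binary_search_agent(env: GuessEnv, lo: int = 0, hi: int = 100) -> float:
    """Stands in for an independent agent process: drives the loop via tools."""
    while True:
        mid = (lo + hi) // 2
        result = env.guess(mid)
        if result.finished:
            return result.reward
        if result.feedback == "higher":
            lo = mid + 1
        else:
            hi = mid - 1

print(binary_search_agent(GuessEnv(answer=37)))  # → 1.0
```

Because the agent only depends on the tool surface, swapping in a different agent (or a different environment behind the same tools) requires no changes on the other side.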

Installation

Install verifiers2or from GitHub:
pip install git+https://github.com/OpenRewardAI/verifiers2or.git
You also need the Verifiers library itself, plus the environment package you want to convert. Quote the version spec so the shell does not treat > as a redirect:
pip install "verifiers>=0.1.9"

Quick Start

1. Analyze the environment

Use the analyzer to inspect a Verifiers environment before converting:
python -m converter.analyzer wordle
This outputs the environment type, system prompt, parser fields, reward functions, dataset structure, and more:
=== Verifiers Environment Analysis ===
Class:        TextArenaEnv
Type:         MultiTurnEnv
System prompt: 'You are a competitive game player...'

--- Parser ---
Type:         XMLParser
Fields:       ['guess']
Answer field: guess

--- Reward Functions ---
  correct_answer (weight=1.0)
  partial_answer (weight=1.0)
  length_bonus (weight=1.0)
The analyzer accepts installed packages (wordle, primeintellect/gsm8k) or local files (./path/to/my_env.py).
2. Convert the environment

You have two conversion options.

Option A: Quick wrap (zero-code)

Wrap the environment as an ORS server with one command:
python -m converter.wrap wordle
This starts an ORS server on http://localhost:8080 with tools, tasks, and splits automatically extracted from the Verifiers environment.

To pass arguments to the environment:
python -m converter.wrap wordle --env-args '{"num_train_examples": 500}'
Option B: Generate scaffold (full control)

Generate ORS environment code for customisation:
python -m converter.scaffolder wordle -o ./my_wordle -n wordle
This creates:
  • wordle_env.py — ORS Environment subclass with TODOs for game logic
  • server.py — Server entry point
  • test_wordle.py — Client test script
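The generated wordle_env.py ships with TODOs rather than finished game logic. The sketch below shows roughly the shape such a class takes once filled in. ToolOutput here is a local stand-in named after the concepts used elsewhere in this guide (ToolOutput.reward, finished, get_prompt()), not the real ORS SDK, and the Wordle scoring is deliberately simplified (no duplicate-letter handling).

```python
# Illustrative shape of a filled-in scaffold; not the actual generated code.
from dataclasses import dataclass

@dataclass
class ToolOutput:  # stand-in for the ORS ToolOutput concept
    feedback: str
    reward: float = 0.0
    finished: bool = False

class WordleEnv:
    """One instance per session: per-episode state lives on the instance."""
    def __init__(self, answer: str = "crane", max_guesses: int = 6):
        self.answer = answer
        self.guesses_left = max_guesses

    def get_prompt(self) -> str:
        # Verifiers system_prompt becomes a prompt delivered via the API.
        return "Guess the five-letter word. G=right spot, Y=wrong spot, _=absent."

    def guess(self, word: str) -> ToolOutput:
        # Verifiers env_response() becomes a tool method with typed input.
        self.guesses_left -= 1
        marks = "".join(
            "G" if g == a else ("Y" if g in self.answer else "_")
            for g, a in zip(word, self.answer)
        )
        if word == self.answer:
            return ToolOutput(marks, reward=1.0, finished=True)
        return ToolOutput(marks, finished=self.guesses_left == 0)

env = WordleEnv()
print(env.guess("crate").feedback)  # → GGG_G
print(env.guess("crane").reward)    # → 1.0
```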
3. Test locally

Start the server and run the test script:
# Terminal 1: Start the server
cd ./my_wordle
python server.py

# Terminal 2: Run the test
python test_wordle.py
Verify that splits and tasks are listed, tools appear with correct schemas, and episodes complete with finished=True and valid rewards.
4. Deploy to OpenReward

Push the generated environment to GitHub and connect it via the OpenReward dashboard:
cd my_wordle
git init && git add -A && git commit -m "Initial environment"
git remote add origin https://github.com/yourorg/my-wordle.git
git push -u origin main
Then follow the GitHub deployment guide to connect the repository and trigger your first build on OpenReward.

Conversion Methods

| Method | Use Case | Output |
| --- | --- | --- |
| wrap | Quick prototyping, testing | ORS server at runtime (no code generated) |
| scaffold | Production, customisation | Python files with TODOs for manual completion |
The wrap command reuses all original Verifiers logic (game state, feedback, scoring) at runtime. Use scaffold when you need full control over the ORS implementation.
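Conceptually, wrapping is an adapter: the original Verifiers logic keeps running unchanged, and only the calling convention around it changes. A minimal sketch with stand-in class names (none of these are verifiers2or internals):

```python
# Adapter sketch: reuse Verifiers-style logic as-is, expose it as an
# ORS-style tool call that returns structured output.
from dataclasses import dataclass

class VerifiersStyleEnv:
    """Original logic: owns state, returns (observation, reward, done)."""
    def __init__(self, target: str):
        self.target = target
        self.turns = 0

    def env_response(self, guess: str):
        self.turns += 1
        done = guess == self.target or self.turns >= 3
        reward = 1.0 if guess == self.target else 0.0
        return f"turn {self.turns}", reward, done

@dataclass
class WrappedTool:
    """ORS-facing surface: structured tool output, original logic inside."""
    inner: VerifiersStyleEnv

    def submit(self, guess: str) -> dict:
        observation, reward, done = self.inner.env_response(guess)
        return {"feedback": observation, "reward": reward, "finished": done}

tool = WrappedTool(VerifiersStyleEnv(target="apple"))
print(tool.submit("grape"))  # {'feedback': 'turn 1', 'reward': 0.0, 'finished': False}
print(tool.submit("apple"))  # {'feedback': 'turn 2', 'reward': 1.0, 'finished': True}
```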

Concept Mapping

This table shows how Verifiers concepts translate to ORS:
| Verifiers | ORS | Notes |
| --- | --- | --- |
| load_environment() | __init__() + setup() | ORS creates one instance per session |
| Dataset (HuggingFace) | list_tasks(split) | Each row becomes a task JSON object |
| Rubric + reward funcs | ToolOutput.reward | Reward returned on terminal tool call |
| Parser (XMLParser) | Pydantic BaseModel | Structured tool input replaces text parsing |
| env_response() | @tool method | Agent calls tools instead of env parsing text |
| system_prompt | get_prompt() | Prompt delivered via API |
| State dict | Instance attributes | ORS env is instantiated per session |
| @vf.stop conditions | ToolOutput(finished=True) | Termination signaled via tool output |
| train/eval datasets | list_splits() + Split | Named splits with type annotation |
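The parser row is the most consequential change: instead of extracting a field like guess from raw completion text, an ORS tool receives typed arguments directly (via a Pydantic model in the real scaffold). A simplified, stdlib-only contrast of the two input styles (the XML extraction below is a stand-in, not the actual XMLParser implementation):

```python
# Verifiers-style: pull the answer field out of raw model text.
import re

def parse_guess_from_text(raw: str) -> str:
    match = re.search(r"<guess>(.*?)</guess>", raw, re.DOTALL)
    if match is None:
        raise ValueError("no <guess> field in model output")
    return match.group(1).strip()

# ORS-style: the tool schema guarantees a string argument; no parsing needed.
def guess_tool(guess: str) -> str:
    return guess

raw_completion = "I think the word is <guess>crane</guess>."
print(parse_guess_from_text(raw_completion))  # → crane
```

Moving the parsing burden into the tool schema means malformed model output surfaces as a failed tool call rather than a silently unscored episode.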

Supported Environment Types

| Type | Wrapper Support | Scaffold Support | Notes |
| --- | --- | --- | --- |
| SingleTurnEnv | Full | Full | One submit tool |
| MultiTurnEnv | Full | Full | Tool from parser fields |
| ToolEnv | Full | Full | One ORS tool per Verifiers tool |
| StatefulToolEnv | Full | Full | Same as ToolEnv |
| SandboxEnv | Partial | Template + TODOs | Requires manual sandbox config |

Next Steps