Goals
- Understand when and why to host multiple environments in a single project
- Define multiple Environment subclasses and serve them from one server
- Select a specific environment variant from the client
Prerequisites
Introduction
So far, each project we’ve built has had a single Environment class served by a single server. But sometimes you have a family of related environments that share logic, data, or infrastructure. For example, an arithmetic benchmark might have a basic variant (addition and subtraction) and a bitwise variant (AND, OR, XOR). These environments share common patterns such as task loading, answer verification, and data formats, so it makes sense to keep them in the same codebase rather than maintaining separate projects.
An ORS server can host multiple Environment classes. When it does, the relationship between server and environment is no longer one-to-one. Clients need to specify which variant they want to interact with.
Defining multiple environments
Let’s build two arithmetic environments that share a common AnswerParams model and the same answer-checking pattern but define their own task specs and task lists.
from pydantic import BaseModel

from openreward.environments import Environment, JSONObject, Server, Split, TextBlock, ToolOutput, tool


class AnswerParams(BaseModel):
    answer: str

# --- Basic Arithmetic ---

class BasicArithmeticTaskSpec(BaseModel):
    id: str
    problem: str
    answer: int


basic_tasks = [
    {"id": "0", "problem": "What is 7 + 3?", "answer": 10},
    {"id": "1", "problem": "What is 15 - 8?", "answer": 7},
]


class BasicArithmetic(Environment):
    """Addition and subtraction problems."""

    def __init__(self, task_spec: JSONObject = {}, secrets: dict[str, str] = {}):
        super().__init__(task_spec)
        self.config = BasicArithmeticTaskSpec.model_validate(task_spec)

    @classmethod
    def list_splits(cls):
        return [Split(name="train", type="train")]

    @classmethod
    def list_tasks(cls, split: str) -> list[JSONObject]:
        if split == "train":
            return basic_tasks
        raise ValueError(f"Unknown split: {split}")

    def get_prompt(self):
        return [TextBlock(type="text", text=self.config.problem)]

    @tool
    async def answer(self, params: AnswerParams) -> ToolOutput:
        """Submit your final answer."""
        try:
            is_correct = int(params.answer) == self.config.answer
        except ValueError:
            # Non-numeric submissions count as wrong rather than raising
            is_correct = False
        return ToolOutput(
            blocks=[TextBlock(type="text", text="Correct!" if is_correct else "Wrong!")],
            reward=1.0 if is_correct else 0.0,
            finished=True,
        )

# --- Bitwise Arithmetic ---

class BitwiseArithmeticTaskSpec(BaseModel):
    id: str
    problem: str
    answer: int


bitwise_tasks = [
    {"id": "0", "problem": "What is 5 AND 3? (bitwise)", "answer": 1},
    {"id": "1", "problem": "What is 5 OR 3? (bitwise)", "answer": 7},
    {"id": "2", "problem": "What is 5 XOR 3? (bitwise)", "answer": 6},
]


class BitwiseArithmetic(Environment):
    """Bitwise operation problems."""

    def __init__(self, task_spec: JSONObject = {}, secrets: dict[str, str] = {}):
        super().__init__(task_spec)
        self.config = BitwiseArithmeticTaskSpec.model_validate(task_spec)

    @classmethod
    def list_splits(cls):
        return [Split(name="train", type="train")]

    @classmethod
    def list_tasks(cls, split: str) -> list[JSONObject]:
        if split == "train":
            return bitwise_tasks
        raise ValueError(f"Unknown split: {split}")

    def get_prompt(self):
        return [TextBlock(type="text", text=self.config.problem)]

    @tool
    async def answer(self, params: AnswerParams) -> ToolOutput:
        """Submit your final answer."""
        try:
            is_correct = int(params.answer) == self.config.answer
        except ValueError:
            # Non-numeric submissions count as wrong rather than raising
            is_correct = False
        return ToolOutput(
            blocks=[TextBlock(type="text", text="Correct!" if is_correct else "Wrong!")],
            reward=1.0 if is_correct else 0.0,
            finished=True,
        )

Both environments share AnswerParams and follow the same structure. The only differences are the tasks and the domain.
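As a quick sanity check on the bitwise task answers listed above, the expected values can be recomputed with Python's built-in bitwise operators. This is only an illustrative sketch: the `expected_answer` helper and its regular expression are hypothetical and not part of openreward, and the task list is copied here so the snippet runs on its own.

```python
import operator
import re

# Copy of the bitwise task list from the environment definition above
bitwise_tasks = [
    {"id": "0", "problem": "What is 5 AND 3? (bitwise)", "answer": 1},
    {"id": "1", "problem": "What is 5 OR 3? (bitwise)", "answer": 7},
    {"id": "2", "problem": "What is 5 XOR 3? (bitwise)", "answer": 6},
]

# Map the operator words used in the prompts to Python's bitwise functions
OPS = {"AND": operator.and_, "OR": operator.or_, "XOR": operator.xor}


def expected_answer(problem: str) -> int:
    """Hypothetical helper: parse 'What is A OP B?' and compute A OP B."""
    match = re.match(r"What is (\d+) (AND|OR|XOR) (\d+)\?", problem)
    if match is None:
        raise ValueError(f"Unrecognized problem format: {problem!r}")
    left, op, right = match.groups()
    return OPS[op](int(left), int(right))


for task in bitwise_tasks:
    computed = expected_answer(task["problem"])
    print(task["problem"], "->", computed, "(stored:", task["answer"], ")")
```

A check like this is cheap to run whenever a hand-written task list changes.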
Serving multiple environments
To serve both environments from a single server, pass them as a list to Server:
ENVIRONMENTS = [
    BasicArithmetic,
    BitwiseArithmetic,
]

if __name__ == "__main__":
    Server(ENVIRONMENTS).run()
Each environment class is registered by its lowercased class name:
BasicArithmetic → "basicarithmetic"
BitwiseArithmetic → "bitwisearithmetic"
The first environment in the list is the default. If a client doesn’t specify a variant, it will interact with BasicArithmetic.
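The naming and default rules described above can be pictured with a small standalone sketch. It does not use or reproduce the openreward internals: `build_registry`, `resolve`, and the two stand-in classes are hypothetical and exist only to show how lowercased class names and a first-in-list default might behave.

```python
# Hypothetical sketch of a variant registry. NOT the openreward implementation.

class BasicArithmetic:
    """Stand-in for the Environment subclass defined earlier."""


class BitwiseArithmetic:
    """Stand-in for the Environment subclass defined earlier."""


def build_registry(env_classes):
    """Map each class's lowercased name to the class, preserving list order."""
    registry = {}
    for cls in env_classes:
        key = cls.__name__.lower()
        if key in registry:
            raise ValueError(f"Duplicate variant name: {key}")
        registry[key] = cls
    return registry


def resolve(registry, variant=None):
    """Return the requested variant, or the first registered class if none is given."""
    if variant is None:
        # Dicts preserve insertion order, so this is the first class in the list
        return next(iter(registry.values()))
    try:
        return registry[variant]
    except KeyError:
        raise KeyError(
            f"Unknown variant {variant!r}; available: {sorted(registry)}"
        ) from None


registry = build_registry([BasicArithmetic, BitwiseArithmetic])
print(list(registry))
print(resolve(registry).__name__)
print(resolve(registry, "bitwisearithmetic").__name__)
```

One consequence of lowercasing, under this sketch's assumptions, is that two classes whose names differ only in capitalization would collide, so it is worth giving variants clearly distinct class names.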
Run the server:
Selecting a variant from the client
When a server hosts a single environment, you don’t need to specify a variant:
# Single environment — no variant needed
environment = or_client.environments.get(name="gsm8k", base_url="http://localhost:8080")
When a server hosts multiple environments, you need to pass the variant parameter to target a specific one:
from openreward import OpenReward

or_client = OpenReward()

# Get the basic arithmetic variant
basic_env = or_client.environments.get(
    name="ArithmeticEnv",
    variant="basicarithmetic",
    base_url="http://localhost:8080"
)

# Get the bitwise arithmetic variant
bitwise_env = or_client.environments.get(
    name="ArithmeticEnv",
    variant="bitwisearithmetic",
    base_url="http://localhost:8080"
)
From here, each environment works exactly as before. You can list tasks, start sessions, and call tools independently:
# List tasks for each variant
basic_tasks = basic_env.list_tasks(split="train")
bitwise_tasks = bitwise_env.list_tasks(split="train")
print(f"Basic tasks: {len(basic_tasks)}")  # 2
print(f"Bitwise tasks: {len(bitwise_tasks)}")  # 3

# Run a session on the basic variant
with basic_env.session(task=basic_tasks[0]) as session:
    prompt = session.get_prompt()
    print(prompt[0].text)  # "What is 7 + 3?"
    result = session.call_tool("answer", {"answer": "10"})
    print(result.reward)  # 1.0

# Run a session on the bitwise variant
with bitwise_env.session(task=bitwise_tasks[0]) as session:
    prompt = session.get_prompt()
    print(prompt[0].text)  # "What is 5 AND 3? (bitwise)"
    result = session.call_tool("answer", {"answer": "1"})
    print(result.reward)  # 1.0
Organizing your code
The environment classes can live in the same file or in separate modules. For a small number of variants, a single file is fine. As the number of variants grows, splitting them into separate files keeps things manageable:
from openreward.environments import Server

from basic_arithmetic import BasicArithmetic
from bitwise_arithmetic import BitwiseArithmetic
from modular_arithmetic import ModularArithmetic
from roman_numeral_arithmetic import RomanNumeralArithmetic

ENVIRONMENTS = [
    BasicArithmetic,
    BitwiseArithmetic,
    ModularArithmetic,
    RomanNumeralArithmetic,
]

if __name__ == "__main__":
    Server(ENVIRONMENTS).run()
Shared logic, such as task spec models, grading utilities, and data loading, can go in common modules that each environment imports. This is the main benefit of keeping related environments in one project: you write the shared code once.
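As one possible illustration of such a shared module, the integer comparison that both answer tools repeat could be pulled into a single helper. This is a minimal sketch under assumptions: the module name `grading.py` and the function `grade_integer_answer` are hypothetical, not something openreward provides.

```python
# grading.py (hypothetical shared module imported by each environment file)

def grade_integer_answer(submitted: str, expected: int) -> tuple[bool, float]:
    """Compare a string submission to an integer answer.

    Returns (is_correct, reward). Submissions that cannot be parsed as an
    integer are treated as incorrect instead of raising.
    """
    try:
        is_correct = int(submitted) == expected
    except ValueError:
        is_correct = False
    return is_correct, (1.0 if is_correct else 0.0)


print(grade_integer_answer("10", 10))
print(grade_integer_answer("ten", 10))
```

Each environment's answer tool could then call this helper and build its ToolOutput from the returned pair, so a change to the grading rule happens in one place.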