Using Environment Variants

Goals

Understand when and why to host multiple environments in a single project
Define multiple Environment subclasses and serve them from one server
Select a specific environment variant from the client

Prerequisites

Completion of the Your First Environment tutorial

Introduction

So far, each project we’ve built has had a single Environment class served by a single server. But sometimes you have a family of related environments that share logic, data, or infrastructure. For example, an arithmetic benchmark might have a basic variant (addition and subtraction) and a bitwise variant (AND, OR, XOR). These environments share common patterns such as task loading, answer verification, and data formats, so it makes sense to keep them in the same codebase rather than maintaining separate projects.

An ORS server can host multiple Environment classes. When it does, the relationship between server and environment is no longer one-to-one, so clients need to specify which variant they want to interact with.

Defining multiple environments

Let’s build two arithmetic environments that share a common AnswerParams model but define their own task spec models and task lists.

from pydantic import BaseModel
from openreward.environments import Environment, JSONObject, Server, Split, TextBlock, ToolOutput, tool


class AnswerParams(BaseModel):
    answer: str


# --- Basic Arithmetic ---

class BasicArithmeticTaskSpec(BaseModel):
    id: str
    problem: str
    answer: int

basic_tasks = [
    {"id": "0", "problem": "What is 7 + 3?", "answer": 10},
    {"id": "1", "problem": "What is 15 - 8?", "answer": 7},
]

class BasicArithmetic(Environment):
    """Addition and subtraction problems."""

    def __init__(self, task_spec: JSONObject = {}, secrets: dict[str, str] = {}):
        super().__init__(task_spec)
        self.config = BasicArithmeticTaskSpec.model_validate(task_spec)

    @classmethod
    def list_splits(cls):
        return [Split(name="train", type="train")]

    @classmethod
    def list_tasks(cls, split: str) -> list[JSONObject]:
        if split == "train":
            return basic_tasks
        raise ValueError(f"Unknown split: {split}")

    def get_prompt(self):
        return [TextBlock(type="text", text=self.config.problem)]

    @tool
    async def answer(self, params: AnswerParams) -> ToolOutput:
        """Submit your final answer."""
        try:
            is_correct = int(params.answer) == self.config.answer
        except ValueError:
            is_correct = False

        return ToolOutput(
            blocks=[TextBlock(type="text", text="Correct!" if is_correct else "Wrong!")],
            reward=1.0 if is_correct else 0.0,
            finished=True,
        )


# --- Bitwise Arithmetic ---

class BitwiseArithmeticTaskSpec(BaseModel):
    id: str
    problem: str
    answer: int

bitwise_tasks = [
    {"id": "0", "problem": "What is 5 AND 3? (bitwise)", "answer": 1},
    {"id": "1", "problem": "What is 5 OR 3? (bitwise)", "answer": 7},
    {"id": "2", "problem": "What is 5 XOR 3? (bitwise)", "answer": 6},
]

class BitwiseArithmetic(Environment):
    """Bitwise operation problems."""

    def __init__(self, task_spec: JSONObject = {}, secrets: dict[str, str] = {}):
        super().__init__(task_spec)
        self.config = BitwiseArithmeticTaskSpec.model_validate(task_spec)

    @classmethod
    def list_splits(cls):
        return [Split(name="train", type="train")]

    @classmethod
    def list_tasks(cls, split: str) -> list[JSONObject]:
        if split == "train":
            return bitwise_tasks
        raise ValueError(f"Unknown split: {split}")

    def get_prompt(self):
        return [TextBlock(type="text", text=self.config.problem)]

    @tool
    async def answer(self, params: AnswerParams) -> ToolOutput:
        """Submit your final answer."""
        try:
            is_correct = int(params.answer) == self.config.answer
        except ValueError:
            is_correct = False

        return ToolOutput(
            blocks=[TextBlock(type="text", text="Correct!" if is_correct else "Wrong!")],
            reward=1.0 if is_correct else 0.0,
            finished=True,
        )

Both environments share AnswerParams and follow the same structure. The only differences are the tasks and the domain.
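As a quick sanity check on the task data, the expected answers in bitwise_tasks can be recomputed with Python's built-in bitwise operators. This is a standalone sketch, not part of the server; the BITWISE_OPS table and the checks list are illustrative names introduced here.

```python
import operator

# Map the operation names used in the problem text to Python's bitwise operators.
BITWISE_OPS = {
    "AND": operator.and_,
    "OR": operator.or_,
    "XOR": operator.xor,
}

# (left operand, operation, right operand, answer stored in bitwise_tasks)
checks = [
    (5, "AND", 3, 1),
    (5, "OR", 3, 7),
    (5, "XOR", 3, 6),
]

# Recompute each answer and compare it with the stored one.
results = [BITWISE_OPS[op](left, right) for left, op, right, _ in checks]
for (left, op, right, expected), got in zip(checks, results):
    assert got == expected, f"{left} {op} {right}: expected {expected}, got {got}"
```

A check like this could run once at import time or in a unit test, so that a typo in the task list is caught before any session is started.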

Serving multiple environments

To serve both environments from a single server, pass them as a list to Server:

ENVIRONMENTS = [
    BasicArithmetic,
    BitwiseArithmetic,
]

if __name__ == "__main__":
    Server(ENVIRONMENTS).run()

Each environment class is registered by its lowercased class name:

BasicArithmetic → "basicarithmetic"
BitwiseArithmetic → "bitwisearithmetic"

The first environment in the list is the default. If a client doesn’t specify a variant, it will interact with BasicArithmetic. Run the server:

python server.py
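The registration and default rules described above can be pictured with a small standalone sketch. It assumes that names are derived with __name__.lower() and that a missing variant falls back to the first class in the list; build_registry and resolve_variant are hypothetical helpers for illustration only, not the actual Server internals, and the two classes here are empty stand-ins.

```python
# Empty stand-ins for the real Environment subclasses (illustration only).
class BasicArithmetic:
    pass


class BitwiseArithmetic:
    pass


def build_registry(env_classes):
    """Map each class to its lowercased class name, preserving list order."""
    return {cls.__name__.lower(): cls for cls in env_classes}


def resolve_variant(registry, variant=None):
    """Return the class for a variant name, or the first class if none is given."""
    if variant is None:
        # dicts preserve insertion order, so this is the first class in the list
        return next(iter(registry.values()))
    if variant not in registry:
        raise ValueError(f"Unknown variant: {variant}")
    return registry[variant]


registry = build_registry([BasicArithmetic, BitwiseArithmetic])
print(list(registry))  # ['basicarithmetic', 'bitwisearithmetic']
```

Under these assumptions, resolve_variant(registry) returns BasicArithmetic and resolve_variant(registry, "bitwisearithmetic") returns BitwiseArithmetic. Note that two classes whose names differ only by case would collide under this naming scheme.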

Selecting a variant from the client

When a server hosts a single environment, you don’t need to specify a variant:

# Single environment: no variant needed.
# or_client is an OpenReward client, created as shown in the next snippet.
environment = or_client.environments.get(name="gsm8k", base_url="http://localhost:8080")

When a server hosts multiple environments, you need to pass the variant parameter to target a specific one:

from openreward import OpenReward

or_client = OpenReward()

# Get the basic arithmetic variant
basic_env = or_client.environments.get(
    name="ArithmeticEnv",
    variant="basicarithmetic",
    base_url="http://localhost:8080"
)

# Get the bitwise arithmetic variant
bitwise_env = or_client.environments.get(
    name="ArithmeticEnv",
    variant="bitwisearithmetic",
    base_url="http://localhost:8080"
)
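When a server hosts more than two variants, repeating the get call for each one becomes tedious. A minimal sketch of one way to collect the handles in a dictionary follows; get_variant_handles is a hypothetical helper, and it assumes only the environments.get(name=..., variant=..., base_url=...) call shown above.

```python
def get_variant_handles(client, name, variants, base_url):
    """Fetch one environment handle per variant name, keyed by variant.

    `client` is expected to expose environments.get(name=, variant=, base_url=),
    as the OpenReward client does in the snippet above.
    """
    return {
        variant: client.environments.get(
            name=name,
            variant=variant,
            base_url=base_url,
        )
        for variant in variants
    }


# Example call (requires a running server and an OpenReward client):
# envs = get_variant_handles(
#     or_client,
#     "ArithmeticEnv",
#     ["basicarithmetic", "bitwisearithmetic"],
#     "http://localhost:8080",
# )
```

The helper takes the client as an argument rather than creating it, which keeps it easy to exercise with a stub in tests.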

From here, each environment works exactly as before. You can list tasks, start sessions, and call tools independently:

# List tasks for each variant
basic_tasks = basic_env.list_tasks(split="train")
bitwise_tasks = bitwise_env.list_tasks(split="train")

print(f"Basic tasks: {len(basic_tasks)}")    # 2
print(f"Bitwise tasks: {len(bitwise_tasks)}") # 3

# Run a session on the basic variant
with basic_env.session(task=basic_tasks[0]) as session:
    prompt = session.get_prompt()
    print(prompt[0].text)  # "What is 7 + 3?"

    result = session.call_tool("answer", {"answer": "10"})
    print(result.reward)   # 1.0

# Run a session on the bitwise variant
with bitwise_env.session(task=bitwise_tasks[0]) as session:
    prompt = session.get_prompt()
    print(prompt[0].text)  # "What is 5 AND 3? (bitwise)"

    result = session.call_tool("answer", {"answer": "1"})
    print(result.reward)   # 1.0

Organizing your code

The environment classes can live in the same file or in separate modules. For a small number of variants, a single file is fine. As the number of variants grows, splitting them into separate files keeps things manageable:

from openreward.environments import Server

from basic_arithmetic import BasicArithmetic
from bitwise_arithmetic import BitwiseArithmetic
from modular_arithmetic import ModularArithmetic
from roman_numeral_arithmetic import RomanNumeralArithmetic

ENVIRONMENTS = [
    BasicArithmetic,
    BitwiseArithmetic,
    ModularArithmetic,
    RomanNumeralArithmetic,
]

if __name__ == "__main__":
    Server(ENVIRONMENTS).run()

Shared logic (task spec models, grading utilities, data loading) can go in common modules that each environment imports. This is the main benefit of keeping related environments in one project: you write the shared code once.
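For instance, the answer check that both arithmetic environments currently duplicate could move into a shared module. The sketch below is one possible shape, assuming a hypothetical grading.py module and a hypothetical helper named is_correct_int; neither is part of openreward.

```python
# grading.py (hypothetical shared module)

def is_correct_int(submitted: str, expected: int) -> bool:
    """Return True if the submitted string parses to the expected integer.

    Non-numeric input counts as a wrong answer instead of raising,
    mirroring the try/except used in both answer tools.
    """
    try:
        return int(submitted) == expected
    except ValueError:
        return False


# Inside each environment's answer tool, the check would then become:
#     is_correct = is_correct_int(params.answer, self.config.answer)
```

With this in place, a change to the grading rule (for example, accepting answers with surrounding text) would be made in one function rather than in every variant.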
