Different agent CLIs expose different tool surfaces. In OpenReward, you can expose agent-native tool surfaces server-side, backed by your environment’s sandbox. When a session is created with a toolset, the agent sees the exact tools it expects and every tool call executes against the sandbox.
Harness toolsets are session-scoped: they are passed when creating a session and their tools are merged with the environment’s own tools. The openreward package includes a registry of built-in harness toolsets that can be referenced by name:
from openreward import OpenRewardor_client = OpenReward()env = or_client.environments.get(name="MyOrg/MyEnv")tasks = env.list_tasks(split="train")# Pass a toolset by name — the session exposes agent-native toolswith env.session(task=tasks[0], toolset="claude-code") as session: tools = session.list_tools(format="anthropic") # tools now include: bash, glob, grep, read, write, edit, todo_write # plus any environment-specific tools (e.g. submit_answer) result = session.call_tool("bash", {"command": "ls /tmp"}) print(result.blocks[0].text)
When a session toolset defines a tool with the same name as an environment tool, the toolset tool takes precedence. A warning is logged for each shadowed tool. This lets harness toolsets override environment-level bash, read, etc. with their agent-specific implementations.
All harness toolsets extend the Toolset base class, which requires the environment to have a self.sandbox attribute. When the toolset is instantiated against an environment, it automatically extracts self.sandbox and uses it for all tool executions.If you’re building an environment that supports harness toolsets, ensure your environment sets up a sandbox:
When a client creates a session with toolset="claude-code", the ClaudeCodeToolset is instantiated with this environment and gains access to self.sandbox. All its tools (bash, read, write, etc.) execute commands against that sandbox.
Name:"hermes" · Tools: 5Exposes the Hermes Agent tool surface from Nous Research. Supports both targeted string replacement and V4A multi-file patches.
Tool
Description
terminal
Execute shell commands (default timeout 180s, max 600s)
read_file
Read files with LINE_NUM|CONTENT format, pagination via offset/limit
write_file
Write content to a file, creating parent directories
search_files
Search file contents (target="content") or find files by name (target="files")
patch
Replace mode (find-and-replace) or patch mode (V4A multi-file patches)
Example: Full Environment with Harness Toolset Support
Here is a complete environment that supports harness toolsets. The environment defines its own task-specific tools (submit_answer), while the harness toolset provides the agent’s coding tools:
from openreward import AsyncOpenReward, SandboxSettings, SandboxBucketConfigfrom openreward.environments import Environment, Split, tool, ToolOutput, TextBlockfrom pydantic import BaseModelclass AnswerParams(BaseModel): answer: strclass CodingChallenge(Environment): def __init__(self, task_spec, secrets): super().__init__(task_spec, secrets) or_client = AsyncOpenReward(api_key=secrets.get("api_key")) self.sandbox = or_client.sandbox(SandboxSettings( environment="MyOrg/CodingChallenge", image="generalreasoning/knowledge-worker:latest", machine_size="4:16", block_network=False, bucket_config=SandboxBucketConfig( mount_path="/tmp/datasets/", read_only=True, ) )) async def setup(self): await self.sandbox.start() async def teardown(self): await self.sandbox.stop() def get_prompt(self): task = self.task_spec return [TextBlock( text=f"Solve the following coding challenge:\n\n{task['problem']}\n\n" f"Submit your answer using the submit_answer tool." )] @tool async def submit_answer(self, params: AnswerParams) -> ToolOutput: """Submit your solution""" correct = params.answer.strip() == self.task_spec["expected"] return ToolOutput( blocks=[TextBlock(text="Correct!" if correct else "Incorrect.")], reward=1.0 if correct else 0.0, finished=True, ) @classmethod def list_tasks(cls, split: str): if split == "train": return [ {"id": "1", "problem": "Write a function that reverses a string.", "expected": "..."}, ] return [] @classmethod def list_splits(cls): return [Split(name="train", type="train"), Split(name="test", type="test")]
Clients can then evaluate this environment with any harness toolset:
from openreward import OpenRewardor_client = OpenReward()env = or_client.environments.get(name="MyOrg/CodingChallenge")tasks = env.list_tasks(split="train")# Evaluate with Claude Code's tool surfacewith env.session(task=tasks[0], toolset="claude-code") as session: tools = session.list_tools(format="anthropic") # Agent sees: bash, glob, grep, read, write, edit, todo_write, submit_answer# Evaluate with Codex's tool surfacewith env.session(task=tasks[0], toolset="codex") as session: tools = session.list_tools(format="openai") # Agent sees: bash, submit_answer# Evaluate with Hermes's tool surfacewith env.session(task=tasks[0], toolset="hermes") as session: tools = session.list_tools(format="anthropic") # Agent sees: terminal, read_file, write_file, search_files, patch, submit_answer