Introduction

Different agent CLIs expose different tool surfaces. In OpenReward, you can expose agent-native tool surfaces server-side, backed by your environment’s sandbox. When a session is created with a toolset, the agent sees the exact tools it expects and every tool call executes against the sandbox.

How Toolsets Compose into Sessions

Harness toolsets are session-scoped: they are passed when creating a session and their tools are merged with the environment’s own tools. The openreward package includes a registry of built-in harness toolsets that can be referenced by name:
from openreward import OpenReward

or_client = OpenReward()
env = or_client.environments.get(name="MyOrg/MyEnv")
tasks = env.list_tasks(split="train")

# Pass a toolset by name — the session exposes agent-native tools
with env.session(task=tasks[0], toolset="claude-code") as session:
    tools = session.list_tools(format="anthropic")
    # tools now include: bash, glob, grep, read, write, edit, todo_write
    # plus any environment-specific tools (e.g. submit_answer)

    result = session.call_tool("bash", {"command": "ls /tmp"})
    print(result.blocks[0].text)
When a session toolset defines a tool with the same name as an environment tool, the toolset tool takes precedence. A warning is logged for each shadowed tool. This lets harness toolsets override environment-level bash, read, etc. with their agent-specific implementations.
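
The precedence rule can be pictured as a dictionary merge in which toolset entries overwrite environment entries that share a name. The following is a minimal sketch of that idea, not the openreward implementation: `merge_tools`, the logger name, and the string placeholders standing in for tool objects are all hypothetical.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("toolset_merge_sketch")


def merge_tools(env_tools: dict, toolset_tools: dict) -> dict:
    """Merge two name -> tool mappings; toolset entries win on a name clash."""
    merged = dict(env_tools)
    for name, tool in toolset_tools.items():
        if name in merged:
            # One warning per shadowed environment tool.
            logger.warning("toolset tool %r shadows an environment tool", name)
        merged[name] = tool
    return merged


# Placeholder strings stand in for real tool objects.
env_tools = {"bash": "env-bash", "submit_answer": "env-submit"}
toolset_tools = {"bash": "claude-code-bash", "read": "claude-code-read"}
merged = merge_tools(env_tools, toolset_tools)
```

Here `merged["bash"]` is the toolset's implementation, while `submit_answer` is untouched because no toolset tool shares its name.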

Sandbox Requirement

All harness toolsets extend the Toolset base class, which requires the environment to expose a sandbox as self.sandbox. When a toolset is instantiated against an environment, it reads that attribute and uses the sandbox for every tool execution. If you’re building an environment that should support harness toolsets, make sure it sets up a sandbox:
from openreward import AsyncOpenReward, SandboxSettings
from openreward.environments import Environment, tool, ToolOutput, TextBlock

class MyEnv(Environment):
    def __init__(self, task_spec, secrets):
        super().__init__(task_spec, secrets)

        or_client = AsyncOpenReward(api_key=secrets.get("api_key"))
        self.sandbox = or_client.sandbox(SandboxSettings(
            environment="MyOrg/MyEnv",
            image="generalreasoning/knowledge-worker:latest",
            machine_size="0.5:1",
        ))

    async def setup(self):
        await self.sandbox.start()

    async def teardown(self):
        await self.sandbox.stop()

    # Environment-specific tools
    @tool
    async def submit_answer(self, params) -> ToolOutput:
        ...
When a client creates a session with toolset="claude-code", the ClaudeCodeToolset is instantiated with this environment and gains access to self.sandbox. All its tools (bash, read, write, etc.) execute commands against that sandbox.
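
To make the instantiation step concrete, here is a rough sketch of how a name-based registry and a sandbox check could fit together. Every name in it (`SketchToolset`, `SketchClaudeCodeToolset`, `REGISTRY`, `resolve_toolset`, and the two stand-in environment classes) is hypothetical; the real Toolset base class and registry may be structured differently.

```python
class SketchToolset:
    """Hypothetical stand-in for the Toolset base class."""

    def __init__(self, env):
        sandbox = getattr(env, "sandbox", None)
        if sandbox is None:
            raise AttributeError(
                f"{type(env).__name__} must set self.sandbox "
                "before a harness toolset can be attached"
            )
        # Every tool on the toolset would execute against this sandbox.
        self.sandbox = sandbox


class SketchClaudeCodeToolset(SketchToolset):
    name = "claude-code"


# Hypothetical registry: toolset name -> toolset class.
REGISTRY = {SketchClaudeCodeToolset.name: SketchClaudeCodeToolset}


def resolve_toolset(name: str, env) -> SketchToolset:
    """Look up a toolset class by name and bind it to an environment."""
    try:
        cls = REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown toolset {name!r}; known: {sorted(REGISTRY)}") from None
    return cls(env)


class EnvWithSandbox:
    def __init__(self):
        self.sandbox = object()  # placeholder for a real sandbox handle


class EnvWithoutSandbox:
    pass
```

Under these assumptions, `resolve_toolset("claude-code", EnvWithSandbox())` returns a toolset sharing the environment's sandbox, while an environment that never sets self.sandbox fails immediately instead of at the first tool call.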

Available Harness Toolsets

ClaudeCodeToolset

Name: "claude-code" · Tools: 7

Mirrors the Claude Code CLI’s built-in tool surface. Tool descriptions are sourced from Claude Code’s internal prompts.

| Tool | Description |
| --- | --- |
| bash | Execute shell commands with full bash support |
| glob | Find files matching glob patterns |
| grep | Search file contents with regex (ripgrep-style) |
| read | Read file contents with optional offset/limit pagination |
| write | Write content to a file, creating directories as needed |
| edit | Exact string replacement in files (unique match required unless replace_all) |
| todo_write | Manage a todo list for task planning and progress tracking |

with env.session(task=task, toolset="claude-code") as session:
    session.call_tool("bash", {"command": "python solve.py"})
    session.call_tool("read", {"file_path": "/home/ubuntu/output.txt"})
    session.call_tool("edit", {
        "file_path": "/home/ubuntu/solve.py",
        "old_string": "x = 1",
        "new_string": "x = 2"
    })
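
The "unique match required unless replace_all" rule for edit can be illustrated on an in-memory string. This is a sketch of the semantics as described in the table, under the assumption that a missing or ambiguous match is reported as an error; `exact_replace` is a hypothetical helper, not the toolset's code.

```python
def exact_replace(text: str, old: str, new: str, replace_all: bool = False) -> str:
    """Replace old with new, insisting on a unique match unless replace_all is set."""
    if not old:
        raise ValueError("old_string must not be empty")
    count = text.count(old)
    if count == 0:
        raise ValueError("old_string not found")
    if count > 1 and not replace_all:
        raise ValueError(
            f"old_string matches {count} times; add surrounding context "
            "or pass replace_all=True"
        )
    if replace_all:
        return text.replace(old, new)
    return text.replace(old, new, 1)


source = "x = 1\ny = 1\n"
edited = exact_replace(source, "x = 1", "x = 2")
```

With this input, replacing "x = 1" succeeds because it occurs once, whereas replacing "= 1" would be rejected unless replace_all is set, since it matches both lines.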

CodexToolset

Name: "codex" · Tools: 1

Codex prefers a single shell tool over discrete file tools. All file operations are done through bash.

| Tool | Description |
| --- | --- |
| bash | Run shell commands; use for all file operations |

with env.session(task=task, toolset="codex") as session:
    session.call_tool("bash", {"command": "cat solve.py"})
    session.call_tool("bash", {"command": "sed -i 's/x = 1/x = 2/' solve.py"})
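
Because every file write goes through a shell command under this toolset, quoting matters as soon as the content contains spaces or quotes. One possible way to build the arguments for the bash tool is sketched below; `bash_write_args` is a hypothetical client-side helper, and the choice of printf with a redirect is an assumption rather than anything the toolset prescribes.

```python
import shlex


def bash_write_args(path: str, content: str) -> dict:
    """Build bash tool arguments that write content to path verbatim."""
    # shlex.quote keeps spaces and quotes in content and path from being
    # reinterpreted by the shell; printf %s writes the argument as-is.
    command = f"printf %s {shlex.quote(content)} > {shlex.quote(path)}"
    return {"command": command}


args = bash_write_args("/home/ubuntu/solve.py", "print('x = 2')\n")
```

The resulting dictionary can be passed as the second argument to session.call_tool("bash", ...).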

GeminiCliToolset

Name: "gemini-cli" · Tools: 8

Matches the upstream Gemini CLI tool surface. Uses Gemini-specific tool names like run_shell_command and replace.

| Tool | Description |
| --- | --- |
| run_shell_command | Execute shell commands as bash -c <command> |
| glob | Find files matching glob patterns, sorted by modification time |
| grep_search | Search file contents with regex, max 100 matches |
| read_file | Read file contents with optional line range |
| write_file | Write content to a file |
| replace | Find-and-replace in files (unique match required unless allow_multiple) |
| list_directory | List files and subdirectories in a path |
| write_todos | Track subtasks for complex queries |

with env.session(task=task, toolset="gemini-cli") as session:
    session.call_tool("run_shell_command", {"command": "python solve.py"})
    session.call_tool("read_file", {"file_path": "/home/ubuntu/output.txt"})
    session.call_tool("replace", {
        "file_path": "/home/ubuntu/solve.py",
        "old_string": "x = 1",
        "new_string": "x = 2"
    })
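
The 100-match cap on grep_search means an agent may see only part of the hits for a broad pattern. The sketch below shows one way such a cap could behave, operating on in-memory file contents. It is illustrative only: `grep_capped`, the (path, line number, line) result shape, and the stop-at-the-cap behavior are assumptions, not the toolset's actual output format.

```python
import re


def grep_capped(files: dict, pattern: str, max_matches: int = 100) -> list:
    """Return (path, line_number, line) for matching lines, stopping at max_matches."""
    regex = re.compile(pattern)
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path, lineno, line))
                if len(hits) >= max_matches:
                    return hits  # cap reached; remaining matches are dropped
    return hits


files = {"a.py": "x = 1\n" * 150, "b.py": "y = 2\n"}
hits = grep_capped(files, r"x = \d")
```

If results look truncated, narrowing the pattern or the search path is usually more reliable than assuming the list is complete.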

OpenClawToolset

Name: "openclaw" · Tools: 6

Exposes the OpenClaw coding tool surface, including background process management and structured patch application.

| Tool | Description |
| --- | --- |
| exec | Execute shell commands with configurable timeout (default 1800s) |
| process | Manage background processes: list, poll, log, write stdin, kill, remove |
| read | Read file contents with optional offset/limit pagination |
| write | Write content to a file, creating parent directories |
| edit | Apply targeted text replacements with oldText/newText pairs |
| apply_patch | Apply structured patches with *** Begin Patch / *** End Patch markers |

with env.session(task=task, toolset="openclaw") as session:
    session.call_tool("exec", {"command": "python solve.py"})
    session.call_tool("edit", {
        "path": "/home/ubuntu/solve.py",
        "edits": [{"oldText": "x = 1", "newText": "x = 2"}]
    })
    session.call_tool("apply_patch", {
        "input": "*** Begin Patch\n*** Update File: solve.py\n- x = 1\n+ x = 2\n*** End Patch"
    })
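
Before sending a patch to apply_patch, a client may want to confirm that the envelope markers are in place and see which files the patch touches. The helper below is a minimal sketch of such a check. It relies only on the marker strings shown in the example above, and `patch_update_targets` is a hypothetical name, not part of the toolset.

```python
BEGIN = "*** Begin Patch"
END = "*** End Patch"
UPDATE_PREFIX = "*** Update File: "


def patch_update_targets(patch: str) -> list:
    """Validate the patch envelope and list the files named in Update File lines."""
    lines = patch.strip().splitlines()
    if not lines or lines[0] != BEGIN or lines[-1] != END:
        raise ValueError(f"patch must start with {BEGIN!r} and end with {END!r}")
    return [
        line[len(UPDATE_PREFIX):]
        for line in lines[1:-1]
        if line.startswith(UPDATE_PREFIX)
    ]


patch = "*** Begin Patch\n*** Update File: solve.py\n- x = 1\n+ x = 2\n*** End Patch"
targets = patch_update_targets(patch)
```

This only inspects the envelope; it does not parse or validate the hunks themselves.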

HermesToolset

Name: "hermes" · Tools: 5

Exposes the Hermes Agent tool surface from Nous Research. Supports both targeted string replacement and V4A multi-file patches.

| Tool | Description |
| --- | --- |
| terminal | Execute shell commands (default timeout 180s, max 600s) |
| read_file | Read files with LINE_NUM|CONTENT format, pagination via offset/limit |
| write_file | Write content to a file, creating parent directories |
| search_files | Search file contents (target="content") or find files by name (target="files") |
| patch | Replace mode (find-and-replace) or patch mode (V4A multi-file patches) |

with env.session(task=task, toolset="hermes") as session:
    session.call_tool("terminal", {"command": "python solve.py"})
    session.call_tool("search_files", {
        "pattern": "def main",
        "path": "/home/ubuntu",
        "file_glob": "*.py"
    })
    session.call_tool("patch", {
        "mode": "replace",
        "path": "/home/ubuntu/solve.py",
        "old_string": "x = 1",
        "new_string": "x = 2"
    })
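
The LINE_NUM|CONTENT format returned by read_file, combined with offset/limit pagination, can be pictured with a small formatter. This is a sketch under stated assumptions: that offset is a 1-based line number, that line numbers are not padded, and that a default limit of 200 is used here purely for illustration. `render_page` is a hypothetical helper.

```python
def render_page(text: str, offset: int = 1, limit: int = 200) -> str:
    """Format a window of lines as LINE_NUM|CONTENT, starting at a 1-based offset."""
    lines = text.splitlines()
    start = max(offset, 1) - 1  # convert the 1-based offset to a list index
    page = lines[start:start + limit]
    return "\n".join(f"{start + i + 1}|{line}" for i, line in enumerate(page))


page = render_page("a\nb\nc\nd", offset=2, limit=2)
```

Under these assumptions, an agent reading a long file would request successive pages by advancing offset by limit each time.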

Tool Comparison

| Capability | Claude Code | Codex | Gemini CLI | OpenClaw | Hermes |
| --- | --- | --- | --- | --- | --- |
| Shell execution | bash | bash | run_shell_command | exec | terminal |
| File read | read | via bash | read_file | read | read_file |
| File write | write | via bash | write_file | write | write_file |
| File edit | edit | via bash | replace | edit | patch (replace mode) |
| File search | grep | via bash | grep_search | via exec | search_files |
| File glob | glob | via bash | glob | via exec | search_files (target=files) |
| Directory listing | via bash | via bash | list_directory | via exec | search_files (target=files) |
| Task planning | todo_write | - | write_todos | - | - |
| Process management | - | - | - | process | - |
| Patch application | - | - | - | apply_patch | patch (patch mode) |
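
Client code that drives several harnesses often needs the right shell tool name for each one. The shell execution row of the comparison can be captured as a lookup table, as in the sketch below. The tool names and the "command" argument key come from the examples in this document; `SHELL_TOOL` and `shell_call` are hypothetical client-side helpers.

```python
# Toolset name -> shell tool name, taken from the shell execution row.
SHELL_TOOL = {
    "claude-code": "bash",
    "codex": "bash",
    "gemini-cli": "run_shell_command",
    "openclaw": "exec",
    "hermes": "terminal",
}


def shell_call(toolset: str, command: str) -> tuple:
    """Return the (tool_name, arguments) pair for running a command under a toolset."""
    return SHELL_TOOL[toolset], {"command": command}


name, args = shell_call("gemini-cli", "python solve.py")
```

The returned pair can be passed straight to session.call_tool(name, args), so the same driver code works across toolsets.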

Example: Full Environment with Harness Toolset Support

Here is a complete environment that supports harness toolsets. The environment defines its own task-specific tools (submit_answer), while the harness toolset provides the agent’s coding tools:
from openreward import AsyncOpenReward, SandboxSettings, SandboxBucketConfig
from openreward.environments import Environment, Split, tool, ToolOutput, TextBlock
from pydantic import BaseModel

class AnswerParams(BaseModel):
    answer: str

class CodingChallenge(Environment):
    def __init__(self, task_spec, secrets):
        super().__init__(task_spec, secrets)

        or_client = AsyncOpenReward(api_key=secrets.get("api_key"))
        self.sandbox = or_client.sandbox(SandboxSettings(
            environment="MyOrg/CodingChallenge",
            image="generalreasoning/knowledge-worker:latest",
            machine_size="4:16",
            block_network=False,
            bucket_config=SandboxBucketConfig(
                mount_path="/tmp/datasets/",
                read_only=True,
            )
        ))

    async def setup(self):
        await self.sandbox.start()

    async def teardown(self):
        await self.sandbox.stop()

    def get_prompt(self):
        task = self.task_spec
        return [TextBlock(
            text=f"Solve the following coding challenge:\n\n{task['problem']}\n\n"
                 f"Submit your answer using the submit_answer tool."
        )]

    @tool
    async def submit_answer(self, params: AnswerParams) -> ToolOutput:
        """Submit your solution"""
        correct = params.answer.strip() == self.task_spec["expected"]
        return ToolOutput(
            blocks=[TextBlock(text="Correct!" if correct else "Incorrect.")],
            reward=1.0 if correct else 0.0,
            finished=True,
        )

    @classmethod
    def list_tasks(cls, split: str):
        if split == "train":
            return [
                {"id": "1", "problem": "Write a function that reverses a string.", "expected": "..."},
            ]
        return []

    @classmethod
    def list_splits(cls):
        return [Split(name="train", type="train"), Split(name="test", type="test")]
Clients can then evaluate this environment with any harness toolset:
from openreward import OpenReward

or_client = OpenReward()
env = or_client.environments.get(name="MyOrg/CodingChallenge")
tasks = env.list_tasks(split="train")

# Evaluate with Claude Code's tool surface
with env.session(task=tasks[0], toolset="claude-code") as session:
    tools = session.list_tools(format="anthropic")
    # Agent sees: bash, glob, grep, read, write, edit, todo_write, submit_answer

# Evaluate with Codex's tool surface
with env.session(task=tasks[0], toolset="codex") as session:
    tools = session.list_tools(format="openai")
    # Agent sees: bash, submit_answer

# Evaluate with Hermes's tool surface
with env.session(task=tasks[0], toolset="hermes") as session:
    tools = session.list_tools(format="anthropic")
    # Agent sees: terminal, read_file, write_file, search_files, patch, submit_answer
Or with firehorse from the command line:
# Each of these uses the corresponding toolset automatically
firehorse --env MyOrg/CodingChallenge --agent claude-code --model anthropic/claude-sonnet-4-6
firehorse --env MyOrg/CodingChallenge --agent codex --model openai/codex-mini
firehorse --env MyOrg/CodingChallenge --agent react --model openrouter/nousresearch/hermes-3-llama-3.1-405b

Next Steps

Using Toolsets: Learn about document toolsets (PDF, Excel, Word, PowerPoint)

Building Agentic Environments: Create sandbox-based environments from scratch

Harness Quickstart: Run agent harnesses with Firehorse

Sandbox Providers: Choose a sandbox provider for your environments