Harness Toolsets

Introduction

Different agent CLIs expose different tool surfaces. In OpenReward, you can expose agent-native tool surfaces server-side, backed by your environment’s sandbox. When a session is created with a toolset, the agent sees the exact tools it expects and every tool call executes against the sandbox.

How Toolsets Compose into Sessions

Harness toolsets are session-scoped: they are passed when creating a session and their tools are merged with the environment’s own tools. The openreward package includes a registry of built-in harness toolsets that can be referenced by name:

from openreward import OpenReward

or_client = OpenReward()
env = or_client.environments.get(name="MyOrg/MyEnv")
tasks = env.list_tasks(split="train")

# Pass a toolset by name — the session exposes agent-native tools
with env.session(task=tasks[0], toolset="claude-code") as session:
    tools = session.list_tools(format="anthropic")
    # tools now include: bash, glob, grep, read, write, edit, todo_write
    # plus any environment-specific tools (e.g. submit_answer)

    result = session.call_tool("bash", {"command": "ls /tmp"})
    print(result.blocks[0].text)

When a session toolset defines a tool with the same name as an environment tool, the toolset tool takes precedence. A warning is logged for each shadowed tool. This lets harness toolsets override environment-level bash, read, etc. with their agent-specific implementations.
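
The precedence rule above can be pictured as a dictionary merge. The following is a minimal sketch, not the openreward implementation: `merge_tools`, the logger name, and the string placeholders standing in for tool objects are all hypothetical.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("toolset_merge_sketch")


def merge_tools(env_tools: dict, toolset_tools: dict) -> dict:
    """Merge two name -> tool mappings; toolset entries win on name clashes."""
    merged = dict(env_tools)
    for name, tool in toolset_tools.items():
        if name in merged:
            # One warning per shadowed environment tool.
            logger.warning("toolset tool %r shadows environment tool", name)
        merged[name] = tool
    return merged


# Strings stand in for real tool objects.
env_tools = {"bash": "env bash", "submit_answer": "env submit_answer"}
toolset_tools = {"bash": "claude-code bash", "read": "claude-code read"}
merged = merge_tools(env_tools, toolset_tools)
```

In this sketch `merged["bash"]` is the toolset's implementation, while `submit_answer` stays available because no toolset tool shares its name.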

Sandbox Requirement

All harness toolsets extend the Toolset base class, which requires the environment to expose a self.sandbox attribute. When a toolset is instantiated against an environment, it reads that attribute and routes every tool execution through it. If you’re building an environment that supports harness toolsets, make sure it creates a sandbox:

from openreward import AsyncOpenReward, SandboxSettings
from openreward.environments import Environment, tool, ToolOutput, TextBlock

class MyEnv(Environment):
    def __init__(self, task_spec, secrets):
        super().__init__(task_spec, secrets)

        or_client = AsyncOpenReward(api_key=secrets.get("api_key"))
        self.sandbox = or_client.sandbox(SandboxSettings(
            environment="MyOrg/MyEnv",
            image="generalreasoning/knowledge-worker:latest",
            machine_size="0.5:1",
        ))

    async def setup(self):
        await self.sandbox.start()

    async def teardown(self):
        await self.sandbox.stop()

    # Environment-specific tools
    @tool
    async def submit_answer(self, params) -> ToolOutput:
        ...

When a client creates a session with toolset="claude-code", the ClaudeCodeToolset is instantiated with this environment and gains access to self.sandbox. All its tools (bash, read, write, etc.) execute commands against that sandbox.
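
The sandbox requirement can be illustrated with a small stand-in class. This is a sketch under assumptions: `ToolsetSketch` is not the real Toolset base class, and raising `AttributeError` when the attribute is missing is a guess, since this page does not describe the actual failure mode.

```python
class ToolsetSketch:
    """Hypothetical stand-in for the Toolset base class."""

    def __init__(self, env):
        sandbox = getattr(env, "sandbox", None)
        if sandbox is None:
            # Assumed behavior: fail early when the environment has no sandbox.
            raise AttributeError(
                f"{type(env).__name__} must set self.sandbox "
                "to support harness toolsets"
            )
        self.sandbox = sandbox


class FakeSandbox:
    """Placeholder object; a real environment would create a sandbox here."""


class EnvWithSandbox:
    def __init__(self):
        self.sandbox = FakeSandbox()


class EnvWithoutSandbox:
    pass


def supports_harness_toolsets(env) -> bool:
    """Return True if the sketch toolset can be built against env."""
    try:
        ToolsetSketch(env)
    except AttributeError:
        return False
    return True
```

A check like `supports_harness_toolsets` could run in an environment's own tests to catch a missing sandbox before any client requests a toolset.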

Available Harness Toolsets

ClaudeCodeToolset

Name: "claude-code" · Tools: 7

Mirrors the Claude Code CLI’s built-in tool surface. Tool descriptions are sourced from Claude Code’s internal prompts.

Tool	Description
`bash`	Execute shell commands with full bash support
`glob`	Find files matching glob patterns
`grep`	Search file contents with regex (ripgrep-style)
`read`	Read file contents with optional offset/limit pagination
`write`	Write content to a file, creating directories as needed
`edit`	Exact string replacement in files (unique match required unless `replace_all`)
`todo_write`	Manage a todo list for task planning and progress tracking

with env.session(task=task, toolset="claude-code") as session:
    session.call_tool("bash", {"command": "python solve.py"})
    session.call_tool("read", {"file_path": "/home/ubuntu/output.txt"})
    session.call_tool("edit", {
        "file_path": "/home/ubuntu/solve.py",
        "old_string": "x = 1",
        "new_string": "x = 2"
    })
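
The "unique match required unless `replace_all`" rule for `edit` can be sketched in plain Python. This illustrates the described semantics only and is not the toolset's code; the function name and error messages are hypothetical.

```python
def exact_replace(text: str, old_string: str, new_string: str,
                  replace_all: bool = False) -> str:
    """Replace old_string with new_string, refusing ambiguous matches."""
    if not old_string:
        raise ValueError("old_string must not be empty")
    count = text.count(old_string)
    if count == 0:
        raise ValueError("old_string not found")
    if count > 1 and not replace_all:
        raise ValueError(
            f"old_string matches {count} times; "
            "add surrounding context or set replace_all"
        )
    # Either exactly one match, or replace_all was requested.
    return text.replace(old_string, new_string)
```

Under this rule, an agent that wants to change one of several identical lines has to include enough surrounding text in `old_string` to make the match unique.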

CodexToolset

Name: "codex" · Tools: 1

Codex prefers a single shell tool over discrete file tools. All file operations are done through bash.

Tool	Description
`bash`	Run shell commands; use for all file operations

with env.session(task=task, toolset="codex") as session:
    session.call_tool("bash", {"command": "cat solve.py"})
    session.call_tool("bash", {"command": "sed -i 's/x = 1/x = 2/' solve.py"})
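
Because every file operation goes through a single shell tool, the command string has to be quoted carefully. The helper below is a hypothetical convenience, not part of openreward; it uses the standard library's `shlex.quote` to build a command that writes a file.

```python
import posixpath
import shlex


def write_file_command(path: str, content: str) -> str:
    """Build a shell command that writes content to path, creating parents."""
    parent = posixpath.dirname(path) or "."
    return (
        f"mkdir -p {shlex.quote(parent)} && "
        # printf %s emits the argument verbatim, without interpreting escapes.
        f"printf %s {shlex.quote(content)} > {shlex.quote(path)}"
    )


command = write_file_command("/tmp/my dir/solve.py", "x = 2\n")
```

The resulting string could then be passed as the `command` argument of the `bash` tool, in the same way as the calls above.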

GeminiCliToolset

Name: "gemini-cli" · Tools: 8

Matches the upstream Gemini CLI tool surface. Uses Gemini-specific tool names like run_shell_command and replace.

Tool	Description
`run_shell_command`	Execute shell commands as `bash -c <command>`
`glob`	Find files matching glob patterns, sorted by modification time
`grep_search`	Search file contents with regex, max 100 matches
`read_file`	Read file contents with optional line range
`write_file`	Write content to a file
`replace`	Find-and-replace in files (unique match required unless `allow_multiple`)
`list_directory`	List files and subdirectories in a path
`write_todos`	Track subtasks for complex queries

with env.session(task=task, toolset="gemini-cli") as session:
    session.call_tool("run_shell_command", {"command": "python solve.py"})
    session.call_tool("read_file", {"file_path": "/home/ubuntu/output.txt"})
    session.call_tool("replace", {
        "file_path": "/home/ubuntu/solve.py",
        "old_string": "x = 1",
        "new_string": "x = 2"
    })

OpenClawToolset

Name: "openclaw" · Tools: 6

Exposes the OpenClaw coding tool surface, including background process management and structured patch application.

Tool	Description
`exec`	Execute shell commands with configurable timeout (default 1800s)
`process`	Manage background processes: list, poll, log, write stdin, kill, remove
`read`	Read file contents with optional offset/limit pagination
`write`	Write content to a file, creating parent directories
`edit`	Apply targeted text replacements with `oldText`/`newText` pairs
`apply_patch`	Apply structured patches with `*** Begin Patch` / `*** End Patch` markers

with env.session(task=task, toolset="openclaw") as session:
    session.call_tool("exec", {"command": "python solve.py"})
    session.call_tool("edit", {
        "path": "/home/ubuntu/solve.py",
        "edits": [{"oldText": "x = 1", "newText": "x = 2"}]
    })
    session.call_tool("apply_patch", {
        "input": "*** Begin Patch\n*** Update File: solve.py\n- x = 1\n+ x = 2\n*** End Patch"
    })

HermesToolset

Name: "hermes" · Tools: 5

Exposes the Hermes Agent tool surface from Nous Research. Supports both targeted string replacement and V4A multi-file patches.

Tool	Description
`terminal`	Execute shell commands (default timeout 180s, max 600s)
`read_file`	Read files with `LINE_NUM|CONTENT` format, pagination via offset/limit
`write_file`	Write content to a file, creating parent directories
`search_files`	Search file contents (`target="content"`) or find files by name (`target="files"`)
`patch`	Replace mode (find-and-replace) or patch mode (V4A multi-file patches)

with env.session(task=task, toolset="hermes") as session:
    session.call_tool("terminal", {"command": "python solve.py"})
    session.call_tool("search_files", {
        "pattern": "def main",
        "path": "/home/ubuntu",
        "file_glob": "*.py"
    })
    session.call_tool("patch", {
        "mode": "replace",
        "path": "/home/ubuntu/solve.py",
        "old_string": "x = 1",
        "new_string": "x = 2"
    })
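
To make the `LINE_NUM|CONTENT` format of `read_file` concrete, here is a rough sketch of paginated, line-numbered output. The details are assumptions: this page does not say whether `offset` is 1-based, what the default `limit` is, or whether line numbers are padded, so treat `format_page` as an illustration rather than the toolset's behavior.

```python
def format_page(text: str, offset: int = 1, limit: int = 2000) -> str:
    """Render lines as LINE_NUM|CONTENT, starting at a 1-based offset (assumed)."""
    lines = text.splitlines()
    start = max(offset, 1) - 1  # convert to a 0-based index
    page = lines[start:start + limit]
    return "\n".join(
        f"{start + i + 1}|{line}" for i, line in enumerate(page)
    )


page = format_page("a\nb\nc\nd", offset=2, limit=2)
```

With these assumptions, `page` holds lines 2 and 3 of the input, each prefixed with its line number and a pipe.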

Tool Comparison

Capability	Claude Code	Codex	Gemini CLI	OpenClaw	Hermes
Shell execution	`bash`	`bash`	`run_shell_command`	`exec`	`terminal`
File read	`read`	via `bash`	`read_file`	`read`	`read_file`
File write	`write`	via `bash`	`write_file`	`write`	`write_file`
File edit	`edit`	via `bash`	`replace`	`edit`	`patch` (replace mode)
File search	`grep`	via `bash`	`grep_search`	via `exec`	`search_files`
File glob	`glob`	via `bash`	`glob`	via `exec`	`search_files` (target=files)
Directory listing	via `bash`	via `bash`	`list_directory`	via `exec`	`search_files` (target=files)
Task planning	`todo_write`	—	`write_todos`	—	—
Process management	—	—	—	`process`	—
Patch application	—	—	—	`apply_patch`	`patch` (patch mode)
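
When writing harness-agnostic client code, the comparison table can be encoded as a lookup. The sketch below copies a few rows from the table; the dictionary layout, capability keys, and the `resolve_tool` helper are hypothetical and not part of openreward.

```python
# Native tool names per toolset, taken from the comparison table.
TOOL_FOR = {
    "claude-code": {"shell": "bash", "read": "read", "write": "write",
                    "edit": "edit", "search": "grep"},
    "codex": {"shell": "bash"},
    "gemini-cli": {"shell": "run_shell_command", "read": "read_file",
                   "write": "write_file", "edit": "replace",
                   "search": "grep_search"},
    "openclaw": {"shell": "exec", "read": "read", "write": "write",
                 "edit": "edit"},
    "hermes": {"shell": "terminal", "read": "read_file",
               "write": "write_file", "edit": "patch",
               "search": "search_files"},
}


def resolve_tool(toolset: str, capability: str) -> str:
    """Return the native tool for a capability, else the toolset's shell tool."""
    tools = TOOL_FOR[toolset]
    return tools.get(capability, tools["shell"])
```

The fallback mirrors the "via `bash`" and "via `exec`" cells: when a toolset has no dedicated tool, the capability is reached through its shell tool.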

Example: Full Environment with Harness Toolset Support

Here is a complete environment that supports harness toolsets. The environment defines its own task-specific tools (submit_answer), while the harness toolset provides the agent’s coding tools:

from openreward import AsyncOpenReward, SandboxSettings, SandboxBucketConfig
from openreward.environments import Environment, Split, tool, ToolOutput, TextBlock
from pydantic import BaseModel

class AnswerParams(BaseModel):
    answer: str

class CodingChallenge(Environment):
    def __init__(self, task_spec, secrets):
        super().__init__(task_spec, secrets)

        or_client = AsyncOpenReward(api_key=secrets.get("api_key"))
        self.sandbox = or_client.sandbox(SandboxSettings(
            environment="MyOrg/CodingChallenge",
            image="generalreasoning/knowledge-worker:latest",
            machine_size="4:16",
            block_network=False,
            bucket_config=SandboxBucketConfig(
                mount_path="/tmp/datasets/",
                read_only=True,
            )
        ))

    async def setup(self):
        await self.sandbox.start()

    async def teardown(self):
        await self.sandbox.stop()

    def get_prompt(self):
        task = self.task_spec
        return [TextBlock(
            text=f"Solve the following coding challenge:\n\n{task['problem']}\n\n"
                 f"Submit your answer using the submit_answer tool."
        )]

    @tool
    async def submit_answer(self, params: AnswerParams) -> ToolOutput:
        """Submit your solution"""
        correct = params.answer.strip() == self.task_spec["expected"]
        return ToolOutput(
            blocks=[TextBlock(text="Correct!" if correct else "Incorrect.")],
            reward=1.0 if correct else 0.0,
            finished=True,
        )

    @classmethod
    def list_tasks(cls, split: str):
        if split == "train":
            return [
                {"id": "1", "problem": "Write a function that reverses a string.", "expected": "..."},
            ]
        return []

    @classmethod
    def list_splits(cls):
        return [Split(name="train", type="train"), Split(name="test", type="test")]

Clients can then evaluate this environment with any harness toolset:

from openreward import OpenReward

or_client = OpenReward()
env = or_client.environments.get(name="MyOrg/CodingChallenge")
tasks = env.list_tasks(split="train")

# Evaluate with Claude Code's tool surface
with env.session(task=tasks[0], toolset="claude-code") as session:
    tools = session.list_tools(format="anthropic")
    # Agent sees: bash, glob, grep, read, write, edit, todo_write, submit_answer

# Evaluate with Codex's tool surface
with env.session(task=tasks[0], toolset="codex") as session:
    tools = session.list_tools(format="openai")
    # Agent sees: bash, submit_answer

# Evaluate with Hermes's tool surface
with env.session(task=tasks[0], toolset="hermes") as session:
    tools = session.list_tools(format="anthropic")
    # Agent sees: terminal, read_file, write_file, search_files, patch, submit_answer

Or with firehorse from the command line:

# Each of these uses the corresponding toolset automatically
firehorse --env MyOrg/CodingChallenge --agent claude-code --model anthropic/claude-sonnet-4-6
firehorse --env MyOrg/CodingChallenge --agent codex --model openai/codex-mini
firehorse --env MyOrg/CodingChallenge --agent react --model openrouter/nousresearch/hermes-3-llama-3.1-405b
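
For the Python client sessions shown earlier, it can help to verify that a session exposes the tool names an agent expects. The sketch below assumes that `list_tools` returns plain dictionaries shaped like Anthropic tool definitions (a top-level `name`) or OpenAI function tools (a nested `function.name`); that return shape is an assumption, and both helpers are hypothetical.

```python
def tool_names(tools: list) -> list:
    """Extract tool names from Anthropic-style or OpenAI-style tool dicts."""
    names = []
    for tool in tools:
        function = tool.get("function")
        if isinstance(function, dict):
            names.append(function["name"])  # OpenAI chat-completions shape
        else:
            names.append(tool["name"])      # Anthropic shape
    return names


def missing_tools(tools: list, expected: set) -> list:
    """Return expected tool names that are absent, sorted for stable output."""
    return sorted(expected - set(tool_names(tools)))


# Hand-written samples in each shape, standing in for list_tools output.
anthropic_style = [
    {"name": "bash", "description": "...", "input_schema": {"type": "object"}},
    {"name": "submit_answer", "description": "...",
     "input_schema": {"type": "object"}},
]
openai_style = [
    {"type": "function", "function": {"name": "bash", "parameters": {}}},
]
```

A check such as `missing_tools(tools, {"bash", "submit_answer"})` returning an empty list would indicate that both the harness tool and the environment tool are visible to the agent.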

Next Steps

Using Toolsets: Learn about document toolsets (PDF, Excel, Word, PowerPoint)

Building Agentic Environments: Create sandbox-based environments from scratch

Harness Quickstart: Run agent harnesses with Firehorse

Sandbox Providers: Choose a sandbox provider for your environments
