A sandbox is an isolated container for executing arbitrary code. Sandboxes provide secure, temporary compute resources for running untrusted agent code in an ORS environment. Key characteristics:
  • On-Demand: Created when needed, not pre-provisioned
  • Isolated: Strong security boundaries with optional network blocking
  • Flexible: Configurable image, resources, and environment variables

Why do environments use sandboxes?

Sandboxes are used in conjunction with environments when an agent needs access to a computer. For example, a software engineering task requires interacting with a filesystem, writing and executing files, and more. We use sandboxes, rather than the same compute as the environment, to isolate the agent’s actions and prevent them from interfering with the running of the environment. In the context of ORS, a sandbox is usually initialised when a new session is created and torn down when the session ends. But the exact use can vary - some environments may only use a sandbox for a particular tool execution instead of the entire session.

How to use sandboxes with ORS

ORS environments work with any sandbox provider, and we have detailed guides for popular providers such as E2B, Daytona, and Modal. OpenReward also provides an internal sandbox solution - OpenReward Sandboxes - which we detail in the rest of this documentation.

OpenReward Sandboxes

OpenReward provides sandbox infrastructure that integrates seamlessly with environment file storage. The sections below show how to create and use OpenReward sandboxes in your environments. Many of the initial environments on the site were built using this solution, so we detail it in full below.

Sandbox Lifecycle

1. Creation

OpenReward Sandboxes are tied to an environment workspace on OpenReward, which provisions the compute. This means it is not a general-purpose sandbox compute solution: it is meant to be tied to environment use. An environment workspace is just an environment on the OpenReward platform. For example, GeneralReasoning/CTF has its own workspace, and this CTF environment uses sandboxes as part of its design. A typical workflow is as follows:
  1. You want to create a new ORS environment and host it on OpenReward
  2. You know your environment needs a sandbox for the agent to access a computer
  3. To provision compute, you create the environment on OpenReward - e.g. username/myenvironment.
  4. When developing your code locally, you reference this namespace in your sandbox settings
Note that this happens before you push any code to GitHub; your code is still local at this point. The key point is that you still need to set up the environment workspace on OpenReward to get access to the sandbox compute for local development. In the Python SDK, the Sandbox API looks as follows:
from openreward import AsyncOpenReward, SandboxSettings

client = AsyncOpenReward(api_key="your-api-key")

settings = SandboxSettings(
    environment="username/environment-name",  # Required: your environment namespace
    image="python:3.11-slim",                 # Required: container image
    machine_size="1:2",                       # Required: CPU:Memory
    env={"DEBUG": "true"},                    # Optional: environment variables
    block_network=False,                      # Optional: network isolation
)

sandbox = client.sandbox(
    settings=settings,
)

await sandbox.start()  # Creates the sandbox
You are then billed based on the time spent in the sandbox. Note that when you host an environment, you do not pay the sandbox costs for someone using your environment; those are billed to their account. However, if you are calling that hosted environment yourself - or developing with a sandbox locally - the costs are billed to you. Billing works on the basis of your API key, which you will need to set: either by passing api_key to the OpenReward client (or AsyncOpenReward client, as recommended), or by setting an environment variable:
export OPENREWARD_API_KEY='your-api-key'
In the context of an ORS environment, a popular design pattern is as follows. First, we initialise the sandbox within the init of the Environment:
from openreward.environments import Environment
from openreward import SandboxSettings, SandboxBucketConfig, AsyncOpenReward

...

class MyEnvironment(Environment):
    def __init__(self, task_spec: JSONObject, secrets: dict[str, str] = {}) -> None:
        super().__init__(task_spec, secrets=secrets)

        self.sandbox_settings = SandboxSettings(
            environment="Username/MyEnvironment",
            image="generalreasoning/python-ds:3.12-tools",
            machine_size="2:2",
            block_network=False,
            bucket_config=SandboxBucketConfig(
                mount_path="/tmp/datasets",
                read_only=True,
                only_dir="agent_data",
            ),
        )
        or_client = AsyncOpenReward(api_key=secrets.get("api_key", ""))
        self.sandbox = or_client.sandbox(self.sandbox_settings)
Then we define setup and teardown methods so the sandbox starts when a new session begins, and ends when it finishes:
async def setup(self) -> None:
    await self.sandbox.start()

async def teardown(self) -> None:
    await self.sandbox.stop()
This ties the sandbox’s lifetime to that of the agent’s session. This pattern is extremely common and will cover the majority of use cases with ORS environments. However, there may be exceptions. For example, if the sandbox compute is expensive and is only needed by specific tools, you may want to design the sandbox lifetime so it only exists during the invocation of that tool. A concrete example is an environment that needs GPU access: evaluating a GPU kernel in a kernel-optimisation environment requires the GPU, but the GPU does not need to be available while the agent is reasoning or writing code.
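The per-tool lifetime described above can be sketched as a small helper that creates the sandbox, runs one command, and always tears it down. This is a sketch assuming the client.sandbox()/start()/run()/stop() calls shown earlier; the helper name and command are illustrative:

```python
import asyncio


async def run_in_ephemeral_sandbox(client, settings, command: str) -> tuple[str, int]:
    """Create a sandbox for the duration of one command, then tear it down.

    `client` is an AsyncOpenReward client and `settings` a SandboxSettings
    instance, as in the examples above. The expensive compute (e.g. a GPU
    machine) only exists between start() and stop().
    """
    sandbox = client.sandbox(settings)
    await sandbox.start()
    try:
        return await sandbox.run(command)
    finally:
        # Always release the sandbox, even if the command raises.
        await sandbox.stop()
```

Because the try/finally guarantees `stop()`, a failing command cannot leak a running (and billed) sandbox.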

2. Command Execution

Once a sandbox is created, you can execute commands within it:
# Run a command
output, exit_code = await sandbox.run("python script.py")
if exit_code == 0:
    print(f"Success: {output}")

# Run with check (raises on non-zero exit)
output = await sandbox.check_run("pytest tests/")

# Upload file
await sandbox.upload(
    local_path="./data.csv",
    container_path="/app/input.csv"
)

# Download file
content = await sandbox.download("/app/output.json")
Available operations:
  • run(): Execute command and return output
  • check_run(): Execute command, raising an error on a non-zero exit code
  • upload(): Upload files to sandbox
  • download(): Download files from sandbox
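These operations compose into a common round trip: upload an input, run a command on it, and fetch the result. A minimal sketch, assuming the upload/check_run/download methods above; the file paths and processing script are illustrative, not part of the API:

```python
async def round_trip(sandbox) -> str:
    """Upload an input file, run a script on it, and fetch the result.

    Assumes `sandbox` has already been started. The paths and the
    process.py script are illustrative.
    """
    await sandbox.upload(local_path="./data.csv", container_path="/app/input.csv")
    # check_run raises on a non-zero exit code, so failures surface early.
    await sandbox.check_run("python process.py /app/input.csv /app/output.json")
    # download returns the file's content rather than writing to local disk.
    return await sandbox.download("/app/output.json")
```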
Usually when developing an environment, you will use these operations inside a tool method for your environment. For example, here is a bash tool that uses these methods:
@tool
async def bash(self, params: BashParams) -> ToolOutput:
    """Execute bash commands using the computer instance."""
    try:
        output, code = await self.sandbox.run(params.command.strip())
        
        return ToolOutput(
            blocks=[TextBlock(text=f"{output}\n\n(exit {code})")],
            metadata={"output": output, "exit_code": code},
            reward=0.0,
            finished=False,
        )
    except Exception as e:
        return ToolOutput(
            metadata={"error": str(e)},
            blocks=[TextBlock(text=f"Error executing command: {str(e)}")],
            finished=False
        )

3. Deletion

To delete an OpenReward Sandbox, we use the stop method:
await sandbox.stop()
For example, this is often used as part of a teardown method in an ORS environment:
async def teardown(self) -> None:
    await self.sandbox.stop()
which will terminate the sandbox as soon as the session ends.
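If stop() itself can raise - for instance, when setup failed and the sandbox never started - a bare call in teardown can mask the original error. A defensive sketch, assuming only the stop() method shown above (safe_stop is a hypothetical helper name, not part of the SDK):

```python
import contextlib


async def safe_stop(sandbox) -> None:
    """Stop a sandbox, swallowing errors from stop() itself.

    Useful in teardown paths where the sandbox may never have started,
    so that teardown does not hide the real failure.
    """
    with contextlib.suppress(Exception):
        await sandbox.stop()
```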

Storage Integration

Cloud Storage Mounts

Sandboxes can access files uploaded to your OpenReward environment. To do this, you’ll need to use the SandboxBucketConfig class. For example:
from openreward import SandboxSettings, SandboxBucketConfig

settings = SandboxSettings(
    environment="username/env-name",
    image="python:3.11-slim",
    machine_size="1:2",
    bucket_config=SandboxBucketConfig(
        mount_path="/workspace",           # Where to mount in container
        only_dir="datasets/subset",        # Optional: mount only subdirectory
    )
)
Here /workspace is the mount_path on the agent’s computer. In other words, the agent will find the files in that location if it explores the filesystem. The only_dir argument mounts only a specific directory from the OpenReward storage for that environment. This is important to ensure agents do not see files we don’t want them to see - e.g. ground truth labels. See where environment data lives for a conceptual overview, and Storage & Buckets for lower-level configuration.

Other configuration options

Machine sizes

As part of the SandboxSettings, you need to specify a machine size - i.e. the CPU and RAM available to the agent. You can iterate with these settings, but it’s important to consider a number of things:
  1. Is the agent dealing with big files? If the RAM isn’t big enough, the sandbox could suffer out-of-memory errors (OOMs).
  2. Does the agent need to run expensive workloads? For example, if the agent is training machine learning models, it might need more CPUs (and perhaps GPUs too).
  3. Am I undereliciting agent performance? If you don’t give the agent enough resources, you may underelicit its true capabilities, because it can do less with less compute. This is particularly important for evaluation.
A list of different machine configurations is provided below:
machine_size="CPU:Memory"

# Available sizes:
"0.5:0.5"   # Light balanced — simple services, low traffic
"1:1"       # Baseline balanced — small production services
"2:2"       # Solid balanced — steady traffic, moderate load
"4:4"       # Heavy balanced — CPU-bound or busy services

"0.5:1"     # Memory-biased light — small services with caching
"1:2"       # Memory-biased baseline — typical web apps
"2:4"       # Memory-biased heavy — APIs, JVMs, Python services
"4:8"       # Memory-heavy — large caches, in-memory workloads

"0.5:2"     # Low CPU, high memory — background workers, queues
"1:4"       # High memory — analytics, batch jobs
"2:8"       # Very high memory — data processing, embeddings
"4:16"      # Maxed memory — serious data or model workloads

"nvidia-l4" # NVIDIA L4 GPU

Network Isolation

In some cases you may want to limit network access. By default all egress - all internet access - is allowed, but there may be use cases where this is problematic:
  1. Web access could allow the agent to cheat - e.g. by finding the ground truth answers to the dataset.
  2. Web access could leak future information - for example, some environments rely on backtests over past data, and web access would give the agent information from the future.
  3. Safety concerns - in some scenarios, especially safety testing, we may not want the agent to have access to the internet.
To enable network isolation, set block_network=True:
settings = SandboxSettings(
    environment="username/env",
    image="python:3.11",
    machine_size="1:2",
    block_network=True
)
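It can be worth verifying isolation from inside the sandbox rather than trusting configuration alone. A sketch using the run() method from earlier; it assumes curl is present in the image and that blocked egress surfaces as a non-zero exit code rather than a hang:

```python
async def egress_blocked(sandbox) -> bool:
    """Return True if a simple outbound request fails from inside the sandbox.

    Uses curl with a short timeout. Assumes the image ships curl and that
    blocked networking shows up as a non-zero exit code.
    """
    _, exit_code = await sandbox.run(
        "curl --silent --max-time 5 https://example.com"
    )
    return exit_code != 0
```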

Sidecar Containers

In some situations, you may want to add additional containers to run alongside your main container. This is possible through sidecars:
from openreward import SandboxSidecarContainer

settings = SandboxSettings(
    environment="username/env",
    image="python:3.11",
    machine_size="1:2",
    sidecars=[
        SandboxSidecarContainer(
            name="redis",
            image="redis:7-alpine",
            env={"REDIS_PASSWORD": "secret"},
            command=["redis-server"],
            args=["--maxmemory", "256mb"],
            ports=[6379]
        )
    ]
)
Use cases:
  • Running databases (Redis, PostgreSQL)
  • Starting services (web servers, message queues)
  • Running monitoring tools
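A sidecar typically needs a readiness check before the main workload uses it. Here is a sketch that polls the Redis sidecar above via run(); it assumes sidecar ports are reachable from the main container on localhost and that redis-cli exists in the main image - verify both against your setup:

```python
import asyncio


async def wait_for_redis(sandbox, attempts: int = 10) -> bool:
    """Poll the redis sidecar until it answers PING, or give up.

    Assumes localhost reachability of sidecar ports and redis-cli in the
    main image - both assumptions, not guarantees of the API.
    """
    for _ in range(attempts):
        output, code = await sandbox.run("redis-cli -a secret -p 6379 ping")
        if code == 0 and "PONG" in output:
            return True
        await asyncio.sleep(1)  # back off before retrying
    return False
```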

Host Aliases

You can add custom DNS entries to /etc/hosts:
from openreward import SandboxHostAlias

settings = SandboxSettings(
    environment="username/env",
    image="python:3.11",
    machine_size="1:2",
    host_aliases=[
        SandboxHostAlias(
            ip="127.0.0.1",
            hostnames=["myapp.local", "api.local"]
        )
    ]
)

Next Steps

External Sandbox Providers

Use E2B, Daytona, or Modal with your environments

Sandbox API Reference

Complete API documentation for OpenReward sandboxes

Storage & Buckets

Configure cloud storage access in sandboxes

Why Use Sandboxes?

Understand sandbox providers and when to use them