- On-Demand: Created when needed, not pre-provisioned
- Isolated: Strong security boundaries with optional network blocking
- Flexible: Configurable image, resources, and environment variables
Why do environments use sandboxes?
Sandboxes are used in conjunction with environments when an agent needs access to a computer. For example, a software engineering task requires interacting with a filesystem, writing and executing files, and more. We use sandboxes, rather than the same compute as the environment, to isolate the agent’s actions and prevent them from interfering with the running of the environment. In the context of ORS, a sandbox is usually initialised when a new session is created and torn down when the session ends. But the exact use can vary - some environments may only use a sandbox for a particular tool execution instead of the entire session.
How to use sandboxes with ORS
ORS environments work with any sandbox provider. We have detailed guides for how to use popular providers, such as:
- Using E2B: See E2B Sandboxes
- Using Daytona: See Daytona Sandboxes
- Using Modal: See Modal Sandboxes
OpenReward Sandboxes
OpenReward provides sandbox infrastructure that integrates seamlessly with environment file storage. The sections below show how to create and use OpenReward sandboxes in your environments. Many of the initial environments on the site were built using this solution, so we detail it in full below.
Sandbox Lifecycle
1. Creation
OpenReward Sandboxes are tied to an environment workspace on OpenReward, which provisions compute. This means it is not a general sandbox compute solution: it is meant to be tied to environment use. An environment workspace is just an environment on the OpenReward platform. For example, GeneralReasoning/CTF has its own workspace. This CTF environment also uses sandboxes as part of its design.
A typical workflow is as follows:
- You want to create a new ORS environment and host it on OpenReward
- You know your environment needs a sandbox for the agent to access a computer
- To provision compute, you create the environment on OpenReward - e.g. username/myenvironment.
- When developing your code locally, you reference this namespace in your sandbox settings
To authenticate, pass api_key to the OpenReward client (or to the AsyncOpenReward client, which is recommended), or set an environment variable:
Then define setup and teardown methods so the sandbox starts when a new session begins and stops when the session ends:
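The setup/teardown pattern can be sketched without the real client. In the example below, `FakeSandbox`, its `start` method, and the `Session` class are hypothetical stand-ins, not the OpenReward API. Only the idea of stopping the sandbox with a `stop` method comes from this page. The point is the shape of the lifecycle: start in setup, and stop in teardown even if the agent's work fails.

```python
import asyncio


class FakeSandbox:
    """Hypothetical stand-in for a sandbox client; not the OpenReward API."""

    def __init__(self) -> None:
        self.running = False

    async def start(self) -> None:  # assumed name, for illustration only
        self.running = True

    async def stop(self) -> None:
        self.running = False


class Session:
    """Hypothetical environment session that owns exactly one sandbox."""

    def __init__(self) -> None:
        self.sandbox = FakeSandbox()

    async def setup(self) -> None:
        # Called when a new session begins: bring the sandbox up.
        await self.sandbox.start()

    async def teardown(self) -> None:
        # Called when the session ends: release the sandbox.
        await self.sandbox.stop()


async def run_session() -> tuple[bool, bool]:
    session = Session()
    await session.setup()
    try:
        during = session.sandbox.running  # agent work would happen here
    finally:
        await session.teardown()  # runs even if the agent's work raises
    return during, session.sandbox.running


if __name__ == "__main__":
    print(asyncio.run(run_session()))
```

Putting the teardown call in a `finally` block is a design choice worth keeping with a real client too, since a sandbox that is never stopped keeps consuming compute.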
2. Command Execution
Once a sandbox is created, you can execute commands within it:
- run(): Execute command and return output
- check_run(): Execute command, raise an error if it returns a non-zero exit code
- upload(): Upload files to sandbox
- download(): Download files from sandbox
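The difference between run() and check_run() is similar to the difference between `subprocess.run` with and without `check=True` in the Python standard library. The sketch below is a local analogy only: it runs commands on your own machine, and the return shapes are assumptions for illustration, not the signatures of the sandbox methods.

```python
import subprocess
import sys


def run(cmd: list[str]) -> tuple[str, int]:
    """Run a command and return (stdout, exit_code) without raising."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.stdout, proc.returncode


def check_run(cmd: list[str]) -> str:
    """Run a command and raise CalledProcessError on a non-zero exit code."""
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return proc.stdout


ok = [sys.executable, "-c", "print('hello')"]
fail = [sys.executable, "-c", "import sys; sys.exit(3)"]

out, code = run(ok)        # succeeds, exit code 0
_, fail_code = run(fail)   # fails, but the caller must inspect the exit code

try:
    check_run(fail)        # fails loudly instead
    raised = False
except subprocess.CalledProcessError as err:
    raised = err.returncode == 3
```

A check_run-style call suits setup steps that must succeed, such as installing dependencies. A run-style call suits commands issued by the agent, where a non-zero exit code is useful feedback and not an error in the environment.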
3. Deletion
To delete an OpenReward Sandbox, we use the stop method:
Storage Integration
Cloud Storage Mounts
Sandboxes can access files uploaded to your OpenReward environment. To do this, you’ll need to use the SandboxBucketConfig class. For example:
/workspace is the mount_path on the agent’s computer. In other words, the agent will find the files in that location if it explores the filesystem.
The only_dir argument specifies that we only mount a specific directory from the OpenReward storage for that environment. This is important to ensure agents do not see files we don’t want them to see - e.g. ground truth labels.
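To make the effect of only_dir concrete, here is a small model of which stored files an agent would see. `MountSketch` and `visible_files` are hypothetical helpers, not part of SandboxBucketConfig, and the example file names are invented. The sketch also assumes that the contents of only_dir appear directly under mount_path; the actual layout may differ.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class MountSketch:
    """Hypothetical model of a bucket mount; not SandboxBucketConfig itself."""

    mount_path: str
    only_dir: str


def visible_files(storage_keys: list[str], mount: MountSketch) -> list[str]:
    """Return sandbox paths for the storage keys that sit under only_dir."""
    root = PurePosixPath(mount.only_dir)
    visible = []
    for key in storage_keys:
        path = PurePosixPath(key)
        if root in path.parents:
            # Assumption: only_dir's contents land directly under mount_path.
            target = PurePosixPath(mount.mount_path) / path.relative_to(root)
            visible.append(str(target))
    return visible


storage = ["public/task.md", "public/data/train.csv", "private/labels.csv"]
mount = MountSketch(mount_path="/workspace", only_dir="public")
paths = visible_files(storage, mount)
```

In this model the file under `private/` never reaches the agent's filesystem, which is the property you want for ground truth labels.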
See where environment data lives for a conceptual overview, and Storage & Buckets for lower-level configuration.
Other configuration options
Machine sizes
As part of the SandboxSettings, you need to specify a machine size - i.e. the CPU and RAM available to the agent.
You can iterate on these settings, but it’s important to consider a number of things:
- Is the agent dealing with big files? If there isn’t enough RAM, the sandbox could suffer from out-of-memory errors (OOMs).
- Does the agent need to run expensive workloads? For example, if the agent is training machine learning models, it might need more CPUs (and perhaps GPUs too).
- Am I under-eliciting agent performance? If you don’t give the agent enough resources, you may under-elicit its true capabilities, because it can do less with less compute. This is particularly important for evaluation.
Network Isolation
In some cases you may want to limit network access. By default all egress - all internet access - is allowed, but there may be use cases where this is problematic:
- Web access could allow the agent to cheat - e.g. by finding the ground truth answers to the dataset.
- Web access could leak future information - for example, some environments rely on backtests on past data, and web access would give the agent information from the future.
- Safety concerns from web access - in some scenarios, especially safety testing, we may not want the agent to have access to the internet.
To block network access, set block_network=True:
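One way to confirm that egress is actually blocked is to attempt an outbound TCP connection from inside the sandbox. The helper below is a sketch using only the Python standard library; `can_reach` is a hypothetical name and is not part of the OpenReward API.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Running something like `can_reach("example.com", 443)` inside a sandbox with network blocking enabled should return False, assuming the block applies to all outbound traffic.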
Sidecar Containers
In some situations, you may want to add additional containers to run alongside your main container. This is possible through sidecars, for example:
- Running databases (Redis, PostgreSQL)
- Starting services (web servers, message queues)
- Running monitoring tools
Host Aliases
You can add custom DNS entries to /etc/hosts:
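For reference, each line in /etc/hosts is an IP address followed by one or more hostnames. The sketch below renders a mapping into that format. The dictionary shape, the function name, and the example addresses are assumptions for illustration, not the OpenReward configuration format.

```python
def render_hosts_entries(aliases: dict[str, list[str]]) -> str:
    """Render {ip: [hostnames]} as /etc/hosts lines: 'IP<TAB>name1 name2'."""
    lines = []
    for ip, hostnames in aliases.items():
        if not hostnames:
            continue  # an entry with no hostnames is not meaningful
        lines.append(f"{ip}\t{' '.join(hostnames)}")
    return "\n".join(lines)


entries = render_hosts_entries(
    {"10.0.0.5": ["db.internal", "db"], "10.0.0.6": ["cache.internal"]}
)
```

With entries like these in place, a process in the sandbox could resolve `db.internal` to `10.0.0.5` without any DNS lookup.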
Next Steps
External Sandbox Providers
Use E2B, Daytona, or Modal with your environments
Sandbox API Reference
Complete API documentation for OpenReward sandboxes
Storage & Buckets
Configure cloud storage access in sandboxes
Why Use Sandboxes?
Understand sandbox providers and when to use them

