- Environments specify tasks, tools, rewards and basic state.
- Sandboxes (optional) give the agent access to a computer in the environment.
What are environments?
An environment is a simulated space where an agent can perform a task using tools and resources, and receive rewards for good or bad actions. In the language of reinforcement learning, the underlying abstraction is a POMDP, which specifies a system the agent interacts with, the actions that allow the agent to interact with it, and the rewards the agent receives. On OpenReward, we treat environments as passive FastAPI-style servers that an agent can interact with. This maintains a strict separation of concerns between the agent and the environment, allowing for easier development and more robustness to changes in agent harnesses. The standard for how agents talk to environments is called ORS, which you can read about in full detail here. Briefly speaking, an ORS server provides:
- Tasks - tasks are the core problems to be solved, including the initial prompts
- Tools - tools are the actions an agent can take in the environment
- Splits - splits organise tasks into groups, e.g. for training and evaluation
- Statefulness - agent actions in a session can affect state
- Tool Results - including tool feedback, rewards and termination signals
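As a rough, hypothetical sketch of how these pieces fit together, the plain-Python model below shows a task with a split, a stateful session, and a tool call that returns feedback, a reward, and a termination signal. The class names, fields, and the `guess` tool are illustrative only and are not part of the ORS specification.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    task_id: str
    prompt: str   # initial prompt shown to the agent
    split: str    # e.g. "train" or "eval"

@dataclass
class ToolResult:
    feedback: str  # tool output returned to the agent
    reward: float  # scalar reward for the action
    done: bool     # termination signal

@dataclass
class Session:
    task: Task
    state: dict = field(default_factory=dict)  # statefulness: actions can read/mutate this

    def call_tool(self, name: str, args: dict) -> ToolResult:
        # A single illustrative "guess" tool: reward 1.0 when the guess
        # matches the hidden answer held in environment state.
        if name == "guess":
            correct = args.get("value") == self.state.get("answer")
            return ToolResult(
                feedback="correct" if correct else "try again",
                reward=1.0 if correct else 0.0,
                done=correct,
            )
        return ToolResult(feedback=f"unknown tool: {name}", reward=0.0, done=False)

task = Task(task_id="t1", prompt="Guess the number.", split="train")
session = Session(task=task, state={"answer": 7})
result = session.call_tool("guess", {"value": 7})
print(result.reward, result.done)  # 1.0 True
```

A real ORS server would expose tasks, splits, and tool calls as HTTP endpoints rather than in-process method calls; the point here is only the shape of the data flowing between agent and environment.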
What are sandboxes?
A sandbox is an isolated container for running code. Sandboxes are often used in conjunction with environments when an agent needs access to a computer. For example, a software engineering task requires interacting with a filesystem, writing and executing files, and more. We run the agent's code in a sandbox, rather than on the same compute as the environment, to isolate the agent's actions and prevent them from interfering with the running of the environment. A sandbox provides:
- Isolated execution environment
- Configurable resources (CPU, memory)
- Network isolation options
- Automatic cleanup after use
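The properties above can be illustrated with a minimal stdlib-only sketch: real sandboxes are isolated containers, but a subprocess with its own scratch directory, a timeout, and guaranteed cleanup demonstrates the same ideas. The function name is hypothetical and not an OpenReward API.

```python
import shutil
import subprocess
import sys
import tempfile

def run_in_sandbox(code: str, timeout: float = 5.0) -> str:
    """Run code in a separate process with its own scratch directory."""
    workdir = tempfile.mkdtemp(prefix="sandbox_")  # isolated filesystem scratch space
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            cwd=workdir,          # the code sees only its own working directory
            capture_output=True,
            text=True,
            timeout=timeout,      # a crude configurable resource limit
        )
        return proc.stdout
    finally:
        shutil.rmtree(workdir, ignore_errors=True)  # automatic cleanup after use

print(run_in_sandbox("print('hello from the sandbox')"))
```

A production sandbox would add container-level CPU/memory limits and network isolation, which a bare subprocess cannot provide.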
How environments and sandboxes work together
We have briefly touched upon how these two pillars of infrastructure interact, but we consider a few popular patterns below.
Pattern 1: Environment-Only
For environments that don't need code execution, the agent talks to the ORS server directly and no sandbox is provisioned.
Where is data stored?
Data on OpenReward can come from external sources, for example HuggingFace datasets or public buckets, or you can upload assets to the environment and use those. There are two main uses for data in ORS environments:
- Environment data. This is data that powers the underlying environment; for example, tasks stored in a Parquet or .jsonl file, or information the environment has access to but the agent does not (e.g. ground-truth labels).
- Sandbox data. This is data that the agent has access to for solving the task. For example, in a Kaggle competition environment, the agent might have access to a train and validation dataset on their machine to build models with.
- The hosted environment mounts storage at /orwd_data/
- Sandboxes can choose the path at which to mount the storage
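As a hedged sketch of the environment-data case, the helper below reads one task per line from a .jsonl file under a mounted data directory. The mount point /orwd_data/ comes from the text above; the helper name and file name are illustrative assumptions, and the demo stands in a temporary directory for the real mount.

```python
import json
import tempfile
from pathlib import Path

def load_tasks(data_dir: str, filename: str = "tasks.jsonl") -> list[dict]:
    """Read one JSON task per line from a .jsonl file in the data directory."""
    path = Path(data_dir) / filename
    return [json.loads(line) for line in path.read_text().splitlines() if line.strip()]

# In a hosted environment the mount point would be /orwd_data/; here a
# temporary directory stands in for the mounted storage.
demo_dir = tempfile.mkdtemp()
(Path(demo_dir) / "tasks.jsonl").write_text(
    '{"task_id": "t1", "prompt": "Classify this review."}\n'
    '{"task_id": "t2", "prompt": "Summarise this article."}\n'
)
tasks = load_tasks(demo_dir)
print(len(tasks), tasks[0]["task_id"])  # 2 t1
```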
Deployment Flow
OpenReward deploys ORS servers via the following flow:
Getting Started
- Environments - learn about environment servers
- Sandboxes - explore ephemeral execution containers
- Storage - configure cloud storage access
- Build Your First Environment - deploy your first environment

