
What is OpenReward’s Architecture?

OpenReward provides infrastructure for hosting and running AI agent environments. The platform has two main components:
Environment → Sandbox
Environments are persistent servers that host your evaluation tasks, and Sandboxes are temporary containers for running code.

The Two Components

Environments: Persistent Evaluation Servers

An environment is a long-running server that hosts your evaluation.
What it provides:
  • Evaluation server that agents connect to
  • Tasks for agents to solve
  • Tools agents can call
  • Session management for multiple agents
  • Automatic scaling based on concurrent sessions
  • Isolated storage for datasets and artifacts
Think of it as: A persistent web service (like SWE-Bench) that agents connect to for evaluation.

Sandboxes: Temporary Execution Containers

A sandbox is an isolated container for running code.
What it provides:
  • Isolated execution environment
  • Configurable resources (CPU, memory)
  • Network isolation options
  • Automatic cleanup after use
Think of it as: A secure virtual machine that runs agent code during a task session.

How They Work Together

Common Patterns

Pattern 1: Environment-Only
For environments that don’t need code execution:
Agent → Environment Server → Evaluation Results
Example: Math problems where the environment checks the agent’s answer against a ground truth.

Pattern 2: Environment + Sandboxes
For environments requiring code execution:
Agent → Environment Server → Creates Sandbox → Executes Code → Returns Results
Example: SWE-Bench, where agents write code to accomplish the task.

Pattern 3: Direct Sandbox Usage
For running code without going through an environment server:
Your Code → Creates Sandbox → Executes Code → Returns Results
Example: Quick code execution or testing.
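
To make Pattern 1 concrete, here is a minimal sketch of the kind of check an environment-only setup might run for a math task. It is an illustration, not OpenReward's API: the function names `check_answer` and `_parse_number` are hypothetical, and the comparison rule (exact numeric equality, with a case-insensitive text fallback) is an assumption about what "checking against a ground truth" could mean.

```python
from fractions import Fraction


def _parse_number(text):
    """Return the text as an exact Fraction, or None if it is not numeric."""
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None


def check_answer(submitted: str, ground_truth: str) -> bool:
    """Hypothetical grader: compare an agent's answer with the ground truth."""
    a, b = _parse_number(submitted), _parse_number(ground_truth)
    if a is not None and b is not None:
        # Both sides are numeric: compare exactly, so "0.50" matches "1/2".
        return a == b
    # Otherwise fall back to a whitespace- and case-insensitive text match.
    return submitted.strip().lower() == ground_truth.strip().lower()
```

No sandbox is involved here because the environment only compares values and never executes agent-written code.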

Data Storage

Environment Storage

Each environment includes isolated cloud storage that is shared between the environment and its sandboxes.
Access:
  • Environment server mounts storage at: /orwd_data/
  • Sandboxes can choose the path to mount the storage at
  • Data persists across restarts and sandbox runs
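
Because the storage is exposed as a mounted directory, ordinary file I/O should be enough to read and write it. The sketch below assumes that much and nothing more: `/orwd_data` is the environment server's mount point from the list above, while the `ORWD_DATA_DIR` variable, the `results/` layout, and the helper names are hypothetical choices made for illustration.

```python
import json
import os
from pathlib import Path


def storage_root() -> Path:
    # /orwd_data is the environment server's mount point (from the docs).
    # ORWD_DATA_DIR is a hypothetical override, useful in a sandbox that
    # mounts the storage elsewhere or when testing locally.
    return Path(os.environ.get("ORWD_DATA_DIR", "/orwd_data"))


def save_result(root: Path, session_id: str, result: dict) -> Path:
    """Write one session's result as JSON under <root>/results/."""
    path = root / "results" / f"{session_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return path


def load_result(root: Path, session_id: str) -> dict:
    """Read a previously saved result back from storage."""
    return json.loads((root / "results" / f"{session_id}.json").read_text())
```

Passing the root explicitly keeps the helpers usable from both sides: the environment server can call `save_result(storage_root(), ...)`, and a sandbox can pass whatever path it chose for its mount.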

Deployment Flow

Deploying an Environment

1. Push code to GitHub

2. OpenReward builds container image

3. Environment server deployed

4. Server automatically scales based on connections

5. Environment ready for agents

Creating a Sandbox

1. Request sandbox (specify image, resources)

2. Sandbox provisioned

3. Run commands or code in sandbox

4. Clean up sandbox on disconnect or timeout
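
The four sandbox steps map naturally onto a context manager, which guarantees that cleanup runs even when the code inside fails. The sketch below is a local stand-in, not the OpenReward SDK: `SandboxRequest` and `LocalStandInSandbox` are hypothetical names, the "sandbox" is just a temporary directory plus a subprocess, and it provides no real isolation or resource limits.

```python
import shutil
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SandboxRequest:
    # Step 1: hypothetical request fields (image, resources, timeout).
    # The stand-in below records them but only enforces the timeout.
    image: str
    cpu: float = 1.0
    memory_mb: int = 512
    timeout_s: float = 10.0


class LocalStandInSandbox:
    """Illustrative stand-in that mimics the sandbox lifecycle locally."""

    def __init__(self, request: SandboxRequest):
        self.request = request
        self.workdir = None

    def __enter__(self):
        # Step 2: "provision" by creating a scratch working directory.
        self.workdir = Path(tempfile.mkdtemp(prefix="sandbox-"))
        return self

    def run(self, code: str) -> str:
        # Step 3: run Python code in the working directory, bounded by
        # the requested timeout, and return whatever it printed.
        result = subprocess.run(
            [sys.executable, "-c", code],
            cwd=self.workdir,
            capture_output=True,
            text=True,
            timeout=self.request.timeout_s,
        )
        return result.stdout

    def __exit__(self, exc_type, exc, tb):
        # Step 4: clean up on exit, whether or not an error occurred.
        shutil.rmtree(self.workdir, ignore_errors=True)
        return False  # do not swallow exceptions


with LocalStandInSandbox(SandboxRequest(image="python:3.11")) as sandbox:
    output = sandbox.run("print(1 + 1)")
```

A real sandbox would replace the temporary directory with a provisioned container, but the same shape (request, enter, run, exit) keeps cleanup tied to the end of the session.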

Getting Started