OpenReward provides infrastructure for hosting and running AI agent environments. This infrastructure has two main components:
  • Environments specify tasks, tools, rewards and basic state.
  • Sandboxes (optional) give the agent access to a computer in the environment.
Not all environments require sandboxes, but they are increasingly common as agents are trained on computer-based tasks - a setting often called agentic reinforcement learning.

What are environments?

An environment is a simulated space where an agent can perform a task using tools and resources, and receive rewards (positive or negative) for its actions. In the language of reinforcement learning, the underlying abstraction is a POMDP: a system the agent interacts with, the actions available to the agent, and the rewards the agent receives. On OpenReward, we treat environments as passive FastAPI-style servers that an agent can interact with. This maintains a strict separation of concerns between the agent and the environment, allowing for easier development and more robustness to changes in agent harnesses. The standard for how agents talk to environments is called ORS, which you can read about in full detail here. Briefly, an ORS server provides:
  • Tasks - tasks are the core problems to be solved, including the initial prompts
  • Tools - tools are the actions an agent can take in the environment
  • Splits - splits organise tasks into groups, e.g. for training and evaluation
  • Statefulness - agent actions in a session can affect state
  • Tool Results - including tool feedback, rewards and termination signals
But at its heart, an ORS service is a web service that you can call. You can run these servers locally, or in the cloud. OpenReward is a managed service for hosting ORS environments on the internet.
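The concepts above can be sketched in-process. The sketch below is illustrative only: the class and field names (`Task`, `ToolResult`, `CounterEnvironment`, and so on) are assumptions for the example, not the real ORS schema, and a real environment would expose these over HTTP rather than as Python objects.

```python
from dataclasses import dataclass

# Illustrative in-process sketch of the concepts an ORS server exposes:
# tasks, tools, splits, state, and tool results carrying reward/termination.

@dataclass
class Task:
    task_id: str
    prompt: str   # initial prompt shown to the agent
    split: str    # e.g. "train" or "eval"

@dataclass
class ToolResult:
    feedback: str   # tool output fed back to the agent
    reward: float   # scalar reward for this action
    done: bool      # termination signal

class CounterEnvironment:
    """Stateful toy environment: the agent must raise a counter to 3."""

    def __init__(self):
        self.tasks = [Task("t1", "Increment the counter to 3.", "train")]
        self.count = 0   # session state mutated by agent actions

    def list_tasks(self, split):
        # splits organise tasks into groups
        return [t for t in self.tasks if t.split == split]

    def call_tool(self, name):
        # tools are the actions an agent can take in the environment
        if name == "increment":
            self.count += 1
            done = self.count >= 3
            return ToolResult(f"count={self.count}", 1.0 if done else 0.0, done)
        return ToolResult(f"unknown tool: {name}", 0.0, False)

env = CounterEnvironment()
task = env.list_tasks("train")[0]
results = [env.call_tool("increment") for _ in range(3)]
```

The agent only ever sees prompts and tool results; the environment keeps the state and decides rewards and termination.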

What are sandboxes?

A sandbox is an isolated container for running code. Sandboxes are often used in conjunction with environments when an agent needs access to a computer. For example, a software engineering task would require interacting with a filesystem, writing and executing files, and more. We run agent code in sandboxes, rather than on the environment's own compute, to isolate the agent's actions and prevent them from interfering with the running of the environment. A sandbox provides:
  • Isolated execution environment
  • Configurable resources (CPU, memory)
  • Network isolation options
  • Automatic cleanup after use
In the context of ORS, a sandbox is usually initialised when a new session is created, and torn down when the session ends. But the exact use can vary; for example, some environments may only use a sandbox for a particular tool execution instead of the entire session. ORS environments can be run with any sandbox provider - it is fully interoperable. OpenReward provides our own sandbox solution, but we encourage you to use whatever works best for you - and these docs contain detailed guides on how to use environments with popular providers such as Daytona, E2B and Modal.
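The session-scoped lifecycle can be sketched as follows. This is a local stand-in, not a real provider: a throwaway working directory with subprocess execution plays the role of the sandbox, whereas Daytona, E2B, Modal or OpenReward's own solution would give true container isolation behind an API. The `LocalSandbox` name and its methods are assumptions for the example.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Stand-in sandbox: a temp directory plus subprocess execution,
# mirroring the create-on-session-start / teardown-on-session-end
# lifecycle an ORS environment would drive against a real provider.

class LocalSandbox:
    def __enter__(self):
        # "create" the sandbox when the session starts
        self.root = Path(tempfile.mkdtemp(prefix="sbx_"))
        return self

    def run(self, command: str) -> str:
        # execute a command inside the sandbox's working directory
        out = subprocess.run(
            command, shell=True, cwd=self.root,
            capture_output=True, text=True, timeout=30,
        )
        return out.stdout

    def __exit__(self, *exc):
        # automatic cleanup when the session ends
        shutil.rmtree(self.root, ignore_errors=True)

with LocalSandbox() as sbx:
    sbx.run("echo hello > greeting.txt")
    contents = sbx.run("cat greeting.txt")
root_after = sbx.root   # the directory is gone once the session ends
```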

How environments and sandboxes work together

We have briefly touched on how these two pillars of infrastructure interact; a few common patterns are outlined below.

Pattern 1: Environment-Only

For environments that don't need code execution:
Agent → ORS Environment
Examples include mathematics or multiple-choice question-answering benchmarks where the agent is not given access to a computer.

Pattern 2: Environment + Sandboxes

For environments requiring code execution:
Agent → ORS Environment → Sandbox
Examples include software engineering and knowledge-work benchmarks where the agent reads and writes to a filesystem. The agent connects to an ORS environment, which then initialises a sandbox and utilises it for certain tools in the environment (e.g. bash, write, read, grep).
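Pattern 2 can be sketched as an environment that owns the session and forwards file and shell tools to its sandbox. As before, the sandbox here is a local temp directory standing in for a real provider, and the `Session` class and tool names are illustrative assumptions.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Sketch of Pattern 2: the environment delegates certain tools
# (bash, write, read) to a sandbox scoped to the session.

class Session:
    def __init__(self):
        # a sandbox is initialised when the session is created
        self.sandbox = Path(tempfile.mkdtemp(prefix="session_"))

    def tool(self, name: str, **args) -> str:
        # tools the environment forwards to the sandbox
        if name == "write":
            (self.sandbox / args["path"]).write_text(args["content"])
            return "ok"
        if name == "read":
            return (self.sandbox / args["path"]).read_text()
        if name == "bash":
            out = subprocess.run(
                args["cmd"], shell=True, cwd=self.sandbox,
                capture_output=True, text=True, timeout=30,
            )
            return out.stdout
        raise ValueError(f"unknown tool: {name}")

    def close(self):
        # the sandbox is torn down when the session ends
        shutil.rmtree(self.sandbox, ignore_errors=True)

session = Session()
session.tool("write", path="hello.py", content="print('hi')")
listing = session.tool("bash", cmd="ls")
source = session.tool("read", path="hello.py")
session.close()
```

The agent never touches the sandbox directly; every action is mediated by the environment, which is what keeps the separation of concerns intact.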

Where is data stored?

Data on OpenReward can come from external sources, for example HuggingFace datasets or public buckets, or you can upload assets to the environment and use those. There are two main uses for data in ORS environments:
  • Environment data. This is data that powers the underlying environment: for example, tasks stored in a Parquet or .jsonl file, or information that the environment (but not the agent) has access to, such as ground-truth labels.
  • Sandbox data. This is data that the agent has access to for solving the task. For example, in a Kaggle competition environment, the agent might have access to a train and validation dataset on their machine to build models with.
We have a full set of docs on where data lives here. The key things to note about OpenReward storage are:
  • The hosted environment mounts storage at the location: /orwd_data/
  • Sandboxes can choose the path to mount the storage at
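As a sketch, environment data might be a tasks file read from the storage mount. On a hosted environment that mount is /orwd_data/; to keep the example self-contained we write a stand-in tasks.jsonl to a temp directory instead, and the file name and fields are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path

# Sketch of loading environment-side data (prompts plus ground-truth
# labels the agent never sees) from mounted storage.

mount = Path(tempfile.mkdtemp())          # stands in for /orwd_data/
tasks_file = mount / "tasks.jsonl"
tasks_file.write_text(
    '{"task_id": "t1", "prompt": "2+2=?", "answer": "4"}\n'
    '{"task_id": "t2", "prompt": "3*3=?", "answer": "9"}\n'
)

def load_tasks(path: Path) -> list[dict]:
    # one JSON object per line, as in a typical .jsonl tasks file
    return [json.loads(line) for line in path.read_text().splitlines()]

tasks = load_tasks(tasks_file)
```

The `answer` field is environment data: the environment uses it to score the agent, but only the `prompt` would ever be sent to the agent.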

Deployment Flow

OpenReward deploys ORS servers, and we do this via the following flow:
1. The user writes environment code along with a Dockerfile
2. The user pushes their code to a GitHub repository
3. The user connects an OpenReward environment to their GitHub
4. OpenReward builds an image and deploys an environment server
5. The server automatically scales based on connections
OpenReward does not host code; that stays on GitHub. In essence, it does one job: take the code (along with any data) and host an API endpoint for it. That endpoint can then be used for training, distillation or evaluation.

Getting Started

Environments

Learn about environment servers

Sandboxes

Explore ephemeral execution containers

Storage

Configure cloud storage access

Build Your First Environment

Deploy your first environment