
What is ORS?
The Open Reward Standard (ORS) is an open-source standard for connecting language model agents to environments. It specifies how a language model agent can interact with an environment to manipulate its state and obtain results and rewards. In ORS, an environment is a server that a language model agent interacts with. A key assumption in ORS is that actions are tools: the only way the agent can interact with an environment is by calling tools. This allows ORS environments to utilise the existing function calling support offered by model providers. An ORS environment server provides access to:
- Tools: the core methods for interacting with an environment, for example a `bash` tool or a `submit_solution` tool.
- Tasks: the core tasks to be accomplished in an environment, for example math problems in a mathematics environment.
- Splits: different lists of tasks for use cases like training, evaluation and development. These task lists are associated with strings, e.g. `train`.
- Prompts: instructions to the agent associated with tasks. These are used to prompt the language model at the beginning of a task, for example `Who is the author of the document at /home/ubuntu/the-bitter-lesson.md?`.
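To make these concepts concrete, the sketch below shows one way an agent might drive an ORS environment over HTTP. The base URL, endpoint paths, payload shapes and the hard-coded tool call are illustrative assumptions for this FAQ, not the normative ORS wire format.

```python
"""A minimal sketch of an agent talking to an ORS-style environment server.

Everything here (the base URL, endpoint paths and payload shapes) is an
illustrative assumption; consult the ORS specification for the real format.
"""
import requests

BASE = "http://localhost:8000"  # hypothetical ORS environment server

# Splits map names like "train" to lists of tasks.
tasks = requests.get(f"{BASE}/tasks", params={"split": "train"}).json()
task = tasks[0]

# Each task carries a prompt used to start the episode.
print(task["prompt"])

# Tools are listed much as in MCP, then called by name with arguments.
tools = requests.get(f"{BASE}/tools").json()
print([tool["name"] for tool in tools])  # e.g. ["bash", "submit_solution"]

# The agent acts only through tool calls; here the "model decision" is hard-coded.
result = requests.post(
    f"{BASE}/tools/call",
    json={
        "task_id": task["id"],
        "name": "bash",
        "arguments": {"command": "head /home/ubuntu/the-bitter-lesson.md"},
    },
).json()

# Unlike a plain MCP tool result, an ORS result also carries reward and finished.
print(result["content"], result["reward"], result["finished"])
```

In a real agent loop, the language model would choose the tool and arguments at each step, and the loop would continue until the environment reports that the episode is finished.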
How does ORS compare to MCP?
The Model Context Protocol (MCP) is a standard for connecting language models to external tools, data sources and workflows. It does not specify all the elements an agentic environment needs, such as reward and when an episode should be terminated, and it carries no information about tasks and splits, which are useful for training and evaluation. For these reasons, OpenReward uses a new standard (ORS) to standardise how an agent should interact with an environment. Much of the specification is closely aligned with MCP, such as how tools are listed and called. Tool results differ in that they also surface fields such as `reward` and `finished`. Other concepts, such as tasks and splits, are not present in MCP at all.
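To make the difference concrete, here is a rough sketch of the two result shapes. Only `reward` and `finished` come from the comparison above; the other field names and types are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class McpStyleToolResult:
    # An MCP-style result: the content produced by the tool call.
    content: list[dict] = field(default_factory=list)


@dataclass
class OrsStyleToolResult:
    # An ORS-style result: the same content, plus the signals a training or
    # evaluation loop needs to score the step and decide when to stop.
    content: list[dict] = field(default_factory=list)
    reward: float = 0.0      # scalar reward for this step
    finished: bool = False   # whether the episode should terminate


step = OrsStyleToolResult(
    content=[{"type": "text", "text": "Solution accepted."}],
    reward=1.0,
    finished=True,
)
```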
What are the benefits of OpenReward?
There are several core use cases for OpenReward:
- Hosting environments - OpenReward provides infrastructure to host environments and a simple way to write them through ORS. Developers can focus on writing environments without worrying about infrastructure.
- Training models - OpenReward has integrations with major training frameworks, allowing low-friction training on community or user-defined environments.
- Evaluating models - OpenReward allows for fast evaluation of new and existing models on agentic environments, so evaluators can focus on evaluation rather than infrastructure.

