
What is ORS?
The Open Reward Standard (ORS) is an open-source standard for connecting language model agents to environments. It specifies how a language model agent can interact with an environment to manipulate its state and obtain results and rewards. In ORS, an environment is a server that a language model agent interacts with. A key assumption in ORS is that actions are tools: the only way the agent can interact with an environment is by calling tools. This allows ORS environments to utilise the existing function calling support offered by model providers. An ORS environment server provides access to:
- Tools: the core methods for interacting with an environment, for example a `bash` tool or a `submit_solution` tool.
- Tasks: the core tasks to be accomplished in an environment, for example math problems in a mathematics environment.
- Splits: different lists of tasks for use cases like training, evaluation and development. These task lists are associated with strings, e.g. `train`.
- Prompts: instructions to the agent associated with tasks. These are used to prompt the language model at the beginning of a task, for example `Who is the author of the document at /home/ubuntu/the-bitter-lesson.md?`.
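To make these concepts concrete, the sketch below shows one way an agent might drive an ORS environment over HTTP. The base URL, endpoint paths, payload shapes and the hard-coded tool call are illustrative assumptions for this FAQ, not the normative ORS wire format.

```python
"""A minimal sketch of an agent talking to an ORS-style environment server.

Everything here (the base URL, endpoint paths and payload shapes) is an
illustrative assumption; consult the ORS specification for the real format.
"""
import requests

BASE = "http://localhost:8000"  # hypothetical ORS environment server

# Splits map names like "train" to lists of tasks.
tasks = requests.get(f"{BASE}/tasks", params={"split": "train"}).json()
task = tasks[0]

# Each task carries a prompt used to start the episode.
print(task["prompt"])

# Tools are listed much as in MCP, then called by name with arguments.
tools = requests.get(f"{BASE}/tools").json()
print([tool["name"] for tool in tools])  # e.g. ["bash", "submit_solution"]

# The agent acts only through tool calls; here the "model decision" is hard-coded.
result = requests.post(
    f"{BASE}/tools/call",
    json={
        "task_id": task["id"],
        "name": "bash",
        "arguments": {"command": "head /home/ubuntu/the-bitter-lesson.md"},
    },
).json()

# Unlike a plain MCP tool result, an ORS result also carries reward and finished.
print(result["content"], result["reward"], result["finished"])
```

In a real agent loop, the language model would choose the tool and arguments at each step, and the loop would continue until the environment reports that the episode is finished.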
How does ORS compare to MCP?
The Model Context Protocol (MCP) is a standard for connecting language models to external tools, data sources and workflows. It does not specify all the elements an agentic environment needs, such as reward and when an episode should be terminated, and it carries no information about tasks and splits, which are useful for training and evaluation. For these reasons, OpenReward uses a new standard (ORS) to standardise how an agent should interact with an environment. Much of the specification is closely aligned with MCP, such as how tools are listed and called. Tool results differ in that they also surface fields such as `reward` and `finished`. Other concepts, such as tasks and splits, are not present in MCP at all.
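To make the difference concrete, here is a rough sketch of the two result shapes. Only `reward` and `finished` come from the comparison above; the other field names and types are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class McpStyleToolResult:
    # An MCP-style result: the content produced by the tool call.
    content: list[dict] = field(default_factory=list)


@dataclass
class OrsStyleToolResult:
    # An ORS-style result: the same content, plus the signals a training or
    # evaluation loop needs to score the step and decide when to stop.
    content: list[dict] = field(default_factory=list)
    reward: float = 0.0      # scalar reward for this step
    finished: bool = False   # whether the episode should terminate


step = OrsStyleToolResult(
    content=[{"type": "text", "text": "Solution accepted."}],
    reward=1.0,
    finished=True,
)
```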
What are the benefits of OpenReward?
There are several core use cases for OpenReward:
- Hosting environments - OpenReward provides infrastructure to host environments and a simple way to write them through ORS. Developers can focus on writing environments without worrying about infrastructure.
- Training models - OpenReward has integrations with major training frameworks, allowing low-friction training on community or user-defined environments.
- Evaluating models - OpenReward allows for fast evaluation of new and existing models on agentic environments, so evaluators can focus on evaluation rather than infrastructure.

