Goals
- Make a mathematics environment using the OpenReward library.
- Deploy the environment to OpenReward.
- Sample from the environment using a model of your choice.
Prerequisites
- An OpenReward account
- An OpenReward API key
- An API key and SDK for your model provider of choice (e.g. OpenAI, Anthropic, Google, OpenRouter)
Setup
Environments in OpenReward are written using ORS. ORS is implemented in the OpenReward Python library, and we will use it for this tutorial. You can install the library using pip or uv.

Background: GSM8K
GSM8K is a classic language model dataset for math word problems released by OpenAI in 2021. Problems are at the grade school level and answers are integers. An example problem and answer from this dataset is shown below:

Dean’s mother gave him $28 to go to the toy store. Dean bought 6 toy cars and 5 teddy bears. Each toy car cost $2 and each teddy bear cost $1. His mother then feels generous and decides to give him an extra $10. How much money does Dean have left?
Answer: 21

We will learn how to build an ORS environment server for GSM8K in this tutorial.
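As a sanity check, the arithmetic behind the example answer works out like this:

```python
# Worked arithmetic for the example problem above
start = 28             # money from Dean's mother
spent = 6 * 2 + 5 * 1  # 6 toy cars at $2 each, 5 teddy bears at $1 each
extra = 10             # the extra gift
left = start - spent + extra
print(left)  # 21
```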
Understanding ORS servers
To begin, we’ll initialise our GSM8K project with a basic template and investigate how ORS servers work.
Initialise a project using the OpenReward cli:
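The exact command depends on the CLI version; a hypothetical invocation (the command and flag names here are assumptions, not taken from the OpenReward docs) might look like:

```shell
# Hypothetical CLI invocation -- check the OpenReward docs for the real command
openreward init gsm8k --template basic
```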
This will create three files: server.py, Dockerfile and requirements.txt.
If you look inside server.py, you can see a BasicEnvironment is defined.
To run this ORS server, install the requirements and run server.py:
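Following the step above, from the project directory:

```shell
# Install the dependencies, then start the ORS server
pip install -r requirements.txt
python server.py
```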
The server will start on port 8080. We will leave this running.
Now let’s see how we can interact with the BasicEnvironment. In a different terminal, save the following file as test_environment.py:
To get the BasicEnvironment, we pass in the name basicenvironment. We’ve also passed in a localhost base_url, since we are running locally.
This is the same API that we use to get production environments on OpenReward. The difference is that we have not passed in a namespace (e.g. OpenAI/SimpleQA) and we are pointing to a local base_url.
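To make the pattern concrete, here is a pseudocode sketch of the two cases; the function and argument names are hypothetical, not the real OpenReward API:

```
# Pseudocode -- identifiers here are illustrative only
env = get_environment(
    name="basicenvironment",           # no namespace: a local environment
    base_url="http://localhost:8080",  # the locally running ORS server
)

# A production environment would be namespaced and use the default base_url:
# env = get_environment(name="OpenAI/SimpleQA")
```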
Now let’s test interacting with this ORS server.
Environments have splits, which are lists of tasks for different purposes such as training and evaluation. To see the environment’s available splits, add the following to test_environment.py and run it:
The output shows the environment’s train split.
Each Task object contains the task specification, which is one of the primitives of ORS.
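To make the shape concrete, here is a toy, self-contained illustration of splits and tasks. These are stand-in types for this tutorial, not the real ORS classes:

```python
from dataclasses import dataclass

@dataclass
class Task:
    id: str
    prompt: str  # the problem text shown to the model

# A toy environment with a single "train" split: splits map names to task lists
splits = {
    "train": [
        Task(id="task-0", prompt="What is 2 + 2?"),
    ],
}

print(list(splits))           # ['train']
print(splits["train"][0].id)  # task-0
```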
Next, let us see the available tools in the environment. In ORS, actions are tools, and executing tools is the only way to interact with the environment.
Here we can see the environment’s answer tool. Note that this tool specification is the same as a tool specification in MCP, allowing compatibility with existing model function calling capabilities.
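Since the specification follows MCP, the answer tool will look something like the dictionary below. MCP tools describe their arguments with JSON Schema; the exact description text and argument type here are assumptions:

```python
# MCP-style tool specification (description wording is assumed)
answer_tool = {
    "name": "answer",
    "description": "Submit a final answer for the current task.",
    "inputSchema": {
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
    },
}

print(answer_tool["inputSchema"]["required"])  # ['answer']
```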
Now let’s test calling the answer tool. Write the following script test_tool.py:
The prompt contains a TextBlock with the prompt text for this task.
After calling the tool on the session with call_tool - with the tool name answer and tool arguments {"answer": "4"} - we obtain a ToolOutput.
The ToolOutput also contains a list of blocks, in this case a TextBlock showing us some text for the agent (Correct!). We also obtain a reward of 1.0, as well as a finished state of True, denoting that the episode has finished.
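The fields described above can be illustrated with toy stand-ins. These mirror the blocks, reward, and finished fields from the text, but are not the real library types:

```python
from dataclasses import dataclass

@dataclass
class TextBlock:
    text: str

@dataclass
class ToolOutput:
    blocks: list
    reward: float
    finished: bool

def call_tool(name: str, arguments: dict) -> ToolOutput:
    """Toy stand-in for session.call_tool: grades against a gold answer of "4"."""
    correct = name == "answer" and arguments.get("answer") == "4"
    return ToolOutput(
        blocks=[TextBlock("Correct!" if correct else "Incorrect.")],
        reward=1.0 if correct else 0.0,
        finished=True,  # submitting an answer ends the episode
    )

out = call_tool("answer", {"answer": "4"})
print(out.blocks[0].text, out.reward, out.finished)  # Correct! 1.0 True
```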
This shows us the basics of how we can interface with an ORS environment server. In the next section we will build a GSM8K environment.
Building the GSM8K environment
First we’ll download the two parquet files from the GSM8K HuggingFace repository and put them in the root of our project:
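One way to do this is with the Hugging Face CLI. The repository path openai/gsm8k and the parquet file layout are assumptions here; check the dataset page for the exact file names:

```shell
# Download the GSM8K dataset files into the current directory
# (repository path and file layout assumed -- verify on the dataset page)
huggingface-cli download openai/gsm8k --repo-type dataset --local-dir .
```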
The environment needs to handle two things:
- Loading the tasks from the parquet files
- Verifying that a submitted answer is correct; we’ll use the MathVerify library for this
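GSM8K solutions end with a final line of the form "#### <answer>", so extracting the gold answer is a string split. The verification function below is a simplified, pure-Python stand-in for illustration only; the tutorial itself uses the MathVerify library for robust checking:

```python
def extract_gold_answer(solution: str) -> str:
    """GSM8K solutions end in '#### <answer>'; pull out the final integer."""
    return solution.split("####")[-1].strip().replace(",", "")

def verify_answer(submitted: str, gold: str) -> bool:
    """Simplified stand-in for MathVerify: strip formatting, compare integers."""
    cleaned = submitted.strip().replace(",", "").replace("$", "")
    try:
        return int(cleaned) == int(gold)
    except ValueError:
        return False

gold = extract_gold_answer("He spends $17 and gets $10 more. #### 21")
print(verify_answer("$21", gold))  # True
print(verify_answer("20", gold))   # False
```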
Set your API key
Make sure you have an API key for OpenAI, and set the environment variable:
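The OpenAI SDK reads the standard OPENAI_API_KEY environment variable:

```shell
export OPENAI_API_KEY="sk-..."
```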
Hosting your environment on OpenReward has two main benefits:
- Infrastructure: you do not have to set up infrastructure and compute to host the environment yourself. We take care of this, and you are only charged based on your actual usage of the environment.
- Discovery: your environment can be discovered and used by other users of the platform, helping drive adoption of and attention to your work.
Host on OpenReward
Log into OpenReward, press the plus icon in the navbar and press New Environment:


Upload environment files
We will need a way to use the train and test parquet files in our environment, so we’ll upload them as environment files. Click on the Files tab and upload each file:
Uploaded files are made available to the environment in the /orwd_data directory.
We’ll need to reference this folder in our server.py. Make the following change:
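The change amounts to pointing the data paths at the uploaded files. The exact parquet file names below are assumptions; use whatever names you uploaded:

```python
from pathlib import Path

# Uploaded environment files are served from /orwd_data
# (file names assumed -- match the files you uploaded)
DATA_DIR = Path("/orwd_data")
TRAIN_PATH = DATA_DIR / "train-00000-of-00001.parquet"
TEST_PATH = DATA_DIR / "test-00000-of-00001.parquet"
```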
(Note that the file paths now use the /orwd_data prefix.)
Write the Dockerfile and requirements
We’ll need a Dockerfile in our repository:
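A typical minimal Dockerfile for a Python server looks like the sketch below. The base image choice is an assumption; the port matches the 8080 used earlier in this tutorial:

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY server.py .

EXPOSE 8080
CMD ["python", "server.py"]
```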
We’ll also need a requirements.txt:
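At minimum it needs the OpenReward library, MathVerify, and a parquet reader. The package names below are assumptions; verify them on PyPI:

```text
openreward
math-verify
pandas
pyarrow
```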
Push to GitHub and connect
Next, push your environment code to a GitHub repository. Once your GitHub repository is ready, go to your OpenReward environment and connect the repository:


Sample from your environment
Now that your environment is hosted on OpenReward, we can sample from it:
Set your API keys
Make sure you have API keys for OpenReward and OpenAI, and set these as environment variables:
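For example (the OpenReward variable name is an assumption; check the OpenReward docs for the exact name):

```shell
export OPENREWARD_API_KEY="..."
export OPENAI_API_KEY="sk-..."
```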

