Goals
- Make a mathematics environment using the OpenReward library.
- Deploy the environment to OpenReward.
- Sample from the environment using a model of your choice.
Prerequisites
- An OpenReward account
- An OpenReward API key
- An API key and SDK for your model provider of choice (e.g. OpenAI, Anthropic, Google, OpenRouter)
Setup
Environments in OpenReward are written using ORS. ORS is implemented in the OpenReward Python library, and we will use it for this tutorial. You can install the library using pip or uv:
Background: GSM8K
GSM8K is a classic language model dataset for math word problems released by OpenAI in 2021. Problems are at the grade school level and answers are integers. An example problem and answer from this dataset is shown below:

Dean’s mother gave him $28 to go to the toy store. Dean bought 6 toy cars and 5 teddy bears. Each toy car cost $2 and each teddy bear cost $1. His mother then feels generous and decides to give him an extra $10. How much money does Dean have left?

Answer: 21

We will learn how to build an ORS environment server for GSM8K in this tutorial.
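A quick sanity check of the arithmetic behind the answer:

```python
# Verify the GSM8K example above: Dean starts with $28, buys 6 toy cars
# at $2 each and 5 teddy bears at $1 each, then receives an extra $10.
start = 28
spent = 6 * 2 + 5 * 1   # $17 spent at the toy store
extra = 10
remaining = start - spent + extra
print(remaining)  # → 21
```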
Understanding ORS servers
To begin, we’ll initialise our GSM8K project with a basic template and investigate how ORS servers work.
Initialise a project using the OpenReward CLI:
This creates three files: server.py, Dockerfile, and requirements.txt.
If you look inside server.py, you can see a BasicEnvironment is defined.
To run this ORS server, install the requirements and run server.py:
The server will now be listening on port 8080. We will leave this running.
Now let’s see how we can interact with the BasicEnvironment. In a different terminal, write the following file to test_environment.py:
To connect to the BasicEnvironment, we will pass in the name basicenvironment.
This is the same API that we use to get production environments on OpenReward. But in this case we have not passed in a namespace (e.g. OpenAI/SimpleQA), so it defaults to using localhost instead.
If your server is running in a different location, or you want more control, you can specify a base_url explicitly. For example:
Next, add the following to test_environment.py and run it:
This lists the tasks in the train split.
Each Task object is a task specification, one of the primitives of ORS.
Next, let us see the available tools in the environment. In ORS, actions are tools, and executing tools is the only way to interact with the environment.
In this environment there is a single tool, answer. Note that this tool specification is the same as a tool specification in MCP, allowing compatibility with existing model function-calling capabilities.
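For illustration, here is what an MCP-style specification for an answer tool could look like. This is a hand-written sketch, not output captured from the server; MCP tools carry a name, a description, and a JSON Schema describing their inputs:

```python
import json

# Hypothetical MCP-style tool specification for the "answer" tool.
answer_tool = {
    "name": "answer",
    "description": "Submit a final answer for the current task.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "answer": {"type": "string", "description": "The final answer."},
        },
        "required": ["answer"],
    },
}

print(json.dumps(answer_tool, indent=2))
```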
Now let’s test calling the answer tool. Write the following script as test_tool.py:
The task’s prompt contains a TextBlock with the prompt text for this task.
After calling the tool on the session with call_tool - with the tool name answer and tool arguments {"answer": "4"} - we obtain a ToolOutput.
The ToolOutput also contains a list of blocks, in this case a TextBlock showing some text for the agent (Correct!). We also obtain a reward of 1.0, as well as a finished state of True, denoting that the episode has finished.
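The fields described above can be pictured like this. These dataclasses are hand-rolled stand-ins to illustrate the shape of the result; the real TextBlock and ToolOutput classes live in the OpenReward library and will differ in detail:

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Stand-in for a block of text shown to the agent."""
    text: str


@dataclass
class ToolOutput:
    """Stand-in for the result of a tool call."""
    blocks: list      # content returned to the agent
    reward: float     # scalar reward for this step
    finished: bool    # whether the episode has ended


# The result described above: the agent answered correctly.
out = ToolOutput(blocks=[TextBlock("Correct!")], reward=1.0, finished=True)
print(out.reward, out.finished)
```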
This shows us the basics of how we can interface with an ORS environment server. In the next section we will build a GSM8K environment.
Building the GSM8K environment
First we’ll download the two parquet files from the GSM8K HuggingFace repository and put them in the root of our project:
- Loading the tasks from the parquet files
- Verifying the answer is correct - we’ll use the MathVerify library for this.
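On the verification side, note that each GSM8K solution string ends with a line of the form `#### <answer>`. A small helper can pull out the gold answer for comparison; this is a sketch of the idea, with MathVerify left to handle the heavier normalisation:

```python
import re


def extract_gold_answer(solution: str) -> str:
    """Pull the final answer from a GSM8K solution string.

    GSM8K solutions end with a line of the form '#### <answer>'.
    """
    match = re.search(r"####\s*(.+)", solution)
    if match is None:
        raise ValueError("no '####' marker found in solution")
    # Strip commas and dollar signs so '1,000' and '$21' compare cleanly.
    return match.group(1).strip().replace(",", "").replace("$", "")


example = "Dean spent 6*2+5*1 = 17 dollars ... so he has 21 left.\n#### 21"
print(extract_gold_answer(example))  # → 21
```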
1. Set your API key
Make sure you have an API key for OpenAI, and set the environment variable:
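For example, in your terminal (the key value is a placeholder to replace with your own):

```shell
# Export the OpenAI API key so the SDK can pick it up (placeholder value).
export OPENAI_API_KEY="sk-your-key-here"
```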
2. Create your code
Save this as sample_agent.py:
3. Run your code
Hosting your environment on OpenReward gives you two benefits:
- Infrastructure sorted: you do not have to set up infrastructure and compute to host the environment yourself. We take care of this for you, and you are only charged based on your actual usage of the environment.
- Discovery supercharged: your environment can be discovered and used by other users of the platform, helping drive adoption and attention to your work.
Host on OpenReward
Log into OpenReward, click the plus icon in the navbar, and select New Environment:


Upload environment assets
We will need a way to use the train and test parquet files in our environment. We’ll upload these to the environment assets: click on the Assets tab and press Upload Environment Assets:
[IMAGE HERE]
You can then upload the two files. This will give you two URLs that you can use when developing your environment locally.
Write the Dockerfile and requirements
We now need to amend the Dockerfile to download the parquet files, and add our dependencies to requirements.txt:
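As a sketch, the amended Dockerfile might look like the following. The base image and the asset URLs are placeholders to adapt; substitute the URLs you were given in the Assets tab:

```dockerfile
# Placeholder base image; pick the Python version your environment needs.
FROM python:3.11-slim

WORKDIR /app

# Install Python dependencies.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Download the GSM8K parquet files from the environment assets.
# Replace these placeholder URLs with the ones from your Assets tab.
ADD https://example.com/assets/train.parquet train.parquet
ADD https://example.com/assets/test.parquet test.parquet

COPY server.py .

EXPOSE 8080
CMD ["python", "server.py"]
```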
Push to GitHub and connect
Next, push your environment to a GitHub repository. Once ready, go to your OpenReward environment and connect the repository:
[IMAGE HERE]
You will be given a choice of how much compute to allocate. We’ll use a low compute configuration since this is a simple environment. Press Build and your environment will be hosted!
Sample from your environment
Now that your environment is hosted on OpenReward, we can sample from it:
1. Set your API keys
Make sure you have API keys for OpenReward and OpenAI, and set these as environment variables:
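For example (placeholder values; the OpenReward variable name here is an assumption, so check the OpenReward docs for the exact name it expects):

```shell
# Export both keys for the current shell session (placeholder values).
# OPENREWARD_API_KEY is an assumed variable name, not confirmed by the docs.
export OPENREWARD_API_KEY="your-openreward-key"
export OPENAI_API_KEY="sk-your-key-here"
```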
2. Create your code
Save this as quickstart.py:
3. Run your code

