Prerequisites
- An OpenReward account
- An OpenReward API key
- An API key and SDK for your model provider of choice (e.g. OpenAI, Anthropic, Google, OpenRouter)
Sample an Environment
In this example we’ll call the GeneralReasoning/MLEBench-Train environment on OpenReward and sample an agent trace using the Python SDK.
Choose a model provider below and sample your first environment!
- OpenAI
- Anthropic
- Google
- OpenRouter
- Other Models
OpenAI
1. Set your API keys
Make sure you have API keys for OpenReward and OpenAI, and set these as environment variables:
export OPENAI_API_KEY='your-openai-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
2. Install the SDK
pip install openreward
3. Create your code
Save this as quickstart.py:
from openai import OpenAI
from openreward import OpenReward
import json

or_client = OpenReward()
oai_client = OpenAI()

MODEL_NAME = "gpt-5"

environment = or_client.environments.get(name="GeneralReasoning/MLEBench-Train")
tasks = environment.get_tasks(split="train")
tools = environment.get_tools(format="openai")

example_task = tasks[0]

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    input_list = [{"role": "user", "content": prompt[0].text}]
    finished = False
    print(input_list)
    while not finished:
        response = oai_client.responses.create(
            model=MODEL_NAME,
            tools=tools,
            input=input_list
        )
        print(response.output)
        # Add the model's output items to the conversation history
        input_list += response.output
        for item in response.output:
            if item.type == "function_call":
                # Execute the tool call against the environment
                tool_result = session.call_tool(item.name, json.loads(item.arguments))
                reward = tool_result.reward
                finished = tool_result.finished
                # Return the tool output to the model
                input_list.append({
                    "type": "function_call_output",
                    "call_id": item.call_id,
                    "output": json.dumps({
                        "result": tool_result.blocks[0].text
                    })
                })
                print(input_list[-1])
                if tool_result.finished:
                    finished = True
                    break
4. Run your code
python quickstart.py
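The script above samples a single task. To sample traces for several tasks, one option is to wrap the agent loop in a helper and iterate over the task list. The sketch below is illustrative only: it assumes the same objects and SDK methods used in quickstart.py (environment, oai_client, tools, MODEL_NAME, session, get_prompt, call_tool), and the run_episode helper and NUM_TASKS constant are names introduced here, not part of the SDK.

import json

def run_episode(environment, task, oai_client, tools, model_name):
    """Run one agent episode for a single task and return the final reward (sketch)."""
    reward = None
    with environment.session(task=task) as session:
        prompt = session.get_prompt()
        input_list = [{"role": "user", "content": prompt[0].text}]
        finished = False
        while not finished:
            response = oai_client.responses.create(
                model=model_name, tools=tools, input=input_list
            )
            input_list += response.output
            for item in response.output:
                if item.type == "function_call":
                    tool_result = session.call_tool(item.name, json.loads(item.arguments))
                    reward = tool_result.reward
                    input_list.append({
                        "type": "function_call_output",
                        "call_id": item.call_id,
                        "output": json.dumps({"result": tool_result.blocks[0].text}),
                    })
                    if tool_result.finished:
                        finished = True
                        break
    return reward

NUM_TASKS = 3  # illustrative: sample only the first few tasks
rewards = [run_episode(environment, t, oai_client, tools, MODEL_NAME) for t in tasks[:NUM_TASKS]]
print(rewards)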
Anthropic
1. Set your API keys
Make sure you have API keys for OpenReward and Anthropic, and set these as environment variables:
export ANTHROPIC_API_KEY='your-anthropic-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
2. Install the SDK
pip install openreward
3. Create your code
Save this as quickstart.py:
import anthropic
from openreward import OpenReward
import json

or_client = OpenReward()
ant_client = anthropic.Anthropic()

MODEL_NAME = "claude-sonnet-4-5"

environment = or_client.environments.get(name="GeneralReasoning/MLEBench-Train")
tasks = environment.get_tasks(split="train")
tools = environment.get_tools(format="anthropic")

example_task = tasks[0]

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    messages = [{"role": "user", "content": prompt[0].text}]
    finished = False
    print(messages)
    while not finished:
        message = ant_client.messages.create(
            model=MODEL_NAME,
            max_tokens=4096,
            tools=tools,
            messages=messages
        )
        print(message)
        # Add the assistant turn to the conversation history
        messages.append({"role": "assistant", "content": message.content})
        if message.stop_reason == "tool_use":
            tool_use = next(block for block in message.content if block.type == "tool_use")
            tool_name = tool_use.name
            tool_input = tool_use.input
            # Execute the tool call against the environment
            tool_result = session.call_tool(tool_name, tool_input)
            reward = tool_result.reward
            finished = tool_result.finished
            # Return the tool output to the model as a tool_result block
            messages.append({
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_use.id,
                        "content": tool_result.blocks[0].text
                    }
                ]
            })
            print(messages[-1])
            if tool_result.finished:
                finished = True
4. Run your code
python quickstart.py
Google
1. Set your API keys
Make sure you have API keys for OpenReward and Gemini, and set these as environment variables:
export GEMINI_API_KEY='your-gemini-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
2. Install the SDK
pip install openreward
3. Create your code
Save this as quickstart.py:
from google import genai
from google.genai import types
from openreward import OpenReward
import json

or_client = OpenReward()
gem_client = genai.Client()

MODEL_NAME = "gemini-2.5-flash"

environment = or_client.environments.get(name="GeneralReasoning/MLEBench-Train")
tasks = environment.get_tasks(split="train")
tools = environment.get_tools(format="google")

# Wrap each function declaration in a genai Tool for the request config
genai_tools = [types.Tool(function_declarations=[f]) for f in tools]
genai_config = types.GenerateContentConfig(tools=genai_tools)

example_task = tasks[0]

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    contents = [
        types.Content(
            role="user", parts=[types.Part(text=prompt[0].text)]
        )
    ]
    finished = False
    print(contents)
    while not finished:
        response = gem_client.models.generate_content(
            model=MODEL_NAME,
            config=genai_config,
            contents=contents
        )
        print(response.candidates[0].content)
        # Append the content from the model's response
        contents.append(response.candidates[0].content)
        for part in response.candidates[0].content.parts:
            if part.function_call:
                tool_call = part.function_call
                # Execute the tool call against the environment
                tool_result = session.call_tool(tool_call.name, tool_call.args)
                reward = tool_result.reward
                finished = tool_result.finished
                function_response_part = types.Part.from_function_response(
                    name=tool_call.name,
                    response={"result": json.dumps({
                        "result": tool_result.blocks[0].text
                    })},
                )
                # Append the function response so the model sees the tool output
                contents.append(types.Content(role="user", parts=[function_response_part]))
                print(contents[-1])
                if tool_result.finished:
                    finished = True
                    break
4. Run your code
python quickstart.py
OpenRouter
1. Set your API keys
Make sure you have API keys for OpenReward and OpenRouter, and set these as environment variables:
export OPENROUTER_API_KEY='your-openrouter-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
2. Install the SDK
pip install openreward
3. Create your code
Save this as quickstart.py:
from openai import OpenAI
from openreward import OpenReward
import json
import os

or_client = OpenReward()
# OpenRouter exposes an OpenAI-compatible API, so we point the OpenAI client at it
oai_client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ.get("OPENROUTER_API_KEY")
)

MODEL_NAME = "deepseek/deepseek-v3.2"

environment = or_client.environments.get(name="GeneralReasoning/MLEBench-Train")
tasks = environment.get_tasks(split="train")
tools = environment.get_tools(format="openrouter")

example_task = tasks[0]

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    input_list = [{"role": "user", "content": prompt[0].text}]
    finished = False
    print(input_list)
    while not finished:
        response = oai_client.chat.completions.create(
            model=MODEL_NAME,
            tools=tools,
            messages=input_list
        )
        print(response)
        # Add the assistant message to the conversation history
        input_list.append(response.choices[0].message)
        tool_calls = response.choices[0].message.tool_calls
        if not tool_calls:
            break
        for tool_call in tool_calls:
            tool_name = tool_call.function.name
            tool_args = json.loads(tool_call.function.arguments)
            # Execute the tool call against the environment
            tool_result = session.call_tool(tool_name, tool_args)
            reward = tool_result.reward
            finished = tool_result.finished
            input_list.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps({
                    "result": tool_result.blocks[0].text
                })
            })
            print(input_list[-1])
            if tool_result.finished:
                finished = True
                break
4. Run your code
python quickstart.py
Other Models
If you are running with another provider, or using custom models, here are the main principles to keep in mind. As in the examples above, we select the first task to sample from, which is a particular problem in the MLEBench-Train environment, and use a context manager to start a session with the environment; the session defines a scope in which your agent can call tools and get tool results.
1. Set your API key
Make sure you have API keys for OpenReward and your model provider of choice, and set these as environment variables, for example:

export YOUR_PROVIDER_API_KEY='your-provider-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
2. Install the SDK
pip install openreward
3. Get the environment, tools, and tasks
You’ll need to use the OpenReward client to access the environment and its main information:
from openreward import OpenReward
or_client = OpenReward()
environment = or_client.environments.get(name="GeneralReasoning/MLEBench-Train")
tasks = environment.get_tasks(split="train")
tools = environment.get_tools(format="openai")
example_task = tasks[0]
4. Start an environment session
with environment.session(task=example_task) as session:
5. Pass the prompt and tools list into your model
You can get the prompt for the task as follows:

prompt = session.get_prompt()

You will also need to pass the tools into the context window of your model, usually somewhere in the system prompt.
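If your model provider does not support native tool calling, one option is to serialize the tool definitions into the system prompt yourself. The sketch below is just one possible convention, not something prescribed by OpenReward; it assumes tools is the list of JSON-serializable tool schemas returned by get_tools and prompt is the result of session.get_prompt() from the steps above.

import json

# Assumes `tools` and `prompt` are defined as in the previous steps.
tool_descriptions = "\n".join(json.dumps(tool) for tool in tools)

system_prompt = (
    "You are an agent operating in an environment. "
    "You can call the following tools by replying with a JSON object of the form "
    '{"tool": <name>, "arguments": <dict>}:\n'
    f"{tool_descriptions}"
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt[0].text},
]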
6. Define the core agent loop
An agent will usually keep interacting with an environment until it hits a termination state associated with the environment, or some other imposed limit (e.g. a maximum number of turns). In a simple sequential agent model, this could be a while loop like:

while not finished:

where finished is a boolean.
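For example, a loop that also enforces a maximum number of turns might look like the sketch below; MAX_TURNS is an illustrative limit you choose, not something imposed by the SDK.

MAX_TURNS = 32  # illustrative cap on the number of model generations
finished = False
turn = 0

while not finished and turn < MAX_TURNS:
    turn += 1
    # 1. Generate the next model response given the conversation so far
    # 2. Parse any tool calls from the generation (step 7)
    # 3. Execute them with session.call_tool and handle the ToolOutput (step 8)
    ...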
7. Parse and execute tool calls
In agentic environments, actions are treated as tool calls, which means you need a way to parse tool calls out of your model’s generations. The key thing to note is that for an OpenReward environment, a tool call requires a name (str) and some arguments (dict). This means you will need to parse tool_name and tool_arguments out of your model’s generation and pass them to the call_tool method:

tool_result = session.call_tool(tool_name, tool_arguments)
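For a model without native tool calling you might, for example, ask it to emit tool calls as a JSON object (as in the system prompt sketch in step 5) and parse them out of the raw generation. The helper below is a hypothetical convention for illustration, not part of the OpenReward SDK:

import json

def parse_tool_call(generation: str):
    """Extract a {"tool": ..., "arguments": ...} JSON object from a raw generation.

    Returns (tool_name, tool_arguments), or (None, None) if no tool call is found.
    The JSON format here is an assumed convention, not an OpenReward requirement.
    """
    start = generation.find("{")
    end = generation.rfind("}")
    if start == -1 or end == -1:
        return None, None
    try:
        payload = json.loads(generation[start:end + 1])
    except json.JSONDecodeError:
        return None, None
    return payload.get("tool"), payload.get("arguments")

tool_name, tool_arguments = parse_tool_call(generation)  # `generation` is your model's raw text output
if tool_name is not None:
    tool_result = session.call_tool(tool_name, tool_arguments)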
8. Parse and handle tool results
If you have executed an available tool correctly, you will receive a ToolOutput. This contains attributes for:
- reward: an (optional) float denoting reward. For example, submitting a correct math solution through a submit_solution tool might give a reward of 1.0.
- finished: a bool specifying whether the episode is finished or not. For example, some tools may end the episode (if, say, an agent submits a final answer through submit_solution in a math task).
- data: a dict with the output of executing the tool. For example, the stdout of executing a bash tool. This information should be passed to the agent as feedback.

A typical way to handle the ToolOutput is:
- Recording reward for use in a policy gradient algorithm such as GRPO or PPO
- Breaking out of the core agent loop if finished=True
- Adding data to the context window as feedback and continuing with the next model generation
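Putting these pieces together, the sketch below shows one way to handle the ToolOutput inside the loop. It assumes the hypothetical parse_tool_call helper from step 7, takes a generate callable that you supply for your own model, and reads the textual feedback from tool_result.blocks[0].text as the provider examples above do.

def agent_loop(session, generate, max_turns=32):
    """Run a simple agent loop and return the rewards collected along the way.

    `generate` is a callable you supply: given the conversation so far, it returns
    the model's next raw text generation (hypothetical, provider-specific).
    """
    rewards = []      # per-step rewards, e.g. for a GRPO or PPO update
    context = []      # however you choose to represent the conversation
    finished = False
    turns = 0

    while not finished and turns < max_turns:
        turns += 1
        generation = generate(context)
        context.append({"role": "assistant", "content": generation})

        tool_name, tool_arguments = parse_tool_call(generation)  # helper from step 7
        if tool_name is None:
            continue  # no tool call parsed; choose your own handling here

        tool_result = session.call_tool(tool_name, tool_arguments)

        # Record the reward for later use in a policy gradient algorithm
        if tool_result.reward is not None:
            rewards.append(tool_result.reward)

        # Feed the tool output back to the model as feedback
        context.append({"role": "user", "content": tool_result.blocks[0].text})

        # Break out of the loop if the episode has terminated
        finished = tool_result.finished

    return rewards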

