Prerequisites
- An OpenReward account
- An OpenReward API key
- An API key and SDK for your model provider of choice (e.g. OpenAI, Anthropic, Google, OpenRouter)
Sample an Environment
In this example we’ll call the GeneralReasoning/CTF environment on OpenReward and sample an agent trace using the Python SDK.
Choose a model provider below and sample your first environment!
- OpenAI
- Anthropic
- Google
- OpenRouter
- Other Models
Set your API keys
Make sure you have API keys for OpenReward and OpenAI, and set these as environment variables:
export OPENAI_API_KEY='your-openai-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
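Optionally, you can confirm both variables are visible to Python before running the quickstart. This is just a sketch, not part of either SDK; the missing_keys helper is a name we made up:

```python
import os

def missing_keys(*names):
    """Return the names of any environment variables that are unset or empty."""
    return [name for name in names if not os.environ.get(name)]

missing = missing_keys("OPENAI_API_KEY", "OPENREWARD_API_KEY")
if missing:
    print("Missing keys:", ", ".join(missing))
```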
Create your code
Save this as quickstart.py:
from openai import OpenAI
from openreward import OpenReward
import json

or_client = OpenReward()
oai_client = OpenAI()

MODEL_NAME = "gpt-5.2"

environment = or_client.environments.get(name="GeneralReasoning/CTF")
tasks = environment.list_tasks(split="train")
tools = environment.list_tools(format="openai")

example_task = tasks[0]

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    input_list = [{"role": "user", "content": prompt[0].text}]
    finished = False
    print(input_list)

    while not finished:
        response = oai_client.responses.create(
            model=MODEL_NAME,
            tools=tools,
            input=input_list
        )
        print(response.output)
        input_list += response.output

        if not any(item.type == "function_call" for item in response.output):
            break  # The model produced no tool calls; stop to avoid looping forever.

        for item in response.output:
            if item.type == "function_call":
                tool_result = session.call_tool(item.name, json.loads(item.arguments))
                reward = tool_result.reward
                finished = tool_result.finished
                input_list.append({
                    "type": "function_call_output",
                    "call_id": item.call_id,
                    "output": json.dumps({
                        "result": tool_result.blocks[0].text
                    })
                })
                print(input_list[-1])
                if tool_result.finished:
                    finished = True
                    break
Run your code
python quickstart.py
[{'role': 'user', 'content': "You are solving a CTF (Capture The Flag) challenge.\n\nChallenge: pancakes\n\nYou ever just get a craving for pancakes? > nc chal.tuctf.com 30503 [pancakes](pancakes) ', 'A' * 44 + p32(e.plt['puts']) + 'AAAA' + p32(e.symbols['password'])) print p.recvall()\n\nYour goal is to find the flag and submit it using the submit_answer tool.\n\nIf challenge files are available, then they have been uploaded to your environment in /tmp/gr-datasets folder.\n\nSometimes you will be given a link to download from or access.\n\nYou should work from the /home/ubuntu directory.\n\nGood luck!"}]
[ResponseReasoningItem(id='rs_0f6e802a28329b2f00696b6de55018819da22725bb44fba022', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ResponseFunctionToolCall(arguments='{"path":"/tmp/gr-datasets","show_hidden":false,"recursive":false}', call_id='call_VYpjo87wSocwCsrEc7QOvAY7', name='list_files', type='function_call', id='fc_0f6e802a28329b2f00696b6de68780819d85967ff172cae065', status='completed')]
{'type': 'function_call_output', 'call_id': 'call_VYpjo87wSocwCsrEc7QOvAY7', 'output': '{"result": "total 16\\n-rw-r--r-- 1 root root 15696 Jan 15 10:07 pancakes"}'}
[ResponseReasoningItem(id='rs_0f6e802a28329b2f00696b6de7aa04819d85f6f667d4a1fcbb', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ResponseFunctionToolCall(arguments='{"command":"cp /tmp/gr-datasets/pancakes /home/ubuntu/ && cd /home/ubuntu && ls -l pancakes && file pancakes && sha256sum pancakes"}', call_id='call_zKDEIePRL8nYrh25TMgN2Gvo', name='bash', type='function_call', id='fc_0f6e802a28329b2f00696b6de86d3c819d9122aaaa02bc1a6f', status='completed')]
{'type': 'function_call_output', 'call_id': 'call_zKDEIePRL8nYrh25TMgN2Gvo', 'output': '{"result": "-rw-r--r-- 1 root root 15696 Jan 17 11:09 pancakes\\npancakes: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=a653a608db5ab4716ca7b1e891ee3b460e097eb8, for GNU/Linux 3.2.0, not stripped\\nbd470b542b111437343079807073089ab024de3d89c45ac0926c6a9ed8c36a4a pancakes\\n\\n(exit 0)"}'}
[ResponseReasoningItem(id='rs_0f6e802a28329b2f00696b6dea067c819d9661f75dec1ea93a', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ResponseFunctionToolCall(arguments='{"command":"cd /home/ubuntu && ./pancakes << EOF\\nEOF"}', call_id='call_XXITZmGDUiZYOlMfuK4kmEDl', name='bash', type='function_call', id='fc_0f6e802a28329b2f00696b6dea46d0819d8ff38d87410d1962', status='completed')]
{'type': 'function_call_output', 'call_id': 'call_XXITZmGDUiZYOlMfuK4kmEDl', 'output': '{"result": "/bin/bash: line 2: ./pancakes: Permission denied\\n\\n(exit 126)"}'}
...
Set your API keys
Make sure you have API keys for OpenReward and Anthropic, and set these as environment variables:
export ANTHROPIC_API_KEY='your-anthropic-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
Create your code
Save this as quickstart.py:
import anthropic
from openreward import OpenReward
import json

or_client = OpenReward()
ant_client = anthropic.Anthropic()

MODEL_NAME = "claude-sonnet-4-5"

environment = or_client.environments.get(name="GeneralReasoning/KellyBench")
tasks = environment.list_tasks(split="train")
tools = environment.list_tools(format="anthropic")

example_task = tasks[0]

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    messages = [{"role": "user", "content": prompt[0].text}]
    finished = False
    print(messages)

    while not finished:
        message = ant_client.messages.create(
            model=MODEL_NAME,
            max_tokens=4096,
            tools=tools,
            messages=messages
        )
        messages.append({
            "role": "assistant",
            "content": message.content,
        })
        print(messages[-1])

        if message.stop_reason != "tool_use":
            break  # The model stopped without requesting a tool; end the loop.

        tool_uses = [b for b in message.content if getattr(b, "type", None) == "tool_use"]
        if not tool_uses:
            raise RuntimeError("stop_reason was tool_use but no tool_use blocks found")

        tool_result_blocks = []
        for tu in tool_uses:
            try:
                tr = session.call_tool(tu.name, tu.input)
                # Convert OpenReward blocks -> string safely
                text_parts = []
                for b in getattr(tr, "blocks", []) or []:
                    t = getattr(b, "text", None)
                    text_parts.append(t if t is not None else str(b))
                tool_result_blocks.append({
                    "type": "tool_result",
                    "tool_use_id": tu.id,
                    "content": "".join(text_parts),
                })
                # Track termination if the env says we're done
                if getattr(tr, "finished", False):
                    finished = True
            except Exception as e:
                tool_result_blocks.append({
                    "type": "tool_result",
                    "tool_use_id": tu.id,
                    "content": f"Tool execution failed: {type(e).__name__}: {e}",
                    "is_error": True,
                })

        messages.append({
            "role": "user",
            "content": tool_result_blocks
        })
        print(messages[-1])
Run your code
python quickstart.py
[{'role': 'user', 'content': "You are solving a CTF (Capture The Flag) challenge.\n\nChallenge: pancakes\n\nYou ever just get a craving for pancakes? > nc chal.tuctf.com 30503 [pancakes](pancakes) ', 'A' * 44 + p32(e.plt['puts']) + 'AAAA' + p32(e.symbols['password'])) print p.recvall()\n\nYour goal is to find the flag and submit it using the submit_answer tool.\n\nIf challenge files are available, then they have been uploaded to your environment in /tmp/gr-datasets folder.\n\nSometimes you will be given a link to download from or access.\n\nYou should work from the /home/ubuntu directory.\n\nGood luck!"}]
{'role': 'assistant', 'content': [TextBlock(citations=None, text="I'll start by exploring the challenge files and understanding what we're dealing with.", type='text'), ToolUseBlock(id='toolu_01RzBfJy28JV4XT7Wh5Et3j2', input={'path': '/tmp/gr-datasets'}, name='list_files', type='tool_use')]}
{'role': 'user', 'content': [{'type': 'tool_result', 'tool_use_id': 'toolu_01RzBfJy28JV4XT7Wh5Et3j2', 'content': 'total 16\n-rw-r--r-- 1 root root 15696 Jan 15 10:07 pancakes'}]}
{'role': 'assistant', 'content': [ToolUseBlock(id='toolu_01XyLJMhidhzfZyEH1VHRFZN', input={'command': 'cd /home/ubuntu && cp /tmp/gr-datasets/pancakes . && file pancakes'}, name='bash', type='tool_use')]}
{'role': 'user', 'content': [{'type': 'tool_result', 'tool_use_id': 'toolu_01XyLJMhidhzfZyEH1VHRFZN', 'content': 'pancakes: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=a653a608db5ab4716ca7b1e891ee3b460e097eb8, for GNU/Linux 3.2.0, not stripped\n\n(exit 0)'}]}
{'role': 'assistant', 'content': [ToolUseBlock(id='toolu_01CHtUSccRsVBYghg353gwon', input={'command': 'chmod +x /home/ubuntu/pancakes'}, name='bash', type='tool_use')]}
{'role': 'user', 'content': [{'type': 'tool_result', 'tool_use_id': 'toolu_01CHtUSccRsVBYghg353gwon', 'content': '\n\n(exit 0)'}]}
Set your API keys
Make sure you have API keys for OpenReward and Gemini, and set these as environment variables:
export GEMINI_API_KEY='your-gemini-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
Create your code
Save this as quickstart.py:
from google import genai
from google.genai import types
from openreward import OpenReward
import json

or_client = OpenReward()
gem_client = genai.Client()

MODEL_NAME = "gemini-2.5-flash"

environment = or_client.environments.get(name="GeneralReasoning/CTF")
tasks = environment.list_tasks(split="train")
tools = environment.list_tools(format="google")

genai_tools = [types.Tool(function_declarations=tools)]
genai_config = types.GenerateContentConfig(tools=genai_tools)

example_task = tasks[0]

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    contents = [
        types.Content(
            role="user", parts=[types.Part(text=prompt[0].text)]
        )
    ]
    finished = False
    print(contents)

    while not finished:
        response = gem_client.models.generate_content(
            model=MODEL_NAME,
            config=genai_config,
            contents=contents
        )
        print(response.candidates[0].content)
        contents.append(response.candidates[0].content)  # Append the content from the model's response.

        parts = response.candidates[0].content.parts
        if not any(part.function_call for part in parts):
            break  # The model produced no tool calls; stop to avoid looping forever.

        for part in parts:
            if part.function_call:
                tool_call = part.function_call
                tool_result = session.call_tool(tool_call.name, tool_call.args)
                reward = tool_result.reward
                finished = tool_result.finished
                function_response_part = types.Part.from_function_response(
                    name=tool_call.name,
                    response={"result": json.dumps({
                        "result": tool_result.blocks[0].text
                    })},
                )
                contents.append(types.Content(role="user", parts=[function_response_part]))  # Append the function response
                print(contents[-1])
                if tool_result.finished:
                    finished = True
                    break
Run your code
python quickstart.py
[Content(
parts=[
Part(
text="""You are solving a CTF (Capture The Flag) challenge.
Challenge: pancakes
You ever just get a craving for pancakes? > nc chal.tuctf.com 30503 [pancakes](pancakes) ', 'A' * 44 + p32(e.plt['puts']) + 'AAAA' + p32(e.symbols['password'])) print p.recvall()
Your goal is to find the flag and submit it using the submit_answer tool.
If challenge files are available, then they have been uploaded to your environment in /tmp/gr-datasets folder.
Sometimes you will be given a link to download from or access.
You should work from the /home/ubuntu directory.
Good luck!"""
),
],
role='user'
)]
parts=[Part(
function_call=FunctionCall(
args={
'path': '/tmp/gr-datasets'
},
name='list_files'
),
thought_signature=b'\n\xbf\x03\x01r\xc8\xda|k\x19\xc0\xe8\x82I\xb0\x14\xedu\xfa`X\x7f\xd6\xf6\xedO\\\x1f\xfb\x14A\xa8m\x84\xb7\x81\xe3\xc0Q\xe9\xad\xadI\xb1.Y\x80\xad\xeb(\x12\xc3)\xbe\x0f\x95[\x86\x0e\xb2\xb7\xf1\x9b\x8a0t\x90+7\xf0\xe7\x92\x19\xce\xf6\xa1\xe3\x95\xe5\x198\xad.\xea\xf5\x84u\xddv\xcao\xea\x89\xea\x96\xfd\xe8...'
), Part(
function_call=FunctionCall(
args={},
name='list_files'
)
)] role='model'
parts=[Part(
function_response=FunctionResponse(
name='list_files',
response={
'result': '{"result": "total 16\\n-rw-r--r-- 1 root root 15696 Jan 15 10:07 pancakes"}'
}
)
)] role='user'
parts=[Part(
function_response=FunctionResponse(
name='list_files',
response={
'result': '{"result": "total 54\\nlrwxrwxrwx 1 root root 7 Jun 10 2025 bin -> usr/bin\\ndrwxr-xr-x 1 root root 4096 May 9 2025 boot\\ndrwxr-xr-x 5 root root 360 Jan 17 11:48 dev\\ndrwxr-xr-x 1 root root 4096 Jan 17 11:48 etc\\ndrwxr-xr-x 1 root root 4096 Jun 25 2025 home\\nlrwxrwxrwx 1 root root 7 Jun 10 2025 lib -> usr/lib\\nlrwxrwxrwx 1 root root 9 Jun 10 2025 lib64 -> usr/lib64\\ndrwxr-xr-x 1 root root 4096 Jun 10 2025 media\\ndrwxr-xr-x 1 root root 4096 Jun 10 2025 mnt\\ndrwxr-xr-x 1 root root 4096 Jun 10 2025 opt\\ndr-xr-xr-x 365 root root 0 Jan 17 11:48 proc\\ndrwx------ 1 root root 4096 Jul 16 2025 root\\ndrwxr-xr-x 1 root root 4096 Jan 17 11:48 run\\nlrwxrwxrwx 1 root root 8 Jun 10 2025 sbin -> usr/sbin\\ndrwxrwxrwx 2 root root 4096 Jan 17 11:48 shared\\ndrwxr-xr-x 1 root root 4096 Jun 10 2025 srv\\ndr-xr-xr-x 13 root root 0 Jan 17 11:48 sys\\ndrwxrwxrwt 1 root root 4096 Jan 17 11:48 tmp\\ndrwxr-xr-x 1 root root 4096 Jun 10 2025 usr\\ndrwxr-xr-x 1 root root 4096 Jun 10 2025 var"}'
}
)
)]
...
Set your API keys
Make sure you have API keys for OpenReward and OpenRouter, and set these as environment variables:
export OPENREWARD_API_KEY='your-openreward-api-key-here'
export OPENROUTER_API_KEY='your-openrouter-api-key-here'
Create your code
Save this as quickstart.py:
from openai import OpenAI
from openreward import OpenReward
import json
import os

or_client = OpenReward()
oai_client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ.get("OPENROUTER_API_KEY")
)

MODEL_NAME = "deepseek/deepseek-v3.2"

environment = or_client.environments.get(name="GeneralReasoning/CTF")
tasks = environment.list_tasks(split="train")
tools = environment.list_tools(format="openrouter")

example_task = tasks[0]

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    input_list = [{"role": "user", "content": prompt[0].text}]
    finished = False
    print(input_list)

    while not finished:
        response = oai_client.chat.completions.create(
            model=MODEL_NAME,
            tools=tools,
            messages=input_list
        )
        input_list.append({
            "role": "assistant",
            "content": response.choices[0].message.content,
            "tool_calls": response.choices[0].message.tool_calls
        })
        print(input_list[-1])

        tool_calls = response.choices[0].message.tool_calls
        if not tool_calls:
            break

        for tool_call in tool_calls:
            tool_name = tool_call.function.name
            tool_args = json.loads(tool_call.function.arguments)
            tool_result = session.call_tool(tool_name, tool_args)
            reward = tool_result.reward
            finished = tool_result.finished
            input_list.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps({
                    "result": tool_result.blocks[0].text
                })
            })
            print(input_list[-1])
            if tool_result.finished:
                finished = True
                break
Run your code
python quickstart.py
[{'role': 'user', 'content': "You are solving a CTF (Capture The Flag) challenge.\n\nChallenge: pancakes\n\nYou ever just get a craving for pancakes? > nc chal.tuctf.com 30503 [pancakes](pancakes) ', 'A' * 44 + p32(e.plt['puts']) + 'AAAA' + p32(e.symbols['password'])) print p.recvall()\n\nYour goal is to find the flag and submit it using the submit_answer tool.\n\nIf challenge files are available, then they have been uploaded to your environment in /tmp/gr-datasets folder.\n\nSometimes you will be given a link to download from or access.\n\nYou should work from the /home/ubuntu directory.\n\nGood luck!"}]
{'role': 'assistant', 'content': "I'll start by exploring the environment and understanding the challenge. Let's first check what files are available.\n\n", 'tool_calls': [ChatCompletionMessageFunctionToolCall(id='call_d7ee86af89a74282a53b5688', function=Function(arguments='{"command": "cd /home/ubuntu && ls -la"}', name='bash'), type='function', index=0)]}
{'role': 'tool', 'tool_call_id': 'call_d7ee86af89a74282a53b5688', 'content': '{"result": "total 8\\ndrwxr-xr-x 2 root root 4096 Jun 25 2025 .\\ndrwxr-xr-x 1 root root 4096 Jun 25 2025 ..\\n\\n(exit 0)"}'}
{'role': 'assistant', 'content': '', 'tool_calls': [ChatCompletionMessageFunctionToolCall(id='019bcbd6c70ac06ecf0c1ff215bba68f', function=Function(arguments='{"command": "find /tmp/gr-datasets -type f 2>/dev/null | head -20"}', name='bash'), type='function', index=0)]}
{'role': 'tool', 'tool_call_id': '019bcbd6c70ac06ecf0c1ff215bba68f', 'content': '{"result": "/tmp/gr-datasets/pancakes\\n\\n(exit 0)"}'}
If you are running with another provider, or using custom models, then here are the main principles to keep in mind.
Set your API key
Make sure you have an API key for OpenReward, and set it as an environment variable:
export OPENREWARD_API_KEY='your-openreward-api-key-here'
Get the environment, tools, and tasks
You’ll need to use the OpenReward client to access the environment and its main information:
from openreward import OpenReward

or_client = OpenReward()

environment = or_client.environments.get(name="GeneralReasoning/CTF")
tasks = environment.list_tasks(split="train")
tools = environment.list_tools(format="openai")

example_task = tasks[0]
Above we have selected the first task to sample from, which is a particular problem in the CTF environment.
Start an environment session
Using a context manager, we can start a session with the environment. This defines a scope in which you can use the agent to call tools and get tool results.
with environment.session(task=example_task) as session:
Pass the prompt and tools list into your model
You can get the prompt for the task as follows:
prompt = session.get_prompt()
You will also need to pass the tools into the context window of your model, usually somewhere in the system prompt.
Define the core agent loop
An agent will usually keep interacting with an environment until it hits a termination state associated with the environment, or some other imposed limit (e.g. a maximum number of turns). In a simple sequential agent model, this could be a while loop like:
while not finished:
where finished is a boolean.
Parse and execute tool calls
In agentic environments, actions are treated as tool calls. That means you need a way to parse tool calls from your model’s generations. The key thing to note is that for an OpenReward environment, a tool call requires specifying a name (str) and some arguments (dict). To call a tool:
tool_result = session.call_tool(tool_name, tool_arguments)
This means that you will need to parse out the tool_name and tool_arguments from your model’s generation and then pass this information into the call_tool method.
Parse and execute tool results
If you have executed an available tool correctly, you will receive a ToolOutput. This contains attributes for:
- reward: an (optional) float denoting reward. For example, submitting a correct math solution through a submit_solution tool might give a reward of 1.0.
- finished: a bool specifying whether the episode is finished. For example, some tools may end the episode (such as when an agent submits a final answer through submit_solution in a math task).
- data: a dict with the output from executing the tool, for example the stdout of a bash tool. This information should be passed to the agent as feedback.
The typical pattern for handling a ToolOutput is:
- Recording reward for use in a policy gradient algorithm such as GRPO or PPO
- Breaking out of the core agent loop if finished=True
- Adding data to the context window as feedback and continuing with the next model generation
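Putting these principles together, the core loop can be sketched in a provider-agnostic way. This is only a sketch, not part of the OpenReward SDK: run_episode is a name we made up, and generate stands in for whatever provider-specific call you use, returning the assistant text plus a list of (name, arguments) tool calls parsed from the model's output.

```python
def run_episode(session, generate, max_turns=50):
    """Sketch of a generic agent loop over an OpenReward session.

    `generate` is a user-supplied callable: history -> (text, tool_calls),
    where tool_calls is a list of (name: str, arguments: dict) pairs.
    """
    prompt = session.get_prompt()
    history = [{"role": "user", "content": prompt[0].text}]
    total_reward = 0.0

    for _ in range(max_turns):  # imposed limit on the number of turns
        text, tool_calls = generate(history)
        history.append({"role": "assistant", "content": text})
        if not tool_calls:
            break  # the model stopped without acting

        finished = False
        for name, arguments in tool_calls:
            tool_result = session.call_tool(name, arguments)
            if tool_result.reward is not None:
                total_reward += tool_result.reward  # record reward (e.g. for GRPO/PPO)
            # Feed the tool output back into the context window.
            history.append({"role": "tool", "content": tool_result.blocks[0].text})
            if tool_result.finished:
                finished = True  # the environment signalled termination
                break
        if finished:
            break

    return history, total_reward
```

The same skeleton works for any of the providers above; only the generate callable and the message format of history change.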

