
Prerequisites

  • An OpenReward account
  • An OpenReward API key
  • An API key and SDK for your model provider of choice (e.g. OpenAI, Anthropic, Google, OpenRouter)

Sample an Environment

In this example, we’ll call the GeneralReasoning/CTF environment on OpenReward and sample an agent trace using the Python SDK. The steps below use OpenAI as the model provider.
1. Set your API keys

Make sure you have API keys for OpenReward and OpenAI, and set these as environment variables:
export OPENAI_API_KEY='your-openai-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
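
Before moving on, it can help to confirm that both variables are visible to Python. The following is a minimal sketch using only the standard library; the helper name `missing_keys` is hypothetical and is not part of either SDK.

```python
import os

# The two variable names exported in the step above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "OPENREWARD_API_KEY")


def missing_keys(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]


missing = missing_keys()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All API keys are set.")
```

If a name is reported as missing, re-run the matching `export` command in the same shell session that you will use to run the script.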
2. Install the SDK

pip install openreward
3. Create your code

Save this as quickstart.py:
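import json

from openai import OpenAI
from openreward import OpenReward

# API keys are expected in the environment variables set in step 1.
or_client = OpenReward()
oai_client = OpenAI()
MODEL_NAME = "gpt-5.2"

# Look up the environment, then fetch its tasks and its tools in OpenAI format.
environment = or_client.environments.get(name="GeneralReasoning/CTF")
tasks = environment.list_tasks(split="train")
tools = environment.list_tools(format="openai")

example_task = tasks[0]

with environment.session(task=example_task) as session:
    # Seed the conversation with the task prompt.
    prompt = session.get_prompt()
    input_list = [{"role": "user", "content": prompt[0].text}]
    finished = False
    print(input_list)

    while not finished:
        response = oai_client.responses.create(
            model=MODEL_NAME,
            tools=tools,
            input=input_list
        )
        print(response.output)

        # Keep the model's output (reasoning items and tool calls) in the history.
        input_list += response.output

        made_tool_call = False
        for item in response.output:
            if item.type == "function_call":
                made_tool_call = True

                # Run the requested tool inside the environment session.
                tool_result = session.call_tool(item.name, json.loads(item.arguments))

                reward = tool_result.reward  # reward for this step (not used further here)
                finished = tool_result.finished

                # Report the tool result back to the model, matched by call_id.
                input_list.append({
                    "type": "function_call_output",
                    "call_id": item.call_id,
                    "output": json.dumps({
                        "result": tool_result.blocks[0].text
                    })
                })

                print(input_list[-1])

                # The environment ends the episode, so ignore any remaining calls.
                if finished:
                    break

        # Stop if the model answered without calling a tool; otherwise the loop
        # would keep requesting completions with nothing new to act on.
        if not made_tool_call:
            break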
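
The two JSON conversions in the loop (decoding the model's arguments and encoding the tool result) can be pulled into small helpers and checked without any network access. This is a sketch only: the function names `parse_arguments` and `function_call_output` are hypothetical, and treating an empty argument string as an empty dict is an assumption rather than documented SDK behaviour.

```python
import json


def parse_arguments(raw):
    """Decode a function call's JSON argument string into a dict.

    Assumption: an empty string means "no arguments".
    """
    return json.loads(raw) if raw else {}


def function_call_output(call_id, text):
    """Build the item that reports a tool result back to the model."""
    return {
        "type": "function_call_output",
        "call_id": call_id,
        "output": json.dumps({"result": text}),
    }


# Values in the same shape as the quickstart's first tool call.
args = parse_arguments('{"path":"/tmp/gr-datasets","show_hidden":false,"recursive":false}')
print(args["path"])
print(function_call_output("call_VYpjo87wSocwCsrEc7QOvAY7", "total 16"))
```

Inside the loop, these would stand in for the inline `json.loads(...)` call and the dictionary literal passed to `input_list.append(...)`.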
4. Run your code

  python quickstart.py
Example output:
[{'role': 'user', 'content': "You are solving a CTF (Capture The Flag) challenge.\n\nChallenge: pancakes\n\nYou ever just get a craving for pancakes? > nc chal.tuctf.com 30503 [pancakes](pancakes) ', 'A' * 44 + p32(e.plt['puts']) + 'AAAA' + p32(e.symbols['password'])) print p.recvall()\n\nYour goal is to find the flag and submit it using the submit_answer tool.\n\nIf challenge files are available, then they have been uploaded to your environment in /tmp/gr-datasets folder.\n\nSometimes you will be given a link to download from or access.\n\nYou should work from the /home/ubuntu directory.\n\nGood luck!"}]
[ResponseReasoningItem(id='rs_0f6e802a28329b2f00696b6de55018819da22725bb44fba022', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ResponseFunctionToolCall(arguments='{"path":"/tmp/gr-datasets","show_hidden":false,"recursive":false}', call_id='call_VYpjo87wSocwCsrEc7QOvAY7', name='list_files', type='function_call', id='fc_0f6e802a28329b2f00696b6de68780819d85967ff172cae065', status='completed')]
{'type': 'function_call_output', 'call_id': 'call_VYpjo87wSocwCsrEc7QOvAY7', 'output': '{"result": "total 16\\n-rw-r--r-- 1 root root 15696 Jan 15 10:07 pancakes"}'}
[ResponseReasoningItem(id='rs_0f6e802a28329b2f00696b6de7aa04819d85f6f667d4a1fcbb', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ResponseFunctionToolCall(arguments='{"command":"cp /tmp/gr-datasets/pancakes /home/ubuntu/ && cd /home/ubuntu && ls -l pancakes && file pancakes && sha256sum pancakes"}', call_id='call_zKDEIePRL8nYrh25TMgN2Gvo', name='bash', type='function_call', id='fc_0f6e802a28329b2f00696b6de86d3c819d9122aaaa02bc1a6f', status='completed')]
{'type': 'function_call_output', 'call_id': 'call_zKDEIePRL8nYrh25TMgN2Gvo', 'output': '{"result": "-rw-r--r-- 1 root root 15696 Jan 17 11:09 pancakes\\npancakes: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=a653a608db5ab4716ca7b1e891ee3b460e097eb8, for GNU/Linux 3.2.0, not stripped\\nbd470b542b111437343079807073089ab024de3d89c45ac0926c6a9ed8c36a4a  pancakes\\n\\n(exit 0)"}'}
[ResponseReasoningItem(id='rs_0f6e802a28329b2f00696b6dea067c819d9661f75dec1ea93a', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ResponseFunctionToolCall(arguments='{"command":"cd /home/ubuntu && ./pancakes << EOF\\nEOF"}', call_id='call_XXITZmGDUiZYOlMfuK4kmEDl', name='bash', type='function_call', id='fc_0f6e802a28329b2f00696b6dea46d0819d8ff38d87410d1962', status='completed')]
{'type': 'function_call_output', 'call_id': 'call_XXITZmGDUiZYOlMfuK4kmEDl', 'output': '{"result": "/bin/bash: line 2: ./pancakes: Permission denied\\n\\n(exit 126)"}'}
...
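
The printed trace mixes SDK objects (the model's output items) with plain dictionaries (the tool results appended by the script). To review only what the tools returned, one option is to filter `input_list` for the dictionaries and decode their JSON payloads. This is a minimal sketch; the helper name `tool_results` is hypothetical, and it assumes every tool result was stored in the `{"result": ...}` shape used in quickstart.py.

```python
import json


def tool_results(items):
    """Return (call_id, result_text) pairs for each tool result in a trace."""
    results = []
    for item in items:
        # SDK output objects are not dicts, so they are skipped here.
        if isinstance(item, dict) and item.get("type") == "function_call_output":
            payload = json.loads(item["output"])
            results.append((item["call_id"], payload["result"]))
    return results


# A trace fragment in the same shape as the example output above.
trace = [
    {"role": "user", "content": "You are solving a CTF (Capture The Flag) challenge."},
    {
        "type": "function_call_output",
        "call_id": "call_XXITZmGDUiZYOlMfuK4kmEDl",
        "output": '{"result": "/bin/bash: line 2: ./pancakes: Permission denied\\n\\n(exit 126)"}',
    },
]

for call_id, text in tool_results(trace):
    print(call_id)
    print(text)
```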

Next steps

Now that you know how to sample from OpenReward environments, explore these key features: