Skip to main content

Prerequisites

  • An OpenReward account
  • An OpenReward API key
  • An API key and SDK for your model provider of choice (e.g. OpenAI, Anthropic, Google, OpenRouter)

Sample an Environment

In this example we’ll call the GeneralReasoning/MLEBench-Train environment on OpenReward and sample an agent trace using the Python SDK. Choose a model provider below and sample your first environment!
1

Set your API keys

Make sure you have API keys for OpenReward and OpenAI, and set these as environment variables:
export OPENAI_API_KEY='your-openai-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
2

Install the SDK

pip install openreward
3

Create your code

Save this as quickstart.py:
  from openai import OpenAI
  from openreward import OpenReward
  import json

  or_client = OpenReward()
  oai_client = OpenAI()
  MODEL_NAME = "gpt-5"

  environment = or_client.environments.get(name="GeneralReasoning/MLEBench-Train")
  tasks = environment.get_tasks(split="train")
  tools = environment.get_tools(format="openai")

  example_task = tasks[0]

  with environment.session(task=example_task) as session:
      prompt = session.get_prompt()
      input_list = [{"role": "user", "content": prompt[0].text}]
      finished = False
      print(input_list)

      while not finished:
          response = oai_client.responses.create(
              model=MODEL_NAME,
              tools=tools,
              input=input_list
          )
          print(response.output)

          input_list += response.output

          for item in response.output:
              if item.type == "function_call":
                  tool_result = session.call_tool(item.name, json.loads(str(item.arguments)))

                  reward = tool_result.reward
                  finished = tool_result.finished

                  input_list.append({
                      "type": "function_call_output",
                      "call_id": item.call_id,
                      "output": json.dumps({
                          "result": tool_result.blocks[0].text
                      })
                  })

                  print(input_list[-1])

                  if tool_result.finished:
                      finished = True
                      break
4

Run your code

  python quickstart.py
Example output:
  blablabla

Next steps

Now that you know how to sample from OpenReward environments, explore these key features: