Prerequisites
- An OpenReward account
- An OpenReward API key
- An API key and SDK for your model provider of choice (e.g. OpenAI, Anthropic, Google, OpenRouter)
Sample an Environment
In this example we’ll call the GeneralReasoning/MLEBench-Train environment on OpenReward and sample an agent trace using the Python SDK.
Choose a model provider below and sample your first environment!
- OpenAI
- Anthropic
- Google
- OpenRouter
- Other Models
OpenAI
1. Set your API keys
Make sure you have API keys for OpenReward and OpenAI, and set these as environment variables:
export OPENAI_API_KEY='your-openai-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
2. Install the SDK
pip install openreward
3. Create your code
Save this as quickstart.py:
from openai import OpenAI
from openreward import OpenReward
import json

or_client = OpenReward()
oai_client = OpenAI()

MODEL_NAME = "gpt-5"

environment = or_client.environments.get(name="GeneralReasoning/MLEBench-Train")
tasks = environment.get_tasks(split="train")
tools = environment.get_tools(format="openai")

example_task = tasks[0]

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    input_list = [{"role": "user", "content": prompt[0].text}]
    finished = False
    print(input_list)
    while not finished:
        response = oai_client.responses.create(
            model=MODEL_NAME,
            tools=tools,
            input=input_list
        )
        print(response.output)
        # Add the model's output items to the conversation history
        input_list += response.output
        for item in response.output:
            if item.type == "function_call":
                # Execute the tool call against the environment
                tool_result = session.call_tool(item.name, json.loads(item.arguments))
                reward = tool_result.reward
                finished = tool_result.finished
                # Return the tool output to the model
                input_list.append({
                    "type": "function_call_output",
                    "call_id": item.call_id,
                    "output": json.dumps({
                        "result": tool_result.blocks[0].text
                    })
                })
                print(input_list[-1])
                if tool_result.finished:
                    finished = True
                    break
4. Run your code
python quickstart.py
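The script above samples a single task. To sample traces for several tasks, one option is to wrap the agent loop in a helper and iterate over the task list. The sketch below is illustrative only: it assumes the same objects and SDK methods used in quickstart.py (environment, oai_client, tools, MODEL_NAME, session, get_prompt, call_tool), and the run_episode helper and NUM_TASKS constant are names introduced here, not part of the SDK.

import json

def run_episode(environment, task, oai_client, tools, model_name):
    """Run one agent episode for a single task and return the final reward (sketch)."""
    reward = None
    with environment.session(task=task) as session:
        prompt = session.get_prompt()
        input_list = [{"role": "user", "content": prompt[0].text}]
        finished = False
        while not finished:
            response = oai_client.responses.create(
                model=model_name, tools=tools, input=input_list
            )
            input_list += response.output
            for item in response.output:
                if item.type == "function_call":
                    tool_result = session.call_tool(item.name, json.loads(item.arguments))
                    reward = tool_result.reward
                    input_list.append({
                        "type": "function_call_output",
                        "call_id": item.call_id,
                        "output": json.dumps({"result": tool_result.blocks[0].text}),
                    })
                    if tool_result.finished:
                        finished = True
                        break
    return reward

NUM_TASKS = 3  # illustrative: sample only the first few tasks
rewards = [run_episode(environment, t, oai_client, tools, MODEL_NAME) for t in tasks[:NUM_TASKS]]
print(rewards)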
Anthropic
1. Set your API keys
Make sure you have API keys for OpenReward and Anthropic, and set these as environment variables:
export ANTHROPIC_API_KEY='your-anthropic-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
2. Install the SDK
pip install openreward
3. Create your code
Save this as quickstart.py:
import anthropic
from openreward import OpenReward
import json

or_client = OpenReward()
ant_client = anthropic.Anthropic()

MODEL_NAME = "claude-sonnet-4-5"

environment = or_client.environments.get(name="GeneralReasoning/MLEBench-Train")
tasks = environment.get_tasks(split="train")
tools = environment.get_tools(format="anthropic")

example_task = tasks[0]

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    messages = [{"role": "user", "content": prompt[0].text}]
    finished = False
    print(messages)
    while not finished:
        message = ant_client.messages.create(
            model=MODEL_NAME,
            max_tokens=4096,
            tools=tools,
            messages=messages
        )
        print(message)
        # Add the assistant turn to the conversation history
        messages.append({"role": "assistant", "content": message.content})
        if message.stop_reason == "tool_use":
            tool_use = next(block for block in message.content if block.type == "tool_use")
            tool_name = tool_use.name
            tool_input = tool_use.input
            # Execute the tool call against the environment
            tool_result = session.call_tool(tool_name, tool_input)
            reward = tool_result.reward
            finished = tool_result.finished
            # Return the tool output to the model as a tool_result block
            messages.append({
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_use.id,
                        "content": tool_result.blocks[0].text
                    }
                ]
            })
            print(messages[-1])
            if tool_result.finished:
                finished = True
4. Run your code
python quickstart.py
Google
1. Set your API keys
Make sure you have API keys for OpenReward and Gemini, and set these as environment variables:
export GEMINI_API_KEY='your-gemini-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
2. Install the SDK
pip install openreward
3. Create your code
Save this as quickstart.py:
from google import genai
from google.genai import types
from openreward import OpenReward
import json

or_client = OpenReward()
gem_client = genai.Client()

MODEL_NAME = "gemini-2.5-flash"

environment = or_client.environments.get(name="GeneralReasoning/MLEBench-Train")
tasks = environment.get_tasks(split="train")
tools = environment.get_tools(format="google")

# Wrap each function declaration in a genai Tool for the request config
genai_tools = [types.Tool(function_declarations=[f]) for f in tools]
genai_config = types.GenerateContentConfig(tools=genai_tools)

example_task = tasks[0]

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    contents = [
        types.Content(
            role="user", parts=[types.Part(text=prompt[0].text)]
        )
    ]
    finished = False
    print(contents)
    while not finished:
        response = gem_client.models.generate_content(
            model=MODEL_NAME,
            config=genai_config,
            contents=contents
        )
        print(response.candidates[0].content)
        # Append the content from the model's response
        contents.append(response.candidates[0].content)
        for part in response.candidates[0].content.parts:
            if part.function_call:
                tool_call = part.function_call
                # Execute the tool call against the environment
                tool_result = session.call_tool(tool_call.name, tool_call.args)
                reward = tool_result.reward
                finished = tool_result.finished
                function_response_part = types.Part.from_function_response(
                    name=tool_call.name,
                    response={"result": json.dumps({
                        "result": tool_result.blocks[0].text
                    })},
                )
                # Append the function response so the model sees the tool output
                contents.append(types.Content(role="user", parts=[function_response_part]))
                print(contents[-1])
                if tool_result.finished:
                    finished = True
                    break
4. Run your code
python quickstart.py
OpenRouter
1. Set your API keys
Make sure you have API keys for OpenReward and OpenRouter, and set these as environment variables:
export OPENROUTER_API_KEY='your-openrouter-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
2. Install the SDK
pip install openreward
3. Create your code
Save this as quickstart.py:
from openai import OpenAI
from openreward import OpenReward
import json
import os

or_client = OpenReward()
# OpenRouter exposes an OpenAI-compatible API, so we point the OpenAI client at it
oai_client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ.get("OPENROUTER_API_KEY")
)

MODEL_NAME = "deepseek/deepseek-v3.2"

environment = or_client.environments.get(name="GeneralReasoning/MLEBench-Train")
tasks = environment.get_tasks(split="train")
tools = environment.get_tools(format="openrouter")

example_task = tasks[0]

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    input_list = [{"role": "user", "content": prompt[0].text}]
    finished = False
    print(input_list)
    while not finished:
        response = oai_client.chat.completions.create(
            model=MODEL_NAME,
            tools=tools,
            messages=input_list
        )
        print(response)
        # Add the assistant message to the conversation history
        input_list.append(response.choices[0].message)
        tool_calls = response.choices[0].message.tool_calls
        if not tool_calls:
            break
        for tool_call in tool_calls:
            tool_name = tool_call.function.name
            tool_args = json.loads(tool_call.function.arguments)
            # Execute the tool call against the environment
            tool_result = session.call_tool(tool_name, tool_args)
            reward = tool_result.reward
            finished = tool_result.finished
            input_list.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps({
                    "result": tool_result.blocks[0].text
                })
            })
            print(input_list[-1])
            if tool_result.finished:
                finished = True
                break
4. Run your code
python quickstart.py
Other Models
If you are running with another provider, or using custom models, here are the main principles to keep in mind. As in the examples above, we select the first task to sample from, which is a particular problem in the MLEBench-Train environment, and use a context manager to start a session with the environment; the session defines a scope in which your agent can call tools and get tool results.
1. Set your API key
Make sure you have API keys for OpenReward and your model provider of choice, and set these as environment variables, for example:

export YOUR_PROVIDER_API_KEY='your-provider-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
2. Install the SDK
pip install openreward
3. Get the environment, tools, and tasks
You’ll need to use the OpenReward client to access the environment and its main information:
from openreward import OpenReward
or_client = OpenReward()
environment = or_client.environments.get(name="GeneralReasoning/MLEBench-Train")
tasks = environment.get_tasks(split="train")
tools = environment.get_tools(format="openai")
example_task = tasks[0]
4. Start an environment session
with environment.session(task=example_task) as session:
5. Pass the prompt and tools list into your model
You can get the prompt for the task as follows:

prompt = session.get_prompt()

You will also need to pass the tools into the context window of your model, usually somewhere in the system prompt.
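If your model provider does not support native tool calling, one option is to serialize the tool definitions into the system prompt yourself. The sketch below is just one possible convention, not something prescribed by OpenReward; it assumes tools is the list of JSON-serializable tool schemas returned by get_tools and prompt is the result of session.get_prompt() from the steps above.

import json

# Assumes `tools` and `prompt` are defined as in the previous steps.
tool_descriptions = "\n".join(json.dumps(tool) for tool in tools)

system_prompt = (
    "You are an agent operating in an environment. "
    "You can call the following tools by replying with a JSON object of the form "
    '{"tool": <name>, "arguments": <dict>}:\n'
    f"{tool_descriptions}"
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt[0].text},
]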
6. Define the core agent loop
An agent will usually keep interacting with an environment until it hits a termination state associated with the environment, or some other imposed limit (e.g. a maximum number of turns). In a simple sequential agent model, this could be a while loop like:

while not finished:

where finished is a boolean.
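For example, a loop that also enforces a maximum number of turns might look like the sketch below; MAX_TURNS is an illustrative limit you choose, not something imposed by the SDK.

MAX_TURNS = 32  # illustrative cap on the number of model generations
finished = False
turn = 0

while not finished and turn < MAX_TURNS:
    turn += 1
    # 1. Generate the next model response given the conversation so far
    # 2. Parse any tool calls from the generation (step 7)
    # 3. Execute them with session.call_tool and handle the ToolOutput (step 8)
    ...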
7. Parse and execute tool calls
In agentic environments, actions are treated as tool calls, which means you need a way to parse tool calls out of your model’s generations. The key thing to note is that for an OpenReward environment, a tool call requires a name (str) and some arguments (dict). This means you will need to parse tool_name and tool_arguments out of your model’s generation and pass them to the call_tool method:

tool_result = session.call_tool(tool_name, tool_arguments)
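For a model without native tool calling you might, for example, ask it to emit tool calls as a JSON object (as in the system prompt sketch in step 5) and parse them out of the raw generation. The helper below is a hypothetical convention for illustration, not part of the OpenReward SDK:

import json

def parse_tool_call(generation: str):
    """Extract a {"tool": ..., "arguments": ...} JSON object from a raw generation.

    Returns (tool_name, tool_arguments), or (None, None) if no tool call is found.
    The JSON format here is an assumed convention, not an OpenReward requirement.
    """
    start = generation.find("{")
    end = generation.rfind("}")
    if start == -1 or end == -1:
        return None, None
    try:
        payload = json.loads(generation[start:end + 1])
    except json.JSONDecodeError:
        return None, None
    return payload.get("tool"), payload.get("arguments")

tool_name, tool_arguments = parse_tool_call(generation)  # `generation` is your model's raw text output
if tool_name is not None:
    tool_result = session.call_tool(tool_name, tool_arguments)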
8. Parse and handle tool results
If you have executed an available tool correctly, you will receive a ToolOutput. This contains attributes for:
- reward: an (optional) float denoting reward. For example, submitting a correct math solution through a submit_solution tool might give a reward of 1.0.
- finished: a bool specifying whether the episode is finished or not. For example, some tools may end the episode (if, say, an agent submits a final answer through submit_solution in a math task).
- data: a dict with the output of executing the tool. For example, the stdout of executing a bash tool. This information should be passed to the agent as feedback.

A typical way to handle the ToolOutput is:
- Recording reward for use in a policy gradient algorithm such as GRPO or PPO
- Breaking out of the core agent loop if finished=True
- Adding data to the context window as feedback and continuing with the next model generation
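Putting these pieces together, the sketch below shows one way to handle the ToolOutput inside the loop. It assumes the hypothetical parse_tool_call helper from step 7, takes a generate callable that you supply for your own model, and reads the textual feedback from tool_result.blocks[0].text as the provider examples above do.

def agent_loop(session, generate, max_turns=32):
    """Run a simple agent loop and return the rewards collected along the way.

    `generate` is a callable you supply: given the conversation so far, it returns
    the model's next raw text generation (hypothetical, provider-specific).
    """
    rewards = []      # per-step rewards, e.g. for a GRPO or PPO update
    context = []      # however you choose to represent the conversation
    finished = False
    turns = 0

    while not finished and turns < max_turns:
        turns += 1
        generation = generate(context)
        context.append({"role": "assistant", "content": generation})

        tool_name, tool_arguments = parse_tool_call(generation)  # helper from step 7
        if tool_name is None:
            continue  # no tool call parsed; choose your own handling here

        tool_result = session.call_tool(tool_name, tool_arguments)

        # Record the reward for later use in a policy gradient algorithm
        if tool_result.reward is not None:
            rewards.append(tool_result.reward)

        # Feed the tool output back to the model as feedback
        context.append({"role": "user", "content": tool_result.blocks[0].text})

        # Break out of the loop if the episode has terminated
        finished = tool_result.finished

    return rewards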

