Goals

  • Understand what rollouts are and why you would record them.
  • Record rollouts from an agent interacting with an OpenReward environment.
  • Use provider-specific logging methods for OpenAI, Anthropic, Google Gemini, and OpenRouter.
  • View recorded rollouts in the OpenReward dashboard.

Prerequisites

  • An OpenReward account
  • An OpenReward API key
  • An API key and SDK for your model provider of choice (e.g. OpenAI, Anthropic, Google, OpenRouter)
  • Familiarity with the Your First Environment tutorial

Setup

Install the OpenReward Python library:
pip install openreward

What are Rollouts?

A rollout is a recorded trace of an agent interacting with an environment. It captures every message exchange — user prompts, assistant responses, reasoning steps, tool calls, and tool results — along with associated rewards and metadata. Recording rollouts is useful for:
  • Debugging: Inspect exactly what your agent did step-by-step.
  • Analysis: Compare performance across models, prompts, or environment configurations.
  • Training data: Collect trajectories for supervised finetuning or midtraining.
  • Collaboration: Your team can view runs on your organisation’s OpenReward page.
The OpenReward Python SDK provides provider-specific logging methods that automatically serialize messages from OpenAI, Anthropic, and Google Gemini into a normalized format. For OpenRouter (which uses the OpenAI SDK), you use the OpenAI completions logger.

Key Concepts

Runs and Rollouts

A run groups related rollouts together under a single name (e.g. "gpt-5.2-ctf-eval"). Each rollout within a run represents one episode or conversation with the environment.

Creating a Rollout

import openreward

or_client = openreward.OpenReward()

rollout = or_client.rollout.create(
    run_name="my-eval-run",              # Required: groups rollouts together
    rollout_name="episode-1",            # Optional: name for this specific rollout
    environment="GeneralReasoning/CTF",  # Optional: environment identifier
    split="train",                       # Optional: data split
    metadata={"model": "gpt-5.2"},       # Optional: custom key-value pairs
    task_spec={"id": "task-0"}           # Optional: task specification
)
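To make the run/rollout grouping concrete, the sketch below builds the keyword arguments for several rollouts that share one run_name. The helper rollout_kwargs, the episode-{i} naming scheme, and the placeholder task specs are illustrative assumptions, not part of the SDK. Each dict is meant to be passed as or_client.rollout.create(**kwargs), using only the parameters shown above.

```python
from typing import Any


def rollout_kwargs(run_name: str, index: int, model: str,
                   environment: str = "GeneralReasoning/CTF",
                   split: str = "train") -> dict[str, Any]:
    """Build arguments for one rollout in a run (hypothetical helper)."""
    return {
        "run_name": run_name,                  # shared by every rollout in the run
        "rollout_name": f"episode-{index}",    # unique per episode (assumed naming scheme)
        "environment": environment,
        "split": split,
        "metadata": {"model": model},
        "task_spec": {"id": f"task-{index}"},  # placeholder; real specs come from the task
    }


# Three rollouts that would all be grouped under the same run.
batch = [rollout_kwargs("my-eval-run", i, "gpt-5.2") for i in range(3)]
```

Keeping run_name fixed while varying rollout_name is what lets a set of episodes be compared side by side as one run.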

Logging Methods

The SDK provides these logging methods on a Rollout object:
Method                                    Provider                                 Input Type
rollout.log_openai_response(message)      OpenAI (Responses API)                   Response object or individual output items
rollout.log_openai_completions(message)   OpenAI (Chat Completions) / OpenRouter   Chat message dicts
rollout.log_anthropic_message(message)    Anthropic                                Message dicts (Anthropic format)
rollout.log_gdm_message(message)          Google Gemini                            google.genai.types.Content objects
rollout.log(message)                      Generic                                  UserMessage, AssistantMessage, ToolCall, etc.
Each method also accepts these parameters:
  • reward (Optional[float]): Reward signal for this step
  • is_finished (Optional[bool]): Whether this step ends the episode
  • metadata (Optional[dict]): Arbitrary metadata for this step
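As a rough illustration of how these optional parameters might be used, the sketch below collects only the options that were actually set, so intermediate steps pass nothing extra and the final step carries the reward. The helper step_kwargs is hypothetical and not part of the SDK. The commented call at the end assumes a real Rollout object.

```python
from typing import Any, Optional


def step_kwargs(reward: Optional[float] = None,
                is_finished: Optional[bool] = None,
                metadata: Optional[dict] = None) -> dict[str, Any]:
    """Collect only the per-step options that were set (hypothetical helper)."""
    options = {"reward": reward, "is_finished": is_finished, "metadata": metadata}
    # Compare against None so that a reward of 0.0 or is_finished=False is kept.
    return {name: value for name, value in options.items() if value is not None}


# Intermediate step: no reward yet, so no extra arguments.
mid_step = step_kwargs()
# Final step: attach the reward and mark the episode as finished.
last_step = step_kwargs(reward=1.0, is_finished=True, metadata={"turn": 3})

# Intended use (not executed here; needs a real Rollout object):
#   rollout.log_openai_completions(message, **last_step)
```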

Flushing Events

When you are done logging, call or_client.rollout.close() to ensure all pending events are flushed and uploaded:
or_client.rollout.close()
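If the agent loop raises an exception, a close() call placed at the end of the script is never reached. One way to guard against that, sketched here as an assumption and not as an SDK feature, is a small context manager that calls client.rollout.close() in a finally block. The flushing helper and the stand-in client below are hypothetical; a real script would pass the OpenReward() client.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def flushing(client):
    """Yield the client, then always call client.rollout.close() (hypothetical helper)."""
    try:
        yield client
    finally:
        client.rollout.close()


# Stand-in client for demonstration only; it records that close() was called.
calls = []
fake_client = SimpleNamespace(
    rollout=SimpleNamespace(close=lambda: calls.append("closed"))
)

try:
    with flushing(fake_client):
        raise RuntimeError("agent crashed mid-episode")
except RuntimeError:
    pass  # the error still propagates, but close() has already run
```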

Recording Rollouts by Provider

This example runs an OpenAI model (via the Responses API) on one task from the GeneralReasoning/CTF environment and records the full rollout trace to OpenReward.
1. Set your API keys

Make sure you have API keys for OpenReward and OpenAI, and set these as environment variables:
export OPENAI_API_KEY='your-openai-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
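Before running the script, it can help to confirm that both variables are visible to Python. The check below is a minimal sketch; the missing_keys helper is hypothetical and only inspects the two variable names used above.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "OPENREWARD_API_KEY")


def missing_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


# Demonstration with an explicit mapping instead of the real environment.
example_env = {"OPENAI_API_KEY": "sk-example"}
print(missing_keys(example_env))  # prints ['OPENREWARD_API_KEY']
```

Calling missing_keys() with no arguments checks the real environment. An empty list means both keys are set.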
2. Create your code

Save this as record_rollout.py:
from openai import OpenAI
from openreward import OpenReward
import json

or_client = OpenReward()
oai_client = OpenAI()
MODEL_NAME = "gpt-5.2"

environment = or_client.environments.get(name="GeneralReasoning/CTF")
tasks = environment.list_tasks(split="train")
tools = environment.list_tools(format="openai")

example_task = tasks[0]

# Create a rollout to record this episode
rollout = or_client.rollout.create(
    run_name="openai-ctf-rollouts",
    rollout_name="episode-0",
    environment="GeneralReasoning/CTF",
    split="train",
    metadata={"model": MODEL_NAME},
    task_spec=example_task.task_spec
)

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    input_list = [{"role": "user", "content": prompt[0].text}]
    finished = False

    # Log the initial user prompt
    rollout.log_openai_response(input_list[0])

    while not finished:
        response = oai_client.responses.create(
            model=MODEL_NAME,
            tools=tools,
            input=input_list
        )

        # Log the full model response (handles reasoning, text, tool calls)
        rollout.log_openai_response(response)

        input_list += response.output

        for item in response.output:
            if item.type == "function_call":
                tool_result = session.call_tool(item.name, json.loads(str(item.arguments)))

                reward = tool_result.reward
                finished = tool_result.finished

                tool_output = {
                    "type": "function_call_output",
                    "call_id": item.call_id,
                    "output": json.dumps({
                        "result": tool_result.blocks[0].text
                    })
                }
                input_list.append(tool_output)

                # Log the tool result with reward info
                rollout.log_openai_response(
                    tool_output,
                    reward=reward,
                    is_finished=finished
                )

                if finished:
                    break

# Flush all pending events
or_client.rollout.close()
print("Rollout recorded successfully!")
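The while loop above only stops when a tool result reports finished, so a model that stops calling tools could keep it running indefinitely. A turn limit is one possible safeguard. The sketch below is an assumption layered on top of the example, not part of the SDK: run_with_turn_limit and take_turn are hypothetical names, and take_turn stands in for one iteration of the loop body, returning True once the episode is finished.

```python
def run_with_turn_limit(take_turn, max_turns=20):
    """Call take_turn() until it returns True or max_turns is reached.

    Returns (turns_taken, finished).
    """
    for turn in range(1, max_turns + 1):
        if take_turn():
            return turn, True
    return max_turns, False


# Stand-in for the agent loop body: reports "finished" on the third call.
state = {"calls": 0}


def fake_turn():
    state["calls"] += 1
    return state["calls"] >= 3


result = run_with_turn_limit(fake_turn, max_turns=10)
print(result)  # prints (3, True)
```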
3. Run your code

python record_rollout.py

Viewing Rollouts

After running your code, go to your profile page on OpenReward and open the Runs tab to see your recorded runs:
[Figure: Runs in the OpenReward UI]
Click on a run to see its rollouts:
[Figure: Example run]
Click on a specific rollout to inspect the full message timeline, including reasoning steps, tool calls, and rewards:
[Figure: Example rollout]

Next Steps