Goals

  • Understand what rollouts are and why you would record them.
  • Record rollouts from an agent interacting with an OpenReward environment.
  • Use provider-specific logging methods for OpenAI, Anthropic, Google Gemini, and OpenRouter.
  • View recorded rollouts in the OpenReward dashboard.

Prerequisites

  • An OpenReward account
  • An OpenReward API key
  • An API key and SDK for your model provider of choice (e.g. OpenAI, Anthropic, Google, OpenRouter)
  • Familiarity with the Your First Environment tutorial

Setup

Install the OpenReward Python library:
pip install openreward

What are Rollouts?

A rollout is a recorded trace of an agent interacting with an environment. It captures every message exchange — user prompts, assistant responses, reasoning steps, tool calls, and tool results — along with associated rewards and metadata. Recording rollouts is useful for:
  • Debugging: Inspect exactly what your agent did step-by-step.
  • Analysis: Compare performance across models, prompts, or environment configurations.
  • Training data: Collect trajectories for supervised finetuning or midtraining.
  • Collaboration: Your team can view runs on your organisation’s OpenReward page.
The OpenReward Python SDK provides provider-specific logging methods that automatically serialize messages from OpenAI, Anthropic, and Google Gemini into a normalized format. For OpenRouter (which uses the OpenAI SDK), you use the OpenAI completions logger.

Key Concepts

Runs and Rollouts

A run groups related rollouts together under a single name (e.g. "gpt-5.4-ctf-eval"). Each rollout within a run represents one episode or conversation with the environment.
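The run/rollout hierarchy can be pictured as a simple mapping, one run name with many episode-level rollouts beneath it (the names here are purely illustrative):

```python
# Illustrative picture of the run/rollout hierarchy: one run name,
# many episode-level rollouts grouped under it.
run = {
    "name": "gpt-5.4-ctf-eval",
    "rollouts": [f"episode-{i}" for i in range(3)],
}
print(run["rollouts"])  # ['episode-0', 'episode-1', 'episode-2']
```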

Creating a Rollout

import openreward

or_client = openreward.OpenReward()

rollout = or_client.rollout.create(
    run_name="my-eval-run",              # Required: groups rollouts together
    rollout_name="episode-1",            # Optional: name for this specific rollout
    environment="GeneralReasoning/CTF",  # Optional: environment identifier
    split="train",                       # Optional: data split
    metadata={"model": "gpt-5.4"},       # Optional: custom key-value pairs
    task_spec={"id": "task-0"}           # Optional: task specification
)

Logging Methods

The SDK provides these logging methods on a Rollout object:
Method                                    Provider                                 Input Type
rollout.log_openai_response(message)      OpenAI (Responses API)                   Response object or individual output items
rollout.log_openai_completions(message)   OpenAI (Chat Completions) / OpenRouter   Chat message dicts
rollout.log_anthropic_message(message)    Anthropic                                Message dicts (Anthropic format)
rollout.log_gdm_message(message)          Google Gemini                            google.genai.types.Content objects
rollout.log(message)                      Generic                                  UserMessage, AssistantMessage, ToolCall, etc.
Each method also accepts these parameters:
  • reward (Optional[float]): Reward signal for this step
  • is_finished (Optional[bool]): Whether this step ends the episode
  • metadata (Optional[dict]): Arbitrary metadata for this step
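In practice, intermediate steps are logged with the message alone, and the final step carries the reward and termination flag. The call shape can be sketched with a stand-in recorder (the class below is purely illustrative, not part of the SDK):

```python
# Stand-in that mimics the (message, reward, is_finished, metadata)
# signature of the Rollout logging methods, to show how the optional
# parameters ride alongside the message itself.
class RecordingLogger:
    def __init__(self):
        self.events = []

    def log(self, message, reward=None, is_finished=None, metadata=None):
        self.events.append({
            "message": message,
            "reward": reward,
            "is_finished": is_finished,
            "metadata": metadata,
        })

logger = RecordingLogger()

# Intermediate step: no reward yet, episode continues.
logger.log({"role": "assistant", "content": "Trying the first exploit..."})

# Final step: attach the reward signal and mark the episode finished.
logger.log(
    {"role": "assistant", "content": "Flag captured."},
    reward=1.0,
    is_finished=True,
    metadata={"attempts": 2},
)
```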

Flushing Events

When you are done logging, call or_client.rollout.close() to ensure all pending events are flushed and uploaded:
or_client.rollout.close()
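If your episode loop can raise (a provider error, a tool failure), putting close() in a finally block ensures pending events are still flushed. The pattern is shown here with a stand-in client so it runs anywhere; with the real SDK, or_client.rollout.close() goes in the finally block the same way:

```python
# Stand-in client illustrating the flush-on-exit pattern. In real code,
# the openreward client's rollout.close() sits in the finally block.
class FakeRolloutAPI:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class FakeClient:
    def __init__(self):
        self.rollout = FakeRolloutAPI()

or_client = FakeClient()
try:
    raise RuntimeError("provider call failed mid-episode")
except RuntimeError:
    pass  # handle or re-raise as appropriate
finally:
    or_client.rollout.close()  # runs even when the episode errors out

print(or_client.rollout.closed)
```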

Recording Rollouts by Provider

In this example we will run an agent against the GeneralReasoning/CTF environment and record the full rollout trace to OpenReward.
Step 1: Set your API keys

Make sure you have API keys for OpenReward and OpenAI, and set these as environment variables:
export OPENAI_API_KEY='your-openai-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
Step 2: Create your code

Save this as record_rollout.py:
from openai import OpenAI
from openreward import OpenReward
import json

or_client = OpenReward()
oai_client = OpenAI()
MODEL_NAME = "gpt-5.4"

environment = or_client.environments.get(name="GeneralReasoning/CTF")
tasks = environment.list_tasks(split="train")
tools = environment.list_tools(format="openai")

example_task = tasks[0]

# Create a rollout to record this episode
rollout = or_client.rollout.create(
    run_name="openai-ctf-rollouts",
    rollout_name="episode-0",
    environment="GeneralReasoning/CTF",
    split="train",
    metadata={"model": MODEL_NAME},
    task_spec=example_task.task_spec
)

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    input_list = [{"role": "user", "content": prompt[0].text}]
    finished = False

    # Log the initial user prompt
    rollout.log_openai_response(input_list[0])

    while not finished:
        response = oai_client.responses.create(
            model=MODEL_NAME,
            tools=tools,
            input=input_list
        )

        # Log the full model response (handles reasoning, text, tool calls)
        rollout.log_openai_response(response)

        input_list += response.output

        for item in response.output:
            if item.type == "function_call":
                tool_result = session.call_tool(item.name, json.loads(item.arguments))

                reward = tool_result.reward
                finished = tool_result.finished

                tool_output = {
                    "type": "function_call_output",
                    "call_id": item.call_id,
                    "output": json.dumps({
                        "result": tool_result.blocks[0].text
                    })
                }
                input_list.append(tool_output)

                # Log the tool result with reward info
                rollout.log_openai_response(
                    tool_output,
                    reward=reward,
                    is_finished=finished
                )

                if finished:
                    break

# Flush all pending events
or_client.rollout.close()
print("Rollout recorded successfully!")
Step 3: Run your code

python record_rollout.py
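The function_call_output payload assembled inside the loop above is plain JSON; pulling it into a small helper makes the shape explicit (the helper name and sample values here are illustrative, not part of the SDK):

```python
import json

def make_tool_output(call_id, text):
    """Build a Responses-API function_call_output item wrapping a tool result."""
    return {
        "type": "function_call_output",
        "call_id": call_id,
        # The output field is itself a JSON-encoded string.
        "output": json.dumps({"result": text}),
    }

item = make_tool_output("call_123", "flag{example}")
print(item["type"], json.loads(item["output"])["result"])
```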

Viewing Rollouts

After running your code, head to OpenReward and open your profile page. Go to the Runs tab to see your recorded runs. Click on a run to see its rollouts, then click on a specific rollout to inspect the full message timeline, including reasoning steps, tool calls, and rewards.

Next Steps

Train with OpenReward

Use recorded rollouts as part of a reinforcement learning training pipeline.

Evaluate with OpenReward

Run systematic evaluations across environments and models.

Build your own environment

Create custom environments for your use case.

Using the AsyncClient

Record rollouts asynchronously for better performance.