Goals

  • Understand what rollouts are and why you would record them.
  • Record rollouts from an agent interacting with an OpenReward environment.
  • Use provider-specific logging methods for OpenAI, Anthropic, Google Gemini, and OpenRouter.
  • View recorded rollouts in the OpenReward dashboard.

Prerequisites

  • An OpenReward account
  • An OpenReward API key
  • An API key and SDK for your model provider of choice (e.g. OpenAI, Anthropic, Google, OpenRouter)
  • Familiarity with the Your First Environment tutorial

Setup

Install the OpenReward Python library:
pip install openreward

What are Rollouts?

A rollout is a recorded trace of an agent interacting with an environment. It captures every message exchange — user prompts, assistant responses, reasoning steps, tool calls, and tool results — along with associated rewards and metadata. Recording rollouts is useful for:
  • Debugging: Inspect exactly what your agent did step-by-step.
  • Analysis: Compare performance across models, prompts, or environment configurations.
  • Training data: Collect trajectories for supervised finetuning or midtraining.
  • Collaboration: Your team can view runs on your organisation’s OpenReward page.
The OpenReward Python SDK provides provider-specific logging methods that automatically serialize messages from OpenAI, Anthropic, and Google Gemini into a normalized format. For OpenRouter (which uses the OpenAI SDK), you use the OpenAI completions logger.

Key Concepts

Runs and Rollouts

A run groups related rollouts together under a single name (e.g. "gpt-5.4-ctf-eval"). Each rollout within a run represents one episode or conversation with the environment.
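The run/rollout hierarchy can be pictured as a simple mapping, one run name with many episode-level rollouts beneath it (the names here are purely illustrative):

```python
# Illustrative picture of the run/rollout hierarchy: one run name,
# many episode-level rollouts grouped under it.
run = {
    "name": "gpt-5.4-ctf-eval",
    "rollouts": [f"episode-{i}" for i in range(3)],
}
print(run["rollouts"])  # ['episode-0', 'episode-1', 'episode-2']
```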

Creating a Rollout

import openreward

or_client = openreward.OpenReward()

rollout = or_client.rollout.create(
    run_name="my-eval-run",              # Required: groups rollouts together
    rollout_name="episode-1",            # Optional: name for this specific rollout
    environment="GeneralReasoning/CTF",  # Optional: environment identifier
    split="train",                       # Optional: data split
    metadata={"model": "gpt-5.4"},       # Optional: custom key-value pairs
    task_spec={"id": "task-0"}           # Optional: task specification
)

Logging Methods

The SDK provides these logging methods on a Rollout object:
Method                                    Provider                                 Input Type
rollout.log_openai_response(message)      OpenAI (Responses API)                   Response object or individual output items
rollout.log_openai_completions(message)   OpenAI (Chat Completions) / OpenRouter   Chat message dicts
rollout.log_anthropic_message(message)    Anthropic                                Message dicts (Anthropic format)
rollout.log_gdm_message(message)          Google Gemini                            google.genai.types.Content objects
rollout.log(message)                      Generic                                  UserMessage, AssistantMessage, ToolCall, etc.
Each method also accepts these parameters:
  • reward (Optional[float]): Reward signal for this step
  • is_finished (Optional[bool]): Whether this step ends the episode
  • metadata (Optional[dict]): Arbitrary metadata for this step
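In practice, intermediate steps are logged with the message alone, and the final step carries the reward and termination flag. The call shape can be sketched with a stand-in recorder (the class below is purely illustrative, not part of the SDK):

```python
# Stand-in that mimics the (message, reward, is_finished, metadata)
# signature of the Rollout logging methods, to show how the optional
# parameters ride alongside the message itself.
class RecordingLogger:
    def __init__(self):
        self.events = []

    def log(self, message, reward=None, is_finished=None, metadata=None):
        self.events.append({
            "message": message,
            "reward": reward,
            "is_finished": is_finished,
            "metadata": metadata,
        })

logger = RecordingLogger()

# Intermediate step: no reward yet, episode continues.
logger.log({"role": "assistant", "content": "Trying the first exploit..."})

# Final step: attach the reward signal and mark the episode finished.
logger.log(
    {"role": "assistant", "content": "Flag captured."},
    reward=1.0,
    is_finished=True,
    metadata={"attempts": 2},
)
```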

Flushing Events

When you are done logging, call or_client.rollout.close() to ensure all pending events are flushed and uploaded:
or_client.rollout.close()
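If your episode loop can raise (a provider error, a tool failure), putting close() in a finally block ensures pending events are still flushed. The pattern is shown here with a stand-in client so it runs anywhere; with the real SDK, or_client.rollout.close() goes in the finally block the same way:

```python
# Stand-in client illustrating the flush-on-exit pattern. In real code,
# the openreward client's rollout.close() sits in the finally block.
class FakeRolloutAPI:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class FakeClient:
    def __init__(self):
        self.rollout = FakeRolloutAPI()

or_client = FakeClient()
try:
    raise RuntimeError("provider call failed mid-episode")
except RuntimeError:
    pass  # handle or re-raise as appropriate
finally:
    or_client.rollout.close()  # runs even when the episode errors out

print(or_client.rollout.closed)
```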

Recording Rollouts by Provider

In this example we will run an agent against the GeneralReasoning/CTF environment and record the full rollout trace to OpenReward.
Step 1: Set your API keys

Make sure you have API keys for OpenReward and OpenAI, and set these as environment variables:
export OPENAI_API_KEY='your-openai-api-key-here'
export OPENREWARD_API_KEY='your-openreward-api-key-here'
Step 2: Create your code

Save this as record_rollout.py:
from openai import OpenAI
from openreward import OpenReward
import json

or_client = OpenReward()
oai_client = OpenAI()
MODEL_NAME = "gpt-5.4"

environment = or_client.environments.get(name="GeneralReasoning/CTF")
tasks = environment.list_tasks(split="train")
tools = environment.list_tools(format="openai")

example_task = tasks[0]

# Create a rollout to record this episode
rollout = or_client.rollout.create(
    run_name="openai-ctf-rollouts",
    rollout_name="episode-0",
    environment="GeneralReasoning/CTF",
    split="train",
    metadata={"model": MODEL_NAME},
    task_spec=example_task.task_spec
)

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    input_list = [{"role": "user", "content": prompt[0].text}]
    finished = False

    # Log the initial user prompt
    rollout.log_openai_response(input_list[0])

    while not finished:
        response = oai_client.responses.create(
            model=MODEL_NAME,
            tools=tools,
            input=input_list
        )

        # Log the full model response (handles reasoning, text, tool calls)
        rollout.log_openai_response(response)

        input_list += response.output

        for item in response.output:
            if item.type == "function_call":
                tool_result = session.call_tool(item.name, json.loads(item.arguments))

                reward = tool_result.reward
                finished = tool_result.finished

                tool_output = {
                    "type": "function_call_output",
                    "call_id": item.call_id,
                    "output": json.dumps({
                        "result": tool_result.blocks[0].text
                    })
                }
                input_list.append(tool_output)

                # Log the tool result with reward info
                rollout.log_openai_response(
                    tool_output,
                    reward=reward,
                    is_finished=finished
                )

                if finished:
                    break

# Flush all pending events
or_client.rollout.close()
print("Rollout recorded successfully!")
Step 3: Run your code

python record_rollout.py
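The function_call_output payload assembled inside the loop above is plain JSON; pulling it into a small helper makes the shape explicit (the helper name and sample values here are illustrative, not part of the SDK):

```python
import json

def make_tool_output(call_id, text):
    """Build a Responses-API function_call_output item wrapping a tool result."""
    return {
        "type": "function_call_output",
        "call_id": call_id,
        # The output field is itself a JSON-encoded string.
        "output": json.dumps({"result": text}),
    }

item = make_tool_output("call_123", "flag{example}")
print(item["type"], json.loads(item["output"])["result"])
```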

Viewing Rollouts

After running your code, head to OpenReward and open your profile page. Go to the Runs tab to see your recorded runs. Click on a run to see its rollouts, then click on a specific rollout to inspect the full message timeline, including reasoning steps, tool calls, and rewards.

Next Steps

Train with OpenReward

Use recorded rollouts as part of a reinforcement learning training pipeline.

Evaluate with OpenReward

Run systematic evaluations across environments and models.

Build your own environment

Create custom environments for your use case.

Using the AsyncClient

Record rollouts asynchronously for better performance.