A rollout is a recorded trace of an agent interacting with an environment. It captures every message exchange (user prompts, assistant responses, reasoning steps, tool calls, and tool results) along with associated rewards and metadata.
Recording rollouts is useful for:
Debugging: Inspect exactly what your agent did step-by-step.
Analysis: Compare performance across models, prompts, or environment configurations.
Training data: Collect trajectories for supervised finetuning or midtraining.
Collaboration: Your team can view runs on your organisation’s OpenReward page.
The OpenReward Python SDK provides provider-specific logging methods that automatically serialize messages from OpenAI, Anthropic, and Google Gemini into a normalized format. For OpenRouter (which uses the OpenAI SDK), you use the OpenAI completions logger.
A run groups related rollouts together under a single name (e.g. "gpt-5.4-ctf-eval"). Each rollout within a run represents one episode or conversation with the environment.
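To illustrate how a run groups rollouts, here is a minimal sketch that creates one rollout per task under a shared run name. It uses only the `rollout.create` and `rollout.close` calls that appear in the examples on this page. The function name `record_run` and the `run_episode` callback are hypothetical and not part of the SDK.

```python
from typing import Any, Callable, Iterable, Optional


def record_run(
    or_client: Any,
    run_name: str,
    environment_name: str,
    split: str,
    tasks: Iterable[Any],
    run_episode: Callable[[Any, Any], None],
    metadata: Optional[dict] = None,
) -> int:
    """Create one rollout per task, all grouped under `run_name`.

    `run_episode(task, rollout)` is a hypothetical callback that plays a
    single episode and logs its messages to `rollout`.
    Returns the number of rollouts created.
    """
    count = 0
    try:
        for index, task in enumerate(tasks):
            # Same keyword arguments as the single-episode examples.
            rollout = or_client.rollout.create(
                run_name=run_name,
                rollout_name=f"episode-{index}",
                environment=environment_name,
                split=split,
                metadata=metadata or {},
                task_spec=task.task_spec,
            )
            run_episode(task, rollout)
            count += 1
    finally:
        # Flush pending events even if an episode raised an exception.
        or_client.rollout.close()
    return count
```

Every rollout created this way shares the same `run_name`, so the episodes should appear together under one run.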
Save this as record_rollout.py. Since OpenRouter uses the OpenAI SDK (Chat Completions format), we use log_openai_completions for logging:
from openai import OpenAI
from openreward import OpenReward
import json
import os

or_client = OpenReward()
oai_client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ.get("OPENROUTER_API_KEY")
)

MODEL_NAME = "deepseek/deepseek-v3.2"

environment = or_client.environments.get(name="GeneralReasoning/CTF")
tasks = environment.list_tasks(split="train")
tools = environment.list_tools(format="openrouter")
example_task = tasks[0]

# Create a rollout to record this episode
rollout = or_client.rollout.create(
    run_name="openrouter-ctf-rollouts",
    rollout_name="episode-0",
    environment="GeneralReasoning/CTF",
    split="train",
    metadata={"model": MODEL_NAME},
    task_spec=example_task.task_spec
)

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    input_list = [{"role": "user", "content": prompt[0].text}]
    finished = False

    # Log the initial user prompt
    rollout.log_openai_completions(input_list[0])

    while not finished:
        response = oai_client.chat.completions.create(
            model=MODEL_NAME,
            tools=tools,
            messages=input_list
        )
        assistant_msg = response.choices[0].message
        input_list.append(assistant_msg)

        # Log the assistant response (handles text + tool calls)
        rollout.log_openai_completions({
            "role": "assistant",
            "content": assistant_msg.content,
            "tool_calls": assistant_msg.tool_calls
        })

        tool_calls = assistant_msg.tool_calls
        if not tool_calls:
            break

        for tool_call in tool_calls:
            tool_name = tool_call.function.name
            tool_args = json.loads(tool_call.function.arguments)
            tool_result = session.call_tool(tool_name, tool_args)
            reward = tool_result.reward
            finished = tool_result.finished

            tool_msg = {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps({
                    "result": tool_result.blocks[0].text
                })
            }
            input_list.append(tool_msg)

            # Log the tool result with reward info
            rollout.log_openai_completions(
                tool_msg,
                reward=reward,
                is_finished=finished
            )

            if finished:
                break

# Flush all pending events
or_client.rollout.close()
print("Rollout recorded successfully!")
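The loop above passes `tool_call.function.arguments` straight to `json.loads`, which raises if the model emits malformed JSON and would end the episode before the rollout is flushed. One possible guard is sketched below. The helper name `parse_tool_arguments` is hypothetical, and how you report the error back to the model is a design choice rather than anything the SDK prescribes.

```python
import json
from typing import Any, Dict, Optional, Tuple


def parse_tool_arguments(raw: Optional[str]) -> Tuple[Dict[str, Any], str]:
    """Parse a tool call's JSON argument string without raising.

    Returns (arguments, error). On success, error is an empty string.
    On failure, arguments is {} and error describes the problem.
    """
    try:
        # Treat a missing or empty argument string as "no arguments".
        parsed = json.loads(raw) if raw else {}
    except json.JSONDecodeError as exc:
        return {}, f"Invalid JSON in tool arguments: {exc.msg}"
    if not isinstance(parsed, dict):
        return {}, "Tool arguments must be a JSON object"
    return parsed, ""
```

When `error` is non-empty, one option is to skip `session.call_tool`, append a tool message containing the error text to `input_list`, and let the model retry on the next turn.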
3
Run your code
python record_rollout.py
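The OpenRouter example reads `OPENROUTER_API_KEY` with `os.environ.get`, which returns `None` when the variable is unset, so a missing key only surfaces later as an authentication error. A small pre-flight check, sketched here with a hypothetical helper name, can make that failure more obvious. Only the variable name shown in the example above is assumed; add any others your setup needs.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(
    names: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names that are unset or empty in `env`."""
    return [name for name in names if not env.get(name)]


# Example use at the top of record_rollout.py:
#   missing = missing_env_vars(["OPENROUTER_API_KEY"])
#   if missing:
#       raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```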
If you are using a different provider, a custom model, or want full control over what gets logged, you can use the generic rollout.log() method with the built-in message types.
Set any additional API keys required by your model provider.
2
Create your code
Save this as record_rollout.py:
from openreward import (
    OpenReward,
    UserMessage,
    AssistantMessage,
    SystemMessage,
    ReasoningItem,
    ToolCall,
    ToolResult,
)
import json

or_client = OpenReward()

environment = or_client.environments.get(name="GeneralReasoning/CTF")
tasks = environment.list_tasks(split="train")
example_task = tasks[0]

# Create a rollout to record this episode
rollout = or_client.rollout.create(
    run_name="custom-ctf-rollouts",
    rollout_name="episode-0",
    environment="GeneralReasoning/CTF",
    split="train",
    metadata={"model": "my-custom-model"},
    task_spec=example_task.task_spec
)

with environment.session(task=example_task) as session:
    prompt = session.get_prompt()
    finished = False

    # Log the user prompt
    rollout.log(UserMessage(content=prompt[0].text))

    while not finished:
        # --- Replace this section with your model's inference ---
        model_response = "The answer is 42."
        # --------------------------------------------------------

        # Log the assistant response
        rollout.log(AssistantMessage(content=model_response))

        # If your model produces reasoning/thinking, log it:
        # rollout.log(ReasoningItem(content="Let me think about this..."))

        # If your model makes a tool call, log the call and result:
        tool_name = "submit_answer"
        tool_args = {"answer": model_response}
        call_id = "call_001"

        rollout.log(ToolCall(
            name=tool_name,
            content=json.dumps(tool_args),
            call_id=call_id
        ))

        tool_result = session.call_tool(tool_name, tool_args)

        rollout.log(
            ToolResult(
                content=tool_result.blocks[0].text,
                call_id=call_id
            ),
            reward=tool_result.reward,
            is_finished=tool_result.finished
        )

        finished = tool_result.finished

# Flush all pending events
or_client.rollout.close()
print("Rollout recorded successfully!")
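The example above reuses the fixed `call_id` "call_001" on every loop iteration. If an episode makes several tool calls, giving each call its own ID keeps every `ToolCall` paired with exactly one `ToolResult`. Whether OpenReward requires unique IDs is not stated here, so treat this as a defensive sketch. The helper name `make_call_id_factory` is hypothetical.

```python
import itertools
from typing import Callable


def make_call_id_factory(prefix: str = "call") -> Callable[[], str]:
    """Return a function that yields call_001, call_002, ... on each call."""
    counter = itertools.count(1)

    def next_call_id() -> str:
        return f"{prefix}_{next(counter):03d}"

    return next_call_id
```

Create the factory once before the `while` loop (`next_call_id = make_call_id_factory()`), then replace the fixed assignment with `call_id = next_call_id()` and pass the same value to both `ToolCall` and `ToolResult`.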
After running your code, go to your profile page on OpenReward and open the Runs tab to see your recorded runs.
Click on a run to see its rollouts.
Click on a specific rollout to inspect the full message timeline, including reasoning steps, tool calls, and rewards.