Mohan-Sharma
Product and Topic Expert

Blog 2 of N: Structured Output and Your First Agent on SAP AI Core

What Are We Building?

A sales order analysis agent. You ask it about your sales data - "How were sales in January?" - and it fetches the data, calculates summaries (total orders, revenue, top customer), and responds with a typed, structured answer. It streams the response in real time and remembers your conversation.

We'll build this incrementally. Each section teaches a concept (theory), then you apply it to the sales app (lab).

In Blog 1, you built a terminal app that sends a message to an LLM and prints the response. That was a one-shot interaction - no memory, no tools, no structure. This blog takes you from that starting point to a full agent.


Prerequisites

  • Completed Blog 1 - you have API access to SAP AI Core, uv installed, and know how to create a project with sap-ai-sdk-gen
  • No new dependencies - sap-ai-sdk-gen already includes LangChain. Everything in this blog works out of the box

Step 1: Create the Project

Same pattern as Blog 1. If any of this is unfamiliar, revisit Blog 1 first.

uv init first-react-agent --package --build-backend hatchling
cd first-react-agent
uv add sap-ai-sdk-gen pydantic-settings

Same dependencies as Blog 1. sap-ai-sdk-gen already includes LangChain - no extra installs needed.

uv init --package creates src/first_react_agent/__init__.py with a default hello world function. Clear it out - we don't need it:

echo "" > src/first_react_agent/__init__.py

Step 2: Reuse Config and Client from Blog 1

Copy config.py and client.py from Blog 1 into src/first_react_agent/. These two files handle authentication and LLM creation - their job hasn't changed.

src/first_react_agent/config.py:

from pydantic_settings import BaseSettings, SettingsConfigDict


class AICoreConfig(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
    )

    aicore_auth_url: str
    aicore_client_id: str
    aicore_client_secret: str
    aicore_resource_group: str = "default"
    aicore_base_url: str
    aicore_model: str
    llm_max_output_tokens: int = 4096


src/first_react_agent/client.py:

from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client
from gen_ai_hub.proxy.langchain.init_models import init_llm
from langchain_core.language_models import BaseChatModel

from first_react_agent.config import AICoreConfig


def create_llm(config: AICoreConfig) -> BaseChatModel:
    proxy_client = get_proxy_client(
        proxy_version="gen-ai-hub",
        base_url=config.aicore_base_url,
        auth_url=config.aicore_auth_url,
        client_id=config.aicore_client_id,
        client_secret=config.aicore_client_secret,
        resource_group=config.aicore_resource_group,
    )
    return init_llm(
        model_name=config.aicore_model,
        proxy_client=proxy_client,
        max_tokens=config.llm_max_output_tokens,
    )

create_llm returns a BaseChatModel - LangChain's standard interface for chat models. Everything we build in this blog - messages, tools, agents, streaming, structured output - works through this interface.

Also copy your .env file from Blog 1 into the project root, and add it to .gitignore:

echo ".env" >> .gitignore

Step 3: Create the Sales Data

Create sales.csv in the project root. This is a simple dataset to prove the point - in a real application, your tools would connect to a database, an API, or any other data source. The agent pattern stays the same regardless of where the data comes from.

This is the data our agent will analyze:

order_id,date,customer,product,quantity,unit_price
SO-001,2025-01-05,Acme Corp,Widget A,10,29.99
SO-002,2025-01-12,TechStart Inc,Widget B,5,49.99
SO-003,2025-01-20,Global Ltd,Widget A,15,29.99
SO-004,2025-01-25,Acme Corp,Widget C,8,19.99
SO-005,2025-01-28,Global Ltd,Widget B,16,39.99
SO-006,2025-02-03,TechStart Inc,Widget A,20,29.99
SO-007,2025-02-14,Global Ltd,Widget B,3,49.99
SO-008,2025-02-22,Acme Corp,Widget B,12,49.99
SO-009,2025-03-01,TechStart Inc,Widget C,25,19.99
SO-010,2025-03-10,Global Ltd,Widget A,7,29.99
SO-011,2025-03-18,Acme Corp,Widget A,30,29.99
SO-012,2025-03-25,TechStart Inc,Widget B,10,49.99

Three months of data (January-March 2025). No data for April onward - this will test the "no orders found" path.
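If you want to sanity-check the numbers the agent should later report, a quick stdlib-only pass over the same rows computes revenue per month. The CSV is inlined here so the snippet runs standalone; the values are derived from the table above:

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# The same rows as sales.csv, inlined so the snippet is self-contained.
SALES_CSV = """order_id,date,customer,product,quantity,unit_price
SO-001,2025-01-05,Acme Corp,Widget A,10,29.99
SO-002,2025-01-12,TechStart Inc,Widget B,5,49.99
SO-003,2025-01-20,Global Ltd,Widget A,15,29.99
SO-004,2025-01-25,Acme Corp,Widget C,8,19.99
SO-005,2025-01-28,Global Ltd,Widget B,16,39.99
SO-006,2025-02-03,TechStart Inc,Widget A,20,29.99
SO-007,2025-02-14,Global Ltd,Widget B,3,49.99
SO-008,2025-02-22,Acme Corp,Widget B,12,49.99
SO-009,2025-03-01,TechStart Inc,Widget C,25,19.99
SO-010,2025-03-10,Global Ltd,Widget A,7,29.99
SO-011,2025-03-18,Acme Corp,Widget A,30,29.99
SO-012,2025-03-25,TechStart Inc,Widget B,10,49.99
"""

# Revenue per month = sum of quantity * unit_price over each month's rows
revenue_by_month: dict[str, float] = defaultdict(float)
for row in csv.DictReader(io.StringIO(SALES_CSV)):
    month = datetime.strptime(row["date"], "%Y-%m-%d").strftime("%B")
    revenue_by_month[month] += int(row["quantity"]) * float(row["unit_price"])

for month, revenue in revenue_by_month.items():
    print(f"{month}: ${revenue:,.2f}")
# January: $1,799.46
# February: $1,349.65
# March: $2,109.28
```

These are the totals a correctly chained agent run should produce for each month.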


Step 4: Messages

Theory

In Blog 1, we used HumanMessage to send a question to the LLM. But LLMs understand four types of messages, each with a different role in the conversation:

  • SystemMessage (role: system) - tells the model how to behave and provides context for interactions
  • HumanMessage (role: user) - represents user input and interactions with the model
  • AIMessage (role: assistant) - responses generated by the model, including text content, tool calls, and metadata
  • ToolMessage (role: tool) - represents the outputs of tool calls

Note: You will see the term "system prompt" later in this blog (Step 6). A system prompt and a SystemMessage are the same thing. When you pass system_prompt="You are a sales analyst" to create_agent, it becomes a SystemMessage internally. Different name, same concept.

LangChain accepts messages in three formats. Here is the same conversation in each:

1. Message objects (explicit):

from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

conversation = [
    SystemMessage("You are a sales analyst. Answer questions about order data concisely."),
    HumanMessage("What was our best selling month?"),
    AIMessage("Based on the data, March had the highest revenue at $2,109.28."),
    HumanMessage("Which customer drove most of that?"),
]

response = llm.invoke(conversation)

Each message is a typed Python object. You can see exactly what role each message plays.

2. Dictionary format (OpenAI-compatible):

conversation = [
    {"role": "system", "content": "You are a sales analyst. Answer questions about order data concisely."},
    {"role": "user", "content": "What was our best selling month?"},
    {"role": "assistant", "content": "Based on the data, March had the highest revenue at $2,109.28."},
    {"role": "user", "content": "Which customer drove most of that?"},
]

response = llm.invoke(conversation)

Same conversation, but using dictionaries with role and content keys.

3. String shortcut (simplest):

response = llm.invoke("What is the average order value for Q1?")

A plain string is automatically wrapped in a HumanMessage.

When the model responds, you get an AIMessage with these key attributes:

  • text - the text content of the response
  • content - raw content (string or list of dicts)
  • tool_calls - list of tool calls the model wants to make (empty if none)
  • usage_metadata - token counts: input, output, total
  • response_metadata - provider-specific response data

ToolMessage is how you send tool results back to the model. It has three required fields: content (the tool's output as a string), tool_call_id (must match the ID from the model's tool call), and name (the tool that was called):

from langchain_core.messages import ToolMessage

tool_message = ToolMessage(
    content="Total Orders: 4, Total Revenue: $1,159.62",
    tool_call_id="call_456",
    name="calculate_sales_summary",
)

Lab

No code to write yet. The sales app will use dictionary format for user messages (Step 8) and message objects internally. You now understand the building blocks.


Step 5: Tools

Theory

An LLM by itself can only generate text. It cannot query a database, read a CSV file, or call an API. Tools are Python functions that extend what an LLM can do. You define a function, and the LLM decides when to call it based on the user's question.

The simplest way to create a tool is with the @tool decorator:

from langchain.tools import tool


@tool
def lookup_customer(customer_id: str, include_history: bool = False) -> str:
    """Look up a customer record by their ID.

    Args:
        customer_id: The unique customer identifier
        include_history: Whether to include order history
    """
    return f"Customer {customer_id}: Acme Corp, active since 2023"

Three things matter:

  1. Type hints are required - they define the tool's input schema. The LLM knows customer_id must be a string and include_history must be a boolean
  2. The docstring is what the LLM reads to decide when to call this tool. Make it clear and concise
  3. The return value is what goes back to the LLM as the tool's observation

You can customize the tool name and description:

@tool("order_lookup")
def find_order(order_id: str) -> str:
    """Find a sales order by its ID."""
    return f"Order {order_id}: 10x Widget A, $299.90"

@tool("revenue_report", description="Generate a revenue report for a given time period.")
def report(period: str) -> str:
    """Generate revenue report."""
    return f"Revenue for {period}: $4,618.55"

Use snake_case for tool names - avoid spaces or special characters for provider compatibility.

For tools with complex inputs, define a Pydantic model as the input schema:

from pydantic import BaseModel, Field
from typing import Literal


class OrderFilter(BaseModel):
    """Filters for querying sales orders."""
    customer: str = Field(description="Customer name to filter by")
    status: Literal["open", "closed", "cancelled"] = Field(
        default="open",
        description="Order status filter",
    )


@tool(args_schema=OrderFilter)
def filter_orders(customer: str, status: str = "open") -> str:
    """Filter sales orders by customer and status."""
    return f"Found 3 {status} orders for {customer}"

Field(description=...) tells the LLM what each parameter means. Literal restricts values to a fixed set of choices.

Two parameter names are reserved and cannot be used as tool arguments: config (reserved for RunnableConfig) and runtime (reserved for ToolRuntime).

Tool error handling - when tools fail, you want the error to go back to the LLM so it can retry, not crash your app. Configure via ToolNode:

from langgraph.prebuilt import ToolNode

# Default: catch invocation errors, re-raise execution errors
tool_node = ToolNode(tools)

# Catch all errors and return error message to LLM
tool_node = ToolNode(tools, handle_tool_errors=True)

# Custom error message
tool_node = ToolNode(tools, handle_tool_errors="Something went wrong, please try again.")

# Custom error handler function
def handle_error(e: ValueError) -> str:
    return f"Invalid input: {e}"

tool_node = ToolNode(tools, handle_tool_errors=handle_error)

# Only catch specific exception types
tool_node = ToolNode(tools, handle_tool_errors=(ValueError, TypeError))

When handle_tool_errors=True, the error message is sent back to the LLM as a ToolMessage instead of crashing. The LLM sees what went wrong and can try again with different inputs.
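The dispatch logic behind handle_tool_errors can be pictured in plain Python. This is an illustrative sketch of the behavior described above, not ToolNode's actual implementation:

```python
def resolve_tool_error(error: Exception, handler) -> str:
    """Illustrative sketch of a handle_tool_errors-style dispatch
    (not ToolNode's actual implementation)."""
    if handler is True:              # catch everything, echo the error to the LLM
        return f"Error: {error}"
    if isinstance(handler, str):     # fixed custom message
        return handler
    if isinstance(handler, tuple):   # only catch the listed exception types
        if isinstance(error, handler):
            return f"Error: {error}"
        raise error
    if callable(handler):            # custom handler function
        return handler(error)
    raise error                      # handler is False: let the exception propagate

print(resolve_tool_error(ValueError("bad month"), True))
# Error: bad month
print(resolve_tool_error(ValueError("bad month"), "Something went wrong, please try again."))
# Something went wrong, please try again.
```

Whatever string comes back is what the LLM sees as the ToolMessage content on its next turn.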

Lab

Create src/first_react_agent/tools.py. These are the two tools our sales agent will use. The key design: Tool 1's output is Tool 2's input - the agent chains them.

import csv
import io
from datetime import datetime
from pathlib import Path

from langchain.tools import tool


@tool
def get_monthly_sales(month: str, year: int) -> str:
    """Load all sales orders for a given month and year from the sales database.

    Args:
        month: Full month name (e.g., 'January', 'February')
        year: The year (e.g., 2025)
    """
    csv_path = Path(__file__).parent.parent.parent / "sales.csv"

    with open(csv_path) as f:
        reader = csv.DictReader(f)
        rows = [
            row for row in reader
            if datetime.strptime(row["date"], "%Y-%m-%d").strftime("%B") == month
            and datetime.strptime(row["date"], "%Y-%m-%d").year == year
        ]

    if not rows:
        return f"No orders found for {month} {year}."

    header = "order_id,date,customer,product,quantity,unit_price"
    lines = [header]
    for r in rows:
        lines.append(
            f"{r['order_id']},{r['date']},{r['customer']},{r['product']},{r['quantity']},{r['unit_price']}"
        )
    return "\n".join(lines)


@tool
def calculate_sales_summary(sales_data: str) -> str:
    """Calculate summary statistics from sales data. Use this after getting sales data with get_monthly_sales.

    Args:
        sales_data: CSV-formatted sales data with columns: order_id, date, customer, product, quantity, unit_price
    """
    reader = csv.DictReader(io.StringIO(sales_data))
    rows = list(reader)

    if not rows:
        return "No data to summarize."

    total_orders = len(rows)
    total_revenue = sum(int(r["quantity"]) * float(r["unit_price"]) for r in rows)
    avg_order_value = total_revenue / total_orders

    customer_revenue: dict[str, float] = {}
    for r in rows:
        revenue = int(r["quantity"]) * float(r["unit_price"])
        customer_revenue[r["customer"]] = customer_revenue.get(r["customer"], 0) + revenue
    top_customer = max(customer_revenue, key=customer_revenue.get)

    return (
        f"Total Orders: {total_orders}\n"
        f"Total Revenue: ${total_revenue:,.2f}\n"
        f"Average Order Value: ${avg_order_value:,.2f}\n"
        f"Top Customer: {top_customer} (${customer_revenue[top_customer]:,.2f})"
    )

How the chaining works:

  1. User asks: "How were sales in January 2025?"
  2. Agent calls get_monthly_sales("January", 2025) - returns CSV rows
  3. Agent sees the raw data, calls calculate_sales_summary(raw_csv_data) - returns computed stats
  4. Agent responds to the user with the summary

If get_monthly_sales returns "No orders found for April 2025", the agent sees that and responds directly - no need to call calculate_sales_summary.

Notice how calculate_sales_summary's docstring says "Use this after getting sales data with get_monthly_sales." This guides the LLM to chain the tools in the right order.

Path(__file__).parent.parent.parent navigates from tools.py -> first_react_agent/ -> src/ -> project root where sales.csv lives.
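The contract between the two tools - calculate_sales_summary parses exactly the CSV text get_monthly_sales emits - can be exercised without an LLM. This sketch inlines simplified stand-ins for both tool bodies and chains them by hand:

```python
import csv
import io

def fetch_january() -> str:
    """Stand-in for get_monthly_sales: returns CSV text, header included."""
    return (
        "order_id,date,customer,product,quantity,unit_price\n"
        "SO-001,2025-01-05,Acme Corp,Widget A,10,29.99\n"
        "SO-002,2025-01-12,TechStart Inc,Widget B,5,49.99"
    )

def summarize(sales_data: str) -> str:
    """Stand-in for calculate_sales_summary: parses the CSV text it is handed."""
    rows = list(csv.DictReader(io.StringIO(sales_data)))
    total = sum(int(r["quantity"]) * float(r["unit_price"]) for r in rows)
    return f"Total Orders: {len(rows)}, Total Revenue: ${total:,.2f}"

# The agent performs this hand-off itself at runtime:
# the output of step 1 becomes the input of step 2.
print(summarize(fetch_january()))  # Total Orders: 2, Total Revenue: $549.85
```

If step 1 ever changes its output format, step 2's parsing breaks - which is why the header line is included in every get_monthly_sales response.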


Step 6: System Prompts

Theory

A system prompt tells the model how to behave. There are three ways to set it, from simple to advanced.

1. String (simplest):

from langchain.agents import create_agent

agent = create_agent(
    model,
    tools,
    system_prompt="You are a helpful assistant. Be concise and accurate.",
)

A plain string. When omitted, the agent infers its task from the messages directly.

2. SystemMessage (advanced features):

For models that support it, SystemMessage allows structured content with features like cache control:

from langchain.agents import create_agent
from langchain_core.messages import SystemMessage, HumanMessage

product_agent = create_agent(
    model,
    tools=[],
    system_prompt=SystemMessage(
        content=[
            {
                "type": "text",
                "text": "You are an assistant that answers questions about our product catalog.",
            },
            {
                "type": "text",
                "text": "<entire product catalog CSV - thousands of rows>",
                "cache_control": {"type": "ephemeral"},
            },
        ]
    ),
)

result = product_agent.invoke(
    {"messages": [HumanMessage("Which products have the highest margin?")]}
)

The cache_control field with {"type": "ephemeral"} tells Anthropic to cache that content block, reducing latency and costs on repeated calls.

3. Dynamic system prompt with @dynamic_prompt:

Sometimes the system prompt needs to change based on who is using the agent or what state the conversation is in. The @dynamic_prompt decorator generates the prompt at runtime:

from typing import TypedDict

from langchain.agents import create_agent
from langchain.agents.middleware import dynamic_prompt, ModelRequest


class Context(TypedDict):
    user_role: str


@dynamic_prompt
def user_role_prompt(request: ModelRequest) -> str:
    """Generate system prompt based on user role."""
    user_role = request.runtime.context.get("user_role", "user")
    base_prompt = "You are a helpful assistant."

    if user_role == "expert":
        return f"{base_prompt} Provide detailed technical responses."
    elif user_role == "beginner":
        return f"{base_prompt} Explain concepts simply and avoid jargon."

    return base_prompt


agent = create_agent(
    model,
    tools=[get_monthly_sales, calculate_sales_summary],
    middleware=[user_role_prompt],
    context_schema=Context,
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "Show me this quarter's sales breakdown"}]},
    context={"user_role": "expert"},
)

The @dynamic_prompt function receives a ModelRequest with access to runtime.context - immutable configuration you pass at invocation time. The context_schema parameter defines what shape that context takes.

Lab

Our sales agent uses a string system prompt. We will wire it in the next step when we create the agent.


Step 7: Your First Agent

Theory

Think of an agent as an LLM that can do things, not just talk. Instead of only generating text, an agent can call tools (Python functions you define), look at the results, and decide what to do next - all on its own. It keeps going until it has enough information to give you a proper answer.

create_agent provides a production-ready agent implementation.

An agent runs tools in a loop to achieve a goal, and keeps going until a stop condition is met - the model emits a final output or an iteration limit is reached.

[Diagram: the ReAct loop - the model alternates between action (call a tool) and finish (return the answer)]

Walking through the diagram:

  1. Your input (the user's question) goes to the model
  2. The model reasons about what it needs and makes a decision:
    • action - it needs more information, so it calls a tool
    • finish - it has enough information to answer

  3. If it picks action, the tools node executes the tool and returns an observation (the tool's result) back to the model
  4. The model sees the observation and decides again: call another tool (action) or answer (finish)
  5. When it picks finish, the output is returned to the user

This is the ReAct pattern (Reasoning + Acting) - the model reasons about each step, acts by calling tools, observes the results, and repeats until done.
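Stripped of the framework, the loop above looks roughly like this. It is a hypothetical sketch: a scripted function stands in for the model, and a dict of callables stands in for the tools node:

```python
def fake_model(messages: list[dict]) -> dict:
    """Stand-in for the LLM: asks for a tool on the first turn, answers on the second."""
    if not any(m["role"] == "tool" for m in messages):
        return {"decision": "action", "tool": "get_time", "args": {}}
    return {"decision": "finish", "answer": "It is 12:00."}

tools = {"get_time": lambda: "12:00"}

def run_agent(question: str, max_iterations: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_iterations):            # stop condition: iteration limit
        step = fake_model(messages)            # reason: the model decides
        if step["decision"] == "finish":       # stop condition: final answer
            return step["answer"]
        observation = tools[step["tool"]](**step["args"])              # act
        messages.append({"role": "tool", "content": observation})      # observe
    return "Iteration limit reached."

print(run_agent("What time is it?"))  # It is 12:00.
```

create_agent implements this same decide/act/observe cycle for you, with real LLM calls and real tool dispatch.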

How create_agent Works Under the Hood

create_agent builds a graph-based agent runtime using LangGraph. A graph consists of nodes (steps) and edges (connections) that define how your agent processes information. The agent moves through this graph, executing nodes like the model node (which calls the model), the tools node (which executes tools), or middleware.

You do not need to understand LangGraph to use create_agent - it handles the graph construction for you. We will explore the Graph API in depth in Blog 4.

create_agent Parameters

create_agent brings together a model, tools, and a system prompt into a single callable agent:

  • model - the LLM: a model instance (our BaseChatModel from init_llm) or a string identifier
  • tools - list of tool functions decorated with @tool
  • system_prompt - optional string or SystemMessage directing agent behavior
  • response_format - optional structured output configuration (Step 8)
  • middleware - optional list of middleware for customizing execution
  • checkpointer - optional checkpointer for conversation memory (Step 9)

You invoke the agent by passing messages:

result = agent.invoke(
    {"messages": [{"role": "user", "content": "Summarize our January 2025 sales"}]}
)

You can use message objects instead of dicts:

from langchain_core.messages import HumanMessage

result = agent.invoke(
    {"messages": [HumanMessage("Summarize our January 2025 sales")]}
)

Lab

Create src/first_react_agent/agent.py:

from langchain.agents import create_agent

from first_react_agent.config import AICoreConfig
from first_react_agent.client import create_llm
from first_react_agent.tools import get_monthly_sales, calculate_sales_summary


def create_app():
    config = AICoreConfig()
    llm = create_llm(config)

    agent = create_agent(
        llm,
        tools=[get_monthly_sales, calculate_sales_summary],
        system_prompt=(
            "You are a sales analyst assistant. You can only help with sales data queries. "
            "The sales data covers the year 2025. If the user does not specify a year, assume 2025. "
            "When a user asks about sales for a specific month, first use get_monthly_sales to fetch the raw data, "
            "then pass that data to calculate_sales_summary to compute the statistics. "
            "Always report the summary back to the user. "
            "If the user asks about anything other than sales data, politely let them know you can only help with sales queries."
        ),
    )

    return agent

The system prompt does three important things:

  1. Scopes the agent - "You can only help with sales data queries" and "politely let them know you can only help with sales queries" keeps the agent focused
  2. Handles missing context - "If the user does not specify a year, assume 2025" prevents the agent from asking the user for the year every time
  3. Guides tool chaining - "first use get_monthly_sales to fetch the raw data, then pass that data to calculate_sales_summary" tells the agent the correct order

The agent follows the ReAct loop:

  1. User: "How were sales in January 2025?"
  2. Agent reasons: I need to fetch January sales -> calls get_monthly_sales("January", 2025)
  3. Agent sees the raw CSV data -> calls calculate_sales_summary(raw_data)
  4. Agent sees the summary -> responds to the user

If the user asks about a month with no data (e.g., April), get_monthly_sales returns "No orders found for April 2025." The agent sees this and responds directly without calling calculate_sales_summary.


Step 8: Structured Output

Theory

By default, LLMs return free text. But what if you need the response in a specific format - a Python object with typed fields you can use in your code? That is structured output.

create_agent accepts a response_format parameter that constrains the agent's final response to match a schema you define. The structured response is returned in the structured_response key of the result.

ProviderStrategy (preferred):

ProviderStrategy uses the model provider's native structured output generation. This is more reliable because the provider enforces the schema during generation. Supported by OpenAI, Anthropic Claude, Gemini, and xAI Grok.

from pydantic import BaseModel, Field
from langchain.agents import create_agent
from langchain.agents.structured_output import ProviderStrategy


class OrderSummary(BaseModel):
    """Summary of a customer order."""
    customer: str = Field(description="The customer name")
    total_items: int = Field(description="Total number of items ordered")
    total_amount: float = Field(description="Total order amount in dollars")


agent = create_agent(
    llm,
    tools=[],
    response_format=ProviderStrategy(OrderSummary),
)

result = agent.invoke({
    "messages": [{"role": "user", "content": "Summarize this order: Acme Corp bought 10 Widget A at $29.99 and 5 Widget B at $49.99"}]
})

print(result["structured_response"])
# OrderSummary(customer='Acme Corp', total_items=15, total_amount=549.85)

result["structured_response"] is a validated Pydantic instance - not a string. You access .customer, .total_items, .total_amount directly.
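The validation step itself is plain Pydantic. Independent of the agent, this is what schema enforcement buys you (OrderSummary is redeclared here so the snippet is self-contained):

```python
from pydantic import BaseModel, Field, ValidationError

class OrderSummary(BaseModel):
    """Summary of a customer order."""
    customer: str = Field(description="The customer name")
    total_items: int = Field(description="Total number of items ordered")
    total_amount: float = Field(description="Total order amount in dollars")

# Well-formed model output validates into a typed object...
summary = OrderSummary.model_validate(
    {"customer": "Acme Corp", "total_items": 15, "total_amount": 549.85}
)
print(summary.total_amount)  # 549.85

# ...while malformed output fails loudly instead of slipping through as text
try:
    OrderSummary.model_validate({"customer": "Acme Corp"})
except ValidationError as e:
    print("rejected:", len(e.errors()), "missing fields")
```

With ToolStrategy, a validation failure like the second case is what gets fed back to the model for a retry (see handle_errors below).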

ToolStrategy (fallback):

Not all models support native structured output. ToolStrategy is the fallback - it uses tool calling to generate structured responses. It works with any model that supports tool calling:

from langchain.agents.structured_output import ToolStrategy

agent = create_agent(
    llm,
    tools=[],
    response_format=ToolStrategy(OrderSummary),
)

  • ProviderStrategy - use when your model supports native structured output (OpenAI, Anthropic Claude, Gemini, xAI Grok). More reliable
  • ToolStrategy - use when your model only supports tool calling but not native structured output. Works as a fallback

Shortcut: Pass the schema directly without wrapping it in a strategy. LangChain automatically selects ProviderStrategy if the model supports it, falling back to ToolStrategy otherwise:

agent = create_agent(llm, response_format=OrderSummary)

Both strategies support multiple schema types:

  • Pydantic BaseModel - returns a validated Pydantic instance
  • dataclass - returns a dictionary
  • TypedDict - returns a dictionary
  • JSON Schema dict - returns a dictionary

ToolStrategy has a handle_errors parameter that controls what happens when the model returns invalid data:

# Default: catch all errors
ToolStrategy(schema=OrderSummary, handle_errors=True)

# Custom error message
ToolStrategy(schema=OrderSummary, handle_errors="Please provide a valid order summary with all required fields.")

# Custom error handler function
def custom_handler(error: Exception) -> str:
    return f"Error: {str(error)}. Please try again."

ToolStrategy(schema=OrderSummary, handle_errors=custom_handler)

# No error handling - exceptions propagate
ToolStrategy(schema=OrderSummary, handle_errors=False)

When errors are handled, the error message is sent back to the LLM so it can retry with corrected output.

SAP AI Core note: ProviderStrategy does not work with sap-ai-sdk-gen when streaming is enabled. The SAP AI SDK injects a deployment_id parameter into all API calls. ProviderStrategy uses the provider's beta.chat.completions.stream() endpoint, which rejects deployment_id, causing a TypeError. Use ToolStrategy instead - it goes through the standard API path, which handles deployment_id correctly. On SAP AI Core, always wrap your schema explicitly: response_format=ToolStrategy(OrderSummary).

Lab

Create src/first_react_agent/schemas.py. This defines the typed structure for our sales summary:

from pydantic import BaseModel, Field


class SalesSummary(BaseModel):
    """Summary statistics for a month of sales data."""
    month: str = Field(description="The month name")
    year: int = Field(description="The year")
    total_orders: int = Field(description="Total number of orders")
    total_revenue: float = Field(description="Total revenue in dollars")
    average_order_value: float = Field(description="Average revenue per order")
    top_customer: str = Field(description="Customer with the highest total revenue")


Update src/first_react_agent/agent.py to use it:

from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy

from first_react_agent.config import AICoreConfig
from first_react_agent.client import create_llm
from first_react_agent.schemas import SalesSummary
from first_react_agent.tools import get_monthly_sales, calculate_sales_summary


def create_app():
    config = AICoreConfig()
    llm = create_llm(config)

    agent = create_agent(
        llm,
        tools=[get_monthly_sales, calculate_sales_summary],
        system_prompt=(
            "You are a sales analyst assistant. You can only help with sales data queries. "
            "The sales data covers the year 2025. If the user does not specify a year, assume 2025. "
            "When a user asks about sales for a specific month, first use get_monthly_sales to fetch the raw data, "
            "then pass that data to calculate_sales_summary to compute the statistics. "
            "Always report the summary back to the user. "
            "If the user asks about anything other than sales data, politely let them know you can only help with sales queries."
        ),
        response_format=ToolStrategy(SalesSummary),
    )

    return agent

We use ToolStrategy(SalesSummary) explicitly instead of passing SalesSummary directly. As noted above, passing the schema directly auto-selects ProviderStrategy which does not work with SAP AI Core. ToolStrategy uses tool calling to enforce the schema, which works correctly.

Now when the agent responds, result["structured_response"] is a validated SalesSummary object. You can access result["structured_response"].total_orders, .total_revenue, etc.

Note: We added "politely let them know you can only help with sales queries" to the system prompt. This works when the agent generates free text (which we will do in Step 10 with streaming). However, with structured output enabled, the agent is forced to return a SalesSummary object for every response, even for off-topic questions. This is because create_agent internally sets tool_choice="any" when response_format is set, meaning the model must always produce a tool call (the structured schema), never plain text. So for now, off-topic questions will still get a SalesSummary response with placeholder values. We will revisit this tradeoff in Step 10.


Step 9: Memory - Wiring It All Up

Theory

Without memory, every call to the agent is independent - it forgets everything from previous turns. Ask about January sales, then ask "How does that compare to February?" and the agent has no idea what "that" refers to. Short-term memory fixes this by letting the agent remember the conversation within a session.

Memory requires two things:

  1. A checkpointer - persists the conversation state to storage
  2. A thread ID - identifies which conversation thread to load/save

InMemorySaver (development) - stores state in a Python dictionary. Fast, zero setup, but lost when the process stops:

from langchain.agents import create_agent
from langgraph.checkpoint.memory import InMemorySaver

agent = create_agent(
    llm,
    tools=[get_monthly_sales, calculate_sales_summary],
    checkpointer=InMemorySaver(),
)

Thread IDs - pass a thread_id in the config to identify the conversation:

config = {"configurable": {"thread_id": "1"}}

# First message
agent.invoke(
    {"messages": [{"role": "user", "content": "How were sales in January 2025?"}]},
    config,
)

# Second message - agent remembers the previous answer
agent.invoke(
    {"messages": [{"role": "user", "content": "How does that compare to February?"}]},
    config,
)

The agent remembers the January context because both calls use the same thread_id. Change the thread ID and it starts a fresh conversation.
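Conceptually, a checkpointer is a store keyed by thread ID. Here is a toy sketch of that idea (not InMemorySaver's real implementation):

```python
from collections import defaultdict

class ToyCheckpointer:
    """Toy sketch: conversation state keyed by thread_id."""
    def __init__(self) -> None:
        self._threads: dict[str, list[dict]] = defaultdict(list)

    def load(self, thread_id: str) -> list[dict]:
        return list(self._threads[thread_id])

    def save(self, thread_id: str, messages: list[dict]) -> None:
        self._threads[thread_id] = list(messages)

saver = ToyCheckpointer()

# Turn 1 on thread "1": prior history is empty
history = saver.load("1")
history += [{"role": "user", "content": "How were sales in January 2025?"},
            {"role": "assistant", "content": "January revenue was $1,799.46."}]
saver.save("1", history)

# Turn 2 on the same thread: the January exchange is loaded back in,
# so "that" in a follow-up question has something to refer to
print(len(saver.load("1")))   # 2
print(len(saver.load("99")))  # 0 - a different thread starts fresh
```

The agent does the load/save around every turn for you; your only job is to pass a stable thread_id per conversation.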

Production checkpointers - InMemorySaver is for development only. For production, you need a persistent checkpointer that survives process restarts.

In an SAP landscape, use langgraph-checkpoint-hana to persist conversation state in SAP HANA Cloud:

pip install langgraph-checkpoint-hana

from langgraph_checkpoint_hana import HANASaver

with HANASaver.from_conn_info(
    address="your-instance.hanacloud.ondemand.com",
    port=443,
    user="DBADMIN",
    password="your-password",
) as checkpointer:
    agent = create_agent(
        llm,
        tools=[get_monthly_sales, calculate_sales_summary],
        checkpointer=checkpointer,
    )

HANASaver also supports environment variables (HANASaver.from_env()) and existing hdbcli connections. It creates LANGGRAPH_CHECKPOINTS and LANGGRAPH_CHECKPOINT_WRITES tables automatically.

Managing long conversations - long conversations can exceed the model's context window. Use SummarizationMiddleware to replace older messages with a summary:

from langchain.agents.middleware import SummarizationMiddleware

agent = create_agent(
    llm,
    tools=[],
    middleware=[
        SummarizationMiddleware(
            model=llm,
            trigger=("tokens", 4000),
            keep=("messages", 20),
        )
    ],
    checkpointer=InMemorySaver(),
)

This kicks in when the conversation exceeds 4000 tokens, keeping the last 20 messages and summarizing the rest.
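The trimming behavior can be sketched without the middleware. This is an illustrative stand-in, with a word count as a crude token proxy and a stub string in place of the LLM-written summary:

```python
def summarize_if_needed(messages: list[str], token_budget: int, keep_last: int) -> list[str]:
    """Illustrative sketch: when the history exceeds the budget, collapse
    everything except the most recent messages into one summary message.
    (A stub string stands in for the LLM-generated summary.)"""
    token_count = sum(len(m.split()) for m in messages)  # crude token proxy
    if token_count <= token_budget or len(messages) <= keep_last:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary = f"[summary of {len(older)} earlier messages]"  # stub
    return [summary] + recent

history = [f"message {i} with some words" for i in range(30)]
trimmed = summarize_if_needed(history, token_budget=50, keep_last=20)
print(len(trimmed))  # 21: one summary message + the last 20 messages
```

The real middleware does the same shape of work, except the summary is written by the model you pass in, and the trimmed history is what gets persisted by the checkpointer.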

Lab

Time to wire everything together. We will add memory to the agent, create the entry point, register a CLI command, and run the whole thing.

1. Update src/first_react_agent/agent.py - add InMemorySaver:

from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy
from langgraph.checkpoint.memory import InMemorySaver

from first_react_agent.config import AICoreConfig
from first_react_agent.client import create_llm
from first_react_agent.schemas import SalesSummary
from first_react_agent.tools import get_monthly_sales, calculate_sales_summary


def create_app():
    config = AICoreConfig()
    llm = create_llm(config)

    agent = create_agent(
        llm,
        tools=[get_monthly_sales, calculate_sales_summary],
        system_prompt=(
            "You are a sales analyst assistant. "
            "The sales data covers the year 2025. If the user does not specify a year, assume 2025. "
            "When a user asks about sales for a specific month, first use get_monthly_sales to fetch the raw data, "
            "then pass that data to calculate_sales_summary to compute the statistics. "
            "Always report the summary back to the user. "
            "If the user asks something unrelated to sales data, politely decline and explain that you can only assist with sales queries."
        ),
        response_format=ToolStrategy(SalesSummary),
        checkpointer=InMemorySaver(),
    )

    return agent

The only change from Step 8 is checkpointer=InMemorySaver(). This tells the agent to save conversation state after every turn. We also imported InMemorySaver from langgraph.checkpoint.memory.

2. Create src/first_react_agent/main.py - the entry point:

from first_react_agent.agent import create_app


def main():
    agent = create_app()
    thread_config = {"configurable": {"thread_id": "1"}}

    print("Sales Agent ready. Ask about sales data. Type 'quit' to exit.\n")
    print("Try: 'How were sales in January 2025?' or 'Show me April 2025 sales'\n")

    while True:
        user_input = input("You: ").strip()
        if not user_input:
            continue
        if user_input.lower() in ("quit", "exit", "q"):
            print("Goodbye!")
            break

        result = agent.invoke(
            {"messages": [{"role": "user", "content": user_input}]},
            thread_config,
        )

        summary = result["structured_response"]
        print(f"\nAssistant: {summary}\n")


if __name__ == "__main__":
    main()

A few things to note:

  • agent.invoke() sends the user message and waits for the complete response. No streaming yet; we will get to that in Step 10.
  • thread_config passes the thread_id so the checkpointer knows which conversation to load/save. Every call with the same thread_id shares the same history. The agent can now understand follow-up questions like "How does that compare to February?" because it remembers the January context.
  • result["structured_response"] is the validated SalesSummary Pydantic object we defined in Step 8. Because we set response_format=ToolStrategy(SalesSummary), the agent always returns structured output in this key.
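
To make the thread_id mechanics concrete, here is a toy stand-in for a checkpointer. This is a pure-Python sketch, not LangGraph's InMemorySaver; it only shows the keying idea, that state is loaded and saved per thread_id, so two threads never see each other's history:

```python
# Toy checkpointer: conversation state keyed by thread_id.
class ToyCheckpointer:
    def __init__(self):
        self._store = {}

    def load(self, thread_id):
        return list(self._store.get(thread_id, []))

    def save(self, thread_id, messages):
        self._store[thread_id] = messages


saver = ToyCheckpointer()


def invoke(thread_id, user_message):
    # Load prior history, append the new turn, persist, return full history.
    history = saver.load(thread_id)
    history.append(user_message)
    saver.save(thread_id, history)
    return history


invoke("1", "How were sales in January?")
print(invoke("1", "How does that compare to February?"))  # both turns present
print(invoke("2", "Hello"))  # a new thread_id starts with an empty history
```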

3. Update pyproject.toml - register a CLI command so we can run the agent with uv run agent:

[project]
name = "first-react-agent"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "pydantic-settings>=2.13.1",
    "sap-ai-sdk-gen>=6.6.0",
]

[project.scripts]
agent = "first_react_agent.main:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

The [project.scripts] section maps the command agent to first_react_agent.main:main. When you run uv run agent, uv installs the project in development mode and calls the main() function.

4. Run it:

uv run agent

Try these prompts:

  • How were sales in January 2025? - agent fetches data, computes summary, returns structured output
  • Show me April 2025 sales - agent finds no orders, returns summary with zero values
  • How does that compare to February? - agent uses memory to understand "that" refers to the previous month

You will see output like this:

You: How were sales in January 2025?
Assistant: month='January' year=2025 total_orders=5 total_revenue=1799.46 average_order_value=359.89 top_customer='Global Ltd'

It works, but... that output is not exactly user-friendly. You get a raw Pydantic object dump - field names, values, no formatting. That is the tradeoff with structured output: your code gets clean, typed data to work with, but the raw representation is not meant for humans to read.

You could format the SalesSummary object yourself before printing (e.g., f"Month: {summary.month}, Revenue: ${summary.total_revenue:,.2f}"), and that is the right approach when you are building an API or a UI that consumes the structured data programmatically.
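
Such a formatter might look like the sketch below. The dataclass stands in for the Step 8 SalesSummary Pydantic model; the field names are taken from the example output above, so treat them as assumptions if your schemas.py differs:

```python
from dataclasses import dataclass


# Stand-in for the Step 8 SalesSummary schema (field names assumed).
@dataclass
class SalesSummary:
    month: str
    year: int
    total_orders: int
    total_revenue: float
    average_order_value: float
    top_customer: str


def format_summary(s: SalesSummary) -> str:
    """Render the typed summary as a human-friendly one-liner."""
    return (
        f"{s.month} {s.year}: {s.total_orders} orders, "
        f"${s.total_revenue:,.2f} revenue, "
        f"${s.average_order_value:,.2f} average order, "
        f"top customer {s.top_customer}"
    )


print(format_summary(
    SalesSummary("January", 2025, 5, 1799.46, 359.89, "Global Ltd")
))
# → January 2025: 5 orders, $1,799.46 revenue, $359.89 average order, top customer Global Ltd
```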

But what if you want the agent to speak naturally - streaming text word by word as it thinks through the answer, like a real conversation? That is what Step 10 is about.

Why can't we stream with structured output?

When response_format is set, create_agent internally sets tool_choice="any". This forces the model to always make a tool call, specifically a call to the structured output schema tool. The model never produces free text. Since streaming shows the model's text output token by token, and there is no text output (only tool call arguments), there is nothing to stream. AIMessageChunk.text is always empty.

Streaming and structured output are mutually exclusive. You pick one:

Mode | You Get | You Lose
Structured output (response_format) | Typed Pydantic objects, programmatic access | Real-time streaming, natural language responses
Streaming (no response_format) | Real-time token-by-token output, natural conversation | Typed schema enforcement

If your use case is an API that feeds data into a dashboard, structured output is the right choice. If your use case is a conversational terminal agent, streaming feels better. Step 10 shows you how to switch.


Step 10: Streaming - Real-Time Responses

Theory

Instead of waiting for the complete response, streaming gives you output as it is generated, word by word, in real time. This makes the agent feel responsive and conversational.

LangChain agents support three stream modes:

Mode | What It Streams
updates | State updates after each agent step
messages | Tuples of (token, metadata) from LLM invocations
custom | Custom data from inside graph nodes using the stream writer

Agent progress streaming - see each step the agent takes:

for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "Summarize January 2025 sales"}]},
    stream_mode="updates",
):
    # Each chunk is a dict mapping the node name to its state update.
    for step, data in chunk.items():
        print(f"Step: {step}")

Token streaming - get individual tokens as they are generated:

for token, metadata in agent.stream(
    {"messages": [{"role": "user", "content": "Summarize January 2025 sales"}]},
    stream_mode="messages",
):
    # Each item is a (token, metadata) tuple.
    print(f"Node: {metadata['langgraph_node']}")

Combining modes - pass multiple modes as a list:

for mode, chunk in agent.stream(
    {"messages": [{"role": "user", "content": "Summarize January 2025 sales"}]},
    stream_mode=["updates", "messages"],
):
    # With a list of modes, each item is a (mode, data) tuple.
    if mode == "updates":
        for step, data in chunk.items():
            print(f"Step: {step}")
    elif mode == "messages":
        token, metadata = chunk
        if isinstance(token, AIMessageChunk) and token.text:
            print(token.text, end="", flush=True)

Direct model streaming (without agent) - you can also stream directly from the chat model:

for chunk in llm.stream("List three strategies to increase Q2 sales"):
    print(chunk.text, end="|", flush=True)

Each chunk is an AIMessageChunk. Accumulate them to build the full response:

full = None
for chunk in llm.stream("What factors affect average order value?"):
    full = chunk if full is None else full + chunk
    print(full.text)
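
The accumulation works because AIMessageChunk overloads the + operator to concatenate content. A minimal stand-in class (not the real LangChain type) shows the idea:

```python
class Chunk:
    # Minimal stand-in: like AIMessageChunk, + concatenates the text.
    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        return Chunk(self.text + other.text)


full = None
for chunk in [Chunk("Average order value "), Chunk("depends on "), Chunk("mix and pricing.")]:
    full = chunk if full is None else full + chunk

print(full.text)
# → Average order value depends on mix and pricing.
```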

Filtering stream output - when using stream_mode="messages", the stream contains all message types: AIMessageChunk (model text and tool calls), ToolMessage (raw tool output), and others. For a clean chat experience, you only want the model's final text, not the raw CSV data from tools or the tool call metadata. Use isinstance(token, AIMessageChunk) and token.text to filter:

  • isinstance(token, AIMessageChunk) - only model output, skips ToolMessage and other types
  • token.text - only chunks with actual text content, skips tool call chunks (which have empty text but populated tool_calls)

Without this filter, you would see the raw CSV sales data dumped into the terminal alongside the agent's response.
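
The filter is easy to see in isolation with stand-in message classes (plain Python, not the real LangChain types): only AI chunks that actually carry text survive.

```python
# Stand-in message types mimicking the shapes described above.
class AIMessageChunk:
    def __init__(self, text="", tool_calls=None):
        self.text = text
        self.tool_calls = tool_calls or []


class ToolMessage:
    def __init__(self, content):
        self.content = content


def printable(token):
    # Keep only model output chunks that actually contain text.
    return isinstance(token, AIMessageChunk) and bool(token.text)


stream = [
    AIMessageChunk(tool_calls=[{"name": "get_monthly_sales"}]),  # tool call: empty text
    ToolMessage("date,customer,amount\n2025-01-03,Acme,129.99"),  # raw tool output
    AIMessageChunk(text="January sales "),
    AIMessageChunk(text="totaled $1,799.46."),
]
print("".join(t.text for t in stream if printable(t)))
# → January sales totaled $1,799.46.
```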

Lab

To enable streaming, we need two changes: remove response_format from the agent (so the model can generate free text), and switch from invoke() to stream() in the entry point.

1. Update src/first_react_agent/agent.py - remove response_format:

from langchain.agents import create_agent
from langgraph.checkpoint.memory import InMemorySaver

from first_react_agent.config import AICoreConfig
from first_react_agent.client import create_llm
from first_react_agent.tools import get_monthly_sales, calculate_sales_summary


def create_app():
    config = AICoreConfig()
    llm = create_llm(config)

    agent = create_agent(
        llm,
        tools=[get_monthly_sales, calculate_sales_summary],
        system_prompt=(
            "You are a sales analyst assistant. "
            "The sales data covers the year 2025. If the user does not specify a year, assume 2025. "
            "When a user asks about sales for a specific month, first use get_monthly_sales to fetch the raw data, "
            "then pass that data to calculate_sales_summary to compute the statistics. "
            "Always report the summary back to the user. "
            "If the user asks something unrelated to sales data, politely decline and explain that you can only assist with sales queries."
        ),
        checkpointer=InMemorySaver(),
    )

    return agent

We removed two things: the response_format=ToolStrategy(SalesSummary) parameter and the SalesSummary/ToolStrategy imports. Without response_format, the model is free to generate natural text, which is what we need for streaming.

Notice the system prompt still includes "politely decline and explain that you can only assist with sales queries." Without structured output forcing tool_choice="any", this instruction now actually works: the agent can respond with plain text to decline off-topic questions.

2. Update src/first_react_agent/main.py - switch to streaming:

from langchain_core.messages import AIMessageChunk

from first_react_agent.agent import create_app


def main():
    agent = create_app()
    thread_config = {"configurable": {"thread_id": "1"}}

    print("Sales Agent ready. Ask about sales data. Type 'quit' to exit.\n")
    print("Try: 'How were sales in January 2025?' or 'Show me April 2025 sales'\n")

    while True:
        user_input = input("You: ").strip()
        if not user_input:
            continue
        if user_input.lower() in ("quit", "exit", "q"):
            print("Goodbye!")
            break

        print("\nAssistant: ", end="")

        for token, metadata in agent.stream(
            {"messages": [{"role": "user", "content": user_input}]},
            thread_config,
            stream_mode="messages",
        ):
            if isinstance(token, AIMessageChunk) and token.text:
                print(token.text, end="", flush=True)

        print("\n")


if __name__ == "__main__":
    main()

The key changes from Step 9's main.py:

  • agent.stream() instead of agent.invoke() - returns chunks as they are generated instead of waiting for the complete response
  • stream_mode="messages" - gives us token-by-token output as (token, metadata) tuples
  • isinstance(token, AIMessageChunk) and token.text - filters to only show the model's text output. Without this, you would also see raw tool results (CSV data) and tool call metadata in the terminal
  • print(token.text, end="", flush=True) - prints each token immediately without a newline, and flush=True forces it to appear in the terminal without buffering
  • No more result["structured_response"] - since we removed response_format, there is no structured response. The agent speaks in natural language.

3. Run it:

uv run agent

Now the agent streams each word as it is generated. You see the response build up in real time - much more natural than the Pydantic dump from Step 9. And because we kept the checkpointer and thread_config, memory still works:

[Screenshot: streaming demo]

The agent remembered that "that" refers to January - no need to repeat the context.


Final Project Structure

first-react-agent/
+-- .env                          # SAP AI Core credentials (never commit)
+-- .gitignore
+-- .python-version
+-- pyproject.toml                # Dependencies, scripts, build config
+-- uv.lock                       # Pinned dependency versions
+-- sales.csv                     # Sales data
+-- README.md
+-- src/
    +-- first_react_agent/
        +-- __init__.py
        +-- config.py             # Loads and validates .env credentials (from Blog 1)
        +-- client.py             # Creates authenticated LLM connection (from Blog 1)
        +-- schemas.py            # Pydantic models for structured output
        +-- tools.py              # Sales data tools (fetch + summarize)
        +-- agent.py              # Agent creation and configuration
        +-- main.py               # Entry point - chat loop

 


What's Next?

Blog 3: Build a Chat App for Your Sales Agent with SAPUI5, FastAPI & Real-Time Streaming