A sales order analysis agent. You ask it about your sales data - "How were sales in January?" - and it fetches the data, calculates summaries (total orders, revenue, top customer), and responds with a typed, structured answer. It streams the response in real time and remembers your conversation.
We'll build this incrementally. Each section teaches a concept (theory), then you apply it to the sales app (lab).
In Blog 1, you built a terminal app that sends a message to an LLM and prints the response. That was a one-shot interaction - no memory, no tools, no structure. This blog takes you from that starting point to a full agent.
sap-ai-sdk-gen already includes LangChain, so everything in this blog works out of the box. The setup follows the same pattern as Blog 1 - if any of this is unfamiliar, revisit Blog 1 first.
uv init first-react-agent --package --build-backend hatchling
cd first-react-agent
uv add sap-ai-sdk-gen pydantic-settings

Same dependencies as Blog 1. sap-ai-sdk-gen already includes LangChain - no extra installs needed.
uv init --package creates src/first_react_agent/__init__.py with a default hello world function. Clear it out - we don't need it:
echo "" > src/first_react_agent/__init__.pyCopy config.py and client.py from Blog 1 into src/first_react_agent/. These two files handle authentication and LLM creation - their job hasn't changed.
src/first_react_agent/config.py:
from pydantic_settings import BaseSettings, SettingsConfigDict
class AICoreConfig(BaseSettings):
model_config = SettingsConfigDict(
env_file=".env",
env_file_encoding="utf-8",
)
aicore_auth_url: str
aicore_client_id: str
aicore_client_secret: str
aicore_resource_group: str = "default"
aicore_base_url: str
aicore_model: str
    llm_max_output_tokens: int = 4096

src/first_react_agent/client.py:
from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client
from gen_ai_hub.proxy.langchain.init_models import init_llm
from langchain_core.language_models import BaseChatModel
from first_react_agent.config import AICoreConfig
def create_llm(config: AICoreConfig) -> BaseChatModel:
proxy_client = get_proxy_client(
proxy_version="gen-ai-hub",
base_url=config.aicore_base_url,
auth_url=config.aicore_auth_url,
client_id=config.aicore_client_id,
client_secret=config.aicore_client_secret,
resource_group=config.aicore_resource_group,
)
return init_llm(
model_name=config.aicore_model,
proxy_client=proxy_client,
max_tokens=config.llm_max_output_tokens,
    )

create_llm returns a BaseChatModel - LangChain's standard interface for chat models. Everything we build in this blog - messages, tools, agents, streaming, structured output - works through this interface.
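If you want to confirm the wiring before going further, a quick throwaway check works - a minimal sketch, assuming your .env from Blog 1 is in place:

# scratch.py - throwaway smoke test, not part of the final app
from first_react_agent.config import AICoreConfig
from first_react_agent.client import create_llm

llm = create_llm(AICoreConfig())
print(llm.invoke("Reply with one word: pong").text)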
Also copy your .env file from Blog 1 into the project root, and add it to .gitignore:
echo ".env" >> .gitignoreCreate sales.csv in the project root. This is a simple dataset to prove the point - in a real application, your tools would connect to a database, an API, or any other data source. The agent pattern stays the same regardless of where the data comes from.
This is the data our agent will analyze:
order_id,date,customer,product,quantity,unit_price
SO-001,2025-01-05,Acme Corp,Widget A,10,29.99
SO-002,2025-01-12,TechStart Inc,Widget B,5,49.99
SO-003,2025-01-20,Global Ltd,Widget A,15,29.99
SO-004,2025-01-25,Acme Corp,Widget C,8,19.99
SO-005,2025-01-28,Global Ltd,Widget B,16,39.99
SO-006,2025-02-03,TechStart Inc,Widget A,20,29.99
SO-007,2025-02-14,Global Ltd,Widget B,3,49.99
SO-008,2025-02-22,Acme Corp,Widget B,12,49.99
SO-009,2025-03-01,TechStart Inc,Widget C,25,19.99
SO-010,2025-03-10,Global Ltd,Widget A,7,29.99
SO-011,2025-03-18,Acme Corp,Widget A,30,29.99
SO-012,2025-03-25,TechStart Inc,Widget B,10,49.99

Three months of data (January-March 2025). No data for April onward - this will test the "no orders found" path.
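To double-check the numbers the agent should report later, a short throwaway script can total the file per month (a sketch, assuming sales.csv sits in the current directory):

# check_sales.py - throwaway verification script
import csv
from collections import defaultdict
from datetime import datetime

totals: defaultdict[str, float] = defaultdict(float)
counts: defaultdict[str, int] = defaultdict(int)

with open("sales.csv") as f:
    for row in csv.DictReader(f):
        month = datetime.strptime(row["date"], "%Y-%m-%d").strftime("%B")
        totals[month] += int(row["quantity"]) * float(row["unit_price"])
        counts[month] += 1

for month in totals:
    print(f"{month}: {counts[month]} orders, ${totals[month]:,.2f}")
# January: 5 orders, $1,799.46
# February: 3 orders, $1,349.65
# March: 4 orders, $2,109.28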
In Blog 1, we used HumanMessage to send a question to the LLM. But LLMs understand four types of messages, each with a different role in the conversation:
| Type | Role | Purpose |
|---|---|---|
| SystemMessage | system | Tells the model how to behave and provides context for interactions |
| HumanMessage | user | Represents user input and interactions with the model |
| AIMessage | assistant | Responses generated by the model, including text content, tool calls, and metadata |
| ToolMessage | tool | Represents the outputs of tool calls |
Note: You will see the term "system prompt" later in this blog (Step 6). A system prompt and a SystemMessage are the same thing. When you pass system_prompt="You are a sales analyst" to create_agent, it becomes a SystemMessage internally. Different name, same concept.
LangChain accepts messages in three formats. Here is the same conversation in each:
1. Message objects (explicit):
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
conversation = [
SystemMessage("You are a sales analyst. Answer questions about order data concisely."),
HumanMessage("What was our best selling month?"),
AIMessage("Based on the data, March had the highest revenue at $2,109.28."),
HumanMessage("Which customer drove most of that?"),
]
response = llm.invoke(conversation)

Each message is a typed Python object. You can see exactly what role each message plays.
2. Dictionary format (OpenAI-compatible):
conversation = [
{"role": "system", "content": "You are a sales analyst. Answer questions about order data concisely."},
{"role": "user", "content": "What was our best selling month?"},
{"role": "assistant", "content": "Based on the data, March had the highest revenue at $2,109.28."},
{"role": "user", "content": "Which customer drove most of that?"},
]
response = llm.invoke(conversation)

Same conversation, but using dictionaries with role and content keys.
3. String shortcut (simplest):
response = llm.invoke("What is the average order value for Q1?")A plain string is automatically wrapped in a HumanMessage.
When the model responds, you get an AIMessage with these key attributes:
| Attribute | What It Contains |
|---|---|
| text | The text content of the response |
| content | Raw content (string or list of dicts) |
| tool_calls | List of tool calls the model wants to make (empty if none) |
| usage_metadata | Token counts: input, output, total |
| response_metadata | Provider-specific response data |
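A minimal sketch to see those attributes in practice (assuming the llm from create_llm):

response = llm.invoke("What was our best selling month?")

print(response.text)            # the answer as plain text
print(response.tool_calls)      # [] - no tools bound yet, so always empty here
print(response.usage_metadata)  # {'input_tokens': ..., 'output_tokens': ..., 'total_tokens': ...}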
ToolMessage is how you send tool results back to the model. It has three required fields: content (the tool's output as a string), tool_call_id (must match the ID from the model's tool call), and name (the tool that was called):
from langchain_core.messages import ToolMessage
tool_message = ToolMessage(
content="Total Orders: 4, Total Revenue: $1,159.62",
tool_call_id="call_456",
name="calculate_sales_summary",
)

No code to write yet. The sales app will use dictionary format for user messages (Step 8) and message objects internally. You now understand the building blocks.
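Before moving on, here is a sketch of how these message types snap together in one tool round trip. It assumes a model with a tool bound via bind_tools (tools are introduced in the next section) and the conversation list from earlier:

# Sketch: manually closing the tool loop that create_agent automates later.
llm_with_tools = llm.bind_tools([calculate_sales_summary])  # tool defined later in this blog

ai_msg = llm_with_tools.invoke(conversation)  # AIMessage with tool_calls populated
call = ai_msg.tool_calls[0]                   # {"name": ..., "args": ..., "id": ...}

tool_msg = ToolMessage(
    content="Total Orders: 4, Total Revenue: $1,159.62",
    tool_call_id=call["id"],                  # must echo the model's call ID
    name=call["name"],
)

# Append both messages and invoke again - the model now answers using the result.
final = llm_with_tools.invoke(conversation + [ai_msg, tool_msg])
print(final.text)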
An LLM by itself can only generate text. It cannot query a database, read a CSV file, or call an API. Tools are Python functions that extend what an LLM can do. You define a function, and the LLM decides when to call it based on the user's question.
The simplest way to create a tool is with the @tool decorator:
from langchain.tools import tool
@tool
def lookup_customer(customer_id: str, include_history: bool = False) -> str:
"""Look up a customer record by their ID.
Args:
customer_id: The unique customer identifier
include_history: Whether to include order history
"""
return f"Customer {customer_id}: Acme Corp, active since 2023"Three things matter:
customer_id must be a string and include_history must be a booleanYou can customize the tool name and description:
@tool("order_lookup")
def find_order(order_id: str) -> str:
"""Find a sales order by its ID."""
return f"Order {order_id}: 10x Widget A, $299.90"
@tool("revenue_report", description="Generate a revenue report for a given time period.")
def report(period: str) -> str:
"""Generate revenue report."""
return f"Revenue for {period}: $4,618.55"Use snake_case for tool names - avoid spaces or special characters for provider compatibility.
For tools with complex inputs, define a Pydantic model as the input schema:
from pydantic import BaseModel, Field
from typing import Literal
class OrderFilter(BaseModel):
"""Filters for querying sales orders."""
customer: str = Field(description="Customer name to filter by")
status: Literal["open", "closed", "cancelled"] = Field(
default="open",
description="Order status filter",
)
@tool(args_schema=OrderFilter)
def filter_orders(customer: str, status: str = "open") -> str:
"""Filter sales orders by customer and status."""
return f"Found 3 {status} orders for {customer}"Field(description=...) tells the LLM what each parameter means. Literal restricts values to a fixed set of choices.
Two parameter names are reserved and cannot be used as tool arguments: config (reserved for RunnableConfig) and runtime (reserved for ToolRuntime).
Tool error handling - when tools fail, you want the error to go back to the LLM so it can retry, not crash your app. Configure via ToolNode:
from langgraph.prebuilt import ToolNode
# Default: catch invocation errors, re-raise execution errors
tool_node = ToolNode(tools)
# Catch all errors and return error message to LLM
tool_node = ToolNode(tools, handle_tool_errors=True)
# Custom error message
tool_node = ToolNode(tools, handle_tool_errors="Something went wrong, please try again.")
# Custom error handler function
def handle_error(e: ValueError) -> str:
return f"Invalid input: {e}"
tool_node = ToolNode(tools, handle_tool_errors=handle_error)
# Only catch specific exception types
tool_node = ToolNode(tools, handle_tool_errors=(ValueError, TypeError))

When handle_tool_errors=True, the error message is sent back to the LLM as a ToolMessage instead of crashing. The LLM sees what went wrong and can try again with different inputs.
Create src/first_react_agent/tools.py. These are the two tools our sales agent will use. The key design: Tool 1's output is Tool 2's input - the agent chains them.
import csv
import io
from datetime import datetime
from pathlib import Path
from langchain.tools import tool
@tool
def get_monthly_sales(month: str, year: int) -> str:
"""Load all sales orders for a given month and year from the sales database.
Args:
month: Full month name (e.g., 'January', 'February')
year: The year (e.g., 2025)
"""
csv_path = Path(__file__).parent.parent.parent / "sales.csv"
with open(csv_path) as f:
reader = csv.DictReader(f)
rows = [
row for row in reader
if datetime.strptime(row["date"], "%Y-%m-%d").strftime("%B") == month
and datetime.strptime(row["date"], "%Y-%m-%d").year == year
]
if not rows:
return f"No orders found for {month} {year}."
header = "order_id,date,customer,product,quantity,unit_price"
lines = [header]
for r in rows:
lines.append(
f"{r['order_id']},{r['date']},{r['customer']},{r['product']},{r['quantity']},{r['unit_price']}"
)
return "\n".join(lines)
@tool
def calculate_sales_summary(sales_data: str) -> str:
"""Calculate summary statistics from sales data. Use this after getting sales data with get_monthly_sales.
Args:
sales_data: CSV-formatted sales data with columns: order_id, date, customer, product, quantity, unit_price
"""
reader = csv.DictReader(io.StringIO(sales_data))
rows = list(reader)
if not rows:
return "No data to summarize."
total_orders = len(rows)
total_revenue = sum(int(r["quantity"]) * float(r["unit_price"]) for r in rows)
avg_order_value = total_revenue / total_orders
customer_revenue: dict[str, float] = {}
for r in rows:
revenue = int(r["quantity"]) * float(r["unit_price"])
customer_revenue[r["customer"]] = customer_revenue.get(r["customer"], 0) + revenue
top_customer = max(customer_revenue, key=customer_revenue.get)
return (
f"Total Orders: {total_orders}\n"
f"Total Revenue: ${total_revenue:,.2f}\n"
f"Average Order Value: ${avg_order_value:,.2f}\n"
f"Top Customer: {top_customer} (${customer_revenue[top_customer]:,.2f})"
    )

How the chaining works:
get_monthly_sales("January", 2025) - returns CSV rowscalculate_sales_summary(raw_csv_data) - returns computed statsIf get_monthly_sales returns "No orders found for April 2025", the agent sees that and responds directly - no need to call calculate_sales_summary.
Notice how calculate_sales_summary's docstring says "Use this after getting sales data with get_monthly_sales." This guides the LLM to chain the tools in the right order.
Path(__file__).parent.parent.parent navigates from tools.py -> first_react_agent/ -> src/ -> project root where sales.csv lives.
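Because both tools are runnable on their own, you can verify the chain by hand before wiring up the agent - a quick check from a scratch script or REPL:

from first_react_agent.tools import get_monthly_sales, calculate_sales_summary

# Tool 1: fetch the raw January rows
raw = get_monthly_sales.invoke({"month": "January", "year": 2025})

# Tool 2: feed Tool 1's output straight in - exactly the hand-off the agent will make
print(calculate_sales_summary.invoke({"sales_data": raw}))
# Total Orders: 5
# Total Revenue: $1,799.46
# Average Order Value: $359.89
# Top Customer: Global Ltd ($1,089.69)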
A system prompt tells the model how to behave. There are three ways to set it, from simple to advanced.
1. String (simplest):
from langchain.agents import create_agent
agent = create_agent(
model,
tools,
system_prompt="You are a helpful assistant. Be concise and accurate.",
)

A plain string. When omitted, the agent infers its task from the messages directly.
2. SystemMessage (advanced features):
For models that support it, SystemMessage allows structured content with features like cache control:
from langchain.agents import create_agent
from langchain_core.messages import SystemMessage, HumanMessage
product_agent = create_agent(
model,
tools=[],
system_prompt=SystemMessage(
content=[
{
"type": "text",
"text": "You are an assistant that answers questions about our product catalog.",
},
{
"type": "text",
"text": "<entire product catalog CSV - thousands of rows>",
"cache_control": {"type": "ephemeral"},
},
]
),
)
result = product_agent.invoke(
{"messages": [HumanMessage("Which products have the highest margin?")]}
)

The cache_control field with {"type": "ephemeral"} tells Anthropic to cache that content block, reducing latency and costs on repeated calls.
3. Dynamic system prompt with @dynamic_prompt:
Sometimes the system prompt needs to change based on who is using the agent or what state the conversation is in. The @dynamic_prompt decorator generates the prompt at runtime:
from typing import TypedDict
from langchain.agents import create_agent
from langchain.agents.middleware import dynamic_prompt, ModelRequest
class Context(TypedDict):
user_role: str
@dynamic_prompt
def user_role_prompt(request: ModelRequest) -> str:
"""Generate system prompt based on user role."""
user_role = request.runtime.context.get("user_role", "user")
base_prompt = "You are a helpful assistant."
if user_role == "expert":
return f"{base_prompt} Provide detailed technical responses."
elif user_role == "beginner":
return f"{base_prompt} Explain concepts simply and avoid jargon."
return base_prompt
agent = create_agent(
model,
tools=[get_monthly_sales, calculate_sales_summary],
middleware=[user_role_prompt],
context_schema=Context,
)
result = agent.invoke(
{"messages": [{"role": "user", "content": "Show me this quarter's sales breakdown"}]},
context={"user_role": "expert"},
)The @Dynamic_prompt function receives a ModelRequest with access to runtime.context - immutable configuration you pass at invocation time. The context_schema parameter defines what shape that context takes.
Our sales agent uses a string system prompt. We will wire it in the next step when we create the agent.
Think of an agent as an LLM that can do things, not just talk. Instead of only generating text, an agent can call tools (Python functions you define), look at the results, and decide what to do next - all on its own. It keeps going until it has enough information to give you a proper answer.
create_agent provides a production-ready agent implementation.
An LLM agent runs tools in a loop to achieve a goal, and keeps going until a stop condition is met - the model emits a final output or an iteration limit is reached.

[Diagram: the ReAct loop]
Walking through the diagram:

1. The model receives the conversation and reasons about what to do next.
2. If it decides on a tool call, the tool executes and its output is appended to the conversation as a ToolMessage.
3. The model runs again with the updated conversation and observes the result.
4. The loop repeats until the model answers with plain text instead of a tool call (or an iteration limit is reached).
This is the ReAct pattern (Reasoning + Acting) - the model reasons about each step, acts by calling tools, observes the results, and repeats until done.
create_agent Works Under the Hood

create_agent builds a graph-based agent runtime using LangGraph. A graph consists of nodes (steps) and edges (connections) that define how your agent processes information. The agent moves through this graph, executing nodes like the model node (which calls the model), the tools node (which executes tools), or middleware.
You do not need to understand LangGraph to use create_agent - it handles the graph construction for you. We will explore the Graph API in depth in Blog 4.
create_agent Parameters

create_agent brings together a model, tools, and a system prompt into a single callable agent:
| Parameter | What It Does |
|---|---|
| model | The LLM - a model instance (our BaseChatModel from init_llm) or a string identifier |
| tools | List of tool functions decorated with @tool |
| system_prompt | Optional string or SystemMessage directing agent behavior |
| response_format | Optional structured output configuration (Step 8) |
| middleware | Optional list of middleware for customizing execution |
| checkpointer | Optional checkpointer for conversation memory (Step 9) |
You invoke the agent by passing messages:
result = agent.invoke(
{"messages": [{"role": "user", "content": "Summarize our January 2025 sales"}]}
)

You can use message objects instead of dicts:
from langchain_core.messages import HumanMessage
result = agent.invoke(
{"messages": [HumanMessage("Summarize our January 2025 sales")]}
)

Create src/first_react_agent/agent.py:
from langchain.agents import create_agent
from first_react_agent.config import AICoreConfig
from first_react_agent.client import create_llm
from first_react_agent.tools import get_monthly_sales, calculate_sales_summary
def create_app():
config = AICoreConfig()
llm = create_llm(config)
agent = create_agent(
llm,
tools=[get_monthly_sales, calculate_sales_summary],
system_prompt=(
"You are a sales analyst assistant. You can only help with sales data queries. "
"The sales data covers the year 2025. If the user does not specify a year, assume 2025. "
"When a user asks about sales for a specific month, first use get_monthly_sales to fetch the raw data, "
"then pass that data to calculate_sales_summary to compute the statistics. "
"Always report the summary back to the user. "
"If the user asks about anything other than sales data, politely let them know you can only help with sales queries."
),
)
    return agent

The system prompt does three important things:

- Scopes the agent to sales data queries and tells it to politely decline anything else.
- Sets the default year to 2025 when the user does not specify one.
- Instructs the agent to chain the tools in the right order and always report the summary back.
The agent follows the ReAct loop:

1. Call get_monthly_sales("January", 2025) to fetch the raw data.
2. Pass the result to calculate_sales_summary(raw_data) to compute the stats.
3. Report the summary back to the user.

If the user asks about a month with no data (e.g., April), get_monthly_sales returns "No orders found for April 2025." The agent sees this and responds directly without calling calculate_sales_summary.
By default, LLMs return free text. But what if you need the response in a specific format - a Python object with typed fields you can use in your code? That is structured output.
create_agent accepts a response_format parameter that constrains the agent's final response to match a schema you define. The structured response is returned in the structured_response key of the result.
ProviderStrategy (preferred):
ProviderStrategy uses the model provider's native structured output generation. This is more reliable because the provider enforces the schema during generation. Supported by OpenAI, Anthropic Claude, Gemini, and xAI Grok.
from pydantic import BaseModel, Field
from langchain.agents import create_agent
from langchain.agents.structured_output import ProviderStrategy
class OrderSummary(BaseModel):
"""Summary of a customer order."""
customer: str = Field(description="The customer name")
total_items: int = Field(description="Total number of items ordered")
total_amount: float = Field(description="Total order amount in dollars")
agent = create_agent(
llm,
tools=[],
response_format=ProviderStrategy(OrderSummary),
)
result = agent.invoke({
"messages": [{"role": "user", "content": "Summarize this order: Acme Corp bought 10 Widget A at $29.99 and 5 Widget B at $49.99"}]
})
print(result["structured_response"])
# OrderSummary(customer='Acme Corp', total_items=15, total_amount=549.85)

result["structured_response"] is a validated Pydantic instance - not a string. You access .customer, .total_items, .total_amount directly.
ToolStrategy (fallback):
Not all models support native structured output. ToolStrategy is the fallback - it uses tool calling to generate structured responses. It works with any model that supports tool calling:
from langchain.agents.structured_output import ToolStrategy
agent = create_agent(
llm,
tools=[],
response_format=ToolStrategy(OrderSummary),
)
| Strategy | When to Use |
|---|---|
| ProviderStrategy | Your model supports native structured output (OpenAI, Anthropic Claude, Gemini, xAI Grok). More reliable |
| ToolStrategy | Your model only supports tool calling, not native structured output. Works as a fallback |
Shortcut: Pass the schema directly without wrapping it in a strategy. LangChain automatically selects ProviderStrategy if the model supports it, falling back to ToolStrategy otherwise:
agent = create_agent(llm, response_format=OrderSummary)

Both strategies support multiple schema types:
| Schema Type | Returns |
|---|---|
| Pydantic BaseModel | Validated Pydantic instance |
| dataclass | Dictionary |
| TypedDict | Dictionary |
| JSON Schema dict | Dictionary |
ToolStrategy has a handle_errors parameter that controls what happens when the model returns invalid data:
# Default: catch all errors
ToolStrategy(schema=OrderSummary, handle_errors=True)
# Custom error message
ToolStrategy(schema=OrderSummary, handle_errors="Please provide a valid order summary with all required fields.")
# Custom error handler function
def custom_handler(error: Exception) -> str:
return f"Error: {str(error)}. Please try again."
ToolStrategy(schema=OrderSummary, handle_errors=custom_handler)
# No error handling - exceptions propagate
ToolStrategy(schema=OrderSummary, handle_errors=False)

When errors are handled, the error message is sent back to the LLM so it can retry with corrected output.
SAP AI Core note: ProviderStrategy does not work with sap-ai-sdk-gen when streaming is enabled. The SAP AI SDK injects a deployment_id parameter into all API calls. ProviderStrategy uses the provider's beta.chat.completions.stream() endpoint which rejects deployment_id, causing a TypeError. Use ToolStrategy instead - it goes through the standard API path which handles deployment_id correctly. On SAP AI Core, always wrap your schema explicitly: response_format=ToolStrategy(OrderSummary).
Create src/first_react_agent/schemas.py. This defines the typed structure for our sales summary:
from pydantic import BaseModel, Field
class SalesSummary(BaseModel):
"""Summary statistics for a month of sales data."""
month: str = Field(description="The month name")
year: int = Field(description="The year")
total_orders: int = Field(description="Total number of orders")
total_revenue: float = Field(description="Total revenue in dollars")
average_order_value: float = Field(description="Average revenue per order")
top_customer: str = Field(description="Customer with the highest total revenue")
Update src/first_react_agent/agent.py to use it:
from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy
from first_react_agent.config import AICoreConfig
from first_react_agent.client import create_llm
from first_react_agent.schemas import SalesSummary
from first_react_agent.tools import get_monthly_sales, calculate_sales_summary
def create_app():
config = AICoreConfig()
llm = create_llm(config)
agent = create_agent(
llm,
tools=[get_monthly_sales, calculate_sales_summary],
system_prompt=(
"You are a sales analyst assistant. You can only help with sales data queries. "
"The sales data covers the year 2025. If the user does not specify a year, assume 2025. "
"When a user asks about sales for a specific month, first use get_monthly_sales to fetch the raw data, "
"then pass that data to calculate_sales_summary to compute the statistics. "
"Always report the summary back to the user. "
"If the user asks about anything other than sales data, politely let them know you can only help with sales queries."
),
response_format=ToolStrategy(SalesSummary),
)
    return agent

We use ToolStrategy(SalesSummary) explicitly instead of passing SalesSummary directly. As noted above, passing the schema directly auto-selects ProviderStrategy, which does not work with SAP AI Core. ToolStrategy uses tool calling to enforce the schema, which works correctly.
Now when the agent responds, result["structured_response"] is a validated SalesSummary object. You can access result["structured_response"].total_orders, .total_revenue, etc.
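Because the response is typed, downstream code can read the fields directly - for example:

result = agent.invoke(
    {"messages": [{"role": "user", "content": "How were sales in January 2025?"}]}
)

summary = result["structured_response"]  # a validated SalesSummary instance
print(f"{summary.month} {summary.year}: {summary.total_orders} orders")
print(f"Revenue: ${summary.total_revenue:,.2f} (avg ${summary.average_order_value:,.2f})")
print(f"Top customer: {summary.top_customer}")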
Note: We added "politely let them know you can only help with sales queries" to the system prompt. This works when the agent generates free text (which we will do in Step 10 with streaming). However, with structured output enabled, the agent is forced to return a SalesSummary object for every response, even for off-topic questions. This is because create_agent internally sets tool_choice="any" when response_format is set, meaning the model must always produce a tool call (the structured schema), never plain text. So for now, off-topic questions will still get a SalesSummary response with placeholder values. We will revisit this tradeoff in Step 10.
Without memory, every call to the agent is independent - it forgets everything from previous turns. Ask about January sales, then ask "How does that compare to February?" and the agent has no idea what "that" refers to. Short-term memory fixes this by letting the agent remember the conversation within a session.
Memory requires two things: a checkpointer that saves conversation state after each turn, and a thread ID that tells the checkpointer which conversation to load.
InMemorySaver (development) - stores state in a Python dictionary. Fast, zero setup, but lost when the process stops:
from langchain.agents import create_agent
from langgraph.checkpoint.memory import InMemorySaver
agent = create_agent(
llm,
tools=[get_monthly_sales, calculate_sales_summary],
checkpointer=InMemorySaver(),
)

Thread IDs - pass a thread_id in the config to identify the conversation:
config = {"configurable": {"thread_id": "1"}}
# First message
agent.invoke(
{"messages": [{"role": "user", "content": "How were sales in January 2025?"}]},
config,
)
# Second message - agent remembers the previous answer
agent.invoke(
{"messages": [{"role": "user", "content": "How does that compare to February?"}]},
config,
)

The agent remembers the January context because both calls use the same thread_id. Change the thread ID and it starts a fresh conversation.
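Thread isolation is easy to see with a second thread ID - a small sketch:

# Same agent, different thread: the history starts empty.
other_config = {"configurable": {"thread_id": "2"}}

agent.invoke(
    {"messages": [{"role": "user", "content": "How does that compare to February?"}]},
    other_config,
)
# On thread "2" there is no January context, so "that" has no referent -
# the agent will likely ask for clarification instead of comparing months.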
Production checkpointers - InMemorySaver is for development only. For production, you need a persistent checkpointer that survives process restarts.
In an SAP landscape, use langgraph-checkpoint-hana to persist conversation state in SAP HANA Cloud:
pip install langgraph-checkpoint-hana

from langgraph_checkpoint_hana import HANASaver
with HANASaver.from_conn_info(
address="your-instance.hanacloud.ondemand.com",
port=443,
user="DBADMIN",
password="your-password",
) as checkpointer:
agent = create_agent(
llm,
tools=[get_monthly_sales, calculate_sales_summary],
checkpointer=checkpointer,
    )

HANASaver also supports environment variables (HANASaver.from_env()) and existing hdbcli connections. It creates LANGGRAPH_CHECKPOINTS and LANGGRAPH_CHECKPOINT_WRITES tables automatically.
Managing long conversations - long conversations can exceed the model's context window. Use SummarizationMiddleware to replace older messages with a summary:
from langchain.agents.middleware import SummarizationMiddleware
agent = create_agent(
llm,
tools=[],
middleware=[
SummarizationMiddleware(
model=llm,
trigger=("tokens", 4000),
keep=("messages", 20),
)
],
    checkpointer=InMemorySaver(),
)

This kicks in when the conversation exceeds 4000 tokens, keeping the last 20 messages and summarizing the rest.
Time to wire everything together. We will add memory to the agent, create the entry point, register a CLI command, and run the whole thing.
1. Update src/first_react_agent/agent.py - add InMemorySaver:
from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy
from langgraph.checkpoint.memory import InMemorySaver
from first_react_agent.config import AICoreConfig
from first_react_agent.client import create_llm
from first_react_agent.schemas import SalesSummary
from first_react_agent.tools import get_monthly_sales, calculate_sales_summary
def create_app():
config = AICoreConfig()
llm = create_llm(config)
agent = create_agent(
llm,
tools=[get_monthly_sales, calculate_sales_summary],
system_prompt=(
"You are a sales analyst assistant. "
"The sales data covers the year 2025. If the user does not specify a year, assume 2025. "
"When a user asks about sales for a specific month, first use get_monthly_sales to fetch the raw data, "
"then pass that data to calculate_sales_summary to compute the statistics. "
"Always report the summary back to the user. "
"If the user asks something unrelated to sales data, politely decline and explain that you can only assist with sales queries."
),
response_format=ToolStrategy(SalesSummary),
checkpointer=InMemorySaver(),
)
    return agent

The only change from Step 8 is checkpointer=InMemorySaver(). This tells the agent to save conversation state after every turn. We also imported InMemorySaver from langgraph.checkpoint.memory.
2. Create src/first_react_agent/main.py - the entry point:
from first_react_agent.agent import create_app
def main():
agent = create_app()
thread_config = {"configurable": {"thread_id": "1"}}
print("Sales Agent ready. Ask about sales data. Type 'quit' to exit.\n")
print("Try: 'How were sales in January 2025?' or 'Show me April 2025 sales'\n")
while True:
user_input = input("You: ").strip()
if not user_input:
continue
if user_input.lower() in ("quit", "exit", "q"):
print("Goodbye!")
break
result = agent.invoke(
{"messages": [{"role": "user", "content": user_input}]},
thread_config,
)
summary = result["structured_response"]
print(f"\nAssistant: {summary}\n")
if __name__ == "__main__":
    main()

A few things to note:
- agent.invoke() sends the user message and waits for the complete response. No streaming - we will get to that in Step 10.
- thread_config passes the thread_id so the checkpointer knows which conversation to load/save. Every call with the same thread_id shares the same history. The agent can now understand follow-up questions like "How does that compare to February?" because it remembers the January context.
- result["structured_response"] is the validated SalesSummary Pydantic object we defined in Step 8. Because we set response_format=ToolStrategy(SalesSummary), the agent always returns structured output in this key.

3. Update pyproject.toml - register a CLI command so we can run the agent with uv run agent:
[project]
name = "first-react-agent"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
"pydantic-settings>=2.13.1",
"sap-ai-sdk-gen>=6.6.0",
]
[project.scripts]
agent = "first_react_agent.main:main"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"The [project.scripts] section maps the command agent to first_react_agent.main:main. When you run uv run agent, uv installs the project in development mode and calls the main() function.
4. Run it:
uv run agent

Try these prompts:
- How were sales in January 2025? - agent fetches data, computes summary, returns structured output
- Show me April 2025 sales - agent finds no orders, returns summary with zero values
- How does that compare to February? - agent uses memory to understand "that" refers to the previous month

You will see output like this:
You: How were sales in January 2025?
Assistant: month='January' year=2025 total_orders=5 total_revenue=1799.46 average_order_value=359.89 top_customer='Global Ltd'
It works, but... that output is not exactly user-friendly. You get a raw Pydantic object dump - field names, values, no formatting. That is the tradeoff with structured output: your code gets clean, typed data to work with, but the raw representation is not meant for humans to read.
You could format the SalesSummary object yourself before printing (e.g., f"Month: {summary.month}, Revenue: ${summary.total_revenue:,.2f}"), and that is the right approach when you are building an API or a UI that consumes the structured data programmatically.
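For instance, a small helper (hypothetical - the final app switches to streaming instead) could render the object for the terminal:

from first_react_agent.schemas import SalesSummary

def format_summary(s: SalesSummary) -> str:
    """Render a SalesSummary for humans (hypothetical helper)."""
    return (
        f"{s.month} {s.year}: {s.total_orders} orders, "
        f"${s.total_revenue:,.2f} revenue "
        f"(avg ${s.average_order_value:,.2f} per order), "
        f"top customer {s.top_customer}"
    )

# In main.py: print(f"\nAssistant: {format_summary(summary)}\n")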
But what if you want the agent to speak naturally - streaming text word by word as it thinks through the answer, like a real conversation? That is what Step 10 is about.
Why can't we stream with structured output?
When response_format is set, create_agent internally sets tool_choice="any". This forces the model to always make a tool call - specifically, a call to the structured output schema tool. The model never produces free text. Since streaming shows the model's text output token by token, and there is no text output (only tool call arguments), there is nothing to stream. AIMessageChunk.text is always empty.
Streaming and structured output are mutually exclusive. You pick one:
| Mode | You Get | You Lose |
|---|---|---|
| Structured output (response_format) | Typed Pydantic objects, programmatic access | Real-time streaming, natural language responses |
| Streaming (no response_format) | Real-time token-by-token output, natural conversation | Typed schema enforcement |
If your use case is an API that feeds data into a dashboard, structured output is the right choice. If your use case is a conversational terminal agent, streaming feels better. Step 10 shows you how to switch.
Instead of waiting for the complete response, streaming gives you output as it is generated, word by word, in real time. This makes the agent feel responsive and conversational.
LangChain agents support three stream modes:
| Mode | What It Streams |
|---|---|
| updates | State updates after each agent step |
| messages | Tuples of (token, metadata) from LLM invocations |
| custom | Custom data from inside graph nodes using the stream writer |
Agent progress streaming - see each step the agent takes:
for chunk in agent.stream(
{"messages": [{"role": "user", "content": "Summarize January 2025 sales"}]},
stream_mode="updates",
version="v2",
):
if chunk["type"] == "updates":
for step, data in chunk["data"].items():
print(f"Step: {step}")Token streaming - get individual tokens as they are generated:
for chunk in agent.stream(
{"messages": [{"role": "user", "content": "Summarize January 2025 sales"}]},
stream_mode="messages",
version="v2",
):
if chunk["type"] == "messages":
token, metadata = chunk["data"]
print(f"Node: {metadata['langgraph_node']}")Combining modes - pass multiple modes as a list:
from langchain_core.messages import AIMessageChunk

for chunk in agent.stream(
{"messages": [{"role": "user", "content": "Summarize January 2025 sales"}]},
stream_mode=["updates", "messages"],
version="v2",
):
if chunk["type"] == "updates":
for step, data in chunk["data"].items():
print(f"Step: {step}")
elif chunk["type"] == "messages":
token, metadata = chunk["data"]
if isinstance(token, AIMessageChunk) and token.text:
print(token.text, end="", flush=True)Direct model streaming (without agent) - you can also stream directly from the chat model:
for chunk in llm.stream("List three strategies to increase Q2 sales"):
    print(chunk.text, end="|", flush=True)

Each chunk is an AIMessageChunk. Accumulate them to build the full response:
full = None
for chunk in llm.stream("What factors affect average order value?"):
full = chunk if full is None else full + chunk
print(full.text)

Filtering stream output - when using stream_mode="messages", the stream contains all message types: AIMessageChunk (model text and tool calls), ToolMessage (raw tool output), and others. For a clean chat experience, you only want the model's final text, not the raw CSV data from tools or the tool call metadata. Use isinstance(token, AIMessageChunk) and token.text to filter:
- isinstance(token, AIMessageChunk) - only model output, skips ToolMessage and other types
- token.text - only chunks with actual text content, skips tool call chunks (which have empty text but populated tool_calls)

Without this filter, you would see the raw CSV sales data dumped into the terminal alongside the agent's response.
To enable streaming, we need two changes: remove response_format from the agent (so the model can generate free text), and switch from invoke() to stream() in the entry point.
1. Update src/first_react_agent/agent.py - remove response_format:
from langchain.agents import create_agent
from langgraph.checkpoint.memory import InMemorySaver
from first_react_agent.config import AICoreConfig
from first_react_agent.client import create_llm
from first_react_agent.tools import get_monthly_sales, calculate_sales_summary
def create_app():
config = AICoreConfig()
llm = create_llm(config)
agent = create_agent(
llm,
tools=[get_monthly_sales, calculate_sales_summary],
system_prompt=(
"You are a sales analyst assistant. "
"The sales data covers the year 2025. If the user does not specify a year, assume 2025. "
"When a user asks about sales for a specific month, first use get_monthly_sales to fetch the raw data, "
"then pass that data to calculate_sales_summary to compute the statistics. "
"Always report the summary back to the user. "
"If the user asks something unrelated to sales data, politely decline and explain that you can only assist with sales queries."
),
checkpointer=InMemorySaver(),
)
    return agent

We removed two things: the response_format=ToolStrategy(SalesSummary) parameter and the SalesSummary/ToolStrategy imports. Without response_format, the model is free to generate natural text, which is what we need for streaming.
Notice the system prompt still includes "politely decline and explain that you can only assist with sales queries." Without structured output forcing tool_choice="any", this instruction now actually works - the agent can respond with plain text to decline off-topic questions.
2. Update src/first_react_agent/main.py - switch to streaming:
from langchain_core.messages import AIMessageChunk
from first_react_agent.agent import create_app
def main():
agent = create_app()
thread_config = {"configurable": {"thread_id": "1"}}
print("Sales Agent ready. Ask about sales data. Type 'quit' to exit.\n")
print("Try: 'How were sales in January 2025?' or 'Show me April 2025 sales'\n")
while True:
user_input = input("You: ").strip()
if not user_input:
continue
if user_input.lower() in ("quit", "exit", "q"):
print("Goodbye!")
break
print("\nAssistant: ", end="")
for chunk in agent.stream(
{"messages": [{"role": "user", "content": user_input}]},
thread_config,
stream_mode="messages",
version="v2",
):
if chunk["type"] == "messages":
token, metadata = chunk["data"]
if isinstance(token, AIMessageChunk) and token.text:
print(token.text, end="", flush=True)
print("\n")
if __name__ == "__main__":
    main()

The key changes from Step 9's main.py:
- agent.stream() instead of agent.invoke() - returns chunks as they are generated instead of waiting for the complete response
- stream_mode="messages" - gives us token-by-token output as (token, metadata) tuples
- isinstance(token, AIMessageChunk) and token.text - filters to only show the model's text output. Without this, you would also see raw tool results (CSV data) and tool call metadata in the terminal
- print(token.text, end="", flush=True) - prints each token immediately without a newline, and flush=True forces it to appear in the terminal without buffering
- No more result["structured_response"] - since we removed response_format, there is no structured response. The agent speaks in natural language.

3. Run it:
uv run agent

Now the agent streams each word as it is generated - you see the response build up in real time, much more natural than the Pydantic dump from Step 9. And because we kept the checkpointer and thread_config, memory still works:

[Screenshot: streaming demo]
The agent remembered that "that" refers to January - no need to repeat the context.
first-react-agent/
+-- .env # SAP AI Core credentials (never commit)
+-- .gitignore
+-- .python-version
+-- pyproject.toml # Dependencies, scripts, build config
+-- uv.lock # Pinned dependency versions
+-- sales.csv # Sales data
+-- README.md
+-- src/
    +-- first_react_agent/
        +-- __init__.py
        +-- config.py       # Loads and validates .env credentials (from Blog 1)
        +-- client.py       # Creates authenticated LLM connection (from Blog 1)
        +-- schemas.py      # Pydantic models for structured output
        +-- tools.py        # Sales data tools (fetch + summarize)
        +-- agent.py        # Agent creation and configuration
        +-- main.py         # Entry point - chat loop
Next up - Blog 3: Build a Chat App for Your Sales Agent with SAPUI5, FastAPI & Real-Time Streaming