
Typically, SAP systems are configured in a three-tier architecture: the presentation tier or layer (Fiori, SAP GUI, or HTML browser), the application tier (SAP NetWeaver ABAP/Java), and the storage/database tier.
Such a three-tier architecture provides several benefits, including improved scalability, maintainability, and security. By separating the different functions into distinct layers, changes to one layer can be made without affecting the others, making the application more flexible and easier to maintain.
The different layers/tiers and components in a three-tier software architecture communicate with each other using a combination of protocols and APIs. Some common protocols used in three-tier architecture include HTTP, HTTPS, TCP/IP, and message queues. The communication between the tiers typically occurs over a network, such as a local area network (LAN), wide area network (WAN), or the internet.
Network performance is critical in a three-tier software architecture because of the way the different tiers communicate with each other. In this architecture, the presentation tier, the application tier, and the storage tier are often located on different physical or virtual machines, and therefore communicate over a network. This communication is critical for the overall performance and reliability of the application.
If the network is slow or unreliable, it can cause significant delays in the processing of data and lead to slow response times for the user. Additionally, if the network experiences high levels of traffic or congestion, it can cause dropped packets or even complete failures in communication between the tiers. This can result in data loss or corruption, which can have serious consequences for the application and its users.
Therefore, it is important to ensure that the network used in a three-tier architecture has high levels of performance, reliability, and security. This may involve using redundant networks, load balancing, and firewalls to protect against network-based attacks, as well as monitoring network performance and capacity to detect and resolve issues before they become critical.
By investing in a robust network infrastructure, organizations can ensure the performance, scalability, and security of their three-tier applications.
The application tier of an SAP NetWeaver ABAP system is composed of several individual components, including:
These components of the application server communicate with each other over the network using the TCP/IP protocol.
We typically refer to the network components connecting the users (GUI, web clients) with the application server as the frontend network, and to the network parts connecting the various components of the application server with each other and with the database as the backend network.
The performance of this backend network has a high impact on the overall response time; a poor network will significantly increase the database or enqueue response times. The total DB response time as seen from the dialog work process consists of the network time from the dialog process to the DB, the DB execution time, and the time to send the result back to the dialog process.
Especially for HANA customers, who have high expectations of seeing improved performance compared to their old installations, the network times are very critical; a poor network has the potential to make the very good HANA response times invisible and consequently leads to customer complaints about the HANA-related products.
The execution time of a fast DB statement (e.g. SELECT SINGLE from table T100) is in the range of 100 μs or even faster. The network time for a fast round trip (SAP > DB > SAP) is typically around 300 μs. The total time for such a fast DB select statement is therefore approximately 400 μs, where the network roundtrip time accounts for 75% of the total DB response time. Some customer installations show much higher roundtrip times of more than 700 μs, which severely impacts the total execution time of an SQL statement.
Impact on Response Times
The impact of high network latency can be seen when analyzing the statistical records from transaction STAD. In the example below we see an update process on a system with a high latency of approximately 0.70ms.
The total DB request time was 65ms for a total of 61 DB requests. If the customer were able to reduce the latency by 0.3ms, the total time for the 61 database requests would be reduced by approximately 61 × 0.3ms = 18.3ms. The total DB time would therefore drop from 65ms to roughly 47ms, which is an improvement of 28%. In this example the network time can be estimated at 61 × 0.7ms = 42.7ms, which is 66% of the total DB time.
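This back-of-the-envelope estimate can be generalized: given the measured roundtrip latency, the number of DB requests, and the total DB request time from STAD, the network share and the potential saving from a lower latency follow directly. A minimal sketch in Python, using the numbers from the example above (function name and values are only illustrative):

```python
def network_share(db_requests: int, total_db_ms: float,
                  latency_ms: float, target_latency_ms: float) -> None:
    """Estimate the network portion of the total DB request time from STAD data."""
    network_ms = db_requests * latency_ms                      # time spent on the wire
    saving_ms = db_requests * (latency_ms - target_latency_ms)
    print(f"network time : {network_ms:6.1f} ms "
          f"({100 * network_ms / total_db_ms:.0f}% of total DB time)")
    print(f"potential win: {saving_ms:6.1f} ms "
          f"({100 * saving_ms / total_db_ms:.0f}% improvement) "
          f"if latency drops to {target_latency_ms} ms")

# Values taken from the STAD example above: 61 DB requests, 65 ms total DB time,
# 0.7 ms measured roundtrip latency, 0.4 ms assumed target latency.
network_share(db_requests=61, total_db_ms=65.0,
              latency_ms=0.7, target_latency_ms=0.4)
```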
To describe the quality of a network we use the following terms:
SAP NetWeaver ABAP does not have advanced features for directly evaluating network quality, but there are some tools available that can help determine if the network performance is sufficient.
In this document we describe the following tools to analyze network performance:
For more detailed information about network performance we refer to the following documents:
The OSI (Open Systems Interconnection) model defines the communication functions and protocols needed for successful data exchange between two end systems in a computer network. Different protocols are used at each layer of the OSI model to achieve this communication. Here is a brief summary of the typical protocols used in each layer:
Note that these protocols are not exhaustive and there are many other protocols that can be used in each layer.
The different network layers/protocols and analysis tools are illustrated below:
Any network analysis on network layers 2 or higher might be impacted by CPU bottlenecks: very high CPU usage can result in elevated response times that are not related to the physical network itself. It is therefore mandatory to review the CPU usage on both sides in order to rule out that the measurement results are caused by a CPU overload.
The most commonly known tools to analyze a network are PING and NIPING. Both are similar, but there are differences one should know. In the OSI/ISO network layer model, PING and NIPING work on different layers: PING works on layer 3 (IP layer) with the ICMP protocol, while NIPING (which is part of the SAP software on each application server) works on layer 4 (transport layer, TCP). The main difference is that NIPING works on a dedicated port and, apart from a performance test, also allows testing firewall settings (e.g. whether a port is open or not).
With PING and NIPING we can measure both the network latency (using very small packets) and the throughput (using very large packets).
Packet Size
The impact of low bandwidth, reduced throughput, or high latency on the database performance depends on the type of select statements and the size of the result set which is transferred back from the database to the application server.
In a typical transactional SAP system most select statements transfer only 66 to 900 bytes back from the DB to the application server; around 70% of all DB responses fit into a single TCP packet with a Maximum Transmission Unit (MTU) of 1500 bytes. For those small packets sent back from the DB, latency is most important as it represents a constant offset in the transmission time. For very large packets the maximum achievable throughput is more important.
To analyze the network latency, which is most critical for fast database selects or enqueue requests, one should use small packets; for a throughput or bandwidth measurement very large packets are recommended. For general analysis one normally uses a packet size which fits into a single TCP packet, commonly 100 to 300 bytes.
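To illustrate what a layer-4 latency measurement with small packets looks like in principle, the following self-contained Python sketch starts a tiny TCP echo server in a background thread and measures roundtrip times for a 100-byte payload over loopback. Host, port, payload size, and loop count are arbitrary example values; this is only a conceptual illustration, not a replacement for NIPING:

```python
import socket
import statistics
import threading
import time

HOST, PORT = "127.0.0.1", 50000      # example values; loopback only
PAYLOAD_SIZE = 100                   # small packet, as recommended for latency tests
LOOPS = 500                          # enough samples to keep the statistical error small

def echo_server() -> None:
    """Accept one connection and echo every received packet back."""
    with socket.create_server((HOST, PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            while data := conn.recv(65536):
                conn.sendall(data)

threading.Thread(target=echo_server, daemon=True).start()
time.sleep(0.2)                      # give the server time to start listening

payload = b"x" * PAYLOAD_SIZE
rtts_ms = []
with socket.create_connection((HOST, PORT)) as sock:
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # send immediately
    for _ in range(LOOPS):
        start = time.perf_counter()
        sock.sendall(payload)
        received = 0
        while received < PAYLOAD_SIZE:          # wait for the full echo
            received += len(sock.recv(65536))
        rtts_ms.append((time.perf_counter() - start) * 1000)

print(f"avg {statistics.mean(rtts_ms):.3f} ms, "
      f"median {statistics.median(rtts_ms):.3f} ms, "
      f"stdev {statistics.stdev(rtts_ms):.3f} ms")
```

TCP_NODELAY disables Nagle's algorithm so that each small payload is sent immediately, which mimics the request/response pattern of a dialog work process; without it, the measured times would include artificial batching delays.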
A PING test can be executed from within the SAP application server (transactions OS01, ST06) or via command line on OS level.
LAN Check by Ping (OS01)
Ping Command executed from command line on OS level:
$> ping 10.54.42.101
The ping result is not very accurate (no decimals available).
Pinging 127.0.0.1 with 32 bytes of data:
Reply from 127.0.0.1: bytes=32 time<1ms TTL=64
Reply from 127.0.0.1: bytes=32 time<1ms TTL=64
Reply from 127.0.0.1: bytes=32 time<1ms TTL=64
Reply from 127.0.0.1: bytes=32 time<1ms TTL=64
Ping statistics for 127.0.0.1:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms
Average PING results below 1ms do not necessarily indicate that the times are perfect; the time resolution of PING is not accurate enough. If the average times are above 1ms, a detailed network analysis should be performed using more accurate tools like NIPING or TCPDUMP.
NIPING is a network diagnostic tool that allows you to check the availability and quality of a network connection between two systems. The tool is commonly used to test the connectivity and performance of a network in SAP environments, where network performance is critical to the overall performance of the system.
NIPING needs to be started in two separate modes:

| Mode | Description |
|---|---|
| NIPING server | Listens for incoming network packets on a specified port and responds to them |
| NIPING client | Sends network packets to a specified destination system, such as a NIPING server. The RTT of the packets is measured and the results are displayed, providing information on the quality of the network connection from the source system to the destination system |
The NIPING command must be executed via command line on both sides: on the target host we first start the NIPING server, which will receive the NIPING client requests and reply back.
On the server side we typically start the NIPING server with the command:
$> niping -s -I 0
-s server mode
-I x idle time
x > 0 shutdown after x seconds
x = 0 no automatic shutdown
x < 0 shutdown after x seconds idle or first client disconnect
On the client side we use a command like:
$> niping -c -H 192.168.201.45 -B 100 -L 86400 -D 1000 -P >> niping.txt
-c client mode
-s server mode
-H <IP-Address> target host
-B sss package size
-D nnn delay between sends
-L number of loops (repetitions)
-P detailed output
More information about how to use NIPING is available via the command niping /? or in the SAP notes:
500235 - Network Diagnosis with NIPING.
2986631 - How to do NIPING and checking network performance
The NIPING list output typically looks like:
---------------------------------------------------
trc file: "nipingresult.txt", trc level: 1, release: "753"
---------------------------------------------------
Tue Sep 17 09:22:33 2019
connect to server o.k.
0: 9.339 ms
Tue Sep 17 09:22:34 2019
1: 3.112 ms
Tue Sep 17 09:22:35 2019
2: 0.448 ms
Tue Sep 17 09:22:36 2019
3: 0.843 ms
To process the result of a NIPING analysis we recommend using the Data Analysis (DA) tool, which is described in and available via SAP note 3169320 (Data Analysis).
LAN Traffic: If client and server are within the same data center, the times should be as indicated below (see SAP notes 1100926, 2081065).
WAN Traffic: For larger distances one must expect higher NIPING response times. The response times per 100km distance between client and server can be interpreted as indicated below:
(An example of typical WAN response times can be found at https://wondernetwork.com/pings/New%20York)
Fluctuations
Even for a very good network or a loopback scenario, the standard deviation (which describes the average difference of the individual measurements from the overall mean) is typically very high and does not by itself allow judging whether the fluctuations are within the normal range (the histogram shape is closer to a log-normal distribution).
In a typical normal distribution, 68.27% of all measurements lie within one standard deviation σ of the mean. Within the NIPING analysis and monitoring tool we calculate the percentage of values found within this standard deviation and compare it against the value expected for a normal distribution. A value close to or above 100% indicates that the fluctuations are normal; lower values indicate that the fluctuations are much higher.
In the example below we see three different NIPING results for a fast, medium, and slow network, where we clearly see the differences between the fluctuations in the corresponding histograms.
(Histogram screenshots: Low Fluctuations | Medium Fluctuations | High Fluctuations)
To ensure that the NIPING result is not impacted by the host server itself, we always recommend performing a loopback ping at the same time for comparison. If the loopback result (NIPING server and client on the same machine) shows high fluctuations, then the server itself might have a problem such as high CPU usage or small IP/TCP buffers.
The statistical error of any measurement series which is subject to random fluctuations can be estimated by the formula E% = SQRT(n)/n = 1/SQRT(n), where n is the number of measurements. To keep the statistical error below 5% we recommend having at least 500 measurements (1/SQRT(500) ≈ 4.5%).
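The checks described above (mean, standard deviation, share of samples within one standard deviation, and the 1/√n statistical error) can be reproduced from the raw NIPING client output with a few lines of Python. This is only a sketch assuming the output format shown above, i.e. lines such as `2: 0.448 ms` redirected into niping.txt; the Data Analysis tool from SAP note 3169320 provides a far more complete evaluation:

```python
import re
import statistics

# Parse lines like "      2:   0.448 ms" from the redirected NIPING client output.
pattern = re.compile(r"^\s*\d+:\s+([\d.]+)\s+ms\s*$")
with open("niping.txt", encoding="utf-8") as f:
    samples = [float(m.group(1)) for line in f if (m := pattern.match(line))]

n = len(samples)
mean = statistics.mean(samples)
sigma = statistics.stdev(samples)

# Share of samples within +/- one standard deviation of the mean;
# ~68.27% is expected for a normal distribution, much less for heavy fluctuations.
within = sum(1 for s in samples if abs(s - mean) <= sigma) / n * 100
stat_error = 100 / n ** 0.5          # E% = sqrt(n)/n = 1/sqrt(n)

print(f"samples            : {n}")
print(f"mean / sigma       : {mean:.3f} ms / {sigma:.3f} ms")
print(f"within +/- 1 sigma : {within:.1f}% ({within / 68.27 * 100:.0f}% of the normal-distribution expectation)")
print(f"statistical error  : {stat_error:.1f}% (aim for at least 500 samples, i.e. < 5%)")
```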
Good values: peak below 0.50ms, ideally at around 0.25 – 0.35ms, with less than 10% of all packets exceeding 0.70ms
Bad values: peak above 0.50ms and/or more than 20% of all packets above 0.70ms
ABAPMETER is a performance benchmark tool available with every NetWeaver ABAP stack system. One can execute ABAPMETER via transaction ST13.
Select PERF_TOOL and execute.
Then select ABAPMETER and execute again.
A typical result (for perfect DB and network performance) is shown below.
ABAPMETER performs multiple different tests, mostly related to CPU/MEM performance, but two of those tests (columns Acc.DB and E.Acc.DB) can be used to measure the DB and network performance.
ABAPMETER executes 200 identical select statements against the message table T100 on each application server. One of the selects reads an existing message, the second tries to read a non-existing record. Both select statements specify the full primary key of table T100. The ABAP coding is shown below.
ABAPMETER will show the total execution time for 200 identical select statements.
If ABAPMETER shows elevated times, those increased execution times could result either from a poorly performing database or from a slow network. To distinguish between poor DB and poor network performance we must take an SQL trace on the application server (via ST05, ST12, or SQLM) and compare those results with a trace taken directly on the database or with the SQL statement cache (via ST04).
As mentioned in the previous section, the ideal network response time for a roundtrip is around 350 μs (for some very fast networks we even see roundtrip times of only 150 μs). The DB execution time on HANA for the empty T100 select should be faster than 75 μs. The total execution time measured by ABAPMETER for 200 repetitions is therefore between approximately 45ms (200 × (150 μs + 75 μs)) and 85ms (200 × (350 μs + 75 μs)).
SAP recommends that a value shown by ABAPMETER for the empty DB access of more than 140ms (= 0.70ms per single execution) should be investigated further (see SAP note 2222200).
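Converting the ABAPMETER value for the empty DB access (200 executions) into a per-execution time and comparing it against the 140ms threshold is simple arithmetic; a minimal sketch with illustrative sample values (47ms and 170ms correspond to the examples discussed later in this section, 85ms is a hypothetical mid-range value):

```python
EXECUTIONS = 200            # ABAPMETER runs 200 identical T100 selects
LIMIT_TOTAL_MS = 140.0      # investigation threshold for the empty DB access column

def check_abapmeter(empty_db_access_ms: float) -> None:
    per_exec_ms = empty_db_access_ms / EXECUTIONS
    verdict = "OK" if empty_db_access_ms <= LIMIT_TOTAL_MS else "investigate (see note 2222200)"
    print(f"{empty_db_access_ms:6.1f} ms total -> {per_exec_ms:.3f} ms per select -> {verdict}")

# Example values: a well tuned system (~47 ms), a mid-range value, and the slow example (~170 ms).
for value in (47.0, 85.0, 170.0):
    check_abapmeter(value)
```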
ABAPMETER is normally executed in the foreground as an online report, but it can also be scheduled in the background. The result of each ABAPMETER execution is then saved as an individual spool file. One can display and download those spool files via transaction SP01.
Start SP01 and specify the username executing ABAPMETER together with suffix 2 = *CAT*
First save this list as a local file, e.g. SPOOL.TXT (System > List > Save > Local File > unconverted). This data is important because it contains the date/time of the execution; the ABAPMETER spool file itself does not contain any date/time information which could be extracted.
Now mark all spool files and select Spool Request > Forward > Export as Text…
Once all files are downloaded, follow the instructions of SAP note 2879613 (ABAPMETER in NetWeaver AS ABAP) or import the files directly into Data Analysis (see SAP note 3169320 - Data Analysis).
The tool Data.Analysis.HTML allows displaying the data as scatter plot and histogram; simply press the Graphic button to display the data.
Although even a few executions of ABAPMETER already give a first glimpse of the network performance, one should execute ABAPMETER in intervals of around 5 minutes over a duration of at least 24h. A better understanding can be achieved if ABAPMETER data is collected for multiple days. The collected data will then show whether the performance has regular peaks during daily peak hours and what the typical fluctuations in those measurements are.
Ideally the times should be stable throughout a day with only minor fluctuations of +/- 20% around the peak value in a histogram (average value in scatter plot or bar chart).
In the above example we see some extreme fluctuations. We also see that during the daily peak hours the times regularly increase by a factor of 4 and more. This is a typical example which requires a more in-depth analysis using the other tools described in this document. Overall the times are bad: most measurements are above the SAP recommended value of 140ms for 200 select statements, as indicated by the yellow line; the average execution time (red line) in the above example was at 170ms.
The points below, which seem somewhat separated, are from the primary application server, which is located on the same host machine and uses no physical network but a VIOS partition to connect to the HANA database server.
Below is an example of a perfectly tuned SAP HANA system with only minor fluctuations.
A perfect example of ABAPMETER with a very fast network connection | If we zoom in, we see only minor fluctuations; the average is at only 47ms.
With ST-PI support package 23, or after implementing SAP note 3318669 (Add Network Roundtrip Time RTT to /SDF/SMON), it is possible to measure the network latency with the snapshot monitor /SDF/SMON (see the SAP Snapshot Monitoring tool, SAP note 2651881).
Important: Due to an unintended program change within ST-PI SP28 the NetRTT times are elevated. To fix this issue please install SAP note 3537775 - Enable Pre-Select for NetRTT measurement.
The network round trip time (NetRTT) is measured on the application layer of the OSI/ISO network communication model by executing an empty SQL statement on client table T000 for a non-existing entry, similar to program /SSA/CAT (ABAPMETER). The SQL statement is pre-executed twice to ensure that it is compiled and the SQL cursor is open; then a third single T000 select is measured in microseconds. The total response time depends on multiple factors and therefore a non-optimal response time cannot necessarily be attributed to network problems (high CPU usage on the DB or SAP application server can have a negative impact on the response time).
Ideally the response times of this measurement are around 300-400 μs; values over 750 μs should be investigated further with other tools like NIPING.
The collected data can either be downloaded via transaction /SDF/SMON or directly extracted from table /SDF/SMON_HEADER. Additionally, one can use transaction /SDF/SMON_DISPLAY (see SAP note 3210905 or the blog https://blogs.sap.com/2023/05/11/display-snapshot-monitor-data/) to visualize the network round trip time directly within the SAP GUI, as shown in the screenshot below:
Compared to the NIPING latency measurements (small packet size of 100 bytes), the NetRTT response times measured via /SDF/SMON are typically around 50-80 μs higher.
We can use SQL traces created on the SAP application server (ST12, ST05) or the data collected in transaction SQLM (SQL Monitor) to compare the DB execution times measured from the SAP dialog process against the net time measured in the database itself. The difference between these two results can be attributed to the network time. Ideally one should try to trace a select statement which is comparable between different customers. The T100 select statement from ABAPMETER is a perfect candidate, but any other fast SELECT SINGLE statement (e.g., from table MAKT) will work as well. If the SQL Monitor is enabled you don't need a trace; SQLM has already collected all the data one will need.
(see https://help.sap.com/viewer/a24970c68fcf4770a64bf9a78e3719e2/7.40.17/en-US/6653101633604f3880f2b9375... or the blog https://blogs.sap.com/2013/11/16/sql-monitor-unleashed/ for more details on how to activate SQLM).
ST04 SQL Plan Cache
In the screenshot below we see the statistics from the SQL plan cache of a HANA database. Lines 2 and 3 show the times for the T100 select from ABAPMETER; line 3, with 0 records, shows the statistics for the empty database select for a non-existing message.
The average execution time (Avg. Cursor Duration) is very fast at only 66 μs.
If we use the SQL Monitor (transaction SQLM) to display the statistics for the same select, but from an application server perspective which includes the network time, the result is very different.
On the initial screen of SQLM select Display Data
Then specify the package and table (here package = /SSA/ and table = T100 for the ABAPMETER selects)
An example output is shown below.
In the above example the average execution time as seen by the dialog work process on the application server is 1583 μs, which is 24 times slower (about 1500 μs longer) than the time measured directly on the DB.
One can now select the Time Series for this empty select statement and analyze the fluctuations throughout the day.
The minimum average time, observed between 02:00:00 and 02:59:59, is 0.535ms; the maximum average execution time, observed between 10:00:00 and 10:59:59, is 1.415ms, which is almost 3 times slower.
The average times shown in SQLM are lower than the peak values observed by ABAPMETER for individual measurements, because SQLM calculates an average over 55200 executions within a one-hour duration. The conclusion however is the same: in a customer installation with a perfectly sized network the time should be almost constant, with fluctuations of no more than 25% during the day. The average execution time for this empty T100 select should be no higher than 0.500ms.
The above time series data from SQLM, shown as a bar chart, illustrates the fluctuations during the peak business hours. In the example below the times are above the SAP recommended value of 0.35ms per select statement and during the peak times significantly above the 0.70ms threshold. The time measured in the HANA database was 0.066ms.
Good values: Average SQL execution times measured by SQLM are only about 300 μs slower than the average execution time measured by the DB.
Bad values: If the difference between the execution times measured by SQLM and the SQL plan cache results for a short running execution is more than 0.70ms, then the problem should be investigated further. The fluctuations shown in SQLM (hourly averages) should be very low, with no more than +/- 10%.
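The comparison between the application-server view (SQLM) and the database view (ST04 SQL plan cache) boils down to the simple decision rule above. A hedged sketch using the example numbers from this section (66 μs in the plan cache, 1583 μs SQLM average, and the hourly extremes); the threshold values mirror the good/bad criteria stated above:

```python
def classify(sqlm_avg_us: float, db_avg_us: float) -> str:
    """Attribute the difference between SQLM and SQL plan cache times to the network."""
    network_us = sqlm_avg_us - db_avg_us
    if network_us <= 300:
        return f"good   (network share ~{network_us:.0f} us)"
    if network_us <= 700:
        return f"check  (network share ~{network_us:.0f} us)"
    return f"bad    (network share ~{network_us:.0f} us, investigate further)"

db_time_us = 66.0                       # Avg. Cursor Duration from the ST04 SQL plan cache
for label, sqlm_us in (("daily average", 1583.0),
                       ("02:00-02:59", 535.0),
                       ("10:00-10:59", 1415.0)):
    print(f"{label:14s}: {classify(sqlm_us, db_time_us)}")
```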
The performance of SAP lock requests (enqueues) is especially critical for systems which do mass processing of transactional data; examples are the industry solutions for DSD, Retail, and Utilities, or systems like EWM. Slow enqueue performance can lead to reduced throughput. Apart from optimal settings of the parameters which control the enqueue server, a very fast network connection between the application servers and the primary application server is essential.
SAP provides a dedicated tool to analyze the performance of the enqueue server. If there is a very large difference between the enqueue response times on the PAS itself and on the other application servers, it is very likely that poor network performance is the main culprit.
To start the enqueue performance tool you can use transaction SM12. Then enter the OK-code TEST followed by the OK-code DUDEL or you can start the program RSMONENQ_PERF directly via SA38.
If the program RSMONENQ_PERF is not available in the system, please follow SAP note 1320810 - Z_ENQUEUE_PERF.
The program RSMONENQ_PERF executes various tests on each application server and can take a few minutes to complete. An example output is shown below:
01.10.2019                    Program RSMONENQ_PERF                    1
----------------------------------------------------------
**********************************************************
***********       REPORT RSMONENQ_PERF       **************
**************     Version: Version 2.16     **************
**********************************************************
Execution DATE and TIME:        01.10.2019 19:36:00
Execution in SYSTEM:            PRD
Execution on server:            PAS_PRD_00
Kernel Release:                 753-324
Standalone Enqueue Server Host: pas_prd
***********************************************************
****************       CONFIGURATION       ****************
***********************************************************
1.1)  Valid values for parameter enque/process_location            OK
1.2)  Consistency of enque/process_location.                       OK
1.3)  Compare SA enqname and SA Service on all instances           OK
1.4)  Skipped
1.5)  enque/table_size should be identical on all servers          OK   Skipped for enque/process_location = REMOTESA
1.6)  How many appservers run ENQ processes?                       OK   Skipped for enque/process_location = REMOTESA
1.7)  More than 3 ENQ processes per instance?                      OK   Skipped for enque/process_location = REMOTESA
1.8)  rdisp/restartable_wp must not include ENQ                    OK   Skipped for enque/process_location = REMOTESA
1.9)  location of enque/backup_file                                OK   Skipped for enque/process_location = REMOTESA
1.10) enque/query_comm must not be set                             OK
1.11) enque/force_read_via_rfc must not be set                     OK
1.12) enque/save_key_always different from default                 OK
1.13) enque/sync_dequeall should be identical on all servers       OK
1.14) rdisp/thsend_mode should be identical on all servers         OK
No severe configuration issues detected. Section 1 took 00:00:00 seconds!
***********************************************************
************     ENQUEUE PERFORMANCE     ******************
***********************************************************
All performance test with 100 Requests!
All Times in micro seconds if not specified separately
2.1) Performance Test between dispatcher and msg server with ADM Messages!
to APP01_PRD_01
 Bytes  ms per req   100   200   500   1ms   2ms   5ms  10ms  >10ms
   458       0,645     0     0    39    48    13     0     0      0
   770       0,577     0     0    50    43     7     0     0      0
 1.290       0,762     0     0    44    40     8     8     0      0
 1.810       1,066     0     0    45    32    11     9     3      0
 2.330       0,820     0     0    28    54    14     3     1      0
 3.370       0,865     0     0    18    59    18     5     0      0
 5.450       0,924     0     0    27    58     9     5     0      1

to APP02_PRD_02
 Bytes  ms per req   100   200   500   1ms   2ms   5ms  10ms  >10ms
   458       0,336     0     0    98     2     0     0     0      0
   770       0,329     0     0    96     4     0     0     0      0
 1.290       0,468     0     0    91     5     1     3     0      0
 1.810       1,472     0     0    75     8     5     6     5      1
 2.330       1,699     0     0    79     6     5     6     1      3
 3.370       1,519     0     0    47    20    17     8     5      3
 5.450       0,582     0     0    60    33     6     1     0      0

to PAS_PRD_00
 Bytes  ms per req   100   200   500   1ms   2ms   5ms  10ms  >10ms
   458       0,316     0    44    41    12     3     0     0      0
   770       0,235     0    64    33     2     0     1     0      0
 1.290       0,255     0    65    28     4     3     0     0      0
 1.810       0,378     0    57    37     4     0     1     0      1
 2.330       0,222     0    55    44     0     1     0     0      0
 3.370       0,267     0    43    53     3     1     0     0      0
 5.450       0,248     0    39    58     3     0     0     0      0
In the above example we see that the response times on the primary application server PAS_PRD_00 are faster than on the other two application servers, but the times on those servers are still within the normal range and raise no concern. For a good network connection 90% of the measurement results should be below 1ms. If the times are consistently above 1ms, even reaching large counts in the 5ms bucket or higher, the enqueue performance must be investigated in more detail.
If high enqueue times are observed on all servers (including the PAS itself), one should check the parameters and CPU usage of the PAS first and raise an incident if the root cause cannot be detected.
If high enqueue times are only visible on the application servers but not on the PAS, then we have a strong indication of network issues. We recommend performing further analysis like NIPING/PING between the application servers and the PAS to confirm the issue.
Very often those high times appear during periods of high system load when the network traffic increases; in any case one should check the overall performance on all involved servers, especially CPU/MEM usage and work process availability.
Below is an extract from the report RSMONENQ_PERF of a customer facing poor network performance during the business peak hours.
to saperp01
 Bytes  ms per req   100   200   500   1ms   2ms   5ms  10ms  >10ms
   458       2,019     0     0     0     5    72    17     5      1
   770       1,878     0     0     0     0    71    28     1      0
 1.290       2,245     0     0     0     0    64    31     5      0
 1.810       2,148     0     0     0     0    63    35     0      2
 2.330       2,503     0     0     0     0    59    31    10      0
 3.370       2,883     0     0     0     0    41    53     5      1
 5.450       2,404     0     0     0     0    45    50     5      0
At the same time the PAS server itself was showing very good results
to sappas00
 Bytes  ms per req   100   200   500   1ms   2ms   5ms  10ms  >10ms
   458       2,122     0    99     1     0     0     0     0      0
   770       2,623     0    91     8     1     0     0     0      0
 1.290       2,652     0    85    12     3     0     0     0      0
 1.810       1,733     0    87    13     0     0     0     0      0
 2.330       2,002     0    79    19     2     0     0     0      0
 3.370       2,245     0    64    31     5     0     0     0      0
 5.450       2,408     0    36    51    13     0     0     0      0
Good values: 90% of all measurements below or equal to 1ms on servers other than the PAS; the PAS itself should be at 500 μs or below.
Bad values: 90% of all measurements are above 1ms, with 20% reaching 5ms or more, on servers other than the PAS (but only if the PAS values are good).
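Because RSMONENQ_PERF prints a histogram of 100 requests per packet size, the 90%-below-1ms rule can be checked directly from the bucket counts. A small sketch using two example rows copied from the outputs above (the bucket boundaries correspond to the column headers of the report):

```python
# Buckets as printed by RSMONENQ_PERF: <=100us, <=200us, <=500us, <=1ms, <=2ms, <=5ms, <=10ms, >10ms
def share_below_1ms(counts: list[int]) -> float:
    """Return the percentage of the 100 test requests that finished within 1 ms."""
    total = sum(counts)
    return 100 * sum(counts[:4]) / total        # first four buckets are <= 1 ms

# Example rows taken from the outputs above (458-byte requests).
slow_row = [0, 0, 0, 5, 72, 17, 5, 1]           # "to saperp01" during peak hours
fast_row = [0, 44, 41, 12, 3, 0, 0, 0]          # "to PAS_PRD_00"

for name, row in (("saperp01", slow_row), ("PAS_PRD_00", fast_row)):
    pct = share_below_1ms(row)
    print(f"{name:11s}: {pct:5.1f}% <= 1ms -> {'OK' if pct >= 90 else 'investigate'}")
```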
Network packet analysis is a valuable tool for network administrators and IT professionals to troubleshoot network connectivity issues. Here are some reasons why and when to use network packet analysis. Why to use network packet analysis:
When to use network packet analysis:
Network packet analysis tools like Wireshark and TCPDUMP allow network administrators to capture and analyze network traffic to identify the root cause of network connectivity issues. By analyzing the captured packets, administrators can identify patterns and trends that may indicate network problems and take corrective action to fix the issue. Continuous network monitoring software like Obkio Network Performance Monitoring can also be used to detect intermittent network problems and identify the cause of the issue.
TCPDUMP is a common packet analyzer that runs under the command line. It allows the user to display TCP/IP and other packets being transmitted or received over a network to which the computer is attached. TCPDUMP works on most Unix-like operating systems: Linux, Solaris, HP-UX 11i, and AIX.
TCPDUMP can print or save selected or all network packets. The output file can then be analyzed with other tools like Wireshark.
SAP notes 1370469 (How to perform a TCP trace with Wireshark) and 1969914 (Packet scanning tutorial using wireshark) describe how to perform such an analysis.
To start tcpdump from a command line, please refer to the man pages which provide more details on the command syntax. An example is given below.
$> tcpdump -ni eth1 -w /sapmedias/tcpdump.pcap
The amount of data sent over a fast Ethernet can be huge. On a 10Gbit line with 10% usage, tcpdump will capture around 1.25 gigabytes every 10 seconds. Typically, such an analysis will be limited to a duration of 30-120 seconds.
Wireshark is a free and open-source packet analyzer. It is used for network troubleshooting, analysis, software and communications protocol development, and education. Wireshark is cross-platform, using the Qt widget toolkit in current releases to implement its user interface, and using pcap to capture packets; it runs on Linux, macOS, BSD, Solaris, some other Unix-like operating systems, and Microsoft Windows. Wireshark can be downloaded from www.wireshark.org
On a Windows server one can use Wireshark to capture and analyze the traffic; on Unix servers one usually has to use tcpdump to capture network packets. Once the packets are captured, copy the pcap file to your computer and open it in Wireshark.
In the main window one will see the sequence number of the captured packet, the time stamp, source/target IP addresses, protocol, length of the packet in bytes, and additional information like port numbers. If Wireshark detects problems (in the above example, duplicate ACK packets), they will be highlighted.
Wireshark is fully documented – see documentation at www.wireshark.org/docs
Among the several features available in Wireshark we list here a few of the most interesting ones:
Follow the TCP stream and see the full communication between App.Server and DB
Packet Length statistics
Endpoint statistics (who was talking with whom – IP-address, Ports, bytes transferred)
Conversations (when did a conversation start)
IO Statistics (bandwidth usage)
The above graph was taken on a server connected with a 10Gbit network; the bandwidth limit here is 1E10 bit/s, and the 1-second averages were at around 22.5% of the maximum capacity.
IO statistics can be produced in different time intervals (1h, 1min, 10sec, 1sec, 100ms, 10ms, 1ms). Below the same data is shown in 1ms intervals; here the 10Gbit/sec bandwidth translates down to 1E7 bit/ms. We see that in 1ms intervals the network reached 75% of the available bandwidth.
Filters and different color coding can be applied to the IO statistics. In the above example the blue dots are for the source IP = 172… (packets sent by the server where tcpdump was taken), and the orange dots are the incoming traffic for the destination IP = ...
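The IO statistics shown above can also be reproduced outside of Wireshark, for example with the third-party Scapy library. A minimal sketch, assuming a capture file named tcpdump.pcap and a 10ms bucket size (for very large captures Wireshark or tshark are far more efficient, since rdpcap loads the whole file into memory):

```python
from collections import defaultdict
from scapy.all import rdpcap          # third-party: pip install scapy

INTERVAL_S = 0.010                     # 10 ms buckets; adjust to 1 s, 1 ms, ...
LINK_BITS_PER_S = 10e9                 # 10 Gbit/s line as in the example above

packets = rdpcap("tcpdump.pcap")       # loads the whole capture into memory
bits_per_bucket = defaultdict(int)
start = float(packets[0].time)

for pkt in packets:
    bucket = int((float(pkt.time) - start) / INTERVAL_S)
    bits_per_bucket[bucket] += len(pkt) * 8

for bucket in sorted(bits_per_bucket):
    bits = bits_per_bucket[bucket]
    usage = 100 * bits / (LINK_BITS_PER_S * INTERVAL_S)
    print(f"{bucket * INTERVAL_S:8.3f}s  {bits:>12d} bit  {usage:5.1f}% of line capacity")
```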
Filters and error analysis
One can apply several filters in Wireshark to display only specific errors:
ICMP errors: icmp.type == 3 || icmp.type == 11
TCP errors: tcp.analysis.flags
Connection Resets: tcp.flags.reset == 1
Filter for IP: ip.addr == x.x.x.x
The various filters can also be applied to all statistics shown earlier.
In the above example we find 72193 packets in TCP Dup ACK error; with a total of 3174948 packets we have 2.3% errors, which is very high.
Export Data to Excel
tcpdump data can be exported into CSV files for further analysis: File > Export Packet Dissections > As CSV…
One can select whether all captured data or only the displayed/filtered packets should be exported, and the range of packets (Excel can only handle up to 1,048,576 rows).
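Once the packet dissections are exported, the error statistics can be recomputed outside of Wireshark, e.g. with pandas. A sketch assuming the default export columns (No., Time, Source, Destination, Protocol, Length, Info) and an example file name:

```python
import pandas as pd

df = pd.read_csv("tcpdump_export.csv")          # file name is an example

total = len(df)
dup_acks = df["Info"].str.contains("Dup ACK", na=False).sum()
retrans = df["Info"].str.contains("Retransmission", na=False).sum()

print(f"total packets      : {total}")
print(f"TCP Dup ACK        : {dup_acks} ({100 * dup_acks / total:.1f}%)")
print(f"TCP retransmissions: {retrans} ({100 * retrans / total:.1f}%)")

# Packet length distribution per protocol (compare with Wireshark's length statistics).
print(df.groupby("Protocol")["Length"].describe()[["count", "mean", "max"]])
```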
Network Disconnections
Analyzing network disconnections can be done in several ways. Here are some steps and tools that can be used (a small scripted check for connection resets is sketched after this list):
Using TCPDUMP: Open a terminal and run the following command:
tcpdump -i <interface> -s 0 -w <output_file.pcap>
Replace <interface> with the network interface you want to capture from (e.g., eth0) and <output_file.pcap> with the desired name of the output capture file.
Using Wireshark: Launch Wireshark and select the appropriate network interface to capture packets from.
Look for TCP retransmissions or out-of-order packets, which could indicate network congestion or packet loss.
Analyze the TCP handshake and check for any anomalies or errors during the establishment of the connection.
Examine the IP and MAC addresses involved to ensure there are no issues with the network configuration or addressing.
Look for any error messages or abnormal behavior in the captured packets that could point to a specific network device or protocol causing the disconnection.
Pay attention to any ICMP (Internet Control Message Protocol) messages, as they can provide insights into network errors or communication issues.
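The connection resets and retransmission indicators listed above can also be extracted from the capture file with a short script instead of the Wireshark display filters. A sketch with the third-party Scapy library, equivalent in spirit to the filter tcp.flags.reset == 1 (the capture file name is an example):

```python
from scapy.all import rdpcap, TCP, IP   # third-party: pip install scapy

resets = []
for pkt in rdpcap("disconnects.pcap"):
    # Only IPv4/TCP packets with the RST bit (0x04) set are collected here.
    if pkt.haslayer(TCP) and pkt.haslayer(IP) and pkt[TCP].flags & 0x04:
        resets.append((float(pkt.time), pkt[IP].src, pkt[IP].dst,
                       pkt[TCP].sport, pkt[TCP].dport))

print(f"{len(resets)} connection resets found")
for ts, src, dst, sport, dport in resets[:20]:              # show the first few
    print(f"{ts:.6f}  {src}:{sport} -> {dst}:{dport}  RST")
```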
Network administrators often find themselves grappling with complex configurations, intricate interconnections, and evolving technologies. To streamline this process and enhance the efficiency of troubleshooting, one vital aspect that should not be overlooked is the proper documentation of the network infrastructure. In this section, we will explore the significance of network documentation and discuss the key components that should be included.
A proper documentation of the network infrastructure should include:
Comprehensive documentation of the network infrastructure serves various purposes; below is a short list describing why proper network documentation is required:
Below are some examples of network diagrams.
(Network topology diagram examples from https://www.edrawsoft.com/topology-diagram-example.html)
Reducing network latency between a database server and an application server is crucial for improving performance and responsiveness. Here are several recommendations that a network expert might suggest:
Physical Proximity: Locate the database server and application server in the same data center or as close to each other as possible. The physical distance between servers can significantly impact latency, as signals need less time to travel shorter distances.
Network Infrastructure Upgrades:
Direct Connections: Establish a direct connection between the servers, avoiding any unnecessary hops or intermediaries that can add delay.
Optimized Routing: Implement advanced routing protocols that dynamically find the fastest path for data packets between servers.
Quality of Service (QoS): Configure QoS on network devices to prioritize traffic between the database and application servers. This ensures that critical data isn't delayed by less sensitive traffic.
Network Interface Cards (NICs): Use multi-gigabit or 10 Gigabit NICs that can reduce transmission time and handle higher loads with lower latency.
Reduce Network Traffic: Limit the amount of non-essential traffic on the network segment used by the database and application servers. This can be achieved by network segmentation or Virtual LANs (VLANs).
Server and Network Tuning:
Load Balancers: Use load balancers that can intelligently direct traffic and reduce the load on individual servers, thereby potentially reducing response times.
Use of Content Delivery Networks (CDNs): While more relevant for web content, in some architectures, CDNs or similar technologies might be used to cache database queries and results at network edges closer to where they are needed.
Monitoring and Regular Audits: Continuously monitor network performance and conduct regular audits to find and mitigate any new issues that might cause increased latency.
Implementing these strategies involves both hardware upgrades and software configurations, and the specific choices would depend on the existing infrastructure, budget, and criticality of the application.
To minimize network latency, improve application performance, and increase the reliability of distributed systems, many cloud service providers like Azure, AWS, and GCP offer the concept of proximity placement. Each major cloud service provider has its own strategies and tools to manage this concept, although the specific features and names differ. Here's an overview of how proximity placement is handled by Azure, AWS, and Google Cloud Platform (GCP).
Azure: Proximity Placement Groups
Azure provides a feature called Proximity Placement Groups (PPGs) to help achieve lower latency and higher throughput between deployed resources. This is particularly useful when deploying resources that need to communicate with each other frequently or require low network latency, such as SAP HANA databases or other high-performance computing scenarios.
AWS: Placement Groups
AWS offers several types of placement groups that dictate how instances are positioned within the underlying hardware to optimize performance.
Google Cloud Platform (GCP): Resource Locations
Within GCP you can create the following types of placement policies:
Compact placement policy: This policy specifies that VMs are placed close together to reduce network latency. Placing your VMs closer to each other is useful when your VMs need to communicate often among each other, such as when running high-performance computing (HPC), machine learning (ML), or database server workloads.
Spread placement policy: This policy specifies that VMs are placed on separate, discrete hardware—called availability domains—for improved availability and reliability. Placing your VMs on separate availability domains helps to keep your most critical VMs running during live migrations of VMs, or reduce the impact of hardware failure among VMs that share the same hardware.
A list of further links, SAP notes, and books about networking is provided below:
BOOKS:
Network performance analysis is an essential part of managing SAP landscapes to guarantee that the systems will comply with the performance KPIs. Regular network analysis helps to optimize network performance by identifying bandwidth bottlenecks, inefficient protocols, or misconfigured devices. By analyzing network traffic, one can optimize network settings, prioritize traffic, or upgrade hardware to improve network performance.
If any of the above tests shows non-optimal results on a regular basis (e.g., daily during peak business times), one should involve network experts to investigate the root cause and mitigate the problem.
Let's use the following analogy: network traffic = traffic on a motorway. With the above tools one can measure the average time from, for example, Frankfurt to Walldorf (latency). Typical times for the 91km trip are around 58 minutes. If we measure significantly higher times, we can conclude that something is wrong, but we do not know what is causing the problem. If the average times are at 2 hours or more, then there might be an accident, a construction area with a speed limit, or just congestion. The above tools like NIPING, ABAPMETER, SQL traces… only allow us to detect that there is a problem, but they will not tell us the root cause. For a root cause analysis, a network specialist needs to be involved who will use more sophisticated tools.
In addition to the above tests with PING/NIPING, ABAPMETER, SQL analysis, and TCPDUMP, one needs to have detailed visibility of the network infrastructure.
Once the network infrastructure and components are documented, the network-relevant parameters on operating system level, but also on SAP application and database level, must be verified.
Hope you find the blog helpful - stay tuned for further information and updates.