Technology Blogs by SAP
Former Member

As we all know, in the early days of SAP virtualization, bad things could happen if you didn't follow SAP Note 1122388. But today, with vSphere 5.1 being a very stable and performant hypervisor, you can almost run your SAP system out of the box. Almost...

Neither the number of vCPUs or the amount of vRAM, nor the pure compute or scheduling overhead of the hypervisor, nor the throughput of disk or network I/O is a real concern today. Your SAP system will run with satisfying performance. But as virtualized SAP systems grow bigger (or rather, as customers become more willing to virtualize bigger systems), database instances and application server instances are separated, and more and more additional application servers connect to a single database. Customers running a 3-tier architecture on virtualized platforms are therefore exposed to the "new" bottleneck of virtualization: network latency.

It is well known that network switching done by the hypervisor increases latency, but the effects vary widely: from not being noticed at all to a process step taking significantly longer than on native hardware. The latter is of course not acceptable, so let's take a look at two major virtualization improvements that can be vital for running a high-performance 3-tier SAP system:

  • Latency-sensitive configuration of virtual machine and hypervisor
    • With vSphere 5.1, the parameter "latencySensitivity" was introduced. Setting this parameter to "high" changes the interrupt coalescing and the virtual CPU priority for the VM, which can improve performance for VMs that are very sensitive to latency.

  • Pass-through technology for I/O devices such as network adapters
    • VMware's DirectPath I/O is a generic method to pass through devices.
    • VM-FEX is Cisco's implementation of 802.1BR, which gives some additional advantages.
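The latency-sensitivity setting is a per-VM option. As a minimal sketch, assuming it is stored in the VM's .vmx file under the key `sched.cpu.latencySensitivity` (an assumption: the post only names the parameter "latencySensitivity"), the Python helper below sets or replaces that entry in a .vmx text. In practice you would normally change it through the vSphere client; editing a .vmx file by hand is normally done only while the VM is powered off. The VM name in the example is hypothetical.

```python
import re

# ASSUMPTION: the .vmx key name is not given in the post, which only names
# the parameter "latencySensitivity"; verify it for your vSphere release.
LATENCY_KEY = "sched.cpu.latencySensitivity"


def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with `key = "value"` set, replacing an existing entry
    for the same key (matched case-insensitively) or appending a new line."""
    new_line = f'{key} = "{value}"'
    pattern = re.compile(rf"^{re.escape(key)}\s*=.*$", re.MULTILINE | re.IGNORECASE)
    if pattern.search(vmx_text):
        # A function as replacement inserts new_line literally (no escapes).
        return pattern.sub(lambda _m: new_line, vmx_text, count=1)
    if vmx_text and not vmx_text.endswith("\n"):
        vmx_text += "\n"
    return vmx_text + new_line + "\n"


if __name__ == "__main__":
    # Hypothetical excerpt of an application server VM's .vmx file.
    vmx = 'displayName = "sap-app-01"\nnumvcpus = "8"\nmemSize = "8192"\n'
    print(set_vmx_option(vmx, LATENCY_KEY, "high"), end="")
```

Replacing an existing entry instead of blindly appending keeps the file free of duplicate keys, so switching back to "normal" for comparison runs is a single call.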

So how do these two improvements affect performance? First, we need to clarify some parameters, as performance is always a matter of

  1. configuration
  2. workload
  3. influences
  4. expectations

While many performance tuning guides focus on software and hardware configuration, workload characteristics, influences throughout the test landscape, and user expectations deserve much more attention than they usually get.

In this context, we used the SAP Load Generator (SGEN) as the test workload. SGEN does not generate "workload"; it is not a benchmark tool in itself! What it generates is "ABAP load": it compiles raw ABAP code and stores the result in the database. But there are still some good reasons to use SGEN for testing:

  • it is latency sensitive
  • it utilizes both the application server and the database
  • it is easy to configure and to parallelize
  • it produces results with acceptable variation
  • it can be compared with parallelized batch workloads

Batch workloads should not be compared with SAP dialog processing because their workload characteristics differ. If you are interested in a comparison of transactional performance, this whitepaper may be of interest (different hypervisor and OS, though).

The database instance resides on a Cisco UCS B200 M2 blade. For the application server, three different Cisco UCS B-series blade and adapter combinations were used:

  • B200 M3 with VIC 1240
  • B230 M2 with M81KR
  • B230 M2 with VIC 1280

With each of the three configurations, four scenarios were run:

  • native application server
  • virtualized application server with VM-FEX pass-through NIC
  • virtualized application server with latency-sensitive configuration
  • virtualized application server with standard configuration

SGEN was limited to six work processes in all test cases, both physical and virtual. All VM tests and the native B200 M3 tests used 8 cores / vCPUs, but the native B230 M2 tests used 10 cores. This gives the native B230 M2 tests an advantage, and the results show that in some cases this comparison has the biggest difference between native and virtual. However, the difference is only a few percentage points in these cases, showing that limiting SGEN to six work processes mitigated most or all of this advantage. We also tested different BIOS settings, UCS adapter policies and so on, but none of these tuning steps had a visible effect on the test results. Therefore, we limited the result table to the four scenarios mentioned above.
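The test matrix described above can be sketched in Python as an illustration (not the tooling used for the tests). It enumerates the 3 × 4 = 12 runs and, using only the core counts stated in the text, shows that with SGEN capped at six work processes every run had at least two cores or vCPUs without an SGEN work process. This is a simplified count that ignores the OS and other SAP processes, but it fits the observation that the 10-core native B230 M2 runs gained only a few percentage points. The blade and scenario labels are shorthand chosen here.

```python
from dataclasses import dataclass
from itertools import product

SGEN_WORK_PROCESSES = 6  # SGEN limit used in all test cases

# Shorthand labels for the three application server setups and four scenarios.
BLADES = ["B200 M3 / VIC 1240", "B230 M2 / M81KR", "B230 M2 / VIC 1280"]
SCENARIOS = ["native", "VM-FEX pass-through", "latency-sensitive", "standard"]


def cores_for(blade: str, scenario: str) -> int:
    """Cores (native) or vCPUs (virtual) available to the application server."""
    if scenario == "native":
        return 10 if blade.startswith("B230 M2") else 8
    return 8  # every VM had 8 vCPUs


@dataclass(frozen=True)
class Run:
    blade: str
    scenario: str
    cores: int

    @property
    def spare_cores(self) -> int:
        """Cores or vCPUs without an SGEN work process (simplified count)."""
        return self.cores - SGEN_WORK_PROCESSES


def build_test_matrix() -> list[Run]:
    """One Run per (blade setup, scenario) combination."""
    return [Run(b, s, cores_for(b, s)) for b, s in product(BLADES, SCENARIOS)]


if __name__ == "__main__":
    for run in build_test_matrix():
        print(f"{run.blade:20} {run.scenario:20} "
              f"cores={run.cores:2} spare={run.spare_cores}")
```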


  • Network hardware has less impact on overall performance than the CPU does (E5 outperforms E7).
  • In all test cases, the overhead is ~3 % when using VM-FEX pass-through.
  • The overhead is ~100 % when using a virtual NIC with a standard vSwitch and VM configuration; on E7, it is even slightly worse.
  • Adjusting the virtual machine's latency sensitivity setting significantly reduces the overhead. However, the results differ noticeably between the test cases:
    • B200 M3, VIC 1240: ~10 % overhead
    • B230 M2, M81KR: ~20 % overhead
    • B230 M2, VIC 1280: < 10 % overhead
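The post does not define "overhead". Assuming it means the relative increase in SGEN runtime on the virtualized application server compared with the native run on the same blade, the following Python sketch shows how the quoted percentages translate into runtimes: ~3 % is barely noticeable, while ~100 % means the run takes roughly twice as long. The 60-minute native runtime is a hypothetical round number, not a measured result.

```python
def overhead_pct(native_runtime: float, virtual_runtime: float) -> float:
    """Relative runtime increase of a virtual run over the native run, in percent.
    ASSUMPTION: this is how "overhead" is read here; the post does not define it."""
    if native_runtime <= 0:
        raise ValueError("native runtime must be positive")
    return (virtual_runtime - native_runtime) / native_runtime * 100.0


def runtime_with_overhead(native_runtime: float, overhead: float) -> float:
    """Runtime implied by an overhead given in percent."""
    return native_runtime * (1.0 + overhead / 100.0)


if __name__ == "__main__":
    native_minutes = 60.0  # hypothetical native SGEN runtime, for illustration
    # Approximate overheads quoted in the results above.
    for label, pct in [
        ("VM-FEX pass-through", 3.0),
        ("latency-sensitive, B200 M3 / VIC 1240", 10.0),
        ("latency-sensitive, B230 M2 / M81KR", 20.0),
        ("standard vSwitch and VM configuration", 100.0),
    ]:
        minutes = runtime_with_overhead(native_minutes, pct)
        print(f"{label:40} ~{minutes:5.1f} min (+{pct:.0f} %)")
```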

Optimizing batch workload performance of SAP 3-tier systems on VMware vSphere 5.1 and Cisco UCS

  • Configure VM-FEX pass-through
  • Adjust the latency sensitivity of the virtual machine
  • Use the latest generation of CPU and NIC hardware

Feel free to ask or comment about observations you have made in your landscape.



  • Database server B200 M2 (bare-metal)
    • Oracle on SLES 11 SP2
    • CPU: X5650, 2.67 GHz, 2 sockets, 12 cores, 24 threads
  • Application Server B200 M3 (bare-metal or virtual)
    • SAP NetWeaver 7.0 EhP2 on SLES 11 SP2
    • CPU: E5-2680, 2.7 GHz, 2 sockets, 16 cores, 32 threads
    • bare-metal configuration: 8 cores of socket 1 online, all other threads offline
    • VM configuration: vSphere 5.1, 8 vCPUs, 8 GB RAM, vmxnet3
  • Application Server B230 M2 (bare-metal or virtual)
    • SAP NetWeaver 7.0 EhP2 on SLES 11 SP2
    • CPU: E7-2870, 2.4 GHz, 2 sockets, 20 cores, 40 threads
    • bare-metal configuration: 10 cores of socket 1 online, all other threads offline
    • VM configuration: vSphere 5.1, 8 vCPUs, 8 GB RAM, vmxnet3
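The bare-metal runs restricted the application server to 8 (B200 M3) or 10 (B230 M2) cores of one socket with all other threads offline. The post does not say how this was done; one possible way on Linux is CPU hotplug through the sysfs `online` files, sketched below in Python. We read "socket 1" as the first socket, which Linux numbers as package 0. The function names are hypothetical, and the script only prints the planned `echo` commands unless you call `apply_actions()` as root.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")

Topology = dict[int, tuple[int, int]]  # logical cpu -> (package_id, core_id)


def read_topology(root: Path = SYSFS_CPU) -> Topology:
    """Read package and core IDs from sysfs (run while all CPUs are online)."""
    topo: Topology = {}
    for cpu_dir in root.glob("cpu[0-9]*"):
        topo_dir = cpu_dir / "topology"
        if not topo_dir.is_dir():  # skip CPUs that expose no topology
            continue
        topo[int(cpu_dir.name[3:])] = (
            int((topo_dir / "physical_package_id").read_text()),
            int((topo_dir / "core_id").read_text()),
        )
    return topo


def cpus_to_keep(topo: Topology, socket: int, cores: int) -> set[int]:
    """Keep one logical CPU (the lowest-numbered thread) for each of the first
    `cores` physical cores of `socket`; all sibling threads go offline."""
    first_thread: dict[int, int] = {}
    for cpu in sorted(topo):
        package, core = topo[cpu]
        if package == socket and core not in first_thread:
            first_thread[core] = cpu
    chosen = sorted(first_thread.items())[:cores]  # ordered by core_id
    if len(chosen) < cores:
        raise ValueError(f"socket {socket} has only {len(chosen)} cores")
    return {cpu for _core, cpu in chosen}


def plan_actions(keep: set[int], topo: Topology) -> list[tuple[Path, str]]:
    """(sysfs 'online' file, value) pairs for every CPU except cpu0, which on
    most x86 systems cannot be taken offline and has no 'online' file."""
    return [
        (SYSFS_CPU / f"cpu{cpu}" / "online", "1" if cpu in keep else "0")
        for cpu in sorted(topo)
        if cpu != 0
    ]


def apply_actions(actions: list[tuple[Path, str]]) -> None:
    """Write the planned values (requires root)."""
    for path, value in actions:
        path.write_text(value)


if __name__ == "__main__":
    topo = read_topology()
    try:
        # 8 cores for the B200 M3 setup; use 10 for the B230 M2 setup.
        keep = cpus_to_keep(topo, socket=0, cores=8)
    except ValueError as err:
        print(f"cannot build a plan on this machine: {err}")
    else:
        for path, value in plan_actions(keep, topo):
            print(f"echo {value} > {path}")  # dry run only
```

Keeping only the first thread of each core matches "all other threads offline": the hyperthread siblings of the kept cores are switched off along with the second socket.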