S/4HANA Performance Analysis and Tuning

Adam Li

posted Apr 14 '16 at 3:43 pm

Since the S/4 build, we have noticed that system performance has dropped dramatically. Even with very few users logged in, users complain of very slow response times.

The symptom: when a user runs a transaction, the initial load is always very slow. If the same transaction is run a second time right away, it seems faster. However, when clicking around within the transaction, some step is always extremely slow, taking about 1 minute to respond.

This is a very good troubleshooting example. We will use this thread to work through the analysis and see how we can fix it.

Project 'Clam' founder

Adam Li

posted Apr 14 '16 at 5:13 pm

First of all, let's take a look at the overall system landscape from the perspective of a user's SAPGUI login session.

As you can see, this is not a typical SAP implementation. It has:

- 3 different operating systems: Red Hat Linux, SUSE Linux and Windows 2012
- The ASCS sits on the same Red Hat Linux box as the D01 instance
- The //SAPMNT share is served from a separate host running Windows 2012
- The Linux systems mount SAPMNT via CIFS; the Windows 2012 system connects via SMB

From a network topology point of view, this isn't a nice architecture. However, since everything runs within a single ESXi host, much of this network communication actually happens in memory.


Adam Li

posted Apr 14 '16 at 6:03 pm

From an SAP Basis troubleshooting point of view, the first step is always to check the buffer hit ratios using transaction ST02.

The result wasn't too bad: we don't see any red alerts indicating a problem.
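
For readers less familiar with the term, here is a minimal sketch of what a buffer hit ratio expresses. It uses the generic definition (requests served from the buffer divided by all requests), which is an assumption and not necessarily the exact formula ST02 applies internally. The function name and the sample counts are made up for illustration.

```python
def hit_ratio(requests: int, misses: int) -> float:
    """Return the percentage of requests that were served from the buffer."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    if not 0 <= misses <= requests:
        raise ValueError("misses must be between 0 and requests")
    return 100.0 * (requests - misses) / requests


# Hypothetical counts: 1,000,000 buffer requests, 2,500 of which had to go to the database.
print(f"{hit_ratio(1_000_000, 2_500):.2f}%")  # prints 99.75%
```

A ratio this close to 100% is what "not too bad" looks like: almost every request is answered from memory.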

Anyway, as a preventive measure, we still increased the size of the following 2 parameters. One of them relates to SAPGUI performance on the presentation server.


Adam Li

posted Apr 14 '16 at 6:12 pm

Meanwhile, we have also done the following:

- Upgraded to the latest kernel version 745, patch level 100
- Ran SGEN on the Basis component as a test
- From the VMware side, reserved 8 GB of memory for each of the 3 application servers

After restarting all 3 application servers, the system was still just as slow. Throughout this time we monitored the CPU usage, I/O and memory of the APP1/APP2/APP3 servers as well as the HDB server, and none of them showed any load. From the user's frontend, what typically happens is that the screen freezes for about 1 minute, then suddenly returns the result. At that moment we can see minor CPU usage on the HANA HDB server.
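
On the Linux hosts, a "no load" observation like this can be cross-checked from two samples of the aggregate `cpu` line in `/proc/stat`. The sketch below is only an illustration of that calculation, not the tool we actually used. The function name and the sample lines are hypothetical, and counting iowait as idle time is a choice made here.

```python
def cpu_busy_percent(before: str, after: str) -> float:
    """Busy CPU percentage between two aggregate 'cpu' lines from /proc/stat."""

    def parse(line: str) -> tuple[int, int]:
        fields = line.split()
        if not fields or fields[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        # Field order: user nice system idle iowait irq softirq steal
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + values[4]  # idle + iowait are both treated as "not busy"
        return idle, sum(values)

    idle0, total0 = parse(before)
    idle1, total1 = parse(after)
    total_delta = total1 - total0
    if total_delta <= 0:
        raise ValueError("the second sample must be taken after the first")
    return 100.0 * (1 - (idle1 - idle0) / total_delta)


# Hypothetical samples taken a few seconds apart during a "frozen" screen.
sample_1 = "cpu 1000 0 500 8000 500 0 0 0"
sample_2 = "cpu 1030 0 520 8940 510 0 0 0"
print(f"{cpu_busy_percent(sample_1, sample_2):.1f}% busy")
```

With these made-up samples, 950 of the 1000 elapsed ticks are idle or iowait, so the host is about 5% busy even though the user is waiting.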


edited Apr 14 '16 at 6:13 pm

Adam Li

posted Apr 14 '16 at 7:12 pm

So finally we decided to trace it. First we ran an ST12 trace to see whether the lag was coming from ABAP, the DB or the System.

Before running the trace, we had to create another user, 'ALI1', so that the tracing session would not be mixed up with the test user 'ALI'. Both users log in to the app1 application server.

Then we logged in as 'ALI1' and set up ST12, filtering by user 'ALI'.

The result shows that 97% of the slowness comes from 'System'.
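
To make the 97% figure concrete, here is a small sketch of how such a breakdown is derived from the three runtime components. The microsecond values are hypothetical, chosen only so that they reproduce the 97% share seen in the trace; they are not the actual numbers from our ST12 result.

```python
def runtime_shares(abap_us: int, db_us: int, system_us: int) -> dict[str, float]:
    """Return each component's share of total runtime, in percent."""
    parts = {"ABAP": abap_us, "Database": db_us, "System": system_us}
    total = sum(parts.values())
    if total <= 0:
        raise ValueError("total runtime must be positive")
    return {name: round(100 * value / total, 1) for name, value in parts.items()}


# Hypothetical 60-second dialog step, in microseconds.
shares = runtime_shares(abap_us=1_200_000, db_us=600_000, system_us=58_200_000)
print(shares)  # {'ABAP': 2.0, 'Database': 1.0, 'System': 97.0}
```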

This is very misleading. While testing, we had multiple consoles open to monitor system usage. Neither CPU, memory nor I/O showed a significant surge on app server 1 or on HANA HDB. The VMware ESX performance tab didn't show heavy network or I/O usage either.

So, what does 'System' mean?


edited Apr 14 '16 at 7:13 pm

Adam Li

posted Apr 14 '16 at 7:21 pm

Finally, we decided to run another trace using ST05, the performance trace.

From this we can see that the most time-consuming steps were SQL selects against HANA HDB!

This is a very interesting finding. As we understand it, when database activity like this (select * from xxx) happens, HANA is basically doing CPU computation against memory in processes such as the HANA index server and name server, so we should at least see some CPU usage on the HANA HDB server. But in fact we did not even see 5% CPU usage on the HDB server!
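
If the trace is long, it helps to total the durations per statement before reading it line by line. The following is a minimal sketch of that aggregation, assuming the trace has been exported as (statement, duration in microseconds) pairs. The record layout, the table names and the numbers are hypothetical and are not the ST05 export format.

```python
from collections import defaultdict


def top_statements(records, limit=3):
    """Sum durations per statement and return the slowest ones first."""
    totals = defaultdict(int)
    for statement, duration_us in records:
        totals[statement] += duration_us
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Hypothetical trace records: (statement, duration in microseconds).
trace = [
    ("SELECT FROM TABLE_A", 58_000_000),
    ("SELECT FROM TABLE_B", 300_000),
    ("SELECT FROM TABLE_A", 1_000_000),
    ("SELECT FROM TABLE_C", 200_000),
]
for statement, total_us in top_statements(trace, limit=2):
    print(f"{total_us / 1_000_000:8.1f} s  {statement}")
```

Sorting by total time rather than by single execution makes a statement that is slow every time stand out from one that was slow only once.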


Adam Li

posted Apr 14 '16 at 7:39 pm

To resolve this issue, the answer is very clear: increase HANA HDB performance. The slowness is not related to anything on the application servers, regardless of how we set things up: 3 different operating systems, the combined CIFS/SMB sapmnt export, kernel parameters, network, etc. It is purely HANA database performance that causes the slow response.

Why was the HANA DB slow? Looking at the bigger picture, other VMs were consuming resources that the HANA DB VM needs, especially CPU, I/O and memory.

So we decided to do the following:

- Migrated all the HANA DB files to a separate, faster RAID 1 array, and moved all the other VMs from that RAID 1 array to other storage LUNs, so that HANA HDB has a kind of 'dedicated' storage that is not impacted by other VM activity.
- Reserved 10K+ MHz of CPU for the HANA DB
- Of the 150 GB of memory allocated to HANA HDB, reserved 100 GB
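
To put the reservation numbers in proportion, here is a rough sketch of the arithmetic. The 2500 MHz per-core clock is purely an assumption for illustration (the real clock speed of the ESXi host is not stated in this thread), and the function names are made up.

```python
def reserved_fraction(reserved_gb: float, allocated_gb: float) -> float:
    """Fraction of the VM's configured memory that the hypervisor guarantees."""
    if allocated_gb <= 0 or not 0 <= reserved_gb <= allocated_gb:
        raise ValueError("need 0 <= reserved_gb <= allocated_gb and allocated_gb > 0")
    return reserved_gb / allocated_gb


def equivalent_cores(reserved_mhz: float, core_mhz: float) -> float:
    """How many full physical cores a CPU reservation in MHz corresponds to."""
    if core_mhz <= 0:
        raise ValueError("core_mhz must be positive")
    return reserved_mhz / core_mhz


print(f"memory guaranteed: {reserved_fraction(100, 150):.0%}")
# ASSUMPTION: 2500 MHz per core; substitute the real clock speed of your host.
print(f"CPU guaranteed: about {equivalent_cores(10_000, 2_500):.1f} cores")
```

Under that assumption, roughly two thirds of the memory and about four cores' worth of CPU are guaranteed to the HANA VM no matter what the other VMs are doing.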

After finishing all of this, we started the HDB server again. This time we noticed that transaction response was much faster, something like 100x!

Conclusion: compared to traditional databases such as Oracle, SQL Server and DB2, HANA DB definitely requires more resources to run in a 'normal' way. We believe many cloud service providers won't tell you whether or not your hosted HANA DB has dedicated hardware resources. In that case, you'll have to do the performance analysis yourself and show the evidence to the cloud service provider.

I hope the troubleshooting sessions above were not too boring. Please feel free to contact me if you have any further questions or doubts.

Cheers,
Adam Li


Adam Li

posted Apr 19 '16 at 1:52 am

Follow-up: further tuning includes disabling hyperthreading in VMware, NUMA settings, and vCPU to physical CPU mapping: make sure 4 vCPUs are mapped to the CPU in the 1st socket and the other 4 vCPUs are mapped to the CPU in the 2nd socket.

We also patched HANA 1.0 SPS11 to revision 112.02, which should resolve some query performance issues on big tables.

After all of the above, I can feel that S/4 is significantly faster than before!
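
The intended 4 + 4 layout can be written down as a simple mapping. This sketch only illustrates the target assignment for 8 vCPUs on 2 sockets. It does not configure anything: the actual pinning is done in the VMware settings, and the function name is hypothetical.

```python
def map_vcpus_to_sockets(vcpus: int, sockets: int) -> dict[int, list[int]]:
    """Split vCPU indexes into equal, contiguous groups, one group per socket."""
    if sockets <= 0 or vcpus <= 0 or vcpus % sockets != 0:
        raise ValueError("vcpus must be a positive multiple of sockets")
    per_socket = vcpus // sockets
    return {
        socket: list(range(socket * per_socket, (socket + 1) * per_socket))
        for socket in range(sockets)
    }


print(map_vcpus_to_sockets(vcpus=8, sockets=2))
# {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
```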


nicky

posted Apr 30 '16 at 2:42 pm

it's a good practice.

ashutyson

posted Oct 29 '18 at 8:44 pm

Hi Adam,
Thanks for such a nice blog. I have followed the steps you gave but am still facing the same performance issue.
For every new transaction the system takes 4-5 minutes, and the next time it works faster. I have already run SGEN but still have the same issue. It's really frustrating to see such bad performance on a HANA system, which should be faster than a SQL or Oracle database but is instead 10 times slower than them.
I have installed S4HANA 1709 FP1 with the latest kernel version on SLES 12 SP3 and HDB 2.0 SP3.

I would appreciate any suggestions on how I can fix this lag.
