on 2013 Apr 05 4:15 PM
I ask because I don't see any references to huge memory pages in the IQ manuals or white papers. I know ASE supports it. If not, why not?
Jason,
There has been some discussion that the most recent releases (16.0, 15.4 ESD#3) now support transparent huge pages. But THP is much different from dedicated huge pages. We don't support dedicated huge pages the way ASE does at this point.
Thanks Mark. What is the ETA on supporting real huge pages?
For those of you that aren't familiar with huge pages or transparent huge pages on Linux, you can read up on it at http://lwn.net/Articles/423584/
For Sybase IQ on Linux, huge pages should be set to zero. Performance of IQ is very poor on Linux systems configured with huge pages.
I actually find it hard to believe that IQ would perform poorly with huge pages. I suspect whoever did the internal testing misconfigured the huge memory pages, or the test binary was written incorrectly, so that the huge pages were not being used, or not used consistently.
Huge page support was disabled with SA CR 728597 due to a Red Hat bug.
CR 728597:
This problem is related to a possible bug in the transparent huge pages (THP) feature introduced in these operating system versions. Red Hat bug 891857 has been created to track this issue.
The problem can be triggered by calling an external environment, xp_cmdshell, or another procedure that causes a fork while other I/O is occurring. A known limitation of the Linux kernel restricts the use of fork while doing O_DIRECT I/O operations. Essentially, data can come from or go to the wrong process's memory after the fork. SQL Anywhere performs O_DIRECT I/O operations according to the documented safe usage. However, THP appears to cause further problems, and the O_DIRECT I/O data comprising database page reads/writes appears to get lost.
This has been fixed by disabling THP on the SQL Anywhere cache memory where possible. We are working with Red Hat to identify a solution within the operating system.
There are two possible workarounds:
1. disable THP on a system-wide basis with one of the following methods:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
boot with transparent_hugepage=never
2. disable O_DIRECT I/O for database file reads/writes with one of the following methods:
use the -u flag on the server command line
set SA_DISABLE_DIRECTIO=1 in the environment before starting the server
Transparent huge pages cannot be disabled just for the SQL Anywhere cache memory on Red Hat Enterprise Linux 6. SQL Anywhere now disables direct I/O if transparent huge pages are enabled and cannot be disabled on a per-allocation basis. A warning will be printed as the database file is being opened to indicate that direct I/O is disabled due to this bug. This is similar to how SQL Anywhere handles file systems that do not support direct I/O.
Customers using RHEL 6 who wish to continue using direct I/O should use the previously-stated command to disable THP at the system level.
A word of caution if you follow item #2. Disabling O_DIRECT can lead to issues, because catalog data pages could be in cache but not yet written to disk. We use O_DIRECT to guarantee data integrity for the catalog. If you choose to disable direct I/O, corruption could occur in rare instances, as the OS would not have written the catalog data to disk before a system crash. For this reason, it is strongly recommended that THP be disabled until Red Hat has addressed the bug.
Jason,
I am the engineer who investigated this problem and reported it to RH. Although the bug description only mentioned RH, we have also seen the problem on Fedora. Fedora may be related to RH, but it is by no means the same kernel.
It has been brought to my attention that newer versions of SLES now support THP and have made some steps to optimize it further. I will try to reproduce the bug on this version of SLES and report back to this forum.
It would help to know why you are interested in THP support. Do you have an immediate need, or is this strictly academic? If you have an immediate need, that would help to motivate the engineers involved. How urgent is your desire to have THP support?
The question isn't really around THP and the bug, etc.
I think the bigger issue/question is around huge pages in general and why IQ doesn't directly support them. In the ASE world, using huge pages on large-memory systems has been proven faster (about 10% in ASE engineering tests). IQ will generally run on larger hardware than ASE, so why wouldn't we support native huge pages the way ASE does?
My two cents....
Phil,
Red Hat Enterprise Linux (or RHEL) is a commercially supported derivative of Fedora tailored to meet the requirements of enterprise customers. It is a commercial product from Red Hat, which also sponsors Fedora as a community project. Fedora is upstream for Red Hat Enterprise Linux. -- https://fedoraproject.org/wiki/Red_Hat_Enterprise_Linux?rd=RHEL
The bug is not a RHEL bug but a Linux kernel bug (see http://froebe.net/blog/2013/06/17/does-anyone-have-any-details-on-redhat-linux-bug-891857/). Two very different things. A RHEL bug is specific to the RHEL distribution and its derivatives; a Linux kernel bug affects all distributions using that kernel release.
As to why the interest in huge memory pages, please take a look at IBM's http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaat/liaattuning_pdf.pdf , specifically the section on the Translation Lookaside Buffer (TLB). I link to the PDF because it has a nice graphic showing what it is.
Transparent huge pages are very similar to huge pages, with the largest benefit being that they happen automatically, without any special coding or configuration.
Oracle actively promotes the use of huge pages for Oracle Database 11g and 12c. Sybase actively promotes huge pages for ASE. IBM actively promotes huge pages for DB2 (UDB).
I don't understand why IQ engineering is dragging their collective feet on this performance and corruption issue. I can think of many unkind reasons but I'm sure it is simply lack of resources and the Linux platform that always seems to get a second class citizen status within Sybase. (at least from the outside of SAP/Sybase)
With respect to ASE performance with huge memory pages, one of the reasons the increase was only about 10% is that ASE's memory pages max out at 128 KB on a server using a 16 KB data page size. If the ASE memory page could be the x86-64 huge page size of 2 MB, the performance gain should be much higher.
With IQ, the only thing that is stopping us from using huge pages is the Linux kernel bug.
Thanks for the extra background, Jason, as well as the reference to the upstream patch proposal.
For the record, although this is not an RHEL-specific bug, we had filed the bug with Red Hat, since that's where we ran into it, hence the reference to the RH bug number.
Also, Fedora is upstream from RHEL, but RHEL is free to make additional modifications, hence my assertion that they are not the same kernel. I was sure that the bug reached at least that high up, but I wasn't certain whether it occurred higher upstream.
As for ASE, I'm not familiar with the architecture of ASE. It's possible that they are not vulnerable to this bug. We notified the ASE developers about this issue and its consequences, but I am not aware of what came of it. For SA/IQ, working around this bug would require significant changes.
For ASE, it shouldn't occur very often due to its I/O-handling architecture.
I've been able to confirm that the problem occurs on the latest Linux kernel (on bare hardware). The next step is to confirm it using Linux KVM; if so, I can just package that up. I haven't tested with KVM yet due to lack of time. I have three kids.
Anyone can reproduce it easily. The repro (C program) is in the link I posted.
Good to hear. FWIW, we haven't had any luck reproducing this on either VMware- or XenServer-based VMs. Only real hardware so far. Also, some file systems seem to be more prone to it than others. We've only really had luck reproducing it on ext3/ext4, but we haven't been very scientific about that aspect.
Jason has already verified that the THP bug exists in the latest kernel, but I thought I would explicitly confirm that the THP bug is still an issue for SA/IQ in RHEL 6.4 as well as SLES 11.2. Thus, it is still important for us to disable THP on these platforms.
Note that on platforms other than RHEL 6.x, we disable THP only on the database page cache using madvise(), since that's the only place we use Direct I/O. RHEL 6.x does not support the madvise() call that's required for this, so we require THP to be disabled system-wide. If THP is enabled on RHEL 6.x, we fall back on disabling Direct I/O.
Given the architecture of SA/IQ, I honestly don't think we will be able to support huge pages until this bug has been fixed in the Linux kernel.
That's what I meant in my earlier replies. I'm hoping for SAP to push this with Red Hat and SuSE (the two fully supported Linux distributions).
The fact that this bug has been open for FOUR YEARS indicates that it has been put on very low priority. We suffer from poor performance and risk corruption of our IQ databases meanwhile.
I can assure you that the pushing is happening. Sadly, Linus has a very negative view toward O_DIRECT, hence the low priority.
Also, there should be no risk of database corruption due to this issue, provided you accept the performance hit. For typical SA-scale databases we determined that the performance hit was only about 1-2%. However, I fully understand that at IQ scales, it's probably much more significant, though I haven't seen any actual numbers since I don't work directly with the IQ product.
I've heard a little bit more information from Red Hat:
The link to the kernel patch mentioned in http://froebe.net/blog/2013/06/17/does-anyone-have-any-details-on-redhat-linux-bug-891857/ does not give the full story about this kernel patch. As you can read on http://comments.gmane.org/gmane.linux.kernel.mm/31569, it was not the author's intention to get the patch included in the Linux kernel in the form it was posted, and unfortunately Linus Torvalds himself seems to have been strongly against including it. So even though the blog post you mention makes it sound like there is a patch ready to fix the underlying issue in the Linux kernel, that is unfortunately not the case. And as long as Linus Torvalds is against it, this will be very difficult to fix, since it is a very critical area of the Linux kernel.
They also mentioned that Red Hat has a policy not to add patches to the kernel that are not accepted upstream, with the exception of some hardware drivers that are provided by vendors as binaries.
So now to lobby Linus....
thanks Phil!
Unfortunately, Linus isn't the maintainer for that portion of the Linux kernel. He gets final say on patches sent to him by the maintainers... for the most part.
Linus Torvalds isn't the one to lobby http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/MAINTAINERS
Jason
That I can't speak to. I would imagine that it just comes down to prioritization of features.