on 2015 Jul 31 1:15 PM
Hi all,
Is anyone using a SLES system with BtrFS (for /, for the database, or both)? I'd like to hear (and share) experiences.
Regards,
Markus
Hi Markus,
We have been running btrfs for / since SLES11 SP2 on over 60 systems.
By now we only have a mix of SP3 and SP4.
No problems so far.
Regards,
Daniel
Hi Daniel,
thank you for your input.
We run/ran roughly 80 systems on BtrFS, and there seems to be a regression in kernels > 3.0.101-0.29 that may corrupt the filesystem. SuSE is still investigating.
Seven systems have crashed so far with filesystem errors where the database had to be restored from backup, including our central BW (1.5+ TB) twice. Two systems could not be restored completely because the filesystem holding the database online logs was hosed. Those crashes happened mostly under little to no load. Older kernels do not have this problem; they run flawlessly.
Which kernel versions do you use?
Markus
Hi Markus,
We are running the newest SP4 kernels:
Linux 3.0.101-63-default #1 SMP Tue Jun 23 16:02:31 UTC 2015 (4b89d0c) x86_64 x86_64 x86_64 GNU/Linux
But we don't have databases on btrfs, only / (OS installation). All SAN data is on ext3, since we use the snapshots of the storage system.
Best regards, Daniel
For this problem (a btrfs filesystem that is full) there is one solution:
Boot another, newer live Linux such as the Gentoo boot CD,
add a USB stick or other block device to the fs, delete snapshots or data, then shrink the fs back to
the original device only.
You must use another Linux because adding / removing devices to a
btrfs is disabled in SLES.
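To make the order of the steps above concrete, here is a rough Python sketch that only builds the command lines and executes nothing. The device names, the mount point and the snapshot path are made-up placeholders; the `btrfs device add`, `btrfs subvolume delete` and `btrfs device delete` subcommands come from btrfs-progs as shipped on a newer live system, not from anything SLES-specific.

```python
import shlex


def plan_full_btrfs_recovery(fs_device, mountpoint, spare_device, snapshots):
    """Return the recovery steps as argument lists; nothing is executed here."""
    plan = [
        # Mount the full filesystem from the live system.
        ["mount", fs_device, mountpoint],
        # Temporarily add a second device so btrfs has room for new metadata.
        ["btrfs", "device", "add", spare_device, mountpoint],
    ]
    # Free space by deleting snapshots (each one is a subvolume).
    for snapshot in snapshots:
        plan.append(["btrfs", "subvolume", "delete", snapshot])
    # Shrink back to the original device: this moves all data off the spare.
    plan.append(["btrfs", "device", "delete", spare_device, mountpoint])
    plan.append(["umount", mountpoint])
    return plan


if __name__ == "__main__":
    # All names below are hypothetical examples.
    steps = plan_full_btrfs_recovery(
        "/dev/sda2", "/mnt", "/dev/sdb1", ["/mnt/.snapshots/1/snapshot"]
    )
    for step in steps:
        print(" ".join(shlex.quote(arg) for arg in step))
```

Removing the spare device only succeeds once enough space has been freed on the original device, so it is worth reviewing the printed plan and running the steps by hand one at a time.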
Newer kernels also have a protection (they reserve some space),
so the fs is always mountable (even if it is full) and you can always
delete files. (Remember, deleting files generates new metadata.)
But I don't know if SUSE has backported this to ..
I personally like btrfs, but it has some rough edges you must know 😉
Best regards,
Daniel
Thank you Daniel.
We have eight broken systems now, mainly showing kernel oopses like the following and marking the filesystem read-only. In this case it was "just" /usr/sap, but we had other occurrences where it was the filesystem that holds the database data or log files. In that case the filesystem is broken and one has to restore from a backup.
[ 39.497688] WARNING: CPU: 5 PID: 3145 at ../fs/btrfs/super.c:259 __btrfs_abort_transaction+0x4b/0x120 [btrfs]()
[ 39.497690] BTRFS: Transaction aborted (error -5)
[ 39.497692] Modules linked in: iscsi_ibft iscsi_boot_sysfs af_packet btrfs xfs libcrc32c nls_iso8859_1 nls_cp437 raid6_pq xor vfat fat vmw_balloon coretemp ppdev crc32c_intel vmxnet3 vmw_vmci shpchp parport_pc pcspkr i2c_piix4 serio_raw processor battery ac parport efivars button efivarfs ext4 crc16 mbcache jbd2 vmwgfx ttm drm floppy sr_mod cdrom sd_mod ata_generic ata_piix ahci libahci libata vmw_pvscsi dm_mirror dm_region_hash dm_log dm_mod sg scsi_mod autofs4
[ 39.497743] Supported: Yes
[ 39.497747] CPU: 5 PID: 3145 Comm: sapstartsrv Not tainted 3.12.44-52.10-default #1
[ 39.497750] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.0.B64.1410210136 10/21/2014
[ 39.497754] ffffffffa06b5550 ffffffff81510581 ffff8807c4b21ad8 ffffffff81055362
[ 39.497759] ffff8808147ffa28 ffff8807c4b21b28 00000000fffffffb ffffffffa06b3e50
[ 39.497764] 00000000000016b2 ffffffff810553ec ffffffffa06b8c88 0000000000000020
[ 39.497769] Call Trace:
[ 39.497791] [<ffffffff8100471d>] dump_trace+0x7d/0x2d0
[ 39.497798] [<ffffffff81004a04>] show_stack_log_lvl+0x94/0x170
[ 39.497804] [<ffffffff81005e31>] show_stack+0x21/0x50
[ 39.497812] [<ffffffff81510581>] dump_stack+0x41/0x51
[ 39.497821] [<ffffffff81055362>] warn_slowpath_common+0x82/0xc0
[ 39.497829] [<ffffffff810553ec>] warn_slowpath_fmt+0x4c/0x50
[ 39.497844] [<ffffffffa060dc0b>] __btrfs_abort_transaction+0x4b/0x120 [btrfs]
[ 39.497883] [<ffffffffa062065f>] __btrfs_free_extent+0x30f/0xc40 [btrfs]
[ 39.497930] [<ffffffffa0625ad2>] __btrfs_run_delayed_refs+0x912/0x11d0 [btrfs]
[ 39.497981] [<ffffffffa062a459>] btrfs_run_delayed_refs.part.66+0x69/0x280 [btrfs]
[ 39.498037] [<ffffffffa063c40d>] __btrfs_end_transaction+0x2ad/0x3d0 [btrfs]
[ 39.498113] [<ffffffffa0645629>] btrfs_truncate+0x1e9/0x2b0 [btrfs]
[ 39.498195] [<ffffffffa0646100>] btrfs_setattr+0x230/0x2e0 [btrfs]
[ 39.498266] [<ffffffff811bc6e1>] notify_change+0x231/0x390
[ 39.498275] [<ffffffff8119fca5>] do_truncate+0x65/0x90
[ 39.498283] [<ffffffff8119ffff>] do_sys_ftruncate.constprop.11+0x11f/0x180
[ 39.498294] [<ffffffff8151e789>] system_call_fastpath+0x16/0x1b
[ 39.498302] [<00007ffff5e3fa97>] 0x7ffff5e3fa96
[ 39.498305] ---[ end trace 4280fc12485ab7b5 ]---
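For anyone who has to sift through many such logs, here is a small Python sketch that pulls the abort error code and the btrfs frames out of a trace like the one above. The regular expressions are my own guess at the line layout shown here, and the function name is made up. The only hard fact used is that error -5 corresponds to errno 5, which is EIO (I/O error).

```python
import errno
import re

# Matches the abort message as it appears in the trace above.
ABORT_RE = re.compile(r"BTRFS: Transaction aborted \(error (-?\d+)\)")
# Matches call trace frames such as "[<addr>] func+0x4b/0x120 [btrfs]".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?:[ \t]+\[(\w+)\])?"
)


def summarize_abort(log_text):
    """Return errno number, symbolic name and btrfs frames, or None if no abort is logged."""
    match = ABORT_RE.search(log_text)
    if match is None:
        return None
    code = abs(int(match.group(1)))
    frames = [func for func, module in FRAME_RE.findall(log_text) if module == "btrfs"]
    return {
        "errno": code,
        "name": errno.errorcode.get(code, "unknown"),
        "btrfs_frames": frames,
    }


# A few lines copied from the trace above.
SAMPLE = """\
[ 39.497690] BTRFS: Transaction aborted (error -5)
[ 39.497844] [<ffffffffa060dc0b>] __btrfs_abort_transaction+0x4b/0x120 [btrfs]
[ 39.497883] [<ffffffffa062065f>] __btrfs_free_extent+0x30f/0xc40 [btrfs]
[ 39.498266] [<ffffffff811bc6e1>] notify_change+0x231/0x390
"""

if __name__ == "__main__":
    print(summarize_abort(SAMPLE))
```

In the trace above the abort is raised while freeing an extent during a truncate issued by sapstartsrv, which is why `__btrfs_free_extent` is the first btrfs frame below the abort itself.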
Those problems seem to occur really randomly; in most cases they happen under no load, when the system is just sitting there.
They all appeared when we used SLES 11 SP3 kernels > 3.0.101-0.29, most of them with the latest kernel 0.55, but also with SLES12 (as you can see here).
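If you want to check a whole landscape against that threshold, a rough Python sketch like the following could compare `uname -r` style release strings with 3.0.101-0.29. It simply compares the numeric fields as tuples, which is an assumption on my side and not the RPM version comparison algorithm. The helper names are made up, and "newer than the baseline" only means "worth a closer look", not "affected".

```python
import re

BASELINE = "3.0.101-0.29"  # last kernel reported as unproblematic in this thread

# Version part plus an optional numeric release part, e.g. "3.0.101-63".
RELEASE_RE = re.compile(r"\d+(?:\.\d+)*(?:-\d+(?:\.\d+)*)?")


def release_key(release):
    """Turn '3.0.101-0.29' or '3.0.101-63-default' into a tuple of ints."""
    match = RELEASE_RE.match(release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(number) for number in re.findall(r"\d+", match.group(0)))


def newer_than_baseline(release, baseline=BASELINE):
    """True if the release sorts after the baseline when compared field by field."""
    return release_key(release) > release_key(baseline)


if __name__ == "__main__":
    # Release strings taken from this thread.
    for release in ("3.0.101-63-default", "3.12.44-52.10-default", "3.0.101-0.29"):
        print(release, newer_than_baseline(release))
```

Note that the SP4 kernel 3.0.101-63 quoted earlier in the thread sorts after the baseline as well, although no problems were reported with it for / only.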
Markus
With SLES12 btrfs is the default file system for /. This is part of the idea to get a rollback functionality for the *system*, for example after a failed system update.
The following tutorial session from SUSECon2014 explains some of the ideas, concepts, requirements and limits. It's about "myth and truth".
http://www.susecon.com/doc/2014/sessions/TUT5802.pdf
Hope that helps a bit to distinguish between the SLES11 (SP3) and SLES12 feature sets.
Hi Fabian,
thank you for sharing.
I know that, and snapshot functionality is just what I planned to use (pre-downtime on upgrades etc.) instead of storage system snapshots.
Unfortunately, seven (7) systems crashed with various BtrFS-related filesystem errors in the last five months. We even had data loss on two (smaller) systems because the log and log mirror filesystems were hosed (on BtrFS).
One system had a root filesystem on LVM with BtrFS that was "full"; rebalancing didn't work because there was allegedly "no space available", so eventually we also had to restore that system from a backup (that one was on SLES12). SuSE support could not help us either.
It may work, but our experience shows it's not really stable, neither for root nor for application or database data. Hence we migrated all our instances (60+) away from BtrFS and use ext3 and xfs now.
--
Markus
I hope you informed SUSE support about your migration from BtrFS to ext3, so they can pass your bad experiences on to development. I am trying to research this issue internally.
If you are interested, please send me a direct email, because I cannot see your contact data here in SCN. I may need to know your company name to reference your issue.
Hi Fabian,
yes, there are a few SRs open with SuSE. We got a kernel that should help in finding the problem, but since we already migrated all systems away in the last weeks (I literally had no weekend in 2 months), we don't have a system on BtrFS any more and hence can't install that kernel to see if it helps narrow down the original issue.
Our SAP customer no. is 36620, if that may help. You can also check the Novell SRs 10954983661 and 10964582722.
Another problem with BtrFS as root filesystem is that a DB2 installation expects a real "filesystem" for /tmp and not a submount. An OSS call (560294/2015) stated that BtrFS as a /tmp submount for a DB2 installation is not supported.
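To see up front which filesystem /tmp actually lives on, a rough Python sketch can look up the mount entry that covers a path in text in /proc/mounts format. Reading "submount" as "/tmp is backed by btrfs" is my interpretation, not something the OSS call spells out, and the sample mount table and function name below are made up.

```python
def mount_for(mounts_text, path):
    """Return (device, mountpoint, fstype, options) of the mount covering `path`, or None."""
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        device, mountpoint, fstype, options = fields[:4]
        covers = path == mountpoint or path.startswith(mountpoint.rstrip("/") + "/")
        # Prefer the longest mount point; on ties the later entry wins.
        if covers and (best is None or len(mountpoint) >= len(best[1])):
            best = (device, mountpoint, fstype, options.split(","))
    return best


# Hypothetical mount table in /proc/mounts format.
SAMPLE_MOUNTS = """\
/dev/sda2 / btrfs rw,relatime 0 0
/dev/sda2 /tmp btrfs rw,relatime 0 0
/dev/sdb1 /db2 xfs rw,relatime 0 0
"""

if __name__ == "__main__":
    entry = mount_for(SAMPLE_MOUNTS, "/tmp")
    print("/tmp is on", entry[2], "mounted at", entry[1])
    # On a real system, pass the contents of /proc/mounts instead of the sample.
```

If the result for /tmp says btrfs, it is probably worth clarifying with support before starting the DB2 installer.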
Regards,
Markus