on 2015 Jul 31 1:15 PM
Hi all,
Is anyone using SLES systems with BtrFS (for /, for the database, or for both)? I'd like to hear (and share) experiences.
Regards,
Markus
Hi Markus,
We have been running btrfs for / since SLES 11 SP2 on over 60 systems.
By now we only have a mix of SP3 and SP4.
No problems so far.
Regards,
Daniel
Hi Daniel,
thank you for your input.
We run/ran roughly 80 systems on BtrFS, and there seems to be a regression in kernels > 3.0.101-0.29 that may corrupt the filesystem. SuSE is still investigating.
Seven systems have crashed so far with filesystem errors where the database had to be restored from backup, including our central BW (1.5+ TB) twice. Two systems could not be restored completely because the filesystem holding the database online logs was hosed. Those crashes happened mostly under little to no load. Older kernels do not have this problem; they run flawlessly.
Which kernel versions do you use?
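To check a larger number of hosts against that threshold, a small comparison helper may be useful. This is only a sketch: it assumes the release string has the form printed by `uname -r` (for example `3.0.101-63-default`), and the function names are made up for illustration. It applies the "> 3.0.101-0.29" rule from this thread literally and says nothing about whether a given kernel is actually affected.

```python
import re

def kernel_key(release):
    """Turn a release string such as '3.0.101-63-default' into a tuple of ints."""
    # Drop a trailing flavor suffix such as '-default' (assumed format).
    release = re.sub(r"-[a-z][a-z0-9]*$", "", release)
    return tuple(int(n) for n in re.findall(r"\d+", release))

# Last kernel reported in this thread as running without problems.
LAST_KNOWN_GOOD = kernel_key("3.0.101-0.29")

def is_newer_than_last_good(release):
    """True if the release sorts after 3.0.101-0.29 when compared field by field."""
    return kernel_key(release) > LAST_KNOWN_GOOD

if __name__ == "__main__":
    import platform
    release = platform.release()
    print(release, "newer than 3.0.101-0.29:", is_newer_than_last_good(release))
```

The comparison is purely numeric and field by field, so it should be read as a first filter, not as a statement about which builds contain the regression.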
Markus
Hi Markus,
We are running the newest SP4 kernels:
Linux 3.0.101-63-default #1 SMP Tue Jun 23 16:02:31 UTC 2015 (4b89d0c) x86_64 x86_64 x86_64 GNU/Linux
But we don't have databases on btrfs, only / (the OS installation). All SAN data is on ext3, since we use the snapshots of the storage system.
Best regards, Daniel
For this problem there is one solution:
Boot another, newer live Linux, such as a Gentoo boot CD,
add a USB stick or other block device to the filesystem, delete snapshots or data, then shrink the filesystem back to
the original device only.
You must use another Linux, since adding and removing devices on a
btrfs filesystem is disabled in SLES.
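The procedure above can be sketched as a sequence of `btrfs` commands. The snippet below only builds and prints that sequence; it does not run anything. The mount point, the device name `/dev/sdX` and the snapshot path are placeholders, not values from any real system, and `rescue_plan` is a hypothetical helper name. "Shrinking back to the original device" is taken here to mean `btrfs device delete` on the temporary device, which is an assumption about what is meant.

```python
import shlex

def rescue_plan(mount_point, temp_device, snapshots):
    """Return the btrfs commands for the procedure, in order, without running them."""
    # 1. Add a temporary device so the full filesystem has room to work again.
    plan = [["btrfs", "device", "add", temp_device, mount_point]]
    # 2. Delete snapshots (or other data) to free space.
    for snap in snapshots:
        plan.append(["btrfs", "subvolume", "delete", snap])
    # 3. Remove the temporary device again; btrfs moves its data back first.
    plan.append(["btrfs", "device", "delete", temp_device, mount_point])
    return plan

if __name__ == "__main__":
    # Placeholder values, to be replaced with the real mount point and devices.
    plan = rescue_plan("/mnt/rescue", "/dev/sdX",
                       ["/mnt/rescue/.snapshots/1/snapshot"])
    for cmd in plan:
        print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the plan first makes it possible to review every command by hand from the live system before touching the damaged filesystem.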
Newer kernels also have a protection (they reserve some space),
so the filesystem is always mountable (even if it's full) and you can always
delete files. (Remember, deleting files generates new metadata.)
But I don't know if SUSE has backported this to ..
I personally like btrfs, but it has some edges you must know 😉
Best regards,
Daniel
Thank you Daniel.
We have eight broken systems now, mostly showing kernel oopses like the following and marking the filesystem read-only. In this case it was "just" /usr/sap, but we had other occurrences where it was the filesystem that holds the database data or log files. In that case the filesystem is broken and one has to restore from a backup.
[ 39.497688] WARNING: CPU: 5 PID: 3145 at ../fs/btrfs/super.c:259 __btrfs_abort_transaction+0x4b/0x120 [btrfs]()
[ 39.497690] BTRFS: Transaction aborted (error -5)
[ 39.497692] Modules linked in: iscsi_ibft iscsi_boot_sysfs af_packet btrfs xfs libcrc32c nls_iso8859_1 nls_cp437 raid6_pq xor vfat fat vmw_balloon coretemp ppdev crc32c_intel vmxnet3 vmw_vmci shpchp parport_pc pcspkr i2c_piix4 serio_raw processor battery ac parport efivars button efivarfs ext4 crc16 mbcache jbd2 vmwgfx ttm drm floppy sr_mod cdrom sd_mod ata_generic ata_piix ahci libahci libata vmw_pvscsi dm_mirror dm_region_hash dm_log dm_mod sg scsi_mod autofs4
[ 39.497743] Supported: Yes
[ 39.497747] CPU: 5 PID: 3145 Comm: sapstartsrv Not tainted 3.12.44-52.10-default #1
[ 39.497750] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.0.B64.1410210136 10/21/2014
[ 39.497754] ffffffffa06b5550 ffffffff81510581 ffff8807c4b21ad8 ffffffff81055362
[ 39.497759] ffff8808147ffa28 ffff8807c4b21b28 00000000fffffffb ffffffffa06b3e50
[ 39.497764] 00000000000016b2 ffffffff810553ec ffffffffa06b8c88 0000000000000020
[ 39.497769] Call Trace:
[ 39.497791] [<ffffffff8100471d>] dump_trace+0x7d/0x2d0
[ 39.497798] [<ffffffff81004a04>] show_stack_log_lvl+0x94/0x170
[ 39.497804] [<ffffffff81005e31>] show_stack+0x21/0x50
[ 39.497812] [<ffffffff81510581>] dump_stack+0x41/0x51
[ 39.497821] [<ffffffff81055362>] warn_slowpath_common+0x82/0xc0
[ 39.497829] [<ffffffff810553ec>] warn_slowpath_fmt+0x4c/0x50
[ 39.497844] [<ffffffffa060dc0b>] __btrfs_abort_transaction+0x4b/0x120 [btrfs]
[ 39.497883] [<ffffffffa062065f>] __btrfs_free_extent+0x30f/0xc40 [btrfs]
[ 39.497930] [<ffffffffa0625ad2>] __btrfs_run_delayed_refs+0x912/0x11d0 [btrfs]
[ 39.497981] [<ffffffffa062a459>] btrfs_run_delayed_refs.part.66+0x69/0x280 [btrfs]
[ 39.498037] [<ffffffffa063c40d>] __btrfs_end_transaction+0x2ad/0x3d0 [btrfs]
[ 39.498113] [<ffffffffa0645629>] btrfs_truncate+0x1e9/0x2b0 [btrfs]
[ 39.498195] [<ffffffffa0646100>] btrfs_setattr+0x230/0x2e0 [btrfs]
[ 39.498266] [<ffffffff811bc6e1>] notify_change+0x231/0x390
[ 39.498275] [<ffffffff8119fca5>] do_truncate+0x65/0x90
[ 39.498283] [<ffffffff8119ffff>] do_sys_ftruncate.constprop.11+0x11f/0x180
[ 39.498294] [<ffffffff8151e789>] system_call_fastpath+0x16/0x1b
[ 39.498302] [<00007ffff5e3fa97>] 0x7ffff5e3fa96
[ 39.498305] ---[ end trace 4280fc12485ab7b5 ]---
Those problems seem to occur really randomly; in most cases they happen under no load, when the system is just sitting there.
They all appeared when we used SLES 11 SP3 kernels > 3.0.101-0.29, most of them with the latest kernel 0.55, but also with SLES12 (as you can see here).
Markus