How to tune ASE I/O performance in a Docker container?

former_member232292
Participant

Dear All,

I want to create a Docker image for Sybase ASE/IQ, and I've run into a problem: while the DB is running, there is a lot of extra I/O on the host generated by kworker threads, and it hurts I/O performance badly. I can't find a solution for it. Please kindly advise. Here are the details --

I'm using an SLES11 image from Docker Hub -- https://hub.docker.com/r/darksheer/sles11sp4 -- and installed Sybase ASE 15.7 SP141 in a container based on it. While creating the DB server I found the following --

1. The srvbuildres command runs very slowly -- it usually takes 3-5 minutes on the host, but it takes 1.5 hours to complete in the Docker container.

2. I used "top -d 1" and "iostat -x -k 1" to check the I/O load -- iowait is always low, but svctm is high -- which means the I/O is very slow.

3. I used pidstat to trace the I/O requests on the host and found that most of the I/O is issued by kworker threads --

Here's a sample taken while running a small test -- a "create database" -- in the Docker container --

08:06:05  UID   PID    kB_rd/s  kB_wr/s   kB_ccwr/s  iodelay  Command
08:06:06  1105  66252  0.00     16.00     0.00       0        dataserver
08:06:06  0     66528  0.00     0.00      0.00       1        kworker/2:2
08:06:06  0     66574  0.00     2400.00   0.00       0        kworker/1:0
08:06:06  0     66584  0.00     96.00     0.00       0        kworker/u256:1
08:06:06  UID   PID    kB_rd/s  kB_wr/s   kB_ccwr/s  iodelay  Command
08:06:07  0     65159  0.00     720.00    0.00       0        kworker/u256:7
08:06:07  1105  66252  0.00     112.00    0.00       0        dataserver
08:06:07  0     66528  0.00     11696.00  0.00       2        kworker/2:2
08:06:07  0     66530  0.00     14368.00  0.00       0        kworker/3:1
08:06:07  0     66573  0.00     4768.00   0.00       0        kworker/0:2
08:06:07  0     66574  0.00     4960.00   0.00       1        kworker/1:0
08:06:07  0     66584  0.00     848.00    0.00       0        kworker/u256:1
08:06:07  UID   PID    kB_rd/s  kB_wr/s   kB_ccwr/s  iodelay  Command
08:06:08  0     65159  0.00     2304.00   0.00       0        kworker/u256:7
08:06:08  1105  66252  0.00     208.00    0.00       0        dataserver
08:06:08  0     66528  0.00     18464.00  0.00       0        kworker/2:2
08:06:08  0     66530  0.00     20608.00  0.00       1        kworker/3:1
08:06:08  0     66573  0.00     2256.00   0.00       0        kworker/0:2
08:06:08  0     66574  0.00     18256.00  0.00       0        kworker/1:0
08:06:08  0     66584  0.00     192.00    0.00       0        kworker/u256:1

The I/O from the kworker threads is much higher than from the DB process ("dataserver"), and it made the "create database" take 5 minutes to complete. I ran the same test on the host, where pidstat shows --

eisen-suse11:~ # pidstat -d 1
Linux 3.0.101-63-default (eisen-suse11)  01/19/22  _x86_64_

13:30:07  PID   kB_rd/s  kB_wr/s  kB_ccwr/s  Command
13:30:08  PID   kB_rd/s  kB_wr/s  kB_ccwr/s  Command
13:30:09  PID   kB_rd/s  kB_wr/s  kB_ccwr/s  Command
13:30:10  4860  0.00     4.00     0.00       isql
13:30:10  PID   kB_rd/s  kB_wr/s  kB_ccwr/s  Command
13:30:11  4845  404.00   404.00   0.00       dataserver
13:30:11  PID   kB_rd/s  kB_wr/s  kB_ccwr/s  Command
13:30:12  PID   kB_rd/s  kB_wr/s  kB_ccwr/s  Command

So without that kworker overhead, the same "create database" command completed in just 1 second... I can't find any documentation on this; I only found how to limit the CPU/memory/GPU resources of a container -- https://docs.docker.com/config/containers/resource_constraints/ -- but nothing about I/O tuning. Please kindly help. Thanks in advance for any ideas.
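
For reference, the storage driver and backing filesystem used by the Docker daemon on the host can be checked like this (a minimal sketch assuming a standard Docker CLI; /var/lib/docker is just the usual default data directory, not confirmed here):

# Show the storage driver (e.g. btrfs, overlay2) and, where reported, its backing filesystem
docker info | grep -iE 'storage driver|backing filesystem'

# Show which filesystem backs the Docker data directory (default path used as an example)
df -T /var/lib/docker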

Regards

Eisen

former_member232292
Participant

I did an analysis of the pidstat output and found --

docker:/tmp # cat d2_pid.log |grep dataserver|awk 'BEGIN{io=0} {io=io+$5} END{print io}'

897640

docker:/tmp # cat d2_pid.log |grep kworker|awk 'BEGIN{io=0} {io=io+$5} END{print io}'

5.21821e+07

The I/O from kworker is about 50 times the I/O from dataserver... And I tested with another Sybase ASE Docker image from Docker Hub -- ASE 16.0 on CentOS -- it's just the same... No idea whether SAP IQ would be any better...
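
For reference, the same comparison can be done in a single awk pass over the pidstat log (a sketch; d2_pid.log and the kB_wr/s column position match the capture above):

# Sum kB_wr/s (field 5) for dataserver vs. kworker lines and print the ratio
awk '/dataserver/ {db += $5} /kworker/ {kw += $5} END {print "dataserver:", db, "kworker:", kw, "ratio:", kw/db}' d2_pid.log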

Accepted Solutions (1)

former_member232292
Participant

I found the key --

The Docker host is SLES12, so all the filesystems on it are Btrfs by default, and Btrfs generates a lot of journal-logging activity while the DB is running in the Docker container.

After putting the device files on an ext3/ext4 filesystem and mounting that into the Docker container, the issue is fixed.
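
For anyone hitting the same thing, a minimal sketch of that setup (the partition, paths and image name below are examples, not the exact ones used here):

# Create an ext4 filesystem on a spare partition and mount it on the host
mkfs.ext4 /dev/sdb1            # example spare partition
mkdir -p /sybase_data
mount /dev/sdb1 /sybase_data

# Bind-mount the ext4-backed directory into the ASE container and keep the
# database device files there instead of on the Btrfs-backed container layer
docker run -d --name ase157 \
  --mount type=bind,source=/sybase_data,target=/sybase_data \
  my_ase_image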

Answers (7)

former_member232292
Participant

@Chris Baker

Thanks -- I've tested all three: 1. inside the container itself; 2. on an external volume created with "-v ..."; 3. on an external folder mounted into the container with "--mount type=bind"... All the same... Whenever the DB issues 1 I/O, there are 50+ I/Os on the host from kworker... And when there's no activity on the DB, the kworkers all go quiet, lying in wait like wolves on a hunt... (The three placements are sketched below.)
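
For clarity, the three placements look roughly like this (image name, volume name and paths are placeholders):

# 1. Device files inside the container's own writable layer
docker run -d --name ase_internal my_ase_image

# 2. Device files on a named Docker volume
docker volume create sybdata
docker run -d --name ase_volume -v sybdata:/sybase_data my_ase_image

# 3. Device files on a host directory bind-mounted into the container
docker run -d --name ase_bind --mount type=bind,source=/host/sybase_data,target=/sybase_data my_ase_image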

@Ben Slade

Thanks a lot. But it's not due to memory fragmentation -- during my tests I also monitored swap and the svmon output, and there wasn't much memory fragmentation going on. Your point about "direct_io" is really helpful, though... so I ran another test right away -- but I found that ASE is already using "direct_io" on the device --

00:0000:00000:00000:2022/01/21 08:51:17.38 kernel Virtual device 0 started using asynchronous (with DIRECTIO) i/o.

sladebe
Active Participant

This forum comment reminded me of your question:

https://github.com/moby/moby/issues/21485#issuecomment-222941103

the forum comment says:

cyphar commented on Jun 1, 2016:

@ipeoshir I have some proposed fixes from our kernel team, which were mirrored in the internal ticket. Basically it boils down to three options that can help the performance:

  1. Switch IO scheduler on the underlying disk to 'deadline' - by that
    you'll completely lose proportional IO weighting between blkio cgroups and
    also some other features of CFQ IO scheduler. But it may work fine.
    You can do the switch by doing:

echo deadline >/sys/block/<device>/queue/scheduler

  2. A less drastic option - turn off CFQ scheduler idling by:

echo 0 >/sys/block/<device>/queue/iosched/slice_idle
echo 0 >/sys/block/<device>/queue/iosched/group_idle

After that CFQ IO scheduler will not wait before switching to serving
another process / blkio cgroup. So performance will not suffer when using
blkio cgroups but "IO hungry" cgroup / process can get disproportionate
amount of IO time compared to cgroup that does not have IO always ready.

  3. Switch the underlying filesystem to btrfs or XFS.

Using data=journal mode of ext4 as mentioned in <previous comment> has other performance implications (in general the performance is going to be much worse because all the writes happen twice - once to the journal and once to the final location on disk) so I would not consider that an ideal solution.

FYI, "CFQ" mean completely fair scheduler. For our regular Linux (non-docker) ASE servers, we use the Linux "deadline" I/O scheduler.
former_member232292
Participant

Ah, great! Thanks a lot for sharing.

sladebe
Active Participant

Re: I used "top -d 1" and "iostat -x -k 1" to check the I/O load -- iowait is always low, but svctm is high -- which means the I/O is very slow.

High svctm is a big deal. It means your Docker I/O setup is slow at the device level. All the kworker activity could be the result rather than the cause of the problem, i.e. ASE queues up a lot of async I/O, which creates more work in the kernel (this is conjecture, not a confirmed cause).

Can you talk to your hosting provider about getting faster I/O for your Docker instance? E.g., allocate a cloud host with locally attached SSD disks. It'll cost more, of course.

FYI, I use these iostat options to monitor I/O:

iostat -xmty 5
# -x Display extended statistics.
# -m Display statistics in megabytes per second.
# -t Print the time for each report displayed
# -y Omit first report with statistics since system boot
former_member232292
Participant

I ran another test on this Docker container with fio instead of ASE... and found that while fio produced I/O, there was also some extra kworker I/O -- but only a little, about 2.6% of the I/O from fio...

ffdd4b529ece:/tmp/installer # fio --ioengine=libaio --bs=8k --direct=1 --rw=randrw --filename=/data/diskspeed --size=200M --iodepth=8 --runtime=60 --name=ttt

ttt: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=libaio, iodepth=8

fio-2.2.10

Starting 1 process

ttt: Laying out IO file(s) (1 file(s) / 200MB)

......

docker:~ # cat /tmp/pidstat.log|grep fio |awk 'BEGIN{ios=0} {ios=ios+$5} END{print ios}'

102159

docker:~ # cat /tmp/pidstat.log|grep kworker |awk 'BEGIN{ios=0} {ios=ios+$5} END{print ios}'

2741.97
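
For reference, the /tmp/pidstat.log used above was captured on the host while fio was running, roughly like this (a sketch; the 1-second interval and 60-sample count are arbitrary):

# On the host: record per-process disk I/O once a second for 60 samples
pidstat -d 1 60 > /tmp/pidstat.log &

# ...then start the fio run in the container and let both finish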

sladebe
Active Participant

kworker I/O might have to do with kernel memory management in Linux. I've had problems before with the somewhat related kswapd process spinning when Linux kernel memory is fragmented. Non-uniform memory architecture (NUMA) can also get involved if the kernel spends time moving memory pages to the memory node that uses them the most (which doesn't make much sense for a database server).

In Linux, all regular (buffered filesystem) I/O goes through the virtual memory subsystem, which does its caching in the page cache. This can sometimes cause the Linux virtual memory management system to start thrashing.
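
One way to see whether page-cache writeback is what the kworker threads are busy with is to watch the dirty/writeback counters on the host while the load runs (a sketch using the standard /proc interface):

# Amount of page-cache data waiting to be written back, sampled every second
watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'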

One thing to try is to use block devices with direct_io instead of buffered filesystem I/O. (This can sometimes decrease performance, because Sybase then has to cache the data itself instead of relying on the filesystem cache, but it can also increase performance by keeping the OS from getting overloaded with useless cached I/O.)
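
One quick way to confirm what mode each ASE device is using is to check its startup messages in the ASE errorlog (a sketch; the errorlog path and <SERVER_NAME> are placeholders for your installation):

# Each virtual device logs its I/O mode at startup, e.g.
# "Virtual device 0 started using asynchronous (with DIRECTIO) i/o."
grep -i "virtual device" $SYBASE/$SYBASE_ASE/install/<SERVER_NAME>.log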

Having said this, it could also be something totally different like a bad driver interacting poorly with the OS.

c_baker
Product and Topic Expert

Where are your data devices? Are they managed under another container instance, on the host, or part of the ASE container instance?