Open Source Blog Posts
Immerse yourself in SAP open source! Discover collaborative projects, insights into the latest technologies, and best practices in open source development.
JimSpath
SAP Champion

A few years ago, I started using open source software. I exaggerate; it's been decades. 

X Windows on NetBSD with SAP GUI etc.

The above screenshot shows ST04, among other artifacts, and a November 1999 time stamp, which puts the SAP R/3 ERP version somewhere between 3.1G (1998: https://www.hpcwire.com/1998/01/16/sun-posts-record-benchmark-sapr3/) and 4.5 (https://community.sap.com/t5/application-development-discussions/sap-r-3-different-versions-release-... ). Among the open source software on screen are Lynx (a text browser) on BSD, links to Lynx on Windows NT, VNC (a network session connector), a music player, a screen grabber, and a hacked-up version of xclock for multiple time zones (xchrono). The commercial software was from Platinum, a distant relation of today's open source Zabbix suite. Scheduling and benchmark apps are not shown here.

jamesSpath-drillbit2.png

The second image shows a further drill-down into a database object, the CDCLS table and its primary index, along with commercial apps showing database space consumption and server performance.

Beta Test

I found out about the OS beta test release about the same time as I learned of the software quality assurance suite, which had been added well after the 1999 timeframe shown above. When I began running tests, I figured it would be a pass/fail, run-till-it-stops cycle. Little did I know the twists in the road, nor, incidentally, the duration of the test cycle window. Researching the history of the quality tests, I found Google "Summer of Code" projects that resulted in the user test framework being included in the base OS release. What does this mean for other SAP-focused technologists? Nothing specific, just a general idea that many opportunities to contribute exist; I picked one that I enjoyed checking out. Follow the thought process...
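If you want to follow along, the test suite ships with the OS. A minimal sketch of a full cycle, assuming a stock NetBSD install with the test programs under /usr/tests and kyua in the base system:

$ cd /usr/tests
# run everything; on small hardware this can take hours
$ kyua test

Driving runs like this from cron and saving the standard output text is exactly the habit the Kyua results database (see the conclusion) would replace.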

The BETA was first cut over a year ago, in December 2022: https://blog.netbsd.org/tnf/entry/netbsd_10_0_beta_available

Case 1 - repaired

I filed my first bug (#57284) in March 2023, on the BETA release of 12-Feb-2023.

NetBSD one.machine 10.0_BETA NetBSD 10.0_BETA (GENERIC) #0: Sun Feb 12 12:39:37 UTC 2023  

And fixed in a couple weeks!

State-Changed-When: Sat, 15 Apr 2023 12:14:29 +0000

Let's back up, though. Why did I decide this bug was worthy of reporting, and what steps did I take to send a cogent write-up? One signal I noticed among the passes and failures was a difference in one test's results between environments. In this case, I was fortunate to have run the test case on two different OS architecture releases (more or less 32-bit vs. 64-bit) and found repeated errors. I searched for the messages and read the source code of the failed test. As published, it looked okay to me, but those more knowledgeable unraveled the old code and produced a revamped design that works across multiple architectures (CPU and environmental instrumentation continues apace). Short explanation: hard-coded values were replaced with dynamic ones.

Why was this bug not already reported, and why did it happen in the first place? I'm unsure, except that writing good tests is even harder than sending in good bug reports. "Steps to reproduce" is often key, though not always possible to show.
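One habit that helps with "steps to reproduce": re-run just the suspect case and capture its raw output. A sketch, assuming kyua; the test names below are placeholders, not the actual case from #57284:

$ cd /usr/tests
# re-run a single test program instead of the whole tree
$ kyua test lib/libc/gen/t_siginfo
# dump one test case's raw stdout/stderr for the write-up
$ kyua debug lib/libc/gen/t_siginfo:sigfpe_int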

Case 2 - unsolved

The second bug I reported came from a test intended to check for vnode leakage, except I stumbled into a sometimes-yes, sometimes-no, it-depends situation. The case is still open, with no clues.

Case 3 - random

My friend @Blag would have liked this case, where a regex test ran a small system out of swap memory, as did other tests. But not reliably ("limited to Raspberry Pi Zero 2W systems"). Test conditions are not supposed to harm a running system. Except, that's why we test (or one reason).

 

[ 316969.2920714] UVM: pid 24060 (h_libarchive), uid 0 killed: out of swap
[ 317517.6748259] UVM: pid 1319 (t_exhaust), uid 0 killed: out of swap

 

On the tiny Pi, no swap is configured at all, so the "out of swap" message is misleading.

 

$ swapctl -l
no swap devices configured

 

The first "out of swap" message above is from an archive library test run, while the second is the regex. Across different CPU architectures the failures are noticeably different. Swap size matters although not just that. I noted my scale-up results in the regex ticket:

 

 My test summary (architecture/memory):
 i386 3GB - 100% pass
 pi4:arm64 8GB - 100% pass
 pi3:arm64 1GB - 90% pass
 pi0:arm32 512MB - 10% pass (no swap)
       512MB - 90% pass (small swap file)
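The "small swap file" entry came from an experiment like this sketch (NetBSD's swapctl can add a plain file; path and size are arbitrary examples):

# create a 256 MB file and hand it to the pager
$ dd if=/dev/zero of=/swapfile bs=1m count=256
$ chmod 600 /swapfile
$ swapctl -a /swapfile
# confirm it took
$ swapctl -l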

 

The archive library ticket was closed due to invalid test conditions (running from an SD card); however, hearing about the xz backdoor reminded me of the variety of archivers available on a modern OS. For NetBSD, a full set is bundled in the base distribution, and an entire fleet of extra packages (>100) is available under a dedicated pkgsrc directory (/usr/pkgsrc/archivers/).
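Counting or building those extras takes one line each; a sketch, assuming a pkgsrc checkout in the usual place (zstd is just an example package):

# how many archivers does pkgsrc carry?
$ ls /usr/pkgsrc/archivers | wc -l
# build and install one of them
$ cd /usr/pkgsrc/archivers/zstd
$ make install clean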

 

HISTORY
     The libarchive library first appeared in FreeBSD 5.3.

AUTHORS
     The libarchive library was originally written by Tim Kientzle
     <kientzle@acm.org>.

 

The archive tests go through an encyclopedia of formats. One failure mode is "run out the clock":

 

/root/tests-am4-202404300842.txt:    
 libarchive: [6000.059110s] Failed: Test case timed out after 6000 seconds

 

Others are #174 and #459 of the ~500 archiver test cases ("se" in the tc-se prefix means standard error):

 

tc-se:Reference files will be read from: /usr/tests/lib/libarchive
tc-se:Exercising: libarchive 3.4.0 zlib/1.2.13 liblzma/5.2.4 bz2lib/1.0.8

tc-se:174: test_read_disk_directory_traversals FAIL
tc-se:459: test_write_disk_times FAIL

 

Case 4 - also random

I reported intermittent results from another set of tests related to code sanitizers (the thread sanitizer, to be specific) that turned out to be compiler-version specific. The failures were seen on only one architecture, with passes and fails in varying percentages.

I later found problem report 5770, "Some tsan tests fail randomly on real hardware," which I had duplicated.

"At this point I recommend to mark these tests as expected-failure and wait for GCC10 in base."

Two items to address here: (1) are the expected failures trapped, and (2) did GCC 10 fix the error?

Because I still see errors reported, the expected-failure logic seems faulty. I need to investigate further, since one symptom I noted was a varying count of failure messages compared to the end-of-report summary count. I have run certain sets of test cases in isolation to get a better profile from more data.

The next step will be to add a pointer to the earlier report and request closure of my case as a duplicate. Meanwhile, the first write-up has a handy pattern for documenting random results: a plain-text matrix of 40 runs per test for a quick summary.

Their notes:

 

The following have failed in the last 40 runs on my bare metal testbed:

X--XX-XXXX--X----X-XX--X-XX---X-----XX-X   usr.bin/c++/t_tsan_data_race:data_race
---X-X-XX-X---X--XX-X----XX---X-XX--X---   usr.bin/c++/t_tsan_data_race:data_race_pic
XXXXX-XX-XX-X-X-X---------------------X-   usr.bin/c++/t_tsan_data_race:data_race_pie
-----X----------------------------------   usr.bin/c++/t_tsan_vptr_race:vptr_race
-------------------------------------X-X   usr.bin/c++/t_tsan_vptr_race:vptr_race_pie

-----------------X----X------X------X---   usr.bin/cc/t_tsan_data_race:data_race
------------X---X-X---------------------   usr.bin/cc/t_tsan_data_race:data_race_pic
--X-------------------------------------   usr.bin/cc/t_tsan_data_race:data_race_pie

 

My notes:

6 core / 12 thread AMD CPU (40+ runs so far)

 

----X---------------------X----X--------   usr.bin/c++/t_tsan_data_race:data_race
----X-----------------------------------   usr.bin/c++/t_tsan_data_race:data_race_pic
--------------------------X----X--------   usr.bin/c++/t_tsan_data_race:data_race_pie
---X-----X---X-------X---------X--------   usr.bin/c++/t_tsan_vptr_race:vptr_race
-------------X--------------------------   usr.bin/c++/t_tsan_vptr_race:vptr_race_pie

X--X----X-----------XX-X-------XX-------   usr.bin/cc/t_tsan_data_race:data_race
---X----X-----------X--X----------------   usr.bin/cc/t_tsan_data_race:data_race_pic
--------------------XX------------------   usr.bin/cc/t_tsan_data_race:data_race_pie

 

2 core Intel Core 2 Duo

 

X-X---XX-X-X--X-XX--XXXXXXXXXXXX-X--XXX-   usr.bin/c++/t_tsan_data_race:data_race
------X-------X-XX---XX-XXX-X--------X--   usr.bin/c++/t_tsan_data_race:data_race_pic
--X------X-------------XX-XXXXXX-----X--   usr.bin/c++/t_tsan_data_race:data_race_pie
-X--XXXXX--XXXXXXXXX--X-XXX--X-XX-XXXXXX   usr.bin/c++/t_tsan_vptr_race:vptr_race
----XX-X---XXXXX-XXX--X-X-X--X--X-XXXXXX   usr.bin/c++/t_tsan_vptr_race:vptr_race_pie

-X--XXX-X-XX-XXXXXXXX-XXX---XX--X-XXX-X-   usr.bin/cc/t_tsan_data_race:data_race
------X----X-----XX---XXX----X--X-----X-   usr.bin/cc/t_tsan_data_race:data_race_pic
-X------X-XX--XX---XX-------X-----XXX---   usr.bin/cc/t_tsan_data_race:data_race_pie

Looks like more cores/threads = fewer failures. I'll write that up.
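Those strips are easy to generate. A sketch of how one row might be scripted, assuming kyua and the test tree above (the loop count and filter are examples):

#!/bin/sh
# run one test case 40 times; print X for a failure, - for a pass
test="usr.bin/c++/t_tsan_data_race:data_race"
strip=""
n=1
while [ $n -le 40 ]; do
    if kyua test -k /usr/tests/Kyuafile "$test" > /dev/null 2>&1; then
        strip="${strip}-"
    else
        strip="${strip}X"
    fi
    n=$((n + 1))
done
printf '%s   %s\n' "$strip" "$test"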

 

Case 5 - wrong values

Math works!

On the tiny-system end, I tested on a Raspberry Pi Zero W and a Zero 2 W. The former has 1 CPU core; the latter has 4. In a series of math tests, the Zero W failed where other systems did not. Example:

 

tc-se:*** Check failed: subtest 23: exp2f(-127) is 0 (0x0.00000000000000p+0)
  not 5.87747e-39 (0x1.0000000000000p-127), error 1 (0x1.000000p+0) > 0
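The failing check is easy to reproduce by hand with a throwaway C snippet; a sketch, compiled with the base toolchain (2^-127 is a subnormal float, so a correct exp2f returns roughly 5.87747e-39 rather than flushing to zero):

$ cat > exp2f_check.c <<'EOF'
#include <math.h>
#include <stdio.h>

int main(void)
{
	/* expect ~5.87747e-39 (0x1p-127); 0 means the subnormal was lost */
	printf("%g\n", (double)exp2f(-127.0f));
	return 0;
}
EOF
$ cc -o exp2f_check exp2f_check.c -lm && ./exp2f_check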

 

Case 6 - missing libraries

Profiling failed on one architecture. Still checking this out.

Automated test framework case hello_profile failed on a Raspberry Pi Zero 2W.
The other tests passed, and this case passes on other architectures I have tried.

The GNU C compiler allows profiling instrumentation to be added to an executable for later tracing and benchmarking. Only, on this architecture, it doesn't.
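On most ports the whole round trip works; on the Zero 2 W the very first link step dies with the error shown below. A sketch of that round trip (hello.c is any trivial program):

# build with profiling instrumentation
$ cc -pg -o hello hello.c
# running it drops a gmon.out trace file in the current directory
$ ./hello
# read back the flat profile and call graph
$ gprof ./hello gmon.out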

 

 tc-se:ld: /tmp//cckFfFOi.o: in function `__gthread_trigger()':
 tc-se:test.cpp:(.text+0xc): undefined reference to `__gnu_mcount_nc'

 

For a good reference on ARM profiling, see: https://mcuoneclipse.com/2015/08/23/tutorial-using-gnu-profiling-gprof-with-arm-cortex-m/

Side note: I've been at this long enough to recognize the committers (Theo and Chris, respectively):

 

/* $OpenBSD: gmon.h,v 1.3 1996/04/21 22:31:46 deraadt Exp $ */
/* $NetBSD: gmon.h,v 1.5 1996/04/09 20:55:30 cgd Exp $ */

 

A 30-year-old code header? Almost!

Case 7 - missing test cases

"Test for long double omitted in t_fpclassify automated framework cases" is not a test failure per se, it is a coverage error that specific tests are skipped for reasons not obvious. Tests may be specific to an architecture so it make sense to skip some, however I don't see why "long double" might be.

Unlike tests where I could not fathom a fix, I suggested one here.

>Fix:
Remove the skipper:
define TEST_LONG_DOUBLE [...]

 Mutiny? I don't think so. 😉
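For what it's worth, the classification those skipped cases would exercise is plain C99; a sketch of my own (not the framework's code):

$ cat > fpcls_ld.c <<'EOF'
#include <math.h>
#include <stdio.h>

int main(void)
{
	long double zero = 0.0L, one = 1.0L;
	/* both lines should print 1 on a correct compiler/libm pairing */
	printf("%d\n", fpclassify(zero) == FP_ZERO);
	printf("%d\n", fpclassify(one) == FP_NORMAL);
	return 0;
}
EOF
$ cc -o fpcls_ld fpcls_ld.c -lm && ./fpcls_ld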

Side views

Beyond viewing test logs, I found other places that needed examining, such as /var/log/messages and, in one instance, the system console. And since I had monitoring agents in place, I could look back at the impact of the tests to see run times and specific stresses. Going back to the first case above (environmental status), it was instructive to view CPU temperature(s) over time as the tests ran.

ARM processor temperature over time.

Other related views showed test impacts such as process count, and helped me decide whether I could run more per day (3 on this system). Starting a full test run before the prior one completes should generally be avoided, though I admit stacking up simultaneous runs just out of curiosity, to see how much is too much.

CPU interrupts

May 5, except that was 2023, not 2024. 

Next Steps and Conclusions

While I ran many test cycles and reported a few bugs, the software version has since been released, so my next reports will be against that, and/or I can test against the latest code base ("current"). However, to be a better contributor, I plan to review the open reports to see if I can add other results ("yup, still an error") or, as with the random failures, locate duplicate reports and tie them together.

Database of results? I found a Google Summer of Code answer: https://summerofcode.withgoogle.com/archive/2018/organizations/5928349130031104

Use Kyua as a future driver instead of cron jobs and standard output text.
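Kyua already stores each run in a results database, so much of the plumbing exists; a sketch of what that future driver could look like (paths are examples, and I'm assuming a reasonably current kyua):

$ cd /usr/tests && kyua test
# each run is kept as a results file under ~/.kyua/store
$ ls ~/.kyua/store
# plain-text summary of the latest run, or a browsable one
$ kyua report --verbose
$ kyua report-html --output /var/www/tests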

Refresh open tickets with better how-to-repeat or how-to-fix suggestions.

Dear reader, you can also contribute in small ways: pick an open source project (archivers, maybe?). Read up on testing, run tests, see results, compare to the standard. Ask questions. Then ask more.