dbeng12 startup hang on osx

Former Member

Folks,

We have an intermittent hang starting dbeng12 via the capi on OSX. It is not 100% repeatable but affects our automated integration tests fairly frequently.

It does not happen with the same tests running on Windows.

Is there any way to get any diagnostic information out of the dbeng12 process to try and narrow down why it is hanging?

Cheers, Dan

Accepted Solutions (1)


Former Member

So, we finally seem to have got to the bottom of this issue, and I thought I'd document it in case anyone else runs into it.

Basically, the issue is that the dynamic library load call we were using has some undesirable behaviour in the debugger when used with a particular flag. In the samples provided with the dbcapi the load library call is as follows:

handle = dlopen(name, RTLD_LAZY);

This is the recommended way of calling this function; however, there are some potential issues with debugging, as documented here: http://tldp.org/HOWTO/Program-Library-HOWTO/dl-libraries.html

So we changed the code to:

handle = dlopen(name, RTLD_NOW);

This gets us past the problems documented above.
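
For anyone wanting to try the same change, here is a minimal sketch of loading a library with RTLD_NOW and reporting the failure reason. With RTLD_NOW, dlopen resolves all undefined symbols before it returns and fails if it cannot, whereas RTLD_LAZY defers resolution until the code is actually executed. The helper name load_library_now and its error-buffer parameters are illustrative assumptions; they are not part of the dbcapi samples.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not from the dbcapi samples).
 * Loads a shared library with every symbol resolved up front (RTLD_NOW).
 * On failure, returns NULL and copies the dlerror() text into err. */
void *load_library_now(const char *name, char *err, size_t err_size)
{
    assert(name != NULL && err != NULL && err_size > 0);

    err[0] = '\0';
    dlerror(); /* clear any stale error from an earlier dl* call */

    void *handle = dlopen(name, RTLD_NOW);
    if (handle == NULL) {
        const char *msg = dlerror();
        snprintf(err, err_size, "%s", msg ? msg : "unknown dlopen error");
    }
    return handle;
}
```

Because unresolved symbols are reported at load time, a failure shows up as a NULL handle with a readable message rather than as a problem later on, when the first lazily bound call is made.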

Thanks to all for help/suggestions.

Cheers, Dan

Answers (2)


Former Member

Ok, finally got one to hang, and unfortunately there was no log file created.

The process is stopped on: (marked with **)

libsystem_kernel.dylib`__wait4:
0x7fff8e46214c:  movl   $33554439, %eax
0x7fff8e462151:  movq   %rcx, %r10
0x7fff8e462154:  syscall
**0x7fff8e462156:  jae    0x7fff8e46215d            ; __wait4 + 17**
0x7fff8e462158:  jmpq   3743
0x7fff8e46215d:  ret    
0x7fff8e46215e:  nop    
0x7fff8e46215f:  nop

The assembly code that is running at the point it hangs is:

0x1079b9322:  jne    0x1079b9338               ; sqlany_new_connection_ex + 1416
0x1079b9324:  movq   8(%rdi), %rdi
0x1079b9328:  callq  0x1079becfd               ; symbol stub for: db_init
0x1079b932d:  testl  %eax, %eax
0x1079b932f:  je     0x1079b93a0               ; sqlany_new_connection_ex + 1520
0x1079b9331:  movl   $1, 28(%rbx)
0x1079b9338:  movq   (%rbx), %rax
0x1079b933b:  movq   144(%rax), %rdx
0x1079b9342:  movq   8(%rbx), %rdi
0x1079b9346:  xorl   %esi, %esi
0x1079b9348:  callq  0x1079bed03               ; symbol stub for: db_set_property
0x1079b934d:  movq   8(%rbx), %rdi
0x1079b9351:  movq   %r12, %rsi
0x1079b9354:  callq  0x1079bed09               ; symbol stub for: db_string_connect
**0x1079b9359:  movq   8(%rbx), %rdi**
0x1079b935d:  movl   12(%rdi), %eax
0x1079b9360:  movl   %eax, 32(%rbx)
0x1079b9363:  leaq   36(%rbx), %rsi
0x1079b9367:  movl   $256, %edx
0x1079b936c:  callq  0x1079bede7               ; symbol stub for: sqlerror_message

Looking in "Activity Monitor" I can see a dbeng12 process.

Former Member

I should add that we're using SQL Anywhere 12.01 for OSX

jeff_albion
Product and Topic Expert

It seems like you were having some troubles passing in the "-o" parameter initially to the engine, but may have found a way around that. Is anything reported in the "-o" output from the database server when this happens?

Former Member

No, the log file is never created.

Former Member

We managed to get this stack trace out of gdb:

0 0x00007fff8ba3f154 in __wait4 ()
1 0x000000010a531fdb in Java_com_sybase_asa_logon_ASAConnect_findServers ()
2 0x000000010a52cbb3 in Java_com_sybase_asa_logon_ASAConnect_findServers ()
3 0x000000010a51585e in Java_com_sybase_asa_logon_ASAConnect_findServers ()
4 0x000000010a516162 in Java_com_sybase_asa_logon_ASAConnect_findServers ()
5 0x000000010a5165fa in Java_com_sybase_asa_logon_ASAConnect_findServers ()
6 0x000000010a504532 in sqlerror_message ()
7 0x000000010a50487b in sqlerror_message ()
8 0x000000010a33b359 in sqlany_new_connection_ex ()
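
For context, __wait4 is the system call behind waitpid, so frame 0 means the client thread is blocked waiting for a child process to change state. The sketch below is not SQL Anywhere code; it only illustrates, under that reading of the trace, how a test harness could wait on a child with a deadline instead of blocking indefinitely. wait_with_timeout is a hypothetical helper, and the 10 ms polling step is an arbitrary choice.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: poll for a child's exit for up to timeout_ms.
 * Returns 1 if the child was reaped (*status is filled in),
 * 0 if the deadline passed first, and -1 on a waitpid error. */
int wait_with_timeout(pid_t pid, int timeout_ms, int *status)
{
    const struct timespec step = { 0, 10 * 1000 * 1000 }; /* 10 ms */

    assert(status != NULL);
    for (int waited = 0; ; waited += 10) {
        pid_t r = waitpid(pid, status, WNOHANG); /* never blocks */
        if (r == pid)
            return 1;
        if (r < 0 && errno != EINTR)
            return -1;
        if (waited >= timeout_ms)
            return 0;
        nanosleep(&step, NULL);
    }
}
```

A return value of 0 would be the point at which a harness could log the state of the child (or capture a sample of it) before giving up.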

jeff_albion
Product and Topic Expert

Can you show us the connection code you're using and how you're attempting to start the database server? Does it start okay outside of the capi interface?

And I assume this is from a client stack trace in gdb? This stack trace is suggesting the client is trying to find a server, but can't.

Former Member

I'm replying here to get some better text formatting... I can't figure out how to do the formatting in the comments

I should clarify a few things here. Firstly, this hang only manifests itself in integration testing, either in a debugger on a local machine or on the build server. It is particularly frequent in the debugger using Xcode 4.4. Outside of those environments it all works fine.

The code we use to start and connect to the server is as follows:

{
    _Logger->LogDebug( "Opening database with connection string: " + GetConnectionString() );

    // create a new sqlany connection object, and connect to our configured database.
    _Connection = _Api.sqlany_new_connection();

    if ( !_Api.sqlany_connect(_Connection, GetConnectionString().c_str() ) )
    {
        char err_msg [512];
        _Api.sqlany_error(_Connection, err_msg, sizeof(err_msg) );
        std::stringstream err;
        err << "sqlanywhere err: " << err_msg << std::endl;
        err << "when attempting to open: " << GetConnectionString();

        _Logger->LogError( err.str() );

        /* failed to connect */
        CatchError("Failed to connect");

        /* SQL Anywhere's API requires us to go through the full
         * disconnection process even if a connection attempt failed.
         */
        Close();
        return false;
    }
    return true;
}

An example of a connection string that we're using to start the server and connect to a database is as follows:

ENG=vartmptmp2792eUqhK;uid=DBA;pwd=sql;dbf=/var/tmp/tmp.279.2eUqhK;START=dbeng12 -ga -qi -n vartmptmp2792eUqhK -o /var/tmp/tmp.279.2eUqhK.log.txt

That connection string is for an integration test, which is why the database and server have a semi-random name to stop file and db server conflicts on the build server.
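
As a rough illustration of how a string like that can be assembled, here is a minimal sketch. It assumes, based only on the example above, that the server name is the database file path with every non-alphanumeric character removed. server_name_from_path and build_connection_string are hypothetical helpers, not the actual test code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy only the alphanumeric characters of path
 * into name, truncating if needed, and NUL-terminate the result. */
void server_name_from_path(const char *path, char *name, size_t size)
{
    size_t j = 0;

    assert(path != NULL && name != NULL && size > 0);
    for (size_t i = 0; path[i] != '\0' && j + 1 < size; i++) {
        if (isalnum((unsigned char)path[i]))
            name[j++] = path[i];
    }
    name[j] = '\0';
}

/* Hypothetical helper: build a connection string in the same shape as
 * the example, using one derived name for both ENG= and -n.
 * Returns 0 on success, -1 on bad input or if out is too small. */
int build_connection_string(const char *dbf, char *out, size_t size)
{
    char name[128];
    int n;

    if (dbf == NULL || out == NULL || strlen(dbf) == 0)
        return -1;

    server_name_from_path(dbf, name, sizeof name);
    n = snprintf(out, size,
                 "ENG=%s;uid=DBA;pwd=sql;dbf=%s;"
                 "START=dbeng12 -ga -qi -n %s -o %s.log.txt",
                 name, dbf, name, dbf);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Deriving both ENG= and -n from the same variable is what keeps the two names identical.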

VolkerBarth
Contributor

I'd suggest adding the LOG=filename connection parameter to get some diagnostic output on the apparently failing attempts to connect to the engine. You could also add -z to the engine command line.

As Jeff has asked: Does the engine named vartmptmp2792eUqhK really start here? (Well, if not, -z won't help any further...)


FWIW, specifying both ENG and the -n in the START connection parameter is somewhat error-prone: if the two are not identical, you may start a different engine than the one you're trying to connect to (cf. Graeme's explanation here). Yes, I'm aware that you have set both to the same value; I'm just trying to give a hint.

Former Member

We've tried the LOG= connection parameter and it doesn't appear to produce a log file either. The way the code is written guarantees that the ENG and -n parameters are the same. However, I wonder if this could be part of the problem. If we have a race condition somewhere in thread startup, we might indeed end up with a server that we can't connect to despite the names being the same.

From memory, we tried the -n without the ENG= parameter and it wouldn't connect. According to that link you posted, it seems we tried it the wrong way round. If we specified ENG= and left the -n off, would we connect to the named server?

I also read somewhere that ENG= was deprecated. Is SERVER= the correct replacement?

Also of note is that if the application is compiled with gcc we don't seem to get this issue (well, it hasn't manifested yet). It seems to be under clang that we get it consistently. Sadly we can't switch to gcc because then our Objective-C code doesn't compile.

I'll give the LOG= and the -z another whirl.

Cheers, Dan

Former Member

...and with this connection string:

Server=vartmptmp2rYWlPv;uid=DBA;pwd=sql;dbf=/var/tmp/tmp.2.rYWlPv;LOG=/var/tmp/tmp.2.rYWlPv.log.txt;START=dbeng12 -ga -qi -z

I get this log file:

Tue Oct 02 2012 18:29:55
18:29:55 Attempting to connect using: UID=DBA;PWD=**;DBF=/var/tmp/tmp.2.rYWlPv;ServerName=vartmptmp2rYWlPv;START='dbeng12 -ga -qi -z';LOG=/var/tmp/tmp.2.rYWlPv.log.txt
18:29:55 Attempting to connect to a running server...
18:29:55 Attempting SharedMemory connection (no sasrv.ini cached address)
18:29:55 Failed to connect over SharedMemory
18:29:55 No server found, attempting to run START line...

...and the process has hung. There is a dbeng12 in the task list with the correct server name.

VolkerBarth
Contributor

Is this expected to work with shared memory?

Does it work if you connect via TCP/IP? (Add "LINKS=TCPIP" to the connection string and the "-x TCPIP" to the START command)?

Former Member

We want it to work with shared memory. Most of the time, this does work exactly as we expect it to. This 'hang' only happens during integration testing and in the Xcode debugger (quite frequently in 4.4, not so much in 4.3). We really don't want to use TCP/IP because we're deploying SA12 as an embedded DB.

In the case of the log line above, I would expect it NOT to find the server because we've deliberately given it a unique name for this session only. So, that part I'm not worried about. The worry is that when it gets to the "attempting to run START line"... it never comes back...

jeff_albion
Product and Topic Expert

My read of the above is that the database server process is starting up ("dbeng12 in the task list"), but the "-o" log is never created. That means something is happening on server start-up that we haven't been able to capture (particularly if the server normally starts okay).

Running "dtruss -f" on the process that uses the SQL Anywhere C API, or capturing a core file of the engine process that started, would be the next step. If you haven't already, I'd highly recommend opening a technical support case so that we can help you go over this information directly.