Implementing non-stop protocol compatibility in LLDB

By Michał Górny

May 31, 2022

Moritz Systems have been contracted by the FreeBSD Foundation to continue our work on modernizing the LLDB debugger’s support for FreeBSD.

The primary goal of our contract is to bring support for full multiprocess debugging into LLDB. The Project Schedule is divided into three milestones, each taking approximately one and a half months:

Support for the non-stop variant of GDB Remote Serial Protocol in lldb-server and gdb-remote plugin in LLDB client.
Full support for multiprocess GDB Remote Serial Protocol extension in lldb-server.
Support for multiprocess debugging in LLDB client through multiplexing multiple LLDB targets via a single GDB Remote Serial Protocol connection.

We have completed the first milestone. Its goal was to enable the LLDB client and server components to use the non-stop variant of the GDB Remote Serial Protocol. This is an important step towards full multiprocess debugging in LLDB, as it removes the synchronous communication limitation that would force the debugger to stop all processes whenever one of them needed to stop. Effectively, it brings LLDB one step closer to feature parity with GDB, and FreeBSD one step closer to a fully-featured, permissively licensed stack.

All-stop and non-stop modes of the debugger

There are two modes of debugging multithreaded processes, called all-stop mode and non-stop mode.

[Figure: non-stop GDB/LLDB mode]

In all-stop mode, whenever an event of interest (such as a signal or a breakpoint hit) occurs in one of the debugged program’s threads, the whole process is stopped pending the debugger’s response. While it is generally possible to resume the program with only a subset of its threads running, and therefore let those threads continue working while the debugger waits for the user’s action, the debugger will usually have to stop the process again before making any other request. This mode is currently used by the FreeBSD kernel.

[Figure: all-stop GDB/LLDB mode]

In non-stop mode, the individual threads run independently. If an event affects one of them, only that thread is stopped and the remaining threads continue running. Conversely, if the debugger wishes to prevent any more events from being reported while waiting for the user’s decision, it needs to stop all the threads manually. This mode is used by the Linux kernel.

On top of the mode implemented by the kernel’s debugging API, there is the mode used by the debugger itself. Both GDB and LLDB use all-stop mode by default, emulating it on Linux as necessary. This is because this mode is more convenient in simpler debugging scenarios where the user does not need to handle events from multiple threads concurrently. In addition to that, GDB currently implements native support for non-stop mode that is sometimes a better choice for more complex multithreaded debugging scenarios.

The use of all-stop or non-stop mode also affects the protocol used between the debugging server (lldb-server, gdbserver) and the client. The traditional form of GDB Remote Serial Protocol is not suitable for non-stop mode, and therefore non-stop extensions for the protocol were designed. However, these extensions can prove useful even without full debugger or kernel support for non-stop mode, as they also make it possible to handle events from multiple debugged processes concurrently in multiprocess mode.

Non-stop mode in GDB Remote Serial Protocol

Essentially, the GDB Remote Serial Protocol is a synchronous request-response protocol. Every packet exchange is initiated by the client sending a command to the server, and followed by the server issuing a single response. The client cannot rely on being able to issue another command before the previous one is completed, and the server cannot report any events except in response to an explicit command.

This design has three important implications for debugging. Firstly, the debugged process is stopped while the server is waiting for the next command. Secondly, once the debugged process is resumed, the communication is blocked until it stops or terminates (the one exception is sending a break sequence to force the process to stop immediately). Thirdly, the server can report only one event after the process is resumed, so it has to stop the process again when that event occurs. As a result, the protocol implies running in all-stop mode.

Let’s consider a simple program that starts two threads (in addition to the main thread), and then receives per-thread SIGUSR signals on both of them. In all-stop mode, the debug session would include the following packets:

>> $vCont;C1e:p5706.5706;c:p5706.-1#..
// the process resumes, communication is blocked until it stops
<< $T1e06:502e58f7ff7f0* ;07:802d58f7ff7f0* ;10:ec64e1f7ff7f0* ;thread:p5706.57fc;core:3;#76
>> $vCont;C1e:p5706.57fc;c:p5706.-1#..
// the process resumes, communication is blocked until it stops
<< $T1e06:503ed8f7ff7f0* ;07:803dd8f7ff7f0* ;10:ec64e1f7ff7f0* ;thread:p5706.57fb;core:4;#d6
>> $vCont;C1e:p5ba9.5c43;c:p5ba9.-1#..
// the process resumes, communication is blocked until it exits
<< $W0;process:5ba9#2c

The client issues a vCont packet to resume the debugged process. The process is resumed and the communication is blocked while it is running. When one of the threads receives a signal, the whole process is stopped and the server replies with a T packet indicating a stop due to a signal. Even though both threads are signaled almost simultaneously, each of them causes a separate stop. Finally, after the process is resumed one last time, it exits and the server returns a W packet with the exit status.
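
To make the stop replies in the log easier to read, here is a minimal Python sketch that decodes a T or W payload. The names expand_rle and parse_stop_reply are made up for this illustration and are not part of LLDB or GDB. The sketch assumes the payload has already been stripped of its $...#xx framing, and it handles only the two reply types shown above.

```python
def expand_rle(payload: str) -> str:
    """Expand run-length encoding: 'X*N' repeats X a further ord(N) - 29 times."""
    out = []
    i = 0
    while i < len(payload):
        ch = payload[i]
        if ch == "*" and out and i + 1 < len(payload):
            # The character after '*' encodes the repeat count.
            out.append(out[-1][-1] * (ord(payload[i + 1]) - 29))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_stop_reply(payload: str) -> dict:
    """Parse a 'T' (signal stop) or 'W' (exit) payload into a dictionary."""
    payload = expand_rle(payload)
    kind, rest = payload[0], payload[1:]
    if kind == "W":
        status, _, extra = rest.partition(";")
        return {"kind": "exited", "status": int(status, 16), "extra": extra}
    if kind != "T":
        raise ValueError(f"unsupported stop reply: {payload!r}")
    # 'T' is followed by two hex digits of signal number, then key:value; pairs.
    result = {"kind": "stopped", "signal": int(rest[:2], 16), "fields": {}}
    for pair in rest[2:].split(";"):
        if pair:
            key, _, value = pair.partition(":")
            result["fields"][key] = value
    return result
```

Applied to the first stop reply in the log, this gives signal 0x1e, a thread field of p5706.57fc, and register 06 expanded to 502e58f7ff7f0000.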

The non-stop mode features a few important changes to the protocol. Most importantly, the server is now permitted to send asynchronous notifications while waiting for the next command or between replies. The packets resuming the process return immediately, and the process remains running while the server is waiting for further commands.

The same process would now generate the following log:

>> $QNonStop:1#..
<< $OK#9a
// ...
>> $vCont;C1e:p5e72.5e72;c#..
<< $OK#9a
// the process resumes but client can issue further requests
// first thread stops
<< %Stop:T1e06:502e58f7ff7f0* ;07:802d58f7ff7f0* ;10:ec64e1f7ff7f0* ;thread:p5e72.5fa4;core:4;#83
// second thread stops (asynchronously)
>> $vStopped#..
<< $T1e06:503ed8f7ff7f0* ;07:803dd8f7ff7f0* ;10:ec64e1f7ff7f0* ;thread:p5e72.5fa3;core:5;#03
>> $vStopped#..
<< $OK#9a
>> $vCont;C1e:p5e72.5fa4;C1e:p5e72.5fa3;c#..
<< $OK#9a
// all threads resume
<< %Stop:W0;process:5e72#de
>> $vStopped#..
<< $OK#9a

The client enables non-stop mode using the QNonStop packet. When the process is resumed, the server returns an OK response immediately and unblocks communication. At this point, the client could send further requests and the server would reply to them while the process is running.

Once one of the threads receives the signal, only that thread is stopped and the server sends an asynchronous %Stop notification. The other thread remains running until it receives the other signal, and then it is stopped as well. Since the client has not acknowledged the first notification by then, the second stop is not reported asynchronously but queued to be reported later. The main thread is not stopped at all.

At this point, the client sends a vStopped packet to acknowledge the stop notification for the first stopped thread. In response, the server sends the next queued stop reason. The client acknowledges it, and the server sends an OK response indicating that all notifications were delivered. Since the notification queue is empty now, the next event would be delivered asynchronously using the %Stop packet.
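
The acknowledgement handshake described above can be modelled with a small queue. This is only a sketch of the behaviour as described in this article, not lldb-server or gdbserver code. The class name StopNotificationQueue and its methods are hypothetical, and the stop replies are treated as opaque strings.

```python
from collections import deque


class StopNotificationQueue:
    """Hypothetical model of the server-side %Stop / vStopped handshake."""

    def __init__(self):
        self._pending = deque()  # stop replies not yet acknowledged by the client

    def report_stop(self, stop_reply: str):
        """Record a stop event; return the packet to send now, or None if queued."""
        first = not self._pending
        self._pending.append(stop_reply)
        # Only the first unacknowledged event is sent asynchronously.
        return f"%Stop:{stop_reply}" if first else None

    def handle_vstopped(self) -> str:
        """Acknowledge the oldest event; return the next one, or OK when drained."""
        if self._pending:
            self._pending.popleft()
        return self._pending[0] if self._pending else "OK"
```

With two stops reported back to back, the first call to report_stop returns a %Stop notification and the second returns None. The following two vStopped requests then return the second stop reply and OK, matching the order of packets in the log.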

The client resumes all stopped threads using the vCont packet. The process exits, and this event is reported via another %Stop notification. The client acknowledges the notification before the connection is terminated.

Implementing non-stop protocol in all-stop debugger

The architectures of both the LLDB client and the LLDB server are not currently suitable for debugging in non-stop mode. Furthermore, the FreeBSD kernel features an inherently all-stop architecture and, for example, GDB does not support non-stop mode on FreeBSD at all. However, this does not mean that the non-stop variant of the GDB protocol cannot be used on this system.

On the server side, this is achieved via generating appropriate stop notifications for all threads. Since the process plugins are essentially all-stop, the whole process stops whenever any event occurs. The appropriate non-stop notification for that would be to indicate that all threads have stopped — with the non-affected threads having stopped for “no reason”.

In all-stop mode, lldb-server indicates the process stopping by sending a single stop reason as a response to the continue packet. In non-stop mode, it queues a %Stop notification for every thread, and sends the first one asynchronously if no other notifications are pending. The same mechanism is used to report the stopped threads of a newly attached process.

The ? packet used to query the current stop reason is also changed. In all-stop mode, it reports a single stop reason synchronously. In non-stop mode, it reports the stop reason for the first thread synchronously, and queues the stop reasons for remaining threads so that they can be read using vStopped packets.
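
As a rough illustration of the two server-side rules above, the following Python sketch builds one stop reply per thread for a process that has stopped as a whole, and then answers a ? query from that list. All function names are hypothetical. The payloads are heavily simplified (real lldb-server replies also carry registers and other fields), and signal 0 is used here to stand in for “no reason”, as in the T00 reply that appears in LLDB's logs.

```python
from collections import deque


def stop_replies_for_all_threads(thread_ids, stop_signals, current_tid):
    """Return one simplified stop reply per thread, current thread first.

    thread_ids:   all thread IDs of the stopped process
    stop_signals: thread ID -> signal number, only for threads with a real reason
    current_tid:  the thread whose event caused the process to stop
    """
    ordered = [current_tid] + [tid for tid in thread_ids if tid != current_tid]
    # Threads without a stop reason are reported with signal 0.
    return [f"T{stop_signals.get(tid, 0):02x}thread:{tid:x};" for tid in ordered]


def answer_stop_query(replies, pending: deque) -> str:
    """Handle '?': reply with the first thread, queue the rest for vStopped."""
    pending.clear()
    pending.extend(replies[1:])
    return replies[0]


def answer_vstopped(pending: deque) -> str:
    """Handle 'vStopped': hand out the next queued reply, or OK when drained."""
    return pending.popleft() if pending else "OK"
```

For a process with threads cbaf6, cbb62 and cbb63 where the latter two received signal 0x0a, the first reply describes cbb62, and the two queued replies describe cbaf6 (with no reason) and cbb63.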

On the client side, the primary issue is fitting the asynchronous stop notifications into the synchronous client architecture. To do that, we modified the function responsible for receiving synchronous stop responses: after receiving the OK response, it now performs a blocking wait for the asynchronous stop notification and then drains the notification queue.

This logic would be sufficient for the current versions of lldb-server that run in all-stop mode. However, to achieve greater compatibility with servers that actually implement non-stop mode, the client follows this with an explicit vCtrlC request to stop the remaining threads. This request may or may not result in more stop notifications.

Other client changes include adding a client-side notification packet queue to store asynchronous notifications that are received while waiting for responses to other commands, and updating the logic handling the ? packet to drain the server notification queue.
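
The client-side sequence can be sketched as follows. This is an illustrative model rather than LLDB's actual code: ScriptedServer is a hypothetical stand-in for the connection that simply replays packets, payloads are shown without framing, and every notification is assumed to be a %Stop. The sketch also only processes stop notifications that have already arrived by the time vCtrlC is answered.

```python
from collections import deque


class ScriptedServer:
    """Hypothetical connection that replays a fixed list of incoming packets."""

    def __init__(self, incoming):
        self.incoming = deque(incoming)
        self.sent = []

    def send(self, packet):
        self.sent.append(packet)

    def receive(self):
        return self.incoming.popleft()  # a real client would block here


def send_and_wait_reply(conn, packet, notifications):
    """Send a command; stash notifications that arrive before its reply."""
    conn.send(packet)
    while True:
        reply = conn.receive()
        if not reply.startswith("%"):
            return reply
        notifications.append(reply)


def drain_stops(conn, first_stop, notifications):
    """Acknowledge stops with vStopped until the server answers OK."""
    stops = [first_stop]
    while (reply := send_and_wait_reply(conn, "vStopped", notifications)) != "OK":
        stops.append(reply)
    return stops


def resume_and_wait_for_stop(conn, resume_packet):
    notifications = deque()  # client-side queue of asynchronous notifications
    if send_and_wait_reply(conn, resume_packet, notifications) != "OK":
        raise RuntimeError("resume request was not accepted")
    # Blocking wait for the asynchronous stop notification.
    note = notifications.popleft() if notifications else conn.receive()
    stops = drain_stops(conn, note[len("%Stop:"):], notifications)
    # Ask the server to stop any threads that are still running.
    if send_and_wait_reply(conn, "vCtrlC", notifications) != "OK":
        raise RuntimeError("vCtrlC was not accepted")
    while notifications:
        note = notifications.popleft()
        stops += drain_stops(conn, note[len("%Stop:"):], notifications)
    return stops
```

Feeding this the replies of an all-stop server (OK, one %Stop, two queued stop replies, OK, and OK for vCtrlC) yields three stop replies and the request sequence vCont, vStopped three times, vCtrlC.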

Combined, these changes make it possible to use the non-stop protocol to debug programs in all-stop mode.

The following snippet illustrates an example LLDB session using non-stop protocol in all-stop mode:

>> $QNonStop:1#8d
<< $OK#9a
// ...
>> $c#63
<< $OK#9a
// process resumes, then stops
<< %Stop:T0athread:cbaf6;name:a.out;threads:cbaf6;thread-pcs:00007ffff7e164ec;00:0000000000000000;01:f6ba0c0000000000;02:ec64e1f7ff7f0000;03:0a00000000000000;04:f6ba0c0000000000;05:f6ba0c0000000000;06:d0d1ffffff7f0000;07:d0d0ffffff7f0000;08:f0d0ffffff7f0000;09:30bbfcf7ff7f0000;0a:e070d9f7ff7f0000;0b:4602000000000000;0c:0a00000000000000;0d:f8d2ffffff7f0000;0e:f07d555555550000;0f:00d0fff7ff7f0000;10:ec64e1f7ff7f0000;11:4602000000000000;12:3300000000000000;13:0000000000000000;14:0000000000000000;15:2b00000000000000;16:0000000000000000;17:0000000000000000;reason:signal;#c7
>> $vStopped#55
<< $OK#9a
// the client ensures that all threads stop
>> $vCtrlC#4e
<< $OK#9a
>> $vCont;C0a:cbaf6#15
<< $OK#9a
// process resumes, then stops
<< %Stop:T0athread:cbb62;name:a.out;threads:cbaf6,cbb62,cbb63;jstopinfo:5b7b226e616d65223a22612e6f7574222c22746964223a3833343239347d2c7b226e616d65223a22612e6f7574222c22726561736f6e223a227369676e616c222c227369676e616c223a31302c22746964223a3833343430327d2c7b226e616d65223a22612e6f7574222c22726561736f6e223a227369676e616c222c227369676e616c223a31302c22746964223a3833343430337d5d;thread-pcs:00007ffff7e11225,00007ffff7e164ec,00007ffff7e164ec;00:0000000000000000;01:62bb0c0000000000;02:ec64e1f7ff7f0000;03:0a00000000000000;04:f6ba0c0000000000;05:62bb0c0000000000;06:503ed8f7ff7f0000;07:803dd8f7ff7f0000;08:a080555555550000;09:0100000000000000;0a:0000000000000000;0b:4602000000000000;0c:0a00000000000000;0d:0000000000000000;0e:80cfffffff7f0000;0f:004058f7ff7f0000;10:ec64e1f7ff7f0000;11:4602000000000000;12:3300000000000000;13:0000000000000000;14:0000000000000000;15:2b00000000000000;16:0000000000000000;17:0000000000000000;reason:signal;#ec
>> $vStopped#55
<< $T00thread:cbaf6;name:a.out;threads:cbaf6,cbb62,cbb63;jstopinfo:5b7b226e616d65223a22612e6f7574222c22746964223a3833343239347d2c7b226e616d65223a22612e6f7574222c22726561736f6e223a227369676e616c222c227369676e616c223a31302c22746964223a3833343430327d2c7b226e616d65223a22612e6f7574222c22726561736f6e223a227369676e616c222c227369676e616c223a31302c22746964223a3833343430337d5d;thread-pcs:00007ffff7e11225,00007ffff7e164ec,00007ffff7e164ec;00:00feffffffffffff;01:0000000000000000;02:2512e1f7ff7f0000;03:62bb0c0000000000;04:1049d8f7ff7f0000;05:0901000000000000;06:62bb0c0000000000;07:e0d0ffffff7f0000;08:0000000000000000;09:ffffffff00000000;0a:0000000000000000;0b:4602000000000000;0c:0000000000000000;0d:0000000000000000;0e:1049d8f7ff7f0000;0f:4046d8f7ff7f0000;10:2512e1f7ff7f0000;11:4602000000000000;12:3300000000000000;13:0000000000000000;14:0000000000000000;15:2b00000000000000;16:0000000000000000;17:0000000000000000;#8b
>> $vStopped#55
<< $T0athread:cbb63;name:a.out;threads:cbaf6,cbb62,cbb63;jstopinfo:5b7b226e616d65223a22612e6f7574222c22746964223a3833343239347d2c7b226e616d65223a22612e6f7574222c22726561736f6e223a227369676e616c222c227369676e616c223a31302c22746964223a3833343430327d2c7b226e616d65223a22612e6f7574222c22726561736f6e223a227369676e616c222c227369676e616c223a31302c22746964223a3833343430337d5d;thread-pcs:00007ffff7e11225,00007ffff7e164ec,00007ffff7e164ec;00:0000000000000000;01:63bb0c0000000000;02:ec64e1f7ff7f0000;03:0a00000000000000;04:f6ba0c0000000000;05:63bb0c0000000000;06:502e58f7ff7f0000;07:802d58f7ff7f0000;08:a080555555550000;09:0200000000000000;0a:18e4d9f7ff7f0000;0b:4602000000000000;0c:0a00000000000000;0d:0000000000000000;0e:80cfffffff7f0000;0f:0030d8f6ff7f0000;10:ec64e1f7ff7f0000;11:4602000000000000;12:3300000000000000;13:0000000000000000;14:0000000000000000;15:2b00000000000000;16:0000000000000000;17:0000000000000000;reason:signal;#22
>> $vStopped#55
<< $OK#9a
// the client ensures that all threads stop
>> $vCtrlC#4e
<< $OK#9a
>> $vCont;c:cbaf6;C0a:cbb62;C0a:cbb63#55
<< $OK#9a
// process resumes, then exits
<< %Stop:W00#b7
>> $vStopped#55
<< $OK#9a
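
The two hex digits after the # in the command packets above are a checksum: the sum of the payload bytes modulo 256. The short Python sketch below shows the framing for ordinary $ packets. The function names are chosen for this example only, and the sketch leaves out escaping, acknowledgements and the % lead character used by notifications.

```python
def checksum(payload: str) -> str:
    """Return the two-digit hex checksum: sum of payload bytes modulo 256."""
    return f"{sum(payload.encode('ascii')) % 256:02x}"


def frame(payload: str) -> str:
    """Wrap a payload as an ordinary '$payload#checksum' packet."""
    return f"${payload}#{checksum(payload)}"


def unframe(packet: str) -> str:
    """Validate an ordinary packet and return its payload."""
    if not packet.startswith("$"):
        raise ValueError("not an ordinary packet")
    payload, sep, received = packet[1:].rpartition("#")
    if not sep or checksum(payload) != received.lower():
        raise ValueError("bad checksum")
    return payload
```

For instance, frame("vStopped") produces $vStopped#55 and frame("QNonStop:1") produces $QNonStop:1#8d, as seen in the session log.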

Fixing duplicate stops on FreeBSD

While working on these changes, we found an interesting bug. After being stopped due to a signal and then resumed, the process kept stopping again due to SIGSTOP. After debugging the issue, we discovered that this was caused by the vCtrlC request sent by the client to guarantee that all threads stop.

The FreeBSD process plugin implements stopping the process through sending a SIGSTOP to it. However, it turns out that if the process is stopped already, the SIGSTOP is queued for delivery after it resumes. This causes the process to stop again, and another SIGSTOP is sent to it, effectively causing it to stop forever.

Fortunately, the issue was easy to resolve. Since FreeBSD is an all-stop platform, we could simply omit sending SIGSTOP if we know that the process is already stopped.
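
The bug and the fix can be reproduced with a toy model. The code below is not the FreeBSD process plugin: FakeProcess and halt are hypothetical names, and the model only assumes the behaviour described above, namely that a SIGSTOP sent to an already stopped process is queued until the process resumes.

```python
SIGSTOP = "SIGSTOP"  # symbolic label; the real signal number is irrelevant here


class FakeProcess:
    """Hypothetical stand-in for a traced process on an all-stop platform."""

    def __init__(self):
        self.stopped = False
        self.queued_signals = []

    def deliver(self, signo):
        if self.stopped:
            # Already stopped: the signal waits until the process resumes.
            self.queued_signals.append(signo)
        elif signo == SIGSTOP:
            self.stopped = True

    def resume(self):
        self.stopped = False
        queued, self.queued_signals = self.queued_signals, []
        for signo in queued:
            self.deliver(signo)


def halt(process, skip_if_stopped=True):
    """Stop the process; return True if a SIGSTOP was actually sent."""
    if skip_if_stopped and process.stopped:
        return False  # the fix: nothing to do, the whole process is stopped
    process.deliver(SIGSTOP)
    return True
```

Calling halt with skip_if_stopped=False on a stopped process and then resuming it leaves the process stopped again, which is the duplicate stop. With the default guard, the process keeps running after the resume.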

Patches created

Future plans

Now that the base for non-stop protocol usage is ready, we are starting to work on server-side support for full multiprocess debugging. In addition to the non-stop protocol, it relies on two other features we have implemented previously: the multiprocess protocol extensions and fork/vfork support.

The existing multiprocess debugging support is limited to detecting fork events and responding to them either by following the parent process and detaching the newly forked child, or by detaching the parent and following the child. The server’s support for handling multiple processes is therefore limited to the short period of time when both processes are traced, and only a subset of the debugger’s commands is available while two processes are attached.

The goal of our work is to extend this support to enable tracing multiple processes permanently. The debugger will now be able to follow both the parent process and the child, and both of them will be permitted to fork and further extend the traced process list. Effectively, this will enable LLDB to better handle more complex multiprocess scenarios, such as processes forking to perform work in parallel or using pipelines.