Implementing non-stop protocol compatibility in LLDB
By Michał Górny
Moritz Systems have been contracted by the FreeBSD Foundation to continue our work on modernizing the LLDB debugger’s support for FreeBSD.
The primary goal of our contract is to bring support for full multiprocess debugging into LLDB. The Project Schedule is divided into three milestones, each taking approximately one and a half months:
- Support for the non-stop variant of GDB Remote Serial Protocol in lldb-server and gdb-remote plugin in LLDB client.
- Full support for multiprocess GDB Remote Serial Protocol extension in lldb-server.
- Support for multiprocess debugging in LLDB client through multiplexing multiple LLDB targets via a single GDB Remote Serial Protocol connection.
We have completed the first milestone. Its goal was to enable LLDB client and server components to utilize the non-stop variant of the GDB Remote Serial Protocol. This is an important step towards full multiprocess debugging in LLDB, as it removes the synchronous communication limitation that would force the debugger to stop all processes whenever one of them needed to stop. Effectively, it brings LLDB one step closer to feature parity with GDB, and FreeBSD one step closer to a fully-featured, permissively licensed stack.
All-stop and non-stop modes of the debugger
There are two modes of debugging multithreaded processes, called all-stop mode and non-stop mode.
In all-stop mode, whenever an event of interest (such as a signal or a breakpoint hit) occurs in one of the debugged program’s threads, the whole process is stopped pending the debugger’s response. While it is generally possible to resume the program with only a subset of its threads running, and therefore let those threads continue working while the debugger is waiting for the user’s action, the debugger will usually have to stop the process again before making any other request. This mode is currently used by the FreeBSD kernel.
In non-stop mode, the individual threads are running independently. If an event affects one of them, only that thread is stopped and the remaining threads continue running. Conversely, if the debugger wishes to prevent any more events from being reported while waiting for the user’s decision, it needs to stop all the threads manually. This mode is used by the Linux kernel.
On top of the mode implemented by the kernel’s debugging API, there is the mode used by the debugger itself. Both GDB and LLDB use all-stop mode by default, emulating it on Linux as necessary. This is because this mode is more convenient in simpler debugging scenarios where the user does not need to handle events from multiple threads concurrently. In addition to that, GDB currently implements native support for non-stop mode that is sometimes a better choice for more complex multithreaded debugging scenarios.
The use of all-stop or non-stop mode also affects the protocol used between the debugging server (lldb-server, gdbserver) and the client. The traditional form of GDB Remote Serial Protocol is not suitable for non-stop mode, and therefore non-stop extensions for the protocol were designed. However, these extensions can prove useful even without full debugger or kernel support for non-stop mode, as they also make it possible to handle events from multiple debugged processes concurrently in multiprocess mode.
Non-stop mode in GDB Remote Serial Protocol
Essentially, the GDB Remote Serial Protocol is a synchronous request-response protocol. Every packet exchange is initiated by the client sending a command to the server, and followed by the server issuing a single response. The client cannot rely on being able to issue another command before the previous one is completed, and the server cannot report any events, except in response to an explicit command.
This layout has three important implications for debugging. Firstly, the debugged process is stopped while the server is waiting for the next command. Secondly, once the debugged process is resumed, the communication is blocked until it stops or terminates (one exception to that is sending a break sequence to force stopping the process immediately). Thirdly, the server can report only one event after the process is resumed, so it has to stop the process again when that event occurs. Consequently, the protocol implies running in all-stop mode.
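For readers unfamiliar with the wire format, the packets in the logs that follow are framed as a `$` (or `%` for a notification), the payload, a `#`, and a two-digit hexadecimal checksum equal to the sum of the payload bytes modulo 256. The following is a minimal Python sketch of that framing, not code from LLDB; the function names are made up for illustration, and escaping of special characters inside the payload is deliberately left out.

```python
from typing import Tuple


def checksum(payload: str) -> str:
    """Sum of the payload bytes modulo 256, as two lowercase hex digits."""
    return format(sum(payload.encode("ascii")) % 256, "02x")


def frame(payload: str, notification: bool = False) -> str:
    """Wrap a payload as '$payload#xx', or '%payload#xx' for a notification."""
    lead = "%" if notification else "$"
    return f"{lead}{payload}#{checksum(payload)}"


def unframe(packet: str) -> Tuple[bool, str]:
    """Return (is_notification, payload); raise ValueError on a bad packet.

    Simplification of this sketch: payload escaping is not handled.
    """
    if len(packet) < 4 or packet[0] not in "$%" or packet[-3] != "#":
        raise ValueError("malformed packet")
    payload, received = packet[1:-3], packet[-2:]
    if checksum(payload) != received.lower():
        raise ValueError("checksum mismatch")
    return packet[0] == "%", payload
```

For example, `frame("OK")` yields the `$OK#9a` reply that appears throughout the non-stop logs. Some of the logs show `#..` where the checksum was not recorded.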
Let’s consider a simple program that starts two threads (in addition to the main thread), and then receives per-thread SIGUSR signals on both of them. In all-stop mode, the debug session would include the following packets:
>> $vCont;C1e:p5706.5706;c:p5706.-1#..
// the process resumes, communication is blocked until it stops
<< $T1e06:502e58f7ff7f0* ;07:802d58f7ff7f0* ;10:ec64e1f7ff7f0* ;thread:p5706.57fc;core:3;#76
>> $vCont;C1e:p5706.57fc;c:p5706.-1#..
// the process resumes, communication is blocked until it stops
<< $T1e06:503ed8f7ff7f0* ;07:803dd8f7ff7f0* ;10:ec64e1f7ff7f0* ;thread:p5706.57fb;core:4;#d6
>> $vCont;C1e:p5ba9.5c43;c:p5ba9.-1#..
// the process resumes, communication is blocked until it exits
<< $W0;process:5ba9#2c
The client issues a vCont packet to resume the debugged process. The process is resumed and the communication is blocked while it is running. When one of the threads receives a signal, the whole process is stopped and the server replies with a T packet indicating a stop due to a signal. Even though both threads are signaled almost simultaneously, each of them causes a separate stop. Finally, after the process is resumed one last time, it exits and the server returns a W packet with the exit status.
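To make the T packets in the log above easier to read, here is a small Python sketch that splits such a stop reply into its signal number and its `key:value` fields. It also expands the protocol's run-length encoding, in which a `*` followed by a character means "repeat the previous character `ord(character) - 29` more times", so `0* ` stands for `0000`. The function names are illustrative only and this is not how LLDB itself parses the packet.

```python
def expand_rle(data: str) -> str:
    """Expand GDB remote protocol run-length encoding in a reply payload."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == "*":
            if not out or i + 1 >= len(data):
                raise ValueError("dangling run-length marker")
            # The character after '*' encodes the number of extra copies + 29.
            repeat = ord(data[i + 1]) - 29
            out.append(out[-1][-1] * repeat)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return "".join(out)


def parse_stop_reply(payload: str) -> dict:
    """Split a 'T' stop reply into its signal number and key:value fields."""
    if not payload.startswith("T") or len(payload) < 3:
        raise ValueError("not a T stop reply")
    signal = int(payload[1:3], 16)
    fields = {}
    for item in expand_rle(payload[3:]).split(";"):
        if item:
            key, _, value = item.partition(":")
            fields[key] = value
    return {"signal": signal, "fields": fields}
```

Applied to the first stop reply in the log, this gives signal number 0x1e (30), thread `p5706.57fc`, and register `06` expanded to `502e58f7ff7f0000`.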
The non-stop mode features a few important changes to the protocol. Most importantly, the server is now permitted to send asynchronous notifications while waiting for the next command or between replies. The packets resuming the process return immediately, and the process remains running while the server is waiting for further commands.
The same process would now generate the following log:
>> $QNonStop:1#..
<< $OK#9a
// ...
>> $vCont;C1e:p5e72.5e72;c#..
<< $OK#9a
// the process resumes but client can issue further requests
// first thread stops
<< %Stop:T1e06:502e58f7ff7f0* ;07:802d58f7ff7f0* ;10:ec64e1f7ff7f0* ;thread:p5e72.5fa4;core:4;#83
// second thread stops (asynchronously)
>> $vStopped#..
<< $T1e06:503ed8f7ff7f0* ;07:803dd8f7ff7f0* ;10:ec64e1f7ff7f0* ;thread:p5e72.5fa3;core:5;#03
>> $vStopped#..
<< $OK#9a
>> $vCont;C1e:p5e72.5fa4;C1e:p5e72.5fa3;c#..
<< $OK#9a
// all threads resume
<< %Stop:W0;process:5e72#de
>> $vStopped#..
<< $OK#9a
The client enables non-stop mode using the QNonStop packet. When the process is resumed, the server returns an OK response immediately and unblocks communication. At this point, the client could send further requests and the server would reply to them while the process is running.
Once one of the threads receives the signal, only that thread is stopped and the server sends an asynchronous %Stop notification. The other thread remains running until it receives the other signal, and then it is stopped as well. Since the client has not acknowledged the first notification by then, the second stop is not reported asynchronously but queued to be reported later instead. The main thread is not stopped at all.
At this point, the client sends a vStopped packet to acknowledge the stop notification for the first stopped thread. In response, the server sends the next queued stop reason. The client acknowledges it, and the server sends an OK response indicating that all notifications were delivered. Since the notification queue is empty now, the next event would be delivered asynchronously using the %Stop packet.
The client resumes all stopped threads using the vCont packet. The process exits, and this event is reported via another %Stop notification. The client acknowledges the notification before the connection is terminated.
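The queueing rules described above can be modelled in a few lines of Python. This is a toy sketch of the server side only, with a hypothetical `StopNotifier` class; checksums are omitted and packets are collected in a list instead of being written to a socket.

```python
from collections import deque


class StopNotifier:
    """Toy model of the server-side stop notification queue in non-stop mode."""

    def __init__(self):
        self.pending = deque()  # stop replies not yet acknowledged by the client
        self.sent = []          # packets "written to the wire", for inspection

    def report_stop(self, stop_reply: str) -> None:
        """Record a stop event; notify asynchronously only if nothing is pending."""
        self.pending.append(stop_reply)
        if len(self.pending) == 1:
            self.sent.append("%Stop:" + stop_reply)

    def handle_vstopped(self) -> None:
        """Acknowledge the head of the queue, then reply with the next stop or OK."""
        if self.pending:
            self.pending.popleft()
        self.sent.append("$" + self.pending[0] if self.pending else "$OK")
```

Feeding it two thread stops, two `vStopped` acknowledgements and a process exit reproduces the shape of the log: one `%Stop` notification, one synchronous stop reply, an `OK`, and a second `%Stop` once the queue is empty again.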
Implementing non-stop protocol in all-stop debugger
The architectures of both LLDB client and LLDB server are not currently suitable for debugging in non-stop mode. Furthermore, the FreeBSD kernel features an inherently all-stop architecture and e.g. GDB does not feature non-stop mode support on FreeBSD at all. However, this does not mean that the non-stop variation of GDB protocol cannot be used on this system.
On the server side, this is achieved via generating appropriate stop notifications for all threads. Since the process plugins are essentially all-stop, the whole process stops whenever any event occurs. The appropriate non-stop notification for that would be to indicate that all threads have stopped — with the non-affected threads having stopped for “no reason”.
In all-stop mode, lldb-server indicates the process stopping by sending a single stop reason as a response to the continue packet. In non-stop mode, it queues a %Stop notification for every thread, and sends the first one asynchronously if no other notifications are pending. The same mechanism is used to report the stopped threads of a newly attached process.
The ? packet used to query the current stop reason is also changed. In all-stop mode, it reports a single stop reason synchronously. In non-stop mode, it reports the stop reason for the first thread synchronously, and queues the stop reasons for remaining threads so that they can be read using vStopped packets.
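As a rough illustration of the server-side behavior described above, the Python sketch below builds one stop reply per thread, using signal 0 (`T00`) for threads that stopped for "no reason", and splits them into the synchronous answer to `?` and the replies left for `vStopped`. The function names are hypothetical, the thread order is simply the order passed in, and real lldb-server replies carry many more fields (registers, thread names, `reason:`), so this is a simplification rather than the actual implementation.

```python
def stop_replies(thread_ids, signals):
    """Build one stop reply per thread; threads without an event report T00.

    thread_ids: iterable of integer thread IDs, in reporting order.
    signals: dict mapping thread ID -> signal number for affected threads.
    """
    return [f"T{signals.get(tid, 0):02x}thread:{tid:x};" for tid in thread_ids]


def answer_query(thread_ids, signals):
    """Return (synchronous reply to '?', stop replies left for vStopped)."""
    replies = stop_replies(thread_ids, signals)
    if not replies:
        return "OK", []
    return replies[0], replies[1:]
```

With threads `cbb62`, `cbaf6` and `cbb63`, and signal 10 pending on the first and last, the replies start with `T0athread:cbb62;`, `T00thread:cbaf6;` and `T0athread:cbb63;`.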
On the client side, the primary issue is fitting the asynchronous stop notifications into the synchronous client architecture. This meant modifying the function responsible for receiving synchronous stop responses: after receiving the OK response, it now performs a blocking wait for the asynchronous stop notification, and then drains the notification queue.
This logic would be sufficient for the current versions of lldb-server that run in all-stop mode. However, to achieve greater compatibility with servers that actually implement non-stop mode, the client follows this with an explicit vCtrlC request to stop the remaining threads. This request may or may not result in more stop notifications.
Other client changes include adding a client-side notification packet queue to store asynchronous notifications that are received while waiting for responses to other commands, and updating the logic handling the ? packet to drain the server notification queue.
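The client-side sequence described above (resume, wait for `OK`, block for the `%Stop` notification, drain with `vStopped`, then send `vCtrlC`) can be sketched in Python as follows. `ScriptedConnection` is a hypothetical stand-in that replays canned packets instead of talking to a server, checksums are omitted, and skipping `vCtrlC` after an exit is an assumption of this sketch. It is not the code of the gdb-remote plugin.

```python
from collections import deque


class ScriptedConnection:
    """Hypothetical stand-in for a gdb-remote connection, replaying canned packets."""

    def __init__(self, incoming):
        self.incoming = deque(incoming)
        self.sent = []

    def send(self, payload):
        self.sent.append(payload)

    def receive(self):
        return self.incoming.popleft()  # a real client would block on the socket


def drain_stops(conn, first_stop):
    """Acknowledge stop notifications with vStopped until the server says OK."""
    stops = [first_stop]
    while True:
        conn.send("vStopped")
        reply = conn.receive()
        if reply == "$OK":
            return stops
        stops.append(reply[1:])  # strip the leading '$'


def continue_and_wait(conn, resume="c"):
    """Resume, then emulate the synchronous all-stop wait over non-stop packets."""
    conn.send(resume)
    if conn.receive() != "$OK":
        raise RuntimeError("resume rejected")
    packet = conn.receive()  # blocking wait for the asynchronous notification
    if not packet.startswith("%Stop:"):
        raise RuntimeError("expected a stop notification")
    stops = drain_stops(conn, packet[len("%Stop:"):])
    if stops[0][0] in "WX":
        return stops  # process exited or was killed, nothing left to stop
    conn.send("vCtrlC")  # make sure the remaining threads stop
    if conn.receive() != "$OK":
        raise RuntimeError("vCtrlC rejected")
    # A server with real non-stop support may now send more %Stop
    # notifications; handling them is left out of this sketch.
    return stops
```

Against an all-stop server that reports three threads, the client ends up sending the resume packet, three `vStopped` packets and one `vCtrlC`.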
These changes combined make it possible to use the non-stop protocol to debug programs in all-stop mode.
The following snippet illustrates an example LLDB session using non-stop protocol in all-stop mode:
>> $QNonStop:1#8d
<< $OK#9a
// ...
>> $c#63
<< $OK#9a
// process resumes, then stops
<< %Stop:T0athread:cbaf6;name:a.out;threads:cbaf6;thread-pcs:00007ffff7e164ec;00:0000000000000000;01:f6ba0c0000000000;02:ec64e1f7ff7f0000;03:0a00000000000000;04:f6ba0c0000000000;05:f6ba0c0000000000;06:d0d1ffffff7f0000;07:d0d0ffffff7f0000;08:f0d0ffffff7f0000;09:30bbfcf7ff7f0000;0a:e070d9f7ff7f0000;0b:4602000000000000;0c:0a00000000000000;0d:f8d2ffffff7f0000;0e:f07d555555550000;0f:00d0fff7ff7f0000;10:ec64e1f7ff7f0000;11:4602000000000000;12:3300000000000000;13:0000000000000000;14:0000000000000000;15:2b00000000000000;16:0000000000000000;17:0000000000000000;reason:signal;#c7
>> $vStopped#55
<< $OK#9a
// the client ensures that all threads stop
>> $vCtrlC#4e
<< $OK#9a
>> $vCont;C0a:cbaf6#15
<< $OK#9a
// process resumes, then stops
<< %Stop:T0athread:cbb62;name:a.out;threads:cbaf6,cbb62,cbb63;jstopinfo:5b7b226e616d65223a22612e6f7574222c22746964223a3833343239347d2c7b226e616d65223a22612e6f7574222c22726561736f6e223a227369676e616c222c227369676e616c223a31302c22746964223a3833343430327d2c7b226e616d65223a22612e6f7574222c22726561736f6e223a227369676e616c222c227369676e616c223a31302c22746964223a3833343430337d5d;thread-pcs:00007ffff7e11225,00007ffff7e164ec,00007ffff7e164ec;00:0000000000000000;01:62bb0c0000000000;02:ec64e1f7ff7f0000;03:0a00000000000000;04:f6ba0c0000000000;05:62bb0c0000000000;06:503ed8f7ff7f0000;07:803dd8f7ff7f0000;08:a080555555550000;09:0100000000000000;0a:0000000000000000;0b:4602000000000000;0c:0a00000000000000;0d:0000000000000000;0e:80cfffffff7f0000;0f:004058f7ff7f0000;10:ec64e1f7ff7f0000;11:4602000000000000;12:3300000000000000;13:0000000000000000;14:0000000000000000;15:2b00000000000000;16:0000000000000000;17:0000000000000000;reason:signal;#ec
>> $vStopped#55
<< $T00thread:cbaf6;name:a.out;threads:cbaf6,cbb62,cbb63;jstopinfo:5b7b226e616d65223a22612e6f7574222c22746964223a3833343239347d2c7b226e616d65223a22612e6f7574222c22726561736f6e223a227369676e616c222c227369676e616c223a31302c22746964223a3833343430327d2c7b226e616d65223a22612e6f7574222c22726561736f6e223a227369676e616c222c227369676e616c223a31302c22746964223a3833343430337d5d;thread-pcs:00007ffff7e11225,00007ffff7e164ec,00007ffff7e164ec;00:00feffffffffffff;01:0000000000000000;02:2512e1f7ff7f0000;03:62bb0c0000000000;04:1049d8f7ff7f0000;05:0901000000000000;06:62bb0c0000000000;07:e0d0ffffff7f0000;08:0000000000000000;09:ffffffff00000000;0a:0000000000000000;0b:4602000000000000;0c:0000000000000000;0d:0000000000000000;0e:1049d8f7ff7f0000;0f:4046d8f7ff7f0000;10:2512e1f7ff7f0000;11:4602000000000000;12:3300000000000000;13:0000000000000000;14:0000000000000000;15:2b00000000000000;16:0000000000000000;17:0000000000000000;#8b
>> $vStopped#55
<< $T0athread:cbb63;name:a.out;threads:cbaf6,cbb62,cbb63;jstopinfo:5b7b226e616d65223a22612e6f7574222c22746964223a3833343239347d2c7b226e616d65223a22612e6f7574222c22726561736f6e223a227369676e616c222c227369676e616c223a31302c22746964223a3833343430327d2c7b226e616d65223a22612e6f7574222c22726561736f6e223a227369676e616c222c227369676e616c223a31302c22746964223a3833343430337d5d;thread-pcs:00007ffff7e11225,00007ffff7e164ec,00007ffff7e164ec;00:0000000000000000;01:63bb0c0000000000;02:ec64e1f7ff7f0000;03:0a00000000000000;04:f6ba0c0000000000;05:63bb0c0000000000;06:502e58f7ff7f0000;07:802d58f7ff7f0000;08:a080555555550000;09:0200000000000000;0a:18e4d9f7ff7f0000;0b:4602000000000000;0c:0a00000000000000;0d:0000000000000000;0e:80cfffffff7f0000;0f:0030d8f6ff7f0000;10:ec64e1f7ff7f0000;11:4602000000000000;12:3300000000000000;13:0000000000000000;14:0000000000000000;15:2b00000000000000;16:0000000000000000;17:0000000000000000;reason:signal;#22
>> $vStopped#55
<< $OK#9a
// the client ensures that all threads stop
>> $vCtrlC#4e
<< $OK#9a
>> $vCont;c:cbaf6;C0a:cbb62;C0a:cbb63#55
<< $OK#9a
// process resumes, then exits
<< %Stop:W00#b7
>> $vStopped#55
<< $OK#9a
Fixing duplicate stops on FreeBSD
While working on these changes, we found an interesting bug. After being stopped due to a signal, and then resumed, the process kept stopping again due to SIGSTOP. After debugging the issue, we discovered that this was caused by the vCtrlC request sent by the client to guarantee that all threads stop.
The FreeBSD process plugin implements stopping the process through sending a SIGSTOP to it. However, it turns out that if the process is stopped already, the SIGSTOP is queued for delivery after it resumes. This causes the process to stop again, and another SIGSTOP is sent to it, effectively causing it to stop forever.
Fortunately, the issue was easy to resolve. Since FreeBSD is an all-stop platform, we could simply omit sending SIGSTOP if we know that the process is already stopped.
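The bug and the fix can be reproduced with a toy model in Python. The `TracedProcess` class and `halt` function below are illustrative inventions, not the FreeBSD process plugin; the only behavior they model is that a signal sent to an already stopped process stays pending and is delivered once the process resumes.

```python
SIGSTOP = "SIGSTOP"


class TracedProcess:
    """Toy model of a traced process where signals sent while stopped stay pending."""

    def __init__(self):
        self.stopped = False
        self.pending = []   # signals queued for delivery on resume
        self.stop_log = []  # reason recorded for every stop

    def kill(self, signal):
        if self.stopped:
            self.pending.append(signal)  # delivered only after the next resume
        else:
            self.stopped = True
            self.stop_log.append(signal)

    def resume(self):
        self.stopped = False
        if self.pending:
            self.kill(self.pending.pop(0))  # pending signal stops it again


def halt(process, skip_if_stopped):
    """Plugin-side halt request, with or without the already-stopped check."""
    if skip_if_stopped and process.stopped:
        return
    process.kill(SIGSTOP)
```

Without the check, a process that stopped on a signal, received a halt request, and was then resumed stops again immediately with SIGSTOP. With `skip_if_stopped=True`, it keeps running after the resume.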
Patches created
- [lldb] [llgs] Implement non-stop style stop notification packets
- [lldb] [gdb-remote] Client support for using the non-stop protocol
- [lldb] Add an integration test for non-stop protocol
- [lldb] [gdb-remote] Be more explicit about notification reading
- [lldb] [Process/FreeBSD] Do not send SIGSTOP to stopped process
Future plans
Now that the base for non-stop protocol usage is ready, we are starting to work on server-side support for full multiprocess debugging. In addition to the non-stop protocol, it relies on two other features we have implemented previously: the multiprocess protocol extensions and fork/vfork support.
The existing multiprocess debugging support is limited to detecting fork events and responding to them by either following the parent process and detaching the newly forked child, or by detaching the parent and following the child. The server’s support for handling multiple processes is therefore limited to the short period of time when both processes are traced, and only a subset of the debugger’s commands is available while two processes are attached.
The goal of our work is to extend this support to enable tracing multiple processes permanently. The debugger will now be able to follow both the parent process and the child, and both of them will be permitted to fork and further extend the traced process list. Effectively, this will enable LLDB to better handle more complex multiprocess scenarios, such as processes forking to perform work in parallel or using pipelines.