Full multiprocess support in lldb-server
By Michał Górny
Moritz Systems have been contracted by the FreeBSD Foundation to continue our work on modernizing the LLDB debugger’s support for FreeBSD.
The primary goal of our contract is to bring support for full multiprocess debugging into LLDB. The Project Schedule is divided into three milestones, each taking approximately one and a half months:
- Support for the non-stop variant of GDB Remote Serial Protocol in lldb-server and gdb-remote plugin in LLDB client.
- Full support for multiprocess GDB Remote Serial Protocol extension in lldb-server.
- Support for multiprocess debugging in LLDB client through multiplexing multiple LLDB targets via a single GDB Remote Serial Protocol connection.
FreeBSD is a modern Unix-like operating system that supports debugging multiple multithreaded processes simultaneously. The goal of the second milestone of our project was to enable full multiprocess support in lldb-server. Prior to this, we had already enabled fork and vfork tracing in LLGS, along with a part of the multiprocess extensions to the protocol. However, the server could only continue debugging one process at a time: whenever a fork occurred, the client was required to detach either the parent or the child process before continuing.
Implementing full multiprocess support means that the server can not only continue debugging both processes but also handle an arbitrary number of future forks, and therefore debug an arbitrary number of inferiors forked from the initial process. Combined with our non-stop protocol work from the previous milestone, this enables the server to run multiple processes simultaneously and respond to their stops independently. This makes it the first debugging server to implement non-stop multiprocess debugging on FreeBSD.
Multitarget and multiprocess extensions to the protocol
The GDB Remote Serial Protocol can be used to debug a wide range of target classes, from regular userspace applications, through kernels and virtual machines, to bare metal targets. These targets can be classified into three groups:
- single-thread targets where no parallel execution is possible
- multithreaded targets where multiple threads can run in parallel but only a single process can be debugged
- multiprocess targets where multiple processes can be debugged simultaneously (each of them possibly including multiple threads)
Accordingly, the protocol can be thought of as layered to cover the needs of each of these groups. The base layer would provide the minimal subset of packets necessary to run and inspect a single thread, with additional layers providing the support for debugging multiple threads and multiple processes, respectively.
Debugging targets with a single execution thread
The simplest use case for the remote protocol is debugging a single-threaded process. This could be anything from a userspace program not using threading to a bare metal target. Depending on the exact target, the process and thread identifiers may or may not be present; if they are not, LLDB uses (1, 1) internally.
A trivial debugging session can be illustrated using the following pseudo-packets:
>> $c#00
<< $O48656c6c6f20776f726c640d0a#00
<< $T1300:0000000000000000;01:2018000000000000;...;reason:signal#00
>> $c#00
<< $W00#00
In the snippet above, the client sends two c (continue) packets to resume the program’s execution. The first packet receives two replies. The first one is an O (output) packet, an LLDB extension used to carry output from the inferior to the LLDB client; in this instance, it is hex-encoded Hello world. The second one is a T packet indicating a stop due to a signal. The first two digits are the signal number in hex (SIGSTOP here), and they are followed by additional information about the stopped target. The second packet receives a single W reply indicating that the target has exited with code 0.
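As a side note, every packet is framed as $<body>#<checksum>, where the checksum is the modulo-256 sum of the body’s bytes written as two hex digits; the snippets in this article use a placeholder 00 instead. A minimal Python sketch of the framing and of decoding the O payload, with helper names that are ours rather than LLDB’s, could look like this:
def frame(body: str) -> str:
    # RSP framing: $<body>#<checksum>, where the checksum is the sum
    # of the body's bytes modulo 256, written as two hex digits.
    return f"${body}#{sum(body.encode('ascii')) % 256:02x}"

def decode_output(body: str) -> str:
    # An O packet carries hex-encoded inferior output after the leading "O".
    assert body.startswith("O")
    return bytes.fromhex(body[1:]).decode("ascii")

print(frame("c"))  # $c#63
print(decode_output("O48656c6c6f20776f726c640d0a"))  # 'Hello world\r\n'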
Debugging targets with multiple threads
In order to debug multithreaded programs, a subset of the protocol packets needs to be extended to be thread-aware. Examples of packets that need extending are execution-related packets and register operations. On the other hand, memory operations, for example, do not need thread awareness, since all threads in a multithreaded program share the same memory space.
Let’s consider a slightly more complex snippet:
>> $c#00
<< $O48656c6c6f20776f726c640d0a#00
<< $T13thread:24ac;00:0000000000000000;...;reason:signal#00
>> $qfThreadInfo#00
<< $m2491,24ac,24ad#00
>> $qsThreadInfo#00
<< $l#00
>> $Hc24ad#00
<< $OK#00
>> $c#00
<< $T13thread:24ad;00:0000000000000000;...;reason:signal#00
>> $vCont;c:24ac;c:24ad;c#00
<< $W00#00
Here, we see a few differences:
- the stop reason packet T now includes the identifier of the thread that has received the signal
- there is a new packet pair qfThreadInfo and qsThreadInfo that is used to obtain the list of active threads
- there are new Hc and Hg packets that can be used to select the thread to be resumed, and the thread to be used for other operations, respectively
- there is a new vCont command that provides greater control over resuming the process; in particular, it allows specifying actions per thread
Of course, there are more extensions than just these, including LLDB extensions (such as the ability to pass a thread identifier directly to some commands) and GDB packets that are not implemented in LLDB. However, the packets illustrated above are the absolute minimum needed to support multithreaded processes.
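To illustrate the thread-listing handshake above, here is a minimal Python sketch of the client side; the send() helper, which transmits a packet body and returns the reply body, is an assumption made for the example:
def collect_thread_ids(send) -> list[str]:
    # qfThreadInfo starts the listing; replies beginning with "m" carry
    # comma-separated thread IDs, and qsThreadInfo continues the listing
    # until a lone "l" marks the end.
    threads = []
    reply = send("qfThreadInfo")  # e.g. "m2491,24ac,24ad"
    while reply.startswith("m"):
        threads.extend(reply[1:].split(","))
        reply = send("qsThreadInfo")
    assert reply == "l"  # end of the thread list
    return threads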
Debugging multiprocess targets
The support for debugging multiple processes is indicated explicitly via the multiprocess+ feature in the qSupported packet. This extension adds process identifier awareness to the packets that were already thread-aware, and it also adds process awareness to some more packets.
The most interesting part is extending thread identifiers to include PID. This makes it possible to integrate multiprocess support with minimal changes to the actual protocol. In multiprocess mode, thread identifiers use the following syntax:
p<pid>.<tid>
where pid and tid are the respective identifiers in hexadecimal notation.
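For illustration, a minimal Python sketch of parsing this syntax (the function name is ours):
def parse_thread_id(field: str) -> tuple:
    # With the multiprocess extension, thread IDs take the form
    # p<pid>.<tid>; without it, a bare hex <tid> is used.
    if field.startswith("p"):
        pid, tid = field[1:].split(".")
        return int(pid, 16), int(tid, 16)
    return None, int(field, 16)

print(parse_thread_id("p87c.87c"))  # (2172, 2172)
print(parse_thread_id("24ac"))      # (None, 9388)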
Let’s consider the corresponding snippet for a process that forks:
>> $qSupported:multiprocess+;fork-events+;vfork-events+#00
<< $multiprocess+;fork-events+;vfork-events+#00
// ...
>> $c#00
<< $O48656c6c6f20776f726c640d0a#00
<< $T05thread:p87c.87c;00:daffffffffffffff;...;fork:p896.896#00
>> $qfThreadInfo#00
<< $mp896.896,p87c.87c#00
>> $qsThreadInfo#00
<< $l#00
>> $Hcp896.896#00
<< $OK#00
>> $c#00
<< $T13thread:p896.896;00:0000000000000000;...;reason:signal#00
>> $vCont;c:p87c.87c#00
<< $T13thread:p87c.87c;00:0000000000000000;...;reason:signal#00
>> $c#00
<< $W00;process:896#00
>> $vCont;c#00
<< $T11thread:p87c.87c;00:0000000000000000;...;reason:signal#00
>> $vCont;c#00
<< $W00;process:87c#00
A few things worth noting here:
- process identifiers are now being included as part of thread identifiers
- the fork event indicates that a new process has been created (see the parsing sketch after this list)
- qfThreadInfo includes threads of all debugged processes
- process exit events include the process identifier
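A T stop reply like the one above consists of the signal number followed by semicolon-separated key:value fields; the fork field names the child’s thread ID. A minimal Python sketch of splitting such a reply apart, assuming every field is a key:value pair:
def parse_stop_reply(body: str):
    # Two hex digits of the signal number follow "T"; the rest are
    # semicolon-separated key:value fields (registers, thread, fork, ...).
    signo = int(body[1:3], 16)
    fields = dict(item.split(":", 1) for item in body[3:].split(";") if item)
    return signo, fields

signo, fields = parse_stop_reply("T05thread:p87c.87c;fork:p896.896")
print(signo, fields["fork"])  # 5 p896.896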
Multiprocess debugging in non-stop mode
The main limitation of the regular GDB protocol is that it can report only one stop event for every resume packet. The non-stop mode lifts this limitation, enabling multiple processes to run simultaneously and report their stops independently.
Let’s consider running the part of the above debugging session in non-stop protocol mode:
<< $T05thread:p2cfe8.2cfe8;...;fork:p2d009.2d009#00
>> $QNonStop:1#00
<< $OK#00
>> $vCont;c#00
<< $OK#00
<< %Stop:T13thread:p2cfe8.2cfe8;...;reason:signal#00
>> $vStopped#00
<< $T13thread:p2d009.2d009;...;reason:signal#00
>> $vStopped#00
<< $OK#00
>> $vCont;c#00
<< $OK#00
<< %Stop:W00;process:2d009#00
>> $vStopped#00
<< $T11thread:p2cfe8.2cfe8;...;reason:signal#00
>> $vStopped#00
<< $OK#00
>> $vCont;c#00
<< $OK#00
<< %Stop:W00;process:2cfe8#00
>> $vStopped#00
<< $OK#00
The main difference is that we resume both processes simultaneously, and they both run until they stop. The server reports the first event asynchronously, and queues the remaining events to be obtained through the vStopped packet.
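A minimal Python sketch of this draining loop on the client side, again assuming a send() helper that returns the reply body:
def drain_stop_events(first_event: str, send) -> list[str]:
    # The first stop arrives as a %Stop notification; the remaining
    # queued events are fetched with vStopped until "OK" is returned.
    events = [first_event]
    while True:
        reply = send("vStopped")
        if reply == "OK":  # the queue is empty
            return events
        events.append(reply)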
Summary of changes in lldb-server
The primary focus of this milestone’s work was to identify and implement the remaining protocol extensions necessary for LLGS to effectively and conveniently support debugging multiple processes. Our earlier work ensured that the client was ready for the changes that would also be visible while debugging a single process.
The code changes also involved refactoring the existing code. Methods that assumed that the “current” process (i.e. the one selected via the Hg/Hc packets) was the only process that could be running needed to be rewritten to account for the possibility of being called for other processes.
Process exit handling and stdio forwarding support needed a major revamp to account for the possibility of multiple processes starting, stopping and exiting. Previously, LLGS would assume (unless in non-stop mode) that the server can exit once the process terminates — which is not a valid assumption if multiple processes are being traced.
Furthermore, support for the vKill packet was implemented. The purpose of this packet is to terminate a specific process. Unlike the k packet used by LLDB before, it supports specifying the process identifier and has well-defined semantics. The behavior of the k packet was left unchanged, which meant that LLGS needed to handle process exits differently based on whether the process exited on its own, as a result of a k packet, or as a result of a vKill packet.
The stdio forwarding is an LLDB-specific extension to the use of the O packet. GDB uses this packet only to send the debugger’s messages to the user when using qRcmd to run implementation-specific commands. LLDB also uses it to forward the debugged program’s output from the server to the client while the client is waiting for a stop response. Since interspersing the synchronous O packets with other command replies would be dangerous in non-stop mode, LLDB implements a separate %Stdio notification queue to forward the program’s output.
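A minimal Python sketch of telling notifications apart from replies on the client side (the callback-based design is our assumption, not LLDB’s actual implementation):
def dispatch(raw: str, on_reply, on_notification):
    # "$" introduces a synchronous reply; "%" introduces an asynchronous
    # notification such as %Stop or the LLDB-specific %Stdio.
    body = raw[1:raw.rindex("#")]  # strip the framing and checksum
    if raw.startswith("%"):
        name, _, payload = body.partition(":")
        on_notification(name, payload)
    else:
        on_reply(body)

dispatch("%Stdio:68690a#00", print,
         lambda name, payload: print(name, bytes.fromhex(payload)))
# prints: Stdio b'hi\n'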
The qfThreadInfo packet was modified to report the threads of all debugged processes. The qC packet (reporting the current thread ID) was modified to include the current process identifier as well. The T packet, which provides a convenient mechanism for verifying whether the specified thread ID (optionally including a process ID) is being traced, was also implemented.
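For example, a liveness check for a traced thread could look like the following exchange, reusing a thread ID from the earlier snippet (the checksums are placeholders, as elsewhere in this article):
>> $Tp87c.87c#00
<< $OK#00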
The c and vCont packets were modified to allow running multiple processes in non-stop protocol mode. However, since the process backends in LLDB remain all-stop, it is only possible to resume a process that’s not already running, or to stop all threads of a process that’s running. Resuming or stopping a subset of threads requires stopping the whole process first.
Additionally, more complete support for the t action was implemented. Previously, this action could be used only to stop the whole process in non-stop mode. Now, it can be combined with other actions. For example, vCont;t:1234;t:1235;c can be used to conveniently express “keep threads 0x1234 and 0x1235 stopped, resume the remaining threads”.
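The per-thread action semantics can be illustrated with a minimal Python sketch: per the protocol, the leftmost action with a matching thread ID applies to a thread, and an action without a thread ID matches every thread.
def resolve_vcont(packet: str, threads: list) -> dict:
    # Parse "vCont;action[:thread-id];..." into (action, thread-id) pairs.
    actions = []
    for part in packet.split(";")[1:]:
        action, _, tid = part.partition(":")
        actions.append((action, tid or None))
    # The leftmost matching action wins for every thread.
    plan = {}
    for thread in threads:
        for action, tid in actions:
            if tid is None or tid == thread:
                plan[thread] = action
                break
    return plan

print(resolve_vcont("vCont;t:1234;t:1235;c", ["1234", "1235", "1236"]))
# {'1234': 't', '1235': 't', '1236': 'c'}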
Since good test coverage is an important goal for all LLDB development, all the mentioned changes were accompanied by new tests. These tests cover not only the new functionality but also existing uses of some packets that had not been sufficiently covered before.
Test suite update
Our patches introduced 53 new individual tests covering both the added functionality and existing packets that had not been fully tested. We have also been periodically checking the status of the existing tests on FreeBSD. Whenever regressions were introduced, we would attempt to resolve them, or report them and mark the corresponding test as an expected failure. That way, the test would not cause the test suite to fail, while the test runner would explicitly remind us to reenable it once the underlying problem was fixed.
At the time of writing, these are the results of running the LLDB test suite on a FreeBSD 13.1 amd64 system:
Unsupported      : 504
Passed           : 2058
Expectedly Failed: 13
Please note that a few tests can still be unstable under high system load and may occasionally fail.
Patches merged
- [lldb] [Process/FreeBSD] Do not send SIGSTOP to stopped process
- [lldb] [test] Implement getting thread ID on FreeBSD
- [lldb] [test] Update baseline test status for FreeBSD
- [lldb] [llgs] Include process id in W/X stop reasons
- [lldb] [llgs] Include process ID in stop responses
- [lldb] [llgs] Refactor SendStopReplyPacketForThread for multiprocess
- [lldb] [llgs] Refactor SendStopReasonForState for multiprocess
- [lldb] [test] Disable gmodules testing on FreeBSD
- [lldb] [test] Make AVX/MPX register tests more robust and fix on BSD
- [lldb] [test] Fix test_platform_file_fstat to account for negative ints
- [lldb] [MainLoop] Support “pending callbacks”, to be called once
- [lldb] [llgs] Fix signo sent with fork/vfork/vforkdone events
- [lldb] [llgs] Refactor fork/vfork tests, verify state
- [lldb] [llgs] Add a test for detach-all packet
- [lldb] [llgs] Attempt to fix LLGS tests on Windows
- [lldb] [test] Mark TestNonStop as LLGS-specific
- [lldb] [llgs] Make k kill all processes, and fix multiple exits
- [lldb] [llgs] Implement the vKill packet
- [lldb] [llgs] Add test for resuming via c in multiprocess scenarios
- [lldb] [llgs] Support resuming one process with PID!=current via vCont
- [lldb] [llgs] Add a test for multiprocess memory read/write
- [lldb] [llgs] Support multiprocess in qfThreadInfo
- [lldb] [llgs] Add a test for multiprocess register read/write
- [lldb] [llgs] Include PID in QC response in multiprocess mode
- [lldb] [llgs] Implement the ‘T’ packet
- [lldb] [llgs] Introduce an AppendThreadIDToResponse() helper
- [lldb] [test] Move part of fork tests to common helper
- Revert “[lldb] [llgs] Support multiprocess in qfThreadInfo”
- Reland “[lldb] [llgs] Support multiprocess in qfThreadInfo”
- [lldb] [llgs] Support “t” vCont action
- [lldb] [llgs] Skip new vCont test on Windows
- [lldb] [test] Mark test_vCont_supports_t llgs-only
- [lldb] [test] Skip llgs tests broken due to #56268 on aarch64
- [lldb] [test] XFAIL llgs tests failing on arm
- [lldb] Add a NativeProcessProtocol::Threads() iterable
- [lldb] [llgs] Add base nonstop fork/vfork tests
- [lldb] [llgs] Fix premature server exit if multiprocess+nonstop
- [lldb] [test] Split TestGdbRemoteFork in two
- [lldb] [test] Fix variable overwrite in non-stop fork tests
- [lldb] [test] Use raise(SIGSTOP) instead of trap in fork tests
- [lldb] [test] Un-XFAIL fork tests on arm as well
- [lldb] [test] Avoid relying on signos in other fork tests
Patches waiting for review
- [lldb] [test] Improve stability of llgs vCont-threads tests
- [lldb] [llgs] Fix multi-resume bugs with nonstop mode
- [lldb] [llgs] Send process output asynchronously in non-stop mode
- [lldb] [llgs] Remove not-really-used m_inferior_prev_state
- [lldb] [llgs] Fix ? packet response for running threads
- [lldb] [llgs] Fix disabling non-stop mode
- [lldb] [llgs] Improve stdio forwarding in multiprocess+nonstop
- [lldb] [llgs] Support resuming multiple processes via vCont w/ nonstop
Future plans
Now that the server part of LLDB features full multiprocess support, the remaining part of our work is to implement the client counterpart. This will actually enable the users of LLDB to conveniently debug multiple processes simultaneously.
We are planning to build the multiprocess support on top of the existing support for multiple targets. The client will automatically create a new target for every new process monitored by the server, and the user will be able to switch between debugged processes and control them independently using the separate targets.
Protocol-wise, all the debugged targets will use a single shared connection to LLGS. This will make it possible to debug an arbitrary number of processes over any link, including links that aren’t technically capable of establishing multiple parallel connections (e.g. the serial port), and without being limited by firewalls, for example. The synchronous requests from multiple targets will be multiplexed over the asynchronous non-stop protocol, which permits controlling some processes while others are running.