Full multiprocess support in lldb-server
By Michał Górny
Moritz Systems have been contracted by the FreeBSD Foundation to continue our work on modernizing the LLDB debugger’s support for FreeBSD.
The primary goal of our contract is to bring support for full multiprocess debugging into LLDB. The Project Schedule is divided into three milestones, each taking approximately one and a half months:
- Support for the non-stop variant of GDB Remote Serial Protocol in lldb-server and gdb-remote plugin in LLDB client.
- Full support for multiprocess GDB Remote Serial Protocol extension in lldb-server.
- Support for multiprocess debugging in LLDB client through multiplexing multiple LLDB targets via a single GDB Remote Serial Protocol connection.
FreeBSD is a modern Unix-like operating system that supports debugging multiple multithreaded processes simultaneously. The goal of the second milestone of our project was to enable full multiprocess support in lldb-server. Prior to this, we had already enabled fork and vfork tracing in LLGS, along with a part of the multiprocess extensions to the protocol. However, the server could only continue debugging one process at a time: whenever a fork occurred, the client was required to detach either the parent or the child process before continuing.
Implementing full multiprocess support means that the server can not only continue debugging both processes but also handle an arbitrary number of future forks, and therefore debug an arbitrary number of inferiors forked from the initial process. Combined with our non-stop protocol work from the previous milestone, this enables the server to run multiple processes simultaneously and respond to their stops independently. This makes it the first debugging server to implement non-stop multiprocess debugging on FreeBSD.
Multitarget and multiprocess extensions to the protocol
The GDB Remote Serial Protocol can be used to debug a wide range of target classes, from regular userspace applications, through kernels and virtual machines, to bare metal targets. These targets can be classified into three groups:
- single-thread targets where no parallel execution is possible
- multithreaded targets where multiple threads can run in parallel but only a single process can be debugged
- multiprocess targets where multiple processes can be debugged simultaneously (each of them possibly including multiple threads)
Accordingly, the protocol can be thought of as layered to cover the needs of each of these groups. The base layer would provide the minimal subset of packets necessary to run and inspect a single thread, with additional layers providing the support for debugging multiple threads and multiple processes, respectively.
Debugging targets with a single execution thread
The simplest use case for the remote protocol is debugging a single-threaded process. This could be anything from a userspace program not using threading to a bare metal target. Depending on the exact target, the process and thread identifiers may or may not be present; if they are not, LLDB uses (1, 1) internally.
A trivial debugging session can be illustrated using the following pseudo-packets:
>> $c#00
<< $O48656c6c6f20776f726c640d0a#00
<< $T1300:0000000000000000;01:2018000000000000;...;reason:signal#00
>> $c#00
<< $W00#00
In the snippet above, the client sends two c (continue) packets to resume the program’s execution. The first packet receives two replies. The first one is an O (output) packet, an LLDB extension used to carry output from the inferior to the LLDB client; in this instance, it is hex-encoded Hello world. The second one is a T packet indicating a stop due to a signal. The first two digits are the signal number in hex (SIGSTOP here), and they are followed by additional information about the stopped target. The second packet receives a single W reply indicating that the target has exited with code 0.
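As a side note, every packet is framed as $<body>#<checksum>, where the checksum is the modulo-256 sum of the body’s bytes written as two hex digits; the snippets in this article use a placeholder 00 instead. A minimal Python sketch of the framing and of decoding the O payload, with helper names that are ours rather than LLDB’s, could look like this:
def frame(body: str) -> str:
    # RSP framing: $<body>#<checksum>, where the checksum is the sum
    # of the body's bytes modulo 256, written as two hex digits.
    return f"${body}#{sum(body.encode('ascii')) % 256:02x}"

def decode_output(body: str) -> str:
    # An O packet carries hex-encoded inferior output after the leading "O".
    assert body.startswith("O")
    return bytes.fromhex(body[1:]).decode("ascii")

print(frame("c"))  # $c#63
print(decode_output("O48656c6c6f20776f726c640d0a"))  # 'Hello world\r\n'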
Debugging targets with multiple threads
In order to debug multithreaded programs, a subset of the protocol packets needs to be extended to be thread-aware. Examples of packets that need extending are execution-related packets and register operations. On the other hand, memory operations, for example, do not need thread awareness, since all threads in a multithreaded program share the same memory space.
Let’s consider a slightly more complex snippet:
>> $c#00
<< $O48656c6c6f20776f726c640d0a#00
<< $T13thread:24ac;00:0000000000000000;...;reason:signal#00
>> $qfThreadInfo#00
<< $m2491,24ac,24ad#00
>> $qsThreadInfo#00
<< $l#00
>> $Hc24ad#00
<< $OK#00
>> $c#00
<< $T13thread:24ad;00:0000000000000000;...;reason:signal#00
>> $vCont;c:24ac;c:24ad;c#00
<< $W00#00
Here, we see a few differences:
- the stop reason packet T now includes the identifier of the thread that has received the signal
- there is a new packet pair qfThreadInfo and qsThreadInfo that is used to obtain the list of active threads
- there are new Hc and Hg packets that can be used to select the thread to be resumed, and the thread to be used for other operations, respectively
- there is a new vCont command that provides greater control over resuming the process; in particular, it allows specifying actions per thread
Of course, there are more extensions than just these, including LLDB extensions (such as the ability to pass a thread identifier directly to some commands) and GDB packets that are not implemented in LLDB. However, the packets illustrated above are the absolute minimum needed to support multithreaded processes.
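To illustrate the thread-listing handshake above, here is a minimal Python sketch of the client side; the send() helper, which transmits a packet body and returns the reply body, is an assumption made for the example:
def collect_thread_ids(send) -> list[str]:
    # qfThreadInfo starts the listing; replies beginning with "m" carry
    # comma-separated thread IDs, and qsThreadInfo continues the listing
    # until a lone "l" marks the end.
    threads = []
    reply = send("qfThreadInfo")  # e.g. "m2491,24ac,24ad"
    while reply.startswith("m"):
        threads.extend(reply[1:].split(","))
        reply = send("qsThreadInfo")
    assert reply == "l"  # end of the thread list
    return threads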
Debugging multiprocess targets
The support for debugging multiple processes is indicated explicitly via the multiprocess+ feature in the qSupported packet. This extension adds process identifier awareness to the packets that were already thread-aware, and it also adds process awareness to some more packets.
The most interesting part is extending thread identifiers to include PID. This makes it possible to integrate multiprocess support with minimal changes to the actual protocol. In multiprocess mode, thread identifiers use the following syntax:
p<pid>.<tid>
where pid and tid are the respective identifiers in hexadecimal notation.
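For illustration, a minimal Python sketch of parsing this syntax (the function name is ours):
def parse_thread_id(field: str) -> tuple:
    # With the multiprocess extension, thread IDs take the form
    # p<pid>.<tid>; without it, a bare hex <tid> is used.
    if field.startswith("p"):
        pid, tid = field[1:].split(".")
        return int(pid, 16), int(tid, 16)
    return None, int(field, 16)

print(parse_thread_id("p87c.87c"))  # (2172, 2172)
print(parse_thread_id("24ac"))      # (None, 9388)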
Let’s consider the corresponding snippet for a process that forks:
>> $qSupported:multiprocess+;fork-events+;vfork-events+#00
<< $multiprocess+;fork-events+;vfork-events+#00
// ...
>> $c#00
<< $O48656c6c6f20776f726c640d0a#00
<< $T05thread:p87c.87c;00:daffffffffffffff;...;fork:p896.896#00
>> $qfThreadInfo#00
<< $mp896.896,p87c.87c#00
>> $qsThreadInfo#00
<< $l#00
>> $Hcp896.896#00
<< $OK#00
>> $c#00
<< $T13thread:p896.896;00:0000000000000000;...;reason:signal#00
>> $vCont;c:p87c.87c#00
<< $T13thread:p87c.87c;00:0000000000000000;...;reason:signal#00
>> $c#00
<< $W00;process:896#00
>> $vCont;c#00
<< $T11thread:p87c.87c;00:0000000000000000;...;reason:signal#00
>> $vCont;c#00
<< $W00;process:87c#00
A few things worth noting here:
- process identifiers are now being included as part of thread identifiers
- the fork event indicates that a new process has been created (see the parsing sketch after this list)
- qfThreadInfo includes threads of all debugged processes
- process exit events include the process identifier
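A T stop reply like the one above consists of the signal number followed by semicolon-separated key:value fields; the fork field names the child’s thread ID. A minimal Python sketch of splitting such a reply apart, assuming every field is a key:value pair:
def parse_stop_reply(body: str):
    # Two hex digits of the signal number follow "T"; the rest are
    # semicolon-separated key:value fields (registers, thread, fork, ...).
    signo = int(body[1:3], 16)
    fields = dict(item.split(":", 1) for item in body[3:].split(";") if item)
    return signo, fields

signo, fields = parse_stop_reply("T05thread:p87c.87c;fork:p896.896")
print(signo, fields["fork"])  # 5 p896.896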
Multiprocess debugging in non-stop mode
The main limitation of the regular GDB protocol is that it can report only one stop event for every resume packet. The non-stop mode lifts this limitation, enabling multiple processes to run simultaneously and report their stops independently.
Let’s consider running the part of the above debugging session in non-stop protocol mode:
<< $T05thread:p2cfe8.2cfe8;...;fork:p2d009.2d009#00
>> $QNonStop:1#00
<< $OK#00
>> $vCont;c#00
<< $OK#00
<< %Stop:T13thread:p2cfe8.2cfe8;...;reason:signal#00
>> $vStopped#00
<< $T13thread:p2d009.2d009;...;reason:signal#00
>> $vStopped#00
<< $OK#00
>> $vCont;c#00
<< $OK#00
<< %Stop:W00;process:2d009#00
>> $vStopped#00
<< $T11thread:p2cfe8.2cfe8;...;reason:signal#00
>> $vStopped#00
<< $OK#00
>> $vCont;c#00
<< $OK#00
<< %Stop:W00;process:2cfe8#00
>> $vStopped#00
<< $OK#00
The main difference is that we resume both processes simultaneously, and they both run until they stop. The server reports the first event asynchronously, and queues the remaining events to be obtained through the vStopped packet.
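A minimal Python sketch of this draining loop on the client side, again assuming a send() helper that returns the reply body:
def drain_stop_events(first_event: str, send) -> list[str]:
    # The first stop arrives as a %Stop notification; the remaining
    # queued events are fetched with vStopped until "OK" is returned.
    events = [first_event]
    while True:
        reply = send("vStopped")
        if reply == "OK":  # the queue is empty
            return events
        events.append(reply)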
Summary of changes in lldb-server
The primary focus of this milestone’s work was to identify and implement the remaining protocol extensions necessary for LLGS to effectively and conveniently support debugging multiple processes. Our earlier work ensured that the client was ready for the changes that would also be visible while debugging a single process.
The code changes also involved refactoring the existing code. Methods that assumed that the “current” process (i.e. the one selected via the Hg/Hc packets) was the only process that could be running needed to be rewritten to account for the possibility of being called for other processes.
Process exit handling and stdio forwarding support needed a major revamp to account for the possibility of multiple processes starting, stopping and exiting. Previously, LLGS would assume (unless in non-stop mode) that the server can exit once the process terminates — which is not a valid assumption if multiple processes are being traced.
Furthermore, support for the vKill packet was implemented. The purpose of this packet is to terminate a specific process. Unlike the k packet used by LLDB before, it supports specifying the process identifier and has well-defined semantics. The behavior of the k packet was left unchanged, which meant that LLGS needed to handle process exits differently based on whether the process exited on its own, as a result of a k packet, or as a result of a vKill packet.
The stdio forwarding is an LLDB-specific extension to the use of the O packet. GDB uses this packet only to send the debugger’s messages to the user when using qRcmd to run implementation-specific commands. LLDB also uses it to forward the debugged program’s output from the server to the client while the client is waiting for a stop response. Since interspersing the synchronous O packets with other command replies would be dangerous in non-stop mode, LLDB implements a separate %Stdio notification queue to forward the program’s output.
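A minimal Python sketch of telling notifications apart from replies on the client side (the callback-based design is our assumption, not LLDB’s actual implementation):
def dispatch(raw: str, on_reply, on_notification):
    # "$" introduces a synchronous reply; "%" introduces an asynchronous
    # notification such as %Stop or the LLDB-specific %Stdio.
    body = raw[1:raw.rindex("#")]  # strip the framing and checksum
    if raw.startswith("%"):
        name, _, payload = body.partition(":")
        on_notification(name, payload)
    else:
        on_reply(body)

dispatch("%Stdio:68690a#00", print,
         lambda name, payload: print(name, bytes.fromhex(payload)))
# prints: Stdio b'hi\n'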
The qfThreadInfo packet was modified to report the threads of all debugged processes. The qC packet (reporting the current thread ID) was modified to include the current process identifier as well. The T packet, which provides a convenient mechanism for verifying whether the specified thread ID (optionally including a process ID) is being traced, was also implemented.
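For example, a liveness check for a traced thread could look like the following exchange, reusing a thread ID from the earlier snippet (the checksums are placeholders, as elsewhere in this article):
>> $Tp87c.87c#00
<< $OK#00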
The c and vCont packets were modified to allow running multiple processes in non-stop protocol mode. However, since the process backends in LLDB remain all-stop, it is only possible to resume a process that’s not already running, or to stop all threads of a process that’s running. Resuming or stopping a subset of threads requires stopping the whole process first.
Additionally, more complete support for the t action was implemented. Previously, this action could be used only to stop the whole process in non-stop mode. Now, it can be combined with other actions. For example, vCont;t:1234;t:1235;c can be used to conveniently express “keep threads 0x1234 and 0x1235 stopped, resume the remaining threads”.
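The per-thread action semantics can be illustrated with a minimal Python sketch: per the protocol, the leftmost action with a matching thread ID applies to a thread, and an action without a thread ID matches every thread.
def resolve_vcont(packet: str, threads: list) -> dict:
    # Parse "vCont;action[:thread-id];..." into (action, thread-id) pairs.
    actions = []
    for part in packet.split(";")[1:]:
        action, _, tid = part.partition(":")
        actions.append((action, tid or None))
    # The leftmost matching action wins for every thread.
    plan = {}
    for thread in threads:
        for action, tid in actions:
            if tid is None or tid == thread:
                plan[thread] = action
                break
    return plan

print(resolve_vcont("vCont;t:1234;t:1235;c", ["1234", "1235", "1236"]))
# {'1234': 't', '1235': 't', '1236': 'c'}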
Since good test coverage is an important goal for all LLDB development, all the mentioned changes were accompanied by new tests. These tests cover not only the new functionality but also existing uses of some packets that had not been sufficiently covered before.
Test suite update
Our patches introduced 53 new individual tests covering both the added functionality and existing packets that had not been fully tested. We have also been periodically checking the status of the existing tests on FreeBSD. Whenever regressions were introduced, we would attempt to resolve them, or report them and mark the corresponding test as an expected failure. That way, the test would not cause the test suite to fail, while the test runner would explicitly remind us to reenable it once the underlying problem was fixed.
At the time of writing, these are the results of running the LLDB test suite on a FreeBSD 13.1 amd64 system:
Unsupported      : 504
Passed           : 2058
Expectedly Failed: 13
Please note that a few tests can still be unstable under high system load and may occasionally fail.
Patches merged
- [lldb] [Process/FreeBSD] Do not send SIGSTOP to stopped process
- [lldb] [test] Implement getting thread ID on FreeBSD
- [lldb] [test] Update baseline test status for FreeBSD
- [lldb] [llgs] Include process id in W/X stop reasons
- [lldb] [llgs] Include process ID in stop responses
- [lldb] [llgs] Refactor SendStopReplyPacketForThread for multiprocess
- [lldb] [llgs] Refactor SendStopReasonForState for multiprocess
- [lldb] [test] Disable gmodules testing on FreeBSD
- [lldb] [test] Make AVX/MPX register tests more robust and fix on BSD
- [lldb] [test] Fix test_platform_file_fstat to account for negative ints
- [lldb] [MainLoop] Support “pending callbacks”, to be called once
- [lldb] [llgs] Fix signo sent with fork/vfork/vforkdone events
- [lldb] [llgs] Refactor fork/vfork tests, verify state
- [lldb] [llgs] Add a test for detach-all packet
- [lldb] [llgs] Attempt to fix LLGS tests on Windows
- [lldb] [test] Mark TestNonStop as LLGS-specific
- [lldb] [llgs] Make k kill all processes, and fix multiple exits
- [lldb] [llgs] Implement the vKill packet
- [lldb] [llgs] Add test for resuming via c in multiprocess scenarios
- [lldb] [llgs] Support resuming one process with PID!=current via vCont
- [lldb] [llgs] Add a test for multiprocess memory read/write
- [lldb] [llgs] Support multiprocess in qfThreadInfo
- [lldb] [llgs] Add a test for multiprocess register read/write
- [lldb] [llgs] Include PID in QC response in multiprocess mode
- [lldb] [llgs] Implement the ‘T’ packet
- [lldb] [llgs] Introduce an AppendThreadIDToResponse() helper
- [lldb] [test] Move part of fork tests to common helper
- Revert “[lldb] [llgs] Support multiprocess in qfThreadInfo”
- Reland “[lldb] [llgs] Support multiprocess in qfThreadInfo”
- [lldb] [llgs] Support “t” vCont action
- [lldb] [llgs] Skip new vCont test on Windows
- [lldb] [test] Mark test_vCont_supports_t llgs-only
- [lldb] [test] Skip llgs tests broken due to #56268 on aarch64
- [lldb] [test] XFAIL llgs tests failing on arm
- [lldb] Add a NativeProcessProtocol::Threads() iterable
- [lldb] [llgs] Add base nonstop fork/vfork tests
- [lldb] [llgs] Fix premature server exit if multiprocess+nonstop
- [lldb] [test] Split TestGdbRemoteFork in two
- [lldb] [test] Fix variable overwrite in non-stop fork tests
- [lldb] [test] Use raise(SIGSTOP) instead of trap in fork tests
- [lldb] [test] Un-XFAIL fork tests on arm as well
- [lldb] [test] Avoid relying on signos in other fork tests
Patches waiting for review
- [lldb] [test] Improve stability of llgs vCont-threads tests
- [lldb] [llgs] Fix multi-resume bugs with nonstop mode
- [lldb] [llgs] Send process output asynchronously in non-stop mode
- [lldb] [llgs] Remove not-really-used m_inferior_prev_state
- [lldb] [llgs] Fix ? packet response for running threads
- [lldb] [llgs] Fix disabling non-stop mode
- [lldb] [llgs] Improve stdio forwarding in multiprocess+nonstop
- [lldb] [llgs] Support resuming multiple processes via vCont w/ nonstop
Future plans
Now that the server part of LLDB features full multiprocess support, the remaining part of our work is to implement the client counterpart. This will actually enable the users of LLDB to conveniently debug multiple processes simultaneously.
We are planning to build the multiprocess support on top of the existing support for multiple targets. The client will automatically create a new target for every new process monitored by the server, and the user will be able to switch between debugged processes and control them independently using the separate targets.
Protocol-wise, all the debugged targets will use a single shared connection to LLGS. This will make it possible to debug an arbitrary number of processes over any link, including links that aren’t technically capable of establishing multiple parallel connections (e.g. the serial port), and without being limited by firewalls, for example. The synchronous requests from multiple targets will be multiplexed over the asynchronous non-stop protocol, which permits controlling some processes while others are running.