FreeBSD Remote Process Plugin is now the default in LLDB

By Michał Górny, Kamil Rytarowski

November 5, 2020 - 10 minutes read - 2059 words

bsd contract debugger freebsd lldb llvm

Moritz Systems have been contracted by the FreeBSD Foundation to modernize the LLDB debugger’s support for FreeBSD. We are working on a new plugin utilizing the more modern client-server layout that is already used by Darwin, Linux, NetBSD and (unofficially) OpenBSD. The new plugin is going to gradually replace the legacy one.

FreeBSD LLDB project

The Project Schedule is divided into three milestones, each taking approximately one month:

M1 Introduce new FreeBSD Remote Process Plugin for x86_64 with basic support and upstream to LLVM.
M2 Implement and verify the features mandated by the project (process launch, process attach (pid), process attach (name), userland core files, breakpoints, watchpoints, threads, remote debugging) for FreeBSD/amd64 and FreeBSD/i386.
M3 Iterate over the LLDB tests. Detect and as time permits fix bugs. Ensure bug reports for each non-fixed and known problem. Add missing man pages and update the FreeBSD Handbook.

In the previous report, we announced the completion of the project’s first milestone: upstreaming the first functional version of the plugin. We described the differences between the legacy plugin model and the modern client-server model, listed the major components involved in platform support, and detailed a few problems found while implementing support for x87 FPU registers.

This time we would like to announce the completion of the second milestone. We have reached feature parity with the original FreeBSD plugin on the amd64 and i386 architectures, which made it possible to enable the new plugin by default on these two targets. In this article, we would like to share a few facts about the work we have been doing in the past months, and in particular explain more differences between FreeBSD and NetBSD.

Thread identifiers across platforms

A single process can have one or more threads. While process identifiers are standardized by POSIX, the same is not true of thread identifiers. Their implementations vary greatly across platforms; in particular, FreeBSD, NetBSD and Linux all use different approaches. The POSIX threads (pthread) library uses opaque types to avoid relying on a specific implementation.

PID and TID

Linux uses a combined namespace for process and thread identifiers. The first thread of a process has the same identifier as the process itself, while the remaining threads get identifiers that are globally unique, and do not collide with the identifiers of other processes or threads. The ptrace(2) requests that operate on threads accept thread identifiers in place of the PID.

Historically, NetBSD used separate process and thread namespaces. Thread identifiers, or Lightweight Process (LWP) IDs in BSD terminology, were local to the process and were not unique between different processes. Starting with the upcoming release of NetBSD 10.0, thread identifiers will be globally unique, similarly to the Linux approach. However, for compatibility with existing code and more flexibility in the future, the syscalls will continue to require both the process and the thread identifier to be passed explicitly. For this reason, the ptrace(2) requests working on a specific thread generally dedicate the numeric data argument to passing the LWP ID.

FreeBSD splits a single namespace into two ranges, respectively for process and thread identifiers. Identifiers up to PID_MAX are used to identify processes, while the identifiers above this value are used for threads. All thread identifiers are globally unique and disjoint from process identifiers. The requests that operate on threads accept thread identifiers in place of the PID.
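The FreeBSD split can be illustrated with a tiny classifier. This is only a sketch: the value 99999 is our assumption for PID_MAX (it is what FreeBSD's <sys/proc.h> has used for a long time, but check the headers of the release you target), and the function names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: mirrors PID_MAX from FreeBSD's <sys/proc.h>; verify the
 * value against the release you are targeting. */
#define SKETCH_PID_MAX 99999

/* Identifiers up to PID_MAX name processes. */
bool freebsd_id_is_process(int64_t id)
{
    return id >= 0 && id <= SKETCH_PID_MAX;
}

/* Identifiers above PID_MAX name threads. */
bool freebsd_id_is_thread(int64_t id)
{
    return id > SKETCH_PID_MAX;
}
```

Because the two ranges never overlap, a request that receives a single number can tell whether it was given a process or a thread.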

The pthread library uses an opaque pthread_t type to represent thread identifiers, which are normally initialized by pthread_create(). The identifier of the current thread can be obtained with pthread_self() and compared to other identifiers using pthread_equal(). However, there is no portable way of printing it or using it outside of the current process.

GNU/Linux provides a non-portable gettid() function (<unistd.h>) that returns the numeric thread identifier, which can be passed e.g. to tgkill(). This function was added in glibc 2.30. For compatibility with older versions of libc, the SYS_gettid syscall can be invoked directly instead.

On NetBSD, the numeric identifier of the current thread can be obtained using _lwp_self() (<lwp.h>). This identifier needs to be combined with the process ID, e.g. before passing it to ptrace(2) calls.

On FreeBSD, the numeric identifier of the current thread can be obtained using pthread_getthreadid_np() (<pthread_np.h>).
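The three platform-specific calls above can be hidden behind one helper. The following is a minimal sketch; the function name current_thread_id() and the -1 fallback for other systems are our own choices for illustration.

```c
#include <stdint.h>
#include <unistd.h>

#if defined(__linux__)
#include <sys/syscall.h>
#elif defined(__FreeBSD__)
#include <pthread.h>
#include <pthread_np.h>
#elif defined(__NetBSD__)
#include <lwp.h>
#endif

/* Returns the kernel-level identifier of the calling thread, or -1 on
 * platforms that this sketch does not cover. */
int64_t current_thread_id(void)
{
#if defined(__linux__)
    /* Raw syscall, so it also works with glibc older than 2.30. */
    return (int64_t)syscall(SYS_gettid);
#elif defined(__FreeBSD__)
    return (int64_t)pthread_getthreadid_np();
#elif defined(__NetBSD__)
    /* Before NetBSD 10.0 this is unique only within the process. */
    return (int64_t)_lwp_self();
#else
    return -1;
#endif
}
```

On Linux, calling it from the initial thread yields the same number as getpid(); on FreeBSD the result lies above PID_MAX.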

Launching and attaching to processes

There are two primary ways to hook up a debugger to a process. You can either have the debugger start the program and therefore debug it from its entry point, or you can attach a process that is already running to the debugger. The second approach is especially useful to handle unexpected issues that occurred while using a program.

Debugger attach & launch

Launching a program inside the debugger is very similar to the POSIX low-level method of starting a child executable: fork(2), prepare the child environment, and then execute the actual program via execvp(3) or similar. The difference is that the forked child issues a PT_TRACE_ME request just before executing the actual program, which lets the debugger take control of it. This implies that the debugger needs to automatically step through this final action before giving control to the user.
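The launch sequence can be sketched in a few lines of C. This is an illustration only, not the code LLDB uses: launch_traced() is a hypothetical helper, and the Linux branch is there merely so the sketch builds outside the BSDs.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks, requests tracing in the child, then executes argv[0].
 * Returns the child's PID once it has stopped on the exec trap,
 * or -1 on failure. */
pid_t launch_traced(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced by the parent, then replace the image. */
#if defined(__linux__)
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
#else
        if (ptrace(PT_TRACE_ME, 0, NULL, 0) == -1)
            _exit(127);
#endif
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: a successful exec stops the traced child with SIGTRAP. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP)
        return pid;

    /* Unexpected stop: do not leave a stopped child behind. */
    if (WIFSTOPPED(status)) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
    }
    return -1;
}
```

When the function returns, the new program has not executed a single instruction yet, so a debugger could insert breakpoints before resuming it.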

Attaching to a running process is done using the PT_ATTACH request. The request takes the PID of the process of interest, attaches it to the debugger, and stops it as a result.

In both cases, the debugger calls waitpid(2) or alike on the process, in order to confirm that it has stopped. On both FreeBSD and NetBSD, it invokes the PT_SET_EVENT_MASK request to enable reporting events of interest — e.g. new or terminating threads. Finally, it obtains a list of all threads of the running process.
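The attach path, up to the confirmation that the process has stopped, can be sketched as follows. The helper names are hypothetical, setting the event mask and listing threads are left out, and the Linux branch only exists so that the sketch builds outside the BSDs.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>

/* Attaches to `pid` and waits until it reports a stop.
 * Returns the stop signal (normally SIGSTOP), or -1 on failure. */
int attach_and_wait(pid_t pid)
{
#if defined(__linux__)
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;
#else
    if (ptrace(PT_ATTACH, pid, NULL, 0) == -1)
        return -1;
#endif

    /* The attach is complete only once the stop has been observed. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSTOPPED(status) ? WSTOPSIG(status) : -1;
}

/* Detaches from `pid` and lets it continue. Returns 0 or -1. */
int detach_process(pid_t pid)
{
#if defined(__linux__)
    return ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1 ? -1 : 0;
#else
    /* An address of 1 means "continue from where it stopped". */
    return ptrace(PT_DETACH, pid, (caddr_t)1, 0) == -1 ? -1 : 0;
#endif
}
```

A caller would typically check that the returned signal is SIGSTOP, do its work on the stopped process, and detach when done.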

The method of getting the thread list differs between FreeBSD and NetBSD. On FreeBSD, PT_GETNUMLWPS is invoked first to get the number of active threads, then PT_GETLWPLIST is used to get the list of thread identifiers. On NetBSD, PT_LWPNEXT is repeatedly called to get information about successive threads.
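For FreeBSD, the two-step query could look roughly like this. It is a sketch under our own assumptions: list_threads() is a hypothetical helper, the process must already be traced and stopped, and on other systems the function simply reports ENOSYS so that the example still compiles.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ptrace.h>

/* Copies up to `capacity` thread IDs of the traced, stopped process
 * `pid` into `tids`. Returns the number of IDs stored, or -1. */
int list_threads(pid_t pid, long *tids, int capacity)
{
    if (tids == NULL || capacity <= 0) {
        errno = EINVAL;
        return -1;
    }
#if defined(__FreeBSD__)
    /* Step 1: ask how many threads the process currently has. */
    int count = ptrace(PT_GETNUMLWPS, pid, NULL, 0);
    if (count < 0)
        return -1;
    if (count == 0)
        return 0;

    lwpid_t *buf = calloc((size_t)count, sizeof(*buf));
    if (buf == NULL)
        return -1;

    /* Step 2: fill the array; the return value is the entry count. */
    int got = ptrace(PT_GETLWPLIST, pid, (caddr_t)buf, count);
    if (got < 0) {
        free(buf);
        return -1;
    }
    if (got > capacity)
        got = capacity;
    for (int i = 0; i < got; i++)
        tids[i] = (long)buf[i];
    free(buf);
    return got;
#else
    (void)pid;
    errno = ENOSYS; /* this sketch covers FreeBSD only */
    return -1;
#endif
}
```

Since the process is stopped while the debugger queries it, the thread count should not change between the two requests.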

SIGTRAP on FreeBSD and NetBSD

Generally, when a non-ignored signal is about to be delivered to a debugged process, the process stops and the waitpid(2) call or alike issued by the debugger to monitor the process indicates the signal. The debugger is then responsible for deciding how to handle the signal and resuming the process.
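The status returned by waitpid(2) is decoded with the standard W* macros. A minimal sketch follows; the enum and the function name are ours, chosen for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>

typedef enum {
    CHILD_STOPPED, /* detail = signal that caused the stop */
    CHILD_EXITED,  /* detail = exit code */
    CHILD_KILLED,  /* detail = terminating signal */
    CHILD_OTHER
} child_state;

/* Classifies a waitpid(2) status and stores the signal number or
 * exit code in *detail. */
child_state decode_wait_status(int status, int *detail)
{
    if (WIFSTOPPED(status)) {
        *detail = WSTOPSIG(status);
        return CHILD_STOPPED;
    }
    if (WIFEXITED(status)) {
        *detail = WEXITSTATUS(status);
        return CHILD_EXITED;
    }
    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);
        return CHILD_KILLED;
    }
    *detail = 0;
    return CHILD_OTHER;
}
```

For a traced process, the CHILD_STOPPED case is where the debugger decides whether to pass the signal on, suppress it, or treat it as a debugging event.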

The same mechanism is used to inform the debugger about other events related to the debugged process, such as:

breakpoint and watchpoint hits
single-stepping traps
spawning new processes
starting and exiting threads
replacing the process via exec(3)
syscall entry and exit

More precisely, whenever such an event occurs, the kernel generates an artificial SIGTRAP signal that causes the process to be stopped. Some of the events are signaled unconditionally, while others need to be explicitly requested by setting the event mask via PT_SET_EVENT_MASK. Through inspecting the detailed signal/LWP information, the debugger can determine which event has occurred.

The level of detail of SIGTRAP data differs between FreeBSD and NetBSD. On FreeBSD, the event and signal data can be found in the structure returned by PT_LWPINFO. On NetBSD, the signal data is obtained via PT_GET_SIGINFO, while additional event information is provided by PT_GET_PROCESS_STATE.

The following table illustrates the events and the method of reporting them.

`SIGTRAP` `si_code` values

| Event | FreeBSD `pl_siginfo.si_code` | FreeBSD `pl_flags` | NetBSD `psi_siginfo.si_code` | NetBSD `pe_report_event` |
| --- | --- | --- | --- | --- |
| Breakpoint | `TRAP_BRKPT` | | `TRAP_BRKPT` | |
| Generic trace | `TRAP_TRACE` | | `TRAP_TRACE` | |
| Hardware DR trap | `TRAP_TRACE` | | `TRAP_DBREG` | |
| DTrace-induced trap | `TRAP_DTRACE` | | (FreeBSD-specific) | |
| Capabilities protective trap | `TRAP_CAP` | | (FreeBSD-specific) | |
| LWP (thread) created | | `PL_FLAG_BORN` | `TRAP_LWP` | `PTRACE_LWP_CREATE` |
| LWP (thread) exited | | `PL_FLAG_EXITED` | `TRAP_LWP` | `PTRACE_LWP_EXIT` |
| Syscall entry | | `PL_FLAG_SCE` | `TRAP_SCE` | |
| Syscall exit | | `PL_FLAG_SCX` | `TRAP_SCX` | |
| `exec(3)` | | `PL_FLAG_EXEC` | `TRAP_EXEC` | |
| `fork(2)` (parent) | | `PL_FLAG_FORKED` | `TRAP_CHLD` | `PTRACE_FORK` |
| `fork(2)` (child) | | `PL_FLAG_CHILD` | | `PTRACE_FORK` |
| `vfork(2)` (parent) | | `PL_FLAG_VFORKED` | | `PTRACE_VFORK` |
| `vfork(2)` (child) | | `PL_FLAG_CHILD` | | `PTRACE_VFORK` |
| `vfork(2)` (parent resumed) | | `PL_FLAG_VFORK_DONE` | | `PTRACE_VFORK_DONE` |
| `posix_spawn(3)` (parent and child) | (NetBSD-specific) | | | `PTRACE_POSIX_SPAWN` |

Note: `fork(2)`, `vfork(2)` and `posix_spawn(3)` signals are issued both from the parent (forking) process, upon reaching the syscall, and from the child process, before executing the first instruction. The `vfork(2)` syscall blocks the parent until the child exits or execs, and the kernel issues an additional signal to the parent when that happens and the parent is about to resume execution. The `clone(2)` function causes the same signal as `fork(2)` or `vfork(2)`, depending on its arguments.
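The FreeBSD `pl_flags` column of the table can be turned into a small classifier. This is a hedged sketch rather than LLDB's code: the enum and function names are invented, and outside FreeBSD the flag values are placeholders that only keep the example compilable. On FreeBSD, the flags come from the structure filled in by PT_LWPINFO.

```c
#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/ptrace.h>
#else
/* ASSUMPTION: placeholder bit values so the sketch builds on other
 * systems. They are NOT meant to match FreeBSD's <sys/ptrace.h>. */
enum {
    PL_FLAG_SCE = 1 << 0,    PL_FLAG_SCX = 1 << 1,
    PL_FLAG_EXEC = 1 << 2,   PL_FLAG_FORKED = 1 << 3,
    PL_FLAG_CHILD = 1 << 4,  PL_FLAG_BORN = 1 << 5,
    PL_FLAG_EXITED = 1 << 6, PL_FLAG_VFORKED = 1 << 7,
    PL_FLAG_VFORK_DONE = 1 << 8
};
#endif

typedef enum {
    EV_SIGNAL_OR_TRAP, /* no flag set: inspect pl_siginfo.si_code */
    EV_THREAD_CREATED, EV_THREAD_EXITED, EV_EXEC,
    EV_FORK_PARENT, EV_VFORK_PARENT, EV_FORK_CHILD, EV_VFORK_DONE,
    EV_SYSCALL_ENTRY, EV_SYSCALL_EXIT
} trace_event;

/* Maps pl_flags to one event. More specific flags are tested first,
 * because some are reported together with a more general one. */
trace_event classify_freebsd_event(int pl_flags)
{
    if (pl_flags & PL_FLAG_BORN)       return EV_THREAD_CREATED;
    if (pl_flags & PL_FLAG_EXITED)     return EV_THREAD_EXITED;
    if (pl_flags & PL_FLAG_EXEC)       return EV_EXEC; /* may come with SCX */
    if (pl_flags & PL_FLAG_VFORK_DONE) return EV_VFORK_DONE;
    if (pl_flags & PL_FLAG_CHILD)      return EV_FORK_CHILD;
    if (pl_flags & PL_FLAG_VFORKED)    return EV_VFORK_PARENT; /* set with FORKED */
    if (pl_flags & PL_FLAG_FORKED)     return EV_FORK_PARENT;
    if (pl_flags & PL_FLAG_SCE)        return EV_SYSCALL_ENTRY;
    if (pl_flags & PL_FLAG_SCX)        return EV_SYSCALL_EXIT;
    return EV_SIGNAL_OR_TRAP;
}
```

When no flag is set, the stop is either a real signal or one of the traps from the `si_code` column, so the debugger has to look at the signal information next.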

Achieving Milestone 2 and updating tests

The most important goal for Milestone 2 was to reach feature parity with the legacy plugin, and ensure that there are no major regressions that would prevent LLDB from being used to actually debug programs. We have finally reached that point.

In order to reach this point, we had to implement missing features and fix a few nasty bugs. Most notably, we had to implement threading support and XSTATE-based register support. The watchpoint implementation originally written for NetBSD worked fine for FreeBSD, but we wanted to provide a single, reusable mixin-style class rather than copying the same code to a third plugin (NetBSD watchpoint support was a modified version of the Linux code). While at it, we tried to make the code more readable.

Figuring out why attaching to processes did not work was particularly challenging. It turned out to be the result of two different issues in the plugin code. Firstly, setting the process state to stopped while attaching caused lldb-server to try to emit the respective state packet prematurely. Since the attach method had not returned yet, the server crashed due to not having the process handle. Secondly, we were re-listing the process's threads too late in the code, after marking all threads as stopped. As a result, the threads reported to LLDB were not marked as stopped.

Attaching to a process by name was broken for both FreeBSD plugins, as well as for NetBSD, possibly due to earlier changes in LLDB. The plugin code responsible for searching running processes involved an optimization that delayed fetching the process name until its other properties (such as PID, UID, GID…, if requested) had been tested. However, the test also attempted to match the process name before it was read, and therefore always failed. We have fixed it to skip the process name during the first verification.
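The idea behind the fix can be shown with a simplified two-pass match. Everything here is hypothetical (the structures, the callback and the demo lookup are invented for this example and do not mirror LLDB's classes); the sketch only demonstrates checking the cheap properties first and comparing the name after it has actually been fetched.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical process record; name == NULL means "not fetched yet". */
typedef struct { long pid; long uid; const char *name; } proc_info;

/* Hypothetical filter; uid < 0 or name == NULL means "do not care". */
typedef struct { long uid; const char *name; } proc_filter;

/* Stand-in for the expensive per-process name lookup. */
typedef const char *(*name_fetcher)(long pid);

bool process_matches(const proc_filter *f, proc_info *p, name_fetcher fetch)
{
    /* First pass: cheap properties only. The name is skipped here
     * because it has not been read yet. */
    if (f->uid >= 0 && f->uid != p->uid)
        return false;
    if (f->name == NULL)
        return true;

    /* Second pass: fetch the name only for surviving candidates. */
    if (p->name == NULL)
        p->name = fetch(p->pid);
    return p->name != NULL && strcmp(p->name, f->name) == 0;
}

/* Demo lookup used for illustration: PID 42 is "sleep". */
const char *demo_fetch_name(long pid)
{
    return pid == 42 ? "sleep" : "sh";
}
```

With this ordering, a process rejected by its UID never triggers the name lookup, while a name filter is still honoured for the remaining candidates.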

Finally, we wanted to discover why the expression parser did not work correctly. The causal chain involved the parser engine claiming that it could not allocate memory, an inability to find the mmap function, missing shared libraries in the process's module list, and finally the method responsible for getting information on memory regions. Since the legacy plugin did not implement this method at all, we have decided to disable it for the time being and to postpone further investigation of this bug until the next milestone.

After resolving all these issues, we have decided to swap the default plugin. Now, the (new) remote plugin is used by default on amd64 and i386 targets. The legacy plugin can be forced there by setting the FREEBSD_LEGACY_PLUGIN environment variable to any value. It is also used on the other architectures to which the new plugin has not been ported yet (arm, arm64, mips, ppc).
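A "set to any value" switch of this kind is usually a plain presence check. The sketch below assumes that this is how the variable is interpreted; the function name is ours.

```c
#include <stdbool.h>
#include <stdlib.h>

/* True when FREEBSD_LEGACY_PLUGIN is present in the environment.
 * ASSUMPTION: only presence matters, the value is not inspected. */
bool legacy_plugin_requested(void)
{
    return getenv("FREEBSD_LEGACY_PLUGIN") != NULL;
}
```

Under that assumption, exporting the variable with a value such as 1 before starting lldb selects the legacy plugin, and unsetting it restores the default.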

Changes merged upstream

Plan for the next milestone

The third milestone focuses on resolving issues and updating documentation. We have already marked most of the failing tests as ‘expected failures’, and we have established that some tests produce unstable results. First, we would like to go through these tests and attempt to fix the underlying bugs.

We are going to go through the open LLDB bug reports and establish whether they are still valid. We are going to close those that have already been fixed, and file new bugs for known issues that have not been reported yet.

We are also going to ensure that the FreeBSD documentation regarding LLDB is complete and up-to-date. This primarily involves adding missing manpages for LLDB tools.