FreeBSD Remote Process Plugin: Final Milestone Achieved

By Kamil Rytarowski, Michał Górny

December 10, 2020 - 14 minutes read - 2858 words

BSD contract debugger FreeBSD LLDB LLVM

Moritz Systems have been contracted by the FreeBSD Foundation to modernize the LLDB debugger’s support for FreeBSD. We are working on a new plugin utilizing the more modern client-server layout that is already used by Darwin, Linux, NetBSD and (unofficially) OpenBSD. The new plugin is going to gradually replace the legacy one.

LLVM

This dragon image is owned by Apple Inc.

The Project Schedule was divided into three milestones, each taking approximately one month:

M1 Introduce new FreeBSD Remote Process Plugin for x86_64 with basic support and upstream to LLVM.
M2 Ensure and add the mandated features in the project (process launch, process attach (pid), process attach (name), userland core files, breakpoints, watchpoints, threads, remote debugging) for FreeBSD/amd64 and FreeBSD/i386.
M3 Iterate over the LLDB tests. Detect and as time permits fix bugs. Ensure bug reports for each non-fixed and known problem. Add missing man pages and update the FreeBSD Handbook.

In the previous report, we announced the completion of the project’s second milestone: achieving feature parity with the legacy plugin and enabling the new plugin by default on 32-bit and 64-bit x86. We explained how different platforms express process and thread identifiers and how SIGTRAP is used to deliver event notifications to the debugger. We also described the two alternative approaches to hooking the debugger up to a process: either launching it, or attaching to a running process.

The third milestone focused on fixing bugs and on updating the test suite state and the documentation. We are proud to announce that this stage is finished as well, and therefore the whole contract has been completed successfully and on time. In this article, we would like to briefly summarize our work and describe some of the more interesting areas of focus in detail.

A race condition while copying watchpoints to new threads

The primary goal in the third milestone was to go through failing tests and either fix them, or at least document the failures and mark the respective tests as expected to fail. We found the first really interesting problem while investigating the commands/watchpoints/multiple_threads test. The purpose of the test is to verify that watchpoints work when the respective variables are altered by a non-main thread.

Originally, the test was done in two variants: with the watchpoint being set before starting the new thread, and after starting it. The first variant was supposed to verify whether LLDB correctly copies existing watchpoints to new threads as they are being started. The second variant verified whether the watchpoint command correctly adds the new watchpoint to all running threads.

Debug Registers in threading application

What’s important here is that hardware-assisted watchpoints on x86 are configured via altering the state of Debug Registers. Like other register sets, the values of DRs are thread-local, and therefore the debugger needs to set them separately for every thread. Furthermore, new threads inherit the DR state from parent threads on FreeBSD, and our original watchpoint code relied on new threads having the correct DR at start.

However, there is a catch. The new thread is not reported to the debugger until it is actually ready to start. By that time, the DRs have already been copied from the parent thread, and the parent thread continues execution. In fact, it is entirely feasible that the process is stopped due to a breakpoint in the parent thread before the new thread is actually reported ready. This creates an ample opportunity for the user to set a new watchpoint, and this is precisely what happened to us during the test.

At this point, the debugger is not yet aware that another thread is being created. However, the kernel has already copied the Debug Register values from the parent thread. As a result, the new thread is created with the old DR values, while the debugger assumed that it had the new values instead.

We have reported this confusing behavior to the FreeBSD Bugzilla. For the time being, we’ve changed the plugin to explicitly copy DRs when a new thread is reported, therefore guaranteeing that any changes during the problematic period are propagated. We have also extended the original test to cover three scenarios: watchpoint set before requesting the new thread, watchpoint set immediately after requesting it (i.e. falling into our race condition) and watchpoint set after waiting for the new thread to actually start running (i.e. covering the original intent).
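The workaround can be illustrated with a small, self-contained model. This is only a sketch and not the plugin's code: the DebugRegs alias, the Inferior type and its methods are hypothetical names, and the real plugin talks to the kernel instead of updating an in-memory table. The point it shows is that the DR state inherited by a newly reported thread is not trusted and is overwritten with the debugger's current view.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for the x86 debug register set (DR0..DR7).
using DebugRegs = std::array<std::uint64_t, 8>;

// Hypothetical model of the debugger's view of the threads it knows about.
struct Inferior {
  std::map<int, DebugRegs> threads; // thread id -> DR state

  // Program watchpoint slot 0..3 on every thread already reported.
  void SetWatchpoint(int slot, std::uint64_t addr) {
    assert(slot >= 0 && slot < 4);
    for (auto &entry : threads) {
      entry.second[slot] = addr;
      entry.second[7] |= (1ull << (slot * 2)); // local-enable bit in DR7
    }
  }

  // Called when a new thread is reported. `inherited` is what the kernel
  // copied from the parent at creation time and may be stale by now.
  // Returns true if the inherited state was stale (the race was hit).
  bool OnThreadReported(int new_tid, const DebugRegs &inherited,
                        int source_tid) {
    const DebugRegs current = threads.at(source_tid);
    const bool was_stale = (inherited != current);
    threads[new_tid] = current; // always copy explicitly
    return was_stale;
  }
};
```

In this model, setting a watchpoint between thread creation and the thread being reported makes OnThreadReported() return true, yet the new thread still ends up with the up-to-date registers.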

Simplifying the register reading and writing logic

The original register reading and writing logic in the new plugin was inspired by the code present in the NetBSD plugin. It roughly consisted of a large switch-case construct that mapped enumeration values into appropriate operations on system structures. There were three large switches in total: one for reading register values, one for writing register values and one for mapping enumeration values from the i386 to the amd64 platform. Furthermore, the first two needed large separate variants for i386 and amd64.

At the same time, LLDB already carried another set of register information that was created via macros by inspecting struct field offsets and sizes. Unlike the plugin logic, it did not use system structures but instead inlined them. This is because the same structures are used to access core dumps, and avoiding system headers makes it possible to compile the code and inspect FreeBSD core dumps on other systems.

LLDB Registers

Unlike NetBSD, the Linux plugin actually reused the offsets and sizes from this data to access register sets. We have decided to follow suit, and replace the aforementioned custom logic with accesses based on offset and size values, and this allowed us to reduce code duplication significantly. We have also added platform-specific tests that verify that the offsets and sizes are correct, compared to system structures.
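The idea of table-driven register access can be sketched as follows. This is an illustration under assumptions, not LLDB's actual code: FakeGPR is a deliberately cut-down, hypothetical register structure, and RegInfo, kRegs, ReadRegister and WriteRegister are names invented for the example. A single table of offsets and sizes replaces the per-register switch cases, and a size mismatch is rejected instead of silently truncating a value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, cut-down register structure used only for this sketch.
struct FakeGPR {
  std::uint64_t rax;
  std::uint64_t rbx;
  std::uint32_t eflags;
  std::uint16_t cs;
};

struct RegInfo {
  const char *name;
  std::size_t offset; // byte offset inside the register set buffer
  std::size_t size;   // byte size of the register
};

// The table is generated from field offsets and sizes.
#define REG(field) {#field, offsetof(FakeGPR, field), sizeof(FakeGPR::field)}
static const RegInfo kRegs[] = {REG(rax), REG(rbx), REG(eflags), REG(cs)};
#undef REG
static const std::size_t kNumRegs = sizeof(kRegs) / sizeof(kRegs[0]);

// Copy one register out of a raw register set buffer.
bool ReadRegister(const void *regset, std::size_t index, void *dst,
                  std::size_t dst_size) {
  assert(regset != nullptr && dst != nullptr);
  if (index >= kNumRegs || dst_size != kRegs[index].size)
    return false;
  std::memcpy(dst, static_cast<const char *>(regset) + kRegs[index].offset,
              kRegs[index].size);
  return true;
}

// Copy one register into a raw register set buffer.
bool WriteRegister(void *regset, std::size_t index, const void *src,
                   std::size_t src_size) {
  assert(regset != nullptr && src != nullptr);
  if (index >= kNumRegs || src_size != kRegs[index].size)
    return false;
  std::memcpy(static_cast<char *>(regset) + kRegs[index].offset, src,
              src_size);
  return true;
}
```

Because reading and writing share one table, supporting another register means adding one table entry rather than touching several switch statements.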

What’s even more important is that this change improved maintainability a lot. We had hit cryptic bugs that turned out to be caused by the wrong integer type being used inside the switch-case. Storing the sizes inside a list makes it possible to easily verify their correctness and avoid future bugs due to size mismatches.

Fixing cases of the legacy plugin being wrongly used

LLDB plugins

The process plugins in LLDB are split into two kinds: client plugins and server plugins. Client plugins are used by the LLDB client, while server plugins are used by lldb-server to implement the remote protocol. The legacy FreeBSD plugin is a client plugin - it is loaded by LLDB and used to debug a program. The modern FreeBSD plugin is a server plugin - it is loaded by the LLDB server and used to implement the GDB remote protocol. Another plugin called gdb-remote provides the glue between the client and the server. It is loaded by the client, spawns lldb-server and fulfills the client’s requests by communicating with the server.

Therefore, by switching between the legacy and remote FreeBSD plugins, we are actually switching between using the legacy client plugin and the gdb-remote plugin that spawns lldb-server with the remote FreeBSD plugin. Our original switching logic (based on the prior art from the Windows plugin) consisted of two pieces: a boolean switch in PlatformFreeBSD and code blocking the legacy plugin from being loaded when the new plugin should be used. However, we established that the second piece is not really necessary, and we removed it when we changed the preferred plugin.

During the final testing period, we found and fixed two cases where the legacy plugin could still be selected by mistake: when choosing a plugin for process connect, and when attaching to a running process.

The process connect command is supposed to iterate through all available process plugins, find one that initializes successfully and use it to establish a connection to the server. However, it lacked any means of actually determining whether the plugin under consideration supported remote connections at all. This was acceptable for non-transitional platforms that had only one candidate client plugin. However, on FreeBSD it could randomly choose either the legacy plugin or the gdb-remote plugin. To resolve this, we have added explicit filtering for remote connection support, using an approach similar to the one used for determining core file support.
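The filtering described above can be sketched in a few lines. The PluginInfo structure, its fields and the PickConnectPlugin function are hypothetical and exist only for this illustration; they are not LLDB's plugin API. The sketch shows the essential change: a plugin that cannot handle remote connections is skipped before initialization is even attempted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of a client process plugin.
struct PluginInfo {
  std::string name;
  bool can_initialize;          // would initialization succeed?
  bool supports_remote_connect; // can it talk to a remote server?
};

// Return the name of the first plugin usable for a remote connection,
// or an empty string if there is none.
std::string PickConnectPlugin(const std::vector<PluginInfo> &plugins) {
  for (const PluginInfo &plugin : plugins) {
    assert(!plugin.name.empty());
    if (!plugin.supports_remote_connect)
      continue; // the explicit filter that was previously missing
    if (plugin.can_initialize)
      return plugin.name;
  }
  return std::string();
}
```

With this filter in place, the registration order of the plugins no longer decides whether a plugin without remote support ends up handling the connection.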

The plugin used for launching and attaching processes was supposed to be controlled by the aforementioned boolean switch. If the new plugin was to be used, the method returned true and the launch/attach implementation from PlatformPOSIX was being used. Otherwise, it returned false and the legacy plugin kicked in.

The PlatformPOSIX::DebugProcess() method used to launch programs explicitly forced the gdb-remote plugin. However, the PlatformPOSIX::Attach() method did not specify the plugin name and could therefore use either. To fix this, we’ve updated it to force gdb-remote consistently within the class.

The interaction between dynamic loader and the debugger

The dynamic loader is the system component responsible for loading shared libraries that are used by the program. This includes both loading the linked libraries as specified by the DT_NEEDED entries in the ELF dynamic section, and loading additional modules at runtime via dlopen(3).

The dynamic linker provides an r_debug structure that can be used by the debugger to inspect its state, as well as to monitor events - that is, the loading and unloading of shared libraries. The r_debug structure is consistent across most Unix systems (with Solaris being an exception). On FreeBSD, it is declared in <sys/link_elf.h> as:

struct r_debug {
        int             r_version;      /* Currently '1' */
        struct link_map *r_map;         /* list of loaded images */
        void            (*r_brk)(struct r_debug *, struct link_map *);
                                        /* pointer to break point */
        enum {
                RT_CONSISTENT,          /* things are stable */
                RT_ADD,                 /* adding a shared library */
                RT_DELETE               /* removing a shared library */
        }               r_state;
        void            *r_ldbase;      /* Base address of rtld */
};

The r_version field specifies the structure version. The newest releases of FreeBSD and NetBSD both use version 0 of the SVR4 rendezvous protocol. Linux uses version 1, and the future releases of FreeBSD and NetBSD will use it too. The only difference between the two versions is the presence of r_ldbase field. It is worth noting that using version 1 has the additional advantage of clearly indicating that the structure has been initialized.

The r_map field is a pointer to the head of a linked list of link_map structures providing information about the currently loaded shared libraries.

The r_brk field holds the address of a function that is called by the dynamic loader on state changes. The debugger is expected to set a breakpoint on this function in order to act on these events.

The r_state field indicates the current dynamic loader state. There are three states defined: consistent indicating that a new stable state has been achieved, add indicating that the loader is about to load new libraries and delete indicating that it is about to unload libraries.

Finally, r_ldbase specifies the memory address at which the dynamic loader itself is loaded.
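As a rough illustration of how a consumer walks the module list, consider the sketch below. It uses simplified, hypothetical mirrors of the system structures (LinkMapNode and RDebugView are not the real declarations, and the real link_map carries more fields), and it dereferences pointers directly, whereas a debugger has to read each node out of the debugged process's memory.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified mirror of the link_map fields used here.
struct LinkMapNode {
  std::uintptr_t l_addr;     // load address of the image
  const char *l_name;        // path of the image
  const LinkMapNode *l_next; // next loaded image, or nullptr at the end
};

// Hypothetical, simplified mirror of the first fields of r_debug.
struct RDebugView {
  int r_version;
  const LinkMapNode *r_map; // head of the list of loaded images
};

// Follow r_map from node to node and collect the image paths.
std::vector<std::string> CollectModuleNames(const RDebugView &rd) {
  std::vector<std::string> names;
  for (const LinkMapNode *node = rd.r_map; node != nullptr;
       node = node->l_next) {
    assert(node->l_name != nullptr);
    names.emplace_back(node->l_name);
  }
  return names;
}
```

The paths used when exercising this sketch are placeholders; the order of the result simply follows the order of the list.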

Dynamic Loader

When the dynamic linker is about to load a new module, it triggers the r_brk breakpoint (called the rendezvous breakpoint in LLDB) with an r_state of add. When it is about to unload a module, it calls it with an r_state of delete. In both cases, r_map does not include the new modules yet. The debugger can use this to save the current list of modules for comparison.

After the modules are loaded or unloaded, the breakpoint is hit again, with the consistent r_state. At this point, LLDB updates its loaded module list.

One curious difference between Linux and FreeBSD is how the initial set of shared libraries (DT_NEEDED) is reported. On Linux, it is reported at the very beginning of the program via a regular pair of hits: add, then consistent. On the first hit (add state), the module list contains only the dynamic loader itself. On the second hit (consistent state), it contains all the shared libraries. On FreeBSD, there is only one breakpoint hit (consistent state), during which all the shared libraries are already present in r_map.
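The handling of rendezvous breakpoint hits can be modelled with a small state tracker. This is a sketch under assumptions rather than LLDB's implementation: RState, ModuleDiff and RendezvousTracker are hypothetical names, and modules are identified by plain path strings. On add or delete the current list is only remembered; on consistent it is compared against the remembered list. Since the snapshot starts out empty, a lone initial consistent hit (the FreeBSD case) reports every module as newly loaded.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

enum class RState { Consistent, Add, Delete };

struct ModuleDiff {
  std::vector<std::string> loaded;
  std::vector<std::string> unloaded;
};

// Hypothetical tracker fed with the module list read from r_map on each hit.
class RendezvousTracker {
public:
  ModuleDiff OnBreakpointHit(RState state,
                             const std::set<std::string> &current) {
    ModuleDiff diff;
    if (state == RState::Add || state == RState::Delete) {
      // r_map does not reflect the change yet: remember the old list.
      m_snapshot = current;
      return diff;
    }
    // Consistent: everything not in the snapshot was loaded, everything
    // missing from the current list was unloaded.
    std::set_difference(current.begin(), current.end(), m_snapshot.begin(),
                        m_snapshot.end(), std::back_inserter(diff.loaded));
    std::set_difference(m_snapshot.begin(), m_snapshot.end(), current.begin(),
                        current.end(), std::back_inserter(diff.unloaded));
    m_snapshot = current;
    assert(diff.loaded.empty() || diff.unloaded.empty() ||
           state == RState::Consistent);
    return diff;
  }

private:
  std::set<std::string> m_snapshot; // module list from the previous hit
};
```

Feeding the tracker a Linux-style sequence (add with only the loader, then consistent with all libraries) and a FreeBSD-style sequence (a single consistent hit) both end with the same known module list.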

LLDB’s POSIX Dynamic Loader plugin was originally written with the Linux behavior in mind, particularly expecting an explicit add event for the dynamically loaded shared libraries. As a result, on FreeBSD it failed to include the DT_NEEDED libraries in the loaded module list. A side effect of this is that it also did not skip the dynamic loader itself on Linux.

We have prepared a patch adding all libraries from the initial breakpoint hit that resolved the FreeBSD problem and therefore unblocked enabling memory map support. However, we had to revert it since it caused the dynamic loader module to be loaded twice on Linux. We have established that this is caused by the module being loaded using two different paths (the ld-linux-x86-64.so.2 symlink and actual ld-2.32.so file), and LLDB relying on exact path match for deduplication.

Other significant changes and fixes

Besides the problems we’ve described in detail above, the final milestone work included a few more important fixes, notably:

Removing thread name caching that caused LLDB not to reflect thread name changes during process' runtime.
Adding support for exec() events.
Fixing handling of user-raised SIGTRAP.
Adding fip and fdp registers on amd64 that provide convenient access to the full 64-bit values of these FPU registers (this is a followup on FIP/FDP register problems from our first report).
Translating ftag to its full value, consistently with GDB behavior (this is a followup on ftag register problems from our first report).

Digest of the changes

The final results of running the LLDB regression test suite on FreeBSD 13.0-CURRENT amd64:

Unsupported      :  453
Passed           : 1766
Expectedly Failed:    4

These test results reflect the pristine LLVM development branch (revision 25c40a45999e59e3b2902cd91373cd47e7a93488) with the dynamic loader patch applied.

For comparison, the results on Linux 5.9.13 x86_64:

Unsupported      :  326
Passed           : 1904
Expectedly Failed:    1

We have ensured that all non-fixed and known problems have documented Problem Reports in LLVM’s Bugzilla.

To find annotated failing or skipped tests, try:

find lldb/test/API -type f \
    -exec grep -i '\(expectedFail\|skipIf\).*freebsd' {} +

The lldb-server program has been documented in the form of a manual page. Originally, the lldb.1 file contributed by The FreeBSD Foundation was written in raw troff format, but it was recently rewritten by upstream in Sphinx format and is now generated on the fly during the build.

The FreeBSD Handbook was patched accordingly to mention the LLDB remote debugging capabilities. We expect to see this change merged once LLDB 12.0 is released.

Changes merged upstream

Summary of the third and the last milestone

The third milestone finalizes our current contract with the FreeBSD Foundation. The introduced changes are expected to be shipped with LLDB 12.0, and where applicable in FreeBSD 13.0.

During our work, the FreeBSD Project gained numerous important improvements: in the kernel, in the userland base libraries (the dynamic loader) and in the FreeBSD support of the LLVM toolchain. The overall experience of FreeBSD/LLDB developers and advanced users on this rock solid Operating System has reached the level known from other environments. Furthermore, the FreeBSD-specific work resulted in generic improvements, enhancing the LLDB support for Linux and NetBSD.

Now, after concluding the FreeBSD work, we are also planning to use our new experience to merge improvements back to the NetBSD plugin, which was used as a starting point for the whole FreeBSD work.

This work was sponsored by The FreeBSD Foundation and we are grateful for this great development challenge from the FreeBSD Project.