How Debuggers Work: Getting and Setting x86 Registers, Part 2: XSAVE

By Michał Górny

October 29, 2020 - 18 minutes read - 3707 words

asm freebsd netbsd ptrace x86 xsave

In the previous part of this article, I described the basic methods of getting and setting the baseline registers of 32-bit and 64-bit x86 CPUs. I covered General Purpose Registers, baseline Floating-Point Registers and Debug Registers along with their ptrace(2) interface.

XSAVE

In the second part, I would like to discuss the XSAVE family of instructions. I will describe the different variants of this instruction as well as explain the differences between them and their limitations. Afterwards, I will compare the ptrace(2) API used to access its data on Linux, FreeBSD and NetBSD. Other systems such as OpenBSD or DragonFly BSD do not provide requests to retrieve or set extended registers, so the comparison may help them design their own APIs.

As I’ve explained earlier, the discussed instructions are necessary to implement context switching — the mechanism used by the Operating System to run multiple threads and processes quasi-simultaneously on the same processor. In order to perform that, the kernel needs to be able to save the values of all registers used by the program, and restore them afterwards. This information is also exposed to debuggers in order to provide them with means to introspect and alter the state of debugged programs.

The instructions described in the first part were sufficient to cover the registers used up to the early generations of Intel Core CPUs. However, as the next generations of processors introduced new instruction sets, it eventually became necessary to introduce new registers as well. In 2011, the AVX extensions, introduced first in Intel’s Sandy Bridge and afterwards in AMD’s Bulldozer microarchitecture, doubled the size of the earlier XMM registers, creating 16 new YMM registers.

The new registers can store vectors of data that are twice as large, and perform operations on all of their elements simultaneously. This is particularly useful for heavy computations, for example in multimedia or cryptographic applications. Examples of programs that can explicitly take advantage of AVX instructions to improve their performance include the FFmpeg media decoding and encoding library and the OpenCV image manipulation library.

As applications start using the new registers, it becomes necessary for the kernel to be able to save and restore them as part of context switching; otherwise the programs would lose data! The XSAVE instruction set serves exactly that purpose. It was introduced in the newer versions of the Intel Core microarchitecture (2008). It is used in both 64-bit and 32-bit modes (although 32-bit programs can use only a subset of the exposed registers).

The XSAVE instruction extends the format used by FXSAVE to include additional register sets. However, unlike the earlier saving instructions, it is not strictly limited to a fixed data set. Instead, it makes it possible to introduce support for new CPU extensions without having to add yet another XSAVE variant or break compatibility with existing software. Furthermore, it accounts for the possibility that some processors may choose not to implement interim instruction sets.

The State Components

XSAVE revolves around the concept of State Components. A state component represents a single subset of data that can be saved or restored independently. There are two special state components corresponding to the original FXSAVE instruction: the x87 state component, and the SSE state component. Further instruction sets introduce one or more components each.

In modern processors, there are two kinds of state components: user state components and supervisor state components. The former group represents regular registers that are accessible to userspace programs, while the latter covers privileged registers that should not be exposed to regular programs.

The individual state components are controlled via the State Component Bitmap. This bitmap is used by XSAVE to determine which state components to save, and by XRSTOR to determine which to restore (or reset). Enabling the respective bits causes additional data to be saved to memory, effectively requiring a larger storage area.

In order to make it possible to save a particular state component or to use the respective registers in a program, the kernel needs to enable its tracking in one of the control registers. These control registers are XCR0 for user components, and IA32_XSS for supervisor components. Both use the same bit numbers as the state component bitmap.

State Component Bitmap
Bit	Instr. set	User SC (XCR0)	Supervisor SC (IA32_XSS)	Size (bytes)
0	x87	x87 state	reserved	512
1	SSE	SSE state	reserved	512
2	AVX	YMM_Hi128	reserved	256
3	MPX	BNDREGS	reserved	64
4	MPX	BNDCSR	reserved	16
5	AVX-512	opmask	reserved	64
6	AVX-512	ZMM_Hi256	reserved	512
7	AVX-512	Hi16_ZMM	reserved	1024
8	PT	reserved	PT	72
9	PKRU	PKRU	reserved	4
13	HDC	reserved	HDC	8
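
The bit assignments from the table above can be written down as constants. The following is only an illustrative sketch: the enum, the macros and the two helper functions are hypothetical names made up for this example (they do not come from any system header), and only the bits listed in the table are covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit numbers from the State Component Bitmap table (hypothetical names). */
enum xsave_component {
    XSC_X87 = 0, XSC_SSE = 1, XSC_YMM_HI128 = 2,
    XSC_BNDREGS = 3, XSC_BNDCSR = 4,
    XSC_OPMASK = 5, XSC_ZMM_HI256 = 6, XSC_HI16_ZMM = 7,
    XSC_PT = 8, XSC_PKRU = 9, XSC_HDC = 13
};

#define XSC_MASK(bit) (UINT64_C(1) << (bit))

/* Supervisor components according to the table: PT and HDC. */
#define XSC_SUPERVISOR_MASK (XSC_MASK(XSC_PT) | XSC_MASK(XSC_HDC))

/* Return a printable name for a bit, or "unknown" if the table lacks it. */
const char *xsave_component_name(unsigned bit) {
    static const char *const names[] = {
        [XSC_X87] = "x87", [XSC_SSE] = "SSE",
        [XSC_YMM_HI128] = "YMM_Hi128",
        [XSC_BNDREGS] = "BNDREGS", [XSC_BNDCSR] = "BNDCSR",
        [XSC_OPMASK] = "opmask", [XSC_ZMM_HI256] = "ZMM_Hi256",
        [XSC_HI16_ZMM] = "Hi16_ZMM",
        [XSC_PT] = "PT", [XSC_PKRU] = "PKRU", [XSC_HDC] = "HDC"
    };
    if (bit >= sizeof(names) / sizeof(names[0]) || names[bit] == NULL)
        return "unknown";
    return names[bit];
}

/* Non-zero if the bit is controlled via IA32_XSS rather than XCR0. */
int xsave_is_supervisor(unsigned bit) {
    assert(bit < 64);
    return (XSC_SUPERVISOR_MASK & XSC_MASK(bit)) != 0;
}
```

With these definitions, the bitmap requesting x87, SSE and AVX state would be `XSC_MASK(XSC_X87) | XSC_MASK(XSC_SSE) | XSC_MASK(XSC_YMM_HI128)`, i.e. 0x07.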

The XSAVE Area Format

The data format used by the XSAVE instruction is called the XSAVE Area. The XSAVE Area consists of three parts: the 512-byte legacy region that is the same as used by FXSAVE instruction, followed by the 64-byte XSAVE header containing information about the data present in the XSAVE Area, followed by the variably sized extended region used to store additional state components.

Similarly to FXSAVE, all XSAVE instructions have their -64 counterparts (e.g. XSAVE64) that differ in the way FIP and FDP registers are saved in the legacy region. More information on this, along with a table describing the legacy region in detail, can be found in the previous part of the article, FXSAVE vs FXSAVE64 section.

The XSAVE header currently contains two 64-bit fields whose values correspond to the state-component bitmaps: XSTATE_BV and XCOMP_BV. XSTATE_BV is written by XSAVE to indicate that a particular state component has been written to the extended region, and read by XRSTOR to determine whether the component is to be restored from this region (bit set) or reset to the default state (bit clear). XCOMP_BV is written by the compacting variants of XSAVE to indicate that the compact form of XSAVE Area is being used and which components are present in it, and read by XRSTOR to distinguish this format.

The XSAVE header layout
Offset (bytes)	Size (bytes)	Field
0	8	XSTATE_BV
8	8	XCOMP_BV
16	48	reserved
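
A debugger that receives a raw XSAVE Area typically starts by reading the header. The sketch below shows one way to do that in C. The struct and function names are hypothetical. The use of bit 63 of XCOMP_BV as the compact-format marker follows the Intel SDM rather than the text above, so treat it as an assumption to verify against the manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XSAVE_LEGACY_SIZE 512
#define XSAVE_HEADER_SIZE 64
/* Assumption (Intel SDM): XCOMP_BV[63] is set for the compact format. */
#define XCOMP_BV_COMPACT (UINT64_C(1) << 63)

/* The two defined header fields; the remaining 48 bytes are reserved. */
struct xsave_header {
    uint64_t xstate_bv;
    uint64_t xcomp_bv;
};

/* Copy the header out of a raw area; memcpy sidesteps alignment issues. */
struct xsave_header xsave_read_header(const uint8_t *area, size_t area_size) {
    struct xsave_header hdr;
    assert(area_size >= XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE);
    memcpy(&hdr, area + XSAVE_LEGACY_SIZE, sizeof(hdr));
    return hdr;
}

/* Non-zero if the area was written by a compacting variant. */
int xsave_is_compact(const struct xsave_header *hdr) {
    return (hdr->xcomp_bv & XCOMP_BV_COMPACT) != 0;
}

/* 1 if the component's data is present in the area (XRSTOR would load it),
 * 0 if the component is in its initial state (XRSTOR would reset it). */
int xsave_component_present(const struct xsave_header *hdr, unsigned bit) {
    assert(bit < 63);
    return (int)((hdr->xstate_bv >> bit) & 1);
}
```

Checking `xsave_component_present()` before touching a component matters because the bytes of a component whose XSTATE_BV bit is clear are not meaningful.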

The extended region can be written either in the standard or compact format. In the standard format, each state component is placed at a fixed offset defined by the processor (and available via CPUID). If some of the state components are skipped, the relevant portion of XSAVE Area is gapped to preserve offsets of the successive components. In the compact format, the skipped components do not take up space, and the remaining components are shifted to minimize space usage. Therefore, the offsets depend on the components actually being written, and need to be calculated by software for every invocation.

Example XSAVE Area format
Standard format	Compact format
Legacy area (512 bytes)	Legacy area (512 bytes)
XSAVE header (64 bytes)	XSAVE header (64 bytes)
YMM_Hi128 (256 bytes)	YMM_Hi128 (256 bytes)
unused (MPX + AVX-512) (1680 bytes)	PT (72 bytes)
PT (72 bytes)	(not allocated)
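
The offset calculation for the compact format can be sketched as follows. This is a simplified illustration, not a reference implementation: the function name is hypothetical, the per-component sizes are passed in by the caller (on real hardware they would come from CPUID leaf 0x0d), and the `align64` bitmap stands for the per-component 64-byte alignment flag that CPUID also reports.

```c
#include <assert.h>
#include <stdint.h>

#define XSAVE_EXT_START 576 /* 512-byte legacy region + 64-byte header */

/* Compute the offset of component `bit` in a compact-format XSAVE Area.
 * present: bitmap of components stored in the area (from XCOMP_BV)
 * sizes:   size of each component in bytes, indexed by bit number
 * align64: bitmap of components that must start on a 64-byte boundary */
uint32_t xsave_compact_offset(uint64_t present, const uint32_t sizes[64],
                              uint64_t align64, unsigned bit) {
    uint32_t offset = XSAVE_EXT_START;
    unsigned i;

    assert(bit >= 2 && bit < 63);
    assert(present & (UINT64_C(1) << bit));

    /* Walk over all earlier extended components that are present. */
    for (i = 2; i < bit; ++i) {
        if (!(present & (UINT64_C(1) << i)))
            continue; /* skipped components take no space */
        if (align64 & (UINT64_C(1) << i))
            offset = (offset + 63) & ~UINT32_C(63);
        offset += sizes[i];
    }
    if (align64 & (UINT64_C(1) << bit))
        offset = (offset + 63) & ~UINT32_C(63);
    return offset;
}
```

For the example table, with only YMM_Hi128 (256 bytes) and PT (72 bytes) present and no alignment flags set, YMM_Hi128 lands at offset 576 and PT directly after it at 832.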

Invoking XSAVE

There are a few preliminary steps that need to be done before invoking any of the XSAVE family of instructions. I will briefly list them now.

Firstly, the support for the instruction needs to be verified via CPUID. Strictly speaking, the same is also true for FXSAVE.

Secondly, the state tracking needs to be enabled. This means setting appropriate state component bits in XCR0 for user state components, and in IA32_XSS for supervisor state components. The appropriate XSAVE bit also needs to be set in the Control Register CR4. All of this is done by the kernel.

Thirdly, a buffer large enough for the XSAVE Area needs to be obtained. The program should use the CPUID instruction to obtain the needed size. The buffer needs to be aligned to 64 bytes. It is usually convenient to zero the buffer first, so that there is no need to worry about, e.g., XSAVE leaving unused XSTATE_BV bytes unmodified.
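
The buffer requirements (64-byte alignment, zeroed contents) could be wrapped in a small helper such as the one below. This is a minimal sketch: `xsave_area_alloc()` is a hypothetical name, and the size passed to it is assumed to have been obtained from CPUID beforehand. The size is rounded up because C11 `aligned_alloc()` expects a size that is a multiple of the alignment.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a zeroed, 64-byte aligned buffer of at least `size` bytes.
 * Returns NULL on failure; release the buffer with free(). */
void *xsave_area_alloc(size_t size) {
    size_t rounded;
    void *area;

    if (size == 0 || size > SIZE_MAX - 63)
        return NULL;
    /* aligned_alloc() wants the size to be a multiple of the alignment */
    rounded = (size + 63) & ~(size_t)63;
    area = aligned_alloc(64, rounded);
    if (area != NULL)
        memset(area, 0, rounded);
    return area;
}
```

A caller would pass the size reported by CPUID, e.g. `xsave_area_alloc(832)` on a system where the XSAVE Area takes 832 bytes.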

Finally, the requested state component bitmap needs to be put into the register pair EDX:EAX (the higher 32 bits into EDX, lower into EAX — this is a common i386 convention for 64-bit integers). Once this is done, XSAVE can be invoked.
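
As a small illustration of the EDX:EAX convention, the helpers below split a 64-bit bitmap into the two halves and join them back. The struct and function names are hypothetical. With GCC-style inline assembly, the two halves would typically be passed through the "a" and "d" register constraints.

```c
#include <stdint.h>

/* The two 32-bit halves of a 64-bit value passed in EDX:EAX. */
struct edx_eax {
    uint32_t eax; /* lower 32 bits */
    uint32_t edx; /* higher 32 bits */
};

/* Split a 64-bit state component bitmap into its EDX:EAX halves. */
struct edx_eax split_edx_eax(uint64_t bitmap) {
    struct edx_eax regs;
    regs.eax = (uint32_t)(bitmap & UINT32_C(0xffffffff));
    regs.edx = (uint32_t)(bitmap >> 32);
    return regs;
}

/* Join the halves back, e.g. after an instruction returned EDX:EAX. */
uint64_t join_edx_eax(struct edx_eax regs) {
    return ((uint64_t)regs.edx << 32) | regs.eax;
}
```

For the bitmap 0x07 (x87, SSE and AVX), this gives EAX = 0x07 and EDX = 0, matching the values loaded in the listing that follows.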

Afterwards, another series of CPUID calls is necessary to obtain the offsets, sizes and alignment requirements needed to process the contents of the XSAVE Area.

The listing below presents a simple program that calls XSAVE three times with different register sets modified.

#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

struct xsave {
    uint8_t legacy_area[512];
    union {
        struct {
            uint64_t xstate_bv;
            uint64_t xcomp_bv;
        };
        uint8_t header_area[64];
    };
    uint8_t extended_area[];
};

int main() {
    uint32_t buf_size = 0;
    uint32_t avx_offset = 0;
    uint8_t avx_bytes[32];
    struct xsave* buf[3];
    int i;
    for (i = 0; i < sizeof(avx_bytes); ++i)
        avx_bytes[i] = i;

    __asm__ __volatile__ (
        /* check CPUID support for XSAVE and AVX */
        "mov $0x01, %%eax\n\t"
        "cpuid\n\t"
        "mov $0x04000000, %%eax\n\t"  /* bit 26 - XSAVE */
        "and %%ecx, %%eax\n\t"
        "jz .cpuid_end\n\t"
        "mov $0x10000000, %%eax\n\t"  /* bit 28 - AVX */
        "and %%ecx, %%eax\n\t"
        "jz .no_avx\n\t"
        /* get AVX offset */
        "mov $0x0d, %%eax\n\t"
        "mov $0x02, %%ecx\n\t"
        "cpuid\n\t"
        "mov %%ebx, %1\n\t"
        "\n"
        ".no_avx:\n\t"
        /* get XSAVE area size for current XCR0 */
        "mov $0x0d, %%eax\n\t"
        "xor %%ecx, %%ecx\n\t"
        "cpuid\n\t"
        "mov %%ebx, %0\n\t"
        "\n"
        ".cpuid_end:\n\t"
        : "=m"(buf_size), "=m"(avx_offset)
        :
        : "%eax", "%ebx", "%ecx", "%edx"
    );

    if (buf_size == 0) {
        printf("no xsave support\n");
        return 1;
    }

    printf("has avx: %s\n", avx_offset != 0 ? "yes" : "no");
    printf("xsave area size: %d bytes\n", buf_size);

    for (i = 0; i < 3; ++i) {
        buf[i] = aligned_alloc(64, buf_size);
        assert(buf[i]);
    }

    __asm__ __volatile__ (
        "mov $0x07, %%eax\n\t"
        "xor %%edx, %%edx\n\t"
        "xsave (%0)\n\t"
        "movd %%eax, %%mm0\n\t"
        "xsave (%1)\n\t"
        "and %3, %3\n\t"
        "jz .xsave_end\n\t"
        "vmovups (%3), %%ymm0\n\t"
        "xsave (%2)\n\t"
        "\n"
        ".xsave_end:\n\t"
        :
        : "r"(buf[0]), "r"(buf[1]), "r"(buf[2]),
          "c"(avx_offset != 0 ? avx_bytes : 0)
        : "%eax", "%edx", "%mm0", "%ymm0", "memory"
    );

    printf("XSTATE_BV (initial): %#018" PRIx64 "\n",
           buf[0]->xstate_bv);
    printf("XSTATE_BV (with MMX): %#018" PRIx64 "\n",
           buf[1]->xstate_bv);
    if (avx_offset != 0) {
        printf("XSTATE_BV (with AVX): %#018" PRIx64 "\n",
               buf[2]->xstate_bv);
        printf("YMM0 most significant quadword: %#018" PRIx64 "\n",
               *((uint64_t*)(((char*)buf[2]) + avx_offset)));
    }

    for (i = 0; i < 3; ++i)
        free(buf[i]);
    return 0;
}

On my NetBSD (9.99.74 amd64) system with Ryzen 5 3600, this program writes the following output:

has avx: yes
xsave area size: 832 bytes
XSTATE_BV (initial): 000000000000000000
XSTATE_BV (with MMX): 0x0000000000000001
XSTATE_BV (with AVX): 0x0000000000000007
YMM0 most significant quadword: 0x1716151413121110

The variants of the XSAVE instruction

XSAVE (Intel Core, 2008) is the original instruction of the family. It saves the requested user state components (requests for supervisor state components are ignored) into the XSAVE Area. All requested components are written (if available), independently of whether they are actually being used or not. The extended region of the XSAVE Area is written in the standard format, and skipped components result in gaps.

XSAVEOPT (Sandy Bridge, 2011) is a version of XSAVE that supports two optimizations: the init optimization, and the modified optimization. The init optimization means that the requested state component will not be written if it has not been changed compared to its initial state. The modified optimization means that if XSAVEOPT is writing to the same memory area that was passed to XRSTOR previously, then the state component will not be written if it has not been modified since it has been last restored. This assumes that the XSAVE Area is not modified by the user between the two instructions. These two optimizations can improve context switching performance by avoiding unnecessary writes.

XSAVEC (Skylake, 2015) is a version of XSAVE that uses the compact XSAVE Area format. Therefore, only those components that were explicitly requested are saved into the XSAVE Area, in a packed format. It also uses the init optimization in order to skip writing the components that were not modified compared to their initial state. The XSAVEC instruction can improve performance and might reduce memory usage by skipping unnecessary components.

Finally, XSAVES (Skylake, 2015) is a version of XSAVE that combines the ability to save supervisor components, compact format, and both init and modified optimizations. The components are written only if they were modified since their initial state, and since the previous XRSTORS invocation. This variant provides the best performance, and is capable of reducing the memory footprint.

XRSTOR is the restoring counterpart of XSAVE, XSAVEOPT and XSAVEC. It automatically determines the XSAVE Area format from the header region. XRSTORS is the restoring counterpart of XSAVES.

All of the aforementioned instructions take the requested component bitmap in the EDX:EAX register pair.

All of the instructions except for XSAVES and XRSTORS can be executed by unprivileged processes. XSAVES and XRSTORS can only be executed by the kernel.

Comparison of XSAVE variants
Variant	User SC	Supervisor SC	Standard format	Compact format	Init opt.	Modified opt.
XSAVE	✓	✗	✓	✗	✗	✗
XSAVEOPT	✓	✗	✓	✗	✓	✓
XSAVEC	✓	✗	✗	✓	✓	✗
XSAVES	✓	✓	✗	✓	✓	✓
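
Which of these variants a given processor supports can be determined from CPUID. The sketch below decodes the EAX value returned by CPUID leaf 0x0d, sub-leaf 1. The bit positions (0 for XSAVEOPT, 1 for XSAVEC, 3 for XSAVES) are taken from the Intel SDM rather than from this article, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Which XSAVE variants the processor reports as supported. */
struct xsave_variants {
    int xsaveopt;
    int xsavec;
    int xsaves; /* also implies XRSTORS and IA32_XSS */
};

/* Decode EAX as returned by CPUID with EAX=0x0d, ECX=1.
 * Assumed bit layout (Intel SDM): bit 0 = XSAVEOPT, bit 1 = XSAVEC,
 * bit 3 = XSAVES/XRSTORS. */
struct xsave_variants xsave_decode_variants(uint32_t eax) {
    struct xsave_variants v;
    v.xsaveopt = (eax >> 0) & 1;
    v.xsavec = (eax >> 1) & 1;
    v.xsaves = (eax >> 3) & 1;
    return v;
}
```

With GCC or Clang, the input value could be obtained via `__get_cpuid_count(0x0d, 0x01, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`. Plain XSAVE support itself is reported separately, in CPUID leaf 0x01.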

The ptrace(2) API

The ptrace(2) API used for other register sets is based on the concept of filling a fixed-size struct. Therefore, it does not map cleanly onto the XSAVE instruction, which can return data of variable length. While it is technically possible to simply keep adding new ptrace(2) requests as the kernel gains support for successive state components, it seems better to embrace the idea and create an API that is extensible as well. This is at least what Linux, FreeBSD and NetBSD have done.

The Linux ptrace(2) API

Linux 2.6.34 added two new ptrace(2) requests, PTRACE_GETREGSET and PTRACE_SETREGSET, that provide a generic way to get and set arbitrary register sets. They take an NT_* constant identifying the register set of interest as their third argument (addr), and a struct iovec that encapsulates the buffer’s address and length as their fourth argument (data). The getter writes the actual data length back into the structure. An interesting advantage of this solution is that the same constants are used for these two requests and to identify notes in core dump files.

The available constants correspond to all regular register sets, including the Linux-specific user register structure, FSAVE and FXSAVE. However, the most interesting to us is NT_X86_XSTATE, since this is how the XSAVE Area is exposed.

The kernel only exposes methods to copy from and into the XSAVE Area. The program needs to call CPUID itself in order to determine the buffer size and component offsets.

The FreeBSD ptrace(2) API

FreeBSD has three dedicated ptrace(2) requests related to XSAVE: PT_GETXSTATE_INFO, PT_GETXSTATE and PT_SETXSTATE.

PT_GETXSTATE_INFO takes a pointer to struct ptrace_xstate_info as the third argument (addr), and its size as the fourth argument (data). It fills the structure with the enabled XSAVE component bitmap and the maximum XSAVE Area length.

PT_GETXSTATE and PT_SETXSTATE take a pointer to the buffer as the third argument (addr) and its size as the fourth argument (data). The buffer uses the same layout as the XSAVE Area itself.

While FreeBSD provides an explicit API to get the buffer size, working on the XSAVE Area itself still requires querying CPUID to determine the component offsets.
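
On systems that expose the raw XSAVE Area, a full YMM register has to be assembled from two places: the low half lives in the XMM slot of the legacy region, and the high half in the YMM_Hi128 component. The sketch below illustrates this under stated assumptions: the area is in the standard format, `avx_offset` has already been obtained from CPUID, and the caller has checked the relevant XSTATE_BV bits. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offset of XMM0 within the 512-byte legacy (FXSAVE) region. */
#define FXSAVE_XMM_OFFSET 160

/* Assemble the 32-byte value of YMM<regno> from a standard-format area.
 * area:       raw XSAVE Area, as returned by the ptrace(2) request
 * avx_offset: offset of YMM_Hi128, as reported by CPUID
 * out:        receives the low 16 bytes (XMM) followed by the high 16 */
void xsave_read_ymm(const uint8_t *area, uint32_t avx_offset,
                    unsigned regno, uint8_t out[32]) {
    assert(regno < 16);
    memcpy(out, area + FXSAVE_XMM_OFFSET + 16 * regno, 16);
    memcpy(out + 16, area + avx_offset + 16 * regno, 16);
}
```

Writing a register back would perform the same two copies in the opposite direction before passing the buffer to the setter request.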

The NetBSD ptrace(2) API

NetBSD gained a ptrace(2) API to access the XSAVE Area in 2019. It consists of two requests, PT_GETXSTATE and PT_SETXSTATE. Both requests take a struct iovec that encapsulates a pointer to struct xstate and its (current) size, as the third argument (addr). Similarly to other register requests on NetBSD, the fourth argument (data) specifies the LWP (thread) identifier.

Unlike the other two systems, NetBSD does not use the raw XSAVE Area but instead normalizes it into struct xstate. The caller does not need to worry about allocating an appropriately sized buffer or determining the layout of the XSAVE Area. The current size of struct xstate covers all currently supported components, and since it is passed along with the request, new fields can be added without breaking backwards compatibility. Furthermore, the kernel can switch to using XSAVES in the future without changing the user-visible struct xstate.

Furthermore, the NetBSD structure provides an explicit field to control XSAVE Area updates more precisely. This makes it possible to issue a partial PT_SETXSTATE without having to copy the existing values for everything else from PT_GETXSTATE.

Finally, NetBSD implements translation from both FSAVE and FXSAVE, making it possible to use PT_GETXSTATE and PT_SETXSTATE unconditionally on all x86 systems, as far back as NetBSD/i386 support reaches. It is therefore a good replacement for both PT_GETFPREGS and PT_GETXMMREGS, eliminating the inconsistency between i386 and amd64.

You can read more about the design of NetBSD XSAVE support in the LLDB: watchpoints, XSTATE in ptrace() and core dumps report.

An Example Multiplatform Program

The listing below provides an example program that reads a YMM register via the XSTATE ptrace(2) API and then writes a modified value back. The program uses conditional #if blocks to provide compatibility with FreeBSD, NetBSD and Linux. As such, it primarily demonstrates the differences between the interfaces provided by these Operating Systems.

#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/uio.h>
#include <sys/wait.h>

#if defined(__NetBSD__)
#   include <x86/cpu_extended_state.h>
#   include <x86/specialreg.h>
#elif defined(__FreeBSD__)
#   include <x86/fpu.h>
#   include <x86/specialreg.h>
#elif defined(__linux__)
#   include <linux/elf.h>
#else
#   error "unsupported platform"
#endif

#include <assert.h>
#include <inttypes.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#include <cpuid.h>

void print_ymm(const char* name,
               uint8_t xmm[16],
               uint8_t ymm_hi[16]) {
    int i;
    printf("%20s: {", name);
    for (i = 0; i < 16; ++i)
        printf(" 0x%02x", xmm[i]);
    for (i = 0; i < 16; ++i)
        printf(" 0x%02x", ymm_hi[i]);
    printf(" }\n");
}

int main() {
    /* verify that AVX is supported */
    uint32_t eax, ebx, ecx, edx;
    if (!__get_cpuid(0x01, &eax, &ebx, &ecx, &edx) ||
            !(ecx & bit_AVX)) {
        printf("AVX not supported\n");
        return 1;
    }

#if !defined(__NetBSD__)
    /* get the YMM offset for systems using the raw XSAVE Area */
    assert (__get_cpuid_count(0x0d, 0x02, &eax, &ebx, &ecx, &edx));
    uint32_t avx_offset = ebx;
#endif
#if defined(__linux__)
    /* get the size of the XSAVE Area */
    assert (__get_cpuid_count(0x0d, 0x00, &eax, &ebx, &ecx, &edx));
    uint32_t xsave_size = ebx;
#endif

    int ret;
    pid_t pid = fork();
    assert(pid != -1);

    if (pid == 0) {
        /* child -- debugged program */
        uint8_t avx_bytes[32];
        int i;
        for (i = 0; i < sizeof(avx_bytes); ++i)
            avx_bytes[i] = i;

        /* request tracing */
#if !defined(__linux__)
        ret = ptrace(PT_TRACE_ME, 0, NULL, 0);
#else
        ret = ptrace(PTRACE_TRACEME, 0, NULL, 0);
#endif
        assert(ret != -1);

        print_ymm("in child, initial", avx_bytes, avx_bytes+16);

        __asm__ __volatile__ (
            "vmovups (%0), %%ymm0\n\t"
            "int3\n\t"
            "vmovups %%ymm0, (%0)\n\t"
            :
            : "b"(avx_bytes)
            : "%ymm0", "memory"
        );

        print_ymm("in child, modified", avx_bytes, avx_bytes+16);

        _exit(0);
    }

    /* parent -- the debugger */
    /* wait for the child to become ready for tracing */
    pid_t waited = waitpid(pid, &ret, 0);
    assert(waited == pid);
    assert(WIFSTOPPED(ret));
    assert(WSTOPSIG(ret) == SIGTRAP);

    /* get registers */
#if defined(__NetBSD__)
    struct xstate xst;
    struct iovec iov = { &xst, sizeof(xst) };
    ret = ptrace(PT_GETXSTATE, pid, &iov, 0);
#elif defined(__FreeBSD__)
    struct ptrace_xstate_info info;
    ret = ptrace(PT_GETXSTATE_INFO, pid,
                 (caddr_t)&info, sizeof(info));
    assert(ret == 0);

    char buf[info.xsave_len];
    ret = ptrace(PT_GETXSTATE, pid, buf, sizeof(buf));
#elif defined(__linux__)
    char buf[xsave_size];
    struct iovec iov = { buf, sizeof(buf) };
    ret = ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, &iov);
#endif
    assert(ret == 0);

    /* SSE+AVX registers should have been requested */
#if defined(__NetBSD__)
    assert(xst.xs_rfbm & XCR0_SSE);
    assert(xst.xs_rfbm & XCR0_YMM_Hi128);
#elif defined(__FreeBSD__)
    assert(info.xsave_mask & XFEATURE_ENABLED_SSE);
    assert(info.xsave_mask & XFEATURE_ENABLED_YMM_HI128);
#endif

    /* SSE+AVX registers should be in modified state */
#if defined(__NetBSD__)
    assert(xst.xs_xstate_bv & XCR0_SSE);
    assert(xst.xs_xstate_bv & XCR0_YMM_Hi128);
#elif defined(__FreeBSD__)
    struct xstate_hdr* xst = (struct xstate_hdr*)&buf[512];
    assert(xst->xstate_bv & XFEATURE_ENABLED_SSE);
    assert(xst->xstate_bv & XFEATURE_ENABLED_YMM_HI128);
#elif defined(__linux__)
    uint64_t xstate_bv = *((uint64_t*)&buf[512]);
    assert(xstate_bv & 2); /* SSE */
    assert(xstate_bv & 4); /* YMM_Hi128 */
#endif

#if defined(__NetBSD__)
    uint8_t* xmm = xst.xs_fxsave.fx_xmm[0].xmm_bytes;
    uint8_t* ymm_hi = xst.xs_ymm_hi128.xs_ymm[0].ymm_bytes;
#elif defined(__FreeBSD__)
    uint8_t* xmm =
        ((struct savexmm*)buf)->sv_xmm[0].xmm_bytes;
    uint8_t* ymm_hi =
        ((struct ymmacc*)&buf[avx_offset])[0].ymm_bytes;
#elif defined(__linux__)
    /* buf is a char array, so cast explicitly to avoid sign warnings */
    uint8_t* xmm = (uint8_t*)&buf[160];
    uint8_t* ymm_hi = (uint8_t*)&buf[avx_offset];
#endif

    print_ymm("from PT_GETXSTATE", xmm, ymm_hi);
    int i;
    for (i = 0; i < 16; ++i) {
        xmm[i] += 0x80;
        ymm_hi[i] += 0x80;
    }
    print_ymm("set via PT_SETXSTATE", xmm, ymm_hi);

    /* update the registers and resume the program */
#if defined(__NetBSD__)
    ret = ptrace(PT_SETXSTATE, pid, &iov, 0);
#elif defined(__FreeBSD__)
    ret = ptrace(PT_SETXSTATE, pid, buf, sizeof(buf));
#elif defined(__linux__)
    ret = ptrace(PTRACE_SETREGSET, pid, NT_X86_XSTATE, &iov);
#endif
    assert(ret == 0);
    ret = ptrace(PT_CONTINUE, pid, (void*)1, 0);
    assert(ret == 0);

    /* wait for the child to exit */
    waited = waitpid(pid, &ret, 0);
    assert(waited == pid);
    assert(WIFEXITED(ret));
    assert(WEXITSTATUS(ret) == 0);

    return 0;
}

Summary

The FXSAVE instruction can store x86 registers up to the XMM registers introduced with SSE. The XSAVE and XRSTOR family of instructions can be used to save and restore the registers introduced by newer instruction sets, e.g. the YMM registers introduced by AVX. XSAVE is specifically designed to allow introducing new register sets without breaking backwards compatibility or requiring new variants of the instruction.

The XSAVE instructions are harder to use than the methods described in the first part of the article. The user needs to specify the requested State Components. Depending on the XSAVE Area format used by the instruction, the user also needs to obtain or compute the appropriate buffer size and State Component offsets.

The additional variants of the XSAVE instruction primarily provide optimizations that aim to improve the performance of context switching. These include skipping register sets that are in their initial state or that have not been modified since the last XRSTOR call, as well as using a more compact XSAVE Area format. The privileged XSAVES variant introduces additional State Components that are only available to the supervisor.

The new instructions required an appropriately extensible ptrace(2) API. Unlike the requests for earlier register sets, the API for XSTATE varies greatly between FreeBSD, NetBSD and Linux. Both FreeBSD and Linux expose the raw XSAVE Area, while NetBSD normalizes it into a well-defined struct xstate. However, all three systems share the concept of explicitly specifying the buffer size, in order to support future extensions.

This concludes the two-part article on working with register sets via the ptrace(2) API. You should now have a rough idea why it is necessary to save and restore the state of all registers on a system, how the kernel does it on x86 and how the results are exposed to the debugger. While the article was concerned only with x86, and primarily with FreeBSD, NetBSD and Linux, this should give you good foundations for further research. Many other architectures share very similar concepts, and large parts of the ptrace(2) API are very similar across different architectures and Operating Systems from the UNIX family.