How Debuggers Work: Getting and Setting x86 Registers, Part 1
By Michał Górny
In this article, I would like to briefly describe the methods used to dump and restore the different kinds of registers on 32-bit and 64-bit x86 CPUs. The first part will focus on General Purpose Registers, Debug Registers and Floating-Point Registers up to the XMM registers provided by the SSE extension. I will explain how their values can be obtained via the ptrace(2) interface.
The ptrace(2) API is available in all modern BSD systems and in Linux, as all of them derive it from the original form designed and implemented in 4.3BSD. The primary focus of this article is on the FreeBSD and NetBSD systems. Nevertheless, users of other Operating Systems such as OpenBSD, DragonFly BSD or Linux can still benefit from it, as the basic principles are the same and the code examples are intended to be easily adapted to other platforms.
A single CPU (on modern hardware: a CPU core, or a CPU thread if hyperthreading is available) can execute only one program thread at a time. In order to run multiple processes and threads quasi-simultaneously, the Operating System must perform context switching, that is, periodically suspend the currently running thread, save its state, restore the saved state of another thread and resume it. Saving and restoring the values of the processor's registers plays an important part in context switching. This process must be fully transparent to the thread being switched: in a properly implemented kernel, there should be no side effects perceptible to the program.
The debugger may need to examine the register sets of the debugged program for a number of reasons. By inspecting the Program Counter, it can determine the location in the source code at which execution will continue, and by altering it, it can control the execution. The Stack Pointer is necessary to introspect variables stored on the stack, while the remaining registers can hold variables themselves.
The Debug Registers form a special set of x86 registers. They are not accessible to the program itself; however, the debugger can read and write them through the kernel. They allow setting hardware-assisted breakpoints (instruction execution traps) on the code being executed, and watchpoints (read and/or write access traps) on variables.
General Purpose Registers (GPR)
Copying GPRs
The term 'General Purpose Registers' is a bit ambiguous. In the narrower sense, it means the few (8 on i386, 16 on amd64) baseline registers that can be used to store arbitrary data (usually integers or pointers). In the wider sense, it means all baseline registers of the processor architecture, historically excluding floating-point registers and special kinds of registers. On x86, this includes the general-purpose registers in the narrower sense, the Program Counter (EIP/RIP), the segment registers and the flag register.
The majority of the General Purpose Registers can be copied directly, e.g. using the MOV instruction, or pushed onto the stack via PUSH. On amd64, the RIP register can be copied using the LEA instruction with RIP-relative addressing, and restored via JMP (i386 has no EIP-relative addressing, so a CALL to the next instruction followed by a POP is typically used there instead). The flag register can be pushed onto the stack via PUSHFD/PUSHFQ, and afterwards popped from it via POPFD/POPFQ.
The listing below demonstrates a program that grabs the values of all amd64 GPRs at an arbitrary point during the execution and prints them after returning from assembly.
#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>

enum {
    R_RAX, R_RBX, R_RCX, R_RDX, R_RSI, R_RDI, R_RBP, R_RSP,
    R_R8, R_R9, R_R10, R_R11, R_R12, R_R13, R_R14, R_R15,
    R_RIP, R_RFLAGS,
    R_LENGTH
};

enum {
    S_CS, S_DS, S_ES, S_FS, S_GS, S_SS,
    S_LENGTH
};

int main(void)
{
    uint64_t gpr[R_LENGTH];
    uint16_t seg[S_LENGTH];

    asm volatile (
        /* fill registers with recognizable patterns */
        "mov $0x0102030405060708, %%rax\n\t"
        "mov $0x1112131415161718, %%rbx\n\t"
        "mov $0x2122232425262728, %%rcx\n\t"
        "mov $0x3132333435363738, %%rdx\n\t"
        "mov $0x4142434445464748, %%rsi\n\t"
        "mov $0x5152535455565758, %%rdi\n\t"
        /* RBP is used for frame pointer, RSP is stack pointer */
        "mov $0x8182838485868788, %%r8\n\t"
        "mov $0x9192939495969798, %%r9\n\t"
        "mov $0xa1a2a3a4a5a6a7a8, %%r10\n\t"
        "mov $0xb1b2b3b4b5b6b7b8, %%r11\n\t"
        "mov $0xc1c2c3c4c5c6c7c8, %%r12\n\t"
        "mov $0xd1d2d3d4d5d6d7d8, %%r13\n\t"
        "mov $0xe1e2e3e4e5e6e7e8, %%r14\n\t"
        "mov $0xf1f2f3f4f5f6f7f8, %%r15\n\t"
        /* dump GPRs */
        "mov %%rax, %[rax]\n\t"
        "mov %%rbx, %[rbx]\n\t"
        "mov %%rcx, %[rcx]\n\t"
        "mov %%rdx, %[rdx]\n\t"
        "mov %%rsi, %[rsi]\n\t"
        "mov %%rdi, %[rdi]\n\t"
        "mov %%rbp, %[rbp]\n\t"
        "mov %%rsp, %[rsp]\n\t"
        "mov %%r8, %[r8]\n\t"
        "mov %%r9, %[r9]\n\t"
        "mov %%r10, %[r10]\n\t"
        "mov %%r11, %[r11]\n\t"
        "mov %%r12, %[r12]\n\t"
        "mov %%r13, %[r13]\n\t"
        "mov %%r14, %[r14]\n\t"
        "mov %%r15, %[r15]\n\t"
        /* dump RIP via RIP-relative LEA, then restore RBX */
        "lea (%%rip), %%rbx\n\t"
        "mov %%rbx, %[rip]\n\t"
        "mov %[rbx], %%rbx\n\t"
        /* dump segment registers */
        "mov %%cs, %[cs]\n\t"
        "mov %%ds, %[ds]\n\t"
        "mov %%es, %[es]\n\t"
        "mov %%fs, %[fs]\n\t"
        "mov %%gs, %[gs]\n\t"
        "mov %%ss, %[ss]\n\t"
        /* dump RFLAGS */
        "pushfq\n\t"
        "popq %[rflags]\n\t"
        : [rax] "=m"(gpr[R_RAX]), [rbx] "=m"(gpr[R_RBX]),
          [rcx] "=m"(gpr[R_RCX]), [rdx] "=m"(gpr[R_RDX]),
          [rsi] "=m"(gpr[R_RSI]), [rdi] "=m"(gpr[R_RDI]),
          [rbp] "=m"(gpr[R_RBP]), [rsp] "=m"(gpr[R_RSP]),
          [r8] "=m"(gpr[R_R8]), [r9] "=m"(gpr[R_R9]),
          [r10] "=m"(gpr[R_R10]), [r11] "=m"(gpr[R_R11]),
          [r12] "=m"(gpr[R_R12]), [r13] "=m"(gpr[R_R13]),
          [r14] "=m"(gpr[R_R14]), [r15] "=m"(gpr[R_R15]),
          [rip] "=m"(gpr[R_RIP]), [rflags] "=m"(gpr[R_RFLAGS]),
          [cs] "=m"(seg[S_CS]), [ds] "=m"(seg[S_DS]),
          [es] "=m"(seg[S_ES]), [fs] "=m"(seg[S_FS]),
          [gs] "=m"(seg[S_GS]), [ss] "=m"(seg[S_SS])
        :
        : "%rax", "%rbx", "%rcx", "%rdx", "%rsi", "%rdi",
          "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15",
          "memory"
    );

    printf("rax = 0x%016" PRIx64 "\n", gpr[R_RAX]);
    printf("rbx = 0x%016" PRIx64 "\n", gpr[R_RBX]);
    printf("rcx = 0x%016" PRIx64 "\n", gpr[R_RCX]);
    printf("rdx = 0x%016" PRIx64 "\n", gpr[R_RDX]);
    printf("rsi = 0x%016" PRIx64 "\n", gpr[R_RSI]);
    printf("rdi = 0x%016" PRIx64 "\n", gpr[R_RDI]);
    printf("rbp = 0x%016" PRIx64 "\n", gpr[R_RBP]);
    printf("rsp = 0x%016" PRIx64 "\n", gpr[R_RSP]);
    printf(" r8 = 0x%016" PRIx64 "\n", gpr[R_R8]);
    printf(" r9 = 0x%016" PRIx64 "\n", gpr[R_R9]);
    printf("r10 = 0x%016" PRIx64 "\n", gpr[R_R10]);
    printf("r11 = 0x%016" PRIx64 "\n", gpr[R_R11]);
    printf("r12 = 0x%016" PRIx64 "\n", gpr[R_R12]);
    printf("r13 = 0x%016" PRIx64 "\n", gpr[R_R13]);
    printf("r14 = 0x%016" PRIx64 "\n", gpr[R_R14]);
    printf("r15 = 0x%016" PRIx64 "\n", gpr[R_R15]);
    printf("rip = 0x%016" PRIx64 "\n", gpr[R_RIP]);
    printf("cs = 0x%04x\n", seg[S_CS]);
    printf("ds = 0x%04x\n", seg[S_DS]);
    printf("es = 0x%04x\n", seg[S_ES]);
    printf("fs = 0x%04x\n", seg[S_FS]);
    printf("gs = 0x%04x\n", seg[S_GS]);
    printf("ss = 0x%04x\n", seg[S_SS]);
    printf("rflags = 0x%016" PRIx64 "\n", gpr[R_RFLAGS]);
    return 0;
}
The GPR ptrace(2) API
Both FreeBSD and NetBSD use the PT_GETREGS request to get the values of GPRs from the program, and PT_SETREGS to update them. The requests take a pointer to struct reg as an argument.
On FreeBSD, both i386 and amd64 have the individual registers listed as fields of the struct. On NetBSD, i386 uses a regular structure, while amd64 puts all values into an array whose indices are defined in the headers as constants.
The listing below compares the structures used on FreeBSD and NetBSD. Note that NetBSD/amd64 uses a special macro to define the array indices. For example, greg(rdi, RDI, 0) defines the index constant _REG_RDI.
/* FreeBSD/i386 */
struct __reg32 {
    __uint32_t r_fs;
    __uint32_t r_es;
    __uint32_t r_ds;
    __uint32_t r_edi;
    __uint32_t r_esi;
    __uint32_t r_ebp;
    __uint32_t r_isp;
    __uint32_t r_ebx;
    __uint32_t r_edx;
    __uint32_t r_ecx;
    __uint32_t r_eax;
    __uint32_t r_trapno;
    __uint32_t r_err;
    __uint32_t r_eip;
    __uint32_t r_cs;
    __uint32_t r_eflags;
    __uint32_t r_esp;
    __uint32_t r_ss;
    __uint32_t r_gs;
};

/* NetBSD/i386 */
struct reg {
    int r_eax;
    int r_ecx;
    int r_edx;
    int r_ebx;
    int r_esp;
    int r_ebp;
    int r_esi;
    int r_edi;
    int r_eip;
    int r_eflags;
    int r_cs;
    int r_ss;
    int r_ds;
    int r_es;
    int r_fs;
    int r_gs;
};

/* FreeBSD/amd64 */
struct __reg64 {
    __int64_t r_r15;
    __int64_t r_r14;
    __int64_t r_r13;
    __int64_t r_r12;
    __int64_t r_r11;
    __int64_t r_r10;
    __int64_t r_r9;
    __int64_t r_r8;
    __int64_t r_rdi;
    __int64_t r_rsi;
    __int64_t r_rbp;
    __int64_t r_rbx;
    __int64_t r_rdx;
    __int64_t r_rcx;
    __int64_t r_rax;
    __uint32_t r_trapno;
    __uint16_t r_fs;
    __uint16_t r_gs;
    __uint32_t r_err;
    __uint16_t r_es;
    __uint16_t r_ds;
    __int64_t r_rip;
    __int64_t r_cs;
    __int64_t r_rflags;
    __int64_t r_rsp;
    __int64_t r_ss;
};

/* NetBSD/amd64 */
#define _FRAME_REG(greg, freg) \
    greg(rdi, RDI, 0) \
    greg(rsi, RSI, 1) \
    greg(rdx, RDX, 2) \
    greg(r10, R10, 6) \
    greg(r8, R8, 4) \
    greg(r9, R9, 5) \
    /* ... */ \
    greg(rcx, RCX, 3) \
    greg(r11, R11, 7) \
    greg(r12, R12, 8) \
    greg(r13, R13, 9) \
    greg(r14, R14, 10) \
    greg(r15, R15, 11) \
    greg(rbp, RBP, 12) \
    greg(rbx, RBX, 13) \
    greg(rax, RAX, 14) \
    greg(gs, GS, 15) \
    greg(fs, FS, 16) \
    greg(es, ES, 17) \
    greg(ds, DS, 18) \
    greg(trapno, TRAPNO, 19) \
    greg(err, ERR, 20) \
    greg(rip, RIP, 21) \
    greg(cs, CS, 22) \
    greg(rflags, RFLAGS, 23) \
    greg(rsp, RSP, 24) \
    greg(ss, SS, 25)
struct reg {
    long regs[_NGREG];
};
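The greg() entries above follow the so-called X-macro pattern: a single register list is expanded several times with different definitions of the per-entry macro, producing index constants, name tables and so on from one source of truth. The sketch below mimics that technique with a hypothetical, shortened list; MY_FRAME_REG, MY_REG_*, my_reg_names and my_reg_lookup are illustrative names, not part of the NetBSD headers, and the freg entries are omitted.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, shortened register list in the style of NetBSD's
 * _FRAME_REG(greg, freg): each entry is (lowercase name, UPPERCASE name, index). */
#define MY_FRAME_REG(greg) \
    greg(rdi, RDI, 0) \
    greg(rsi, RSI, 1) \
    greg(rdx, RDX, 2) \
    greg(rcx, RCX, 3) \
    greg(rip, RIP, 4)

/* First expansion: one index constant per register, e.g. MY_REG_RDI = 0. */
#define MY_ENUM_ENTRY(lower, upper, idx) MY_REG_##upper = (idx),
enum { MY_FRAME_REG(MY_ENUM_ENTRY) MY_NGREG };
#undef MY_ENUM_ENTRY

/* Second expansion: printable names indexed by the same constants. */
#define MY_NAME_ENTRY(lower, upper, idx) [idx] = #lower,
static const char *const my_reg_names[MY_NGREG] = { MY_FRAME_REG(MY_NAME_ENTRY) };
#undef MY_NAME_ENTRY

/* The list above is in index order, so the count equals the last index + 1. */
static_assert(MY_NGREG == 5, "five registers in the sample list");

/* Hypothetical counterpart of NetBSD/amd64 struct reg: a flat array. */
struct my_reg {
    long regs[MY_NGREG];
};

/* Look up a register index by its name; returns -1 if it is unknown. */
static int my_reg_lookup(const char *name)
{
    for (int i = 0; i < MY_NGREG; i++)
        if (strcmp(my_reg_names[i], name) == 0)
            return i;
    return -1;
}
```

With this design, a debugger can print or parse register names without keeping a second, hand-maintained table in sync with the index constants.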
Floating-Point Registers
Dumping via FSAVE and FXSAVE
Floating-Point Registers is the term used for registers whose primary purpose was handling floating-point numbers. The traditional separation between General Purpose Registers and Floating-Point Registers is reflected in the CPU-specific ptrace(2) requests, which get or set either all GPRs or all FPU registers in a single operation. Some CPU architectures include additional sets of registers, e.g. x86 exposes the Debug Registers separately.
There are a few architectures that do not use the Floating-Point Unit setters and getters, as they do not feature a hardware FPU (this is often the case in low-power embedded devices).
The earliest FPRs on x86 were the x87 registers: eight 80-bit extended-precision data registers ST(i) and a few control and status registers.
Their contents can be dumped using the FSAVE instruction and restored using the FRSTOR instruction. FSAVE takes a pointer to a 108-byte memory buffer, stores the current values of the control registers and the ST(i) registers, and then reinitializes the FPU.
The FSAVE mnemonic implicitly inserts an additional FWAIT instruction that ensures that the FPU completes handling the previous operation. If you wish to capture the FPU state in the middle of exception handling, FNSAVE should be used instead, as it captures the immediate FPU state without waiting.
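A minimal sketch of the dump-and-restore pair, assuming an x86 CPU and GCC or Clang inline assembly: because FNSAVE reinitializes the FPU, the dump is immediately reloaded with FRSTOR so the program's x87 state is left unchanged. The function and type names are hypothetical; the code relies only on the FPU control word (FCW) being the first 16-bit field of the 108-byte area.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper type so the whole 108-byte area is one memory operand. */
struct fnsave_buf {
    uint8_t bytes[108];
};

/* Dump the x87 state with FNSAVE (no FWAIT), then restore it with FRSTOR,
 * since FNSAVE reinitializes the FPU as a side effect.
 * Returns the saved FPU control word. */
static uint16_t x87_control_word_via_fnsave(void)
{
    struct fnsave_buf area;

    __asm__ __volatile__ (
        "fnsave %0\n\t"
        "frstor %0\n\t"
        : "=m" (area));

    /* FCW occupies bytes 0..1 of the area, little-endian. */
    return (uint16_t)(area.bytes[0] | (area.bytes[1] << 8));
}
```

On x86-64, the System V psABI specifies an initial x87 control word of 0x037F, so a freshly started process would normally see that value here.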
The 64-bit MMi registers introduced as part of the MMX instruction set overlap with the ST(i) registers. As a result, no new dumping instruction is necessary, and if the MMi registers are used, they are dumped as part of ST(i) by FSAVE.
Offset (bits) | Width (bits) | Field |
---|---|---|
0 | 16 | FCW |
16 | 16 | unused |
32 | 16 | FSW |
48 | 16 | unused |
64 | 16 | FTW |
80 | 16 | unused |
96 | 32 | FIP |
128 | 16 | FCS |
144 | 16 | FOP |
160 | 32 | FDP |
192 | 16 | FDS |
208 | 16 | unused |
224 | 80 | ST(0) / MM0 |
304 | 80 | ST(1) / MM1 |
384 | 80 | ST(2) / MM2 |
464 | 80 | ST(3) / MM3 |
544 | 80 | ST(4) / MM4 |
624 | 80 | ST(5) / MM5 |
704 | 80 | ST(6) / MM6 |
784 | 80 | ST(7) / MM7 |
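The FSAVE layout in the table above can be expressed as a C structure whose offsets are checked at compile time. This is a sketch: the structure and helper names are hypothetical, and it describes the 108-byte format used in 32-bit protected mode (and in 64-bit mode), not the shorter 16-bit format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the 108-byte FSAVE/FNSAVE area. */
struct fsave_area {
    uint16_t fcw, rsvd0;      /* bits 0..31 */
    uint16_t fsw, rsvd1;      /* bits 32..63 */
    uint16_t ftw, rsvd2;      /* bits 64..95: full tag word */
    uint32_t fip;             /* bits 96..127 */
    uint16_t fcs;             /* bits 128..143 */
    uint16_t fop;             /* bits 144..159: low 11 bits used */
    uint32_t fdp;             /* bits 160..191 */
    uint16_t fds, rsvd3;      /* bits 192..223 */
    uint8_t  st[8][10];       /* bits 224..863: ST(0)..ST(7), 80 bits each */
};

static_assert(sizeof(struct fsave_area) == 108, "FSAVE area is 108 bytes");
static_assert(offsetof(struct fsave_area, fip) == 12, "FIP at bit 96");
static_assert(offsetof(struct fsave_area, st) == 28, "ST(0) at bit 224");

/* Split the raw little-endian ST(i) bytes into the 64-bit significand
 * (bytes 0..7) and the 16-bit sign+exponent word (bytes 8..9). */
static void fsave_st(const struct fsave_area *a, int i,
                     uint64_t *mantissa, uint16_t *exp_sign)
{
    uint64_t m = 0;

    for (int b = 7; b >= 0; b--)
        m = (m << 8) | a->st[i][b];
    *mantissa = m;
    *exp_sign = (uint16_t)(a->st[i][8] | (a->st[i][9] << 8));
}
```

Because the ST(i) slots are only 10 bytes wide, they are not naturally aligned, which is why the sketch keeps them as byte arrays instead of overlaying 64-bit fields.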
The SSE register set introduced eight new 128-bit registers, XMM0..XMM7, and the MXCSR control register. They are dumped using the FXSAVE instruction and restored using its counterpart FXRSTOR. These instructions use a 512-byte memory buffer aligned on a 16-byte boundary, with a different layout than FSAVE.
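To make the alignment requirement concrete, here is a sketch assuming an x86 CPU with SSE and GCC or Clang inline assembly. It dumps the state with FXSAVE into a 16-byte-aligned 512-byte buffer and extracts MXCSR, which the FXSAVE layout places at byte offset 24 (bit 192); the result is cross-checked against STMXCSR. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical buffer type: FXSAVE faults if the area is not 16-byte aligned. */
struct fxsave_buf {
    _Alignas(16) uint8_t bytes[512];
};

/* Dump the x87/SSE state with FXSAVE and pick MXCSR out of the dump.
 * FXSAVE does not modify the register state, so nothing needs restoring. */
static uint32_t mxcsr_via_fxsave(void)
{
    struct fxsave_buf area;
    uint32_t mxcsr;

    __asm__ __volatile__ ("fxsave %0" : "=m" (area));
    /* MXCSR lives at byte offset 24; x86 is little-endian like the dump. */
    memcpy(&mxcsr, area.bytes + 24, sizeof mxcsr);
    return mxcsr;
}

/* For comparison: read MXCSR directly with STMXCSR. */
static uint32_t mxcsr_via_stmxcsr(void)
{
    uint32_t mxcsr;

    __asm__ __volatile__ ("stmxcsr %0" : "=m" (mxcsr));
    return mxcsr;
}
```

Both functions should return the same value; in a default x86-64 process that is typically 0x1F80.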
The obvious difference between FSAVE and FXSAVE is that the latter saves the SSE registers. On i386, the registers XMM0..XMM7 are stored, and the remaining part of the buffer is left reserved/unused. On amd64, a major part of the reserved space is used to store XMM8..XMM15.
The other difference, which is frequently missed, is that FSAVE stores the FPU Tag Word (FTW) in its full form, while FXSAVE stores it in an abridged form. The full form uses two bits per x87 data register to indicate what kind of value it contains: a valid (normalized) number, zero, a special value, or empty. The abridged form uses only one bit per register, indicating whether the register is empty or not.
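Going from the full tag word to the abridged one is a simple lossy mapping, sketched below under the standard encoding (per physical register: 00 valid, 01 zero, 10 special, 11 empty; abridged bit set means non-empty). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Compress a full 16-bit x87 tag word (FSAVE format, 2 bits per physical
 * register R0..R7) into the 8-bit abridged form stored by FXSAVE
 * (1 bit per register: 1 = non-empty, 0 = empty). */
static uint8_t ftw_abridge(uint16_t full)
{
    uint8_t abridged = 0;

    for (unsigned r = 0; r < 8; r++) {
        unsigned tag = (full >> (2 * r)) & 3;
        if (tag != 3)                       /* anything but "empty" */
            abridged |= (uint8_t)(1u << r);
    }
    return abridged;
}
```

The reverse direction cannot be done from the abridged byte alone: distinguishing valid, zero and special values requires inspecting the register contents themselves.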
To access the registers introduced by further processor extensions such as AVX, the XSAVE instruction needs to be used. Unlike the instructions described previously, it has been designed to be extensible. XSAVE is a wide topic, and it will be the subject of the second part of this article.
FXSAVE vs FXSAVE64
The traditional variant of the FXSAVE/FXRSTOR instructions stores the FIP (address of the last x87 instruction, e.g. one that caused an exception) and FDP (address of its memory operand) pointers as pairs of a 16-bit segment selector (FCS and FDS, respectively) and a 32-bit offset (FIP and FDP). This is a problem for amd64 programs, since the original 64-bit pointer is truncated to 32 bits.
To resolve this, the additional mnemonics FXSAVE64/FXRSTOR64 are provided. They encode the respective instruction with a REX.W=1 prefix, which changes the FIP and FDP fields to hold 64-bit pointers instead. Their drawback is that the segment selectors are no longer reported; however, newer processors deprecate the FCS and FDS values anyway.
Offset (bits) | Width (bits) | Field |
---|---|---|
0 | 16 | FCW |
16 | 16 | FSW |
32 | 8 | FTW (abridged) |
40 | 8 | reserved |
48 | 16 | FOP |
64 | 32 | FIP (FXSAVE64: 64-bit FIP in bits 64..127) |
96 | 16 | FCS (FXSAVE only) |
112 | 16 | reserved (FXSAVE only) |
128 | 32 | FDP (FXSAVE64: 64-bit FDP in bits 128..191) |
160 | 16 | FDS (FXSAVE only) |
176 | 16 | reserved (FXSAVE only) |
192 | 32 | MXCSR |
224 | 32 | MXCSR_MASK |
256 | 128 | ST(0) / MM0 (80 bits) + reserved |
384 | 128 | ST(1) / MM1 (80 bits) + reserved |
⋮ | ⋮ | ⋮ |
1152 | 128 | ST(7) / MM7 (80 bits) + reserved |
1280 | 128 | XMM0 |
1408 | 128 | XMM1 |
⋮ | ⋮ | ⋮ |
2176 | 128 | XMM7 |
2304 | 128 | XMM8 (amd64 only) |
2432 | 128 | XMM9 (amd64 only) |
⋮ | ⋮ | ⋮ |
3200 | 128 | XMM15 (amd64 only) |
3328 | 384 | reserved |
3712 | 384 | unused |
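As with FSAVE, the FXSAVE layout from the table above can be mirrored by a C structure with compile-time offset checks. This sketch uses hypothetical field names and describes the legacy (non-64) variant; with FXSAVE64, the fip/fcs/rsvd1 and fdp/fds/rsvd2 groups would each be read as one 64-bit pointer instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the 512-byte FXSAVE area (legacy format). */
struct fxsave_area {
    uint16_t fcw;             /* byte 0 */
    uint16_t fsw;             /* byte 2 */
    uint8_t  ftw_abridged;    /* byte 4 */
    uint8_t  rsvd0;
    uint16_t fop;             /* byte 6 */
    uint32_t fip;             /* byte 8 */
    uint16_t fcs;             /* byte 12 */
    uint16_t rsvd1;
    uint32_t fdp;             /* byte 16 */
    uint16_t fds;             /* byte 20 */
    uint16_t rsvd2;
    uint32_t mxcsr;           /* byte 24 */
    uint32_t mxcsr_mask;      /* byte 28 */
    uint8_t  st[8][16];       /* byte 32: ST(i)/MMi in the low 10 bytes */
    uint8_t  xmm[16][16];     /* byte 160: XMM0..XMM15 (8..15 on amd64 only) */
    uint8_t  rsvd3[48];       /* byte 416 */
    uint8_t  avail[48];       /* byte 464: unused by the processor */
};

static_assert(sizeof(struct fxsave_area) == 512, "FXSAVE area is 512 bytes");
static_assert(offsetof(struct fxsave_area, mxcsr) == 24, "MXCSR at bit 192");
static_assert(offsetof(struct fxsave_area, st) == 32, "ST(0) at bit 256");
static_assert(offsetof(struct fxsave_area, xmm) == 160, "XMM0 at bit 1280");

/* Return a pointer to the 16 bytes holding XMMi. */
static const uint8_t *fxsave_xmm(const struct fxsave_area *a, unsigned i)
{
    return a->xmm[i];
}
```

Every field is naturally aligned here, so the structure needs no packing attributes to match the hardware layout.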
The ptrace(2) API
Both FreeBSD and NetBSD share roughly the same API for getting and setting the baseline floating-point registers. The baseline ptrace requests are PT_GETFPREGS and PT_SETFPREGS, both operating on a struct fpreg.
While the visible fields of this struct differ between FreeBSD and NetBSD, their underlying layout is the same.
For historical reasons, struct fpreg on i386 follows the FSAVE layout. This has two important implications. Firstly, it does not include the SSE registers. Secondly, it includes the full FPU Tag Word (FTW). The latter implies that a kernel using FXSAVE or a newer instruction internally needs to reconstruct the full value from the abridged one.
Until a few days ago, the reconstruction performed by both FreeBSD and NetBSD kernels was incomplete — all non-empty registers were represented as normalized values regardless of what their actual value was. I have fixed it in the FreeBSD and NetBSD kernels recently. You can read more about the problem in the FreeBSD Remote plugin report.
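A complete reconstruction has to classify every non-empty register by its contents, and has to account for the full tag word being indexed by physical register R0..R7 while FXSAVE stores the data registers in stack order ST(0)..ST(7); the two are related through the TOP field (bits 11..13 of FSW), with Rr being ST((r - TOP) mod 8). The sketch below follows the classification rules documented by Intel for recreating the FSAVE tag word; the function names are hypothetical and this is not the actual FreeBSD or NetBSD kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Classify one 80-bit register for the full tag word:
 * 0 = valid, 1 = zero, 2 = special (NaN, infinity, denormal, unnormal). */
static unsigned x87_tag_for(uint16_t exp_sign, uint64_t mantissa)
{
    uint16_t biased = exp_sign & 0x7fff;

    if (biased == 0x7fff)
        return 2;                           /* infinity or NaN */
    if (biased == 0)
        return mantissa == 0 ? 1 : 2;       /* zero, or (pseudo-)denormal */
    return (mantissa >> 63) ? 0 : 2;        /* normal if J bit set, else unnormal */
}

/* Rebuild the full FSAVE-style tag word from FXSAVE data.
 * fsw:      status word, TOP in bits 11..13
 * abridged: FXSAVE tag byte, bit r set if physical register Rr is non-empty
 * exp_sign, mantissa: ST(0)..ST(7) in the order FXSAVE stores them */
static uint16_t ftw_reconstruct(uint16_t fsw, uint8_t abridged,
                                const uint16_t exp_sign[8],
                                const uint64_t mantissa[8])
{
    unsigned top = (fsw >> 11) & 7;
    uint16_t full = 0;

    for (unsigned r = 0; r < 8; r++) {
        unsigned tag = 3;                   /* empty */
        if (abridged & (1u << r)) {
            unsigned i = (r - top) & 7;     /* physical Rr is ST(i) */
            tag = x87_tag_for(exp_sign[i], mantissa[i]);
        }
        full |= (uint16_t)(tag << (2 * r));
    }
    return full;
}
```

For example, after `finit; fldz; fld1` TOP is 6, ST(0) = 1.0 lives in R6 and ST(1) = 0.0 in R7, so the correct full tag word is 0x4FFF. A reconstruction that tags every non-empty register as valid would yield 0x0FFF instead, losing the fact that ST(1) is zero.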
Resolving the lack of SSE registers required another pair of requests: PT_GETXMMREGS and PT_SETXMMREGS. Both use a struct whose underlying layout matches FXSAVE. They are implemented as machine-dependent requests (while the other requests listed here are common to all architectures, even if not actually used). They are available to compat32 programs (e.g. a 32-bit debugger used on a 64-bit system) in the upcoming NetBSD 10 release. You can read more about that in the XSAVE and compat32 kernel work report.
On amd64, PT_GETFPREGS and PT_SETFPREGS both use the FXSAVE-based structure.
Debug Registers
The i386 architecture includes 8 Debug Registers, while amd64 has 16 of them. In practice, only a subset of these registers is usable, namely DR0 through DR3, DR6 and DR7. DR0 through DR3 hold the memory addresses of breakpoints or watchpoints, DR6 is the status register and DR7 is the control register. The remaining DRs are reserved.
reg. | purpose |
---|---|
DR0 | bp./wp. #0 address |
DR1 | bp./wp. #1 address |
DR2 | bp./wp. #2 address |
DR3 | bp./wp. #3 address |
DR4 | reserved (obsolete alias to DR6) |
DR5 | reserved (obsolete alias to DR7) |
DR6 | debug status register |
DR7 | debug control register |
DR8..DR15 | reserved (amd64 only) |
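DR7 packs the per-slot settings into fixed bit fields: the local enable bit Ln at bit 2n, the access condition R/Wn at bits 16+4n..17+4n and the length LENn at bits 18+4n..19+4n, while DR6 reports which slot fired in bits B0..B3. The sketch below (enum and function names are hypothetical) computes a DR7 value of the kind a debugger could pass through PT_SETDBREGS to arm one slot. It does not validate the address, which the hardware requires to be aligned to the watched length, and instruction breakpoints must use the execute condition with a 1-byte length.

```c
#include <assert.h>
#include <stdint.h>

/* Encodings of the R/Wn condition fields in DR7. */
enum dr7_cond {
    DR7_EXEC  = 0,   /* instruction execution */
    DR7_WRITE = 1,   /* data writes */
    DR7_IO    = 2,   /* I/O access (only with CR4.DE set) */
    DR7_RW    = 3    /* data reads or writes */
};

/* Encodings of the LENn fields in DR7 (8 bytes: amd64 only). */
enum dr7_len { DR7_LEN1 = 0, DR7_LEN2 = 1, DR7_LEN8 = 2, DR7_LEN4 = 3 };

/* Return 'dr7' with slot n (0..3) locally enabled for the given condition
 * and length; the bits of the other slots are preserved. */
static uint64_t dr7_set_slot(uint64_t dr7, unsigned n,
                             enum dr7_cond cond, enum dr7_len len)
{
    unsigned shift = 16 + 4 * n;

    dr7 &= ~((uint64_t)0xf << shift);       /* clear R/Wn and LENn */
    dr7 |= (uint64_t)cond << shift;         /* R/Wn */
    dr7 |= (uint64_t)len << (shift + 2);    /* LENn */
    dr7 |= (uint64_t)1 << (2 * n);          /* Ln: local enable */
    return dr7;
}

/* Return nonzero if DR6 reports that slot n (0..3) triggered. */
static int dr6_hit(uint64_t dr6, unsigned n)
{
    return (int)((dr6 >> n) & 1);
}
```

For instance, a 4-byte write watchpoint in slot 1 yields DR7 = 0xD00004: L1 (0x4), R/W1 = 01 at bit 20 and LEN1 = 11 at bit 22.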
The Debug Registers can be accessed using MOV, but only at privilege level 0 (i.e., inside the kernel). Access to them is exposed to the debugger via PT_GETDBREGS and PT_SETDBREGS, which take a struct dbreg argument. The structure is the same on FreeBSD and NetBSD.
Example code
The following listing demonstrates a NetBSD/amd64 program that reads the General Purpose Registers and Floating-Point Registers from a child via ptrace(2), prints some of the registers and then writes the modified GPRs back.
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <machine/reg.h>
#include <assert.h>
#include <inttypes.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int ret;
    pid_t pid = fork();
    assert(pid != -1);

    if (pid == 0) {
        /* child -- debugged program */
        uint64_t rax = 0x0001020304050607;
        printf("RAX in child before trap: 0x%016" PRIx64 "\n", rax);
        fflush(stdout);

        /* request tracing */
        ret = ptrace(PT_TRACE_ME, 0, NULL, 0);
        assert(ret != -1);

        __asm__ __volatile__ (
            "finit\n\t"
            "fldz\n\t"
            "fld1\n\t"
            "int3\n\t"
            /* pop both values so that the x87 stack is empty again */
            "fstp %%st(0)\n\t"
            "fstp %%st(0)\n\t"
            : "+a"(rax)
            :
            : "st", "st(1)"
        );

        printf("RAX in child after trap: 0x%016" PRIx64 "\n", rax);
        fflush(stdout);  /* _exit() does not flush stdio buffers */
        _exit(0);
    }

    /* parent -- the debugger */
    /* wait for the child to become ready for tracing */
    pid_t waited = waitpid(pid, &ret, 0);
    assert(waited == pid);
    assert(WIFSTOPPED(ret));
    assert(WSTOPSIG(ret) == SIGTRAP);

    struct reg gpr;
    struct fpreg fpr;

    /* get GPRs and FPRs */
    ret = ptrace(PT_GETREGS, pid, &gpr, 0);
    assert(ret == 0);
    ret = ptrace(PT_GETFPREGS, pid, &fpr, 0);
    assert(ret == 0);

    printf("RAX from PT_GETREGS: 0x%016" PRIx64 "\n",
        gpr.regs[_REG_RAX]);
    /* raw 80-bit values: 16-bit sign+exponent, then 64-bit significand */
    printf("ST(0) (raw) from PT_GETFPREGS: 0x%04" PRIx16
        "%016" PRIx64 "\n",
        fpr.fxstate.fx_87_ac[0].r.f87_exp_sign,
        fpr.fxstate.fx_87_ac[0].r.f87_mantissa);
    printf("ST(1) (raw) from PT_GETFPREGS: 0x%04" PRIx16
        "%016" PRIx64 "\n",
        fpr.fxstate.fx_87_ac[1].r.f87_exp_sign,
        fpr.fxstate.fx_87_ac[1].r.f87_mantissa);

    gpr.regs[_REG_RAX] = 0x0f0e0d0c0b0a0908;
    printf("RAX set via PT_SETREGS: 0x%016" PRIx64 "\n",
        gpr.regs[_REG_RAX]);
    fflush(stdout);

    /* set GPRs and resume the program; addr 1 = continue where it stopped */
    ret = ptrace(PT_SETREGS, pid, &gpr, 0);
    assert(ret == 0);
    ret = ptrace(PT_CONTINUE, pid, (void*)1, 0);
    assert(ret == 0);

    /* wait for the child to exit */
    waited = waitpid(pid, &ret, 0);
    assert(waited == pid);
    assert(WIFEXITED(ret));
    assert(WEXITSTATUS(ret) == 0);
    return 0;
}
When compiled and run on NetBSD, the program outputs:
RAX in child before trap: 0x0001020304050607
RAX from PT_GETREGS: 0x0001020304050607
ST(0) (raw) from PT_GETFPREGS: 0x3fff8000000000000000
ST(1) (raw) from PT_GETFPREGS: 0x00000000000000000000
RAX set via PT_SETREGS: 0x0f0e0d0c0b0a0908
RAX in child after trap: 0x0f0e0d0c0b0a0908
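The raw ST(0) value 0x3fff8000000000000000 is the 80-bit encoding of the 1.0 pushed by fld1 (sign+exponent word 0x3fff, significand with only the explicit integer bit set), while ST(1) is the +0.0 pushed by fldz. A hypothetical helper such as the one below converts raw values into a double for display. It is a sketch that rounds away the extra precision and does not preserve NaN payloads; on some systems it needs to be linked with -lm.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Convert a raw x87 80-bit extended value (sign+exponent word and 64-bit
 * significand with an explicit integer bit) into a double. */
static double x87_to_double(uint16_t exp_sign, uint64_t mantissa)
{
    int negative = exp_sign >> 15;
    int biased = exp_sign & 0x7fff;     /* exponent bias is 16383 */
    double v;

    if (biased == 0x7fff)                                /* infinity or NaN */
        v = (mantissa << 1) == 0 ? INFINITY : NAN;
    else if (biased == 0)                                /* zero or denormal */
        v = ldexp((double)mantissa, 1 - 16383 - 63);
    else                                                 /* normal or unnormal */
        v = ldexp((double)mantissa, biased - 16383 - 63);
    return negative ? -v : v;
}
```

The significand is treated as a 64-bit integer, hence the extra 63 subtracted from the exponent: 2^63 * 2^(16383 - 16383 - 63) gives exactly 1.0 for the ST(0) value above.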
Summary
Saving and restoring registers plays a crucial part in context switching. The General Purpose Registers can normally be copied directly, or with some trivial tricks. On the other hand, the x87 Floating-Point Registers cannot be copied with plain MOV instructions and are dumped using dedicated instructions.
The two base instructions for dumping FPRs are FSAVE and FXSAVE. FSAVE stores the x87 register state into a 108-byte memory area, while FXSAVE stores the x87 and SSE state into a 512-byte memory area. Both of these instructions also implicitly include the MMX state, since the MMi registers overlap with ST(i).
Debuggers can access the stored registers of interrupted debugged processes via the ptrace(2) API. The vast majority of targets on both NetBSD and FreeBSD define PT_GETREGS and PT_SETREGS to work with General Purpose Registers, and PT_GETFPREGS and PT_SETFPREGS to work with Floating-Point Registers.
For historical reasons, the two latter requests on i386 use the limited FSAVE layout. This deficiency is amended by an additional pair of requests, PT_GETXMMREGS and PT_SETXMMREGS.
Request | Data type | Register group |
---|---|---|
PT_GETREGS | struct reg | General Purpose Registers |
PT_GETFPREGS | struct fpreg | Floating-Point Registers (FSAVE layout on i386, FXSAVE layout on amd64) |
PT_GETXMMREGS | struct xmmregs | Floating-Point Registers (i386 only, FXSAVE layout) |
PT_GETDBREGS | struct dbreg | Debug Registers (x86 only) |
Just as FXSAVE is insufficient to dump all the registers on modern (e.g. AVX-enabled) CPUs, the aforementioned requests cannot expose the additional register values. Newer x86 CPUs introduce the XSAVE family of instructions, which provides a forward-extensible dump format, and the Operating Systems have followed by providing a future-extensible API to work with these dumps. This will be the topic of the second part of this article.