|=---------------=[ Linux Assembly and Disassembly the Basics ]=--------------=|
|=----------------------------------------------------------------------------=|
|=-------------------------=[ lhall@telegenetic.net ]=------------------------=|
---[ Introduction to as, ld and writing your own asm from scratch.
First off you have to know what a system call is. A system call is the mechanism
used by an application program to request a service from the operating system; on
Linux x86 it is triggered with a software interrupt. System calls use a special
machine code instruction which causes the processor to change mode (e.g. from "user
mode" to "supervisor mode", also called kernel mode). This switch is often loosely
called a context switch, although strictly speaking it is a mode switch; a context
switch proper is the kernel swapping one process for another. The mode is the
protection and access level that a piece of code is executing in, and it's determined
by a hardware-mediated flag. If you have ever heard people talking about ring zero
(ring 0, not to be confused with the control register cr0), they are referring to
code that executes in supervisor mode, such as all kernel code. Switching to
supervisor mode allows the OS to perform restricted actions such as accessing
hardware devices or the memory management unit. Generally, operating systems provide
a library that sits between normal programs and the rest of the operating system,
usually the C library (libc), such as glibc on Linux, or the Windows API on Windows.
This library handles the low-level details of passing information to the kernel and
switching to supervisor mode. These APIs give you access to functions that make your
job easier, for instance printf to print a formatted string or the *alloc family to
get more memory.
On Linux the system call numbers are defined in the file /usr/include/asm/unistd.h.
entropy@phalaris entropy $ cat /usr/include/asm/unistd.h
#ifndef _ASM_I386_UNISTD_H_
#define _ASM_I386_UNISTD_H_
/*
* This file contains the system call numbers.
*/
#define __NR_restart_syscall 0
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
#define __NR_close 6
[...snip...]
Each system call is shown as the system call name preceded by __NR_ and
followed by the system call number. The system call number is very important
for writing asm programs that don't go through a compiler such as gcc or through
libc. The low-level system call and fault handling routines are contained in the file
/usr/src/linux/arch/i386/kernel/entry.S, although this is over our heads for now.
The text that you type for the instructions of the program is known as the source
code. In order to transform source code into an executable program you must assemble
and link it. These steps are done for you by a compiler, but we will do them separately.
Assembling is the process that transforms your source code into instructions for the
machine. The machine itself only reads numbers but humans work much better with words.
An assembly language is a human readable form of machine code. The Linux assembler's
name is `as`; you can type `as --help` to see its arguments. `as` generates an object
file out of a source file. An object file is machine code that has not been fully put
together yet. Object files contain compact, pre-parsed code, often called binaries,
that can be linked with other object files to generate a final executable or code
library. An object file is mostly machine code. The linker is the program responsible
for putting all the object files together and adding information so the kernel knows
how to load and run the result. `ld` is the name of the linker on Linux.
So to summarize, source code must be assembled and linked in order to produce an
executable program. On Linux x86 this is accomplished with
as source.s -o object.o
ld object.o -o executable
Where "source.s" is your assembly code, "object.o" is the object file that `as`
writes (named with -o), and "executable" is the final executable produced when
the object file has been linked.
In the last tutorial we used gcc to generate the asm and then to compile the
program. When we called write we pushed the length of the string, the address
of the string, and the file descriptor onto the stack and then issued the instruction
"call write". This needs some explanation because how we do it now is totally
different. We pushed values onto the stack because we were using C (hello.c), and
gcc generates assembly that follows the C calling convention. A calling convention
is the set of rules for how parameters and return values are transferred between
caller and callee. On 32-bit x86, C passes its parameters on the stack (e.g.
pushl $14, pushl $.LC0, pushl $1, call write) and hands its return value back in
eax. A stack frame is a piece of the stack that holds all the info needed to call
a function.
So when we issued the "call write" instruction we were using the C library (libc),
and the write there was a libc wrapper around the system call of the same name
(e.g. getpid() is a wrapper for syscall(SYS_getpid)). When we write our own asm,
for now we will not be using libc. Even though that way is easier, it's not
always possible to use, and it's good to know what's happening on a lower level.
Here's our first program.
entropy@phalaris asm $ cat hello.s
.section .data
hello:
.ascii "Hello, World!\n\0"
.section .text
.globl _start
_start:
movl $4, %eax
movl $14, %edx
movl $hello, %ecx
movl $1, %ebx
int $0x80
movl $1, %eax
movl $0, %ebx
int $0x80
entropy@phalaris asm $ as hello.s -o hello.o
entropy@phalaris asm $ ld hello.o -o hello
entropy@phalaris asm $ ./hello
Hello, World!
Same output as before; it accomplishes the same thing, but in a very different way.
.section .data
Starts the .data section, where all our data goes. We could just as easily have
used .section .rodata, like what gcc generated in the intro, and then the string
would have been read-only, but it's much more common to put initialized data into
the .data section. Putting it in .rodata is more like doing #define hello
"Hello, World!\n" in C; putting it in .data is more similar to
char hello[] = "Hello, World!\n".
hello:
The label hello, which remember is a symbol (a symbol being a string representation
for an address) followed by a colon. A label says: when you assemble, take the
address of the next instruction or data following the colon and make that the
label's value.
.ascii "Hello, World!\n\0"
And here is what the label hello refers to: it is going to point to the first
character of the string (.ascii defines a string) "Hello, World!\n\0".
.section .text
Here we start our code section.
.globl _start
`ld` looks for _start as the entry point of an executable, while a program built
with `gcc` starts in main (libc supplies its own _start, which calls main). Again,
.globl tells the assembler that it shouldn't get rid of the symbol after assembly
because the linker needs it.
_start:
_start is a symbol that is going to be replaced by an address during either
assembly or linking. _start here is where our program will start to execute when
loaded by the kernel.
movl $4, %eax
When calling a system call the system call number you want to call is put into the
register eax. As we saw above in the file /usr/include/asm/unistd.h, the write
system call was defined as "#define __NR_write 4". So here we are moving
the immediate value 4 into eax, so when we call the kernel to do its work it will
know we want write.
movl $14, %edx
movl $hello, %ecx
movl $1, %ebx
The write system call expects three arguments, namely the file descriptor to
write to, the address of the string to write, and the length of the string to write.
When calling system calls, arguments are passed in registers, which differs
from the C library (libc) convention, which expects function arguments to be pushed
onto the stack. So the system call number goes into eax, the first argument
goes into ebx, the second into ecx, and the third into edx. There can be up to six
arguments, in ebx, ecx, edx, esi, edi and ebp respectively. If there are more
arguments, they are passed through a structure whose address is given as the first
argument. So we fill in the registers that write needs to do its job: we move 1,
which is STDOUT, into ebx, we put the label hello's value (which is the address of
the string "Hello, World!\n\0") into ecx, and we put the length of the string, 14,
into edx.
int $0x80
This instruction int(errupts) into the kernel ($0x80) and asks it to run the system
call whose number is in eax. An interrupt interrupts the program's flow and
asks the kernel to do something for us. The kernel will then perform the system
call and return control to our program. Before the interrupt we were
executing in user mode, during the system call the kernel is executing in
supervisor mode, and when the kernel is done and returns control to
our program we are again executing in user mode. So the kernel reads
eax, does a write of our string, and returns.
movl $1, %eax
Now we're done and we need to exit, so what number do we use to
execute exit? Look back at unistd.h and we see that exit is
"#define __NR_exit 1".
movl $0, %ebx
exit expects one argument namely the return code (0 means no errors),
so we put that into ebx.
int $0x80
Call the kernel to execute exit with return code 0 and we're done.
Onto the disassembly.
Assemble with debugging symbols; `as` uses the same -g or -gstabs that `gcc` does.
entropy@phalaris asm $ as -g hello.s -o hello.o
And link it.
entropy@phalaris asm $ ld hello.o -o hello
Start gdb.
entropy@phalaris asm $ gdb hello
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-pc-linux-gnu"...Using host libthread_db library
"/lib/libthread_db.so.1".
Set a breakpoint at the address of _start so we can step through it.
(gdb) break *_start
Breakpoint 1 at 0x8048094: file hello.s, line 7.
(gdb) run
Starting program: /home/entropy/asm/hello
Hello, World!
Program exited normally.
Current language: auto; currently asm
(gdb)
The breakpoint didn't work; I'm not sure why this happens, but we can do a quick
fix. Here is the fixed asm.
entropy@phalaris asm $ cat hello.s
.section .data
hello:
.ascii "Hello, World!\n\0"
.section .text
.globl _start
_start:
nop
movl $4, %eax
movl $14, %edx
movl $hello, %ecx
movl $1, %ebx
int $0x80
movl $1, %eax
movl $0, %ebx
int $0x80
The only difference is the nop or no operation right after _start.
Now we can set our breakpoint and it will work. Reassemble and link.
entropy@phalaris asm $ as -g hello.s -o hello.o
entropy@phalaris asm $ ld hello.o -o hello
Start gdb.
entropy@phalaris asm $ gdb hello
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-pc-linux-gnu"...Using host libthread_db library
"/lib/libthread_db.so.1".
List our assembly. I put in comments so it would be easier to follow.
(gdb) list _start
2 hello: # label hello, address of the first char
3 .ascii "Hello, World!\n\0" # .ascii defines a string
4 .section .text # our code start
5 .globl _start # the start symbol defined as .globl
6 _start: # the start label
7 nop # no operation for debugging with gdb
8 movl $4, %eax # mov 4 into %eax, 4 is write(fd, buf, len)
9 movl $14, %edx # 14 is the length of our string
10 movl $hello, %ecx # the address of our string
11 movl $1, %ebx # 1 is STDOUT, to the screen
(gdb) <hit enter>
12 int $0x80 # call the kernel
13 movl $1, %eax # move 1 into %eax, 1 is syscall exit()
14 movl $0, %ebx # move 0 into %ebx, exit's return value
15 int $0x80 # call kernel
Set a breakpoint just past our nop.
(gdb) break *_start+1
Breakpoint 1 at 0x8048095: file hello.s, line 8.
And run it.
(gdb) run
Starting program: /home/entropy/asm/hello
Breakpoint 1, _start () at hello.s:8
8 movl $4, %eax # mov 4 into %eax, 4 is write(fd, buf, len)
Current language: auto; currently asm
Now the breakpoint works.
(gdb) step
_start () at hello.s:9
9 movl $14, %edx # length for write 14 is the length of our string
(gdb) step
_start () at hello.s:10
10 movl $hello, %ecx # the address of our string
(gdb) step
_start () at hello.s:11
11 movl $1, %ebx # 1 is STDOUT, to the screen
(gdb) step
_start () at hello.s:12
12 int $0x80 # call the kernel
Check the registers to see if they have the correct information in them.
(gdb) print $edx
$1 = 14
(gdb) x/s $ecx
0x80490b8 <hello>: "Hello, World!\n"
(gdb) print $ebx
$2 = 1
(gdb) print $eax
$3 = 4
(gdb)
Looks good so let the kernel do its work.
(gdb) step
Hello, World!
_start () at hello.s:13
13 movl $1, %eax # move 1 into %eax, 1 is syscall exit()
It has executed the write system call (you can see the printed string) and
returned to gdb. Now we call exit.
(gdb) step
_start () at hello.s:14
14 movl $0, %ebx # move 0 into %ebx, exit's return value
(gdb) step
_start () at hello.s:15
15 int $0x80 # call kernel
(gdb) step
Program exited normally.
And it's done.
(gdb) q
entropy@phalaris asm $
Check out the objdump output.
entropy@phalaris asm $ objdump -d hello
hello: file format elf32-i386
Disassembly of section .text:
08048094 <_start>:
8048094: 90 nop
8048095: b8 04 00 00 00 mov $0x4,%eax
804809a: ba 0e 00 00 00 mov $0xe,%edx
804809f: b9 b8 90 04 08 mov $0x80490b8,%ecx
80480a4: bb 01 00 00 00 mov $0x1,%ebx
80480a9: cd 80 int $0x80
80480ab: b8 01 00 00 00 mov $0x1,%eax
80480b0: bb 00 00 00 00 mov $0x0,%ebx
80480b5: cd 80 int $0x80
entropy@phalaris asm $
Notice the difference from the last tutorial's objdump output: no snipping of
tons of lines of extra sections and such, it's only the code we wrote,
which is so much cleaner. Notice how easy it would be to take the opcodes and
make some tiny shellcode out of them, once the null bytes are gone. What? You want
some shellcode to print "Hello, World!\n" for your next 0day? Next time my friend,
next time.
# milw0rm.com [2006-04-08]