|=------------=[ Linux Assembly and Disassembly an Introduction ]=------------=|
|=----------------------------------------------------------------------------=|
|=-------------------------=[ lhall@telegenetic.net ]=------------------------=|
----[ Introduction to gcc, gdb and objdump.
Here we start with a very simple c program. All it does is
write() the fourteen character string "Hello, World!\n" to STDOUT,
which is file descriptor 1. Normally with no type of redirection
going on STDOUT is your monitor - your standard output. If we `man 2 write`,
section 2 of the man pages are for system calls, we can see that writes prototype
is:
ssize_t write(int fd, const void *buf, size_t count);
So write returns a ssize_t, its first argument is a file descriptor to
write to, its second argument is a pointer or address of a buffer, and
its third argument is how many bytes for the starting address of the
buffer to write.
entropy@phalaris asm $ cat hello.c
main() {
write (1,"Hello, World!\n", 14);
return 0;
}
---[ gcc
Compile the program.
entropy@phalaris asm $ gcc hello.c
gcc produces a file called a.out if you dont specify a filename it should
output too. a.out stands for assembler output and is the default name for
executable output from many compilers, especially UNIX ones. a.out is also
an old object file format for executables.
entropy@phalaris asm $ ./a.out
Hello, World!
Specify the executable file name gcc should output to with -o.
entropy@phalaris asm $ gcc hello.c -o hello
entropy@phalaris asm $ ./hello
Hello, World!
Use the -S switch to generate the assembly.
entropy@phalaris asm $ gcc hello.c -S -o hello.s
We output with -o to the file hello.s, the -S will generate at&t
assembly from our code, the same assembly that gcc would use when
it calls as and ld.
entropy@phalaris asm $ cat hello.s
.file "hello.c"
.section .rodata
.LC0:
.string "Hello, World!\n"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
subl %eax, %esp
subl $4, %esp
pushl $14
pushl $.LC0
pushl $1
call write
addl $16, %esp
movl $0, %eax
leave
ret
.size main, .-main
.section .note.GNU-stack,"",@progbits
.ident "GCC: (GNU) 3.3.5 (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3)"
Everything that starts with a "." like ".file" is a directive - it
directs the assembler or compiler to do something instead of it
itself doing something.
.file "hello.c"
This directive tells the compiler the name of the source file.
.section .rodata
This starts the section .rodata, which stands for read only data.
Sections break up your programs into pieces that are easier to manage,
and segmentation helps keep things in their place. Some other sections
are .data which is initialized data; .bss which is uninitialized data and
.text which is your program code.
.LC0:
This is a text label for an address, also called a symbol, so we dont have
to refer to hex digits to access an address. Labels tell the assembler
to make the symbols value be what ever the next instruction or data element
is. A label is a symbol followed by a colon. It does _not_ have to start
with a ".".
.string "Hello, World!\n"
This is data of type string, the labels value is the address of the "H" at the
beginning or the string.
.text
This begins the .text section, which is where out code begins. Dont ask me why
it dosent say .section before it.
.globl main
main is a symbol that is going to be replaced by an address during either
assembly or linking. Symbols are used to mark locations of addresses, such
as addresses of data or addresses of function pointers. .globl tells the
assembler that it shouldnt get rid of the symbol after assembly because
the linker needs it. main is the symbol where the program starts.
.type main, @function
main's type is a function.
main:
The main label which tells the kernel where to start executing your program.
pushl %ebp
movl %esp, %ebp
This is called a procedure prolog, it sets up stuff thats needed so
functions dont corrupt data or the stack.
subl $8, %esp
andl $-16, %esp
movl $0, %eax
subl %eax, %esp
subl $4, %esp
This is all unimportant for now.
pushl $14
pushl $.LC0
pushl $1
call write
This is our system call to write(), it pushl's each of the functions
arguments onto the stack. Notice it pushes from right to left while
the function call looks like write(1, STRING, 14); So we push immediate
value $14 or 14 which is the length of our string, the value of the
label (which is the address of our string "Hello, World!\n"), and 1
for STDOUT, then it calls the write system call to write it out.
addl $16, %esp
Clean up the stack, since we pushed three things for write and
one to make the base pointer the stack pointer thats 4*4(bytes).
movl $0, %eax
movl the immediate value $0 into the register eax. eax is used
for the return value of programs and functions.
leave
ret
Leave the function, ret means return control of the cpu to who
ever called the program because the program is done doing what
it was written too.
.size main, .-main
.section .note.GNU-stack,"",@progbits
.ident "GCC: (GNU) 3.3.5 (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3)"
Extra stuff, like the size of the program, the .note section and
an identification string to show what version of gcc and which
linux it was compiled on.
Get rid of all the old compiled programs.
entropy@phalaris asm $ rm a.out hello
Compile the assembly file.
entropy@phalaris asm $ gcc hello.s
And execute.
entropy@phalaris asm $ ./a.out
Hello, World!
Compile and execute with a output -o.
entropy@phalaris asm $ gcc hello.s -o hello
entropy@phalaris asm $ ./hello
Hello, World!
Fun times.
Now for the disassembly.
Compile with debugging symbols.
entropy@phalaris asm $ gcc -gstabs hello.s -o hello
---[ gdb
Start gdb. Gentoo has something going on that I havent yet figured out,
it prints this
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
Yours may or may not do this, it dosent effect us for now.
entropy@phalaris asm $ gdb hello
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-pc-linux-gnu"...Using host libthread_db
library "/lib/libthread_db.so.1".
Here we list the source of the file we are debugging. We can do this because
we compiled with the -g or -gstabs for debugging information to be included.
(gdb) list main
4 .string "Hello, World!\n"
5 .text
6 .globl main
7 .type main, @function
8 main:
9 pushl %ebp
10 movl %esp, %ebp
11 subl $8, %esp
12 andl $-16, %esp
13 movl $0, %eax
(gdb) <hit enter>
14 subl %eax, %esp
15 subl $4, %esp
16 pushl $14
17 pushl $.LC0
18 pushl $1
19 call write
20 addl $16, %esp
21 movl $0, %eax
22 leave
23 ret
(gdb) <hit enter>
24 .size main, .-main
25 .section .note.GNU-stack,"",@progbits
26 .ident "GCC: (GNU) 3.3.5 (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3)"
Now set a breakpoint at the address of the symbol main. The "*" before
main says its an address not an immediate value. A breakpoint is where
execution of the program will stop when it is hit, so this is saying
execute until you hit the address of main.
(gdb) break *main
Breakpoint 1 at 0x8048370: file hello.s, line 9.
Now run the program. If you wanted to run with arguments you would
do run argv1 argv2.
(gdb) run
Starting program: /home/entropy/asm/hello
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
Breakpoint 1, main () at hello.s:9
9 pushl %ebp
Current language: auto; currently asm
Ok at this point we have stopped at line 9, one line after the symbol main.
step means take one step forward, or execute exactly one instruction.
(gdb) step
10 movl %esp, %ebp
Here we can see if we type step that esp will be moved into ebp, so look
at both of their values before this instruction executes.
(gdb) print $esp
$1 = (void *) 0xbfd10578
(gdb) print $ebp
$2 = (void *) 0xbfd105a8
Ok they are different, so type step to have the instruction at line 10 execute.
(gdb) step
main () at hello.s:11
11 subl $8, %esp
This is a bit confusing with the line 11 showing, it shows the line that
will execute when you type step. So it has just executed line 10 at this
point and will execute line 11 when you type step again.
Now we examine their values again.
(gdb) print $esp
$3 = (void *) 0xbfd10578
(gdb) print $ebp
$4 = (void *) 0xbfd10578
They are the same after the movl.
Now step past all the things deemed unimportant for now.
(gdb) step
main () at hello.s:12
12 andl $-16, %esp
(gdb) step
13 movl $0, %eax
(gdb) step
14 subl %eax, %esp
(gdb) step
15 subl $4, %esp
(gdb) step
main () at hello.s:16
16 pushl $14
Ok so now were going to check out the stack. Remeber I said that esp
points to the last thing pushed in the stack? Here we execute line 16,
which pushl's the value 14 onto the stack.
(gdb) step
main () at hello.s:17
17 pushl $.LC0
Now we examine(x) a decimal(d) at the address esp points to.
(gdb) x/d $esp
0xbfd10568: 14
And you can see it has a 14 there.
If we step again we will execute line 17, pushing the address of
the string on the stack.
(gdb) step
main () at hello.s:18
18 pushl $1
Ok lets see the address it pushed, we again examine(x) but this time in hex(x).
(gdb) x/x $esp
0xbfd10564: 0x0804849c
Ok so it has the address 0x0804849c in it, thats the address of our sting so
lets examine(x) with the type string(s).
(gdb) x/s 0x0804849c
0x804849c <_IO_stdin_used+4>: "Hello, World!\n"
And we can see our Hello, World!\n string.
step again to put the $1 onto the stack.
(gdb) step
main () at hello.s:19
19 call write
And examine it as decimal.
(gdb) x/d $esp
0xbfd10560: 1
We can also still exame the stack at other places + or - our current position.
To see two pushes up which would be the $14 we would examine decimal at esp+8,
because a long is four bytes, and each pushl pushes four bytes so
2(pushes)*4(bytes each) = 8(bytes total).
(gdb) x/d $esp+8
0xbf9b9af8: 14
Or to examine the hex address one pushl up we would examiine hex at esp+4.
(gdb) x/x $esp+4
0xbf9b9af4: 0x0804849c
The next instruction is a system call, see line "19 call write" above.
If we were to step into this we would step into the syscall which is a
mess to trace when just beginning. Instead we will execute the instruction
"next" which will execute the next instruction until it returns. What this
means is the syscall will fire, write its message then return to us.
(gdb) next
Hello, World!
20 addl $16, %esp
So it wrote out the string.
(gdb) step
main () at hello.s:21
21 movl $0, %eax
movl $0 into eax, 0 is the return value.
(gdb) step
22 leave
Prepare to leave.
(gdb) step
main () at hello.s:23
23 ret
ret(urn) control to the caller of the program.
(gdb) step
0x4003a28e in __libc_start_main () from /lib/libc.so.6
And we are done, let the kernel continue executing.
(gdb) continue
Continuing.
Program exited normally.
(gdb) quit
---[ objdump
A little objdump.
Lets check out another part of our executable by disassembling
with objdump.
entropy@phalaris asm $ objdump -d hello
hello: file format elf32-i386
This will show all the .sections disassembled with their address,
opcodes and asm.
[...snip...]
We'll just look at main.
08048370 <main>:
8048370: 55 push %ebp
8048371: 89 e5 mov %esp,%ebp
8048373: 83 ec 08 sub $0x8,%esp
8048376: 83 e4 f0 and $0xfffffff0,%esp
8048379: b8 00 00 00 00 mov $0x0,%eax
804837e: 29 c4 sub %eax,%esp
8048380: 83 ec 04 sub $0x4,%esp
8048383: 6a 0e push $0xe
8048385: 68 9c 84 04 08 push $0x804849c
804838a: 6a 01 push $0x1
804838c: e8 07 ff ff ff call 8048298 <write@plt>
8048391: 83 c4 10 add $0x10,%esp
8048394: b8 00 00 00 00 mov $0x0,%eax
8048399: c9 leave
804839a: c3 ret
804839b: 90 nop
804839c: 90 nop
804839d: 90 nop
804839e: 90 nop
804839f: 90 nop
^ ^ ^
| | Assmebly
| Opcodes
Virtual address when loaded
[...snip...]
Virtual address are the address the program will think its loaded
at that are actually mapped to physical address, this is so every
program can start at the same virtual address.
The opcodes are the machine code that is generated from the assembly,
this is a good way to generate shellcode, although null's are
bad times, bad times.
Then we have the assembly, looks similar execpt for $-16 is written
as $0xfffffff0 which is the twos complement for -16, and is used to
clear bits. Also the pushl $14 is the hex version pushl $0xe, the label
.LC0 of instruction pushl $.LC0 has been translated to its address namely
the line push $0x804849c (remeber before this is the address we examined
with (gdb) x/s 0x0804849c, which displayed our string).
Thats it for this one.
# milw0rm.com [2006-04-08]