First steps into shellcodes

Create the first payload
Change the assembly code to avoid null bytes
Automate opcodes extraction
Shellcode development techniques

The term shellcode simply refers to machine code stored in places where it is not normally found, such as a char array.

Create the first payload

First, let’s create a simple payload: one that just… exits, here with status code 0.

In C, it would look like:

#include <stdlib.h>

int main(){
    exit(0);
}

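This program terminates with status code 0. The C library's exit function ends the process with a system call; in assembly, we can make that system call ourselves.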

In assembly (x86_64, Intel syntax), the same code looks like:

section .text
global _start

_start:
    mov rdi, 0   ; set return code to 0
    mov rax, 60  ; use syscall number 60, which is exit
    syscall

Let’s compile and try it:

$ nasm -f elf64 -o exit.o exit.asm
$ ld exit.o -o exit
$ ./exit
$ echo $?
0

We can disassemble the executable using objdump:

$ objdump -M intel -d exit

exit:     file format elf64-x86-64


Disassembly of section .text:

0000000000401000 <_start>:
  401000:       bf 00 00 00 00          mov    edi,0x0
  401005:       b8 3c 00 00 00          mov    eax,0x3c
  40100a:       0f 05                   syscall

The opcodes of our payload are the bytes in the middle column:

bf 00 00 00 00
b8 3c 00 00 00
0f 05

which can be converted to this char array: char doexit[] = "\xbf\x00\x00\x00\x00\xb8\x3c\x00\x00\x00\x0f\x05". Theoretically, we can execute it as follows:

char doexit[] = "\xbf\x00\x00\x00\x00\xb8\x3c\x00\x00\x00\x0f\x05";

int main(int argc, char **argv)
{
  int (*func)();
  func = (int (*)()) doexit;
  (int)(*func)();
}

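However, this causes an issue once the shellcode has to be injected into a vulnerable program. In C, the 0x00 character (also known as the null byte) marks the end of a string, so string functions such as strcpy stop copying at the first null byte, and our shellcode would be truncated.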

Change the assembly code to avoid null bytes

There are different ways to avoid null bytes in our opcodes.

The first instruction that causes a problem is mov rdi, 0. The null bytes exist because the value 0x0 is encoded as a 4-byte immediate (00 00 00 00). The trick here is to use the XOR logical operator: XOR-ing a value with itself (in our case, a register) always gives 0. So, to put 0 in the RDI register, we can simply do xor rdi, rdi, which results in the opcode 48 31 ff.

The second problematic instruction is mov rax, 60. The null bytes appear because we are moving a one-byte value (0x3c) into a larger register, so it is padded with zeros. As a register can be accessed without using its full size, we can move 60 into the AL register, which is the lowest byte of the RAX register. We end up with mov al, 60, which corresponds to the opcode b0 3c. Keep in mind that writing AL leaves the other bytes of RAX unchanged, so this only sets RAX to 60 if they are already zero. The final assembly code looks like:

section .text
global _start

_start:
    xor rdi, rdi ; XOR the RDI register and store the result in it
    mov al, 60   ; use AL register instead of full RAX (upper bytes assumed 0)
    syscall

With objdump:

$ objdump -M intel -d exit

exit:     file format elf64-x86-64


Disassembly of section .text:

0000000000401000 <_start>:
  401000:       48 31 ff                xor    rdi,rdi
  401003:       b0 3c                   mov    al,0x3c
  401005:       0f 05                   syscall

So our shellcode went from 12 to 7 bytes, and all the null bytes are gone!

Let’s use it in our C code:

char doexit[] = "\x48\x31\xff\xb0\x3c\x0f\x05";

int main(){
    int (*func)();
    func = (int (*)()) doexit;
    (int)(*func)();
}

How does this code work? In C, a function pointer is just a variable holding the address of executable code. Here, we declare a function pointer called func, make it point to the bytes stored in doexit, and call it.

Now compile the program and launch it:

$ gcc shellcode.c -z execstack
$ ./a.out
$ echo $?
0

Our shellcode worked!

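Note that we must pass the -z execstack option to GCC, which forwards it to the linker: it marks the stack as executable. By default, memory holding data, such as the stack or the .data section where our global doexit array lives, is mapped without execute permission (the NX bit), so jumping into our bytes would end in a segmentation fault. Depending on the kernel, an executable stack does not necessarily make .data executable; declaring doexit inside main puts it on the stack instead.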

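Let’s try with another exit value, for example 2. Since 2 fits in a single byte, we can write it directly into DIL, the lowest byte of RDI, with mov dil, 2 instead of xor rdi, rdi: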

$ objdump -M intel -d exit

exit:     file format elf64-x86-64


Disassembly of section .text:

0000000000401000 <_start>:
  401000:       40 b7 02                mov    dil,0x2
  401003:       b0 3c                   mov    al,0x3c
  401005:       0f 05                   syscall

Replace the char array in the C code with char doexit[] = "\x40\xb7\x02\xb0\x3c\x0f\x05", then compile and execute it:

$ gcc shellcode.c -z execstack
$ ./a.out
$ echo $?
2

Automate opcodes extraction

As you can see, parsing opcodes from objdump can be annoying. That’s why we will automate this task with a simple bash function:

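objdumptoshellcode () {
    # keep instruction lines, then the tab-separated column of bytes
    for i in $(objdump -d "$1" -M intel | grep "^ " | cut -f2); do
        echo -En '\x'"$i"   # print each byte as \xNN, without newline
    done
    echo
}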

When we use it on the exit executable (the version returning 2), we get:

$ objdumptoshellcode exit
\x40\xb7\x02\xb0\x3c\x0f\x05

This will make our task easier in the next steps!

Shellcode development techniques

There are multiple ways to write a shellcode, and they do not all have the same advantages and drawbacks. I’ll cover the jmp, call, pop technique, the stack technique, and RIP-relative addressing.

JMP, CALL, POP

Consider the following assembly skeleton:

jmp end

main:
    pop rsi
    ...

end:
    call main
    hello: db "hello", 0xa

The first instruction sets the instruction pointer (the RIP register) to the “end” label, so execution continues there. The first instruction after “end” is call main. When it executes, the CPU pushes the address of the next instruction (the return address) onto the stack; in our case, that is the address of the string “hello\n”. This way, when we execute pop rsi after the “main” label, the RSI register contains the address of our string!

We need this trick because our shellcode is injected into a program that is already running, so we cannot know in advance the address at which the string will end up. Code that works wherever it is loaded is called position-independent code (PIC).

Let’s try this technique to display a message. To begin, we need to write the assembly code:

section .text
global _start

_start:
    jmp caller

    main:
        pop rsi      ; get the address of the string
        xor rax, rax ; clear the registers
        xor rdi, rdi
        xor rdx, rdx

        ; write string to stdout
        mov al, 1   ; write is syscall function 1
        mov dil, 1  ; use fd 1 (stdout)
        mov dl, 6   ; length of the string (letters + line return)
        syscall

        ; exit
        mov al, 60 ; exit is syscall function 60
        syscall

    caller:
        call main  ; put the string address on the stack
        msg: db "hello", 0xa

We can extract the opcodes from the executable file with our function:

$ objdumptoshellcode hello
\xeb\x17\x5e\x48\x31\xc0\x48\x31\xff\x48\x31\xd2\xb0\x01\x40\xb7\x01\xb2\x06\x0f\x05\xb0\x3c\x0f\x05\xe8\xe4\xff\xff\xff\x68\x65\x6c\x6c\x6f\x0a

and replace the char array in our C code.

Then, we compile and execute it:

$ gcc shellcode.c -z execstack
$ ./a.out 
hello

Stack technique

One of the advantages of this technique is the smaller size of the shellcode. However, as we use the stack to store values, it is important to keep in mind the endianness of our CPU architecture. Here is the code for the same payload, using the stack technique:

section .text
global _start

_start:
    ; clear the registers
    xor rax, rax
    xor rdi, rdi
    xor rdx, rdx    
    
    ; setting the stack
    push rdx         ; push rdx to the stack. It is zero, and
                     ; will be used as null bytes
    push 0x0a6f6c6c  ; push "llo\n": 8 bytes (imm32 sign-extended),
                     ; stored little-endian as 6c 6c 6f 0a 00 00 00 00
    push word 0x6568 ; push "he": 2 bytes, stored as 68 65
    mov al, 1        ; syscall 1 (write)
    mov dil, 1       ; fd 1 (stdout)
    mov rsi, rsp     ; buffer argument of write: the stack pointer,
                     ; which now points to our string
    mov dl, 6        ; length of the string
    syscall             
    
    ; exit
    mov al, 60  ; exit is syscall 60
    syscall

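First, we clear the registers. Next, we push RDX onto the stack; since it is zero, it acts as a string terminator. After that, we push the string. The stack grows toward lower addresses, so we start with the end of the string; and because x86_64 is little-endian, each immediate is stored with its bytes reversed (0x0a6f6c6c becomes 6c 6c 6f 0a, i.e. "llo\n"). Once this is done, RSP points to "hello\n": we simply invoke the write syscall as seen before, and we exit.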

The corresponding shellcode is:

\x48\x31\xc0\x48\x31\xff\x48\x31\xd2\x52\x68\x6c\x6c\x6f\x0a\x66\x68\x68\x65\xb0\x01\x40\xb7\x01\x48\x89\xe6\xb2\x06\x0f\x05\xb0\x3c\x0f\x05

So we went from a 36-byte shellcode with the jmp, call, pop technique to a 35-byte shellcode with the stack technique. In our case, the gain is minor, but real.

RIP relative addressing technique

The x86_64 architecture allows another technique thanks to an addressing mode it introduced: RIP-relative addressing. In NASM, it is selected with the rel keyword, as in [rel hello]. The address is computed relative to RIP (which holds the address of the next instruction), so the code stays position-independent. Here is the same shellcode, written with this technique:

section .text
global _start

_start:
    jmp main                ; jump over the string, which is data, not code
    hello: db "hello", 0xa  ; placed before its use, so the displacement
                            ; of the lea below is negative (no null bytes)

main:
    ; clear the registers
    xor rax, rax
    xor rdi, rdi
    xor rdx, rdx

    ; set the syscall parameters as usual
    mov al, 1
    mov dil, 1
    lea rsi, [rel hello] ; move the relative address of the string
                         ; into RSI
    mov dl, 6 ; length of the string
    syscall
    
    ; exit
    mov al, 60
    syscall

The corresponding shellcode is

\xeb\x06\x68\x65\x6c\x6c\x6f\x0a\x48\x31\xc0\x48\x31\xff\x48\x31\xd2\xb0\x01\x40\xb7\x01\x48\x8d\x35\xe5\xff\xff\xff\xb2\x06\x0f\x05\xb0\x3c\x0f\x05

It has a length of 37 bytes.

If we disassemble the executable with objdump, we can see that the address loaded into RSI is computed relative to RIP:

$ objdump -M intel -d hellorel

hellorel:     file format elf64-x86-64


Disassembly of section .text:

0000000000401000 <_start>:
  401000:	eb 06                	jmp    401008 <main>

0000000000401002 <hello>:
  401002:	68 65 6c 6c 6f       	push   0x6f6c6c65
  401007:	0a                   	.byte 0xa

0000000000401008 <main>:
  401008:	48 31 c0             	xor    rax,rax
  40100b:	48 31 ff             	xor    rdi,rdi
  40100e:	48 31 d2             	xor    rdx,rdx
  401011:	b0 01                	mov    al,0x1
  401013:	40 b7 01             	mov    dil,0x1
  401016:	48 8d 35 e5 ff ff ff 	lea    rsi,[rip+0xffffffffffffffe5]        # 401002 <hello>
  40101d:	b2 06                	mov    dl,0x6
  40101f:	0f 05                	syscall 
  401021:	b0 3c                	mov    al,0x3c
  401023:	0f 05                	syscall

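Note that objdump tries to decode the string as instructions (push 0x6f6c6c65 followed by a stray 0x0a byte): this is why the code jumps over it. Once again, if we compile and launch this program, everything works perfectly: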

$ ./hellorel
hello