The third assignment of the SLAE64 exam states:
- Study about the Egg Hunter shellcode
- Create a working demo of the Egg Hunter
- It should be configurable for different payloads
I for one had not heard before of the concept of an egg hunter so a little searching around led me to a (the?) paper by skape called Safely Searching Process Virtual Address Space published in 2004.
In a nutshell an egg hunter is a piece of code that searches the virtual address space (VAS) of a process looking for a predefined marker, called an egg. Once this marker has been found it jumps to it and the code that was marked by the egg is executed in the context of the application.
The situations where one would use an egg hunter are situations where an attacker is able to inject code of arbitrary length into the application's memory but cannot predict its final location. An overflow elsewhere which would otherwise not allow uploading the larger piece of code could then be utilized to insert the egg hunter. It then hunts for the egg in memory to find the rest of the exploit.
The paper defines three criteria an egg hunter should satisfy:
- It should be robust
- It must be small
- It should be fast
The paper then makes a number of observations which I'll summarize here:
- the egg should be 8 bytes; this is due to the fact the egg hunter would typically store the egg as a 4 byte key and it may otherwise end up finding it's own key rather than the actual egg when searching memory;
- the pattern of the egg should be easy to recognize by the hunter without requiring a lot of code to verify the "something" is actually the egg (criterium 2);
- the key could be valid code; for example:
0x50905090(thus the egg is the dword
0x50905090placed in memory twice in a row) which decodes to:
nop;push eax;nop;push eax. When the hunter jumps to it the key mustn't contain invalid instructions or the application would crash, or, the hunter should account for this and jump beyond the marker to first bytes of actual code.
- iterating through memory addresses and potentially dereferencing uninitialized/unavailabale memory could lead to a segmentation fault. Registering a "fake"
SIGSEGVhandler requires too much code according to the author so it is argued it's better to rely on standard syscalls such as
sigaction(2)to catch the
EFAULTerror upon touching uninitialized memory.
Note however that while the paper focusses on Linux/x86, the described concepts do generally apply to Linux/x86_64 too. However it should be noted that the 64-bit shellcode cannot really come close to their 32-bit equivalents of 30-odd bytes with a runtime below 10 seconds. To me this effectively renders this (interesting) technique not feasible for practical usage on Linux/x86_64 as the haystack to search for our needle is so much bigger.
The second implementation described by the author eliminates the need for an executable egg by using the
scasd instruction which actually does the incrementing of EDI by four bytes per comparison. Thus bypassing the need to jump to the beginning of the egg or explicitly accounting for 8 bytes of non-executable egg.
For this assignment I have based my code on skape's access(2) revisited implementation. I thought this implementation was very clear in what it's doing without making sacrifices in size. In the end it comes in at a total of 45 bytes.
I have used the egg value of
0x90509050 which is actually valid code for:
nop push eax nop push eax
but due to using the
scasd call we don't actually end up jumping to the very beginning of our egg. RDI has been advanced by
scasd until right where our actual payload would begin.
One way we could speed this up can be learned from
pmap $PID where
$PID is the process ID of our shellcode wrapper. For example:
root@slae64:~# pmap 20182 20182: ./egghunter 00005635d4eb4000 4K r-x-- egghunter 00005635d50b4000 4K r-x-- egghunter 00005635d50b5000 4K rwx-- egghunter 00005635d5c24000 132K rwx-- [ anon ] 00007f40cc854000 1948K r-x-- libc-2.27.so 00007f40cca3b000 2048K ----- libc-2.27.so 00007f40ccc3b000 16K r-x-- libc-2.27.so 00007f40ccc3f000 8K rwx-- libc-2.27.so 00007f40ccc41000 16K rwx-- [ anon ] 00007f40ccc45000 156K r-x-- ld-2.27.so 00007f40cce63000 8K rwx-- [ anon ] 00007f40cce6c000 4K r-x-- ld-2.27.so 00007f40cce6d000 4K rwx-- ld-2.27.so 00007f40cce6e000 4K rwx-- [ anon ] 00007ffc5e1bc000 132K rwx-- [ stack ] 00007ffc5e1f2000 12K r---- [ anon ] 00007ffc5e1f5000 8K r-x-- [ anon ] ffffffffff600000 4K r-x-- [ anon ] total 4512K
As you can see the program is mapped into memory starting at
0x5635d4eb4000 yet our egg hunter will spend a considerable amount of time iterating through lower memory before getting to that address. One optimization at the cost of code size might be to determine the base address and start searching from there.
In order to verify my code is correct I forced my shellcode wrapper to place the
.text section at a relatively low address so the egg would be found quite quickly. I used the
-Wl,-Ttext-segment,0x40000000 compiler flag for that.
I have uploaded my code to jasperla/slae64 on GitHub:
This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification.
Student ID: SLAE64-1614