Hacking the /proc filesystem memory!

On Linux, as on any *nix OS, a program that is loaded into memory and executed receives a PID, a process ID. Today we will consider two special "files" that Linux associates with each such running process: /proc/PID/maps and /proc/PID/mem, where PID is the actual integer process ID in question. Let's clarify this with an example. Before we do, note that all the code and files discussed in this article can be found here.

First it will be useful to create a little C program that will make testing all this very convenient for us. 

[Image: source listing of the C test program, 0-main.c]

The program allocates a string, "Holberton School", in its heap memory and loops once a second indefinitely, each time printing the iteration count, the value of the string, and the string's memory address (the hexadecimal value of its pointer). The program needs to loop in order to stay resident in memory, so that we can hack it! And if we do so successfully, we should see the string value it prints change right before our eyes...

So if we quickly compile this program to the binary executable a.out and run it, we can use the pgrep command to find the PID corresponding to our program name. We'll need to do this in two separate terminal windows because the one that loads the program will get a bit spammy!

Window 1:

>ls
0-main.c  README.md  read_write_heap.py

>gcc 0-main.c 

>ls
0-main.c  a.out  README.md  read_write_heap.py

>./a.out
[0] Holberton School (0xb0a010)
[1] Holberton School (0xb0a010)
[2] Holberton School (0xb0a010)
...

Window 2:

>pgrep a.out
18220

>ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
...
vagrant  18220  0.0  0.0   4328   348 pts/0    S+   21:31   0:00 ./a.out
...

The pgrep command is just a convenient way to quickly isolate the PID matching our program name. We could of course use the classic ps aux, or even ps aux | grep a.out.

As we mentioned at the start, there will now be at least two special files corresponding to our process: /proc/18220/maps and /proc/18220/mem.

The first file we can just read with the cat command:

>cat /proc/18220/maps
00400000-00401000 r-xp 00000000 00:19 265456                             /vagrant/holberton/holbertonschool-system_linux/0x03-proc_filesystem/a.out
00600000-00601000 r--p 00000000 00:19 265456                             /vagrant/holberton/holbertonschool-system_linux/0x03-proc_filesystem/a.out
00601000-00602000 rw-p 00001000 00:19 265456                             /vagrant/holberton/holbertonschool-system_linux/0x03-proc_filesystem/a.out
00b0a000-00b2b000 rw-p 00000000 00:00 0                                  [heap]
7f4ded902000-7f4dedac0000 r-xp 00000000 08:01 3285                       /lib/x86_64-linux-gnu/libc-2.19.so
7f4dedac0000-7f4dedcc0000 ---p 001be000 08:01 3285                       /lib/x86_64-linux-gnu/libc-2.19.so
7f4dedcc0000-7f4dedcc4000 r--p 001be000 08:01 3285                       /lib/x86_64-linux-gnu/libc-2.19.so
7f4dedcc4000-7f4dedcc6000 rw-p 001c2000 08:01 3285                       /lib/x86_64-linux-gnu/libc-2.19.so
7f4dedcc6000-7f4dedccb000 rw-p 00000000 00:00 0
7f4dedccb000-7f4dedcee000 r-xp 00000000 08:01 3102                       /lib/x86_64-linux-gnu/ld-2.19.so
7f4dedee1000-7f4dedee4000 rw-p 00000000 00:00 0
7f4dedeec000-7f4dedeed000 rw-p 00000000 00:00 0
7f4dedeed000-7f4dedeee000 r--p 00022000 08:01 3102                       /lib/x86_64-linux-gnu/ld-2.19.so
7f4dedeee000-7f4dedeef000 rw-p 00023000 08:01 3102                       /lib/x86_64-linux-gnu/ld-2.19.so
7f4dedeef000-7f4dedef0000 rw-p 00000000 00:00 0
7ffcc9ab6000-7ffcc9ad7000 rw-p 00000000 00:00 0                          [stack]
7ffcc9b3c000-7ffcc9b3e000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

That's a lot of data but only this one line which ends with "[heap]" is relevant to us right now:

00b0a000-00b2b000 rw-p 00000000 00:00 0                                  [heap]

This tells us that our a.out application's heap memory begins at address 00b0a000 and ends at address 00b2b000 (or 0xb0a000 to 0xb2b000). The rw-p field lists the permissions: the region is readable and writable but not executable, and the p marks it as a private mapping. So both read and write access to this memory is available.
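To make the layout of that line concrete, here is a minimal sketch that splits one maps line into its fields. The names Mapping and parse_maps_line are hypothetical helpers invented for this illustration; they are not part of the article's script.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """One line of /proc/PID/maps, reduced to the fields we care about."""
    start: int
    end: int
    perms: str
    pathname: str


def parse_maps_line(line: str) -> Mapping:
    # Fields: address range, permissions, offset, device, inode, pathname.
    # maxsplit=5 keeps a pathname that contains spaces in one piece.
    fields = line.split(None, 5)
    start_hex, end_hex = fields[0].split("-")
    pathname = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(int(start_hex, 16), int(end_hex, 16), fields[1], pathname)


heap = parse_maps_line(
    "00b0a000-00b2b000 rw-p 00000000 00:00 0"
    "                                  [heap]\n"
)
print(hex(heap.start), hex(heap.end), heap.perms, heap.pathname)
print(heap.end - heap.start)  # size of the heap mapping in bytes: 135168
```

The size printed on the last line, 135168 bytes, is the same number the script reports reading later on.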

Let's recall the string address printed before by our running C program:

[0] Holberton School (0xb0a010)

And note that indeed 0xb0a000 <= 0xb0a010 < 0xb2b000 (the end address listed in maps is exclusive, being the first address past the region). So the address of our string as displayed by our C program itself is between the start and stop addresses of the heap as parsed in the maps file. We're on the right track!
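The same sanity check can be written in a couple of lines of Python. This is only a sketch using the addresses from this particular run; they will differ on every execution.

```python
# Addresses copied from the maps output and the C program's output above.
heap_start, heap_end = 0xb0a000, 0xb2b000
string_addr = 0xb0a010

# maps ranges are half-open: start is included, end is not.
print(heap_start <= string_addr < heap_end)  # True
print(string_addr - heap_start)              # 16 bytes into the heap
```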

But how do we actually access these memory addresses? Simply by reading and writing to the virtual /proc/18220/mem file! But to do this we'll need to write a script to open the file in binary mode. Let's use Python for its clarity and convenience.

First a little input validation. We'll require 4 arguments (including the script name itself): the PID, the search string, and the replace string. Also, since it can be dangerous to overwrite application memory willy-nilly, we will require that the replace string not be longer than the search string so we do not overwrite any critical bytes!
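As a rough illustration of the validation just described, the checks might look like the sketch below. This is not the article's original listing: validate_args is a hypothetical name, and comparing lengths after encoding to bytes is an assumption made here so that the limit is measured in bytes of memory.

```python
def validate_args(argv):
    """Return (pid, search, replace) or raise ValueError.

    argv is expected to look like sys.argv: script name first.
    """
    if len(argv) != 4:
        raise ValueError(
            "Usage: read_write_heap.py PID search_string replace_string")
    try:
        pid = int(argv[1])
    except ValueError:
        raise ValueError("PID must be an integer")
    if pid <= 0:
        raise ValueError("PID must be positive")
    search, replace = argv[2], argv[3]
    if not search:
        raise ValueError("search string must not be empty")
    # Assumption: compare byte lengths, since that is what lands in memory.
    if len(replace.encode()) > len(search.encode()):
        raise ValueError("replace string must not be longer than search string")
    return pid, search, replace


print(validate_args(["read_write_heap.py", "18220", "School", "Cool!"]))
```

Raising ValueError instead of exiting directly keeps the function easy to test; a caller can catch the exception, print the message, and exit with a non-zero status.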

[Image: listing of the script's argument validation and main logic]

That's it! Of course, unless our IDE is really smart, we now have to implement parse_maps_file() and update_mem_file(). The former takes just the integer PID as its only argument and returns a tuple corresponding to the start and stop addresses of the heap memory. The latter then takes 5 arguments, the integer PID, the search string, the replace string, the starting address of the heap memory, and the stopping address, and uses them to modify the mem virtual file.

[Image: listing of parse_maps_file()]

The parse_maps_file() function basically does the same thing we did manually. It opens and reads the /proc/PID/maps file, and checks line-by-line looking for one which ends with "[heap]" (note that the newline character is included in the parsed lines so we must specify it for endswith() to match).

Once this line is found we split it by spaces and take the first field, which is the START-STOP memory addresses separated by a dash "-" character. We split this by the dash and parse the two addresses as hexadecimal integers. If any of this process fails or raises an exception we print the error message and exit with error status 1. Otherwise the function returns the start and stop addresses of the heap memory.
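The steps above can be sketched as follows. This is a reconstruction from the description, not the original listing; in particular the proc_root parameter is an assumption added here so the function can be tried against a sample file instead of a live process.

```python
import sys


def parse_maps_file(pid, proc_root="/proc"):
    """Return (start, stop) of the [heap] mapping of process pid."""
    maps_path = "{}/{}/maps".format(proc_root, pid)
    try:
        with open(maps_path, "r") as maps_file:
            for line in maps_file:
                # Lines keep their trailing newline, so match it explicitly.
                if line.endswith("[heap]\n"):
                    addr_range = line.split(" ")[0]
                    start_hex, stop_hex = addr_range.split("-")
                    return int(start_hex, 16), int(stop_hex, 16)
        raise ValueError("no [heap] mapping found in " + maps_path)
    except (OSError, ValueError) as err:
        # As described above: report the problem and exit with status 1.
        print("Error: {}".format(err))
        sys.exit(1)
```

For the process in this article, parse_maps_file(18220) would be expected to return the integers 0xb0a000 and 0xb2b000.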

[Image: listing of update_mem_file()]

The update_mem_file() function opens /proc/PID/mem for our given PID in read-write binary mode. We're dealing with pure memory here, so we are not considering character encoding at this level. It is just a sequence of bytes. In the case of a C application, the string characters happen to be single bytes whose integer values match the ASCII codes of the characters they represent.

Note that on many systems this script has to be run with root privileges via sudo; otherwise the kernel's ptrace-style access checks can deny access to the /proc/PID/mem virtual file even if the effective user is the owner of the process. Once this file is open in read-write binary mode, we immediately seek() to the starting address of the heap memory, which we found via parse_maps_file(). We then read() the entire heap in one call: the number of bytes to read is simply the end address minus the start address.

We store the result of the read() in the variable data and call its find() method to search for our given string. First, though, we need to call encode() on our search and replace strings to convert the Python strings, which are Unicode, into bytes. ASCII and UTF-8 produce identical output for ASCII characters such as the unaccented Latin letters, using a single byte per character. We need to do this in order to correctly find and replace the sequence of C characters, which are stored as individual bytes.
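A quick way to convince yourself of the encoding claim is to compare the two encodings directly. The snippet below is only an illustration; the accented character is an extra example added here to show where the one-byte-per-character rule stops holding.

```python
search_bytes = "School".encode("ascii")

# For plain ASCII text, ASCII and UTF-8 give identical bytes.
print(search_bytes == "School".encode("utf-8"))  # True

# Each byte is the ASCII code of the corresponding character.
print(list(search_bytes))  # [83, 99, 104, 111, 111, 108]

# Outside ASCII, UTF-8 needs more than one byte per character.
print(len("é".encode("utf-8")))  # 2
```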

If find() returns a non-negative result, it is the offset of the found string within data, that is, its distance in bytes from the start of the heap. The seek() function by default is relative to the start of the file, so after the find we seek to the sum of the heap start and the offset, which will place our imaginary cursor right where the search string was found in the process' heap memory as accessed via the /proc/PID/mem virtual file.
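The offset arithmetic can be tried without touching a real process. In this sketch the bytes object data stands in for the heap contents; the 16 leading zero bytes are a placeholder for whatever precedes the string in the real heap, chosen so that the string lands at 0xb0a010 as in our run.

```python
heap_start = 0xb0a000

# Simulated heap: 16 placeholder bytes, then the C string and its terminator.
data = b"\x00" * 16 + b"Holberton School\x00" + b"\x00" * 32

offset = data.find("School".encode("ascii"))
print(offset)                    # 26, counted from the start of data
print(hex(heap_start + offset))  # 0xb0a01a, the absolute address to seek to
```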

From there it is just a single write() call passing our encoded replace string. However, for the replacement to take proper effect, we need to append the null byte b'\x00' (the b prefix marks a bytes literal, in this case a single byte with the hex value 0x00, or just plain 0) to our replace string so that the C string terminates properly.
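Putting the last few paragraphs together, update_mem_file() might look roughly like the sketch below. It is a reconstruction from the description rather than the original listing. The mem_path parameter and the return value are assumptions added so the function can be exercised on an ordinary file, and the exact status messages are only modeled on the sample output in this article.

```python
def update_mem_file(pid, search_string, replace_string, start, stop,
                    mem_path=None):
    """Replace the first match of search_string in [start, stop).

    Returns the absolute address of the match, or -1 if not found.
    """
    if mem_path is None:
        mem_path = "/proc/{}/mem".format(pid)
    search = search_string.encode("ascii")
    # Append a null byte so the C string ends right after the replacement.
    replacement = replace_string.encode("ascii") + b"\x00"

    with open(mem_path, "rb+") as mem_file:
        mem_file.seek(start)                # jump to the start of the heap
        data = mem_file.read(stop - start)  # read the whole heap
        print("[*] Read {} bytes".format(len(data)))
        offset = data.find(search)
        if offset < 0:
            print("[-] String not found")
            return -1
        address = start + offset
        print("[*] String found at {:X}".format(address))
        mem_file.seek(address)              # absolute position in the file
        written = mem_file.write(replacement)
        print("[*] {} bytes written!".format(written))
    return address
```

Against a live process this would be called with the values returned by parse_maps_file(), for example update_mem_file(18220, "School", "Cool!", 0xb0a000, 0xb2b000).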

So let's try all this (and don't forget the sudo) in Window 2:

>sudo ./read_write_heap.py 18220 School Cool!
[*] Heap starts at B0A000
[*] Read 135168 bytes
[*] String found at B0A01A
[*] 6 bytes written!
>

And, voilà! In Window 1:

...
[31] Holberton School (0xb0a010)
[32] Holberton School (0xb0a010)
[33] Holberton School (0xb0a010)
[34] Holberton Cool! (0xb0a010)
[35] Holberton Cool! (0xb0a010)
[36] Holberton Cool! (0xb0a010)
...

(Note that the string address 0xB0A01A printed by the Python script is almost the same as the address 0xb0a010 printed by the C program. This is because 0xb0a010 refers to the very start of "Holberton School", but "School" starts after the 10 characters of "Holberton ", and since 10 is 0x0a in hexadecimal, 0xb0a010 + 0x0a == 0xb0a01a.)

We have hacked the /proc filesystem virtual memory of our C application with a Python script!

Hi, I am trying to read the /proc/pid/mem file. The process I followed is: I go to /proc/pid/maps and get the line "00b0a000-00b2b000 rw-p 00000000 00:00 0 ", copy the "00b0a000", and go to /proc/pid/mem using Python: f = open('proc/pid/mem','rb') f.seek(00b0a000) f.read(10). But every time I only get the header, for example ("\x90\xb1\xff"); I am looking for some strings. How can I read the strings attached? Please help! Thanks in advance.


More articles by Arthur Damm
