a blog by @captainsafia
Unraveling `rm`: what happens when you run it?
Another Monday, another blog post!
I’ve been diving into the curl codebase over the past couple of blog posts, but something else has piqued my interest today, so I figured I might as well dig into it while the curiosity is still hot.
To be honest, I’m reluctant to dive into this because the last time I wrote a blog post about a Unix-related topic, a certain group (let’s call them individuals with too much time on their hands and a lot of petty in their hearts) took me to task on some of the substance of the post in a way that wasn’t too nice. And by “wasn’t too nice” I mean hella racist and sexist. In any case, I figure there will always be haters (and people with unhealthy attachments to operating systems and harassing strangers on the Internet), so I might as well carry on.
OK. Enough blabber. I’ve been working through a backlog of issues on the Zarf app, so I’ve been spending a lot of time on the command line. The backlog involved deleting a lot of code (insert satisfied sigh here), and sometimes this involved deleting entire files of source code (insert doubly satisfied sigh here). This got me wondering: what’s going on when you run `rm` on the command line? There are a couple of variants of the `rm` command that I commonly run.
```
$ rm settings.json
$ rm -rf config/
```
Anyways, I wanted to dive into what is going on under the hood with `rm`, so I decided to start by determining the syscalls invoked by the command:
```
~/zarf> sudo dtruss /tmp/rm History.md
dtrace: system integrity protection is on, some features will not be available

SYSCALL(args) = return
open("/dev/dtracehelper\0", 0x2, 0xFFFFFFFFE9A3EB10) = 3 0
ioctl(0x3, 0x80086804, 0x7FFEE9A3EA70) = 0 0
close(0x3) = 0 0
access("/AppleInternal/XBS/.isChrooted\0", 0x0, 0x0) = -1 Err#2
thread_selfid(0x0, 0x0, 0x0) = 3765033 0
bsdthread_register(0x7FFF56790BEC, 0x7FFF56790BDC, 0x2000) = 1073742047 0
issetugid(0x0, 0x0, 0x0) = 0 0
mprotect(0x1061CA000, 0x1000, 0x0) = 0 0
mprotect(0x1061CF000, 0x1000, 0x0) = 0 0
mprotect(0x1061D0000, 0x1000, 0x0) = 0 0
mprotect(0x1061D5000, 0x1000, 0x0) = 0 0
mprotect(0x1061C8000, 0x88, 0x1) = 0 0
mprotect(0x1061D6000, 0x1000, 0x1) = 0 0
mprotect(0x1061C8000, 0x88, 0x3) = 0 0
mprotect(0x1061C8000, 0x88, 0x1) = 0 0
getpid(0x0, 0x0, 0x0) = 77384 0
stat64("/AppleInternal/XBS/.isChrooted\0", 0x7FFEE9A3E148, 0x0) = -1 Err#2
stat64("/AppleInternal\0", 0x7FFEE9A3E1E0, 0x0) = -1 Err#2
csops(0x12E48, 0x7, 0x7FFEE9A3DC80) = 0 0
dtrace: error on enabled probe ID 2190 (ID 557: syscall::sysctl:return): invalid kernel access in action #10 at DIF offset 28
csops(0x12E48, 0x7, 0x7FFEE9A3D570) = 0 0
geteuid(0x0, 0x0, 0x0) = 0 0
ioctl(0x0, 0x4004667A, 0x7FFEE9A3F954) = 0 0
lstat64("History.md\0", 0x7FFEE9A3F8F8, 0x0) = 0 0
access("History.md\0", 0x2, 0x0) = 0 0
unlink("History.md\0", 0x0, 0x0) = 0 0
```
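Sidebar: On Linux, you’d usually determine the syscalls made by a command using `strace`. I’m on a Mac, so `strace` isn’t available. Instead, I used `dtruss`, a DTrace-based tool that ships with macOS. To allow it to trace the `rm` command, I had to copy the executable into a temporary directory and run the copy, since System Integrity Protection (mentioned in the first line of output) prevents DTrace from tracing protected system binaries like `/bin/rm`. All this to say, this is why I’m running `dtruss /tmp/rm` above instead of tracing `rm` directly.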
So, anyway, let’s look into what’s going on above. The first couple of lines in the trace, which open `/dev/dtracehelper`, seem to be process setup related to DTrace rather than anything `rm` itself does. I was intrigued by the calls to `mprotect`. I figured that they might be related to memory because the first parameter passed to `mprotect` looks like a memory address. I decided to head over to Google to see if I could find the documentation for this function and confirm this. I found the documentation, and my suspicions were confirmed: `mprotect` sets the access protections on a range of memory belonging to the calling process. The function declaration looks like this:
`int mprotect(void *addr, size_t len, int prot)`, where `addr` is the start of the memory range, `len` is the length of the range of memory addresses that will be changed, and `prot` is an integer that represents how the memory should be protected. I looked into what the different values of `prot` passed into the `mprotect` calls above meant and figured out the following:
- `0x0` is the code for `PROT_NONE`, meaning that the memory cannot be accessed at all, for reads or writes.
- `0x1` is the code for `PROT_READ`, meaning that the memory can be read.
- `0x3` is a bitwise OR of the values for `PROT_READ` (`0x1`) and `PROT_WRITE` (`0x2`), which means that the memory can be both read and written.
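To see these `prot` values in action, here’s a minimal C sketch, my own illustration rather than anything `rm` itself does. It maps one anonymous page as read/write, writes a byte, and then calls `mprotect` to drop the page to read-only, similar to the `0x3` then `0x1` pair of calls on `0x1061C8000` in the trace. The helper names are hypothetical, and it assumes a POSIX system that provides `MAP_ANON` (macOS and Linux both do).

```c
#define _DEFAULT_SOURCE  /* glibc: expose MAP_ANON; ignored on macOS */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* prot is a bit mask: PROT_READ (0x1) | PROT_WRITE (0x2) == 0x3. */
int prot_allows_read(int prot)  { return (prot & PROT_READ) != 0; }
int prot_allows_write(int prot) { return (prot & PROT_WRITE) != 0; }

/* Hypothetical demo: maps one page read/write, stores a byte, then uses
 * mprotect to make the page read-only (0x3 -> 0x1).
 * Returns the byte read back after the change, or -1 on failure. */
int mprotect_demo(void) {
    long page = sysconf(_SC_PAGESIZE);          /* mprotect works on whole pages */
    if (page <= 0) return -1;
    size_t len = (size_t)page;

    unsigned char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED) return -1;

    mem[0] = 42;                                /* fine: page is readable and writable */
    if (mprotect(mem, len, PROT_READ) != 0) {   /* now 0x1: read-only */
        munmap(mem, len);
        return -1;
    }
    int value = mem[0];                         /* reads still work; a write would now fault */
    munmap(mem, len);
    return value;
}
```

Calling `mprotect_demo()` should return 42; adding a write to `mem[0]` after the `mprotect` call would instead crash the process with a segmentation fault (or bus error), which is the protection doing its job.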
I wasn’t sure what the memory addresses referenced in the `mprotect` calls actually corresponded to, or what the best way to figure that out would be.
The `getpid` call has a pretty self-explanatory name, but I wondered what the parameters passed to it were. As it turns out, the `getpid` function takes no parameters, so the inclusion of parameters in the call above perplexed me.
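`getpid` is declared as `pid_t getpid(void)`, so it really does take nothing. My best guess, which I haven’t verified against the dtruss source, is that dtruss prints the first three argument slots for every syscall whether or not the call uses them; the zero-argument `geteuid` and `issetugid` calls in the trace show the same `(0x0, 0x0, 0x0)` pattern. The sketch below just shows the real zero-argument signatures; the helper names are hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Both prototypes take no arguments:
 *   pid_t getpid(void);
 *   uid_t geteuid(void);
 * so values a tracer prints in argument positions for these calls
 * are not arguments the caller passed. */
pid_t current_pid(void) {
    return getpid();    /* the traced run got 77384 */
}

uid_t current_euid(void) {
    return geteuid();   /* 0 when running as root, e.g. under sudo */
}
```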
I was also unsure of what the `csops` calls did. As it turns out, this confusion is pretty warranted. Some investigation revealed that `csops` is a system call unique to Apple’s operating systems that deals with code signing; it can be used to check the signature information the operating system keeps for a process’s code. It seems to be a way of checking the validity of a particular chunk of code. I’m not too sure about it, and there isn’t a ton of information about Apple-specific syscalls, so I can’t dig into the details as well as I’d like.
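One small clue: both `csops` calls pass `0x12E48` as their first argument, and converting that to decimal gives 77384, the same value `getpid` returned a few lines earlier. That suggests (my inference, not something I found documented) that the first argument is a process ID and the process is asking about itself. A tiny, hypothetical helper for doing that conversion on dtruss output:

```c
#include <assert.h>
#include <stdlib.h>

/* Converts a hex argument as printed by dtruss (e.g. "0x12E48") to a number.
 * With base 16, strtoul accepts an optional "0x"/"0X" prefix. */
unsigned long trace_arg_value(const char *text) {
    return strtoul(text, NULL, 16);
}
```

`trace_arg_value("0x12E48")` gives 77384, matching the `getpid(0x0, 0x0, 0x0) = 77384` line in the trace.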
The `ioctl` syscall is a pretty versatile one: it’s a catch-all for device-specific input/output control operations that don’t have a syscall of their own. According to the manpage, the first argument is a file descriptor, the second is a “device-dependent request code”, and the third is a pointer to memory. I dug around to figure out what the request code `0x4004667A` meant and found that it was pretty commonly associated with invocations of the `dtrace` command, so I figured that this invocation was not related to the task of removing the file.
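On BSD-derived systems like macOS, an `ioctl` request code isn’t an opaque number: `<sys/ioccom.h>` packs a direction, the size of the argument, a group character, and a command number into it. The sketch below decodes a request code using local copies of those bit masks (the `BSD_` names are mine, so it also compiles on Linux, which uses a different layout). Decoded this way, `0x4004667A` is a read-style (`_IOR`) request in group `'f'`, number 122, with a 4-byte argument; if I’m reading `<sys/filio.h>` correctly, that is `FIODTYPE`, which asks what type of device a descriptor refers to. Since the call is on file descriptor 0 (standard input), one possibility worth checking is that it comes from `rm` calling `isatty` on standard input to decide whether it can prompt, rather than from `dtrace`; I haven’t confirmed that.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout of a BSD/macOS ioctl request code, mirrored from <sys/ioccom.h>.
 * Local copies with a BSD_ prefix so this compiles on non-BSD systems too. */
#define BSD_IOCPARM_MASK 0x1fffu       /* parameter length mask */
#define BSD_IOC_VOID     0x20000000u   /* no parameters */
#define BSD_IOC_OUT      0x40000000u   /* kernel copies data out to the caller */
#define BSD_IOC_IN       0x80000000u   /* kernel copies data in from the caller */

struct ioctl_request {
    uint32_t direction;  /* BSD_IOC_VOID, BSD_IOC_OUT, BSD_IOC_IN, or IN|OUT */
    unsigned length;     /* size of the argument the third parameter points to */
    char group;          /* character identifying the request family */
    unsigned number;     /* command number within the group */
};

/* Splits a request code into its fields. */
struct ioctl_request decode_bsd_ioctl(uint32_t request) {
    struct ioctl_request r;
    r.direction = request & (BSD_IOC_VOID | BSD_IOC_OUT | BSD_IOC_IN);
    r.length = (request >> 16) & BSD_IOCPARM_MASK;
    r.group = (char)((request >> 8) & 0xff);
    r.number = request & 0xff;
    return r;
}
```

The first `ioctl` in the trace, `0x80086804` on the `/dev/dtracehelper` descriptor, decodes the same way to a write-style (`_IOW`) request in group `'h'`, number 4, with an 8-byte argument.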
I was interested in the last syscall invoked in this trace, `unlink`. I headed back to Google to learn some more about it and came across its manpage. The `unlink` syscall is the one that is actually responsible for removing the file.
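Strictly speaking, `unlink` removes a name: it deletes the directory entry, and the file’s storage is only freed once no other names (hard links) and no open file descriptors refer to it. Here’s a minimal sketch that shows this, assuming a writable `/tmp`; the function name and path template are hypothetical.

```c
#define _DEFAULT_SOURCE  /* glibc: expose mkstemp; ignored on macOS */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Writes `msg` to a fresh temp file, unlinks the file's only name, then reads
 * the data back through the descriptor that is still open.
 * Sets *name_gone to 1 if the path no longer exists after unlink.
 * Returns the number of bytes read back, or -1 on error. */
ssize_t read_back_after_unlink(const char *msg, char *buf, size_t cap,
                               int *name_gone) {
    char path[] = "/tmp/unlink-demo-XXXXXX";   /* hypothetical path template */
    int fd = mkstemp(path);
    if (fd == -1) return -1;

    size_t len = strlen(msg);
    ssize_t written = write(fd, msg, len);
    int unlinked = unlink(path);               /* removes the directory entry */
    if (written != (ssize_t)len || unlinked != 0) {
        close(fd);
        return -1;
    }

    *name_gone = (access(path, F_OK) == -1);   /* the name is gone... */

    /* ...but the open descriptor still reaches the file's data. */
    ssize_t n = -1;
    if (lseek(fd, 0, SEEK_SET) != -1) {
        n = read(fd, buf, cap);
    }
    close(fd);  /* last reference dropped, so the storage can now be freed */
    return n;
}
```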
All in all, the most `rm`-related parts of the trace are in the last few lines. Everything else seems to be setup work, such as setting memory permissions and checking the status of the related files.
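Here’s a rough C sketch of the `rm`-specific tail of the trace: `lstat64` on the path, `access` with mode `0x2` (which is `W_OK`, a write-permission check), and finally `unlink`. It’s a simplification under my own assumptions, not the actual `rm` source: my guess is that the real tool uses the write check to decide whether to prompt before removing an unwritable file, while this hypothetical helper just reports the result. Using `lstat` rather than `stat` means a symbolic link is examined itself instead of being followed to its target.

```c
#define _DEFAULT_SOURCE  /* glibc: expose lstat/mkstemp; ignored on macOS */
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper mirroring the last three syscalls in the trace.
 * Returns 0 on success, -1 if the path can't be examined, is a directory,
 * or can't be unlinked. Sets *writable from the W_OK check. */
int remove_like_rm(const char *path, int *writable) {
    struct stat sb;
    if (lstat(path, &sb) != 0) return -1;      /* shows up as lstat64 in the trace */
    if (S_ISDIR(sb.st_mode)) return -1;        /* plain rm refuses directories without -r */

    *writable = (access(path, W_OK) == 0);     /* access("History.md", 0x2) */

    return unlink(path) == 0 ? 0 : -1;         /* unlink("History.md") */
}
```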
So there’s that! I’ll admit that I’m not sure how I feel about this trace-based approach to anthropology. It’s a little too direct; I like wallowing in the code and getting lost in its complexity. There is something therapeutic about it. I’ll see if I get more comfortable with this technique in future code reads. Until then, see you next time!