a blog by @captainsafia
Curling up with the `curl` code base
Randomly the other day, I was wondering (when is anything I wonder not random?) how browsers work. Correction: I was wondering what happens when you enter a URL into your browser and hit “Enter.” I wasn’t wondering about DNS resolution or TCP connections or the other networking-related aspects of the Internet. I’ve learned about these concepts before, but I’ve always wondered how they are all connected into a single, functional application.
As per usual, I want to learn more about this by reading the source code for an application. At first, I figured I might read the code for Chromium or Firefox or some other open source browser. Then I realized that there was probably a lot of code in these applications that would cloud my analysis of the networking aspect of this. Instead, I decided to figure out how these concepts connect by looking at the code base for `curl`.
`curl` is a command line tool that allows the user to transfer data to or from a server. Although it is often used to query sites through the Hypertext Transfer Protocol (HTTP), you can actually use it to transfer data through other protocols as well, like FTP. I actually didn’t know this before starting this exploration so I guess that’s one new thing learned!
Anyways, to get started, I want to figure out what happens when I run the following on the command line.
$ curl https://safia.rocks
This command prints the HTML source for my personal website to the console. You can try it and see.
The source code for `curl` is hosted on GitHub. The first order of business was to find the entry point for the `curl` executable. I snooped around the source code to try to find this.
Sidebar: I use the phrase “snooped around” a lot in these blog posts. Generally, it means looking through directories, reading the first couple of lines in files that I think might be related to what I want, and reading through the source for header files.
After poking around for a little bit, I started to notice some trends in how the source code for `curl` is structured. In particular, the source code associated with the command line tool is spread across files matching the `src/tool_*.c` path. This includes code that parses parameters, prints out the help statement, processes the URLs passed into `curl`, and interfaces with `libcurl`.
Sidebar: When I refer to `curl`, I am referring to the command line tool. There is also `libcurl`, which is the library that contains the code for handling all the transfer protocols.
Once I figured out this `src/tool_*.c` pattern, it didn’t take long for me to find the entry point of the command line tool. It’s located in the `main` function of the `src/tool_main.c` file. Sweet! Now, we can get to the fun parts.
The most critical code referenced in this function is the following.
/* Start our curl operation */
result = operate(&global, argc, argv);
So, it looks like most of the heavy lifting is done by the vaguely named `operate` function. The `argc` and `argv` parameters are self-explanatory. The `global` parameter is a reference to the global configuration for `curl`, which by default picks up settings from the `.curlrc` file stored in the user’s home directory.
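To make that shape concrete for myself, here is a minimal sketch of an entry point that sets up a config struct and hands everything off to a single worker function. This is not curl's code: `struct ToolConfig`, `tool_operate`, and `tool_entry` are names I made up, and the argument handling (a `-v` flag plus one URL) is a deliberately tiny assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a tool-wide configuration struct. */
struct ToolConfig {
  int verbose;      /* set when -v is passed */
  const char *url;  /* first argument that is not -v, or NULL */
};

/* Hypothetical stand-in for a worker like operate(): walks argv,
   fills in the config, and returns 0 on success or nonzero on failure. */
int tool_operate(struct ToolConfig *config, int argc, char *argv[])
{
  int i;
  for(i = 1; i < argc; i++) {
    if(strcmp(argv[i], "-v") == 0)
      config->verbose = 1;
    else if(!config->url)
      config->url = argv[i];
  }
  if(!config->url) {
    fprintf(stderr, "no URL specified\n");
    return 2;
  }
  if(config->verbose)
    fprintf(stderr, "would transfer %s\n", config->url);
  return 0;
}

/* What a main() in this style boils down to: zero the config,
   delegate to the worker, and pass its status back to the caller. */
int tool_entry(int argc, char *argv[])
{
  struct ToolConfig global;
  memset(&global, 0, sizeof(global));
  return tool_operate(&global, argc, argv);
}
```

A real `main(int argc, char *argv[])` would just `return tool_entry(argc, argv);`. The point is only that the entry point stays small while the worker function owns the interesting logic.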
Alright! So it’s time to look into the `operate` function, which is defined in the `src/tool_operate.c` file. I decided to start my exploration by looking at the function declaration for the `operate` function.
CURLcode operate(struct GlobalConfig *config, int argc, argv_item_t argv)
In particular, I was interested in the `CURLcode` object that is returned from the function. I tried to find the definition of this object somewhere in the code base. I assumed that it would be a struct so I searched for that definition. The query I was using to look for the struct definition didn’t turn up anything (I actually looked for it using the command line on a local copy of `curl`). Instead, I opted for the documentation on `CURLcode`, which I found in the libcurl documentation. So it turns out that a `CURLcode` is actually an enum, which contains integer representations of the different possible error codes that might occur in `curl`. This makes sense when you consider the fact that many C functions return their status using integers and the fact that the result of the `operate` function is typecast into an integer in the return statement of the `main` function for `curl`.
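Here is a small sketch of that enum-as-status-code pattern, so I can see why the cast to an integer works. Everything in it is hypothetical: `ToolCode`, its members, and the helper functions are my own illustration and not libcurl's actual definitions (the one thing I do borrow is the convention that the "OK" value is 0).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical status enum in the style of CURLcode.
   Names and values are made up for illustration. */
typedef enum {
  TOOL_OK = 0,      /* success is zero, like a normal exit status */
  TOOL_BAD_URL,     /* 1 */
  TOOL_NO_CONNECT   /* 2 */
} ToolCode;

/* Map each code to a human-readable message. */
const char *tool_strerror(ToolCode code)
{
  switch(code) {
  case TOOL_OK:         return "no error";
  case TOOL_BAD_URL:    return "malformed URL";
  case TOOL_NO_CONNECT: return "could not connect";
  }
  return "unknown error";
}

/* Very crude check: only accepts URLs that start with "http". */
ToolCode tool_check_url(const char *url)
{
  if(!url || strncmp(url, "http", 4) != 0)
    return TOOL_BAD_URL;
  return TOOL_OK;
}

/* Report any failure, then cast the enum to int so it can be
   returned from main() as the process exit status. */
int tool_exit_status(ToolCode code)
{
  if(code != TOOL_OK)
    fprintf(stderr, "error: %s\n", tool_strerror(code));
  return (int)code;
}
```

Because an enum in C is just a set of named integers, `return (int)code;` from `main` turns the status straight into an exit code that the shell can inspect.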
Alright! This blog post is getting a little long, so in the next blog post I’ll dive into the meat of the `operate` function and try to analyze what’s going on there. So I don’t lead myself too far astray, I’d like to answer the following questions as I’m reading through the code.
- What calls to `libcurl` are made in the `operate` function?
- What preprocessing, if any, is done to the arguments passed to the `operate` function?
See you in the next blog post!