Squid is one of the best-known caching proxies. It works as a proxy server for protocols such as HTTP and FTP, and uses caching where possible to greatly improve content delivery. When I first came across Squid, one of the things I became curious about was how to access and modify HTTP traffic, more specifically just the requested URLs, in real time. One use case for this is logging to a database for auditing.
Some quick research led me to something called the “url_rewrite_program”, which is simply an option available in the Squid configuration file. The path for that is typically something like this:
To use this option, you add a line like the following to the config file, where the option name is followed by the path to your rewriter:
When Squid starts, it spawns several instances of this program; the number is set by the option “url_rewrite_children”. If you’re using a script file like I am, make sure the file is executable. You can do that with a command like the following:
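As a rough sketch, the relevant lines in the config file might look like the fragment below. The rewriter path is the one used in the chmod command further down, and the child count of 5 is an arbitrary illustrative value, not a recommendation.

```
# Path to the rewriter program (assumed here to be the script from this post)
url_rewrite_program /home/aktarer/squid/rewriter.php

# Number of rewriter instances to spawn (illustrative value)
url_rewrite_children 5
```

After changing the config file, Squid has to reload its configuration or be restarted before the rewriter processes are spawned.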
chmod +x /home/aktarer/squid/rewriter.php
As Squid gets requests, it pipes data in the following format to the standard input of your program:
URL <SP> client_ip "/" fqdn <SP> user <SP> method [<SP> kvpairs]<NL>
Your program can simply be a loop waiting on this. Here’s an example of what you might get:
http://www.bing.com 188.8.131.52/- - GET - myip=184.108.40.206 myport=3128
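The example line above can be split into fields with a few lines of code. The following is a minimal Python sketch rather than the PHP script used in this post; `parse_request_line` and the dictionary keys are hypothetical names chosen for illustration. It assumes fields are separated by whitespace and that any trailing token without an `=` sign can be ignored.

```python
def parse_request_line(line):
    """Split one rewriter input line into its fields (illustrative helper)."""
    parts = line.split()
    if len(parts) < 4:
        raise ValueError("unexpected rewriter input: %r" % line)

    url, client, user, method = parts[:4]

    # The second field has the form client_ip/fqdn; "-" marks an unknown value.
    client_ip, _, fqdn = client.partition("/")

    # Keep the remaining key=value tokens and skip anything else.
    extras = dict(tok.split("=", 1) for tok in parts[4:] if "=" in tok)

    return {
        "url": url,
        "client_ip": client_ip,
        "fqdn": fqdn,
        "user": user,
        "method": method,
        "extras": extras,
    }
```

The returned dictionary is what you would hand to your own logging or database code.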
You can parse this like any other string and do whatever you like with it, such as inserting it into your database. You’ll notice, though, that simply reading the input may cause your proxy to stop working. Something I didn’t mention earlier is that Squid watches your program’s standard output and requires a response for every line of input. So when you get the above line as input, you are expected to output something like the following, followed by a newline character:
Now let’s say you simply can’t allow people using your proxy to access something like Bing. What can you do? Instead of outputting the above for such requests, you can output the following:
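The read-and-reply loop can be sketched as follows, again in Python rather than PHP. The names `reply_for` and `serve` are hypothetical. This sketch echoes the requested URL back unchanged, which assumes the line-based reply format; newer Squid releases also document replies that start with a result code, so check the documentation for your version. The explicit flush matters because output written to a pipe is normally buffered, and Squid would otherwise keep waiting for its answer.

```python
import sys


def reply_for(line):
    """Return the reply for one input line: here, the URL as received."""
    parts = line.split()
    return parts[0] if parts else ""


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Answer every input line with exactly one output line."""
    for line in stdin:
        stdout.write(reply_for(line) + "\n")
        # Flush after each reply so Squid is not left waiting on a buffer.
        stdout.flush()
```

In a real rewriter script you would call `serve()` at the bottom of the file and add your logging inside the loop.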
This will cause a redirect to Google.
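The blocking decision could be sketched like this in Python. It assumes the rewriter convention in which prefixing the returned URL with `302:` asks Squid to send a client-side redirect; verify this against the documentation for your Squid version. `BLOCKED_HOSTS`, `REDIRECT_TARGET`, and `reply_for_url` are hypothetical names, and the host list is only an illustration.

```python
from urllib.parse import urlsplit

# Illustrative values, not taken from any real configuration.
BLOCKED_HOSTS = {"www.bing.com"}
REDIRECT_TARGET = "http://www.google.com"


def reply_for_url(url):
    """Return a redirect reply for blocked hosts, otherwise the URL unchanged."""
    host = urlsplit(url).hostname or ""
    if host in BLOCKED_HOSTS:
        # Assumed convention: "302:" prefix requests a temporary redirect.
        return "302:" + REDIRECT_TARGET
    return url
```

Matching on the parsed hostname rather than on the raw string avoids blocking unrelated URLs that merely contain the blocked name in their path or query.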
Now you might be wondering how you can access more information, such as the request body. As far as I know, there’s no easy way to do this. However, Squid is open source! This means you can build your own version to do just that.