Monday 17 November 2008

An idea for a teeny-tiny web developer's toolkit

Every now and then I need to work out what's gone wrong with a production web server. Maybe it was a deployment issue, or a configuration issue, or maybe something has just broken. I sometimes find myself wondering "what on Earth is that server saying to this browser?"

I use Firebug, Live HTTP Headers and Tamper Data in Firefox a fair bit. But they only work in Firefox, and MSIE sends different request headers, which can affect the outcome. There's also the excellent Charles Web Debugging Proxy, which is so awesome it lets you observe SSL-encrypted traffic between client and server, serve local files to the browser instead of server files, and "spoof" DNS addresses to make browsers talk to specific servers.

Some days though I find myself typing this a lot:

$ telnet 80

Connected to
Escape character is '^]'.

GET / HTTP/1.1

HTTP/1.1 302 Found

When you're doing that over and over again you start to look for little tricks to make it go quicker. For example, I might do this so that I can use the bash command line history to replay something quicker:

$ printf "GET / HTTP/1.1\nHost:\n\n" | telnet 80

Connected to
Escape character is '^]'.
Connection closed by foreign host.

Urgh. Except that didn't work, because telnet quit as soon as printf exited and telnet saw end-of-file on its stdin -- before the server's response came back. Try this instead:

$ (printf "GET / HTTP/1.1\nHost:\n\n" ; sleep 2) | telnet 80

Yeah, OK. So now it's on one line and I can scroll back through the command history buffer. But what I'd really like to do is this:

$ get --be-like=firefox | no-cache | connect | tee connect.out

HTTP/1.1 200 OK
Cache-Control: no-cache
Connection: Close

Now I don't have to remember to type the "Host" header and the "get" tool can add other useful headers (for example, to emulate Firefox or MSIE).
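None of these tools exist yet, but a "get" stage is only a few lines. Here's a rough sketch in Python -- the profile header values are illustrative placeholders, not captured browser fingerprints:

```python
#!/usr/bin/env python
# Sketch of a hypothetical "get" pipe stage: print an HTTP request on
# stdout so later stages in the pipe chain can rewrite it.
# The header values below are placeholders, not real browser fingerprints.
import sys

PROFILES = {
    "firefox": {
        "User-Agent": "Mozilla/5.0 (X11; Linux) Gecko Firefox/3.0",
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Encoding": "gzip,deflate",
    },
    "msie": {
        "User-Agent": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
        "Accept": "*/*",
    },
}

def build_request(host, path="/", be_like="firefox"):
    """Return an HTTP/1.1 request string with browser-ish headers,
    including the Host header I keep forgetting to type."""
    headers = {"Host": host}
    headers.update(PROFILES[be_like])
    lines = ["GET %s HTTP/1.1" % path]
    lines += ["%s: %s" % kv for kv in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"

if __name__ == "__main__":
    # As a pipe stage this would just write the request to stdout.
    sys.stdout.write(build_request("www.example.com", be_like="firefox"))
```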

Or this:

$ get --be-like=msie | remove_header accept-encoding | spoof as | connect | headers

Basically what you could do is build an HTTP request and response stream out of UNIX pipes. Each step in the pipe chain would modify the request (or response) before passing it on. Eventually everything gets passed to the "connect" command, which is responsible for actually making the request to the real web server and spitting out the reply. The reply can then go through a similar chain of pipes.
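A stage like "remove_header" would then be almost trivial: read the message on stdin, drop the named header, write the rest to stdout. A sketch, assuming simple one-line headers terminated by a blank line:

```python
#!/usr/bin/env python
# Sketch of a hypothetical "remove_header" pipe stage: copy an HTTP
# message from stdin to stdout, dropping the named header. Assumes
# one-line headers with the blank line separating headers from body.
import sys

def remove_header(message, name):
    """Strip the header called `name` (case-insensitive) from an HTTP
    message; everything after the blank line passes through untouched."""
    head, sep, body = message.partition("\r\n\r\n")
    kept = [line for line in head.split("\r\n")
            if not line.lower().startswith(name.lower() + ":")]
    return "\r\n".join(kept) + sep + body

# As a pipe stage:
#   sys.stdout.write(remove_header(sys.stdin.read(), sys.argv[1]))
```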

Kinda like Yahoo! Pipes. Only... with actual pipes.

And if I substitute the "get" command (which really is just printing out HTTP request objects as text) with a "proxy" command (which listens to port 8080 for web browser requests) then I could test Apache configurations with something like this:

$ proxy --port 8080 | spoof as | connect | respond -e | tee logs/respond.out
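The "proxy" stage mostly has to accept a browser connection and write the raw request to stdout. The one wrinkle is that a browser talking to a proxy puts an absolute URL in the request line, which needs splitting back into a Host and a path before "connect" can use it. A sketch of that part (single-connection and blocking, just to show the shape):

```python
#!/usr/bin/env python
# Sketch of a hypothetical "proxy" pipe stage. A browser configured to
# use a proxy sends "GET http://host/path HTTP/1.1", so we rewrite the
# absolute URL back into an ordinary origin-form request line.
import socket
import sys
from urllib.parse import urlsplit

def rewrite_proxy_request(raw):
    """Turn a proxy-form request line into an origin-form one."""
    request_line, sep, rest = raw.partition("\r\n")
    method, url, version = request_line.split(" ", 2)
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return "%s %s %s%s%s" % (method, path, version, sep, rest)

def serve_once(port=8081):
    """Accept one browser connection and emit its request on stdout.
    Blocking, single-shot, no threads -- a sketch, not a server."""
    server = socket.socket()
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", port))
    server.listen(1)
    conn, _ = server.accept()
    raw = conn.recv(65536).decode("latin-1")
    sys.stdout.write(rewrite_proxy_request(raw))
    conn.close()
    server.close()
```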

Here's the most complicated pipe chain I've gotten to work so far:

$ proxy --port=8081 | tee logs/proxy.log | connect | tee logs/connect.log | filter text/html tidy -i -c -asxhtml -utf8 -f /dev/null | tee logs/filter.log | respond -e > logs/respond.log

That command:
  1. Listens to local port 8081 for web browser proxy requests
  2. Logs the browser requests to logs/proxy.log
  3. Connects to the appropriate web server to get the content just requested by the browser. By default, the connect command handles any content transfer decoding
  4. Logs the server response to logs/connect.log
  5. Runs the server response through HTML Tidy ("tidy"), which reformats the HTML with indentation, corrects any HTML errors and transforms it to XHTML if required (UTF-8 encoded, of course). HTML Tidy's error messages and report are sent to /dev/null.
  6. Saves the output of HTML Tidy to logs/filter.log
  7. Sends the final response stream back to the browser, but also echoes the response on stdout; and
  8. Saves the final response to logs/respond.log (the contents of which should be identical to filter.log -- I was debugging).
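The "filter" stage is the only genuinely fiddly one: it has to leave non-matching responses alone and pipe only matching bodies through the given command. A sketch, assuming the body has already been transfer-decoded and using "tr" in place of "tidy" for the example:

```python
#!/usr/bin/env python
# Sketch of a hypothetical "filter" pipe stage: if the response's
# Content-Type matches, run the body through a subcommand; otherwise
# pass the whole response through untouched. Assumes the body has
# already been transfer-decoded by an earlier stage.
import subprocess
import sys

def filter_body(response, content_type, command):
    """Pipe the body through `command` when Content-Type matches."""
    head, sep, body = response.partition("\r\n\r\n")
    if ("content-type: " + content_type) not in head.lower():
        return response
    result = subprocess.run(command, input=body.encode("utf-8"),
                            stdout=subprocess.PIPE, check=True)
    return head + sep + result.stdout.decode("utf-8")

# As a pipe stage:
#   sys.stdout.write(filter_body(sys.stdin.read(), sys.argv[1], sys.argv[2:]))
```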

I initially had an idea for a much more complicated Charles-like tool with a GUI and threads and select() polling and plug-ins and the like. But this seems to capture the essence of what I was trying to do. It's the smallest implementation that will work -- and no smaller. :-)
