Friday, 21 November 2008

Making it useful

Played a bit more with the small web developer's toolkit from my last post. This time in a real-world production setting.

Other tools could have given the answer as well, but it was kinda fun to watch this work:

$ proxy --port 8082 | log | connect | respond -e | log
proxy: Proxy started at http://macbook.local:8082/
[14:41:26] GET http://google.com/ HTTP/1.1 -> [14:41:27] HTTP/1.1 301 Moved Permanently
[14:41:27] GET http://www.google.com/ HTTP/1.1 -> [14:41:28] HTTP/1.1 302 Found
[14:41:28] GET http://www.google.com.au/ HTTP/1.1 -> [14:41:28] HTTP/1.1 200 OK
[...]

The "log" command just prints out interesting things it sees -- by default the request and response lines, each with a timestamp added. In real life I wasn't debugging Google, BTW. :-)
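A stage like "log" could be very small. The following is a minimal sketch in plain shell, not the real tool: the name log_stage, the pattern used to spot request and status lines, and the choice to write the log to stderr are all assumptions made for illustration. It only copes with text streams.

```shell
# Hypothetical stand-in for a "log" stage (not the real tool).
# Passes stdin through to stdout line by line; anything that looks like an
# HTTP request line or status line is also written to stderr with a timestamp.
log_stage() {
  cr=$(printf '\r')
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "$line"              # pass the stream on to the next stage
    clean=${line%"$cr"}                # drop the trailing CR for display
    case $clean in
      "HTTP/"*|*" HTTP/"[0-9]*)        # status line, or request line
        printf '[%s] %s\n' "$(date +%H:%M:%S)" "$clean" >&2 ;;
    esac
  done
}
```

Something like `printf 'GET / HTTP/1.1\r\nHost: app.example.com\r\n\r\n' | log_stage > /dev/null` should print just the timestamped request line on the terminal. A body line that happens to start with "HTTP/" would be logged too, which a real tool would want to avoid.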

Monday, 17 November 2008

An idea for a teeny-tiny web developer's toolkit

Every now and then I need to work out what's gone wrong with a production web server. Maybe it was a deployment issue, or a configuration issue, or maybe something has broken. Sometimes I find myself wondering "what on Earth is that server saying to this browser?"

I use Firebug, Live HTTP Headers and Tamper Data in Firefox a fair bit. But they only work in Firefox, and MSIE sends different request headers, which can affect the outcome. There's also the excellent Charles Web Debugging Proxy, which is so awesome it lets you observe SSL-encrypted traffic between client and server, serve local files to the browser instead of server files, and "spoof" DNS addresses to make browsers talk to specific servers.

Some days though I find myself typing this a lot:

$ telnet app.example.com 80

Trying 127.0.0.1...
Connected to app.example.com.
Escape character is '^]'.

GET / HTTP/1.1
Host: app.example.com

HTTP/1.1 302 Found
[...]

When you're doing that over and over again you start to look for little tricks to make it go quicker. For example, I might do this so that I can use the bash command line history to replay something quicker:

$ printf "GET / HTTP/1.1\nHost: app.example.com\n\n" | telnet app.example.com 80

Trying 127.0.0.1...
Connected to app.example.com.
Escape character is '^]'.
Connection closed by foreign host.

Urgh. Except that didn't work: telnet hit end of file on its input as soon as printf finished, and hung up before the server could reply. Try this instead:

$ (printf "GET / HTTP/1.1\nHost: westfield.com\n\n" ; sleep 2) | telnet westfield.com 80

Yeah, OK. So now it's on one line and I can scroll back through the command history buffer. But what I'd really like to do is this:

$ get --be-like=firefox http://app.example.com/ | no-cache | connect | tee connect.out

HTTP/1.1 200 OK
Cache-Control: no-cache
Connection: Close
[...]

Now I don't have to remember to type the "Host" header and the "get" tool can add other useful headers (for example, to emulate Firefox or MSIE).

Or this:

$ get --be-like=msie http://app.example.com/ | remove_header accept-encoding | spoof app.example.com as 127.0.0.1 | connect | headers

Basically what you could do is build an HTTP request and response stream out of UNIX pipes. Each step in the pipe chain would modify the request (or response) before passing it on. Eventually things get passed to the command "connect", which is responsible for actually making the request to the real web server and spitting out the reply. The reply can go through a similar chain of pipes.

Kinda like Yahoo! Pipes. Only... with actual pipes.
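To make the idea concrete, here is a rough sketch of what two of those stages might look like as shell functions. These are hypothetical stand-ins, not the actual tools: the header set, the placeholder User-Agent strings and the http://-only URL parsing are all assumptions. The point is only that a request can travel down a pipe as plain text, with each stage editing it on the way.

```shell
# Hypothetical "get": print an HTTP/1.1 request for a URL as text on stdout.
# Only handles http://host/path URLs. The User-Agent values are placeholders,
# not the strings real browsers send.
get() {
  be_like=firefox
  case $1 in
    --be-like=*) be_like=${1#--be-like=}; shift ;;
  esac
  url=${1#http://}
  host=${url%%/*}
  case $url in
    */*) path=/${url#*/} ;;
    *)   path=/ ;;
  esac
  case $be_like in
    msie) agent='Mozilla/4.0 (compatible; MSIE 7.0)' ;;
    *)    agent='Mozilla/5.0 (placeholder Firefox)' ;;
  esac
  # One CRLF-terminated line per argument; the empty one ends the headers.
  printf '%s\r\n' "GET $path HTTP/1.1" "Host: $host" "User-Agent: $agent" \
    "Accept-Encoding: gzip" "Connection: close" ""
}

# Hypothetical "remove_header": drop one header (matched case-insensitively)
# from the header section and pass everything else through, body included.
remove_header() {
  awk -v name="$1" '
    BEGIN { name = tolower(name) ":" }
    in_body { print; next }
    $0 == "" || $0 == "\r" { in_body = 1; print; next }
    tolower(substr($0, 1, length(name))) == name { next }
    { print }
  '
}
```

With those defined, `get --be-like=msie http://app.example.com/ | remove_header accept-encoding` prints the request minus its Accept-Encoding line, ready to hand to whatever does the actual connecting.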

And if I substitute the "get" command (which really just prints out HTTP request objects as text) with a "proxy" command (which listens on port 8080 for web browser requests) then I could test Apache configurations with something like this:

$ proxy --port 8080 | spoof app.example.com as 127.0.0.1 | connect | respond -e | tee logs/respond.out

Here's the most complicated pipe chain I've gotten to work so far:

$ proxy --port=8081 | tee logs/proxy.log | connect | tee logs/connect.log | filter text/html tidy -i -c -asxhtml -utf8 -f /dev/null | tee logs/filter.log | respond -e > logs/respond.log

That command:
  1. Listens on local port 8081 for web browser proxy requests
  2. Logs the browser requests to logs/proxy.log
  3. Connects to the appropriate web server to get the content just requested by the browser. By default, the connect command handles any content transfer decoding
  4. Logs the server response to logs/connect.log
  5. Runs the server response through HTML Tidy ("tidy"), which reformats the HTML with indentation, corrects any HTML errors and transforms it to XHTML if required (UTF-8 encoded, of course). HTML Tidy's error messages and report are sent to /dev/null
  6. Saves the output of HTML Tidy to logs/filter.log
  7. Sends the final response stream back to the browser, but also echoes the response on stdout; and
  8. Saves the final response to logs/respond.log (the contents of which should be identical to filter.log -- I was debugging).
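Step 5 is the only stage in that list that has to tell headers from body. As a rough illustration (a hypothetical filter_stage function, not the actual "filter" command), a stage like that could copy the headers through, note the Content-Type, and hand the rest of the stream to another program only when the type matches. This sketch assumes a single response per stream and makes no attempt to fix up Content-Length after the body changes.

```shell
# Hypothetical stand-in for a "filter" stage (not the real tool).
# Usage: filter_stage TYPE COMMAND [ARGS...]
# Copies the response headers through unchanged, then pipes the body through
# COMMAND if the Content-Type header starts with TYPE, or through cat if not.
filter_stage() {
  want=$1; shift
  cr=$(printf '\r')
  matched=0
  while IFS= read -r line; do
    printf '%s\n' "$line"              # headers pass through as-is
    line=${line%"$cr"}
    if [ -z "$line" ]; then            # blank line: end of headers
      break
    fi
    case $(printf '%s' "$line" | tr '[:upper:]' '[:lower:]') in
      "content-type: $want"*) matched=1 ;;
    esac
  done
  # Whatever is left on stdin is the body.
  if [ "$matched" -eq 1 ]; then "$@"; else cat; fi
}
```

So `filter_stage text/html tidy -i -c -asxhtml -utf8 -f /dev/null` would tidy HTML responses and leave images and the like alone. It relies on the shell's read not consuming input past the line it returns, which holds when stdin is a pipe.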

I initially had an idea for a much more complicated Charles-like tool with a GUI and threads and select() polling and plug-ins and the like. But this seems to capture the essence of what I was trying to do. It's the smallest implementation that will work -- and no smaller. :-)

Monday, 3 November 2008

Email Client Market Share

A program called Fingerprint from Litmus is generating statistics on e-mail client market share. I was surprised to see Gmail's share so low (6% amongst business users, 4% amongst consumers) but also how old most Outlook client installs are (7% business users using Outlook 2007 versus 29% using Outlook 2003 or earlier).

At least Lotus Notes is down to 0.2% of business users.

The authors suggest running their software on your own mailing list. Presumably their analysis is a bit more sophisticated than what you'd get by looking at the Browsers dimension in Google Analytics for a specific "e-mail only" image.

More discussion on the Litmus Blog.