Friday 21 November 2008

Making it useful

Played a bit more with the small web developer's toolkit from my last post. This time in a real-world production setting.

Although other tools could have given the answer as well it was kinda fun to watch this work:

$ proxy --port 8082 | log | connect | respond -e | log
proxy: Proxy started at http://macbook.local:8082/
[14:41:26] GET HTTP/1.1 -> [14:41:27] HTTP/1.1 301 Moved Permanently
[14:41:27] GET HTTP/1.1 -> [14:41:28] HTTP/1.1 302 Found
[14:41:28] GET HTTP/1.1 -> [14:41:28] HTTP/1.1 200 OK

The "log" command just prints out interesting things it sees -- by default the request and response lines, each with a timestamp.  In real life I wasn't debugging Google BTW. :-)

Monday 17 November 2008

An idea for a teeny-tiny web developer's toolkit

Every now and then I need to work out what's gone wrong with a production web server. Maybe it was a deployment issue, or a configuration issue or maybe something has broken. I'm sometimes wondering "what on Earth is that server saying to this browser?"

I use Firebug, Live HTTP Headers and Tamper Data in Firefox a fair bit. But they only work in Firefox, and MSIE sends different request headers which can affect the outcome. There's also the excellent Charles Web Debugging Proxy which is so awesome it lets you observe SSL encrypted traffic between client and server, serve local files to the browser instead of server files and "spoof" DNS addresses to make browsers talk to specific servers.

Some days though I find myself typing this a lot:

$ telnet 80

Connected to
Escape character is '^]'.

GET / HTTP/1.1

HTTP/1.1 302 Found

When you're doing that over and over again you start to look for little tricks to make it go quicker. For example, I might do this so that I can use the bash command line history to replay something quicker:

$ printf "GET / HTTP/1.1\nHost:\n\n" | telnet 80

Connected to
Escape character is '^]'.
Connection closed by foreign host.

Urgh. Except that didn't work because telnet got an interrupt when printf sent the end of file. Try this instead:

$ (printf "GET / HTTP/1.1\nHost:\n\n" ; sleep 2) | telnet 80

Yeah, OK. So now it's on one line and I can scroll back through the command history buffer. But what I'd really like to do is this:

$ get --be-like=firefox | no-cache | connect | tee connect.out

HTTP/1.1 200 OK
Cache-Control: no-cache
Connection: Close

Now I don't have to remember to type the "Host" header and the "get" tool can add other useful headers (for example, to emulate Firefox or MSIE).
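
To make that concrete, here's a minimal Python sketch of what a "get"-style tool might print. It is not the real tool: the function name build_get, the BROWSER_HEADERS table and the header values in it are all made up for illustration, and the values are placeholders rather than real browser fingerprints.

```python
# Hypothetical stand-in for the "get" tool: it only builds request text.
BROWSER_HEADERS = {
    # Placeholder values (assumptions), not real browser fingerprints.
    "firefox": {"User-Agent": "ExampleFirefox/0.0", "Accept": "text/html"},
    "msie": {"User-Agent": "ExampleMSIE/0.0", "Accept": "*/*"},
}


def build_get(host, path="/", be_like=None):
    """Return the text of a GET request, always including the Host header."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in BROWSER_HEADERS.get(be_like, {}).items():
        lines.append(f"{name}: {value}")
    lines.append("Connection: close")
    # HTTP separates lines with CRLF and ends the header block with a blank line.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_get("www.example.com", be_like="firefox"), end="")
```

Because the output is plain text on stdout, anything further down the pipe can treat it as an ordinary stream of lines.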

Or this:

$ get --be-like=msie | remove_header accept-encoding | spoof as | connect | headers

Basically what you could do is build an HTTP request and response stream out of UNIX pipes. Each step in the pipe chain would modify the request (or response) before passing it on. Eventually things get passed to the command "connect" which is responsible for actually making the request to the real web server and spitting out the reply. The reply can go through a similar chain of pipes.

Kinda like Yahoo! Pipes. Only... with actual pipes.
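
The "each step modifies the message and passes it on" idea can be sketched in a few lines of Python. This is only an illustration of the shape of a step such as remove_header, assuming the message is passed along as text with CRLF line endings; the helper chain is a hypothetical stand-in for the shell pipe itself.

```python
def remove_header(message, name):
    """Drop every header called `name` (case-insensitive) from an HTTP message."""
    head, sep, body = message.partition("\r\n\r\n")
    start_line, *headers = head.split("\r\n")
    kept = [h for h in headers
            if h.split(":", 1)[0].strip().lower() != name.lower()]
    return "\r\n".join([start_line, *kept]) + sep + body


def chain(message, *steps):
    """Apply each step in order, like stages in a pipe."""
    for step in steps:
        message = step(message)
    return message


if __name__ == "__main__":
    request = ("GET / HTTP/1.1\r\nHost: www.example.com\r\n"
               "Accept-Encoding: gzip\r\n\r\n")
    print(chain(request, lambda m: remove_header(m, "accept-encoding")), end="")
```

Each step only needs to know about text in and text out, which is what makes the stages easy to reorder.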

And if I substitute the "get" command (which really is just printing out HTTP request objects as text) with a "proxy" command (which listens to port 8080 for web browser requests) then I could test Apache configurations with something like this:

$ proxy --port 8080 | spoof as | connect | respond -e | tee logs/respond.out

Here's the most complicated pipe chain I've gotten to work so far:

$ proxy --port=8081 | tee logs/proxy.log | connect | tee logs/connect.log | filter text/html tidy -i -c -asxhtml -utf8 -f /dev/null | tee logs/filter.log | respond -e > logs/respond.log

That command:
  1. Listens to local port 8081 for web browser proxy requests
  2. Logs the browser requests to logs/proxy.log
  3. Connects to the appropriate web server to get the content just requested by the browser. By default, the connect command handles any content transfer decoding
  4. Logs the server response to logs/connect.log
  5. Runs the server response through HTML Tidy ("tidy") which reformats the HTML with indentation, corrects any HTML errors and transforms it to XHTML if required (UTF-8 encoded of course). HTML Tidy's error messages and report are sent to /dev/null.
  6. Saves the output of HTML Tidy to logs/filter.log
  7. Sends the final response stream back to the browser, but also echoes the response on stdout; and
  8. Saves the final response to logs/respond.log (the contents of which should be identical to filter.log -- I was debugging).
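
Step 5 is the interesting one: only text/html responses should go through the external command. Here's a rough Python sketch of that decision, assuming the headers arrive as a dict and the body as a string. The name filter_response and the transform callable are hypothetical; the real "filter" step runs an external program such as tidy instead.

```python
def filter_response(headers, body, content_type, transform):
    """Apply `transform` to the body only if Content-Type matches."""
    actual = headers.get("Content-Type", "")
    # Ignore parameters such as "; charset=utf-8" when comparing.
    media_type = actual.split(";", 1)[0].strip().lower()
    if media_type != content_type.lower():
        return headers, body
    new_body = transform(body)
    new_headers = dict(headers)
    # The body length may have changed, so the old value is no longer valid.
    new_headers["Content-Length"] = str(len(new_body.encode("utf-8")))
    return new_headers, new_body
```

Images, stylesheets and anything else that isn't text/html pass through untouched.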

I had initially had an idea for a much more complicated Charles-like tool with a GUI and threads and select() polling and plug-ins and the like. But this seems to capture the essence of what I was trying to do. It's the smallest implementation that will work -- and no smaller. :-)

Monday 3 November 2008

Email Client Market Share

A program called Fingerprint from Litmus is generating statistics on e-mail client market share. I was surprised to see Gmail's share so low (6% amongst business users, 4% amongst consumers) but also how old most Outlook client installs are (7% business users using Outlook 2007 versus 29% using Outlook 2003 or earlier).

At least Lotus Notes is down to 0.2% of business users.

The authors suggest running their software on your own mailing list. Presumably their analysis is a bit more sophisticated than what you'd get by looking at the Browsers dimension in Google Analytics for a specific "e-mail only" image.

More discussion on the Litmus Blog.

Monday 27 October 2008

Reminder: The Boy Who Cried "Wolf" Got Eaten

Parents and children will be familiar with Aesop's fable, The Boy Who Cried Wolf. It's often told by parents to teach their children the importance of not raising false alarms and of telling the truth. While the moral for children is obvious there's a moral for parents as well: after all, the boy shepherd does in fact eventually confront a wolf.

Australian MPs have threatened total Internet censorship and regulation for so long now that one is tempted to assume that the latest threats of mandatory Internet censorship are just more cries of "wolf! wolf!" But Senator Conroy is not "crying wolf." Senator Conroy is the wolf.

Senator Conroy has been dishonest, aggressive and offensive in his plan for mandatory Internet censorship.

Dishonest, because he did not disclose the mandatory nature of the plan prior to the election. It is clear however that this has always been his plan.

Aggressive, because he has not consulted the Australian public nor listened to those who have raised serious and valid concerns about the fairness and practicality of his totalitarian scheme. His staff have attempted to silence critics by pressuring their employers.

And offensive, because when people have disagreed with his plan, he has accused them of being pro-child pornography.

The Senator is wrong, and his plan for mandatory Internet censorship is also wrong. It's bad policy. It's bad tech. It's security theatre. Give the $44M to law enforcement instead. And sack Senator Conroy.

Sunday 26 October 2008

Bulk Change of Passwords on Solaris (Repost)

Just filing this one for future use.

I needed to change the password of a large number of accounts on a Solaris box. Unfortunately, I didn't have anything rational like Expect. Also, the Solaris passwd command won't read from stdin. Probably a good thing. :-)

Here's a script that might come in handy. Should only be used on Solaris boxes, unless you grok the Linux shadow password format. But on Linux you can probably get Expect running and that would be a less... primitive way of getting the job done. The problem with this script is that it bypasses PAM so if you're using anything other than shadow files it won't work.

The code is below. If it's getting corrupted then you can also download the script.

#!/usr/bin/perl -w
# Usage: ./ [/etc/shadow [list of account names]]

use warnings;
use strict;

# The shadow file comes first on the command line (default /etc/shadow),
# followed by the accounts whose passwords should be changed
my $shadow_file = shift(@ARGV) || '/etc/shadow';
my @accounts    = @ARGV;

# Other defaults
use constant PASSWORD_LENGTH => 8;

# Doing it this way lets us make one pass over /etc/shadow and preserve
# its line order
my %new_passwords = ();
foreach my $account (@accounts) {
    $new_passwords{$account} = generate_password();
}

# Backup /etc/shadow
my $backup_file = $shadow_file . ".BACKUP";
system( 'cp', '-p', $shadow_file, $backup_file );
die "cp failed to backup $shadow_file to $backup_file\n"
  if ( $? != 0 );

# Re-write /etc/shadow
open( my $backup, '<', $backup_file )
  || die "open: Unable to read $backup_file ($!)\n";
open( my $shadow, '>', $shadow_file )
  || die "open: Unable to write $shadow_file ($!)\n";

process_shadow( $backup, $shadow );

close($shadow) || die "close: Unable to finish writing $shadow_file ($!)\n";
close($backup);

# process_shadow
#   Given two file handles, read in the first file handle and copy to the second.
#   When the first file handle reads in a shadow record for a user whose password
#   we are changing, we will swap out the password and print the new password
#   on stdout.
sub process_shadow {
    my ( $backup, $shadow ) = @_;

    # Days since UNIX epoch, the time format used by Solaris in /etc/shadow
    my $last_changed = int( time() / ( 60 * 60 * 24 ) );

    # Read each line from backup, modify if the user is having their password
    # changed, and print the new password on stdout
    my $line = 0;
    while (<$backup>) {
        $line++;
        if (/^ ([^:]+) : ([^:]{2})/x) {
            my $user = $1;
            my $salt = generate_password(2);

            if ( defined( $new_passwords{$user} ) ) {
                print "$user,$new_passwords{$user}\n";
                my $hashed = crypt( $new_passwords{$user}, $salt );
                s/^ ([^:]+:) [^:]+ : \d+:/$1$hashed:$last_changed:/x;
            }
            print $shadow $_;
        }
        else {
            warn "$shadow_file: Unable to parse line $line\n";
        }
    }
}

# Return a new (random) password.
# Props:
sub generate_password {
    my ($length) = @_;
    $length ||= PASSWORD_LENGTH;

    # NOTE: the original character list did not survive in this copy of the
    # post. Letters and digits are an assumed stand-in (they are also valid
    # crypt() salt characters).
    my $ALLOWED = join( '', 'a' .. 'z', 'A' .. 'Z', '0' .. '9' );

    my $password = '';
    while ( length($password) < $length ) {
        $password .= substr( $ALLOWED, ( int( rand( length $ALLOWED ) ) ), 1 );
    }
    return $password;
}
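
For anyone who doesn't read Perl regexes for fun, the core of the rewrite is just "replace the second and third colon-separated fields". Here's that one step as a small Python sketch. It assumes the hash has already been computed elsewhere (it deliberately doesn't call crypt), and the function names are mine, not part of the script above.

```python
import time

SECONDS_PER_DAY = 60 * 60 * 24


def days_since_epoch(now=None):
    """Whole days since the UNIX epoch, as stored in the third shadow field."""
    return int((time.time() if now is None else now) // SECONDS_PER_DAY)


def rewrite_shadow_line(line, hashed, last_changed):
    """Return `line` with the password hash and last-changed fields replaced."""
    fields = line.rstrip("\n").split(":")
    if len(fields) < 3:
        raise ValueError("not a shadow record: %r" % line)
    fields[1] = hashed
    fields[2] = str(last_changed)
    return ":".join(fields) + "\n"
```

Every other field (minimum age, maximum age, warning period and so on) is carried over unchanged, which is the same thing the substitution in process_shadow relies on.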

Here endeth the hackery. :-)

Postscript: This was originally filed as "Mass Change of Passwords on Solaris" 26 Oct 2008. It was re-posted 2 July 2009 because the code had become corrupted somehow.

Monday 6 October 2008

Invalidating Page Caches in Mason

We have a number of websites that use Mason, a (relatively) old Perl library reportedly used by Amazon and Salon among other sites. It has built-in and flexible support for caching at pretty much any level the developer needs, but often the easiest thing to do is to cache the output of an entire component, which looks like this:

return if $m->cache_self(key => $key,
expires_in => '3 hours' );

What I wanted to do was invalidate the cache when the user's browser had a Cache-Control or Pragma directive that indicated they did not want cached content (for example, when the user holds down the Shift key and clicks "Refresh" or "Reload").

The naive implementation does not work:

# Somewhere, $cache_ok is set to 0 if Cache-Control
# indicates no caching wanted
if ( $cache_ok ) {
    return if $m->cache_self(key => $key,
                             expires_in => '3 hours' );
}

I mean, it does work for the request that asked for fresh content, but the cache has only been skipped, not invalidated. The very next request for that component will get the old, cached content -- not a cached copy of the freshly calculated content.

Slightly less naive implementations didn't work either because of the way cache_self works. I was basically trying variations of "expires yesterday" but this parameter was not being consulted when Mason considered whether or not to return the cached content.

Cutting a long story short (well, not that long but kinda boring) here's how I got the page cache to be invalidated ("expired") when the user hits Shift-Refresh:

if ( !$cache_ok ) {
    $m->cache_self( key => $page_cache_key, expire_if => sub { 1 } );
} else {
    return if $m->cache_self( key => $page_cache_key,
                              expires_in => '4 hours' );
}
[... continue with component...]

The variable $cache_ok defaults to "1" but is set to "0" if no-cache is found in the Cache-Control request header (for completeness, also check the Pragma directive although that might indicate the presence of a browser so old it wears flares).

What this code does is return the cached content (provided it's younger than 4 hours) most of the time. But if the requesting client has indicated it does not want cached content then it invalidates Mason's copy of the cached output and then falls through to the bottom of the if block where normal, non-cached processing occurs.
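
The difference between skipping the cache and invalidating it is easier to see in a toy model. The Python below is not Mason's API: PageCache, render and wants_fresh are names I've made up, there's no expiry time, and the header lookup assumes a plain dict with canonical header names.

```python
def wants_fresh(headers):
    """True if the request headers ask for uncached content."""
    cache_control = headers.get("Cache-Control", "").lower()
    pragma = headers.get("Pragma", "").lower()
    return "no-cache" in cache_control or "no-cache" in pragma


class PageCache:
    """Toy stand-in for a page cache; not Mason's API."""

    def __init__(self):
        self._store = {}

    def render(self, key, compute, invalidate=False):
        # Invalidate first, so the fresh output replaces the stale entry.
        if invalidate:
            self._store.pop(key, None)
        if key not in self._store:
            self._store[key] = compute()
        return self._store[key]


if __name__ == "__main__":
    cache = PageCache()
    versions = iter(["v1", "v2", "v3"])
    page = lambda: next(versions)
    print(cache.render("home", page))   # first request fills the cache
    shift_refresh = {"Cache-Control": "no-cache"}
    print(cache.render("home", page, invalidate=wants_fresh(shift_refresh)))
    print(cache.render("home", page))   # later requests see the refreshed copy
```

The point is the last call: because the stale entry was replaced rather than bypassed, ordinary requests after a Shift-Refresh get the new content.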

Post-script: Since publishing this post I've changed and reformatted the source code slightly.

Friday 11 July 2008

Tuning Nagios Load Checks

[See also: check_load initial values cheat sheet].

The standard Nagios plugins include a "check_load" command which will raise a warning or error if the load averages for the target machine exceed some threshold. A little while ago the ops manager and I were discussing what those thresholds should be.

The usage for the check_load command is as follows:

Usage: check_load -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15

Without looking at the source I'm pretty sure that the program is either just opening a pipe to uptime or using the /proc file system to read the load averages for the past 1, 5 and 15 minutes. Should be safe to assume then that Nagios' concept of load is exactly the same as uptime's and that the figures are ultimately coming from the kernel scheduler. (Note: Yup. :-) Just checked the source.)

So the first question is: what does "load" actually measure?

Unix and Linux Load

Briefly, when Unix machines report their "load" (usually through uptime, top or w) they are reporting a weighted average of the number of processes either running or waiting for the CPU (Linux will also count processes that may be blocked waiting on I/O). This average is calculated over 1, 5 and 15 minutes (hence the three values) based on values that are sampled every 5 seconds (on Linux at least). Dr Neil Gunther has written more than you might ever want to know about how those load averages are calculated and what they mean. It's an excellent series of articles (see also the inevitable Wikipedia article).
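
For the curious, the "weighted average" is an exponentially damped moving average. Here's a floating-point Python sketch of that idea, assuming the 5 second sampling interval mentioned above. The kernel itself uses fixed-point arithmetic, and the function names here are mine.

```python
import math

SAMPLE_INTERVAL = 5  # seconds between samples, as described above


def update_load(previous, runnable, window_seconds):
    """One step of an exponentially damped moving average."""
    decay = math.exp(-SAMPLE_INTERVAL / window_seconds)
    return previous * decay + runnable * (1.0 - decay)


def simulate(runnable_counts, window_seconds, start=0.0):
    """Feed a series of 5-second samples through update_load."""
    load = start
    for n in runnable_counts:
        load = update_load(load, n, window_seconds)
    return load


if __name__ == "__main__":
    # One busy process for a minute, seen through the 1-minute window.
    print(round(simulate([1] * 12, 60), 2))
```

One consequence worth knowing: starting from idle, a full minute of one busy process only moves the 1-minute figure to about 0.63, so the numbers lag behind what the box is doing right now.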

So assuming we have a single-core CPU, a load value of "1.0" would suggest that the CPU has been 100% utilised over whatever reporting period that figure was calculated for. A load of "2.0" would mean that whenever one process had the CPU there was another that was forced to wait. However, if we have 2 cores, the same "2.0" load value would suggest that both processes got the CPU time they needed, while a load of "1.0" would suggest the CPU had only been at 50% capacity.

On a simple web server, running a single 2-core CPU a load average of "2.0, 1.0, 0.5" suggests that, over the last minute, the CPU has been 100% utilised; over the last 5 minutes it's been 50% utilised; and over the last 15 minutes, it's been 25% utilised. Halve those values if 4 cores are available and double them if only one is in the system.
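
That arithmetic is simple enough to write down. A tiny Python sketch, with a made-up function name, that divides each load figure by the number of cores:

```python
def utilisation(load_averages, cores):
    """Approximate CPU utilisation (1.0 = fully busy) for each load figure."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return [load / cores for load in load_averages]


if __name__ == "__main__":
    for cores in (1, 2, 4):
        print(cores, utilisation([2.0, 1.0, 0.5], cores))
```

Anything above 1.0 means processes were queueing for the CPU. It's only a rough guide though: on Linux the load figure can include processes blocked on I/O, which this simple division knows nothing about.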

You can see then that sensible threshold values for warning and critical states require you to consider how many CPUs and CPU cores your system has. You're therefore probably going to want to set your thresholds per machine or at least set them differently for each different type of configuration.

For example, one of our Solaris boxes has 12 cores so a load of "6.0" is nothing to be concerned about. However that same load figure on another, single-core box might be worthy of a warning or even critical alert, depending on how sensitive we were to process queue lengths on that box. Except if that box is a Linux box with a lot of I/O and slow devices (like a tape drive) and is counting processes that are sitting idle and waiting for an I/O operation to finish. And what is the application running on it? Is it threaded? How is your kernel counting threads in that total -- or is it just counting processes?

Setting the Check_Load Thresholds

So determining an appropriate warning and critical set of threshold values for check_load will depend on what you think a reasonable process queue length will be; how your specific system treats threads; how your applications on that system behave (and their expected responsiveness levels); and how many CPUs / cores your system has. Oh -- and your performance targets or SLAs.

This is why experienced admins use a time honoured, complicated heuristic process to set an initial value and then continually adjust that value based on the correlation of alerts raised and actual performance and hence user impact.

In other words: we rub our bellies and take a guess and then change the values if we get too many or too few alerts. We're experienced sysadmins -- how much time do you think we have? :-)

In our case, for web servers, we decided that over 5 and 15 minute periods we expect spare capacity on the box -- but we only want to be alerted if the box is basically maxing out on CPU over a significant period. Over 1 minute we expect the occasional spike and don't really want an alert unless it's way beyond expectations. We're using Apache with no threading so 1 load point = 1 process using or waiting for CPU.

We've set the warning level for the 15 minute load average at the number of CPU cores times 2 (plus one!). For 5 minutes, increase the threshold by 5. For one minute, increase it by 5 again. The critical threshold starts at the number of CPU cores times 4 and then follows the same pattern for the 5 and 1 minute thresholds.

Here's a sample nrpe.cfg config file for a web server with 2 cores:

command[check_load]=/path/check_load -w 15,10,5 -c 30,25,20

It's important to actually test this setup. Use ApacheBench or JMeter or a similar tool to get your load average up and test performance under those thresholds to see if it's acceptable. If your application is unacceptably slow from a user perspective at lower load values then lower your thresholds.
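
If you have more than a handful of machine types, the warning rule is easy to script. A small Python sketch follows; the function names are mine, and the critical base is passed in explicitly because the sample line above starts its critical values at 20 for a 2-core box, so pick whatever base suits you.

```python
def thresholds(base_15min, step=5):
    """Return (1 min, 5 min, 15 min) thresholds in check_load's argument order."""
    return (base_15min + 2 * step, base_15min + step, base_15min)


def warning_thresholds(cores):
    """Warning rule from the text: cores * 2 + 1 for 15 minutes, then +5, +5."""
    return thresholds(cores * 2 + 1)


def check_load_args(warning, critical):
    """Format the -w and -c options for check_load."""
    fmt = lambda values: ",".join(str(v) for v in values)
    return f"-w {fmt(warning)} -c {fmt(critical)}"


if __name__ == "__main__":
    print(check_load_args(warning_thresholds(2), thresholds(20)))
```

Treat the output as a starting point to be tuned against real alerts, not as an answer.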

More Information

I've put together a little check_load cheat sheet that has some initial values for some common configurations. It might be a useful starting point if you're just starting to configure check_load in your Nagios environment.

[Note: This post has been edited since initial publication.]

Friday 4 July 2008

Cisco VPN Software Breaks When Upgrading to Leopard

I recently upgraded my MacBook to OS X 10.5 ("Leopard"). By and large it's been a pretty painless experience except for the fact that the work-issued Cisco VPN client often starts up with an error. The error says, in part, "Unable to communicate with the VPN subsystem."

Googling around, the suggested fix is to restart the Cisco kernel extension. The following terminal command takes care of things nicely:

$ sudo /System/Library/StartupItems/CiscoVPN/CiscoVPN restart

I'm running version 4.9.01 by the way.

What the hell, here's a video of it if the command line is not where your heart truly lies.

Tuesday 20 May 2008

First Look at Django

Django is MVC... kinda

Django (and all of the frameworks I'm looking at) is based on the "MVC" software architecture pattern. Since these are largely personal notes (can't imagine anyone else reading this) I'm not going to cover the basics here. You can find that in the Django Book or on Wikipedia.

But here's an interesting quote from Chapter 5 of the Django Book:
Because the “C” is handled by the framework itself and most of the excitement in Django happens in models, templates, and views, Django has been referred to as an MTV framework. In the MTV development pattern,

  • M stands for “Model,” the data access layer. This layer contains anything and everything about the data: how to access it, how to validate it, which behaviors it has, and the relationships between the data.
  • T stands for “Template,” the presentation layer. This layer contains presentation-related decisions: how something should be displayed on a Web page or other type of document.
  • V stands for “View,” the business logic layer. This layer contains the logic that access the model and defers to the appropriate template(s). You can think of it as the bridge between models and templates.
If you’re familiar with other MVC Web-development frameworks, such as Ruby on Rails, you may consider Django views to be the “controllers” and Django templates to be the “views.” [...] Neither interpretation is more “correct” than the other. The important thing is to understand the underlying concepts.
This sounds like a subtle but possibly important distinction, particularly as I move between frameworks. I have some initial concerns about Django's interpretation of MVC but I'll keep that to myself until I've really given it time to make sense (basically, how does the split of business logic between Django "model" and "view" manifest itself in the design of your code, and can it lead to anemic domain models?).

Why Django?

Behind the 8-ball here. Not only is Django new to me but Python isn't a language I code in regularly -- in fact it's been some time since I've played with it. I like the look of Python. I've coded some trivial programs with it. The "white space thing" doesn't bother me and I don't really understand why it would bother anyone. Then again it's not the first language I've seen that does it. The first language that I saw that particular feature in was Occam, and I really liked Occam (not that I ever did anything useful with it).

So why Django?
  • It's based on Python and Python seems like it should be a really good language to work in. I'm a Perl refugee in some ways. I'm finding it hard to recruit really good Perl coders (as opposed to just plain old Perl coders) and I suspect that people who have taken the trouble to learn Python (or Ruby, or Scala, or LISP) are more likely to care about good code. Not because C# and Java programmers don't but ... well, in this market why would you learn Python if it wasn't because you cared about being a better programmer?
  • There are other Python frameworks that look very interesting: TurboGears and Pylons for example. But I have limited time and Django appears to have an edge for reasons I can't yet articulate. I'll probably drill down into TurboGears later.
  • Django is the first framework supported by Google App Engine.
  • The Django core team say sensible things about web development and the community appears active and supportive. A little smug sometimes but hey, wouldn't you be, if you were working in the One True Language too? :-)

Friday 16 May 2008

Notes on "Mental Detox"

So apparently of three that set out I was the only one to make it to Friday, which is not even a full week. For shame Kate, for shame. :-)

In the 5 days during my self imposed personal net exile:
  • I clocked up a modest amount of spam (219 messages), mailing list messages (184 conversations) and personal e-mail.
  • There were 634 RSS items in Google Reader.
  • There were 19 podcasts downloaded by iTunes.
And I learned:
  • That I don't miss, like or need Facebook. But it seems rude to delete my account...
  • That I follow far too many total freakin' strangers on Twitter. I need some Twitter Equilibrium :-)
  • I have too many RSS feeds that don't really tell me anything I need to know.
Well. Just some random notes from no-one of consequence. Seriously, why are you reading this?

Sunday 11 May 2008

Mental Detox Week (redux)

Mental Detox week was 21 - 27 April but I missed it completely (despite being an Adbusters subscriber). So Kate Carruthers and some of her friends (including me) are doing a Mental Detox week from May 12 to 17. Which isn't actually a full week but baby steps people, baby steps.

Given that I'll still be working I'll still need to use a computer. But
  • No IM
  • No personal e-mail
  • No Facebook (that one will be easy)
  • No iPod (that one will be hard)
  • No TV, radio or Google Reader.
  • And no Twitter.
Don't watch TV anyway and the radio sucks but not sure what the policy is on the print version of newspapers. Not reading my RSS feeds in Google Reader leaves me with the feeling that I might "miss something." What if they announce the next Google App Engine next week? On the other hand, real news can put that kind of event into perspective.

In hindsight it was pointless to commit to this during a working week. There's still an awful lot of noise to deal with at work. Next time I'll try this kind of electronic-blackout during annual leave but I'm committed now so...

See you in a week. :-)

Friday 9 May 2008

About the Application

The application is going to be a small "to do" app loosely based on David Allen's Getting Things Done. Not that the world needs another "to do" app (or even another GTD app) but it fits the basic criteria: it's small but not trivial, capable of being built in a few days and the problem is well understood enough that I can concentrate on learning the basics of the framework rather than solving some other problem.

Here's a short summary of the requirements:
  • Users need to be able to register with a user name and password to save and access their lists. The "stretch target" would be to allow users to start making lists anonymously, and then to save that work upon registering. If they already have an account, then the anonymous list and previously saved list should be merged sensibly.
  • Users will make lists of tasks. So we have the concept of a "list" (users must have at least one). A list should have a name and, when users are looking at the "list of lists", they should be able to sort it arbitrarily (in other words, users should be able to force a particular list to the "top" or above another list).
  • Lists contain tasks. A task needs a name, some notes, and the same sorting behaviour as lists (ie give users full control over the order of the list).
  • Each task can have exactly one context. A context is a GTD concept -- it's basically the things you need to get a task done. Example contexts might be "at work", "at home", "online" -- so that a task that has the context "at work" is something to do at work. Each user can have their own set of unique contexts but it's expected that the application will provide some sensible defaults.
  • Stretch target: Allow tasks to have an arbitrary number of contexts which represent the intersection of those requirements. A task that has the context "home" and "online" is one that requires an Internet connection from home to do. Not very "GTD-y" to make things overly complex though...
  • There is the concept of a context being "current." In other words, you tell the application "I am at work" and it will only show you the "at work" tasks.
  • Stretch target: When a user logs on, the default context should be the last context they were using from that particular device. For example, if they've logged on from their iPhone and have set their current context to "errands" then the next time they log on from their iPhone that should be the default context. However, if they log on from work immediately after, the default context will be "at work." It is anticipated that the browser string will be unique enough for this.
  • Stretch target: Mobile and Twitter support.
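
Before touching any framework, here's roughly how I'd picture the core of those requirements in plain Python. It's only a sketch: the class and method names are hypothetical, it ignores users, registration and all the stretch targets, and it just uses the order of the Python list as the user's chosen order.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    context: str          # exactly one context per task, e.g. "at work"
    notes: str = ""


@dataclass
class TaskList:
    name: str
    tasks: list = field(default_factory=list)  # list order is the user's order

    def move(self, old_index, new_index):
        """Let the user put a task anywhere in the list."""
        self.tasks.insert(new_index, self.tasks.pop(old_index))

    def visible(self, current_context=None):
        """Tasks for the current context, or all tasks if none is set."""
        if current_context is None:
            return list(self.tasks)
        return [t for t in self.tasks if t.context == current_context]
```

Each framework will have its own opinion about where the ordering and the "current context" filter should live, which is rather the point of the exercise.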

So, there you have it. Will not set the world on fire but I'm confident that we can get to know some of the frameworks well enough to get a "feel" for how they see the problem and how much code it takes to get it going.

I shall call it "lulztodo", because we're doing it for the lulz (thanks Nick).

Thursday 8 May 2008

Method and Approach to the Evaluation

I think we learn best by doing. I learn best by doing. So I'm going to attempt to build an application in some of the frameworks I'm looking at.

The application will need to be
  • Small: Nothing too fancy -- the point is to learn a little about web development with a given framework, not solve an actual real problem. That will come later.
  • Well understood: In order to concentrate on learning the framework, I'm not going to try and solve the Travelling Salesman problem in O(1) time.
  • Non-trivial: Despite the above, it will need to be a bit more fancy than "Hello World." It should require persistent data storage, authentication and do something vaguely useful.
So, if I develop the same application over and over again in different frameworks each time I hope to get roughly comparable results.

Wednesday 7 May 2008

Searching for the One True Framework

No, I don't really believe there is One True Framework. But I am thinking about future web architecture directions for our group.

I'm going to write a series of posts comparing and contrasting a few different languages and framework combinations. These frameworks seem to be emerging as leaders in their language / platform niche:
That list is in no particular order by the way. I don't think I could sensibly look at all of them or do them justice. And it might be smarter to pick a language first and then evaluate the frameworks available for it. But at this stage I'm going for "breadth" first and then I'll drill down into more detail.

Sunday 4 May 2008

About this Blog

Purpose and Function

To aid memory.

What is a "Hissohathair?"

I do not know.

The origin of the name comes from some graffiti scrawled on a few walls in what was my home suburb of Enmore (Sydney, Australia). It was variously written as "hisso hathair", "his so hat hair", and "hiss ho hat hair."

I never did find out what that meant. But it was amusing (to me), unique (and therefore registrable) and opaque.

[Note: This post was edited after it was first published, to add the explanation for the blog's name.]

Saturday 3 May 2008

Who are you?

Who are you?

No-one of consequence.

I must know!

Get used to disappointment.