Wednesday, 3 June 2009

Using CruiseControl.rb to manage a Perl Catalyst project

We're working with the Catalyst framework again, porting an old Perl 5 HTML::Mason site to Catalyst and introducing some modern Perl coding standards to a fairly old stack.

One of the things we needed was a Continuous Integration tool for the project. Since we're already using CruiseControl.rb for the Rails projects, I thought it should be pretty easy to incorporate the Perl project into it.

And indeed it was:
$ ./cruise add CatalystProject --url https://svn.work.com/svn/catalystproject/trunk/

CruiseControl will run a "rake" task whenever a commit is made. So we need a small Rakefile with just enough code in it to run the standard Perl tools:

$ cat Rakefile
require 'rake'

task :default => :test

desc "Runs make test"
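task :test do
  # Load the local::lib environment, then run the standard Perl build and test steps.
  t  = system("eval $(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib) && perl Makefile.PL && make test")
  rc = $?
  puts "\nMake finished with t=#{t} rc=#{rc.exitstatus}"
  # system returns false (or nil) when the command fails, so fail the rake task too.
  raise "Perl make failed (rc=#{rc.exitstatus})" unless t
end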

The eval at the start of the system call is there because we're using local::lib to manage our Perl library dependencies, and we need to set the environment variables so that they can be found. To propagate any "make test" errors back out to rake, we raise an exception whenever the command exits with a non-zero code.
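
If you want to see what the Rakefile is relying on, here's a minimal sketch of how Kernel#system reports success and failure. The run_step helper and the demo commands are my own illustration (not part of the project), and it assumes a UNIX-like box with sh on the path:

```ruby
# Run a command and report how Kernel#system signalled the outcome.
# run_step is a hypothetical helper used only for this illustration.
def run_step(*command)
  ok = system(*command)   # true: exit 0, false: non-zero exit, nil: could not run
  status = $?             # Process::Status of the last child process
  { ok: ok, exitstatus: status && status.exitstatus }
end

p run_step("sh", "-c", "exit 0")
p run_step("sh", "-c", "exit 3")
p run_step("no-such-command-for-this-demo")
```

Because both false and nil are falsy, the "unless t" test in the Rakefile catches a failing "make test" as well as a command that could not be started at all.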

Tuesday, 12 May 2009

How to set your DNS search list in OSX

Here's how to set your DNS search path in OSX, which is something you'll want to do if you're getting redirected to eBay when you try to go to Google, or are experiencing other odd behaviour from your browser.
  1. Open System Preferences.
  2. Open Network settings.
  3. Your active network connection should be selected on the left. Select it if it is not. Click the "Advanced" button towards the bottom right hand corner.
  4. Select the DNS tab.
  5. On the right hand side is a column "Search Domains." It will probably already have your domain name listed, but it will be "greyed" out so that you can't remove it. That's OK. Click the "+" sign at the bottom of the column, and type the domain name again.
  6. You should now have the domain name listed twice. One will be grey, the other black. This is good, believe it or not. :-)
  7. Click OK.
  8. Click Apply.
Here's a screen capture that steps through it.

[Video: screen capture of the steps above]

Thursday, 7 May 2009

The "search path fix" for ASZ.COm.Au is only a partial fix

I feel I need to clarify my fix to the "eBay/ASZ redirection problem" discussed in my last post.

Adding your domain name to your DNS search path will stop your DNS resolver from asking the "au.com.au" and "com.com.au" name servers for "google.com.au.com.au". So you shouldn't get "Welcome to ASZ.COm.Au" or be redirected to eBay or other unexpected behaviour.

What you'll get instead is "site not found". That's what I mean by a partial fix. If your ISP's name server is losing DNS responses (or is too slow to return them) then you'll still have problems: namely, you'll get an error message when you try to visit some sites. However, clicking "Refresh" or "Reload" will generally solve that problem. Once you're at the site the address will be in your resolver's cache, so things will be stable for a while.

Meanwhile I've written to the auDA guys asking for their opinion on defining DNS entries for other people's domain names as sub-domains of yours.

Thursday, 30 April 2009

Welcome to ASZ.COm.Au (Or, The Resolver Library is Broken)

A few days ago, I did a Google search and got the strangest error: a 404 (File Not Found) page that claimed to be from Apache with PHP and Frontpage extensions loaded. Google doesn't run Apache.


Since that was A Very Odd Thing Indeed I went to the Google home page. This time I got a new message: "Welcome to ASZ.COm.Au" (for some reason, I feel it's important to preserve the capitalisation).

My first guess was that my DNS cache had been poisoned. Or that I'd picked up a virus somehow. When I looked on Twitter I saw that I was not alone: a few people were reporting similar problems trying to access Facebook, YouTube, Google and even eBay. A discussion about the issue had started on Whirlpool. In most cases, the issue appeared to resolve itself eventually, or after a DNS cache flush.

In my case, the issue persisted for a little while and then stopped. I was able to access Google again.

It took some time for me to figure out what was going on (hey -- I'm over 27). The intermittent nature of the issue made debugging difficult. It wasn't until my partner mentioned that she'd seen the same screen when she'd tried to access the Bureau of Meteorology that I was able to make progress. Visiting http://bom.gov.au/ (but not http://www.bom.gov.au/) consistently reproduced the problem.

I chased down a few theories: malware, cache-poisoning, an "Optus issue". But I finally worked out that this was the result of a documented feature of the common UNIX resolver library.

When you visit a web page your computer needs to "resolve" the host name (eg "google.com.au") to an Internet address. That's the job of the resolver, which in turn uses something called a domain name server. Your browser asks the resolver "what's the address of Google.com.au" and the resolver answers. Either the resolver already knows the answer because it's looked it up before and kept a copy (a cache), or it doesn't know, and so it asks the domain name server. The domain name server in turn may ask other domain name servers, until someone, somewhere, knows the answer. The answer is then sent back through the chain, ultimately to your resolver (called the "client").

So if the resolver doesn't know an address it asks the domain name server and waits for an answer. But it will not wait forever. In fact, it typically won't wait longer than several seconds. It's an impatient little thing and if it hasn't heard back quickly enough it assumes that maybe the hostname is wrong -- maybe the user just typed part of the hostname. Or maybe the domain name server does answer in time but the resolver cannot use the answer -- the domain name server may reply with "no one knows that hostname." Either way, the resolver will start to guess what the real (or "fully qualified") hostname might be.

There are two strategies a resolver can use when it starts searching for the fully qualified hostname.

The first is to use an explicit list of search paths. You usually provide this list yourself when configuring your network settings. If you haven't set such a list, then it will employ the second strategy. (That's going to turn out to be quite handy...)

The second thing your resolver can do is a "domain name search". It takes your domain name, prepends the hostname you're looking for and does a lookup on that. I'm with Optusnet, so my domain name is "optusnet.com.au". My resolver then might look up "bom.gov.au.optusnet.com.au" if it doesn't get a useful answer for "bom.gov.au". If it still doesn't get a useful answer (and in this case, it won't) then it starts searching "up" the domain name -- it removes the first part of the domain name and repeats the search. So its second search is for "bom.gov.au.com.au". See what it did there? It deleted "optusnet" and tried again. (Technical note: the behaviour is documented in the resolver man page -- see RES_DNSRCH.)

That right there is the flaw and the root cause of the problem.
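
The walk "up" the domain name described above can be sketched in a few lines of Ruby. This is only my reading of the behaviour, not the resolver's actual code: search_candidates and its min_labels parameter are names I've made up, and the stopping rule (keep going while the remaining domain still has at least two labels) is an assumption based on the "counting dots" behaviour I observed.

```ruby
# Sketch of the resolver's implicit "domain name search".
# Assumption: the walk stops once the remaining domain has fewer than
# min_labels labels (the "counting dots" rule). Names are hypothetical.
def search_candidates(hostname, local_domain, min_labels = 2)
  labels = local_domain.split(".")
  candidates = []
  while labels.length >= min_labels
    candidates << "#{hostname}.#{labels.join('.')}"
    labels.shift # drop the leftmost label and search one level "up"
  end
  candidates
end

puts search_candidates("bom.gov.au", "optusnet.com.au")
# bom.gov.au.optusnet.com.au
# bom.gov.au.com.au
puts search_candidates("bom.gov.au", "work.com")
# bom.gov.au.work.com
```

With a domain directly under ".com" the walk stops before it reaches the top-level domain, but with a three-label domain such as "optusnet.com.au" the same rule happily produces a lookup under ".com.au".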

Someone owns the domain names "au.com.au" and "com.com.au". And they have name servers. And they've set them up to answer queries for a whole host of things, among them, bom.gov.au.com.au, google.com.au.com.au and facebook.com.com.au. And our resolvers are querying them and merrily sending our browsers there if the real name servers for those domains don't get their answers back in time.

Now we know enough to say what's going on:
  1. You try and visit Google, Facebook, Twitter (or the BoM). It's been a while so the address isn't in your resolver's cache. So it does a lookup by asking your domain name server -- this is usually provided by your ISP.
  2. For whatever reason, your ISP's name server is either too slow to respond or the response is lost altogether. So your resolver "times out" and starts to "search". Your domain name ends in ".com.au" and so eventually, your resolver looks up "google.com.au.com.au" (or whatever site you're trying to visit, with ".com.au" added to the end). The name servers at "au.com.au" (or "com.com.au" depending on what you're looking up) do respond and do so in time.
  3. Your resolver gives the bogus address to the browser and stores it in the DNS cache. The "TTL" (time to live) for those addresses is 4 hours, so you're going to be stuck with that address in your cache for at most 4 hours.
  4. Eventually, your cache times out. Or maybe you know how to flush it. Either way, a second attempt by the resolver to get the right IP address works and the problem appears to be resolved.
Multiple things are going wrong here:
  • The ISP's name server is either too slow to respond or perhaps "dropping" packets (DNS typically uses UDP, which, unlike TCP, does not guarantee delivery). I've seen this with Optusnet before but in the past I just got a "site not found". Such responses aren't cached, so if you hit "Refresh" in your browser you typically find the site just fine the second time.
  • The name servers for "com.com.au" and "au.com.au" have records that match other people's sites. They shouldn't. Right now, it's just confusing and annoying but its potential for phishing is obvious. It's not necessarily malicious but it should be changed.
  • The algorithm used by the resolver in both UNIX and Windows has a security flaw: it should not search all the way back to ".com.au".
To me the ultimate problem is that last one: UDP packets, and hence DNS responses, are not guaranteed to be delivered. It should be expected that sometimes, things will get busy and responses will be lost. The resolver is in error by searching the domain name all the way back to ".com.au" -- IMHO that's a security hole similar to the old "cookie monster" bug.

There's a fix though, which at least works on OSX (Mac). If you set an explicit search path, then the resolver won't use the second strategy described above. It will search the search path(s) and then stop there. I've set my search path to "optusnet.com.au", the same as my domain name, and can no longer reproduce the problem.

There are other things that can help:
  • If your ISP's domain name server is not reliable, use OpenDNS. There is some anecdotal evidence that DNS replies are less likely to be late or dropped. Getting the "Welcome to ASZ.COm.Au" page for Google or Facebook depended on your computer not getting the DNS response in time (or at all), so having a reliable domain name server will stop the problem happening.
  • Add a "." to the end of your hostnames when typing into the browser (for example, "google.com.au."). The trailing "." prevents the domain name searching from kicking in.
  • If you're able, configure firewalls to drop packets from the name servers at "com.com.au" and "au.com.au".
Oh, if you're wondering why "bom.gov.au" could reproduce the problem every time: their name server does respond quickly but not with an "A" record for that domain. Which is to say, it's not a response that the resolver can use to answer the question "what's bom.gov.au's IP address". So even though the resolver gets an answer it still embarks on its domain name search until it looks up "bom.gov.au.com.au" and gets an answer (an "A" record). To reproduce the problem with Google etc, you need to wait for the DNS response packet to go missing in order to trigger the domain search by your resolver.

Other things can be explained:
  • It appeared to be an "Optus problem" at one stage because their domain name servers are occasionally overloaded and therefore slow. The Optus domain ends in "com.au" and so the domain name search would go all the way back to ".com.au". TPG seems to have similar issues.
  • I couldn't reproduce the problem at work because my domain name there is "work.com". The resolver is smart enough not to search as far back as ".com" -- it just missed the case where a country domain has subclassifications (such as ".com.au", ".co.nz" or ".co.uk"). That's the limitation to the "counting dots" method of deciding how far to walk back.
  • Switching to OpenDNS would appear to solve the problem because the resolver didn't need to start a domain name search if it got the right answer right away.
  • Flushing the DNS cache would appear to solve the problem because it's only occasionally that DNS replies get lost. You have a good chance on your second attempt of getting the right address.
Thanks to objects, pixelgroup and 3buffalogirls for responding to my Twitter enquiries with additional information, and the posters on the Whirlpool forums.

Friday, 13 March 2009

Monitoring Rails builds with CruiseControl.rb and CCTray

More for my own memory than anything else...

CruiseControl.NET comes with a tool called CCTray that gives you a handy way of monitoring the build status of multiple CruiseControl environments. It works out of the box with other CruiseControl.NET installations but needs a little trick to monitor the Ruby and Java versions (why we need the same app implemented three times is a subject for a rant one day I'm sure...).

For Ruby on Rails projects, set the monitoring URL in CCTray to this:
http://hostname.of.cruisecontrol.rb:3333/XmlStatusReport.aspx
It's not a real ASPX page but it returns XML that CCTray is expecting.

Cruise Control for Java is similar, but different ('natch):
http://hostname.of.cruisecontrol:3333/dashboard/cctray.xml
 Had trouble Googling that. :-)
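
For the curious, here's a rough sketch of reading that kind of status document yourself. The sample feed is made up for illustration (project name, label, time and webUrl are all assumptions), it only uses the attribute names I understand the CCTray format to carry, and build_statuses is a hypothetical helper:

```ruby
require "rexml/document"

# Illustrative sample only: the project name, label, time and URL are made up.
SAMPLE_FEED = <<~XML
  <Projects>
    <Project name="MyRailsApp" activity="Sleeping"
             lastBuildStatus="Success" lastBuildLabel="123"
             lastBuildTime="2009-03-13T10:15:00"
             webUrl="http://hostname.of.cruisecontrol.rb:3333/projects/MyRailsApp"/>
  </Projects>
XML

# Return [name, lastBuildStatus] pairs from a CCTray-style status document.
def build_statuses(xml)
  doc = REXML::Document.new(xml)
  doc.get_elements("//Project").map do |project|
    [project.attributes["name"], project.attributes["lastBuildStatus"]]
  end
end

build_statuses(SAMPLE_FEED).each { |name, status| puts "#{name}: #{status}" }
```

In real use you'd fetch the XML from one of the URLs above rather than from a string.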

Monday, 9 March 2009

Keep yourself logged in to a website with anti-idle

At $work I need to use a time sheet application which has a session timeout feature. I want a way to stay "logged in". So I've conceived a little plug-in for my personal web developer's proxy that will re-load certain web pages periodically in the background.

Could work like this:
  1. Start your personal proxy with the anti-idle plug-in in the chain (below).
  2. In your browser, go to the page you want to periodically re-load.
  3. At the end of the URL, append a CGI argument. For example you could append "?ttt_anti_idle=300" to reload the page every 5 minutes. If there are already CGI arguments in the URL just append: "&ttt_anti_idle=300".
  4. Load the new URL you've just typed. The anti-idle plug-in will strip out the extra argument you've appended prior to giving the URL to the "real" server.
  5. The anti-idle plug-in monitors its stream for "ttt_anti_idle" arguments and builds a list of pages to reload at certain intervals. It discards the result of course.
Here's how I imagine I'd set up the pipeline:

$ proxy | anti_idle --use_cgi=ttt_anti_idle | respond
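
Steps 3 to 5 hinge on pulling the extra argument out of the URL before it goes to the "real" server. Here's a minimal sketch of that one piece, assuming Ruby's standard uri library; extract_anti_idle is a hypothetical name, the timesheet URL is invented, and none of this is the actual plug-in:

```ruby
require "uri"

# Hypothetical helper: split a requested URL into the URL to forward to the
# real server and the reload interval in seconds (nil if none was requested).
# Note: the remaining query string is re-encoded, so its escaping may differ
# slightly from what the browser originally sent.
def extract_anti_idle(url, param = "ttt_anti_idle")
  uri = URI.parse(url)
  pairs = URI.decode_www_form(uri.query || "")
  interval = nil
  kept = pairs.reject do |key, value|
    next false unless key == param
    interval = Integer(value, 10) # raises ArgumentError on a non-numeric value
    true
  end
  uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
  [uri.to_s, interval]
end

p extract_anti_idle("http://timesheet.example.com/entry?week=12&ttt_anti_idle=300")
```

The plug-in would then keep the cleaned URL and interval in its reload list and pass the cleaned URL down the pipeline.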

[...]

Friday, 6 March 2009

Initial Load Values for Nagios Load Checks (Cheat Sheet)

I've put together a cheat sheet to show how you might want to initially configure your Nagios load checks. The thinking behind these initial values is set out in Tuning Nagios Load Checks.

Use            | OS      | Cores | Warning | Critical | Notes
CMS (Teamsite) | Solaris | 1     | 10,7,5  | 20,15,10 | Testing shows this app to be responsive up until these loads.
Web Server     | Linux   | 2 x 4 | 16,10,4 | 32,24,20 | Web servers are paired, so want to know if reaching 50% capacity regularly. Testing shows performance degradation from a load of 20.
DB Server      | Linux   | 2 x 4 | 16,10,4 | 32,24,20 | Same hardware, different use. Nevertheless, using same thresholds.
Nagios         | Linux   | 1 x 2 | 6,4,2   | 12,10,7  | Small box, paired with backup.

General notes:
  • The UNIX servers (particularly the Sun SPARC ones) seem to be able to stay up and responsive even under heavy load. And they don't count processes waiting for I/O in their load counts the way Linux does. I have no explanation for this. :-)
  • We track these loads over time to predict demand growth for capacity planning -- the thresholds are not a long term goal but rather a short term alert threshold.
  • Transaction or revenue-earning web servers might have lower thresholds because of the different commercial implications of performance degradation. YMMV.
For more information on the Nagios check_load command, see Tuning Nagios Load Checks.
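
To make the three-number thresholds concrete, here's a small sketch of how I understand them to be evaluated: each value is matched against the 1, 5 and 15 minute load average in turn. The load_state function is my own illustration, and the strict "greater than" comparison is an assumption, so check the plugin's documentation before relying on the exact boundary behaviour.

```ruby
# Sketch of how three-value load thresholds are evaluated.
# Assumption: each of the 1, 5 and 15 minute averages is compared with the
# matching threshold, and strictly exceeding it trips the alert.
def load_state(loads, warning, critical)
  return :critical if loads.zip(critical).any? { |load, limit| load > limit }
  return :warning  if loads.zip(warning).any?  { |load, limit| load > limit }
  :ok
end

# Web server row from the cheat sheet above.
WEB_WARNING  = [16, 10, 4]
WEB_CRITICAL = [32, 24, 20]

puts load_state([3.2, 2.9, 2.5], WEB_WARNING, WEB_CRITICAL)    # ok
puts load_state([18.0, 9.0, 3.0], WEB_WARNING, WEB_CRITICAL)   # warning
puts load_state([12.0, 11.0, 21.0], WEB_WARNING, WEB_CRITICAL) # critical
```

The web server row would correspond to running check_load with -w 16,10,4 and -c 32,24,20.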

No more stupid YouTube comments

Prompted by Mark Damon Hughes' Stupid Comments Be Gone I wrote a small script that took YouTube HTML in on stdin, stripped out the comments, and spat the remainder out on stdout (Mark's trick uses CSS to hide them).

Now I can do this:

$ proxy | connect | kill_youtube_comments | respond
[...]

And lo! Works in all browsers. :-)

Breaking it down:
  1. The proxy command listens on port 8080 (I configure my browser to proxy to localhost:8080). It spits all requests it sees to stdout.
  2. The connect command reads an HTTP request on stdin, connects to the remote server, fetches the content, and spits an HTTP response to stdout.
  3. The kill_youtube_comments command reads in HTML and strips out the div that contains YouTube comments.
  4. The respond command reads a HTTP response and sends that (via named pipe) back to the proxy command so that it can return it to the browser.
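
Step 3 is the only stage that touches the page itself. Here's a rough sketch of the idea, not my actual script: strip_div is a hypothetical helper, and the "comments" id is a stand-in because the real markup would need checking against the page. It removes one div, including any divs nested inside it, and leaves the page alone if the markup doesn't balance.

```ruby
# Remove the first <div ... id="ID" ...> element, including nested divs.
# Hypothetical helper; the id passed in is an assumption about the page markup.
def strip_div(html, id)
  open_tag = /<div\b[^>]*\bid=["']#{Regexp.escape(id)}["'][^>]*>/i
  start = html.index(open_tag)
  return html unless start

  depth = 0
  pos = start
  # Walk every opening and closing div tag from the target div onwards.
  while (m = html.match(%r{<div\b|</div\s*>}i, pos))
    depth += m[0].start_with?("</") ? -1 : 1
    pos = m.end(0)
    return html[0...start] + html[pos..-1] if depth.zero?
  end
  html # unbalanced markup: leave the page untouched
end

page = '<body><div id="main">video</div>' \
       '<div id="comments"><div class="c">first!</div></div><p>footer</p></body>'
puts strip_div(page, "comments")
```

A real filter would read the response body from stdin and write the stripped version to stdout so it can sit in the pipeline.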

I sometimes wonder if anyone else in the world would find a personal, hackable proxy useful.

Friday, 16 January 2009

Using Blogger's new Import Blog function to import an RSS-based blog

[UPDATE: I've released the code that I referred to below as a GitHub Gist.]

I've been playing with Blogger's Import Blog feature, made available in Blogger in Draft last year.

Google explicitly state that only Blogger exported blogs are supported. Blogger exports its blogs in Atom format. I thought perhaps that I could convert an RSS feed to Atom and then import that into Blogger and thereby move some old non-Blogger blogs over to Blogger.

Alas, no joy! The Blogger Import tool is quite fussy about its Atom. For example, if you export a blog in Atom format, and then run that through an XML formatting tool and re-import, you'll find that Blogger complains about the uploaded file.

However, I've kept at it, and now have a simple script that can take an RSS feed and convert it to an Atom format that Blogger seems happy with. It's not quite there -- a few posts are silently dropped for reasons I haven't figured out yet. I'm toying with the idea of eventually releasing it. Of course, I'm not the only one.

via hissohathair.blogspot.com

Thursday, 15 January 2009

Oh! Look! Time_t party coming!

At 10:31:30 on Saturday the 14th of February this year (Sydney time) the UNIX epoch time will be "1234567890".
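
A quick way to check the conversion yourself, assuming Sydney is on daylight time (a fixed UTC+11 offset) in February:

```ruby
# Convert the epoch value to UTC and to a fixed UTC+11 offset.
# Assumption: Sydney is on daylight time (UTC+11) in February.
party = Time.at(1_234_567_890)
puts party.getutc.strftime("%A %Y-%m-%d %H:%M:%S UTC")
puts party.getlocal("+11:00").strftime("%A %Y-%m-%d %H:%M:%S %:z")
```

Using a fixed offset keeps the example independent of whatever time zone data is installed on the machine running it.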

Time for a time_t party!
