Friday 6 March 2009

Initial Load Values for Nagios Load Checks (Cheat Sheet)

I've put together a cheat sheet to show how you might want to initially configure your Nagios load checks. The thinking behind these initial values is set out in Tuning Nagios Load Checks.

Use | OS | Cores | Warning (1,5,15-min load) | Critical (1,5,15-min load) | Notes
CMS (Teamsite) | Solaris | 1 | 10,7,5 | 20,15,10 | Testing shows the app stays responsive up to these loads.
Web Server | Linux | 2 x 4 | 16,10,4 | 32,24,20 | Web servers are paired, so we want to know if they regularly reach 50% capacity. Testing shows performance degrading from a load of 20.
DB Server | Linux | 2 x 4 | 16,10,4 | 32,24,20 | Same hardware as the web servers, different use. Nevertheless, we use the same thresholds.
Nagios | Linux | 1 x 2 | 6,4,2 | 12,10,7 | Small box, paired with a backup.
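The Warning and Critical columns are triples of 1-, 5- and 15-minute load averages, the form check_load takes for its -w and -c options (for the web servers, check_load -w 16,10,4 -c 32,24,20). To make the comparison concrete, here is a minimal Python sketch of how such a pair of triples might be evaluated. It assumes that a load strictly above a threshold trips it and that the worst state across the three averages wins; THRESHOLDS, classify and check_local are hypothetical names for illustration, not part of Nagios or its plugins.

```python
import os
from typing import Dict, Sequence, Tuple

Triple = Tuple[float, float, float]

# Hypothetical lookup table transcribed from the cheat sheet:
# role -> (warning triple, critical triple), each as (1-min, 5-min, 15-min).
THRESHOLDS: Dict[str, Tuple[Triple, Triple]] = {
    "CMS (Teamsite)": ((10, 7, 5), (20, 15, 10)),
    "Web Server": ((16, 10, 4), (32, 24, 20)),
    "DB Server": ((16, 10, 4), (32, 24, 20)),
    "Nagios": ((6, 4, 2), (12, 10, 7)),
}

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(loads: Sequence[float], warn: Sequence[float], crit: Sequence[float]) -> int:
    """Return the worst state across the 1-, 5- and 15-minute load averages.

    Assumption: a value strictly greater than its threshold trips it,
    and any critical breach outranks any warning.
    """
    state = OK
    for load, w, c in zip(loads, warn, crit):
        if load > c:
            return CRITICAL  # nothing is worse, so stop checking
        if load > w:
            state = WARNING
    return state


def check_local(role: str) -> str:
    """Hypothetical helper: compare this host's load averages with a role's thresholds.

    os.getloadavg() is only available on Unix-like systems.
    """
    warn, crit = THRESHOLDS[role]
    loads = os.getloadavg()
    return f"{NAMES[classify(loads, warn, crit)]} {loads}"
```

For example, classify((17.0, 9.0, 3.0), *THRESHOLDS["Web Server"]) returns WARNING (1): the 1-minute average is over 16, but no value is over its critical threshold. Because the triples fall from left to right, a short spike is tolerated at a higher level than a sustained load.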

General notes:
  • The UNIX servers (particularly the Sun SPARC ones) seem able to stay up and responsive even under heavy load. They also don't count processes waiting for I/O in their load averages the way Linux does. I have no explanation for this. :-)
  • We track these loads over time to predict demand growth for capacity planning; the thresholds are not a long-term goal but a short-term alert level.
  • Transaction or revenue-earning web servers might warrant lower thresholds because of the different commercial implications of performance degradation. YMMV.
For more information on the Nagios check_load command, see Tuning Nagios Load Checks.
