Description
Nagios Cloudwatch is a set of scripts for monitoring Amazon Cloud resources with Nagios (and its derivatives). Features include:
- Amazon AWS cost monitoring
- Amazon EC2
  - Instance running - Shows the running status of an instance
  - Statistics metrics - Lets you monitor and alert on all AWS EC2 metrics (like CPUUtilization)
- Amazon ELB
  - Statistics metrics
Installation:
3 simple steps:
- Download
- Configure
- Run
Download
Either download the latest stable release as a tarball or zip file, or clone the git repository.
To download a tarball:
mkdir /path/to/your/plugins
cd /path/to/your/plugins
wget -O nagios-cloudwatch.latest.tar.gz https://github.com/maglub/nagios-cloudwatch/tarball/master
tar xvzf nagios-cloudwatch.latest.tar.gz
To clone the git repository instead:
mkdir /path/to/your/plugins
cd /path/to/your/plugins
git clone git@github.com:maglub/nagios-cloudwatch.git
cd nagios-cloudwatch
git checkout stable
Configure
- Create a read-only user in AWS and associate it with your environment
- Put the access key and secret key in a file named config.yml, in the same directory as these scripts (or elsewhere, if you point to it with the -C parameter):
aws:
  #======================
  #--- authentication
  #======================
  access_key_id: YOUR_ACCESS_KEY
  secret_access_key: YOUR_SECRET_KEY
  #========================================
  #--- default region, unless overridden on the command line
  #========================================
  #--- possible regions us-west-1 us-west-2 eu-west-1, etc...
  region: us-west-2
  #======================
  #--- Proxy config
  #======================
  #proxy_uri: http://user:passwd@IP:PORT
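The scripts read config.yml themselves. If you want to validate the file, or reuse it from your own Ruby tooling, the sketch below loads it with Ruby's standard YAML library. load_aws_config is a hypothetical helper, not part of this project, and it assumes the keys are nested (indented) under "aws:".

```ruby
require "yaml"

# Hypothetical helper (not part of nagios-cloudwatch): parses the text of a
# config.yml whose settings are nested under "aws:".
def load_aws_config(yaml_text)
  config = YAML.load(yaml_text) || {}
  aws = config["aws"] || {}
  {
    # fetch raises KeyError if a credential is missing
    access_key_id:     aws.fetch("access_key_id"),
    secret_access_key: aws.fetch("secret_access_key"),
    # region is optional in this sketch; nil when absent
    region:            aws["region"]
  }
end
```

For a real file, call it as load_aws_config(File.read("config.yml")).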
Run
Here are a couple of examples of how to run the script:
$ ./check_cloudwatch.rb --ec2 --list-instances
Name: vpn-v001ec2 Id: i-91f62199 privateIp: 192.168.99.6 State: Zone: running
Name: kmg-test002ec2 Id: i-7ef62176 privateIp: 192.168.99.10 State: Zone: running
Name: win-test003ec2 Id: i-9956e691 privateIp: 192.168.99.20 State: Zone: stopped
$ ./check_cloudwatch.rb --ec2 --instance=i-7ef62176 --list-metrics
i-7ef62176;CPUUtilization
i-7ef62176;StatusCheckFailed_System
i-7ef62176;DiskWriteBytes
i-7ef62176;NetworkIn
i-7ef62176;DiskReadOps
i-7ef62176;NetworkOut
i-7ef62176;StatusCheckFailed_Instance
i-7ef62176;DiskReadBytes
i-7ef62176;DiskWriteOps
i-7ef62176;StatusCheckFailed
$ ./check_cloudwatch.rb --ec2 --instance=i-7ef62176 --metric=CPUUtilization --window=600 --period=300 -w 80 -c 95
OK - Id: i-7ef62176 Metric: CPUUtilization, Last Value: 0.334000 Unit: Percent (2014-06-16 15:43:00 UTC)
|Average=0.334000 Minimum=0.000000 Maximum=1.670000 Sum=1.670000
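Nagios interprets a plugin's result from its exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) and from its output line, where performance data follows a "|". The hypothetical helper below, not taken from check_cloudwatch.rb, sketches how such a line and exit code fit together; it puts the performance data on the same line as the status, separated by spaces.

```ruby
# Standard Nagios plugin exit codes.
NAGIOS_EXIT_CODES = {
  "OK" => 0, "WARNING" => 1, "CRITICAL" => 2, "UNKNOWN" => 3
}.freeze

# Hypothetical helper: builds the output line and exit code for a check.
# perfdata is a hash of label => numeric value.
def nagios_result(state, message, perfdata)
  code = NAGIOS_EXIT_CODES.fetch(state)
  # Performance data: space-separated label=value pairs after "|".
  perf = perfdata.map { |label, value| format("%s=%f", label, value) }.join(" ")
  ["#{state} - #{message} |#{perf}", code]
end
```

A plugin built this way would print the line and then call exit with the code.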
Prerequisites
Ruby
Ubuntu 12.04 LTS
If your installation comes with Ruby 1.8, you may have to start the script with /usr/bin/ruby1.9.1 ./check_cloudwatch.rb, unless you make ruby1.9.1 the default Ruby, as described in the note below.
sudo apt-get install -y ruby1.9.1 ruby1.9.1-dev \
rubygems1.9.1 irb1.9.1 ri1.9.1 rdoc1.9.1 \
build-essential libopenssl-ruby1.9.1 libssl-dev zlib1g-dev
sudo gem install aws-sdk-core --pre
Note: to make ruby1.9.1 default on your system, follow the instructions on:
sudo apt-get update
sudo apt-get install ruby1.9.1 ruby1.9.1-dev \
rubygems1.9.1 irb1.9.1 ri1.9.1 rdoc1.9.1 \
build-essential libopenssl-ruby1.9.1 libssl-dev zlib1g-dev
sudo update-alternatives --install /usr/bin/ruby ruby /usr/bin/ruby1.9.1 400 \
--slave /usr/share/man/man1/ruby.1.gz ruby.1.gz \
/usr/share/man/man1/ruby1.9.1.1.gz \
--slave /usr/bin/ri ri /usr/bin/ri1.9.1 \
--slave /usr/bin/irb irb /usr/bin/irb1.9.1 \
--slave /usr/bin/rdoc rdoc /usr/bin/rdoc1.9.1
# choose your interpreter
# changes symlinks for /usr/bin/ruby , /usr/bin/gem
# /usr/bin/irb, /usr/bin/ri and man (1) ruby
sudo update-alternatives --config ruby
sudo update-alternatives --config gem
# now try
ruby --version
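If you are unsure which interpreter will run the script, the check below compares RUBY_VERSION against 1.9.1 using Gem::Version, which orders versions numerically rather than as plain strings. ruby_new_enough? is a hypothetical helper name; run it with the same ruby binary that Nagios will use.

```ruby
require "rubygems" # provides Gem::Version (loaded by default on Ruby 1.9+)

# Hypothetical helper: true when version is at least minimum.
def ruby_new_enough?(version = RUBY_VERSION, minimum = "1.9.1")
  Gem::Version.new(version) >= Gem::Version.new(minimum)
end

if ruby_new_enough?
  puts "Ruby #{RUBY_VERSION} is recent enough"
else
  puts "Ruby #{RUBY_VERSION} is too old; use ruby1.9.1 or newer"
end
```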
RedHat/CentOS
https://gist.github.com/trevorrowe/1870314
sudo yum install -y gcc make \
libxml2 libxml2-devel libxslt libxslt-devel \
rubygems ruby-devel
sudo gem install nokogiri -- --with-xml2-lib=/usr/local/lib \
--with-xml2-include=/usr/local/include/libxml2 \
--with-xslt-lib=/usr/local/lib \
--with-xslt-include=/usr/local/include
sudo gem install aws-sdk --no-ri --no-rdoc
OP5 Appliance (CentOS)
Prerequisites are already in place.
Usage
Examples
- Check an ELB (Elastic Load Balancer) for the number of healthy hosts.
./check_cloudwatch.rb --instance="<INSTANCE_NAME>" --elb --metric="HealthyHostCount" --window=120 --period=60 --critical=:1+ --warning=:2+
OK - Metric: HealthyHostCount, Last Average: 2.0 Count (2014-06-14 13:34:00 UTC)
|Average=2.0,Minimum=2.0,Maximum=2.0,Sum=240.0
- Check an ELB for the total number of successful (2XX) backend requests over the last 5 minutes, warning when the number of requests equals or exceeds 10, and critical at 15.
./check_cloudwatch.rb --elb -i <INSTANCE_NAME> --metric=HTTPCode_Backend_2XX --window=300 --period=300 --critical=15 --warning=10 --statistics="Sum"
Thresholds
- --warning={@}{+}, -w {@}{+}
- --critical={@}{+}, -c {@}{+}
The threshold value goes between the optional "@" prefix and the optional "+" suffix. It can be a single value or a range, and it can contain decimals.
- A threshold can be checked to be within a range, or outside a range.
- To alert when a value is outside a range, use the prefix "@".
Thresholds can be "soft" or "hard". A hard threshold includes the threshold value itself (it compares with >= or <=). A soft threshold does not trigger when the checked value equals the threshold value (it compares with > or <).
A soft threshold is selected by suffixing "+" to the threshold.
Examples of valid thresholds are:
1, 1.0, 1:+, :1.5, 0:1000, @1:100
- "-c 75" will trigger when the value is equal to or larger than 75
- "-c 75+" will trigger when the value is larger than 75
- "-c 0:1" will trigger when the value is equal to or larger than 0 and equal to or less than 1
- "-c 0:1+" will trigger when the value is larger than 0 and less than 1
- "-c @0:1+" will trigger when the value is outside the soft range
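To make these rules concrete, here is a small Ruby sketch that parses a threshold string and decides whether a value triggers it, following the semantics listed above. Threshold, parse_threshold and trigger? are hypothetical names, and this is an illustration rather than the script's own parser; in particular, it assumes "@" simply inverts the range (so "@0:1+" triggers at 0 and at 1).

```ruby
# Hypothetical illustration of the threshold rules; not the parser used by
# check_cloudwatch.rb. low/high are nil when that side of the range is open.
Threshold = Struct.new(:low, :high, :soft, :outside) do
  # True when value should raise the alert this threshold belongs to.
  def trigger?(value)
    above_low  = low.nil?  || (soft ? value > low  : value >= low)
    below_high = high.nil? || (soft ? value < high : value <= high)
    inside = above_low && below_high
    outside ? !inside : inside
  end
end

# Parses "75", "75+", "0:1", ":1.5", "1:+", "@1:100", ... into a Threshold.
def parse_threshold(spec)
  s = spec.strip
  outside = s.start_with?("@")
  s = s[1..-1] if outside
  soft = s.end_with?("+")
  s = s[0...-1] if soft
  if s.include?(":")
    low_s, high_s = s.split(":", 2)
    low  = low_s.empty?  ? nil : Float(low_s)
    high = high_s.empty? ? nil : Float(high_s)
  else
    # A single value means "at or above" (strictly above when soft).
    low, high = Float(s), nil
  end
  Threshold.new(low, high, soft, outside)
end
```

With this sketch, parse_threshold(":1+").trigger?(2.0) returns false, which agrees with the OK result of the HealthyHostCount example, where --critical=:1+ is checked against an average of 2.0.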
Listing metrics
You can list the available metrics for your instance, your load balancer, and so on, by using the --list-metrics parameter.
./check_cloudwatch.rb --ec2 -i INSTANCE_ID --list-metrics
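Each line of the listing has the form "instance-id;metric", as in the i-7ef62176 example under Run, so it is easy to consume from a script, for instance to generate one service check per metric. The hypothetical helper below groups the metric names by id; it assumes exactly that semicolon-separated format.

```ruby
# Hypothetical helper: turns --list-metrics output ("id;metric" per line)
# into a hash of id => [metric names]. Blank or malformed lines are skipped.
def parse_metric_list(text)
  text.each_line.each_with_object({}) do |line, result|
    id, metric = line.strip.split(";", 2)
    next if metric.nil? || metric.empty?
    (result[id] ||= []) << metric
  end
end
```

For example, parse_metric_list(`./check_cloudwatch.rb --ec2 -i INSTANCE_ID --list-metrics`) captures the command's output and groups it.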
Statistics window and statistics period
AWS metrics are not collected every minute. For example, with basic monitoring, CPUUtilization is collected every 5 minutes. If you ask for a window of 60 seconds and a period of 60 seconds, Cloudwatch will very likely return an empty result set, since there is no data to present for that window. This is a consequence of how AWS Cloudwatch collects its data.
The workaround is to ask for a longer window, say 10 minutes or longer, to make sure your result set contains at least one data point:
./check_cloudwatch.rb --ec2 -i INSTANCE_ID --metric="CPUUtilization" --window=600 --period=60
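As a rough rule of thumb, a request can return at most about one data point per collection interval inside the window, however short the period is. The hypothetical helper below makes that arithmetic explicit, assuming a 300-second collection interval (EC2 basic monitoring); the real count can be off by one depending on how the window lines up with Cloudwatch's timestamps.

```ruby
# Hypothetical estimate (not part of check_cloudwatch.rb): roughly how many
# data points a window can contain when the metric is only published every
# interval seconds. All arguments are in seconds; integer division.
def expected_datapoints(window, period, interval = 300)
  window / [period, interval].max
end
```

With --window=60 --period=60 this gives 0, which is why such a request is likely to come back empty, while --window=600 --period=60 gives 2.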