Thursday, September 12, 2013

How to Set Up Amazon CloudWatch to Send Alarm Emails



Amazon Web Services are very powerful, but some things are harder than you might expect.  Getting an email notification that something is wrong is one of those things.  I originally wrote this up so that I would remember it the next time I needed it, and then decided it might be helpful to others.

You need to set up several things to get this to work.  You need a metric, which is some value that indicates whether the system is OK or not.  You need an alarm that fires when that metric indicates failure.  You need an SNS topic that gets notified when the alarm fires.  And lastly you need a subscription to that topic, which is where you specify a phone number or email address to receive the alarm notification.

Create an SNS topic from the SNS GUI (press “Create and Add”).



Give the topic a name and a display name.





Create a subscription, which is the place to send the notification.
Choose a protocol (HTTP, HTTPS, email, SMS, or SQS) and the protocol-specific “endpoint”.




You will receive a confirmation email, text, etc. depending on the protocol you specified asking if you really want to create this subscription.

Take note of the “ARN” for the topic; it will be used as the alarm action below.
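An SNS topic ARN has the shape arn:aws:sns:<region>:<account-id>:<topic-name>, so it is easy to sanity-check one before pasting it into an alarm. Here is a minimal sketch of such a check. The TopicArn class and its parse method are hypothetical helpers written for this post, not part of the AWS SDK.

```java
/** Illustrative holder for the parts of an SNS topic ARN (hypothetical, not an AWS SDK class). */
final class TopicArn {
    final String region;
    final String accountId;
    final String topicName;

    private TopicArn(String region, String accountId, String topicName) {
        this.region = region;
        this.accountId = accountId;
        this.topicName = topicName;
    }

    /** Splits arn:<partition>:sns:<region>:<account-id>:<topic-name> into its fields. */
    static TopicArn parse(String arn) {
        String[] parts = arn.split(":", 6);
        if (parts.length != 6 || !parts[0].equals("arn") || !parts[2].equals("sns")) {
            throw new IllegalArgumentException("Not an SNS topic ARN: " + arn);
        }
        return new TopicArn(parts[3], parts[4], parts[5]);
    }
}
```

For example, TopicArn.parse("arn:aws:sns:us-east-1:abc123:ZooKeeper_failure") gives region us-east-1 and topic name ZooKeeper_failure. If the region is not the one your alarm lives in, that is worth double-checking before going further.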






Create a CloudWatch metric named ZkServerCount.
"mon-put-data" is used both to put the data and to create the metric itself if it doesn’t already exist.

We use the command line to create the metric because the CloudWatch GUI seems to have a problem if you have defined a lot of metrics, and it's easy to create a lot of metrics.  Each instance you create automatically gets half a dozen standard metrics, and those metrics live for 14 days regardless of how long your instance lives.  If you create thousands of worker instances, you seem to overwhelm the CloudWatch GUI.  We’ve reported this as a bug and are working with Amazon on it, but in the meantime we use the command line to create metrics.

Setting up to use the command line
Set up the environment variables as described in README.TXT.  In particular, you must set the AWS_CLOUDWATCH_HOME variable.
Set up credentials as per the README.

Run this command, which will both create the metric and give it its first value of three:
mon-put-data --metric-name ZkServerCount --namespace MyNameSpace --timestamp 2013-08-25T00:00:00Z --value 3
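The command above uses a hard-coded timestamp. If you want to script it, the timestamp has to be generated in the same ISO 8601 UTC form each time. Here is a rough sketch of assembling the argument list in Java. The MonPutData class and its command method are made up for illustration, and the sketch assumes mon-put-data is on your PATH if you go on to run it.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles a mon-put-data command line. */
final class MonPutData {
    static List<String> command(String metricName, String namespace, Instant when, double value) {
        // Instant.toString() yields ISO 8601 in UTC, e.g. 2013-08-25T00:00:00Z.
        // Truncating to whole seconds keeps fractional seconds out of the string.
        String timestamp = when.truncatedTo(ChronoUnit.SECONDS).toString();
        return Arrays.asList(
                "mon-put-data",
                "--metric-name", metricName,
                "--namespace", namespace,
                "--timestamp", timestamp,
                "--value", Double.toString(value));
    }
}
```

Passing the list from MonPutData.command("ZkServerCount", "MyNameSpace", Instant.now(), 3) to a java.lang.ProcessBuilder would run the same command as above with the current time. Keeping the arguments as a list rather than one string avoids any shell quoting problems.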



Create a CloudWatch metric alarm based on that metric; this alarm will fire when the metric’s value is less than 3:

bin/mon-put-metric-alarm --alarm-name zk-mon --alarm-description "Alarm upon zookeeper server failure" --metric-name ZkServerCount --namespace MyNameSpace --statistic Average --period 300 --threshold 3 --comparison-operator LessThanThreshold --evaluation-periods 1 --alarm-actions arn:aws:sns:us-east-1:abc123:ZooKeeper_failure
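It helps to picture what those alarm settings mean. With a statistic of Average, a period of 300 seconds, and one evaluation period, the alarm compares the average of the values reported in a five-minute window against the threshold. The sketch below is a simplified model of that check, not CloudWatch's actual implementation, and the AlarmSketch class and wouldFire method are names invented for this example.

```java
/** Simplified, hypothetical model of an Average / LessThanThreshold alarm over one period. */
final class AlarmSketch {
    static boolean wouldFire(double[] samplesInPeriod, double threshold) {
        if (samplesInPeriod.length == 0) {
            // With no data CloudWatch reports INSUFFICIENT_DATA rather than ALARM;
            // this sketch just treats that case as "does not fire".
            return false;
        }
        double sum = 0;
        for (double v : samplesInPeriod) {
            sum += v;
        }
        double average = sum / samplesInPeriod.length;
        return average < threshold; // LessThanThreshold
    }
}
```

So if all three servers are reported for the whole window, the average is 3 and nothing fires. If even one report in the window says 2, the average drops below 3 and the alarm goes off.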



Create some code that will periodically do the equivalent of the mon-put-data command above.  This looks like a lot of code to put a single value into the system, but bear in mind that it includes some one-time setup, and that you can add multiple values at a time.




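import java.util.Date;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClient;
import com.amazonaws.services.cloudwatch.model.MetricDatum;
import com.amazonaws.services.cloudwatch.model.PutMetricDataRequest;
import com.amazonaws.services.cloudwatch.model.StandardUnit;

// ACCESS_KEY, SECRET_KEY, INSTANCE_COUNT_METRIC_NAME and FRAGMENT_NAMESPACE
// are constants defined elsewhere in the enclosing class.
public void putMetric() {
  // One-time setup: credentials and client.
  BasicAWSCredentials credentials = new BasicAWSCredentials(ACCESS_KEY, SECRET_KEY);
  AmazonCloudWatchClient acwc = new AmazonCloudWatchClient(credentials);

  Double metricValue = 3.0;
  MetricDatum datum = new MetricDatum().
      withMetricName(INSTANCE_COUNT_METRIC_NAME).
      withTimestamp(new Date()).
      withValue(metricValue).
      withUnit(StandardUnit.Count);

  // withMetricData takes any number of datums, so several values can share one request.
  PutMetricDataRequest putMetricDataRequest = new PutMetricDataRequest().
      withNamespace(FRAGMENT_NAMESPACE).
      withMetricData(datum);
  acwc.putMetricData(putMetricDataRequest);
}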
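The "periodically" part is not shown above. One way to do it, as a sketch only, is a ScheduledExecutorService from the standard library. The PeriodicReporter class and its start method are hypothetical names, and the Runnable you pass in is assumed to be something like the putMetric method.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that runs a metric-reporting task at a fixed rate. */
final class PeriodicReporter {
    static ScheduledExecutorService start(Runnable putMetric, long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                putMetric.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log it and carry on.
                System.err.println("Metric report failed: " + e);
            }
        }, 0, period, unit);
        return scheduler;
    }
}
```

Something like PeriodicReporter.start(this::putMetric, 1, TimeUnit.MINUTES) would then report once a minute, comfortably inside the alarm's 300-second period. Call shutdown() on the returned scheduler when the process is stopping.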


When you are all done, it's of course a good idea to trigger the condition manually, just to make sure you actually do get that text or email that you really don't want to get for real!

Comments:

  1. I keep getting "float object has no attribute Strip" error while trying to put in an alarm. What am I doing wrong?