Amazon Web Services is very powerful, but some things are harder than you might expect. Getting an email notification that something is wrong is one of those things. I originally wrote this up so that I would remember it the next time I needed it, and then decided it might be helpful to others.
You need to set up several things to get this to work. You need a metric, which is some value that indicates whether the system is OK. You need an alarm that fires when that metric indicates failure. You need an SNS topic to be notified when the alarm fires. And lastly, you need a subscription to that topic, which is where you specify a phone number or email address to receive the alarm notification.
Create a subscription to the topic, i.e. someplace to send the notification. A subscription consists of a protocol (HTTP, HTTPS, email, SMS or SQS) and a protocol-specific "endpoint". You will receive a confirmation email, text, etc., depending on the protocol you specified, asking if you really want to create this subscription.
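The subscription rules above can be made concrete with a small in-memory model in plain Java. This is not the SNS API: `Topic`, `subscribe`, `confirm` and `publish` are hypothetical names. The model captures two points: a subscription pairs a protocol with a protocol-specific endpoint, and (as SNS does for email) a subscription receives nothing until the recipient has confirmed it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of an SNS topic; not the AWS SDK.
final class Topic {
    enum Protocol { HTTP, HTTPS, EMAIL, SMS, SQS }

    static final class Subscription {
        final Protocol protocol;
        final String endpoint;   // e.g. an email address or phone number
        boolean confirmed;       // false until the recipient confirms

        Subscription(Protocol protocol, String endpoint) {
            this.protocol = protocol;
            this.endpoint = endpoint;
        }
    }

    private final List<Subscription> subscriptions = new ArrayList<>();

    // Creating a subscription only registers it; it starts out unconfirmed.
    Subscription subscribe(Protocol protocol, String endpoint) {
        Subscription s = new Subscription(protocol, endpoint);
        subscriptions.add(s);
        return s;
    }

    // Models the recipient answering the confirmation email or text.
    void confirm(Subscription s) {
        s.confirmed = true;
    }

    // Returns "PROTOCOL:endpoint" for every subscription that would receive
    // the message. Only confirmed subscriptions are delivered to; the
    // message content itself is not modelled.
    List<String> publish(String message) {
        List<String> delivered = new ArrayList<>();
        for (Subscription s : subscriptions) {
            if (s.confirmed) {
                delivered.add(s.protocol + ":" + s.endpoint);
            }
        }
        return delivered;
    }
}
```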
Create a CloudWatch metric named ZkServerCount. The "mon-put-data" command is used both to put the data and to create the metric itself if it doesn't already exist.
We use the command line to create the metric because CloudWatch seems to have a problem if you have defined a lot of metrics, and it's easy to create a lot of metrics. Each instance you create automatically gets half a dozen standard metrics, and those metrics live for 14 days regardless of how long your instance lives. If you create thousands of worker instances, you seem to overwhelm the CloudWatch GUI. We've reported this as a bug and are working with Amazon on it, but in the meantime we use the command line to create metrics.
Setting up to use the command line: set up the environment variables as described in README.TXT (in particular, you must set the AWS_CLOUDWATCH_HOME variable), and set up your credentials as described in the README.
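The alarm command further down is invoked as `bin/mon-put-metric-alarm`, i.e. relative to the tools' install directory, so code that shells out to these tools needs to find that directory. A minimal sketch, assuming the scripts live in `$AWS_CLOUDWATCH_HOME/bin`; `CloudWatchTools` and `toolPath` are hypothetical names, and the environment is passed in as a `Map` so the lookup can be tested without touching the real environment.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: resolves a CloudWatch command-line tool, assuming
// the scripts live in $AWS_CLOUDWATCH_HOME/bin.
final class CloudWatchTools {
    static final String HOME_VAR = "AWS_CLOUDWATCH_HOME";

    // Returns the expected path of the tool, or empty if AWS_CLOUDWATCH_HOME
    // is unset or blank. It does not check that the file actually exists.
    static Optional<Path> toolPath(Map<String, String> env, String tool) {
        String home = env.get(HOME_VAR);
        if (home == null || home.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(Paths.get(home, "bin", tool));
    }
}
```

In real use you would pass `System.getenv()`, e.g. `CloudWatchTools.toolPath(System.getenv(), "mon-put-data")`, and report a setup error when the result is empty.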
Run this command, which will both create the metric and give it its first value of three:

```shell
mon-put-data --metric-name ZkServerCount --namespace MyNameSpace --timestamp 2013-08-25T00:00:00Z --value 3
```
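The `--timestamp` value is an ISO 8601 UTC time such as `2013-08-25T00:00:00Z`. If you later drive `mon-put-data` from Java, a sketch like the following builds the same argument list; `Instant.toString()` produces that timestamp format. It assumes `mon-put-data` is on the `PATH`, and `PutDataCommand` is a hypothetical name.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that builds the mon-put-data command line shown above.
final class PutDataCommand {
    static List<String> build(String metricName, String namespace,
                              Instant timestamp, double value) {
        // Instant.toString() yields ISO 8601 UTC, e.g. 2013-08-25T00:00:00Z;
        // truncating to seconds drops any fractional part.
        String ts = timestamp.truncatedTo(ChronoUnit.SECONDS).toString();
        // Print whole numbers without a trailing ".0" (3.0 -> "3").
        String v = BigDecimal.valueOf(value).stripTrailingZeros().toPlainString();
        return Arrays.asList(
                "mon-put-data",
                "--metric-name", metricName,
                "--namespace", namespace,
                "--timestamp", ts,
                "--value", v);
    }
}
```

The resulting list can be handed to `new ProcessBuilder(cmd).inheritIO().start()` to run the tool. Passing a list rather than one string avoids any shell quoting issues.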
Create a CloudWatch metric alarm based on that metric; this alarm will fire when the metric's average value over a five-minute (300-second) period is less than 3:

```shell
bin/mon-put-metric-alarm --alarm-name zk-mon \
  --alarm-description "Alarm upon zookeeper server failure" \
  --metric-name ZkServerCount --namespace MyNameSpace \
  --statistic Average --period 300 --threshold 3 \
  --comparison-operator LessThanThreshold --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:abc123:ZooKeeper_failure
```
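To see what this alarm definition means, here is a simplified model of its evaluation, assuming the straightforward reading of the flags: data points are averaged over each 300-second period, a period breaches when that average is less than the threshold of 3, and with `--evaluation-periods 1` a single breaching period is enough to fire. This is a hypothetical sketch (`AlarmModel` is not a real API), not CloudWatch's actual implementation; in particular it simply treats periods with no data as not breaching.

```java
import java.util.List;

// Simplified, hypothetical model of an alarm using
// --statistic Average and --comparison-operator LessThanThreshold.
final class AlarmModel {
    // Average of the data points received in one period (NaN if none).
    static double average(List<Double> points) {
        double sum = 0;
        for (double p : points) {
            sum += p;
        }
        return points.isEmpty() ? Double.NaN : sum / points.size();
    }

    // periods: data points grouped by period, oldest first.
    // Fires when each of the most recent evaluationPeriods periods has an
    // average below the threshold.
    static boolean inAlarm(List<List<Double>> periods, double threshold,
                           int evaluationPeriods) {
        if (periods.size() < evaluationPeriods) {
            return false;
        }
        for (int i = periods.size() - evaluationPeriods; i < periods.size(); i++) {
            double avg = average(periods.get(i));
            // NaN < threshold is false, so an empty period never breaches here.
            if (!(avg < threshold)) {
                return false;
            }
        }
        return true;
    }
}
```

One consequence of `--statistic Average` in this model: if a server disappears partway through a period, the reported values for that period might be 3 and then 2, giving an average of 2.5, which is already below 3.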
Create some code that will periodically do what the mon-put-data command above does, this time using the AWS SDK for Java. This looks like a lot of code to put a single value into the system, but bear in mind that it includes some one-time setup, and that you can add multiple values at a time.
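```java
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClient;
import com.amazonaws.services.cloudwatch.model.MetricDatum;
import com.amazonaws.services.cloudwatch.model.PutMetricDataRequest;
import com.amazonaws.services.cloudwatch.model.StandardUnit;

import java.util.Date;

public class ZkMetricPublisher {
    // Assumed values, matching the metric and namespace used with mon-put-data.
    private static final String INSTANCE_COUNT_METRIC_NAME = "ZkServerCount";
    private static final String FRAGMENT_NAMESPACE = "MyNameSpace";
    // Placeholders: supply your own credentials.
    private static final String ACCESS_KEY = "YOUR_ACCESS_KEY";
    private static final String SECRET_KEY = "YOUR_SECRET_KEY";

    public void putMetric() {
        // Setup that could be done once and reused: credentials and client.
        BasicAWSCredentials credentials =
                new BasicAWSCredentials(ACCESS_KEY, SECRET_KEY);
        AmazonCloudWatchClient acwc = new AmazonCloudWatchClient(credentials);

        // The value being reported, e.g. the number of live ZooKeeper servers.
        Double metricValue = 3.0;
        MetricDatum datum = new MetricDatum()
                .withMetricName(INSTANCE_COUNT_METRIC_NAME)
                .withTimestamp(new Date())
                .withValue(metricValue)
                .withUnit(StandardUnit.Count);

        // withMetricData takes varargs, so several datums can go in one request.
        PutMetricDataRequest putMetricDataRequest = new PutMetricDataRequest()
                .withNamespace(FRAGMENT_NAMESPACE)
                .withMetricData(datum);
        acwc.putMetricData(putMetricDataRequest);
    }
}
```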
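The method above sends one value per call, so something still has to call it periodically. One way to do that, sketched with the JDK's `ScheduledExecutorService`: the `Runnable` would wrap a call such as `putMetric()`. `MetricScheduler` is a hypothetical name and the interval is up to you; reporting at least once per 300-second alarm period keeps every period supplied with data.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical scheduler that calls a metric publisher at a fixed rate.
final class MetricScheduler {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    // Runs publish immediately and then once every interval. Runtime
    // exceptions are caught because scheduleAtFixedRate suppresses all
    // later executions once a run throws.
    void start(Runnable publish, long interval, TimeUnit unit) {
        executor.scheduleAtFixedRate(() -> {
            try {
                publish.run();
            } catch (RuntimeException e) {
                System.err.println("Failed to publish metric: " + e);
            }
        }, 0, interval, unit);
    }

    void stop() {
        executor.shutdownNow();
    }
}
```

From the class that defines `putMetric()`, usage might look like `new MetricScheduler().start(this::putMetric, 1, TimeUnit.MINUTES)`. Catching the exception matters here: a single failed call to CloudWatch should not silently stop all future reports, because a metric that stops arriving is not the same as one that reads below 3.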
When you are all done, it's of course a good idea to trigger the condition manually, just to make sure you actually do get that text or email that you really don't want to get for real!