Monitoring key CloudWatch metrics with Slack alerts and managing them with Terraform

Jose López
November 18th, 2019 · 2 min read

This is the first article in a series about monitoring.

-> Second article: APM using Datadog

-> Third article: Monitoring CodePipeline deployments

Alarm and monitoring systems are a key part of mature products and applications. In this article, we will look at some key metrics that should be monitored on practically every application, focusing on AWS CloudWatch metrics. We will send alerts to Slack based on the values of these metrics and use Terraform to manage the alarms as IaC (infrastructure as code) so they are reusable.

AWS CloudWatch key metrics

Amazon CloudWatch is the monitoring and observability service provided by AWS. It’s one of the easiest and best ways to collect data on AWS. In this article, we are going to focus on the following CloudWatch metrics.

  • Application Load Balancer (ALB): 5XX errors, latency, rejected connections and unhealthy hosts.
  • Relational Database Service (RDS): free storage space.

There are some metrics for other services you should also monitor if you are using them.

  • Simple Queue Service (SQS): age of the oldest message in the jobs queue and length of the dead-letter queue (DLQ).
  • Elastic Block Store (EBS): volume status, volume queue length and volume idle time.
  • Elastic Compute Cloud (EC2): status checks.
  • ElastiCache: cache hits, cache misses and evictions.
  • Lambda: duration, errors, throttles.

Alerting to Slack

We recommend you follow along with the source code for this article.

We are going to leverage CloudWatch Alarms to alert Slack when a metric surpasses a given threshold. Besides this, we will need an SNS topic and a Lambda function to send the messages to Slack.

SNS topic

First of all, we need to create an SNS topic. We will configure our CloudWatch Alarms to notify this topic when an alarm is raised. Next, we have to create a subscription for this topic. This subscription will invoke the Lambda function that parses the message data and posts a message to Slack.

Lambda function

We need a Lambda function to send messages to Slack. Our Lambda function parses CloudWatch Alarms event messages and extracts fields to build a readable Slack message. It can also be used to send messages other than CloudWatch Alarms notifications to Slack.

You’ll need to create a Slack webhook and set it as an ENV variable for the Lambda function.
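
To make the flow above concrete, here is a minimal sketch of such a Lambda function in Python. It is not the code from the article's repository. The environment variable name `SLACK_WEBHOOK_URL` and the message layout are assumptions made for illustration; the fields `AlarmName`, `NewStateValue` and `NewStateReason` are the ones CloudWatch includes in the JSON body of an alarm notification delivered through SNS.

```python
import json
import os
import urllib.request

# Hypothetical variable name: the article only says the webhook is passed
# to the function as an ENV variable.
WEBHOOK_ENV_VAR = "SLACK_WEBHOOK_URL"


def build_slack_payload(sns_message):
    """Turn the body of an SNS message into a Slack webhook payload."""
    try:
        alarm = json.loads(sns_message)
    except json.JSONDecodeError:
        alarm = None
    if not isinstance(alarm, dict) or "AlarmName" not in alarm:
        # Not a CloudWatch alarm notification: forward the text unchanged.
        return {"text": sns_message}
    name = alarm["AlarmName"]
    state = alarm.get("NewStateValue", "UNKNOWN")
    reason = alarm.get("NewStateReason", "")
    return {"text": f"*{name}* is now *{state}*\n{reason}"}


def lambda_handler(event, context):
    """Entry point: post one Slack message per SNS record in the event."""
    webhook_url = os.environ[WEBHOOK_ENV_VAR]
    for record in event["Records"]:
        payload = build_slack_payload(record["Sns"]["Message"])
        request = urllib.request.Request(
            webhook_url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        # Supplying `data` makes this a POST request.
        with urllib.request.urlopen(request) as response:
            response.read()
```

Keeping the parsing in `build_slack_payload` separate from the HTTP call makes it easy to test the formatting locally without a real webhook, and the plain-text fallback is what lets the same function forward messages that are not alarm notifications.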

CloudWatch Alarms

Now that we have an SNS topic and a Lambda function to post messages to Slack, we will create alarms for the ALB and RDS metrics listed above.

ALB 5XX errors (load balancer)

  • Metric namespace: AWS/ApplicationELB
  • Metric name: HTTPCode_ELB_5XX_Count
  • Metric dimension: LoadBalancer
  • Metric period: 1 minute
  • Number of periods: 5
  • Statistic: Sum
  • Alarm condition: > 1

ALB 5XX errors (targets)

  • Metric namespace: AWS/ApplicationELB
  • Metric name: HTTPCode_Target_5XX_Count
  • Metric dimension: LoadBalancer
  • Metric period: 1 minute
  • Number of periods: 5
  • Statistic: Sum
  • Alarm condition: > 1

ALB latency

  • Metric namespace: AWS/ApplicationELB
  • Metric name: TargetResponseTime
  • Metric dimension: LoadBalancer
  • Metric period: 1 minute
  • Number of periods: 5
  • Statistic: Average
  • Alarm condition: > 0.2 seconds

ALB rejected connections

  • Metric namespace: AWS/ApplicationELB
  • Metric name: RejectedConnectionCount
  • Metric dimension: LoadBalancer
  • Metric period: 1 minute
  • Number of periods: 5
  • Statistic: Sum
  • Alarm condition: > 1

ALB unhealthy hosts

  • Metric namespace: AWS/ApplicationELB
  • Metric name: UnHealthyHostCount
  • Metric dimensions: TargetGroup, LoadBalancer
  • Metric period: 1 minute
  • Number of periods: 2
  • Statistic: Average
  • Alarm condition: > 0

RDS free storage space

  • Metric namespace: AWS/RDS
  • Metric name: FreeStorageSpace
  • Metric dimension: DBInstanceIdentifier
  • Metric period: 1 minute
  • Number of periods: 5 or 1 out of 5
  • Statistic: Minimum
  • Alarm condition: < 1000000000 bytes (1 GB)
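
The article manages these alarms with Terraform. Purely as an illustration of how the settings above map onto the CloudWatch API, the following Python sketch builds the keyword arguments accepted by boto3's `put_metric_alarm`. The helper `alarm_kwargs`, the alarm names, the resource identifiers and the topic ARN are hypothetical placeholders, and the RDS alarm uses the "1 out of 5" variant as an assumption.

```python
def alarm_kwargs(name, namespace, metric, dimensions, statistic, operator,
                 threshold, periods, sns_topic_arn, datapoints_to_alarm=None):
    """Build put_metric_alarm arguments for a metric with a 1 minute period."""
    kwargs = {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": metric,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Period": 60,  # seconds, i.e. the 1 minute period used above
        "EvaluationPeriods": periods,
        "Statistic": statistic,
        "ComparisonOperator": operator,
        "Threshold": threshold,
        "AlarmActions": [sns_topic_arn],  # the SNS topic that triggers the Lambda
    }
    if datapoints_to_alarm is not None:
        # "M out of N": alarm when M of the last N periods breach the threshold.
        kwargs["DatapointsToAlarm"] = datapoints_to_alarm
    return kwargs


# Placeholder ARN and resource identifiers, not real resources.
TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:slack-alerts"

elb_5xx = alarm_kwargs(
    name="alb-elb-5xx",
    namespace="AWS/ApplicationELB",
    metric="HTTPCode_ELB_5XX_Count",
    dimensions={"LoadBalancer": "app/my-alb/0123456789abcdef"},
    statistic="Sum",
    operator="GreaterThanThreshold",
    threshold=1,
    periods=5,
    sns_topic_arn=TOPIC_ARN,
)

rds_free_storage = alarm_kwargs(
    name="rds-free-storage",
    namespace="AWS/RDS",
    metric="FreeStorageSpace",
    dimensions={"DBInstanceIdentifier": "my-db"},
    statistic="Minimum",
    operator="LessThanThreshold",
    threshold=1_000_000_000,  # bytes
    periods=5,
    sns_topic_arn=TOPIC_ARN,
    datapoints_to_alarm=1,
)
```

With boto3 installed and AWS credentials configured, each dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**elb_5xx)`. The same fields have direct counterparts in Terraform's `aws_cloudwatch_metric_alarm` resource, which is what makes the definitions easy to reuse as IaC.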

With these alarms, you’ll be notified in Slack when something goes wrong in your application.

Thanks for reading!

That's everything for the first article in this series about monitoring. In the next ones, we will learn how to monitor CodePipeline deployments, APM and the status of external services. Stay tuned!
