APM using Datadog.

This is the second article in a series about monitoring.

-> First article: Monitoring CloudWatch key metrics using Slack and manage them using Terraform

-> Third article: Monitoring CodePipeline deployments

-> Fourth article: Monitoring external services status using RSS and Slack

Application performance management (APM) is the monitoring and management of the performance and availability of software applications. APM strives to detect and diagnose complex application performance problems to maintain an expected level of service. APM is "the translation of IT metrics into business meaning". The ultimate goal of performance monitoring is to give end users a top-quality experience.

Datadog APM setup

Datadog APM is a Datadog suite tool to monitor, troubleshoot, and optimize end-to-end application performance.

Datadog APM offers a 14-day free trial during which you can monitor as many hosts as you want. After that period, you are billed for the number of hosts and analyzed spans. Check the pricing page for more details.

Datadog uses the ddtrace client libraries to gather application performance metrics. There are official ddtrace libraries for .NET, Go, Java, Node.js, PHP, Python and Ruby, and community libraries for other languages. In this post, we are going to set up ddtrace-py, the official ddtrace client for Python.

The other part of the system is the Datadog Agent, a piece of software that runs on your hosts. It collects events and metrics from the host (in this case, ddtrace gathers application data and sends it to the Agent) and forwards them to Datadog.

Deploying and configuring Datadog APM on a Django application running on ECS Fargate

Deploying Datadog Agent

The first piece we need is the Datadog Agent. In this example we use ECS Fargate, so we deploy the Agent as a sidecar container next to the main application. To do that, we add a new container to the container definitions section of the application's task definition. Always pin the image to a fixed tag to avoid unintended updates.

  {
    "name": "datadog-agent",
    "image": "datadog/agent:6.15.1",
    "environment": [
        {
          "name": "ECS_FARGATE",
          "value": "true"
        },
        {
          "name": "DD_APM_ENABLED",
          "value": "true"
        },
        {
          "name": "DD_ANALYTICS_ENABLED",
          "value": "true"
        }
    ],
    "secrets": [{
      "name": "DD_API_KEY",
      "valueFrom": "arn:aws:secretsmanager:region:aws_account_id:secret:secret_name-AbCdEf"
    }],
    "portMappings": [
      {
        "containerPort": 8126,
        "hostPort": 8126,
        "protocol": "tcp"
      }
    ]
  }

DD_APM_ENABLED enables the Trace Agent and DD_ANALYTICS_ENABLED enables App Analytics.
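Hand-edited JSON is easy to break with a stray comment or a missing comma. As a rough sketch, the same sidecar definition can be generated from Python and serialized with the standard json module, which always produces valid JSON. The function name datadog_agent_container and its parameters are hypothetical; the values are the ones used in the definition above.

```python
import json


def datadog_agent_container(image_tag, api_key_secret_arn, trace_port=8126):
    """Return the Datadog Agent sidecar container definition as a dict."""
    return {
        "name": "datadog-agent",
        # Pin a fixed tag so the Agent is never updated by accident.
        "image": f"datadog/agent:{image_tag}",
        "environment": [
            {"name": "ECS_FARGATE", "value": "true"},
            {"name": "DD_APM_ENABLED", "value": "true"},
            {"name": "DD_ANALYTICS_ENABLED", "value": "true"},
        ],
        # The API key is read from Secrets Manager, not stored in the definition.
        "secrets": [{"name": "DD_API_KEY", "valueFrom": api_key_secret_arn}],
        "portMappings": [
            {"containerPort": trace_port, "hostPort": trace_port, "protocol": "tcp"}
        ],
    }


definition = datadog_agent_container(
    "6.15.1",
    "arn:aws:secretsmanager:region:aws_account_id:secret:secret_name-AbCdEf",
)
print(json.dumps(definition, indent=2))
```

The printed output can be pasted into the container definitions list, or the dict can be passed to whatever tool manages the task definition.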

Installing and configuring ddtrace-py

Once the Datadog Agent is up and running, it's time to install and configure ddtrace-py in our Django application. First, we need to install the package, which is as easy as running pip install ddtrace. Then, prefix your Python entry-point command with ddtrace-run. For example, if you're using uwsgi, run something like: ddtrace-run uwsgi --ini ./uwsgi.ini

ddtrace-py supports a large number of libraries out of the box. There are also community packages to trace other libraries.

In our Django settings.py file, we are going to add the following options.

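import os  # usually already imported at the top of settings.py

DATADOG_TRACE = {
    'DEFAULT_SERVICE': '<SERVICE_NAME>',
    # The Agent sidecar shares the task's network, so it is reachable on localhost.
    'AGENT_HOSTNAME': '127.0.0.1',
    'TAGS': {
        'env': os.environ.get('ENV'),
    },
    # Use a real boolean: any non-empty string, even 'False', is truthy.
    'ENABLED': True,
}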
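If tracing should be switchable per environment, the flag has to be parsed into a real boolean, because in Python any non-empty string (including 'False') is truthy. The following is only a sketch of one way to do that. The helper names env_flag and build_datadog_trace, and the environment variable names APM_ENABLED, APM_SERVICE and APM_AGENT_HOST, are hypothetical and not part of ddtrace; only ENV comes from the settings above.

```python
import os


def env_flag(value, default=False):
    """Interpret an environment string such as 'true' or '0' as a boolean."""
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def build_datadog_trace(environ=None):
    """Build the DATADOG_TRACE settings dict from environment variables."""
    if environ is None:
        environ = os.environ
    return {
        "DEFAULT_SERVICE": environ.get("APM_SERVICE", "<SERVICE_NAME>"),
        "AGENT_HOSTNAME": environ.get("APM_AGENT_HOST", "127.0.0.1"),
        "TAGS": {"env": environ.get("ENV")},
        # Tracing stays on unless APM_ENABLED is explicitly set to a false value.
        "ENABLED": env_flag(environ.get("APM_ENABLED"), default=True),
    }


# In settings.py this would replace the literal dictionary.
DATADOG_TRACE = build_datadog_trace()
```

Either way, settings.py ends up with a DATADOG_TRACE dictionary whose ENABLED value is a boolean.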

After this, ddtrace will start sending traces to the Datadog Agent, and we will be able to see them in the Datadog UI.
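If no traces show up, one quick thing to rule out is connectivity between the application container and the Agent sidecar. The sketch below uses only the standard library and assumes the host and port from the configuration above (127.0.0.1 and 8126). The function name agent_is_reachable is hypothetical. It only checks that the TCP port accepts connections, not that the Agent is healthy or that the API key is valid.

```python
import socket


def agent_is_reachable(host="127.0.0.1", port=8126, timeout=1.0):
    """Return True if a TCP connection to the trace port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolved hosts.
        return False


if __name__ == "__main__":
    print("trace agent reachable:", agent_is_reachable())
```

Running this from inside the application container (for example from a Django shell) gives a first hint about whether the problem is on the network side or in the tracer configuration.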

Manual instrumentation

You can trace functions manually by using ddtrace.Tracer.

from ddtrace import tracer


# tracer.wrap() records a span every time the decorated function is called.
@tracer.wrap()
def execute():
    return 'executed'
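Sometimes only part of a function is worth tracing. The tracer also exposes trace(), which can be used as a context manager around a block of code. The sketch below wraps it in a small helper, traced_block (a hypothetical name), that falls back to doing nothing when ddtrace is not installed, for example in a local test environment. The span name report.build and the row_count tag are made up for illustration.

```python
from contextlib import contextmanager

try:
    from ddtrace import tracer
except ImportError:  # ddtrace is not installed, e.g. in local tests
    tracer = None


@contextmanager
def traced_block(name, **tags):
    """Trace the enclosed block as a span, or do nothing without ddtrace."""
    if tracer is None:
        yield None
        return
    with tracer.trace(name) as span:
        # Attach the keyword arguments to the span as tags.
        for key, value in tags.items():
            span.set_tag(key, value)
        yield span


def build_report(rows):
    # Only the aggregation step is recorded as its own span.
    with traced_block("report.build", row_count=len(rows)):
        return sum(rows)
```

Whether the fallback is worth having depends on the project; if ddtrace is always installed, calling tracer.trace() directly is simpler.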

More info about how to use the Tracer class is available here.

Visualize APM information on Datadog UI

Datadog offers a handy UI to visualize APM information gathered by ddtrace and Datadog Agent.

First of all, under the APM/Services section, you'll see a list of all the services that are sending traces to Datadog.

For every service, you can see different graphs: latency, errors, number of requests, time spent...

Then you can filter traces by endpoint.

And for every trace, you can see detailed information: which step took most of the time in a request, errors, error messages...

Thanks for reading!

In this article, we covered how to get application monitoring data from our Django application. This data is very useful for finding the slowest parts of our applications, optimizing queries, and improving the user experience and web performance. In the next articles, we will learn how to monitor deployments and the external services used in our stack. Stay tuned!