Datadog is a monitoring platform for cloud applications. It automatically detects potential application and infrastructure issues by observing trends, patterns, and unexpected behavior in application metrics such as error rate, request rate, and latency.
The monitoring data falls into one of two categories: metrics and events.
Metrics capture a value pertaining to the systems at a specific point in time. Therefore, metrics are usually collected at regular intervals to monitor a system’s evolution over time.
In contrast to metrics, which are collected more or less continuously, events are discrete, infrequent occurrences. Events capture what happened, at a point in time, with optional additional information.
Some examples are changes: code releases, builds, and build failures.
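The distinction between metrics and events can be sketched as two small record types. This is only an illustration of the concepts described above, not Datadog's internal data model; the class names, field names, and sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetricPoint:
    """One numeric value observed at a specific point in time."""
    name: str
    value: float
    timestamp: float
    tags: list = field(default_factory=list)


@dataclass
class Event:
    """A discrete occurrence with optional additional information."""
    title: str
    timestamp: float
    text: str = ""
    tags: list = field(default_factory=list)


def sample_metric(name, read_value, count, interval_s, start=0.0):
    """Collect `count` points at regular intervals (timestamps are simulated)."""
    return [
        MetricPoint(name, read_value(i), start + i * interval_s)
        for i in range(count)
    ]


# A metric is sampled regularly; an event is recorded once, when it happens.
points = sample_metric("system.cpu.user", lambda i: 10.0 + i, count=3, interval_s=15)
release = Event(title="Code release", timestamp=20.0, text="new build deployed")
```

The metric yields one point per interval, while the event is a single record that can be overlaid on the metric's timeline to see what changed and when.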
Datadog was built to meet the unique needs of modern, cloud-scale infrastructure.
Datadog collects monitoring data from Amazon EC2, ELB, RDS, and other AWS services, plus more than 100 other technologies. Furthermore, the Datadog Agent can collect custom metrics from virtually any application.
Datadog’s native support for tagging allows you to aggregate metrics and events on the fly to generate the views that matter most.
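To make the idea of tag-based aggregation concrete, here is a minimal sketch that averages metric values grouped by one tag key. It illustrates the concept only and is not how Datadog implements aggregation; the function name, the sample values, and the tags themselves are made up for the example.

```python
from collections import defaultdict


def aggregate_by_tag(points, tag_key):
    """Average metric values, grouped by the value of `tag_key`.

    `points` is a list of (value, tags) pairs, where tags are
    "key:value" strings.
    """
    groups = defaultdict(list)
    for value, tags in points:
        for tag in tags:
            key, sep, tag_value = tag.partition(":")
            if sep and key == tag_key:
                groups[tag_value].append(value)
    return {k: sum(v) / len(v) for k, v in groups.items()}


# Hypothetical CPU readings from three hosts.
cpu_points = [
    (40.0, ["env:prod", "role:web"]),
    (60.0, ["env:prod", "role:db"]),
    (10.0, ["env:staging", "role:web"]),
]

by_env = aggregate_by_tag(cpu_points, "env")
by_role = aggregate_by_tag(cpu_points, "role")
```

The same raw points produce different views depending on which tag is chosen for grouping, which is what makes tagging useful for slicing data on the fly.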
Datadog scales automatically with your infrastructure, whether you have tens, hundreds, or thousands of instances. Datadog auto-enrolls new hosts and containers as they come online, using AWS and user-provided tags to include the relevant metrics in existing graphs and alerts.
Virtually any type of monitoring data can be used to trigger a Datadog alert: fixed or dynamic metric thresholds, outliers, events, status checks, and more.
The Datadog Agent is software that runs on your hosts. It collects events and metrics from the hosts and sends them to Datadog, where you can analyze the monitoring and performance data.
Datadog Agent logs are located in the /var/log/datadog/ directory on Linux and in the C:\ProgramData\Datadog\logs directory on Windows.
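A script that needs to find the Agent logs on either platform could select the directory as follows. This is a small sketch that uses only the two paths given above; the helper name agent_log_dir is hypothetical, and other platforms are deliberately left unhandled.

```python
import platform
from pathlib import PurePosixPath, PureWindowsPath


def agent_log_dir(system=None):
    """Return the Agent log directory for the given platform name.

    `system` defaults to the value of platform.system() on this machine.
    """
    system = system or platform.system()
    if system == "Linux":
        return PurePosixPath("/var/log/datadog")
    if system == "Windows":
        return PureWindowsPath(r"C:\ProgramData\Datadog\logs")
    raise ValueError(f"no log directory known for platform: {system}")


linux_dir = agent_log_dir("Linux")
windows_dir = agent_log_dir("Windows")
```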
The Agent has three main parts: the collector, DogStatsD, and the forwarder:
Collector: runs checks on the current machine for configured integrations and captures system metrics, such as memory and CPU.
DogStatsD: a StatsD-compatible backend server that you can send custom metrics to from your applications.
Forwarder: retrieves data from both DogStatsD and the collector, queues it up, and then sends it to Datadog.
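Because DogStatsD speaks a StatsD-compatible protocol, an application can send it a custom metric as a short text datagram over UDP. The sketch below formats such a datagram by hand. It assumes the Agent listens on the usual DogStatsD address of 127.0.0.1 port 8125 and accepts tags after a "|#" suffix; the metric name and tag are hypothetical. In practice an official client library would normally be used instead.

```python
import socket


def format_metric(name, value, metric_type="g", tags=None):
    """Build a StatsD-style datagram: name:value|type, plus optional |#tags."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Send one datagram over UDP (fire-and-forget, no reply is expected)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# "g" marks a gauge and "c" a counter; the names below are made up.
gauge = format_metric("checkout.cart_size", 3, "g", ["env:dev"])
counter = format_metric("page.views", 1, "c")
# send_metric(gauge)  # uncomment on a host where the Agent is running
```

UDP is used so that the application never blocks on metric delivery; the trade-off is that a datagram is silently dropped if no Agent is listening.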
Monitoring all of your infrastructure in one place wouldn’t be complete without the ability to know when critical changes are occurring. Datadog gives you the ability to create monitors that actively check metrics, integration availability, network endpoints, and more.
Triggered monitors appear in the event stream, allowing collaboration around active issues in your applications or infrastructure. Datadog provides a high-level view of open issues on the Triggered Monitors page as well as general monitor management on the Manage Monitors page.
Creating a Monitor
Navigate to the Create Monitors page by hovering over Monitors in the main menu and clicking New Monitor in the sub-menu (depending on your chosen theme and screen resolution, the main menu may be at the top or on the left). You are presented with a list of monitor types on the left.
Export your monitor
Export the JSON configuration for a monitor right from the create screen, or on your monitor status page in the upper right corner. If you manage and deploy monitors programmatically, it’s easier to define the monitor in the UI and export the JSON right away.
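An exported monitor is a JSON document that can be stored in version control and loaded back by deployment tooling. The sketch below shows roughly what such a definition might look like and how it round-trips through JSON. The field names (name, type, query, message, tags, options) reflect the general shape of monitor definitions as an assumption, and every value is hypothetical; treat the JSON exported from your own account as the authoritative format.

```python
import json

# Hypothetical monitor definition; all values are illustrative only.
monitor = {
    "name": "High CPU on web hosts",
    "type": "metric alert",
    "query": "avg(last_5m):avg:system.cpu.user{role:web} > 90",
    "message": "CPU usage is above 90% on web hosts.",
    "tags": ["team:example"],
    "options": {
        "thresholds": {"critical": 90},
        "notify_no_data": False,
    },
}

# Serialize for storage, then load it back as deployment tooling would.
exported = json.dumps(monitor, indent=2, sort_keys=True)
restored = json.loads(exported)
```

Sorting the keys keeps the exported text stable between runs, which makes diffs in version control easier to read.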
Any changes to monitors create an event in the event stream that explains the change and shows the user that made the actual change.
APM and Distributed Tracing
Datadog APM provides you with deep insight into your application’s performance, from automatically generated dashboards that monitor key metrics, such as request volume and latency, to detailed traces of individual requests, side by side with your logs and infrastructure monitoring.
The tracing API is an Agent API rather than a service-side API.
Send traces: Datadog’s APM allows you to collect performance metrics by tracing your code to determine which parts of your application are slow or inefficient. Tracing data is sent to the Datadog Agent via an HTTP API.
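As a rough illustration of sending traces to the Agent over HTTP, the sketch below builds a trace as a list of spans and prepares a request for the local Agent. It assumes the Agent accepts a JSON list of traces via PUT at http://localhost:8126/v0.3/traces and that spans carry trace_id, span_id, name, service, resource, start, and duration (in nanoseconds); check these details against the Agent version you run. The service and resource names are hypothetical, and in practice a tracing library usually does this work.

```python
import json
import random
import urllib.request


def make_span(trace_id, name, service, resource, start_ns, duration_ns,
              parent_id=None):
    """Build one span; every span in a trace shares the same trace_id."""
    span = {
        "trace_id": trace_id,
        "span_id": random.randint(1, 2**63 - 1),
        "name": name,
        "service": service,
        "resource": resource,
        "start": start_ns,
        "duration": duration_ns,
    }
    if parent_id is not None:
        span["parent_id"] = parent_id
    return span


def build_request(traces, url="http://localhost:8126/v0.3/traces"):
    """Prepare (but do not send) the HTTP request carrying the traces."""
    body = json.dumps(traces).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )


# One trace: a web request span with a child database span.
root = make_span(1, "web.request", "shop", "GET /cart", 1_000, 500)
child = make_span(1, "db.query", "shop-db", "SELECT", 1_100, 200,
                  parent_id=root["span_id"])
request = build_request([[root, child]])
# urllib.request.urlopen(request)  # uncomment where an Agent is running
```

The parent_id link is what lets the spans be assembled into a tree, so the time spent in the database call can be attributed to the web request that triggered it.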