Chapter 2. Concepts

 2.1. Monitoring key terms and concepts

 Account

An account contains attributes describing a customer's account, including the Id. The account description contains mostly read-only data; however, a few properties can be modified with the API, such as the metadata and webhook_token.

 Alarm and alert

An alarm contains a set of rules that you configure that determine when the monitoring system sends a notification. You can create multiple alarms for the different checks associated with an entity. For example, if your entity is a web server that hosts your company's website, you can create one alarm to monitor the server itself, and another alarm to monitor the website.

When an alarm is triggered, it alerts the associated notification plan to send notifications. This process is called the "alerting workflow."

The alarm language provides scoping parameters that let you pinpoint the value that triggers the alarm. The scoping parameters are flexible enough that you can set up multiple checks to trigger a single alarm. The alarm language also makes it easy to define a different formula for each alarm that monitors an entity's uptime. To learn how to use the alarm language to create robust monitors, see Alert Triggering and Alarms.

 Check and check type

Once you've created an entity, you can configure one or more checks for it. A check is the foundational building block of the monitoring system, and is always associated with an entity. The check specifies the parts or pieces of the entity that you want to monitor, the monitoring frequency, how many monitoring zones launch the check, and so on. In short, a check contains the specific details of how you monitor the entity.

You can associate one or more checks with an entity. An entity must have at least one check, but by creating multiple checks for an entity, you can monitor several different aspects of a single resource.

For each check you create within the monitoring system, you'll designate a check type. The check type tells the monitoring system which method to use, PING, HTTP, SMTP, and so on, when investigating the monitored resource. Rackspace Cloud Monitoring check types are fully described in the Check types section.

Note that if something happens to your resource, the check does not trigger a notification action. Instead, notifications are triggered by alarms that you create separately and associate with the check.

 Collector

The collector collects data on an individual machine or virtual machine via the machine's IP address. Monitoring zones contain many collectors, each of which is within a specific IP address range. Note that there might be unallocated IP addresses or unmonitored machines within that IP address range.

 Entity

In Rackspace Cloud Monitoring, an entity is the object, device, or resource that you want to monitor. It's commonly a web server, but it might also be a website, a web page, or a web service.

When you create an entity, you'll specify characteristics that describe what you are monitoring. At a minimum you must specify a name for the entity. The name is a user-friendly label or description that helps you identify the resource. You can also specify other attributes of the entity, such as the entity's IP address, and any metadata that you'd like to associate with the entity.

 ID or Id

All objects in the monitoring system are identified by a uniquely generated identifier, generally expressed as Id, that consists of a two-character type prefix followed by a string of alphanumeric characters. You use an object's Id when you want to perform operations on it. For example, when you want to create a check and associate it with an entity, you must know the entity's Id.
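
As a rough illustration of the Id format described above, the following Python sketch splits an Id into its two-character type prefix and the remaining alphanumeric string. The function name split_id and the sample Id enAbC123 are made up for this example, and the sketch assumes the prefix consists of letters; the actual character set and length of an Id are determined by the monitoring system.

```python
import re

# Assumed pattern: two letters followed by one or more alphanumeric characters.
# The real Id alphabet and length are not specified in this guide.
_ID_PATTERN = re.compile(r"([A-Za-z]{2})([A-Za-z0-9]+)")


def split_id(object_id):
    """Return (type_prefix, remainder) for an Id, or raise ValueError."""
    match = _ID_PATTERN.fullmatch(object_id)
    if match is None:
        raise ValueError(f"not a recognizable Id: {object_id!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    # Hypothetical Id used only for illustration.
    prefix, remainder = split_id("enAbC123")
    print(prefix, remainder)
```

A helper like this can catch obviously malformed Ids before they are used in a request, but it cannot tell whether an Id actually exists in your account.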

 Metric, cumulative and instantaneous

A metric is a measurement of activity or state on a monitored resource. Checks gather metrics and send them to the monitoring system. Based on your configurations, a set of metrics may trigger an alarm, causing a notification to be sent. Metrics can also be used to create graphs.

For more information, see Cumulative and instantaneous metrics.

 Monitoring agent

The monitoring agent provides insight into the internal workings of your servers with checks for information such as load average and network usage. It is a recommended alternative to the default collector of the monitoring system. The agent operates as a single small service that runs checks that you configure and pushes metrics to the rest of Cloud Monitoring so that the metrics can be analyzed, alerted on, and archived. These metrics are gathered via checks using specified agent check types, and can be used with the other Cloud Monitoring features such as alarms. See Section B.2, “Agent check types” for a list of agent check types.

To learn about installing and configuring monitoring agents, read the Install and Configure section.

 Monitoring zone

When you create a remote check, you specify which monitoring zone(s) you want to launch the check from. A monitoring zone is the point of origin or "launch point" of the check. The concept of a monitoring zone is similar to that of a datacenter; however, in the monitoring system, you can think of it more as a geographical region.

You can launch checks for a particular entity from multiple monitoring zones. This allows you to observe the performance of an entity from different regions of the world. It is also a way to prevent noisy alarms. For example, if the check from one monitoring zone reports that an entity is down, a second or third monitoring zone might report that the entity is up and running. This gives you a better picture of an entity's overall health.
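
One way to picture how several zones reduce noise is a simple majority vote over the per-zone results. The Python sketch below is only an illustration: the function entity_looks_down, the zone names, and the strict-majority rule are assumptions made for this example, not the consensus logic that Cloud Monitoring itself applies.

```python
def entity_looks_down(zone_results):
    """Return True only if a strict majority of zones report a failed check.

    zone_results maps a zone name to True (check succeeded) or False (check failed).
    """
    if not zone_results:
        raise ValueError("at least one zone result is required")
    failures = sum(1 for succeeded in zone_results.values() if not succeeded)
    return failures > len(zone_results) / 2


if __name__ == "__main__":
    # Hypothetical zone names: one zone fails while two others succeed.
    results = {"zone-a": False, "zone-b": True, "zone-c": True}
    print(entity_looks_down(results))
```

With a rule of this kind, a single zone that cannot reach the entity is outvoted by the zones that can, so a local network problem near one zone does not by itself mark the entity as down.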

 Notification and notification type

A notification is a rule specifying how and to whom an informational message should be sent when an alarm is triggered. You can set up notifications to alert a single individual or an entire team. Some of the notification rules are determined by the specified notification type. Rackspace Cloud Monitoring currently supports webhooks, email, PagerDuty, and SMS notification types for sending notifications.

 Notification plan

A notification plan contains a set of notification rules to execute when an alarm is triggered. A notification plan can contain multiple notifications for each of the following states:

  • Critical

  • Warning

  • OK

 Suppression

Once you've set up your monitoring to your satisfaction, there may come a point when you don't want to receive notifications for a set time period (for example, a period of scheduled maintenance). In this situation, you can set up a suppression. A suppression silences the notifications from an alarm or a set of alarms for a given amount of time. A single suppression can apply to any number of alarms. You can define the alarms to which it applies at any of several granularity levels by providing a list of entity IDs, a list of entity ID/check ID pairs, a list of entity ID/alarm ID pairs, or a list of notification plan IDs. In the suppression log, you can view records of when an alert would have been sent but was instead suppressed. For more details and examples, see Section 5.16, “Suppressions”.
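
The granularity levels described above can be pictured with a small in-memory model. The following Python sketch is an assumption-laden illustration: the class names AlarmRef and Suppression, their field names, the sample Ids, and the use of epoch seconds for the time window are all hypothetical and do not reflect the actual request format, which is covered in the Suppressions section.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AlarmRef:
    """Hypothetical handle describing where an alarm sits."""
    entity_id: str
    check_id: str
    alarm_id: str
    notification_plan_id: str


@dataclass
class Suppression:
    """Hypothetical model of a suppression's time window and scope."""
    start: int  # inclusive start, in seconds since the epoch (assumed unit)
    end: int    # exclusive end, in seconds since the epoch (assumed unit)
    entity_ids: set = field(default_factory=set)
    check_pairs: set = field(default_factory=set)   # (entity_id, check_id)
    alarm_pairs: set = field(default_factory=set)   # (entity_id, alarm_id)
    notification_plan_ids: set = field(default_factory=set)

    def silences(self, alarm, now):
        """Return True if this suppression covers the alarm at time `now`."""
        if not (self.start <= now < self.end):
            return False
        return (
            alarm.entity_id in self.entity_ids
            or (alarm.entity_id, alarm.check_id) in self.check_pairs
            or (alarm.entity_id, alarm.alarm_id) in self.alarm_pairs
            or alarm.notification_plan_id in self.notification_plan_ids
        )


if __name__ == "__main__":
    # Made-up Ids; the suppression targets one entity ID/check ID pair.
    alarm = AlarmRef("enAAA111", "chBBB222", "alCCC333", "npDDD444")
    maintenance = Suppression(start=1000, end=2000,
                              check_pairs={("enAAA111", "chBBB222")})
    print(maintenance.silences(alarm, now=1500))
```

Matching on any one of the four scopes is enough to silence the alarm in this sketch, which mirrors the idea that a single suppression can reach alarms at several levels of granularity.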

 2.2. How cloud monitoring works

Cloud Monitoring helps you keep a keen eye on all of your resources, from web sites to web servers, routers, load balancers, and more. Here is an overview of the Monitoring workflow:

  1. You create an entity to represent the resource you want to monitor. For example, the entity might represent a web site.

  2. You attach a monitoring check to the entity. For example, you could use the PING check to monitor your web site's public IP address.

    You can run your checks from multiple monitoring zones to provide redundant monitoring and voting logic to avoid noisy alarms.

  3. You create notifications and notification plans. A notification lets you define an action that Cloud Monitoring uses to communicate with you when the state of your resource changes. Notification plans allow you to organize a set of several notifications, or actions.

    For example, you can define a notification that specifies an email address to which Cloud Monitoring will send a message when a specified condition is met.

  4. You define one or more alarms for each check. An alarm lets you specify trigger conditions for the various metrics returned by the check. When a specific condition is met, the alarm is triggered and your notification plan is put into action.

    For example, your alarm can trigger when a PING response time exceeds a configured value. When the response time exceeds that value, the alarm can send you an email or a webhook to a URL through its notification plan.
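
The workflow above can be sketched as a few small Python objects. This is a toy model under stated assumptions, not the Cloud Monitoring API: the class names NotificationPlan and Alarm, the function run_once, the metric name response_time_ms, the state strings, and the address ops@example.com are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NotificationPlan:
    """Maps an alarm state to the notification targets for that state."""
    by_state: dict = field(default_factory=dict)  # state -> list of targets

    def targets_for(self, state):
        return self.by_state.get(state, [])


@dataclass
class Alarm:
    """Hypothetical threshold alarm on one metric returned by a check."""
    metric: str
    threshold: float
    plan: NotificationPlan

    def evaluate(self, metrics):
        """Return 'CRITICAL' if the metric exceeds the threshold, else 'OK'."""
        return "CRITICAL" if metrics[self.metric] > self.threshold else "OK"


def run_once(alarm, metrics, previous_state):
    """Evaluate the alarm and return (state, targets to notify).

    Targets are returned only when the state differs from previous_state.
    """
    state = alarm.evaluate(metrics)
    targets = alarm.plan.targets_for(state) if state != previous_state else []
    return state, targets


if __name__ == "__main__":
    plan = NotificationPlan({"CRITICAL": ["ops@example.com"],
                             "OK": ["ops@example.com"]})
    alarm = Alarm(metric="response_time_ms", threshold=500.0, plan=plan)
    # A slow PING result moves the alarm from OK to CRITICAL.
    print(run_once(alarm, {"response_time_ms": 750.0}, previous_state="OK"))
```

The sketch notifies only on a change of state, which follows the description of a notification as an action taken when the state of your resource changes; a Warning state is left out to keep the example short.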

 2.3. How the monitoring agent works

Cloud Monitoring also provides the optional Monitoring Agent, which you install on the servers you want to monitor. While Cloud Monitoring can poll your servers from multiple data centers, the agent gathers information locally on the server. The agent gathers:

  • Host information regarding network configuration, process tables, and disks to stay current with frequent system configuration changes.

  • Host metrics such as swap, CPU, disk, filesystem, and network device usage.

Some examples of agent checks include:

  • Memory

  • CPU

  • Disk

  • Network

  • Custom (user-definable plug-ins able to monitor any process or statistic on a server or from an application)

The Install and Configure section tells you how to get the Monitoring Agent up and running.

 2.3.1. Automatic in place agent upgrades

As of version 1.1.0-15, all agents include automatic upgrades by default.

You may opt out of automatic agent upgrades by adding the line monitoring_update disabled to the agent configuration file, rackspace-monitoring-agent.cfg. When you install the agent, this configuration file is created in the /etc directory on Linux systems or, on Windows systems, in the c:\ProgramData\Rackspace Monitoring\config\ directory.
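
If you manage many servers, you might want to audit which ones have opted out. The Python sketch below checks configuration text for the opt-out line. The function name auto_upgrade_disabled and the placeholder setting in the sample are hypothetical, and tolerating extra whitespace is an assumption of this sketch; the agent's own parser may be stricter or more lenient.

```python
def auto_upgrade_disabled(config_text):
    """Return True if the text contains a `monitoring_update disabled` line.

    Leading, trailing, and repeated whitespace is ignored (an assumption).
    """
    for line in config_text.splitlines():
        if line.split() == ["monitoring_update", "disabled"]:
            return True
    return False


if __name__ == "__main__":
    # Placeholder content; only the monitoring_update line comes from this guide.
    sample = "some_other_setting value\nmonitoring_update disabled\n"
    print(auto_upgrade_disabled(sample))
```

To check a real installation, read rackspace-monitoring-agent.cfg from the directory given above and pass its contents to the function.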

If you do not opt out of automatic upgrades, whenever there is an agent upgrade, the Rackspace Cloud Monitoring system will send a command to the agent to upgrade.

Things to know about automatic agent upgrades:

  • By installing the Auto-Upgrades agent you agree to this legal notice from Rackspace: "The Rackspace Cloud Monitoring Agent release (number, date) and later includes an Automatic Upgrade feature. The monitoring agent will download and install Rackspace provided updates from time to time. By using this Service, you agree to receive such updates as part of your use of the Services. Instructions for opting out of in place updates are included in the product documentation for Cloud Monitoring."

  • In place upgrades only operate on agents that are online. If your agent is offline, it won't auto-upgrade. When it comes back online, it will upgrade automatically.

  • In place upgrades do not affect package managers except on Windows systems, where the in place upgrade installs the new MSI causing the Windows installation to be upgraded as well. Rackspace recommends package upgrades of your operating system on a regular basis.

  • If you opt out and later want to opt in to automatic in place agent upgrades, remove the monitoring_update disabled line that you added from the rackspace-monitoring-agent.cfg file.

  • The agent will restart automatically after the in place upgrade. However, two system init flags can affect the agent shutdown and startup: --exit-on-upgrade and --restart-sysv-on-upgrade. If a system is on upstart or systemd, the agent uses --exit-on-upgrade. If the system uses a SysV (System V) based init system, the agent uses --restart-sysv-on-upgrade.

  • You are not notified when the agent automatically upgrades. However, you can watch the rackspace-monitoring-agent repository for updates.

  • If the agent upgrade fails, there is no notification or error message, but there will be a log entry.
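
The relationship between the init system and the upgrade flag noted in the list above can be written as a simple lookup. In this Python sketch the function name upgrade_flag and the keys "upstart", "systemd", and "sysv" are labels chosen for the example; only the two flag names come from this guide, and the sketch does not detect which init system a host actually runs.

```python
# Flag names are taken from this guide; the dictionary keys are assumed labels.
_UPGRADE_FLAGS = {
    "upstart": "--exit-on-upgrade",
    "systemd": "--exit-on-upgrade",
    "sysv": "--restart-sysv-on-upgrade",
}


def upgrade_flag(init_system):
    """Return the upgrade-related flag associated with the given init system."""
    try:
        return _UPGRADE_FLAGS[init_system.lower()]
    except KeyError:
        raise ValueError(f"unknown init system: {init_system!r}") from None


if __name__ == "__main__":
    for name in ("upstart", "systemd", "sysv"):
        print(name, upgrade_flag(name))
```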


