The Check type and fields reference provides details about the remote and agent check types supported by the Rackspace Monitoring service.
Note
Most check types include some example metrics. These examples can help you write effective alarm criteria.
Remote check types
Rackspace Monitoring supports the following remote check types.
- remote.dns
- remote.ftp-banner
- remote.http
- remote.imap-banner
- remote.mssql-banner
- remote.mysql-banner
- remote.ping
- remote.pop3-banner
- remote.postgresql-banner
- remote.smtp-banner
- remote.smtp
- remote.ssh
- remote.tcp
- remote.telnet-banner
remote.dns
The remote.dns check runs a DNS query against a given target. Use this check to verify that a DNS server is functioning, for example by ensuring that it publishes the domains you expect it to publish.
Field | Description | Validation |
---|---|---|
query | Specifies the DNS query. | String, valid hostname |
record_type | Specifies the DNS record type. | String matching the regex /^(A|AAAA|TXT|MX|SOA|CNAME|PTR|NS|MB|MD|MF|MG|MR)$/ |
port | Specifies the port number. The default is 53. | Optional, whole number (may be zero padded), must be an integer between 1-65535 inclusive |
Metric | Description | Type |
---|---|---|
answer | The list of space-separated IP addresses for the specified name resolution. | String |
rtt | The roundtrip time to execute a remote.dns check. | Double |
ttl | The time to live (TTL) of the DNS response. | Uint32 |
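The validation rules in the Field table above can be checked on the client before a check is submitted. The following is a minimal Python sketch, not part of the Rackspace Monitoring API: the helper name `validate_dns_details` and the shape of the `details` dictionary are assumptions made for illustration, and the hostname test is deliberately simplified to "non-empty string".

```python
import re

# Pattern copied from the record_type validation rule in the Field table.
RECORD_TYPE_RE = re.compile(r"^(A|AAAA|TXT|MX|SOA|CNAME|PTR|NS|MB|MD|MF|MG|MR)$")


def validate_dns_details(details):
    """Return a list of problems found in a remote.dns details dict.

    Hypothetical helper for illustration only; the service performs its
    own validation when the check is created.
    """
    problems = []
    query = details.get("query")
    if not isinstance(query, str) or not query:
        problems.append("query must be a non-empty hostname string")
    if not RECORD_TYPE_RE.fullmatch(str(details.get("record_type", ""))):
        problems.append("record_type is not a supported DNS record type")
    # port is optional; 53 is the documented default. Zero padding is allowed.
    port = str(details.get("port", 53))
    if not re.fullmatch(r"[0-9]+", port) or not 1 <= int(port) <= 65535:
        problems.append("port must be an integer between 1 and 65535")
    return problems
```

An empty list means the three documented rules passed; it does not guarantee that the service accepts the check.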
remote.ftp-banner
The remote.ftp-banner check will attempt to connect to an FTP server and verify that it responds to the connection.
Field | Description | Validation |
---|---|---|
port | Specifies the port number. The default is 21. | This field is optional. Must be a whole number (may be zero padded). This value must be an integer between 1-65535 inclusive |
Metric | Description | Type |
---|---|---|
banner | The string sent from the server on connect | String |
banner_match | The matched string from the banner_match regular expression specified during check creation. | String |
body_match | The string representing the body match specified in a remote.ftp-banner check. | String |
duration | The time it took to finish executing the check in milliseconds. | Uint32 |
tt_body | The time to the body measured in milliseconds. | Uint32 |
tt_connect | The time to connect measured in milliseconds. | Uint32 |
tt_firstbyte | The time to first byte measured in milliseconds. | Uint32 |
remote.http
The remote.http check will try to connect to the server and retrieve the specified URL using the specified method, optionally with the password and user for authentication, using SSL, and checking the body with a regex. This can be used to test that a web application running on a server is responding without generating error messages. It can also test if the SSL certificate is valid.
Note: The maximum size of the content returned in a remote.http check is 500k, with overhead and compression taken into account. This limitation helps monitoring remain responsive.
Field | Description | Validation |
---|---|---|
url | Specifies the target URL. | String between 1 and 8096 characters long |
auth_password | Optional auth password | Optional. String between 1 and 255 characters long |
auth_user | Optional auth user | Optional. String between 1 and 255 characters long |
body | Body match regular expression used to run against HTTP response content and generate metric body_match (see Metrics table below). Body is limited to 100k and match is truncated to 80 characters. | Optional. String between 1 and 255 characters long |
body_matches | A map of key/regular-expression pairs used to run against HTTP response content and generate one metric body_match_<key> for each key/regular-expression pair (see Metrics table below). Body is limited to 100k and match is truncated to 80 characters. | Optional. Hash [String,String between 1 and 50 characters long, String matching the regex /^[-_ a-z0-9]+$/i: String,String between 1 and 255 characters long]. Array or object with number of items between 0 and 4. |
follow_redirects | Follow redirects (default:true) | Optional. Boolean. |
headers | Arbitrary headers which are sent with the request. | Optional. Hash [String,String between 1 and 50 characters long: String,String between 1 and 50 characters long]. Array or object with number of items between 0 and 10. A value which is not one of: content-length, user-agent, host, connection, keep-alive, transfer-encoding, upgrade. |
method | HTTP method. The default is GET. | Optional. String. One of (HEAD, GET, POST, PUT, DELETE, INFO) |
payload | Specify a request body (limited to 1024 characters). If a redirect is set, the payload is only sent to the first location. | Optional. String between 1 and 1024 characters long |
Note
When you set up a check for a website and the check always returns unknown content-encoding, the cause is the HTTP body check limit of 100k. This limit is the amount of space available on the monitoring pollers (where the site is checked from). If the HTTP(S) check requires more than 100k, only the first 100k can be checked.
If the pages use compression, such as the compress or gzip Content-Encoding, then the full compressed page must be less than or equal to 100k, because the full page must be downloaded and uncompressed before the check can be verified.
This is also the reason why you can only check against strings within the first 100k of the web page.
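To make the effect of these limits concrete, the following Python sketch imitates a body match that only sees the first 100k of a page and truncates the match to 80 characters. It is an illustration under stated assumptions, not the poller's implementation: "100k" is read here as 100 × 1024 bytes, the body is decoded as UTF-8, and the function name `body_match` is hypothetical.

```python
import re

BODY_LIMIT = 100 * 1024  # assumption: "100k" interpreted as 100 KiB
MATCH_LIMIT = 80         # documented truncation length for a match


def body_match(body: bytes, pattern: str) -> str:
    """Search only the first BODY_LIMIT bytes and truncate the match.

    Illustrative only; returns an empty string when nothing matches.
    """
    text = body[:BODY_LIMIT].decode("utf-8", errors="ignore")
    found = re.search(pattern, text)
    return found.group(0)[:MATCH_LIMIT] if found else ""
```

A string that first appears after the limit is never seen by the regular expression, which is why a body match on text near the bottom of a large page produces no match.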
Metric | Description | Type |
---|---|---|
body_match | The string matched in the HTTP response content by the regular expression specified in the body attribute of the check. | String |
body_match_<key> | One metric is generated for each key specified in the body_matches check attribute. For example, a body_matches value of {"register":"Register Now!", "contact":"Contact Us"} generates two metrics: body_match_register and body_match_contact. | String |
bytes | The number of bytes returned from a response payload. | Int32 |
cert_end | The absolute timestamp in seconds for the certificate expiration. This is only available when performing a check on an HTTPS server. | Uint32 |
cert_end_in | The relative time in seconds until certificate expiration. This is only available when performing a check on an HTTPS server. | Int32 |
cert_error | A string describing a certificate error in our validation. This is only available when performing a check on an HTTPS server. | String |
cert_issuer | The issuer string for the certificate. This is only available when performing a check on an HTTPS server. | String |
cert_start | The absolute timestamp of the issue of the certificate. This is only available when performing a check on an HTTPS server. | Uint32 |
cert_subject | The subject of the certificate. This is only available when performing a check on an HTTPS server. | String |
cert_subject_alternative_names | The alternative name for the subject of the certificate. This is only available when performing a check on an HTTPS server. (See an example alarm following this table.) | String |
code | The status code returned. | String |
duration | The time it took to finish executing the check in milliseconds. | Uint32 |
truncated | The number of bytes that the result was truncated by. | Uint32 |
tt_connect | The time to connect measured in milliseconds. | Uint32 |
tt_firstbyte | The time to first byte measured in milliseconds. | Uint32 |
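The relationship between the body_matches field and the metrics it generates can be sketched in Python as follows. The helper `body_match_metric_names` is hypothetical and exists only for illustration; it applies the key rules from the Validation column (1 to 50 characters, letters, digits, hyphen, underscore, or space, at most 4 entries) and derives one metric name per key.

```python
import re

# Key rule from the Validation column of the body_matches field.
KEY_RE = re.compile(r"^[-_ a-z0-9]+$", re.IGNORECASE)


def body_match_metric_names(body_matches: dict) -> list:
    """Return the metric names that a body_matches map would produce.

    Hypothetical helper; raises ValueError when a documented rule is broken.
    """
    if len(body_matches) > 4:
        raise ValueError("body_matches accepts at most 4 entries")
    for key in body_matches:
        if not 1 <= len(key) <= 50 or not KEY_RE.fullmatch(key):
            raise ValueError(f"invalid body_matches key: {key!r}")
    return [f"body_match_{key}" for key in body_matches]
```

Each returned name is what an alarm would reference, for example `metric['body_match_register']`.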
Note
The following is an example alarm for cert_subject_alternative_names, where you would replace example.com with an expected host name on the certificate’s SAN list:
if (metric['cert_subject_alternative_names'] nregex '.*example.com.*') {
return new AlarmStatus(CRITICAL, 'Missing expected SAN');
}
remote.imap-banner
The remote.imap-banner check will attempt to connect to an IMAP server and verify that it responds to the connection.
Field | Description | Validation |
---|---|---|
port | Port number (default: 143) | Optional. Whole number (may be zero padded). Integer between 1-65535 inclusive. |
ssl | Enable SSL | Optional. Boolean. |
remote.mssql-banner
The remote.mssql-banner check will attempt to connect to a Microsoft SQL database server and verify that it is accepting connections.
Field | Description | Validation |
---|---|---|
port | Port number (default: 1433) | Optional. Whole number (may be zero padded). Integer between 1-65535 inclusive. |
ssl | Enable SSL | Optional. Boolean. |
remote.mysql-banner
The remote.mysql-banner check will attempt to connect to a MySQL database server and verify that it is accepting connections.
Field | Description | Validation |
---|---|---|
port | Port number (default: 3306) | Optional. Whole number (may be zero padded). Integer between 1-65535 inclusive. |
ssl | Enable SSL | Optional. Boolean. |
remote.ping
The remote.ping check will attempt to ping a server.
Field | Description | Validation |
---|---|---|
count | Number of pings to send within a single check. | This field is optional. Must be a whole number (may be zero padded). This value must be an integer between 1-15 inclusive |
Metric | Description | Type |
---|---|---|
available | The whole number representing the percent of pings that returned back for a remote.ping check. | Double |
average | The average response time in milliseconds for all ping packets sent out and later retrieved. | Double |
count | The number of pings (ICMP packets) sent. | Int32 |
maximum | The maximum roundtrip time in milliseconds of an ICMP packet. | Double |
minimum | The minimum roundtrip time in milliseconds of an ICMP packet. | Double |
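The following Python sketch shows how the remote.ping metrics relate to one another, given the round-trip times of the replies that came back and the number of packets sent. It is not the poller's code: the function name `summarize_pings` is hypothetical, and reporting 0.0 for the timing metrics when no reply arrives is an assumption of this example.

```python
def summarize_pings(rtts_ms, sent):
    """Derive remote.ping-style metrics from reply round-trip times.

    rtts_ms: round-trip times in milliseconds of the replies received.
    sent:    number of ICMP packets sent (the check's count field).
    """
    if not 1 <= sent <= 15:
        raise ValueError("count must be between 1 and 15")
    received = len(rtts_ms)
    return {
        "available": 100.0 * received / sent,  # percent of pings answered
        "average": sum(rtts_ms) / received if received else 0.0,
        "count": sent,
        "maximum": max(rtts_ms, default=0.0),
        "minimum": min(rtts_ms, default=0.0),
    }
```

For example, three replies out of five packets sent gives an available value of 60, which an alarm could compare against a threshold.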
remote.pop3-banner
The remote.pop3-banner check will attempt to connect to a POP3 mailbox server and verify that it responds to the connection.
Field | Description | Validation |
---|---|---|
port | Port number (default: 110) | Optional. Whole number (may be zero padded). Integer between 1-65535 inclusive. |
ssl | Enable SSL | Optional. Boolean. |
remote.postgresql-banner
The remote.postgresql-banner check will attempt to connect to a PostgreSQL database server and verify that it is accepting connections.
Field | Description | Validation |
---|---|---|
port | Port number (default: 5432) | Optional. Whole number (may be zero padded). Integer between 1-65535 inclusive. |
ssl | Enable SSL | Optional. Boolean. |
remote.smtp-banner
The remote.smtp-banner check will attempt to connect to an SMTP mail server and verify that a HELO/EHLO is received.
Field | Description | Validation |
---|---|---|
port | Port number (default: 25) | Optional. Whole number (may be zero padded). Integer between 1-65535 inclusive. |
ssl | Enable SSL | Optional. Boolean. |
Metric | Description | Type |
---|---|---|
banner | The string sent from the server on connect. | String |
banner_match | The matched string from the banner_match regular expression specified during check creation. | String |
bytes | The number of bytes returned from a response payload. | Int32 |
cert_end | The absolute timestamp in seconds for the certificate expiration. This is only available when performing a check on an HTTPS server. | Uint32 |
cert_end_in | The relative time in seconds until certificate expiration. This is only available when performing a check on an HTTPS server. | Int32 |
cert_error | A string describing a certificate error in our validation. This is only available when performing a check on an HTTPS server. | String |
cert_issuer | The issuer string for the certificate. This is only available when performing a check on an HTTPS server. | String |
cert_start | The absolute timestamp of the issue of the certificate. This is only available when performing a check on an HTTPS server. | Uint32 |
cert_subject | The subject of the certificate. This is only available when performing a check on an HTTPS server. | String |
cert_subject_alternative_names | The alternative name for the subject of the certificate. This is only available when performing a check on an HTTPS server. (See an example alarm following this table.) | String |
duration | The time it took to finish executing the check in milliseconds. | Uint32 |
tt_connect | The time to connect measured in milliseconds. | Uint32 |
tt_firstbyte | The time to first byte measured in milliseconds. | Uint32 |
Note
The following is an example alarm for cert_subject_alternative_names, where you would replace example.com with an expected host name on the certificate’s SAN list:
if (metric['cert_subject_alternative_names'] nregex '.*example.com.*') {
return new AlarmStatus(CRITICAL, 'Missing expected SAN');
}
remote.smtp
The remote.smtp check will attempt to connect to an SMTP mail server and send an email from the ‘from’ parameter to the ‘to’ parameter, with a payload specified by the ‘payload’ parameter, setting the EHLO from host to the value in ‘ehlo’.
Field | Description | Validation |
---|---|---|
ehlo | Specifies the EHLO parameter. | Optional. String between 1 and 255 characters long. |
from | Specifies the From parameter. | Optional. String between 1 and 255 characters long. |
payload | Specifies the payload. | Optional. String between 1 and 1024 characters long. |
port | Specifies the port number. | Optional. Whole number (may be zero padded). Integer between 1-65535 inclusive. |
starttls | Specifies whether the connection should be upgraded to TLS/SSL. | Optional. Boolean. |
to | Specifies the To parameter. If this field is blank, a “quit” is issued before sending a to line, and the connection is terminated. | Optional. String between 1 and 255 characters long. |
remote.ssh
The remote.ssh check will attempt to SSH to a target.
Field | Description | Validation |
---|---|---|
port | Specifies the port number. The default is 22. | This field is optional. Must be a whole number (may be zero padded). This value must be an integer between 1-65535 inclusive |
Metric | Description | Type |
---|---|---|
duration | Specifies the time it took to finish executing the check in milliseconds. | Uint32 |
fingerprint | Specifies the ssh fingerprint used to verify identity. | String |
remote.tcp
The remote.tcp check will attempt to connect to a host and port, and optionally issue a banner match to ensure that the service is responding as specified. This can be used to test services that are not covered by the existing HTTP, SMTP, SSH, MySQL, etc. checks.
Field | Description | Validation |
---|---|---|
port | Specifies the port number. | Whole number (may be zero padded). Integer between 1-65535 inclusive. |
banner_match | Specifies the banner match regex. | Optional. String between 1 and 255 characters long. |
body_match | Specifies the body match regex. Key/Values are captured when matches are specified within the regex. Note: Maximum body size is 1024 bytes. | Optional. String between 1 and 255 characters long. |
send_body | Send a body. If a banner is provided the body is sent after the banner is verified. | Optional. String between 1 and 1024 characters long. |
ssl | Specifies whether SSL is enabled. | Optional. Boolean. |
Metric | Description | Type |
---|---|---|
banner | Specifies the string that is sent from the server on connect. | String |
banner_match | Specifies the matched string from the banner_match regular expression specified during check creation. | String |
duration | Specifies the time it took to finish executing the check in milliseconds. | Uint32 |
tt_connect | Specifies the time to connect measured in milliseconds. | Uint32 |
tt_firstbyte | Specifies the time to first byte measured in milliseconds. | Uint32 |
remote.telnet-banner
The remote.telnet-banner check will attempt to connect to a Telnet (or similar protocol) server and verify that an appropriate banner is received.
Field | Description | Validation |
---|---|---|
port | Specifies the port number. The default is 23. | Optional. Whole number (may be zero padded). Integer between 1-65535 inclusive. |
banner_match | Specifies the banner match check. | Optional. String between 1 and 255 characters long. |
ssl | Specifies whether SSL is enabled. | Optional. Boolean. |
Agent check types
Rackspace Monitoring supports the following agent check types.
- agent.apache check
- agent.cpu
- agent.disk
- agent.filesystem
- agent.filesystem_state
- agent.load_average
- agent.memory
- agent.mysql
- agent.network
- agent.mssql_database
- agent.mssql_buffer_manager
- agent.mssql_sql_statistics
- agent.mssql_plan_cache
- agent.mssql_memory_manager
- agent.mssql_version
- agent.plugin
- agent.redis
- agent.windows_perfos
agent.apache check
The agent.apache check retrieves Apache HTTP server metrics.
Field | Description | Validation |
---|---|---|
timeout | Specifies the plugin execution timeout in milliseconds. | Optional. Integer. |
url | Specifies the URL. Defaults to http://127.0.0.1/server-status. | Optional. URL. |
Metric | Description | Type |
---|---|---|
busy_workers | Specifies the number of workers serving requests | Int64 |
bytes_per_request | The average number of bytes per request | Int64 |
bytes_per_second | The average number of bytes served per second | Int64 |
closing | The number of workers closing the connection | Int64 |
cpu_load | Total percentage of CPU used by workers | Double |
dns | The number of workers performing DNS lookup | Int64 |
gracefully_finishing | The number of workers gracefully finishing | Int64 |
idle | The number of idle cleanup workers | Int64 |
idle_workers | The number of idle workers | Int64 |
keepalive | The number of workers kept alive (reading) | Int64 |
logging | The number of workers logging | Int64 |
open | The number of workers with no current process | Int64 |
reading | The number of workers reading the request | Int64 |
requests_per_second | The number of requests per second | Int64 |
sending | The number of workers sending a reply | Int64 |
starting | The number of workers starting up | Int64 |
total_access | Total number of accesses served | Int64 |
total_kbytes | Total kilobytes served | Int64 |
uptime | Time since the last start/restart in milliseconds | Int64 |
waiting | The number of workers waiting for connection | Int64 |
agent.cpu
The agent.cpu check will attempt to measure the usage of the CPU on a host.
Attributes
No fields are present for this particular check type.
Metric | Description | Type |
---|---|---|
idle_percent_average | Recent percentage of CPU time spent idle. | Double |
irq_percent_average | Recent percentage of CPU time spent handling hardware interrupts. | Double |
max_cpu_usage | Recent percentage utilization of the most-utilized CPU. This is useful to detect when some CPUs are pegged while others are idle. | Double |
min_cpu_usage | Recent percentage utilization of the least-utilized CPU. This is useful to detect when some CPUs are pegged while others are idle. | Double |
stolen_percent_average | Recent percentage of CPU time spent waiting for the CPU to service other virtual CPUs. | Double |
sys_percent_average | Recent percentage of CPU time utilized by kernel mode processes. | Double |
usage_average | Recent percentage of CPU time utilized by all processes. | Double |
user_percent_average | Recent percentage of CPU time utilized by user mode processes. | Double |
wait_percent_average | Recent percentage of CPU time utilized by processes in a “wait” state. | Double |
agent.disk
The agent.disk check exposes disk related metrics (service time, wait time, etc.).
Field | Description | Validation |
---|---|---|
target | The disk to check (for example, ‘/dev/xvda1’). | String between 1 and 512 characters long |
Metric | Description | Type |
---|---|---|
queue | Measured in seconds, this is the current disk queue length, which is an instantaneous measurement of the I/O queue for the given disk/partition. | Int64 |
qtime | Measured in milliseconds, this is the weighted number of milliseconds spent doing I/Os. This field is incremented at each I/O start, I/O completion, I/O merge, or read of these stats by the number of I/Os in progress times the number of milliseconds spent doing I/O since the last update of this field. This can provide an easy measure of both I/O completion time and the backlog that might be accumulating. | Int64 |
read_bytes | The number of physical disk bytes read, the prefix / will change depending on the mount points discovered. | Int64 |
reads | The number of physical disk reads, the prefix / will change depending on the mount points discovered. | Int64 |
rtime | The amount of time spent reading, the prefix / will change depending on the mount points discovered. | Int64 |
write_bytes | The number of physical disk bytes written, the prefix / will change depending on the mount points discovered. | Int64 |
writes | The number of physical disk writes, the prefix / will change depending on the mount points discovered. | Int64 |
wtime | The amount of time spent writing, the prefix / will change depending on the mount points discovered. | Int64 |
agent.filesystem
The agent.filesystem check exposes file system related metrics (free space, used space, etc.)
Field | Description | Validation |
---|---|---|
target | The mount point to check, for example /var or C:\ | String between 1 and 512 characters long. |
Metric | Description | Type |
---|---|---|
avail | Available space on the filesystem in kilobytes for the current user, which is root, that is running the agent. | Int64 |
free | Free space available on the filesystem in kilobytes including reserved space. This is calculated as number of free file blocks x block size | Int64 |
options | The options used to mount the device to the filesystem. Includes the rw flag, which indicates that the device is in read/write mode. | Int64 |
total | Total space on the filesystem, in kilobytes, including reserved space. This is calculated as number of total file blocks x block size | Int64 |
used | Used space on the filesystem, in kilobytes. This number does not include the reserved space. This is calculated as total - free | Int64 |
files | Number of inodes on the filesystem. | Int64 |
free_files | Number of free inodes on the filesystem. | Int64 |
Note
The reserved space only applies to Linux systems. It is the space saved for important root processes and possible rescue actions. In some systems the reserved space can be used for fragmentation allocation. For more information about Ext3 and Ext4: https://www.redhat.com/archives/ext3-users/2009-January/msg00026.html.
The files and free_files metrics only apply to Linux systems.
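The arithmetic in the metric descriptions above (used is calculated as total - free) can be turned into a percentage for alarm thresholds. The following Python sketch is an illustration only: the function name `filesystem_usage` and the sample values in the usage note are assumptions, and because free includes reserved space, the resulting percentage can differ from what an unprivileged user observes.

```python
def filesystem_usage(total_kb, free_kb):
    """Compute used space and percent used from agent.filesystem metrics.

    total_kb and free_kb are the 'total' and 'free' metrics in kilobytes.
    """
    if total_kb <= 0:
        raise ValueError("total must be positive")
    used_kb = total_kb - free_kb  # documented: used = total - free
    return {"used": used_kb, "used_percent": 100.0 * used_kb / total_kb}
```

With a hypothetical total of 1000 kilobytes and 250 kilobytes free, the sketch reports 750 kilobytes used, or 75 percent.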
agent.filesystem_state
The agent.filesystem_state check exposes filesystem metrics for read-write/read-only system mounts.
No fields are present for this particular check type.
Metric | Description | Type |
---|---|---|
total_ro | Total number of filesystems mounted read-only. | Int64 |
total_rw | Total number of filesystems mounted read-write. | Int64 |
devices_ro | Comma delimited list of devices mounted read-only. | String |
devices_rw | Comma delimited list of devices mounted read-write. | String |
agent.load_average
The agent.load_average check attempts to measure the UNIX style load average on a host.
For more information about the commands used to get the load average, see Check the System Load on Linux.
Attributes
No fields are present for this particular check type.
Metric | Description | Type |
---|---|---|
1m | One minute load average. | Double |
5m | Five minute load average. | Double |
15m | Fifteen minute load average. | Double |
agent.memory
No fields are present for this particular check type.
The memory available to the system is used in three different ways:
- Used by the processes running in the system; this value is reported in the “actual_used” metric.
- Used by the kernel, this value is not returned from the check but can be deduced.
- Not used by either the running processes or kernel, this value is under “free” metric.
For convenience, the system returns the value of used/free memory for the case of including kernel and excluding kernel so that you don’t have to do the calculation in your head.
Metric | Description | Type |
---|---|---|
actual_free | The amount of free memory, ‘free’ plus kernel memory. | Int64 |
actual_used | The actual amount of used memory excluding kernel memory. | Int64 |
free | The amount of free memory not including kernel memory. | Int64 |
ram | The amount of RAM. | Int64 |
swap_free | The amount of free SWAP memory. | Int64 |
swap_page_in | The number of SWAP-in pages. | Int64 |
swap_page_out | The number of SWAP-out pages. | Int64 |
swap_total | The total amount of SWAP memory. | Int64 |
swap_used | The amount of used SWAP memory. | Int64 |
total | The total amount of memory. | Int64 |
used | The total amount of used memory, ‘actual_used’ plus kernel memory | Int64 |
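Kernel memory is not reported directly, but the descriptions above imply two ways to deduce it: used minus actual_used, and actual_free minus free. The following Python sketch computes both so they can be compared. The function name `kernel_memory` and the sample numbers in the usage note are made up for illustration; on a real host the two results may differ slightly depending on how the platform accounts for memory.

```python
def kernel_memory(metrics):
    """Deduce kernel memory from agent.memory metrics in two ways.

    metrics: dict with 'used', 'actual_used', 'free', and 'actual_free',
    all in the same unit the check reports.
    """
    from_used = metrics["used"] - metrics["actual_used"]  # used = actual_used + kernel
    from_free = metrics["actual_free"] - metrics["free"]  # actual_free = free + kernel
    return from_used, from_free
```

For hypothetical values used=6000, actual_used=4500, free=2000, and actual_free=3500, both calculations give 1500.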
agent.mysql
The agent.mysql check retrieves MySQL server metrics.
Note
Except for the replication.slave_running metric, all metrics starting with replication do not show up if there is no slave running.
If the libmysqlclient-dev package is not already present, you should install it on the host where the agent.mysql plug-in runs.
Field | Description | Validation |
---|---|---|
host | Specifies the MySQL server hostname (default: 127.0.0.1). | Optional. Valid hostname, IPv4 or IPv6 address |
mycnf | Specifies whether my.cnf should be loaded. | Optional. Boolean |
password | Specifies the server password. | Optional. String between 1 and 255 characters long |
port | Specifies the MySQL server port (default: 3306). | Optional. Integer between 1-65535 inclusive |
socket | Specifies the path to the domain socket. | Optional. String between 1 and 255 characters long |
timeout | Specifies the plugin execution timeout in milliseconds | Optional. Integer |
username | Specifies the username. | Optional. String between 1 and 16 characters long |
Metric | Description | Type |
---|---|---|
bytes_received | The number of bytes received from all clients. (statvar_Bytes_received) | Cumulative |
bytes_sent | The number of bytes sent to all clients. (statvar_Bytes_sent) | Cumulative |
core.aborted_clients | The number of connections that were aborted because the client died without closing the connection properly. (statvar_Aborted_clients) | Instantaneous |
core.connections | The number of connection attempts (successful or not) to the MySQL server. (statvar_Connections) | Cumulative |
core.queries | The number of statements executed by the server. (statvar_Queries) | Cumulative |
core.uptime | The number of seconds that the server has been up. (statvar_Uptime) | Instantaneous |
handler.commit | The number of internal COMMIT statements. (statvar_Handler_commit) | Cumulative |
handler.delete | The number of times that rows have been deleted from tables. (statvar_Handler_delete) | Cumulative |
handler.read_first | The number of times the first entry in an index was read. (statvar_Handler_read_first) | Cumulative |
handler.read_key | The number of requests to read a row based on a key. If this value is high, it is a good indication that your tables are properly indexed for your queries. (statvar_Handler_read_key) | Cumulative |
handler.read_next | The number of requests to read the next row in key order. This value is incremented if you are querying an index column with a range constraint or if you are doing an index scan. (statvar_Handler_read_next) | Cumulative |
handler.read_prev | The number of requests to read the previous row in key order. This read method is mainly used to optimize ORDER BY … DESC. (statvar_Handler_read_prev) | Cumulative |
handler.read_rnd | The number of requests to read a row based on a fixed position. This value is high if you are doing a lot of queries that require sorting of the result. You probably have a lot of queries that require MySQL to scan entire tables or you have joins that do not use keys properly. (statvar_Handler_read_rnd) | Cumulative |
handler.rollback | The number of requests for a storage engine to perform a rollback operation. (statvar_Handler_rollback) | Instantaneous |
handler.savepoint | The number of requests for a storage engine to place a savepoint. (statvar_Handler_savepoint) | Instantaneous |
handler.savepoint_rollback | The number of requests for a storage engine to roll back to a savepoint. (statvar_Handler_savepoint_rollback) | Instantaneous |
handler.update | The number of requests to update a row in a table. (statvar_Handler_update) | Cumulative |
handler.write | The number of requests to insert a row in a table. (statvar_Handler_write) | Cumulative |
innodb.buffer_pool_pages_data | The number of pages containing data (dirty or clean). (statvar_Innodb_buffer_pool_pages_data) | Instantaneous |
innodb.buffer_pool_pages_dirty | The number of pages currently dirty. (statvar_Innodb_buffer_pool_pages_dirty) | Instantaneous |
innodb.buffer_pool_pages_flushed | The number of buffer pool page-flush requests. (statvar_Innodb_buffer_pool_pages_flushed) | Instantaneous |
innodb.buffer_pool_pages_free | The number of free pages. (statvar_Innodb_buffer_pool_pages_free) | Instantaneous |
innodb.buffer_pool_pages_total | The total size of the buffer pool, in pages. (statvar_Innodb_buffer_pool_pages_total) | Instantaneous |
innodb.buffer_pool_read_requests | The number of logical read requests. (statvar_Innodb_buffer_pool_read_requests) | Cumulative |
innodb.buffer_pool_reads | The number of logical reads that InnoDB could not satisfy from the buffer pool, and had to read directly from the disk. (statvar_Innodb_buffer_pool_reads) | Cumulative |
innodb.buffer_pool_size | The size in bytes of the memory buffer InnoDB uses to cache data and indexes of its tables. (sysvar_innodb_buffer_pool_size) | Instantaneous |
innodb.data_pending_fsyncs | The current number of pending fsync() operations. (statvar_Innodb_data_pending_fsyncs) | Instantaneous |
innodb.data_pending_reads | The current number of pending reads. (statvar_Innodb_data_pending_reads) | Instantaneous |
innodb.data_pending_writes | The current number of pending writes. (statvar_Innodb_data_pending_writes) | Instantaneous |
innodb.pages_created | The number of pages created. (statvar_Innodb_pages_created) | Cumulative |
innodb.pages_read | The number of pages read. (statvar_Innodb_pages_read) | Cumulative |
innodb.pages_written | The number of pages written. (statvar_Innodb_pages_written) | Cumulative |
innodb.row_lock_time | The total time spent in acquiring row locks, in milliseconds. (statvar_Innodb_row_lock_time) | Cumulative |
innodb.row_lock_time_avg | The average time to acquire a row lock, in milliseconds. (statvar_Innodb_row_lock_time_avg) | Instantaneous |
innodb.row_lock_time_max | The maximum time to acquire a row lock, in milliseconds. (statvar_Innodb_row_lock_time_max) | Instantaneous |
innodb.row_lock_waits | The number of times a row lock had to be waited for. (statvar_Innodb_row_lock_waits) | Cumulative |
innodb.rows_deleted | The number of rows deleted from InnoDB tables. (statvar_Innodb_rows_deleted) | Cumulative |
innodb.rows_inserted | The number of rows inserted into InnoDB tables. (statvar_Innodb_rows_inserted) | Cumulative |
innodb.rows_read | The number of rows read from InnoDB tables. (statvar_Innodb_rows_read) | Cumulative |
innodb.rows_updated | The number of rows updated in InnoDB tables. (statvar_Innodb_rows_updated) | Cumulative |
key.buffer_size | Index blocks for MyISAM tables are buffered and are shared by all threads. (sysvar_key_buffer_size) | Instantaneous |
max.connections | The maximum permitted number of simultaneous client connections. (sysvar_max_connections) | Instantaneous |
qcache.free_blocks | The number of free memory blocks in the query cache. (statvar_Qcache_free_blocks) | Instantaneous |
qcache.free_memory | The amount of free memory for the query cache. (statvar_Qcache_free_memory) | Instantaneous |
qcache.hits | The number of query cache hits. (statvar_Qcache_hits) | Cumulative |
qcache.inserts | The number of queries added to the query cache. (statvar_Qcache_inserts) | Cumulative |
qcache.lowmem_prunes | The number of queries that were deleted from the query cache because of low memory. (statvar_Qcache_lowmem_prunes) | Instantaneous |
qcache.not_cached | The number of noncached queries (not cacheable, or not cached due to the query_cache_type setting). (statvar_Qcache_not_cached) | Instantaneous |
qcache.queries_in_cache | The number of queries registered in the query cache. (statvar_Qcache_queries_in_cache) | Cumulative |
qcache.size | The amount of memory allocated for caching query results. (sysvar_query_cache_size) | Instantaneous |
qcache.total_blocks | The total number of blocks in the query cache. (statvar_Qcache_total_blocks) | Cumulative |
replication.exec_master_log_pos | The position in the current master binary log file to which the SQL thread has read and executed, marking the start of the next transaction or event to be processed. (show-slave-status.html). | Instantaneous |
replication.last_errno | The error number returned by the most recently executed statement. (show-slave-status.html). | Instantaneous |
replication.last_io_error | The error message of the most recent error that caused the I/O thread to stop (show-slave-status.html). | String |
replication.max_relay_log_size | If a write by a replication slave to its relay log causes the current log file size to exceed the value of this variable, the slave rotates the relay logs (closes the current file and opens the next one). (sysvar_max_relay_log_size) | Instantaneous |
replication.read_master_log_pos | The position in the current master binary log file up to which the I/O thread has read. (show-slave-status.html) | Instantaneous |
replication.relay_log_pos | The position in the current relay log file up to which the SQL thread has read and executed. (show-slave-status.html) | Instantaneous |
replication.seconds_behind_master | In essence, this field measures the time difference in seconds between the slave SQL thread and the slave I/O thread. (show-slave-status.html) | Instantaneous |
replication.slave_io_running | Whether the I/O thread is started and has connected successfully to the master. Internally, the state of this thread is represented by one of the following three values: MYSQL_SLAVE_NOT_RUN, MYSQL_SLAVE_RUN_NOT_CONNECT, MYSQL_SLAVE_RUN_CONNECT. (show-slave-status.html) | Boolean |
replication.slave_io_state | A copy of the State field of the SHOW PROCESSLIST output for the slave I/O thread. This tells you what the thread is doing: trying to connect to the master, waiting for events from the master, reconnecting to the master, and so on. (show-slave-status.html). | String |
replication.slave_open_temp_tables | The number of temporary tables that the slave SQL thread currently has open. If the value is greater than zero, it is not safe to shut down the slave. (statvar_Slave_open_temp_tables). | Instantaneous |
replication.slave_retried_transactions | The total number of times since startup that the replication slave SQL thread has retried transactions. (statvar_Slave_retried_transactions) | Instantaneous |
replication.slave_running | This is ON if this server is a replication slave that is connected to a replication master, and both the I/O and SQL threads are running; otherwise, it is OFF. (statvar_Slave_running) | String |
replication.slave_sql_running | Whether the SQL thread is started. (show-slave-status.html) | Boolean |
thread.cache_size | How many threads the server should cache for reuse. (sysvar_thread_cache_size) | Instantaneous |
threads.connected | The number of currently open connections. (statvar_Threads_connected) | Instantaneous |
threads.created | The number of threads created to handle connections. (statvar_Threads_created) | Cumulative |
threads.running | The number of threads that are not sleeping. (statvar_Threads_running) | Instantaneous |
agent.network
The agent.network check will attempt to measure the usage of network devices on a host.
Field | Description | Validation |
---|---|---|
target | The network device to check (for example, eth0). | String between 1 and 512 characters long |
Metric | Description | Type |
---|---|---|
rx_bytes | The number of bytes received over the interface. | Int64 |
rx_dropped | The number of packets received and subsequently dropped over the interface. | Int64 |
rx_errors | The number of errors received over the interface. | Int64 |
rx_packets | The number of packets received over the interface. | Int64 |
speed | The speed at which the bytes were transmitted over the interface. | Int64 |
tx_bytes | The number of bytes transmitted over the interface. | Int64 |
tx_dropped | The number of packets attempted transmitting and subsequently dropped over the interface. | Int64 |
tx_error | The number of errors while transmitting over the interface. | Int64 |
tx_packets | The number of packets transmitted over the interface. | Int64 |
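As a rough illustration of the validation rule for the target field, the following Python sketch builds the details for an agent.network check and rejects a target outside the 1 to 512 character range. The function name and the dictionary layout are assumptions made for this example; they are not taken from the monitoring service's API.

```python
def network_check_details(target):
    """Return a details dict for an agent.network check (hypothetical layout)."""
    if not isinstance(target, str):
        raise TypeError("target must be a string")
    # The field reference requires a string between 1 and 512 characters long.
    if not 1 <= len(target) <= 512:
        raise ValueError("target must be between 1 and 512 characters long")
    return {"target": target}
```

Calling network_check_details("eth0") returns a dictionary holding the device name, while an empty string raises ValueError.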
agent.mssql_database
The agent.mssql_database check returns metrics for a Microsoft SQL Server database.
Field | Description | Validation |
---|---|---|
db | MS SQL Server database name | String between 1 and 255 characters long |
hostname | MS SQL Server hostname | Optional. Valid hostname, IPv4 or IPv6 address |
password | MS SQL Server password | Optional. String between 1 and 255 characters long |
serverinstance | MS SQL Server instance to query | Optional. String between 1 and 255 characters long |
username | MS SQL Server username | Optional. String between 1 and 255 characters long |
agent.mssql_buffer_manager
The agent.mssql_buffer_manager check returns metrics for the Microsoft SQL Server buffer manager.
Field | Description | Validation |
---|---|---|
computer | MS SQL Server computer name | Optional. Valid hostname, IPv4 or IPv6 address |
serverinstance | MS SQL Server instance to query | Optional. String between 1 and 255 characters long |
agent.mssql_sql_statistics
The agent.mssql_sql_statistics check returns metrics for the Microsoft SQL Server SQL statistics.
Field | Description | Validation |
---|---|---|
computer | MS SQL Server computer name | Optional. Valid hostname, IPv4 or IPv6 address |
serverinstance | MS SQL Server instance to query | Optional. String between 1 and 255 characters long |
agent.mssql_plan_cache
The agent.mssql_plan_cache check returns metrics for the Microsoft SQL Server plan cache.
Field | Description | Validation |
---|---|---|
computer | MS SQL Server computer name | Optional. Valid hostname, IPv4 or IPv6 address |
serverinstance | MS SQL Server instance to query | Optional. String between 1 and 255 characters long |
agent.mssql_memory_manager
The agent.mssql_memory_manager check returns metrics for the Microsoft SQL Server memory manager.
Field | Description | Validation |
---|---|---|
computer | MS SQL Server computer name | Optional. Valid hostname, IPv4 or IPv6 address |
serverinstance | MS SQL Server instance to query | Optional. String between 1 and 255 characters long |
agent.mssql_version
The agent.mssql_version check returns version information for Microsoft SQL Server.
Field | Description | Validation |
---|---|---|
hostname | MS SQL Server hostname | Optional. Valid hostname, IPv4 or IPv6 address |
password | MS SQL Server password | Optional. String between 1 and 255 characters long |
serverinstance | MS SQL Server instance to query | Optional. String between 1 and 255 characters long |
username | MS SQL Server username | Optional. String between 1 and 255 characters long |
agent.plugin
The agent.plugin check will attempt to run a custom plugin on a host.
Custom plugins are simply executable files which report metrics via stdout. Plugins are placed on the server to be monitored at an installation path that depends on the operating system:
Operating System | Installation Path |
---|---|
Linux | /usr/lib/rackspace-monitoring-agent/plugins/ |
Windows (32-bit agent installed on a 64-bit system) | C:\Program Files (x86)\Rackspace Monitoring\plugins |
Windows (64-bit agent installed on a 64-bit system or 32-bit agent installed on a 32-bit system) | C:\Program Files\Rackspace Monitoring\plugins |
After the plugin has been installed on the server, create an agent.plugin check that specifies the name of the executable file so that the plugin can begin reporting metrics to the monitoring system, like any other check. If the plugin requires any command line arguments, you can specify them using the optional args array.
Field | Description | Validation |
---|---|---|
file | Name of the plugin file | String matching the regex //[a-zA-Z0-9.- _]+// |
args | Command-line arguments which are passed to the plugin | Optional. Array [Non-empty string]. Array or object with number of items between 0 and 10 |
timeout | Plugin execution timeout in milliseconds | Optional. Integer |
The metrics returned are defined in the plugin script. A plugin can send up to fifty unique metrics at a time.
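The field rules in the table above can be sketched in Python as follows. This is a minimal illustration, assuming a hypothetical helper named plugin_check_details and a plain dictionary layout; it checks the args and timeout constraints only and does not enforce the file name pattern.

```python
def plugin_check_details(file, args=None, timeout=None):
    """Return a details dict for an agent.plugin check (hypothetical layout)."""
    if not isinstance(file, str) or not file:
        raise ValueError("file must be a non-empty plugin file name")
    details = {"file": file}
    if args is not None:
        args = list(args)
        # The reference allows between 0 and 10 arguments.
        if len(args) > 10:
            raise ValueError("args accepts at most 10 items")
        # Each argument must be a non-empty string.
        if any(not isinstance(a, str) or a == "" for a in args):
            raise ValueError("each argument must be a non-empty string")
        details["args"] = args
    if timeout is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(timeout, bool) or not isinstance(timeout, int):
            raise ValueError("timeout must be an integer number of milliseconds")
        details["timeout"] = timeout
    return details
```

Optional fields are left out of the result when they are not supplied, which mirrors their optional status in the table.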
Community Plugin Repository
A curated repository of plugins created by Rackspace Monitoring users is available on GitHub. Contributions are welcome!
Note
The Rackspace Monitoring Agent is also capable of executing Cloudkick plugins, so if you are a Cloudkick user you can just drop in any existing plugin and it should just work.
Creating Custom Plugins
Creating custom plugins is as simple as writing a script that prints a status and up to fifty metrics to standard out. The format of the status line is:
status <status_string>
The status string should describe whether the check was able to successfully gather metrics. It could be as simple as “success” to indicate that metrics were successfully gathered. When an error occurs that prevents metrics from being gathered, plugins should print a status that describes the error, then should exit non-zero without printing any metric lines.
The status line can be followed by up to fifty metric lines. Each line is output in the following format:
metric <name> <type> <value>
The following descriptions provide information about parameter values.
Parameter | Description |
---|---|
name | The name of the metric. Spaces are not supported. The format is alpha numeric with colon (:), underscore (_) and dot (.) allowed. Example: memory_free . |
type | The metric can be any of the following types: int32 (signed 32-bit integer value), uint32 (unsigned 32-bit integer value), int64 (signed 64-bit integer value), uint64 (unsigned 64-bit integer value), double (floating point value), string (a string value). Note: the monitoring system records string metrics every time they change. String metrics are designed for recording an enumerated state which infrequently changes (for example an HTTP response code which is always 200 during normal operation). You should not store arbitrary, frequently changing values in a string metric. |
value | The value assigned to the metric. |
Putting it all together, the output of a plugin that has successfully executed might look something like:
status Turkey thermometer returned valid response
metric internal_temperature uint32 165
metric ambient_temperature uint32 325
If the plugin failed, it might print the following before exiting non-zero:
status Turkey thermometer not responding
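The output format described above can be produced by a short script. The following Python sketch is one possible shape for such a plugin, not a reference implementation: the read_thermometer function is a hypothetical stand-in for a real data source, and the helper names are assumptions made for this example.

```python
import re

# Alphanumeric characters plus colon, underscore, and dot, as described above.
METRIC_NAME = re.compile(r"[A-Za-z0-9:_.]+")
METRIC_TYPES = {"int32", "uint32", "int64", "uint64", "double", "string"}
MAX_METRICS = 50


def format_plugin_output(status, metrics):
    """Return one status line followed by one line per (name, type, value)."""
    if len(metrics) > MAX_METRICS:
        raise ValueError("a plugin can send up to fifty metrics at a time")
    lines = ["status " + status]
    for name, metric_type, value in metrics:
        if not METRIC_NAME.fullmatch(name):
            raise ValueError("invalid metric name: %r" % name)
        if metric_type not in METRIC_TYPES:
            raise ValueError("unknown metric type: %r" % metric_type)
        lines.append("metric %s %s %s" % (name, metric_type, value))
    return lines


def read_thermometer():
    """Hypothetical data source; replace with a real measurement."""
    return {"internal_temperature": 165, "ambient_temperature": 325}


def main():
    """Print the plugin output and return the process exit code."""
    try:
        reading = read_thermometer()
    except Exception:
        # On failure, print only a status line and signal a non-zero exit.
        print("status Turkey thermometer not responding")
        return 1
    metrics = [(name, "uint32", value) for name, value in sorted(reading.items())]
    status = "Turkey thermometer returned valid response"
    print("\n".join(format_plugin_output(status, metrics)))
    return 0

# A real plugin file would end by exiting with this code: sys.exit(main())
```

Returning the exit code from main keeps the formatting logic testable on its own; the plugin file itself would import sys and pass the result of main to sys.exit so that failures exit non-zero.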
agent.redis
The agent.redis check will retrieve Redis server metrics.
Field | Description | Validation |
---|---|---|
hostname | Redis server hostname | Valid hostname, IPv4 or IPv6 address |
password | Optional Redis server password | Optional. String between 1 and 255 characters long |
port | Redis server port | Integer between 1-65535 inclusive |
timeout | Connection timeout in milliseconds | Optional. Integer |
Metric | Description | Type |
---|---|---|
bgrewriteaof_in_progress | (Redis 2.4.16 only) Flag indicating that an AOF rewrite is ongoing | Int32 |
bgsave_in_progress | (Redis 2.4.16 only) Flag indicating a RDB save is on-going | Int32 |
blocked_clients | Number of clients pending on a blocking call (BLPOP, BRPOP, BRPOPLPUSH) | Int32 |
changes_since_last_save | (Redis 2.4.16 only) Number of changes since the last dump | Int32 |
connected_clients | Number of client connections (excluding connections from slaves) | Int32 |
evicted_keys | Number of evicted keys due to maxmemory limit | Int32 |
pubsub_patterns | Global number of pub/sub patterns with client subscriptions | Int32 |
total_commands_processed | Total number of commands processed by the server | Gauge |
total_connections_received | Total number of connections accepted by the server | Gauge |
uptime_in_seconds | Number of seconds since Redis server start | Int32 |
used_memory | Total number of bytes allocated by Redis using its allocator (either standard libc, jemalloc, or an alternative allocator such as tcmalloc). | Int32 |
version | Version of the server | String |
agent.windows_perfos
The agent.windows_perfos check returns metrics regarding Windows performance data. This check is only available on Windows platforms.
No fields are present for this particular check type.
Metric | Description | Type |
---|---|---|
AlignmentFixupsPersec | Shows the rate, in incidents per second, at which alignment faults were fixed by the system. | Uint32 |
ContextSwitchesPersec | Shows the combined rate, in incidents per second, at which all processors on the computer were switched from one thread to another. It is the sum of the values of Thread Context Switches/sec for each thread running on all processors on the computer, and is measured in numbers of switches. Context switches occur when a running thread voluntarily relinquishes the processor, or is preempted by a higher priority, ready thread. | Uint32 |
ExceptionDispatchesPersec | Shows the rate, in incidents per second, at which exceptions were dispatched by the system. | Uint64 |
FileControlBytesPersec | Shows the overall rate, in incidents per second, at which bytes were transferred for all file system operations that were neither read nor write operations, such as file system control requests and requests for information about device characteristics or status. | Uint32 |
FileControlOperationsPersec | Shows the combined rate, in incidents per second, of file system operations that were neither read nor write operations, such as file system control requests and requests for information about device characteristics or status. This is the inverse of FileDataOperationsPersec. | Int32 |
FileReadBytesPersec | Shows the overall rate, in incidents per second, at which bytes were read to satisfy file system read requests to all devices on the computer, including read operations from the file system cache. | Uint64 |
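FileReadOperationsPersec | Shows the combined rate, in incidents per second, of file system read requests to all devices on the computer, including requests to read from the file system cache. | Uint32 |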
FileWriteBytesPersec | Shows the overall rate, in incidents per second, at which bytes were written to satisfy file system write requests to all devices on the computer, including write operations to the file system cache. | Uint64 |
FloatingEmulationsPersec | Shows the rate, in incidents per second, of floating emulations performed by the system. | Uint32 |
PercentRegistryQuotaInUse | Percentage of the total registry quota allowed that is currently being used by the system. This property displays the current percentage value only; it is not an average. | Uint32 |
Processes | Shows the number of processes in the computer at the time of data collection. This is an instantaneous count, not an average over the time interval. Each process represents a program that is running. | Uint32 |
ProcessorQueueLength | Shows the number of threads in the processor queue. Unlike the disk counters, this counter shows ready threads only, not threads that are running. There is a single queue for processor time, even on computers with multiple processors. Therefore, if a computer has multiple processors, you need to divide this value by the number of processors servicing the workload. A sustained processor queue of greater than two threads generally indicates processor congestion. | Uint32 |
SystemCallsPersec | Shows the combined rate, in incidents per second, of calls to operating system service routines by all processes running on the computer. These routines perform all of the basic scheduling and synchronization of activities on the computer, and provide access to non-graphic devices, memory management, and name space management. | Uint32 |
SystemUpTime | Shows the total time, in seconds, that the computer has been operational since it was last started. | Uint64 |
Threads | Shows the number of threads in the computer at the time of data collection. This is an instantaneous count, not an average over the time interval. A thread is the basic executable entity that can execute instructions in a processor. | Uint32 |
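The ProcessorQueueLength description calls for dividing the queue length by the number of processors before judging congestion. The following Python sketch works through that arithmetic. The function names are hypothetical, and treating "sustained" as "every sample in a window exceeds the threshold" is an assumption made for this example.

```python
def queue_length_per_processor(processor_queue_length, processor_count):
    """Divide the single system-wide queue length by the processor count."""
    if processor_count < 1:
        raise ValueError("processor_count must be at least 1")
    return processor_queue_length / processor_count


def sustained_congestion(samples, processor_count, threshold=2.0):
    """Return True if every sample exceeds the per-processor threshold."""
    if not samples:
        return False
    return all(
        queue_length_per_processor(sample, processor_count) > threshold
        for sample in samples
    )
```

For example, a queue length of 12 on a computer with 4 processors works out to 3 ready threads per processor, which is above the threshold of two.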
Hostinfo checks
Hostinfo checks are a special class of checks that run on demand.
Unlike remote and agent checks, which run on a regular schedule and can have alarms or alerts attached, Hostinfo checks cannot be scheduled and cannot have alarms or alerts created for them.
Create Hostinfo checks to perform tasks like the following:
- Fetch data on demand. For example, you can use a Hostinfo check to pipe data about the host to other services or applications.
- Run occasional checks to troubleshoot an issue.
- Periodically fetch data from large clusters of servers with the granularity of fetching from an individual computer. For example, use a Hostinfo check to retrieve information from a dashboard built on Kibana.
- Use in conjunction with service helper software to generate suggestions that are based on the status of a system or on pieces of information that are required by support technicians.
The following table provides a list of the Hostinfo checks supported by the monitoring service.
Hostinfo checks supported by Rackspace Monitoring
Hostinfo type | Description |
---|---|
connections | Runs the arp -an and netstat -naten commands and retrieves information about any open listening ports and any connections to them. |
iptables | Runs the iptables -S command to retrieve data about IPv4 policies. |
ip6tables | Runs the ip6tables -S command to retrieve data about IPv6 policies. |
autoupdates | Checks if automatic updates are enabled on a Linux distribution. |
passwd | Reads /etc/passwd and then runs the passwd -S command for every user. Obtains password-related information. |
pam | Reads /etc/pam.d and retrieves data about pluggable authentication modules. |
cron | Reads files in the crontabs directory and retrieves information about scheduled Cron jobs. |
kernel_modules | Reads the /proc/modules virtual directory and retrieves data about the modules that are loaded into the kernel. |
cpu | Retrieves information about the host’s CPU. |
disk | Retrieves information about the host’s hard disks. |
filesystem | Retrieves information about the host’s filesystem. |
filesystem_state | Retrieves information about the read-only/read-write filesystems. |
login | Reads /etc/login.defs and retrieves data about the login shell. This check does not retrieve any password information or any other sensitive data. |
memory | Retrieves information about the host’s memory. |
network | Retrieves information about the host’s network interface. |
nil | Returns no information. This Hostinfo check is mainly used within the monitoring agent code itself. |
packages | Runs either the dpkg-query or rpm -qa command and retrieves a list of package names and versions. |
procs | Retrieves information about the processes that are running on the host. |
system | Retrieves information about the host’s operating system. |
who | Retrieves information about the user, device, time and host. |
date | Retrieves the date and time on the host. |
sysctl | Runs the sysctl -A command and retrieves all possible key-value pairs of the kernel parameters that can be set at runtime. |
sshd | Runs the sshd -T command and retrieves the configuration parameters for the open SSH daemon. |
fstab | Reads /etc/fstab and retrieves information about the file system configuration. |
fileperms | Reads a pre-specified list of files and checks and retrieves their permissions. |
services | Reads a few folders and files and generates a list of startup services. |
deleted_libs | Greps through the output of lsof -nnP to find deleted or missing libraries for running processes. |
cve | Retrieves a unique sorted list of common vulnerabilities and exposures that have been patched on the host system. |
last_logins | Runs last to get information about previous and currently logged-in users, bootups, and when last started logging. |
remote_services | Runs the netstat -tlpen command to obtain a list of active internet connections to servers and underlying programs that are using them. |
ip4routes | Runs the netstat -nr4 command and retrieves information about the kernel’s IPv4 routing tables. |
ip6routes | Runs the netstat -nr6 command and retrieves information about the kernel’s IPv6 routing tables. |
apache2 | Retrieves information about the host’s apache2 instance and installation if it exists. |
fail2ban | Retrieves information about the host’s fail2ban instance and installation. |
lsyncd | Checks the status of the live syncing daemon or lsyncd. |
nginx_config | Returns vhosts, version, includes, status (0 if everything is ok when nginx -t is run), configuration path, prefix and configure arguments for local nginx. |
wordpress | Returns the path, version and edition of local Wordpress instances found via the apache2 and nginx configurations. |
magento | Returns the path, version and edition of local Magento instances found via the apache2 and nginx configurations. |
php | Returns information such as version, type (HHVM/PHP), and errors related to PHP. Uses the CLI and log files to extract this information. |
postfix | Checks the status of the postfix mail server. |
You can use the Rackspace Monitoring API to run Hostinfo checks. To run a Hostinfo check, issue the following cURL request:
curl -H "X-Auth-Token: $token" 'https://monitoring.api.rackspacecloud.com/v1.0//agents/<agent_id>/host_info/<hostinfo_type>'
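The same request can be prepared from a script. The following Python sketch builds the request with the standard library without sending it. The function name is hypothetical, and base_url is an assumed parameter: the caller supplies the part of the API URL that precedes /agents.

```python
import urllib.request
from urllib.parse import quote


def build_hostinfo_request(base_url, agent_id, hostinfo_type, token):
    """Build a GET request for an on-demand Hostinfo check.

    base_url is everything before "/agents" and is supplied by the caller.
    """
    url = "%s/agents/%s/host_info/%s" % (
        base_url.rstrip("/"),
        quote(agent_id, safe=""),       # escape characters unsafe in a path
        quote(hostinfo_type, safe=""),
    )
    # The token is sent in the X-Auth-Token header, as in the cURL request.
    return urllib.request.Request(url, headers={"X-Auth-Token": token})
```

Passing the returned object to urllib.request.urlopen sends the request; keeping construction separate from sending makes the URL easy to inspect before any network call is made.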
For more information about how to work with checks using the Rackspace Monitoring API, see the Checks section in the Rackspace Monitoring Developer Guide. For more information about working with Hostinfo checks, see Agent host information.
Check status codes
This section lists status messages and codes that various check types can return. The following table describes each one and provides a resolution for the issue.
Status code or message | Description | Resolution |
---|---|---|
prevented by ACL 'global' | The HTTP remote check is attempting to access a private or loopback IP address (for example, 127.x.x.x or 192.168.x.x). | Private IP addresses are not supported. Specify a public IP address instead. |