B.2. Agent check types

 B.2.1. agent.apache

 Summary

The agent.apache check will retrieve Apache HTTP server metrics.

Table B.22. Attributes
Field Description Validation
timeout Plugin execution timeout in milliseconds
  • Optional

  • Integer

url URL (defaults to http://127.0.0.1/server-status)
  • Optional

  • URL

Table B.23. Metrics
Metric Description Type
busy_workers The number of workers serving requests. Int64
bytes_per_request The average number of bytes served per request. Int64
bytes_per_second The average number of bytes served per second. Int64
closing The number of workers closing a connection. Int64
cpu_load Total percentage of CPU used by workers. Double
dns The number of workers performing a DNS lookup. Int64
gracefully_finishing The number of workers gracefully finishing. Int64
idle The number of workers in idle cleanup. Int64
idle_workers The number of idle workers. Int64
keep_alive The number of workers in the keepalive (read) state. Int64
logging The number of workers logging. Int64
open The number of open slots with no current process. Int64
reading The number of workers reading a request. Int64
requests_per_second The average number of requests per second. Int64
sending The number of workers sending a reply. Int64
starting The number of workers starting up. Int64
total_access Total number of accesses served. Int64
total_kbytes Total kilobytes served. Int64
uptime Time since the last start or restart, in milliseconds. Int64
waiting The number of workers waiting for a connection. Int64
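
The worker-state metrics above line up with the per-slot states that Apache's mod_status reports in its scoreboard string. The following is a minimal sketch of that relationship, not the agent's implementation: it counts the characters of a scoreboard string and maps them to the metric names in Table B.23. The mapping and the function name count_worker_states are assumptions made for illustration.

```python
from collections import Counter

# Scoreboard key characters used by Apache mod_status, mapped to the metric
# names in Table B.23. This mapping is an assumption made for illustration.
SCOREBOARD_KEYS = {
    "_": "waiting",
    "S": "starting",
    "R": "reading",
    "W": "sending",
    "K": "keep_alive",
    "D": "dns",
    "C": "closing",
    "L": "logging",
    "G": "gracefully_finishing",
    "I": "idle",
    ".": "open",
}


def count_worker_states(scoreboard):
    """Count how many worker slots are in each state."""
    counts = Counter(scoreboard)
    return {name: counts.get(key, 0) for key, name in SCOREBOARD_KEYS.items()}


if __name__ == "__main__":
    states = count_worker_states("_W_K..R_")
    for name in sorted(states):
        print(name, states[name])
```

Every state appears in the result, so a state with no workers is reported as 0 rather than omitted.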

 B.2.2. agent.cpu

 Summary

The agent.cpu check will attempt to measure the usage of the CPU on a host.

Table B.24. Attributes
Field Description Validation
No fields are present for this particular check type.

Table B.25. Metrics
Metric Description Type
idle_percent_average Recent percentage of CPU time spent idle. Double
irq_percent_average Recent percentage of CPU time spent handling hardware interrupts. Double
max_cpu_usage Recent percentage utilization of the most-utilized CPU. This is useful to detect when some CPUs are "pegged" while others are idle. Double
min_cpu_usage Recent percentage utilization of the least-utilized CPU. This is useful to detect when some CPUs are "pegged" while others are idle. Double
stolen_percent_average Recent percentage of CPU time spent waiting for the CPU to service other virtual CPUs. Double
sys_percent_average Recent percentage of CPU time utilized by kernel mode processes. Double
usage_average Recent percentage of CPU time utilized by all processes. Double
user_percent_average Recent percentage of CPU time utilized by user mode processes. Double
wait_percent_average Recent percentage of CPU time utilized by processes in a "wait" state. Double

 B.2.3. agent.disk

 Summary

The agent.disk check exposes disk-related metrics (service time, wait time, etc.).

Table B.26. Attributes
Field Description Validation
target The disk to check (e.g. '/dev/xvda1')
  • String between 1 and 512 characters long

Table B.27. Metrics
Metric Description Type
queue Disk utilization time. The prefix / will change depending on the mount points discovered. Int64
read_bytes The number of physical disk bytes read. The prefix / will change depending on the mount points discovered. Int64
reads The number of physical disk reads. The prefix / will change depending on the mount points discovered. Int64
rtime The amount of time spent reading. The prefix / will change depending on the mount points discovered. Int64
service_time No description is available at the moment. Int64
time No description is available at the moment. Int64
write_bytes The number of physical disk bytes written. The prefix / will change depending on the mount points discovered. Int64
writes The number of physical disk writes. The prefix / will change depending on the mount points discovered. Int64
wtime The amount of time spent writing. The prefix / will change depending on the mount points discovered. Int64

 B.2.4. agent.filesystem

 Summary

The agent.filesystem check exposes filesystem-related metrics (free space, used space, etc.).

Table B.28. Attributes
Field Description Validation
target The mount point to check (e.g. '/var' or 'C:')
  • String between 1 and 512 characters long

Table B.29. Metrics
Metric Description Type
avail Available space on the filesystem in kilobytes, not including reserved space. Int64
free Free space available on the filesystem in kilobytes, including reserved space. Int64
options No description is available at the moment. String
total Total space on the filesystem, in kilobytes. Int64
used Used space on the filesystem, in kilobytes. Int64
files Number of inodes on the filesystem. Note: this metric is not available on Windows. Int64
free_files Number of free inodes on the filesystem. Note: this metric is not available on Windows. Int64
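
On Unix-like systems, the space and inode metrics above correspond closely to the values returned by the POSIX statvfs call. The sketch below shows one way such figures could be derived for a mount point. It is an illustration under stated assumptions, not the agent's implementation: the function name filesystem_metrics is hypothetical, and computing used as total minus free is an assumption.

```python
import os


def filesystem_metrics(target):
    """Return space (in kilobytes) and inode figures for a mount point.

    Unix only: os.statvfs is not available on Windows.
    """
    st = os.statvfs(target)
    total = st.f_blocks * st.f_frsize // 1024
    free = st.f_bfree * st.f_frsize // 1024     # includes reserved space
    avail = st.f_bavail * st.f_frsize // 1024   # excludes reserved space
    return {
        "total": total,
        "free": free,
        "avail": avail,
        "used": total - free,  # assumption: used = total - free
        "files": st.f_files,
        "free_files": st.f_ffree,
    }


if __name__ == "__main__":
    for name, value in sorted(filesystem_metrics("/").items()):
        print(name, value)
```

The difference between free and avail is the space reserved for privileged processes, which is why the two metrics are reported separately.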

 B.2.5. agent.load_average

 Summary

The agent.load_average check will attempt to measure the Unix-style Load Average on a host.

Table B.30. Attributes
Field Description Validation
No fields are present for this particular check type.

Table B.31. Metrics
Metric Description Type
1m One minute load average. Double
5m Five minute load average. Double
15m Fifteen minute load average. Double
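
For reference, the same three values can be read on a Unix host with Python's standard os.getloadavg call. This is only a sketch for comparing against what the check reports; the function name load_average_metrics is hypothetical and the agent's own collection method is not described here.

```python
import os


def load_average_metrics():
    """Return the 1, 5 and 15 minute load averages keyed by metric name."""
    one, five, fifteen = os.getloadavg()
    return {"1m": one, "5m": five, "15m": fifteen}


if __name__ == "__main__":
    for name, value in load_average_metrics().items():
        print(name, value)
```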

 B.2.6. agent.memory

Table B.32. Attributes
Field Description Validation
No fields are present for this particular check type.

Table B.33. Metrics
Metric Description Type
actual_free The actual amount of free memory. Int64
actual_used The actual amount of used memory. Int64
free The amount of free memory. Int64
ram No description is available at the moment. Int64
swap_free No description is available at the moment. Int64
swap_page_in No description is available at the moment. Int64
swap_page_out No description is available at the moment. Int64
swap_total No description is available at the moment. Int64
swap_used No description is available at the moment. Int64
total The total amount of memory. Int64
used The amount of used memory. Int64

 B.2.7. agent.mysql

 Summary

The agent.mysql check will retrieve MySQL server metrics.

[Note]Note

Except for the 'replication.slave_running' metric, metrics starting with 'replication' are not reported if no slave is running.

Table B.34. Attributes
Field Description Validation
host MySQL server hostname (default: 127.0.0.1)
  • Optional

  • Valid hostname, IPv4 or IPv6 address

mycnf Load my.cnf
  • Optional

  • Boolean

password Server password
  • Optional

  • String between 1 and 255 characters long

port MySQL server port (default: 3306)
  • Optional

  • Integer between 1-65535 inclusive

socket Path to domain socket
  • Optional

  • String between 1 and 255 characters long

timeout Plugin execution timeout in milliseconds
  • Optional

  • Integer

username Server username
  • Optional

  • String between 1 and 16 characters long

Table B.35. Metrics
Metric Description Type
bytes.received The number of bytes received from all clients. (statvar_Bytes_received) Cumulative
bytes.sent The number of bytes sent to all clients. (statvar_Bytes_sent) Cumulative
core.aborted_clients The number of connections that were aborted because the client died without closing the connection properly. (statvar_Aborted_clients) Instantaneous
core.connections The number of connection attempts (successful or not) to the MySQL server. (statvar_Connections) Cumulative
core.queries The number of statements executed by the server. (statvar_Queries) Cumulative
core.uptime The number of seconds that the server has been up. (statvar_Uptime) Instantaneous
handler.commit The number of internal COMMIT statements. (statvar_Handler_commit) Cumulative
handler.delete The number of times that rows have been deleted from tables. (statvar_Handler_delete) Cumulative
handler.read_first The number of times the first entry in an index was read. (statvar_Handler_read_first) Cumulative
handler.read_key The number of requests to read a row based on a key. If this value is high, it is a good indication that your tables are properly indexed for your queries. (statvar_Handler_read_key) Cumulative
handler.read_next The number of requests to read the next row in key order. This value is incremented if you are querying an index column with a range constraint or if you are doing an index scan. (statvar_Handler_read_next) Cumulative
handler.read_prev The number of requests to read the previous row in key order. This read method is mainly used to optimize ORDER BY ... DESC. (statvar_Handler_read_prev) Cumulative
handler.read_rnd The number of requests to read a row based on a fixed position. This value is high if you are doing a lot of queries that require sorting of the result. You probably have a lot of queries that require MySQL to scan entire tables or you have joins that do not use keys properly. (statvar_Handler_read_rnd) Cumulative
handler.read_rnd_next The number of requests to read the next row in the data file. This value is high if you are doing a lot of table scans. Generally this suggests that your tables are not properly indexed or that your queries are not written to take advantage of the indexes you have. (statvar_Handler_read_rnd_next) Cumulative
handler.rollback The number of requests for a storage engine to perform a rollback operation. (statvar_Handler_rollback) Instantaneous
handler.savepoint The number of requests for a storage engine to place a savepoint. (statvar_Handler_savepoint) Instantaneous
handler.savepoint_rollback The number of requests for a storage engine to roll back to a savepoint. (statvar_Handler_savepoint_rollback) Instantaneous
handler.update The number of requests to update a row in a table. (statvar_Handler_update) Cumulative
handler.write The number of requests to insert a row in a table. (statvar_Handler_write) Cumulative
innodb.buffer_pool_pages_data The number of pages containing data (dirty or clean). (statvar_Innodb_buffer_pool_pages_data) Instantaneous
innodb.buffer_pool_pages_dirty The number of pages currently dirty. (statvar_Innodb_buffer_pool_pages_dirty) Instantaneous
innodb.buffer_pool_pages_flushed The number of buffer pool page-flush requests. (statvar_Innodb_buffer_pool_pages_flushed) Instantaneous
innodb.buffer_pool_pages_free The number of free pages. (statvar_Innodb_buffer_pool_pages_free) Instantaneous
innodb.buffer_pool_pages_total The total size of the buffer pool, in pages. (statvar_Innodb_buffer_pool_pages_total) Instantaneous
innodb.buffer_pool_read_requests The number of logical read requests. (statvar_Innodb_buffer_pool_read_requests) Cumulative
innodb.buffer_pool_reads The number of logical reads that InnoDB could not satisfy from the buffer pool, and had to read directly from the disk. (statvar_Innodb_buffer_pool_reads) Cumulative
innodb.buffer_pool_size The size in bytes of the memory buffer InnoDB uses to cache data and indexes of its tables. (sysvar_innodb_buffer_pool_size) Instantaneous
innodb.data_pending_fsyncs The current number of pending fsync() operations. (statvar_Innodb_data_pending_fsyncs) Instantaneous
innodb.data_pending_reads The current number of pending reads. (statvar_Innodb_data_pending_reads) Instantaneous
innodb.data_pending_writes The current number of pending writes. (statvar_Innodb_data_pending_writes) Instantaneous
innodb.pages_created The number of pages created. (statvar_Innodb_pages_created) Cumulative
innodb.pages_read The number of pages read. (statvar_Innodb_pages_read) Cumulative
innodb.pages_written The number of pages written. (statvar_Innodb_pages_written) Cumulative
innodb.row_lock_time The total time spent in acquiring row locks, in milliseconds. (statvar_Innodb_row_lock_time) Cumulative
innodb.row_lock_time_avg The average time to acquire a row lock, in milliseconds. (statvar_Innodb_row_lock_time_avg) Instantaneous
innodb.row_lock_time_max The maximum time to acquire a row lock, in milliseconds. (statvar_Innodb_row_lock_time_max) Instantaneous
innodb.row_lock_waits The number of times a row lock had to be waited for. (statvar_Innodb_row_lock_waits) Cumulative
innodb.rows_deleted The number of rows deleted from InnoDB tables. (statvar_Innodb_rows_deleted) Cumulative
innodb.rows_inserted The number of rows inserted into InnoDB tables. (statvar_Innodb_rows_inserted) Cumulative
innodb.rows_read The number of rows read from InnoDB tables. (statvar_Innodb_rows_read) Cumulative
innodb.rows_updated The number of rows updated in InnoDB tables. (statvar_Innodb_rows_updated) Cumulative
key.buffer_size Index blocks for MyISAM tables are buffered and are shared by all threads. (sysvar_key_buffer_size) Instantaneous
max.connections The maximum permitted number of simultaneous client connections. (sysvar_max_connections) Instantaneous
qcache.free_blocks The number of free memory blocks in the query cache. (statvar_Qcache_free_blocks) Instantaneous
qcache.free_memory The amount of free memory for the query cache. (statvar_Qcache_free_memory) Instantaneous
qcache.hits The number of query cache hits. (statvar_Qcache_hits) Cumulative
qcache.inserts The number of queries added to the query cache. (statvar_Qcache_inserts) Cumulative
qcache.lowmem_prunes The number of queries that were deleted from the query cache because of low memory. (statvar_Qcache_lowmem_prunes) Instantaneous
qcache.not_cached The number of noncached queries (not cacheable, or not cached due to the query_cache_type setting). (statvar_Qcache_not_cached) Instantaneous
qcache.queries_in_cache The number of queries registered in the query cache. (statvar_Qcache_queries_in_cache) Cumulative
qcache.size The amount of memory allocated for caching query results. (sysvar_query_cache_size) Instantaneous
qcache.total_blocks The total number of blocks in the query cache. (statvar_Qcache_total_blocks) Cumulative
replication.exec_master_log_pos The position in the current master binary log file to which the SQL thread has read and executed, marking the start of the next transaction or event to be processed. (show-slave-status.html) Instantaneous
replication.last_errno The error number returned by the most recently executed statement. (show-slave-status.html) Instantaneous
replication.last_io_error The error message of the most recent error that caused the I/O thread to stop. (show-slave-status.html) String
replication.max_relay_log_size If a write by a replication slave to its relay log causes the current log file size to exceed the value of this variable, the slave rotates the relay logs (closes the current file and opens the next one). (sysvar_max_relay_log_size) Instantaneous
replication.read_master_log_pos The position in the current master binary log file up to which the I/O thread has read. (show-slave-status.html) Instantaneous
replication.relay_log_pos The position in the current relay log file up to which the SQL thread has read and executed. (show-slave-status.html) Instantaneous
replication.seconds_behind_master In essence, this field measures the time difference in seconds between the slave SQL thread and the slave I/O thread. (show-slave-status.html) Instantaneous
replication.slave_io_running Whether the I/O thread is started and has connected successfully to the master. Internally, the state of this thread is represented by one of the following three values: MYSQL_SLAVE_NOT_RUN, MYSQL_SLAVE_RUN_NOT_CONNECT, MYSQL_SLAVE_RUN_CONNECT. (show-slave-status.html) Boolean
replication.slave_io_state A copy of the State field of the SHOW PROCESSLIST output for the slave I/O thread. This tells you what the thread is doing: trying to connect to the master, waiting for events from the master, reconnecting to the master, and so on. (show-slave-status.html) String
replication.slave_open_temp_tables The number of temporary tables that the slave SQL thread currently has open. If the value is greater than zero, it is not safe to shut down the slave; see Section 17.4.1.22, “Replication and Temporary Tables”. (statvar_Slave_open_temp_tables) Instantaneous
replication.slave_retried_transactions The total number of times since startup that the replication slave SQL thread has retried transactions. (statvar_Slave_retried_transactions) Instantaneous
replication.slave_running This is ON if this server is a replication slave that is connected to a replication master, and both the I/O and SQL threads are running; otherwise, it is OFF. (statvar_Slave_running) String
replication.slave_sql_running Whether the SQL thread is started. (show-slave-status.html) Boolean
thread.cache_size How many threads the server should cache for reuse. (sysvar_thread_cache_size) Instantaneous
threads.connected The number of currently open connections. (statvar_Threads_connected) Instantaneous
threads.created The number of threads created to handle connections. (statvar_Threads_created) Cumulative
threads.running The number of threads that are not sleeping. (statvar_Threads_running) Instantaneous
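
The Type column above distinguishes Cumulative metrics, which only grow while the server is running, from Instantaneous ones. A cumulative counter such as core.queries is usually easier to interpret as a rate between two samples. The sketch below shows one possible way to compute that rate; the function name counter_rate is hypothetical, and treating a decreasing counter as a reset (for example after a server restart) is an assumption, not documented behavior of the monitoring system.

```python
def counter_rate(previous, current, interval_seconds):
    """Per-second rate of a cumulative counter between two samples.

    Returns None when the counter went backwards (assumed to mean the
    server restarted) or when the interval is not positive.
    """
    if interval_seconds <= 0 or current < previous:
        return None
    return (current - previous) / float(interval_seconds)


if __name__ == "__main__":
    # Two hypothetical samples of core.queries taken 60 seconds apart.
    print(counter_rate(1000, 1600, 60))
```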

 B.2.8. agent.network

 Summary

The agent.network check will attempt to measure the usage of network devices on a host.

Table B.36. Attributes
Field Description Validation
target The network device to check (e.g. 'eth0')
  • String between 1 and 512 characters long

Table B.37. Metrics
Metric Description Type
rx_bytes The number of bytes received over the interface. Int64
rx_dropped The number of packets received and subsequently dropped over the interface. Int64
rx_errors The number of errors received over the interface. Int64
rx_packets The number of packets received over the interface. Int64
speed No description is available at the moment. Int64
tx_bytes The number of bytes transmitted over the interface. Int64
tx_dropped The number of packets attempted for transmission and subsequently dropped over the interface. Int64
tx_errors The number of errors while transmitting over the interface. Int64
tx_packets The number of packets transmitted over the interface. Int64

 B.2.9. agent.redis

 Summary

The agent.redis check will retrieve Redis server metrics.

Table B.38. Attributes
Field Description Validation
host Redis server hostname
  • Valid hostname, IPv4 or IPv6 address

port Redis server port
  • Integer between 1-65535 inclusive

password Redis server password
  • Optional

  • String between 1 and 255 characters long

timeout Connection timeout in milliseconds
  • Optional

  • Integer

Table B.39. Metrics
Metric Description Type
bgrewriteaof_in_progress (Redis 2.4.16 only) Flag indicating an AOF rewrite operation is ongoing. Int32
bgsave_in_progress (Redis 2.4.16 only) Flag indicating an RDB save is ongoing. Int32
blocked_clients Number of clients pending on a blocking call (BLPOP, BRPOP, BRPOPLPUSH). Int32
changes_since_last_save (Redis 2.4.16 only) Number of changes since the last dump. Int32
connected_clients Number of client connections (excluding connections from slaves). Int32
evicted_keys Number of evicted keys due to maxmemory limit. Int32
pubsub_patterns Global number of pub/sub patterns with client subscriptions. Int32
total_commands_processed Total number of commands processed by the server. Gauge
total_connections_received Total number of connections accepted by the server. Gauge
uptime_in_seconds Number of seconds since Redis server start. Int32
used_memory Total number of bytes allocated by Redis using its allocator (either standard libc, jemalloc, or an alternative allocator such as tcmalloc). Int32
version Version of the Redis server. String
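
Most of these metric names match fields in the text returned by the Redis INFO command, which consists of key:value lines grouped under comment lines that start with #. The sketch below parses such text and picks out the metrics listed in Table B.39. It is an illustration only: the function names are hypothetical, and the assumption that version comes from the redis_version field is not confirmed by this guide.

```python
# Integer-valued metrics from Table B.39 whose names are assumed to match
# INFO field names one to one.
INTEGER_METRICS = (
    "bgrewriteaof_in_progress",
    "bgsave_in_progress",
    "blocked_clients",
    "changes_since_last_save",
    "connected_clients",
    "evicted_keys",
    "pubsub_patterns",
    "total_commands_processed",
    "total_connections_received",
    "uptime_in_seconds",
    "used_memory",
)


def parse_info(text):
    """Turn INFO output into a dict, skipping blank and comment lines."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key] = value
    return info


def redis_metrics(info_text):
    """Select the Table B.39 metrics that are present in the INFO text."""
    info = parse_info(info_text)
    metrics = {}
    if "redis_version" in info:
        metrics["version"] = info["redis_version"]  # assumed source field
    for name in INTEGER_METRICS:
        if name in info:
            metrics[name] = int(info[name])
    return metrics
```

Fields that a given Redis version does not report are simply absent from the result, which mirrors the version-specific notes in the table.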

 B.2.10. agent.windows_perfos

 Summary

The agent.windows_perfos check will return metrics regarding Windows performance data.

Table B.40. Attributes
Field Description Validation
No fields are present for this particular check type.

Table B.41. Metrics
Metric Description Type
AlignmentFixupsPersec Alignment Fixups/sec - Shows the rate, in incidents per second, at which alignment faults were fixed by the system. Uint32
ContextSwitchesPersec Context Switches/sec - Shows the combined rate, in incidents per second, at which all processors on the computer were switched from one thread to another. It is the sum of the values of Thread Context Switches/sec for each thread running on all processors on the computer, and is measured in numbers of switches. Context switches occur when a running thread voluntarily relinquishes the processor, or is preempted by a higher priority, ready thread. Uint32
ExceptionDispatchesPersec Exception Dispatches/sec - Shows the rate, in incidents per second, at which exceptions were dispatched by the system. Uint32
FileControlBytesPersec File Control Bytes/sec - Shows the overall rate, in incidents per second, at which bytes were transferred for all file system operations that were neither read nor write operations, such as file system control requests and requests for information about device characteristics or status. Uint64
FileControlOperationsPersec File Control Operations/sec - Shows the combined rate, in incidents per second, of file system operations that were neither read nor write operations, such as file system control requests and requests for information about device characteristics or status. This is the inverse of File Data Operations/sec. Uint32
FileDataOperationsPersec File Data Operations/sec - Shows the combined rate, in incidents per second, of read and write operations on disks, serial, or parallel devices. This is the inverse of File Control Operations/sec. Uint32
FileReadBytesPersec File Read Bytes/sec - Shows the overall rate, in incidents per second, at which bytes were read to satisfy file system read requests to all devices on the computer, including read operations from the file system cache. Uint64
FileReadOperationsPersec File Read Operations/sec - Shows the combined rate, in incidents per second, of file system read requests to all devices on the computer, including requests to read from the file system cache. Uint32
FileWriteBytesPersec File Write Bytes/sec - Shows the overall rate, in incidents per second, at which bytes were written to satisfy file system write requests to all devices on the computer, including write operations to the file system cache. Uint64
FileWriteOperationsPersec File Write Operations/sec - Shows the combined rate, in incidents per second, of file system write requests to all devices on the computer, including requests to write to data in the file system cache. Uint32
FloatingEmulationsPersec Floating Emulations/sec - Shows the rate, in incidents per second, of floating emulations performed by the system. Uint32
PercentRegistryQuotaInUse Percentage of the total registry quota allowed that is currently being used by the system. This property displays the current percentage value only; it is not an average. Uint32
Processes Shows the number of processes in the computer at the time of data collection. This is an instantaneous count, not an average over the time interval. Each process represents a program that is running. Uint32
ProcessorQueueLength Processor Queue Length - Shows the number of threads in the processor queue. Unlike the disk counters, this counter shows ready threads only, not threads that are running. There is a single queue for processor time, even on computers with multiple processors. Therefore, if a computer has multiple processors, you need to divide this value by the number of processors servicing the workload. A sustained processor queue of greater than two threads generally indicates processor congestion. Uint32
SystemCallsPersec System Calls/sec - Shows the combined rate, in incidents per second, of calls to operating system service routines by all processes running on the computer. These routines perform all of the basic scheduling and synchronization of activities on the computer, and provide access to nongraphic devices, memory management, and name space management. Uint32
SystemUpTime System Up Time - Shows the total time, in seconds, that the computer has been operational since it was last started. Uint64
Threads Shows the number of threads in the computer at the time of data collection. This is an instantaneous count, not an average over the time interval. A thread is the basic executable entity that can execute instructions in a processor. Uint32

 B.2.11. agent.plugin

 Summary

The agent.plugin check will attempt to run a custom plugin on a host.

 Installing Custom Plugins

Custom plugins are simply executable files which report metrics via stdout. Plugins are placed on the server to be monitored at an installation path that depends on the operating system:

Operating System Installation Path
Linux /usr/lib/rackspace-monitoring-agent/plugins/
Windows (32-bit agent installed on a 64-bit system) C:\Program Files (x86)\Rackspace Monitoring\plugins
Windows (64-bit agent installed on a 64-bit system or 32-bit agent installed on a 32-bit system) C:\Program Files\Rackspace Monitoring\plugins

Once the plugin has been installed to the server, create an agent.plugin check that specifies the name of the executable file, and the plugin will begin reporting metrics to the monitoring system, just like any other check. If the plugin requires any command line arguments, these may be specified using the optional args array.

Table B.42. Attributes
Field Description Validation
file Name of the plugin file
  • String matching the regex //[a-zA-Z0-9\.\-_]+//

args Command-line arguments which are passed to the plugin
  • Optional

  • Array [Non-empty string]

  • Array or object with number of items between 0 and 10

timeout Plugin execution timeout in milliseconds
  • Optional

  • Integer
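
The validation rules in Table B.42 can be checked locally before creating a check. The sketch below is one reading of those rules, not the monitoring system's own validator: the function name validate_plugin_details and the dictionary layout are hypothetical, and anchoring the file regex to the whole string is an assumption, since the table does not show anchors.

```python
import re

# Character class taken from Table B.42; matching the whole string is an
# assumption made for this sketch.
FILE_PATTERN = re.compile(r"[a-zA-Z0-9.\-_]+")
MAX_ARGS = 10


def validate_plugin_details(details):
    """Return a list of problems found; an empty list means valid."""
    errors = []

    file_name = details.get("file")
    if not isinstance(file_name, str) or not FILE_PATTERN.fullmatch(file_name):
        errors.append("file: must match [a-zA-Z0-9.-_]+")

    args = details.get("args", [])
    if not isinstance(args, list) or len(args) > MAX_ARGS:
        errors.append("args: must be an array of at most 10 items")
    elif any(not isinstance(arg, str) or arg == "" for arg in args):
        errors.append("args: every item must be a non-empty string")

    timeout = details.get("timeout")
    if timeout is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(timeout, bool) or not isinstance(timeout, int):
            errors.append("timeout: must be an integer")

    return errors


if __name__ == "__main__":
    print(validate_plugin_details({"file": "turkey.sh", "args": ["--probe", "1"]}))
```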

Table B.43. Metrics
Metric Description Type
Available metrics are determined by the plugin.

 Community Plugin Repository

A curated repository of plugins created by Cloud Monitoring users is available on GitHub. Contributions are welcome!

[Note]Note

The Cloud Monitoring Agent is also capable of executing Cloudkick plugins, so if you are a Cloudkick user you can just drop in any existing plugin and it should just work.

 Creating Custom Plugins

Creating custom plugins is as simple as writing a script that prints a status and up to 10 metrics to standard output. The format of the status line is:

status <status>

The status string should describe whether the check was able to successfully gather metrics. It could be as simple as "success" to indicate that metrics were successfully gathered. When an error occurs that prevents metrics from being gathered, plugins should print a status that describes the error, then exit with a non-zero code without printing any metric lines.

The status line may be followed by up to 10 metric lines. The format of a metric line is:

metric <name> <type> <value>

where:

name

is the name of the metric. No spaces are allowed. Example: memory_free.

type

is the type of the metric. This must be one of:

int32

Signed 32 bit integer value.

uint32

Unsigned 32 bit integer value.

int64

Signed 64 bit integer value.

uint64

Unsigned 64 bit integer value.

double

Floating point values.

string

A string value. Note: the monitoring system records string metrics every time they change. String metrics are designed for recording an enumerated state which infrequently changes (for example an HTTP response code which is always 200 during normal operation). You should not store arbitrary, frequently changing values in a string metric.

value

is the value of the metric.
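
The line formats described above can be checked mechanically while developing a plugin. The following sketch parses plugin output and enforces the rules stated in this section (a leading status line, at most 10 metric lines, and one of the six listed types). The function name parse_plugin_output is hypothetical, and the agent's own parser may be stricter or more lenient.

```python
VALID_TYPES = {"int32", "uint32", "int64", "uint64", "double", "string"}
MAX_METRICS = 10


def parse_plugin_output(text):
    """Parse plugin stdout into (status, metrics).

    metrics is a list of (name, type, value) tuples with the value kept as
    text. Raises ValueError when the output breaks the documented format.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("status "):
        raise ValueError("first line must be 'status <status>'")
    status = lines[0][len("status "):].strip()

    metrics = []
    for line in lines[1:]:
        # Split into at most four fields so a string value may contain spaces.
        parts = line.split(None, 3)
        if len(parts) != 4 or parts[0] != "metric":
            raise ValueError("bad metric line: %r" % line)
        _, name, metric_type, value = parts
        if metric_type not in VALID_TYPES:
            raise ValueError("unknown metric type: %r" % metric_type)
        metrics.append((name, metric_type, value))

    if len(metrics) > MAX_METRICS:
        raise ValueError("more than %d metric lines" % MAX_METRICS)
    return status, metrics
```

Running a plugin by hand and passing its output through a check like this can catch formatting mistakes before the plugin is installed on a monitored server.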

Putting it all together, the output of a plugin that has successfully executed might look something like:

status Turkey thermometer returned valid response
metric internal_temperature uint32 165
metric ambient_temperature uint32 325

If the plugin failed, it might print the following before exiting non-zero:

status Turkey thermometer not responding
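
As a concrete illustration of both cases, here is a minimal sketch of a plugin written in Python. It is a made-up example rather than a plugin from the community repository: it counts the entries in a directory, and the metric name entry_count and the function name build_output are hypothetical. The directory to inspect is taken from the first command-line argument, which would be supplied through the check's optional args array.

```python
#!/usr/bin/env python
import os
import sys


def build_output(directory):
    """Return (lines, exit_code) for a directory entry count plugin."""
    try:
        entries = os.listdir(directory)
    except OSError as exc:
        # On failure, report only a status line and a non-zero exit code.
        return ["status could not read %s: %s" % (directory, exc.strerror)], 1
    return [
        "status success",
        "metric entry_count uint32 %d" % len(entries),
    ], 0


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    lines, code = build_output(target)
    print("\n".join(lines))
    if code != 0:
        sys.exit(code)
```

Keeping the output construction in a function separate from the printing makes the plugin easy to exercise by hand before it is copied to the plugin directory.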

