Monitoring as a Service#

Rackspace Private Cloud Powered By OpenStack

Last updated: Jun 15, 2021

The monitoring service for Rackspace Private Cloud OpenStack (RPCO) solutions, which is included as part of the Core Support agreement, helps ensure that hosts and OpenStack services operate within optimal parameters. The monitoring agent continually runs a comprehensive set of custom plug-ins across all hosts. The monitoring pollers continually test OpenStack endpoint connectivity and return various HTTP response metrics, which helps your cloud maintain optimal health.

The monitoring service#

Rackspace delivers Fanatical Support® for the world’s leading clouds. It has specialized expertise, available 24x7x365, and results-obsessed customer service that’s been around since 1999.

Work with the Rackspace support team to customize your monitoring configuration in the following ways:

  • Set the frequency and timeout of your monitoring plug-ins (for example, every minute or every five minutes).

  • Tune thresholds for definable alarm templates (for example, disk space or memory capacity).

  • Determine which members of your organization should receive an auto-generated MyRack notification for each alert.
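As an illustration of the tunable settings listed above, the following minimal sketch models an alarm template with a check period, a timeout, and two thresholds. The class name, field names, default values, and threshold numbers are all hypothetical and are not the Rackspace Monitoring configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmTemplate:
    """Hypothetical alarm template; all names and values are illustrative."""

    metric: str
    warning: float            # value at or above which the state is WARNING
    critical: float           # value at or above which the state is CRITICAL
    period_seconds: int = 60  # how often the plug-in runs (assumed default)
    timeout_seconds: int = 30  # how long one run may take (assumed default)

    def evaluate(self, value: float) -> str:
        """Map a measured value to an alarm state."""
        if value >= self.critical:
            return "CRITICAL"
        if value >= self.warning:
            return "WARNING"
        return "OK"


# Example: a disk space template that is checked every five minutes.
disk_space = AlarmTemplate(
    metric="disk_used_percent", warning=80.0, critical=90.0, period_seconds=300
)
print(disk_space.evaluate(85.0))
```

Tuning a threshold in this model means changing `warning` or `critical`; tuning the frequency means changing `period_seconds`.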

The following table shows the alert severity levels and expected response times:

| Severity level | First live response |
|---|---|
| Emergency: instances are failing or the OpenStack cloud is partially or wholly inoperable | 15 minutes |
| Urgent: new instances cannot be launched or terminated | 1 hour |
| Standard: new instance launches are delayed or errors occur when interacting with the OpenStack API | 4 hours |
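The response targets above can be turned into a deadline calculation. The following is a minimal sketch: the durations come from the severity table, while the function name and the lowercase severity keys are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# First-live-response targets taken from the severity table.
RESPONSE_TARGETS = {
    "emergency": timedelta(minutes=15),
    "urgent": timedelta(hours=1),
    "standard": timedelta(hours=4),
}


def response_deadline(severity: str, raised_at: datetime) -> datetime:
    """Return the latest time by which a first live response is expected."""
    return raised_at + RESPONSE_TARGETS[severity.lower()]


if __name__ == "__main__":
    raised = datetime(2021, 6, 15, 12, 0, tzinfo=timezone.utc)
    for level in RESPONSE_TARGETS:
        print(level, response_deadline(level, raised).isoformat())
```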

Rackspace data centers versus customer data centers#

Some elements of the Rackspace monitoring service apply to every deployment, and others apply only to customer data centers.

For both Rackspace data centers and customer data centers, the following elements apply:

  • The poller and agent connection endpoints are resolved by using Domain Name System (DNS) service (SRV) records, which provide a pool of addresses for the following Rackspace Monitoring regions:

    • ORD

    • DFW

    • LON

  • Agents and pollers connect securely over port 443 to endpoint addresses.

  • 24/7 access to all three endpoint regions is required for functional monitoring.

  • Agents are deployed to all physical hosts and Kubernetes clusters.

  • Deployment playbooks require access to the following resources:

    • Rackspace Monitoring repositories for agent and poller packages

    • Python Packaging Authority (pypa.io)

    • System-level package repositories (apt or yum)
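Because monitoring depends on reaching all three regions over port 443, a pre-deployment connectivity check can be useful. The following is a minimal sketch under stated assumptions: the hostnames are hypothetical placeholders (the real addresses come from the SRV lookup and are not listed in this document), and the check only confirms that a TCP connection can be opened.

```python
import socket
from typing import Callable, Dict, List, Tuple

# Hypothetical endpoint hostnames, for illustration only.
ENDPOINTS: Dict[str, Tuple[str, int]] = {
    "ORD": ("monitoring-ord.example.com", 443),
    "DFW": ("monitoring-dfw.example.com", 443),
    "LON": ("monitoring-lon.example.com", 443),
}


def unreachable_regions(
    endpoints: Dict[str, Tuple[str, int]],
    connect: Callable = socket.create_connection,
    timeout: float = 5.0,
) -> List[str]:
    """Return the regions whose endpoint refused or timed out on connect."""
    failed = []
    for region, address in endpoints.items():
        try:
            conn = connect(address, timeout)
        except OSError:  # covers DNS failures, refusals, and timeouts
            failed.append(region)
        else:
            conn.close()
    return failed
```

Calling `unreachable_regions(ENDPOINTS)` with real endpoint addresses should return an empty list on a host with the required access. The `connect` parameter exists so that the function can be exercised without network access.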

For customer data centers only, the following elements apply:

  • Private network monitoring (PNM) pollers are deployed to the physical control plane nodes if endpoints are RFC 1918 addresses.

  • Optionally, agent and poller connections can be forwarded through a web proxy.

  • Hardware monitoring of server chassis is the customer’s responsibility. This includes processor, memory, and physical disk monitoring.

    • Standard Rackspace chassis offerings are supported only in Rackspace data centers (for Dell or HP devices) or on OpenStack Anywhere (for roll-in rack deployments).
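Whether an endpoint uses an RFC 1918 address can be checked with the Python standard library. The following is a minimal sketch: the three networks are the private ranges that RFC 1918 defines, while the function names and the rule that a single private endpoint is enough to call for PNM pollers are assumptions made for illustration.

```python
import ipaddress
from typing import Iterable

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = tuple(
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside an RFC 1918 range."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and any(ip in net for net in RFC1918_NETWORKS)


def needs_pnm_pollers(endpoint_addresses: Iterable[str]) -> bool:
    """Assumption: any private endpoint address calls for PNM pollers."""
    return any(is_rfc1918(addr) for addr in endpoint_addresses)


print(needs_pnm_pollers(["172.29.236.10", "203.0.113.10"]))
```

The explicit network list is used instead of `ip.is_private` because that attribute also covers ranges outside RFC 1918, such as loopback and link-local addresses.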

Monitoring options#

The agents and pollers collect metrics at the individual host level and at the overall cloud level. The following sections explore these in more detail.

Host-level monitoring#

Hardware monitoring includes the following elements:

  • For Rackspace-managed infrastructure in any data center location: the monitoring service monitors the status of processors, memory, physical disks, RAID volumes, the RAID controller, and the RAID controller battery.

  • All devices can have the following elements monitored: ping/SSH and bonding interface status.

  • The control plane hosts have monitoring and alarming for the following elements: disk space, disk utilization, CPU idle time, memory capacity, and conntrack count. Additional metrics are gathered for network interface throughput, but these do not result in a notification or ticket.

  • The non-control plane hosts have monitoring and alarming for disk space and conntrack count. Additional metrics are gathered for disk utilization, CPU idle time, memory capacity, and network interface throughput, but these do not result in a notification or ticket.
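The difference between metrics that raise an alarm and metrics that are only collected can be summarized as a lookup. The following minimal sketch encodes the two host-role bullets above; the role keys, metric identifiers, and function name are hypothetical labels chosen for illustration.

```python
# Metrics that result in a notification or ticket, by host role.
ALARMING_METRICS = {
    "control_plane": {
        "disk_space",
        "disk_utilization",
        "cpu_idle_time",
        "memory_capacity",
        "conntrack_count",
    },
    "non_control_plane": {"disk_space", "conntrack_count"},
}

# Metrics that are gathered but never result in a notification or ticket.
COLLECTED_ONLY_METRICS = {
    "control_plane": {"network_interface_throughput"},
    "non_control_plane": {
        "disk_utilization",
        "cpu_idle_time",
        "memory_capacity",
        "network_interface_throughput",
    },
}


def raises_notification(role: str, metric: str) -> bool:
    """Return True if the metric can alarm for the given host role."""
    if metric in ALARMING_METRICS[role]:
        return True
    if metric in COLLECTED_ONLY_METRICS[role]:
        return False
    raise ValueError(f"unknown metric for {role}: {metric}")


print(raises_notification("non_control_plane", "memory_capacity"))
```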

RPCO offerings#

RPCO monitoring includes the following elements:

| OpenStack service | Elements monitored |
|---|---|
| ceph | ceph overall cluster health, mons health (quorum), osd status, radosgw status |
| cinder | cinder local api, cinder volume status, cinder scheduler status |
| designate | designate local api, designate mdns, designate process |
| glance | glance local api, glance registry |
| heat | heat local api, heat api cloudformation, heat api cloudwatch |
| ironic | ironic local api, ironic compute status, ironic conductor status |
| keystone | keystone local api |
| kubernetes | mk8s local api, mk8s auth, mk8s etg, mk8s etp, mk8s ui, process checks |
| neutron | neutron local api, neutron dhcp agent, neutron l3 agent, neutron linuxbridge agent, neutron ovs agent, neutron metering agent, neutron metadata agent, neutron agent conntrack count, neutron qrouter conntrack count |
| nova | nova local api, nova metadata api, nova cert, nova compute status, nova conductor status, nova console (spice/novnc), nova consoleauth, nova scheduler status |
| octavia | octavia local api, octavia lb error, octavia quota check, process checks |
| swift | swift account process, swift account server, swift async, swift container process, swift container replication, swift container server, swift md5, swift object process, swift object replication, swift object server, swift proxy server, swift quarantine, swift time sync |
| hummingbird | hummingbird account process, hummingbird account server, hummingbird container process, hummingbird container server, hummingbird object process, hummingbird object server, hummingbird proxy server |
| memcached | memcached local api, memcached connections |
| galera | cluster size, wsrep state, connections, file limits, innodb row lock time, innodb deadlocks, access errors, aborted connections, holland backup |
| rabbitmq | disk free, memory, max channels per connection, file limits, processes, sockets, unconsumed messages, queue growth rate, messages without consumers |

OpenStack HTTP monitoring includes the following elements:

| HTTP function | Elements monitored |
|---|---|
| HTTP API validation and uptime of all applicable supported service endpoints | cinder, designate, glance, heat api, heat cfn, heat cw, horizon, ironic, keystone, managed kubernetes, mk8s ui, neutron, nova, octavia |
| HTTP access/health check | hummingbird, swift |
| HTTPS certificate expiry | If applied to endpoints |
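A certificate expiry check of the kind listed above can be approximated with the Python standard library. The following is a minimal sketch, not the Rackspace poller implementation: the function names are hypothetical, and the check reads only the `notAfter` field of the certificate that the endpoint presents.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the days left before a certificate's notAfter timestamp.

    not_after uses the format returned by getpeercert(), for example
    "Jun 15 12:00:00 2031 GMT". A negative result means it has expired.
    """
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to an HTTPS endpoint and return its certificate's notAfter."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, `days_until_expiry(fetch_not_after("cloud.example.com"))` reports the remaining lifetime for a hypothetical endpoint host. Because the default context verifies the certificate chain, an endpoint with a self-signed or already expired certificate raises an `ssl.SSLError` during the handshake instead of returning a value.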