Cloud Best Practices

Best Practices with Cloud and OpenStack Flex

This guide outlines principles and best practices for building resilient, maintainable, and scalable infrastructure on an OpenStack-based public cloud like Rackspace OpenStack Flex. These recommendations matter when operating in a shared, multi-tenant cloud environment, where treating your infrastructure as dynamic rather than static leads to operational success.

Cattle, Not Pets

❝If your server has a name, you're doing it wrong.❞

Cloud-native infrastructure assumes failure. You don’t manually repair a server; you redeploy it from a known working state.

Key Practices:

  • Use generic, role-based hostnames like web-001 and app-002, never one-off names like finance-db01.
  • Avoid configuration drift: build every server from the same base image plus configuration automation.

Automation Tools:

  • cloud-init: Bootstrap new VMs by passing cloud-init scripts as user-data when building a VM. This lets you make changes inside the OS at build time rather than SSHing in afterwards to perform the steps.
  • Ansible/Terraform: Provisioning and repeatable configuration with tools like Ansible and Terraform lets you build out your infrastructure in a standardized and efficient way.

Sample cloud-init Script:

#cloud-config
hostname: appserver
package_update: true
packages:
  - nginx
runcmd:
  - systemctl enable nginx
  - systemctl start nginx

NOTE: Any server that can’t be rebuilt in under 10 minutes from code + image is a liability.
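
To keep hostnames generic across a fleet, user-data can be rendered from a template instead of being written by hand for each server. The following is a minimal sketch in Python; the function name render_user_data and the role-plus-index naming scheme are assumptions made for this example and are not part of cloud-init or OpenStack Flex.

```python
def render_user_data(role: str, index: int, packages: list[str]) -> str:
    """Return a #cloud-config document with a generic hostname such as web-001."""
    hostname = f"{role}-{index:03d}"  # zero-padded index, e.g. web-001
    lines = ["#cloud-config", f"hostname: {hostname}", "package_update: true"]
    if packages:
        # Only emit the packages key when there is something to install.
        lines.append("packages:")
        lines += [f"  - {pkg}" for pkg in packages]
    return "\n".join(lines) + "\n"
```

Calling render_user_data("web", 1, ["nginx"]) produces a document whose hostname line reads "hostname: web-001"; the result can be written to a file and passed as user-data at build time.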

Infrastructure as Code (IaC)

Declarative infrastructure should be seen as a requirement, not an enhancement.

What to Define as Code:

  • VMs, networks, subnets, routers, security groups
  • DNS records
  • Floating IP associations
  • Volume attachments

Tooling Recommendations:

Example: Terraform VM Resource

resource "openstack_compute_instance_v2" "web" {
  name            = "web-server"
  image_name      = "Ubuntu 24.04"
  flavor_name     = "gp.5.4.8"
  key_pair        = "default-key"
  security_groups = ["web"]

  network {
    uuid = openstack_networking_network_v2.private.id
  }

  user_data = file("cloud-init-web.yaml")
}

Don’t Marry IPs — Use DNS Everywhere

Hard-coding IP addresses ties your infrastructure to brittle assumptions. If an IP address is lost or has to change, you may have to dig through your code manually to find and update every reference. Use DNS names instead.

Best Practices:

  • Set up FQDNs for internal services (e.g., db.internal.example.com)
  • Use Floating IPs only at edge endpoints; keep internal traffic on tenant networks.

Example:

# BAD PRACTICE
DATABASE_HOST=10.5.3.42

# GOOD PRACTICE
DATABASE_HOST=db.internal.example.com
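
On the application side, the same idea means resolving the configured name when a connection is made instead of storing an address. The sketch below is one hedged way to do that with Python's standard library; the helper name resolve_service and the localhost fallback are assumptions for illustration only.

```python
import os
import socket


def resolve_service(env_var: str, port: int, default_host: str = "localhost") -> list[str]:
    """Look up the host named in env_var and return its current IP addresses."""
    host = os.environ.get(env_var, default_host)
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Because the lookup happens at call time, pointing the DNS record at a replacement server requires no change to application code or configuration.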

Snapshots ≠ Backups

Snapshots

  • Good for temporary state (e.g., pre-upgrade)
  • Are stored in the same availability zone as the source, so they do not protect against a zone-level failure
  • Need to be tested before being relied upon.

Backups

  • Use Cinder Volume Backups to remote or redundant storage backends
  • Automate backups using tools like Restic for file/app-level backup
  • Backup frequency should align with your Recovery Point Objective (RPO)

Best Practices:

  • Perform test restores regularly
  • Store metadata with backups (e.g., database schema + backup timestamp)
  • Separate data volumes from boot volumes
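
Aligning backup frequency with an RPO can be checked mechanically: the age of the most recent successful backup should never exceed the RPO. A minimal sketch, assuming the timestamp of the last backup is already available from your backup tooling (the function name backup_meets_rpo is hypothetical):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def backup_meets_rpo(last_backup: datetime, rpo: timedelta,
                     now: Optional[datetime] = None) -> bool:
    """Return True if the newest backup is no older than the RPO."""
    now = now or datetime.now(timezone.utc)
    return now - last_backup <= rpo
```

A check like this can feed a monitoring alert so that a silently failing backup job is noticed before the RPO is breached.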

Security Best Practices

Instance Access

  • Use SSH key pairs, never passwords
  • Store keys in a password manager or secrets vault

Network Segmentation

  • Limit inbound rules by source CIDR
  • Avoid “Allow All” (0.0.0.0/0) unless necessary
  • Use bastion hosts to further isolate entry points to your instances
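
The inbound-rule guidance above can be audited automatically. The sketch below assumes rules are plain dictionaries shaped like Neutron security group rule objects (direction, remote_ip_prefix, remote_group_id); how they are fetched is left out, and the function name find_open_ingress is hypothetical.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_open_ingress(rules: list[dict]) -> list[dict]:
    """Return ingress rules that accept traffic from any source address."""
    flagged = []
    for rule in rules:
        if rule.get("direction") != "ingress":
            continue
        prefix = rule.get("remote_ip_prefix")
        # Assumption: a rule with neither a prefix nor a remote group
        # is treated as open to all sources.
        no_source = prefix is None and rule.get("remote_group_id") is None
        if prefix in OPEN_CIDRS or no_source:
            flagged.append(rule)
    return flagged
```

Flagged rules are candidates for review, not automatic deletion: a public web endpoint on port 443 is a legitimate use of 0.0.0.0/0.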

Boot, Storage, and Scaling

Boot Strategy

  • Boot from Volume: Provides a root disk that can persist after the VM is deleted.
  • Ephemeral Boot: Suited to fast, throwaway workloads. Never put data on an ephemeral disk unless it is disposable.

Storage Strategy

  • Separate storage types by role:

    • Boot volume for OS
    • Data volume for DBs
    • Scratch ephemeral for temp/cache

Observability and Monitoring

Key Metrics to Monitor

  • CPU, Memory, Disk I/O
  • API response times
  • Instance availability
  • Backup status

Lifecycle Management and Hygiene

Regular Maintenance Tasks:

  • Rotate Floating IPs, SSH keys, and other credentials
  • Delete orphaned volumes, ports, and snapshots that are no longer in use
  • Audit security groups and access lists
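
Finding orphaned volumes is easy to script. The sketch below assumes volumes are dictionaries shaped like Cinder volume objects (id, status, attachments) and treats a volume as a cleanup candidate when its status is "available" and it has no attachments; the function name find_orphaned_volumes is hypothetical.

```python
def find_orphaned_volumes(volumes: list[dict]) -> list[str]:
    """Return IDs of volumes that are unattached and not in use."""
    return [
        vol["id"]
        for vol in volumes
        if vol.get("status") == "available" and not vol.get("attachments")
    ]
```

The result is a list to review, since an unattached volume may still hold data that someone intends to reattach later.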

Use Tags

  • Tag resources with owner, project, env (e.g., dev, staging, prod)
  • Enables cleanup automation and cost tracking
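
Tagging only helps if it is enforced. The sketch below assumes a key=value tag convention such as "env=prod", which is a convention chosen for this example and not something OpenStack requires, and reports which required keys a resource is missing.

```python
REQUIRED_KEYS = ("owner", "project", "env")


def missing_tag_keys(tags: list[str], required: tuple = REQUIRED_KEYS) -> list[str]:
    """Return the required tag keys that are absent from a resource's tags."""
    # Tags without "=" carry no key under this convention and are ignored.
    present = {tag.split("=", 1)[0] for tag in tags if "=" in tag}
    return [key for key in required if key not in present]
```

For example, missing_tag_keys(["owner=alice", "env=dev"]) returns ["project"], which a CI job could use to reject untagged resources.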

Dev/Test/Staging Environments

  • Maintain non-production environments such as Dev and Staging.
  • Define production infrastructure from the same code, and apply changes there only after they have been tested successfully in non-production environments.
  • Use CI/CD to deploy infrastructure changes safely

Final Takeaways

Principle                   Why It Matters
Disposable Infrastructure   Resilience, speed, consistency
IaC Everywhere              Reproducibility, versioning
DNS Over IPs                Flexibility, maintainability
Separate State              Makes scaling and recovery easier
Automate Backups            Protection against data loss