Cloud Best Practices

Best Practices with Cloud and OpenStack Flex

This guide outlines principles and best practices for building resilient, maintainable, and scalable infrastructure on an OpenStack-based public cloud such as Rackspace OpenStack Flex. These recommendations are especially important in a shared, multi-tenant cloud environment, where treating your infrastructure as dynamic rather than static is key to operational success.

Cattle, Not Pets

❝If your server has a name, you're doing it wrong.❞

Cloud-native infrastructure assumes failure. You don’t manually repair a server; you redeploy it from a known working state.

Key Practices:

Use generic hostnames such as web-001 and app-002, never bespoke names like finance-db01.
Avoid configuration drift: build every server from the same base image and the same configuration automation.

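A generic naming scheme is easy to automate. The following is a minimal sketch, assuming a hypothetical `role-NNN` convention; the helper names `generic_hostname` and `next_hostname` are illustrative only and are not part of any OpenStack tool.

```python
def generic_hostname(role: str, index: int, width: int = 3) -> str:
    """Build a disposable, role-based hostname such as 'web-001' (hypothetical convention)."""
    if not role.isalnum() or not role.islower():
        raise ValueError(f"role must be lowercase alphanumeric, got {role!r}")
    if index < 1:
        raise ValueError("index must be 1 or greater")
    # Zero-pad the index so names sort naturally: web-001, web-002, ...
    return f"{role}-{index:0{width}d}"


def next_hostname(role: str, existing: set[str]) -> str:
    """Return the lowest-numbered hostname for a role that is not already in use."""
    index = 1
    while generic_hostname(role, index) in existing:
        index += 1
    return generic_hostname(role, index)


print(generic_hostname("web", 1))                    # web-001
print(next_hostname("app", {"app-001", "app-002"}))  # app-003
```

Because the name encodes only a role and a number, any instance can be destroyed and replaced without anyone needing to remember what made it special.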
Automation Tools:

cloud-init: Bootstrap new VMs by passing a cloud-init script as user data when you build the VM. This applies changes inside the OS at build time, so you don’t need to SSH in afterwards to perform the steps by hand.
Ansible/Terraform: Tools like Terraform (for provisioning) and Ansible (for configuration) let you build out your infrastructure in a standardized, repeatable, and efficient way.

Sample cloud-init Script:

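#cloud-config
# Generic, role-based hostname for this instance
hostname: appserver
# Refresh package lists before installing anything
package_update: true
packages:
  - nginx
runcmd:
  # Start nginx on every boot, and start it now
  - systemctl enable nginx
  - systemctl start nginx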
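Clients such as the openstack CLI and the Terraform provider typically encode user data for you, but if you call the Compute API directly, the user_data field must be base64-encoded, and the API documents a 65535-byte limit. The sketch below, assuming that limit applies to the encoded string, checks for the `#cloud-config` header and encodes the sample script above. `encode_user_data` is a hypothetical helper, not part of any SDK.

```python
import base64

# The Compute API expects base64-encoded user_data and documents a 65535-byte limit.
MAX_ENCODED_USER_DATA = 65535

SAMPLE_CLOUD_CONFIG = """#cloud-config
hostname: appserver
package_update: true
packages:
  - nginx
runcmd:
  - systemctl enable nginx
  - systemctl start nginx
"""


def encode_user_data(cloud_config: str) -> str:
    """Check for the #cloud-config header, then base64-encode the document."""
    lines = cloud_config.splitlines()
    # cloud-init only treats user data as cloud-config if the first line is this header.
    if not lines or lines[0].strip() != "#cloud-config":
        raise ValueError("cloud-config user data must start with '#cloud-config'")
    encoded = base64.b64encode(cloud_config.encode("utf-8")).decode("ascii")
    if len(encoded) > MAX_ENCODED_USER_DATA:
        raise ValueError(
            f"encoded user data is {len(encoded)} bytes; the limit is {MAX_ENCODED_USER_DATA}"
        )
    return encoded


encoded = encode_user_data(SAMPLE_CLOUD_CONFIG)
print(encoded[:24])
```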

NOTE: Any server that can’t be rebuilt from code and a base image in under 10 minutes is a liability.

Infrastructure as Code (IaC)

Declarative infrastructure should be seen as a requirement, not an enhancement.

What to Define as Code:

VMs, networks, subnets, routers, security groups
DNS records
Floating IP associations
Volume attachments
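The core idea behind declarative tools is that you describe the desired state, and the tool works out what to create, change, or delete. The following is a deliberately simplified model of that comparison (loosely the idea behind `terraform plan`, not how Terraform is implemented); the resource names and attributes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    create: list[str] = field(default_factory=list)
    update: list[str] = field(default_factory=list)
    delete: list[str] = field(default_factory=list)


def plan(desired: dict[str, dict], actual: dict[str, dict]) -> Plan:
    """Compare resources defined in code (desired) with what exists in the cloud (actual)."""
    result = Plan()
    for name, spec in desired.items():
        if name not in actual:
            result.create.append(name)       # in code, missing from the cloud
        elif actual[name] != spec:
            result.update.append(name)       # exists, but has drifted from code
    for name in actual:
        if name not in desired:
            result.delete.append(name)       # exists, but no longer in code
    return result


# Hypothetical resources; in practice these come from your IaC files and the cloud APIs.
desired = {
    "net-private": {"type": "network"},
    "sg-web": {"type": "security_group", "ports": [80, 443]},
    "web-001": {"type": "server", "flavor": "gp.5.4.8"},
}
actual = {
    "net-private": {"type": "network"},
    "sg-web": {"type": "security_group", "ports": [80, 443, 22]},  # someone opened SSH by hand
    "old-vm": {"type": "server", "flavor": "gp.5.4.8"},
}
print(plan(desired, actual))
```

Manual changes made outside of code, such as the extra SSH port here, show up as drift to be reverted rather than silently becoming part of production.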

Tooling Recommendations:

Heat templates for OpenStack-native orchestration workflows
Terraform (or OpenTofu) with the OpenStack provider for cross-cloud compatibility. Read more about working with Terraform or OpenTofu on OpenStack Flex.
Ansible for configuration after provisioning. Read more about working with Ansible on OpenStack Flex.

Example: Terraform VM Resource

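resource "openstack_compute_instance_v2" "web" {
  name            = "web-server"
  image_name      = "Ubuntu 24.04"   # must match an image name available in your project
  flavor_name     = "gp.5.4.8"
  key_pair        = "default-key"    # an existing key pair in your project
  security_groups = ["web"]          # an existing security group

  network {
    # Assumes an openstack_networking_network_v2 resource named "private"
    # is defined elsewhere in this configuration.
    uuid = openstack_networking_network_v2.private.id
  }

  # cloud-init user data, such as the sample cloud-config script above
  user_data = file("cloud-init-web.yaml")
}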

Don’t Marry IPs: Use DNS Everywhere

Hard-coding IP addresses ties your infrastructure to brittle assumptions. If an IP address is lost or needs to change, you may have to dig through your code by hand to find and update every reference. Instead, refer to services by DNS name so a change only has to be made in one place.

Best Practices:

Set up FQDNs for internal services (e.g., db.internal.example.com)
Use Floating IPs only at edge endpoints; keep internal traffic on tenant networks.

Example:

# BAD PRACTICE
DATABASE_HOST=10.5.3.42

# GOOD PRACTICE
DATABASE_HOST=db.internal.example.com
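The other half of this practice is resolving the name when you connect, not once at startup. This is a minimal sketch using only the Python standard library; `resolve_host` is a hypothetical helper, and the port 5432 (the PostgreSQL default) and the `localhost` fallback are assumptions for illustration.

```python
import os
import socket


def resolve_host(name: str, port: int = 5432) -> list[str]:
    """Resolve a hostname to its current addresses at connection time."""
    infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    addresses: list[str] = []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


# Read the DNS name from configuration instead of hard-coding an IP address.
db_host = os.environ.get("DATABASE_HOST", "localhost")
print(db_host, resolve_host(db_host))
```

If the database moves, only the DNS record changes; every client picks up the new address on its next connection (subject to DNS caching and TTLs).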

Snapshots ≠ Backups

Snapshots

Good for temporary state (e.g., pre-upgrade)
Are stored in the same availability zone
Need to be tested before being relied upon.

Backups

Use Cinder Volume Backups to remote or redundant storage backends
Automate backups using tools like Restic for file/app-level backup
Backup frequency should align with your Recovery Point Objective (RPO)
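One way to hold backups to your RPO is to check the gaps between successful backups, including the gap from the last backup until now. The sketch below assumes you already collect the completion times of successful backups; `rpo_gaps` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def rpo_gaps(
    backup_times: list[datetime], rpo: timedelta, now: datetime
) -> list[tuple[datetime, datetime]]:
    """Return each window longer than the RPO that had no successful backup."""
    if not backup_times:
        raise ValueError("no successful backups recorded; every point in time is at risk")
    # Include 'now' so a backup job that has silently stopped is also caught.
    points = sorted(backup_times) + [now]
    return [
        (earlier, later)
        for earlier, later in zip(points, points[1:])
        if later - earlier > rpo
    ]


backups = [
    datetime(2024, 1, 1, 2, 0),
    datetime(2024, 1, 2, 2, 0),
    datetime(2024, 1, 4, 2, 0),  # the January 3 backup never ran
]
gaps = rpo_gaps(backups, rpo=timedelta(hours=24), now=datetime(2024, 1, 4, 12, 0))
for start, end in gaps:
    print(f"RPO exceeded between {start} and {end}")
```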

Best Practices:

Perform test restores regularly
Store metadata with backups (e.g., database schema + backup timestamp)
Separate data volumes from boot volumes
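Storing metadata with a backup also gives test restores something concrete to check against. This is a minimal sketch, assuming file-level backups: it writes a hypothetical `.manifest.json` file next to each backup containing a SHA-256 checksum, a schema version, and a UTC timestamp, then verifies a restored copy against it. The file layout and function names are illustrative, not part of Restic or Cinder.

```python
import hashlib
import json
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large backups don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(backup_file: Path, schema_version: str) -> Path:
    """Store metadata next to the backup: checksum, schema version, and timestamp."""
    manifest = {
        "file": backup_file.name,
        "sha256": sha256_of(backup_file),
        "schema_version": schema_version,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    manifest_path = backup_file.with_name(backup_file.name + ".manifest.json")
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path


def verify_restore(restored_file: Path, manifest_path: Path) -> bool:
    """Return True only if the restored file matches the checksum recorded at backup time."""
    manifest = json.loads(manifest_path.read_text())
    return sha256_of(restored_file) == manifest["sha256"]


# Demonstration in a temporary directory; "42" is a hypothetical schema version.
with tempfile.TemporaryDirectory() as tmp:
    backup = Path(tmp) / "app-db.dump"
    backup.write_bytes(b"example dump contents")
    manifest_path = write_manifest(backup, schema_version="42")
    restored = Path(tmp) / "restored.dump"
    shutil.copyfile(backup, restored)
    print(verify_restore(restored, manifest_path))  # True
```

A matching checksum only proves the bytes survived the round trip; a full test restore should also start the application against the restored data.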

Security Best Practices

Instance Access

Use SSH key pairs, never passwords
Store keys in a password manager or secrets vault
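To confirm that instances really refuse password logins, you can audit the SSH daemon's configuration. This is a minimal sketch that reads the text of an sshd_config file. It relies on two documented sshd behaviors: the first value read for a keyword wins, and PasswordAuthentication defaults to yes. It does not follow Include directives (many cloud images put overrides in drop-in files) or evaluate Match blocks, so on a live host, the effective configuration printed by `sshd -T` is the more reliable source. `password_auth_disabled` is a hypothetical helper.

```python
def password_auth_disabled(sshd_config: str) -> bool:
    """Return True if the global section of an sshd_config sets PasswordAuthentication no.

    sshd keeps the first value it reads for a keyword, and PasswordAuthentication
    defaults to 'yes' when it is not set at all.
    """
    for raw_line in sshd_config.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        # Keywords may be separated from values by whitespace or a single '='.
        parts = line.replace("=", " ", 1).split(None, 1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # settings after Match are conditional; this sketch ignores them
        if keyword == "passwordauthentication" and len(parts) == 2:
            return parts[1].strip().lower() == "no"
    return False  # not set: the default ('yes') applies


print(password_auth_disabled("PubkeyAuthentication yes\nPasswordAuthentication no\n"))  # True
```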

Network Segmentation

Limit inbound rules by source CIDR
Avoid “Allow All” (0.0.0.0/0) unless necessary
Use bastion hosts to further isolate the entry points to your instances.
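These rules lend themselves to automated checks. The sketch below uses Python's `ipaddress` module against a simplified, hypothetical rule model. Real Neutron security group rules also carry a port range, direction, and ethertype; only the `remote_ip_prefix` field name is borrowed from them. It flags rules open to the whole internet on ports that aren't meant to be public, and tests whether a given source address would be admitted.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    """Simplified ingress rule: one protocol, one port, one source CIDR."""
    protocol: str
    port: int
    remote_ip_prefix: str


WORLD = {ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_network("::/0")}


def world_open_rules(rules: list[IngressRule], public_ports=frozenset({80, 443})) -> list[IngressRule]:
    """Flag rules open to any source on ports that are not intended to be public."""
    return [
        rule
        for rule in rules
        if ipaddress.ip_network(rule.remote_ip_prefix) in WORLD
        and rule.port not in public_ports
    ]


def is_allowed(rules: list[IngressRule], source_ip: str, protocol: str, port: int) -> bool:
    """Return True if any rule admits traffic from source_ip on the given protocol and port."""
    address = ipaddress.ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.port == port
        and address in ipaddress.ip_network(rule.remote_ip_prefix)
        for rule in rules
    )


rules = [
    IngressRule("tcp", 22, "0.0.0.0/0"),      # SSH from anywhere: should be the bastion only
    IngressRule("tcp", 443, "0.0.0.0/0"),     # public HTTPS on a web tier
    IngressRule("tcp", 5432, "10.0.1.0/24"),  # database, reachable from the app subnet only
]
print(world_open_rules(rules))
print(is_allowed(rules, "10.0.1.15", "tcp", 5432))     # True
print(is_allowed(rules, "198.51.100.7", "tcp", 5432))  # False
```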

Boot, Storage, and Scaling

Boot Strategy

Boot from Volume: Gives the VM a persistent root disk that can outlive the VM when it is deleted.
Ephemeral Boot: Suited to fast, throwaway workloads. Never put data on an ephemeral disk unless it is disposable.

Storage Strategy

Separate storage types by role:
- Boot volume for OS
- Data volume for DBs
- Scratch ephemeral for temp/cache

Observability and Monitoring

Key Metrics to Monitor

CPU, Memory, Disk I/O
API response times
Instance availability
Backup status
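Whatever monitoring stack you use, the alerting logic boils down to comparing each metric with a limit. This is a minimal sketch in which every metric name and threshold is a hypothetical example, not a recommended value. Note that it treats a missing metric as an alert in its own right, because silence from a host often means that the host is down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    limit: float
    higher_is_worse: bool = True


# Hypothetical thresholds covering the metrics listed above.
THRESHOLDS = [
    Threshold("cpu_percent", 85.0),
    Threshold("memory_percent", 90.0),
    Threshold("disk_io_wait_percent", 20.0),
    Threshold("api_p95_ms", 500.0),
    Threshold("availability_percent", 99.9, higher_is_worse=False),
    Threshold("hours_since_last_backup", 24.0),
]


def evaluate(sample: dict[str, float], thresholds: list[Threshold] = THRESHOLDS) -> list[str]:
    """Return one alert message per breached or missing metric."""
    alerts = []
    for t in thresholds:
        if t.metric not in sample:
            alerts.append(f"{t.metric}: no data")  # missing data is itself a signal
            continue
        value = sample[t.metric]
        breached = value > t.limit if t.higher_is_worse else value < t.limit
        if breached:
            alerts.append(f"{t.metric}: {value} breaches limit {t.limit}")
    return alerts


sample = {
    "cpu_percent": 92.5,
    "memory_percent": 40.0,
    "disk_io_wait_percent": 3.0,
    "api_p95_ms": 180.0,
    "availability_percent": 99.95,
    "hours_since_last_backup": 30.0,
}
for alert in evaluate(sample):
    print(alert)
```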

Lifecycle Management and Hygiene

Regular Maintenance Tasks:

Rotate Floating IPs, SSH keys, or other credentials
Delete orphaned volumes, ports, and snapshots that are no longer in use
Audit security groups and access lists
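Finding cleanup candidates is mostly filtering API listings. This is a minimal sketch over plain dictionaries whose keys loosely mirror the fields returned by the Block Storage and Networking APIs (`status`, `attachments`, `device_id`, `created_at`). The sample data and function names are hypothetical. The results are candidates for review, not for automatic deletion: for example, a port created in advance for a future server will also have an empty `device_id`.

```python
from datetime import datetime, timedelta, timezone


def orphaned_volumes(volumes: list[dict]) -> list[str]:
    # A detached volume reports status 'available' and has no attachments.
    return [v["id"] for v in volumes if v["status"] == "available" and not v["attachments"]]


def orphaned_ports(ports: list[dict]) -> list[str]:
    # A port that is not bound to a server, router, or other device has an empty device_id.
    return [p["id"] for p in ports if not p["device_id"]]


def stale_snapshots(snapshots: list[dict], max_age: timedelta, now: datetime) -> list[str]:
    # Snapshots are meant for temporary state, so old ones are cleanup candidates.
    return [s["id"] for s in snapshots if now - s["created_at"] > max_age]


volumes = [
    {"id": "vol-a", "status": "in-use", "attachments": [{"server_id": "srv-1"}]},
    {"id": "vol-b", "status": "available", "attachments": []},
]
ports = [
    {"id": "port-a", "device_id": "srv-1", "device_owner": "compute:nova"},
    {"id": "port-b", "device_id": "", "device_owner": ""},
]
snapshots = [
    {"id": "snap-old", "created_at": datetime(2024, 3, 1, tzinfo=timezone.utc)},
    {"id": "snap-new", "created_at": datetime(2024, 5, 25, tzinfo=timezone.utc)},
]
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(orphaned_volumes(volumes))                               # ['vol-b']
print(orphaned_ports(ports))                                   # ['port-b']
print(stale_snapshots(snapshots, timedelta(days=30), now))     # ['snap-old']
```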

Use Tags

Tag resources with owner, project, env (e.g., dev, staging, prod)
Enables cleanup automation and cost tracking
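Many OpenStack resources accept tags as plain strings rather than key/value pairs, so a convention is needed. The sketch below assumes a hypothetical `key=value` tag convention. It reports missing required tags and rolls up a hypothetical `monthly_cost` figure by project, with untagged resources grouped together so they stand out.

```python
REQUIRED_TAG_KEYS = ("owner", "project", "env")
ALLOWED_ENVS = {"dev", "staging", "prod"}


def parse_tags(tags: list[str]) -> dict[str, str]:
    """Turn ['owner=alice', 'env=prod'] into {'owner': 'alice', 'env': 'prod'}."""
    parsed = {}
    for tag in tags:
        key, sep, value = tag.partition("=")
        if sep:  # ignore tags that don't follow the key=value convention
            parsed[key] = value
    return parsed


def missing_tags(tags: list[str]) -> list[str]:
    """List required keys that are absent, plus any unexpected env value."""
    parsed = parse_tags(tags)
    problems = [key for key in REQUIRED_TAG_KEYS if key not in parsed]
    if "env" in parsed and parsed["env"] not in ALLOWED_ENVS:
        problems.append(f"env={parsed['env']} (expected one of {sorted(ALLOWED_ENVS)})")
    return problems


def cost_by_project(resources: list[dict]) -> dict[str, float]:
    """Sum a per-resource monthly cost by its project tag."""
    totals: dict[str, float] = {}
    for resource in resources:
        project = parse_tags(resource["tags"]).get("project", "untagged")
        totals[project] = totals.get(project, 0.0) + resource["monthly_cost"]
    return totals


resources = [
    {"tags": ["owner=alice", "project=shop", "env=prod"], "monthly_cost": 120.0},
    {"tags": ["owner=bob", "project=shop", "env=dev"], "monthly_cost": 35.5},
    {"tags": [], "monthly_cost": 10.0},
]
print(missing_tags(resources[2]["tags"]))  # ['owner', 'project', 'env']
print(cost_by_project(resources))          # {'shop': 155.5, 'untagged': 10.0}
```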

Dev/Test/Staging Environments

Maintain non-production environments, such as Dev and Staging.
Define them from the same code as production, and apply changes to production only after they test successfully in non-production.
Use CI/CD to deploy infrastructure changes safely
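A CI/CD pipeline can enforce that ordering automatically. This is a minimal sketch of a promotion gate, assuming a hypothetical deployment history recorded by your pipeline and a fixed dev → staging → prod order. A change may only be deployed to an environment if the same commit has already succeeded in the previous one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    environment: str
    commit: str
    succeeded: bool


# Hypothetical promotion order; raises ValueError below for unknown environments.
PROMOTION_ORDER = ["dev", "staging", "prod"]


def can_promote(commit: str, target: str, history: list[Deployment]) -> bool:
    """Allow a deploy only if the same commit succeeded in the previous environment."""
    index = PROMOTION_ORDER.index(target)
    if index == 0:
        return True  # the first environment has no gate
    previous = PROMOTION_ORDER[index - 1]
    return any(
        d.environment == previous and d.commit == commit and d.succeeded
        for d in history
    )


history = [
    Deployment("dev", "a1b2c3", True),
    Deployment("staging", "a1b2c3", True),
    Deployment("staging", "d4e5f6", False),
]
print(can_promote("a1b2c3", "prod", history))  # True
print(can_promote("d4e5f6", "prod", history))  # False
```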

Final Takeaways

Principle	Why It Matters
Disposable Infrastructure	Resilience, speed, consistency
IaC Everywhere	Reproducibility, versioning
DNS Over IPs	Flexibility, maintainability
Separate State	Makes scaling and recovery easier
Automate Backups	Protection against data loss
