Cloud Best Practices
Best Practices with Cloud and OpenStack Flex
This guide outlines principles and best practices for building resilient, maintainable, and scalable infrastructure on an OpenStack-based public cloud such as Rackspace OpenStack Flex. These recommendations matter when operating in a shared, multi-tenant cloud environment, where treating your infrastructure as dynamic rather than static is key to operational success.
Cattle, Not Pets
❝If your server has a name, you're doing it wrong.❞
Cloud-native infrastructure assumes failure. You don’t manually repair a server; you redeploy it from a known working state.
Key Practices:
- Use generic hostnames like `web-001`, `app-002`, and never `finance-db01`.
- No configuration drift: servers are built from the same base image + configuration automation.
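As a minimal sketch of the naming convention above, hostnames can be generated from a role and an index rather than chosen by hand. The function names `generic_hostname` and `is_generic`, and the three-digit zero padding, are assumptions made for illustration.

```python
import re

# Hypothetical pattern: lowercase role, a hyphen, and a three-digit index.
_GENERIC = re.compile(r"^[a-z]+-\d{3}$")


def generic_hostname(role: str, index: int) -> str:
    """Build a role-based hostname such as web-001."""
    return f"{role.lower()}-{index:03d}"


def is_generic(hostname: str) -> bool:
    """Return True when the hostname follows the role-index pattern."""
    return _GENERIC.fullmatch(hostname) is not None


print(generic_hostname("web", 1))   # web-001
print(is_generic("app-002"))        # True
print(is_generic("finance-db01"))   # False
```

A check like `is_generic` could run in CI against an inventory file to catch hand-named servers before they are deployed.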
Automation Tools:
- cloud-init: Bootstrap new VMs by passing a cloud-init script as user data at build time. This applies changes inside the OS during the build, rather than requiring you to SSH in afterwards to perform the steps.
- Ansible/Terraform: Provision infrastructure and apply repeatable configuration in a standardized, efficient way.
Sample cloud-init Script:
#cloud-config
hostname: appserver
package_update: true
packages:
  - nginx
runcmd:
  - systemctl enable nginx
  - systemctl start nginx
NOTE: Any server that can’t be rebuilt in under 10 minutes from code + image is a liability.
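When user data is sent straight to the Compute API instead of through a client that handles encoding for you, it has to be base64-encoded. The sketch below checks for the `#cloud-config` header and encodes the script. The helper name `encode_user_data` is hypothetical, and the embedded script is a trimmed copy of the sample above.

```python
import base64

# Trimmed copy of the sample cloud-init script, embedded for illustration.
CLOUD_CONFIG = """#cloud-config
package_update: true
packages:
  - nginx
runcmd:
  - systemctl enable nginx
  - systemctl start nginx
"""


def encode_user_data(text: str) -> str:
    """Check the cloud-config header and return base64-encoded user data."""
    if not text.startswith("#cloud-config"):
        raise ValueError("cloud-config user data must start with '#cloud-config'")
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


encoded = encode_user_data(CLOUD_CONFIG)
# Decoding returns the original script, so nothing is lost in transit.
print(base64.b64decode(encoded).decode("utf-8") == CLOUD_CONFIG)
```

Validating the header before a build catches a common mistake early: without `#cloud-config` on the first line, cloud-init does not treat the file as cloud-config.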
Infrastructure as Code (IaC)
Declarative infrastructure should be seen as a requirement, not an enhancement.
What to Define as Code:
- VMs, networks, subnets, routers, security groups
- DNS records
- Floating IP associations
- Volume attachments
Tooling Recommendations:
- Heat templates (OpenStack Orchestration) for OpenStack-native workflows.
- Terraform with the OpenStack provider for cross-cloud compatibility. Read more about working with Terraform or OpenTofu on OpenStack Flex.
- Ansible for configuration after provisioning. Read more about working with Ansible on OpenStack Flex.
Example: Terraform VM Resource
resource "openstack_compute_instance_v2" "web" {
  name            = "web-server"
  image_name      = "Ubuntu 24.04"
  flavor_name     = "gp.5.4.8"
  key_pair        = "default-key"
  security_groups = ["web"]

  network {
    uuid = openstack_networking_network_v2.private.id
  }

  user_data = file("cloud-init-web.yaml")
}
Don’t Marry IPs — Use DNS Everywhere
Hard-coding IP addresses ties your infrastructure to brittle assumptions. If an IP is ever lost or needs to change, you may have to dig through your code manually to find and update every reference. Use DNS names instead.
Best Practices:
- Set up FQDNs for internal services (e.g., `db.internal.example.com`).
- Use Floating IPs only at edge endpoints; keep internal traffic on tenant networks.
Example:
# BAD PRACTICE
DATABASE_HOST=10.5.3.42
# GOOD PRACTICE
DATABASE_HOST=db.internal.example.com
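One way to enforce this is a small lint step that flags settings whose values are literal IP addresses. The sketch below uses only the standard library. The function name `hardcoded_ips` and the `CACHE_HOST` entry are hypothetical and serve only as illustration.

```python
import ipaddress


def hardcoded_ips(settings: dict[str, str]) -> list[str]:
    """Return the keys whose values are literal IPv4 or IPv6 addresses."""
    flagged = []
    for key, value in settings.items():
        try:
            ipaddress.ip_address(value)
        except ValueError:
            continue  # Not an IP literal, e.g. a DNS name.
        flagged.append(key)
    return flagged


settings = {
    "DATABASE_HOST": "10.5.3.42",                 # bad practice: literal IP
    "CACHE_HOST": "cache.internal.example.com",   # hypothetical DNS name
}
print(hardcoded_ips(settings))  # ['DATABASE_HOST']
```

The check only covers values that are exactly an IP address. URLs or connection strings that embed an address would need additional parsing.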
Snapshots ≠ Backups
Snapshots
- Good for temporary state (e.g., pre-upgrade)
- Are stored in the same availability zone
- Need to be tested before being relied upon.
Backups
- Use Cinder Volume Backups to remote or redundant storage backends
- Automate backups using tools like Restic for file/app-level backup
- Align backup frequency with your Recovery Point Objective (RPO)
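To make the link between backup frequency and RPO concrete, the following sketch flags a workload whose newest backup is older than its RPO allows. The function name `violates_rpo` and the sample timestamps are assumptions. In practice the timestamp would come from your backup tooling.

```python
from datetime import datetime, timedelta, timezone


def violates_rpo(last_backup: datetime, rpo: timedelta, now: datetime) -> bool:
    """Return True when the newest backup is older than the RPO allows."""
    return now - last_backup > rpo


# Sample timestamps: the last backup finished 30 hours before "now".
now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
last_backup = datetime(2025, 1, 1, 6, 0, tzinfo=timezone.utc)

print(violates_rpo(last_backup, timedelta(hours=24), now))  # True
print(violates_rpo(last_backup, timedelta(hours=48), now))  # False
```

Passing `now` explicitly keeps the check deterministic and easy to test. A scheduled job could call it with `datetime.now(timezone.utc)`.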
Best Practices:
- Perform test restores regularly
- Store metadata with backups (e.g., database schema + backup timestamp)
- Separate data volumes from boot volumes
Security Best Practices
Instance Access
- Use SSH key pairs, never passwords
- Store keys in a password manager or secrets vault
Network Segmentation
- Limit inbound rules by source CIDR
- Avoid “Allow All” (`0.0.0.0/0`) unless necessary.
- Use bastion hosts to further isolate entry points to devices.
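A periodic audit can surface "Allow All" rules automatically. The sketch below assumes each rule is a dictionary using the Neutron security group rule field names (`direction`, `remote_ip_prefix`, `remote_group_id`, `port_range_min`). The function name `open_ingress_rules` and the sample data are hypothetical, and treating a missing prefix as "any source" is an assumption of this sketch.

```python
import ipaddress


def open_ingress_rules(rules: list[dict]) -> list[dict]:
    """Return ingress rules whose source CIDR covers every address."""
    flagged = []
    for rule in rules:
        if rule.get("direction") != "ingress":
            continue
        if rule.get("remote_group_id"):
            continue  # Source is another security group, not a CIDR.
        # Assumption: a missing or empty prefix means "any source".
        prefix = rule.get("remote_ip_prefix") or "0.0.0.0/0"
        if ipaddress.ip_network(prefix, strict=False).prefixlen == 0:
            flagged.append(rule)
    return flagged


rules = [
    {"direction": "ingress", "port_range_min": 22, "remote_ip_prefix": "0.0.0.0/0"},
    {"direction": "ingress", "port_range_min": 443, "remote_ip_prefix": "203.0.113.0/24"},
    {"direction": "egress", "port_range_min": None, "remote_ip_prefix": "0.0.0.0/0"},
]
print([r["port_range_min"] for r in open_ingress_rules(rules)])  # [22]
```

Checking the prefix length rather than comparing strings also catches the IPv6 equivalent, `::/0`.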
Boot, Storage, and Scaling
Boot Strategy
- Boot from Volume: Provides a persistent root disk that can outlive the VM.
- Ephemeral Boot: Suited to fast, throwaway workloads. Never put data you cannot afford to lose on an ephemeral disk.
Storage Strategy
- Separate storage types by role:
  - Boot volume for OS
  - Data volume for DBs
  - Scratch ephemeral for temp/cache
Observability and Monitoring
Key Metrics to Monitor
- CPU, Memory, Disk I/O
- API response times
- Instance availability
- Backup status
Lifecycle Management and Hygiene
Regular Maintenance Tasks:
- Rotate SSH keys and other credentials, as well as Floating IPs
- Delete orphaned volumes, ports, and snapshots that are no longer in use
- Audit security groups and access lists
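Orphan cleanup is easier to automate once "orphaned" has a precise definition. The sketch below treats a volume as orphaned when its status is `available` and it has no attachments, assuming volume records shaped like Block Storage API responses (`id`, `status`, `attachments`). The function name and sample IDs are hypothetical.

```python
def orphaned_volumes(volumes: list[dict]) -> list[str]:
    """Return IDs of volumes that are available and attached to nothing."""
    return [
        v["id"]
        for v in volumes
        if v.get("status") == "available" and not v.get("attachments")
    ]


# Sample records with hypothetical IDs.
volumes = [
    {"id": "vol-a", "status": "in-use", "attachments": [{"server_id": "srv-1"}]},
    {"id": "vol-b", "status": "available", "attachments": []},
]
print(orphaned_volumes(volumes))  # ['vol-b']
```

Reporting candidates first and deleting only after review is a safer default, since an unattached volume may still hold data someone needs.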
Use Tags
- Tag resources with `owner`, `project`, `env` (e.g., dev, staging, prod).
- Tagging enables cleanup automation and cost tracking.
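A tagging policy is only useful if it is checked. The sketch below reports which required keys a resource lacks, assuming tags are stored as plain `key=value` strings. That encoding, the `REQUIRED_KEYS` tuple, and the function name are assumptions for illustration.

```python
# Required tag keys, taken from the list above.
REQUIRED_KEYS = ("owner", "project", "env")


def missing_tag_keys(tags: list[str]) -> list[str]:
    """Return the required keys that have no key=value tag."""
    present = {tag.split("=", 1)[0] for tag in tags if "=" in tag}
    return [key for key in REQUIRED_KEYS if key not in present]


print(missing_tag_keys(["owner=alice", "env=dev"]))  # ['project']
```

Resources that come back with missing keys can be reported to their creator or queued for cleanup review.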
Dev/Test/Staging Environments
- Maintain non-production environments, such as Dev and Staging.
- Define production infrastructure in the same code, and promote changes to production only after they test successfully in non-production environments.
- Use CI/CD to deploy infrastructure changes safely
Final Takeaways
Principle | Why It Matters |
---|---|
Disposable Infrastructure | Resilience, speed, consistency |
IaC Everywhere | Reproducibility, versioning |
DNS Over IPs | Flexibility, maintainability |
Separate State | Makes scaling and recovery easier |
Automate Backups | Protection against data loss |