AWS WAF Operational Excellence pillar: tools and best practices

Efficient business operations play an increasingly important role in how companies transform their business through cloud computing.

Harnessing the full power of the AWS cloud involves far more than building a solid technical infrastructure. Amazon developed the Well-Architected Framework (WAF) to enable companies to build the most secure, high-performing, resilient, and efficient infrastructure possible for their businesses.

Operational excellence, one of the five pillars or areas of focus in the AWS WAF, covers best practices around developing robust, repeatable processes for all aspects of managing your cloud infrastructure.

Operational Excellence in the AWS cloud starts with preparation

Just as a pilot runs through a pre-flight checklist before takeoff, AWS recommends using operational checklists to confirm that your workloads are ready for production and to prevent untested workloads from being migrated to production.

Use checklists

Create and use the following checklists for operational excellence in AWS:

  • Operational checklist: Create an operational checklist that you use to evaluate whether you are ready to operate the workload.
  • Planning checklist: This might seem redundant, but you should have a plan that aligns with company events, milestones, and roadmaps so that you stay ahead of events that might cause sudden increases in traffic or requests for specific resources, where network performance could affect the company’s revenue or reputation.
  • Security checklist: Security is among the most misunderstood aspects of the cloud. Develop a detailed security checklist and use it to confirm that you are ready to operate the workload securely and respond to any security event or attack.
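
As an illustration only, the three checklists above could be combined into a simple go/no-go gate. The following Python sketch is not an AWS tool; the class, function, and checklist items are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Checklist:
    """A named list of yes/no readiness items (all names are illustrative)."""
    name: str
    items: dict[str, bool] = field(default_factory=dict)

    def open_items(self) -> list[str]:
        # Items that nobody has confirmed yet.
        return [item for item, done in self.items.items() if not done]


def ready_for_production(checklists: list[Checklist]) -> tuple[bool, dict[str, list[str]]]:
    """Return (ready, open items per checklist). Ready only if nothing is open."""
    blockers = {c.name: c.open_items() for c in checklists if c.open_items()}
    return (not blockers, blockers)


# Hypothetical checklist contents for a single workload.
checklists = [
    Checklist("operational", {"runbooks written": True, "on-call rota staffed": True}),
    Checklist("planning", {"launch dates mapped to capacity plan": True}),
    Checklist("security", {"incident response tested": False, "IAM roles reviewed": True}),
]

ready, blockers = ready_for_production(checklists)
print("Ready:", ready)
print("Blockers:", blockers)
```

A gate like this could run as a step before a production release, so that an open security item blocks the migration instead of being discovered afterward.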

AWS configuration management best practices

Document how you monitor, measure, and manage your architecture, your environments, and the configuration parameters of the resources within them so that you can easily identify components for tracking and troubleshooting. Configuration changes should also be trackable and automated. Track resources in detail in a configuration management database (CMDB) by using tags, metadata, and thorough, accessible documentation of your entire architecture and infrastructure configuration.
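
One way to keep tag-based tracking honest is to audit resources against a required tag set. The sketch below assumes a hypothetical tagging policy and a hand-written inventory of plain dictionaries; in practice the inventory would come from your CMDB or from AWS APIs, which are not called here.

```python
# Hypothetical tagging policy: every resource must carry these tag keys.
REQUIRED_TAGS = {"Owner", "Environment", "CostCenter"}


def missing_tags(resource: dict) -> set[str]:
    """Return the required tag keys that are absent or empty on a resource."""
    tags = resource.get("tags", {})
    return {key for key in REQUIRED_TAGS if not tags.get(key)}


def audit(resources: list[dict]) -> dict[str, list[str]]:
    """Map each non-compliant resource ID to its sorted list of missing tags."""
    report = {}
    for resource in resources:
        gaps = missing_tags(resource)
        if gaps:
            report[resource["id"]] = sorted(gaps)
    return report


# Illustrative inventory; the IDs and tag values are made up.
inventory = [
    {"id": "i-web-01", "tags": {"Owner": "web-team", "Environment": "prod", "CostCenter": "1001"}},
    {"id": "i-batch-07", "tags": {"Owner": "data-team"}},
    {"id": "vol-scratch", "tags": {}},
]

for resource_id, gaps in audit(inventory).items():
    print(f"{resource_id}: missing {', '.join(gaps)}")
```

Running an audit like this on a schedule makes untagged resources visible before they become untraceable during troubleshooting.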

Automate cloud deployment

Automation can take human error out of the operational excellence equation. Best practices for automation include regular quality assurance testing and defined mechanisms that continually track, audit, roll back, and review changes as warranted.
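
To make "track, audit, roll back, and review" concrete, here is a minimal sketch of a configuration store that records every change in an append-only audit log and can undo the most recent one. `TrackedConfig` and its methods are hypothetical names for this example, not part of any AWS service.

```python
_MISSING = object()  # Sentinel meaning "the key did not exist before the change".


class TrackedConfig:
    """Key/value settings with an append-only audit log and an undo stack."""

    def __init__(self, initial: dict):
        self.values = dict(initial)
        self.audit_log = []  # Entries: (action, author, key, value)
        self._undo = []      # Stack of (key, previous value)

    def set(self, author: str, key: str, value) -> None:
        # Remember the previous value so that the change can be reverted.
        self._undo.append((key, self.values.get(key, _MISSING)))
        self.audit_log.append(("set", author, key, value))
        self.values[key] = value

    def rollback(self, author: str) -> None:
        """Revert the most recent change and record the revert in the log."""
        key, previous = self._undo.pop()
        if previous is _MISSING:
            del self.values[key]
            restored = None
        else:
            self.values[key] = previous
            restored = previous
        self.audit_log.append(("rollback", author, key, restored))


config = TrackedConfig({"instance_type": "t3.medium"})
config.set("alice", "instance_type", "t3.large")
config.set("bob", "max_instances", 10)
config.rollback("pipeline")  # Undoes bob's change only.

print(config.values)
for entry in config.audit_log:
    print(entry)
```

The audit log is never rewritten, so a reviewer can still see both the change and its rollback.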

Deployment best practices

Best practices for AWS deployment automation include:

  • A deployment pipeline (such as a source code repository, build systems, and deployment and testing automation) with standard automated procedures for continuous integration and continuous deployment.
  • An automated release management process.
  • A process to revert changes if they produce operational issues.
  • Risk management strategies (blue/green, canary, A/B testing) to continually assess risks.
  • System performance monitoring with CloudWatch®.
  • Alarms and notifications based on key performance thresholds that indicate problems or opportunities for improvement.
  • Automated actions based on performance, such as using Auto Scaling to automatically add capacity based on current conditions.
  • A process to track and save logs, including application logs, AWS service-specific logs, VPC flow logs, and CloudTrail® logs, so that you can troubleshoot issues and review performance.
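
Several items in the list above (a canary strategy, threshold-based alarms, and a process to revert changes) can be sketched together. The function below is a simplified model, not a deployment tool: the traffic steps, the 1% error threshold, and the `error_rate_at` callback are assumptions made for illustration, and a real pipeline would read the error rate from its monitoring system.

```python
from typing import Callable, Sequence


def canary_rollout(
    steps: Sequence[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> dict:
    """Shift traffic to a new version step by step; revert on a bad error rate.

    steps: traffic percentages to try in order, for example (5, 25, 50, 100).
    error_rate_at: returns the observed error rate at a given traffic percent.
    """
    current = 0
    for percent in steps:
        current = percent
        if error_rate_at(percent) > max_error_rate:
            # Threshold breached: send all traffic back to the old version.
            return {"status": "rolled_back", "traffic_percent": 0, "failed_at": percent}
    return {"status": "promoted", "traffic_percent": current, "failed_at": None}


def healthy(percent: int) -> float:
    # Simulated release that stays well under the threshold.
    return 0.002


def faulty(percent: int) -> float:
    # Simulated release that only misbehaves under heavier load.
    return 0.002 if percent < 50 else 0.05


print(canary_rollout((5, 25, 50, 100), healthy))
print(canary_rollout((5, 25, 50, 100), faulty))
```

The faulty release here is caught at 50% of traffic, so half of the users never see the problem and the revert happens without a manual decision.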

Responding efficiently in AWS

Responding to network problems is as important as preventing them in the first place. Be prepared to automate responses as much as possible, including alerts and notifications as well as actions and recovery. It is also important to have escalation procedures in place that route each issue to the right resources as quickly as possible.

Best practices for responding efficiently

Best practices for responding to unplanned events include:

  • Create an event response playbook for everyone to follow that defines when it should be activated and includes escalation guidelines and procedures.
  • Automate responses as much as possible, such as using Auto Scaling to instantly add capacity when the system passes critical load thresholds.
  • Develop a root cause analysis (RCA) process to ensure that you resolve and document issues and fix their underlying causes so that they do not recur. Make sure you’re not just fixing symptoms of a deeper problem.
  • Develop an escalation process that puts the necessary stakeholders and systems in place for receiving alerts when escalations occur.
  • Automate escalation as much as possible based on demand or time thresholds, sending the issue to the right resources.
  • Create an automated escalation queue between appropriate functional teams based on priority, impact, and intake mechanisms.
  • Use a demand- or time-based approach to escalate higher in the organization as impact, scale, or time to resolution or recovery of incident increases.
  • Define when to escalate externally to AWS or an AWS partner.
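
The time-based and impact-based escalation described above can be expressed as a small lookup. In this sketch, the escalation ladder, the time thresholds, and the severity multipliers are all hypothetical values; each organization would define its own.

```python
# Hypothetical ladder: (minutes unresolved, who gets paged at that point).
LADDER = [
    (0, "on-call engineer"),
    (30, "team lead"),
    (60, "engineering manager"),
    (120, "AWS Support or AWS partner"),
]

# Assumed multipliers: higher-impact incidents climb the ladder faster.
SEVERITY_SPEEDUP = {"low": 1, "high": 2, "critical": 4}


def escalation_level(minutes_open: int, severity: str) -> str:
    """Return who should own the incident, given its age and severity."""
    effective_minutes = minutes_open * SEVERITY_SPEEDUP[severity]
    owner = LADDER[0][1]
    for threshold, contact in LADDER:
        if effective_minutes >= threshold:
            owner = contact
    return owner


print(escalation_level(20, "low"))
print(escalation_level(20, "high"))
print(escalation_level(45, "critical"))
```

Encoding the ladder as data keeps the escalation rules reviewable in one place, and the last rung makes the point of external escalation explicit.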


The AWS Operational Excellence pillar focuses on running and monitoring systems to deliver business value and continually improve processes and procedures. It helps organizations spread the benefits of cloud adoption beyond the IT department and ensure that the cloud infrastructure can efficiently manage changes, respond to events, and automate standards-based tasks and processes to manage daily operations successfully.

Rackspace Onica Team