Post go-live Activities
From the moment you go live with your solution, RAS Digital Experience teams across all shifts are ready to provide ongoing support. Each Digital Experience team has participated in internal knowledge-transfer sessions that educate participants about the particular aspects of your deployment.
At this time, you should have the following:
- Enabled automatic ticket creation
- Alert monitors in place
- A runbook that provides detailed and actionable steps to take for each alert type
From this point forward, the RAS implementation phase is complete, and your environment is being monitored.
Refer to the following sections to ensure that your monitoring solution continues to run smoothly after you go live.
- Request maintenance
- Use APM to identify when deployment occurred
- Prepare for high volume events
- Resolve emergency incidents
Request Maintenance
You must submit a ticket to request maintenance any time that you expect Rackspace to work on your application. You need maintenance when a change results in downtime or could have an impact on service availability. While maintenance is typically associated with deploying code or configuration changes, you also need maintenance for small changes, including activating and deactivating a user.
When you request maintenance, you must:
- Make the service request at least 48 hours in advance. We schedule maintenance on a first-come, first-served basis.
- Provide as much detail as possible, including the build number, the type of maintenance (deployment, patch, or configuration change), how long the maintenance should take, and any other technical details that help us provide a smooth maintenance process.
The sooner you provide your detailed maintenance request, the better the outcome.
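The checklist above can be sketched as a small pre-submission check that you run before opening the ticket. This is only an illustration: the `MaintenanceRequest` fields and the `check_request` helper are hypothetical names, not part of any Rackspace tooling, and the only rules taken from this page are the 48-hour lead time and the three maintenance types.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Lead time and maintenance types come from the guidance above.
MIN_LEAD_TIME = timedelta(hours=48)
MAINTENANCE_TYPES = {"deployment", "patch", "configuration change"}


@dataclass
class MaintenanceRequest:
    """Hypothetical container for the details a ticket should include."""
    start: datetime                # timezone-aware proposed start time
    maintenance_type: str          # one of MAINTENANCE_TYPES
    expected_duration: timedelta   # how long the maintenance should take
    build_number: str = ""         # relevant for deployments
    details: str = ""              # any other technical details


def check_request(request: MaintenanceRequest, now: datetime) -> list[str]:
    """Return a list of problems; an empty list means the request looks complete."""
    problems = []
    if request.start - now < MIN_LEAD_TIME:
        problems.append("start time is less than 48 hours away")
    if request.maintenance_type not in MAINTENANCE_TYPES:
        problems.append("unknown maintenance type")
    if request.maintenance_type == "deployment" and not request.build_number:
        problems.append("build number is missing")
    if request.expected_duration <= timedelta(0):
        problems.append("expected duration is missing")
    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    request = MaintenanceRequest(
        start=now + timedelta(hours=72),
        maintenance_type="deployment",
        expected_duration=timedelta(hours=2),
        build_number="build-42",  # illustrative value
    )
    print(check_request(request, now) or "request looks complete")
```

A check like this does not replace the ticket itself. It only helps catch a short lead time or a missing detail before you submit.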
Why does Rackspace request 48 hours' notice?
A lot of preparation goes into ensuring that your maintenance goes smoothly. As soon as you make your request, we review it, prepare for the maintenance, quality check our preparation, and then schedule the maintenance. You might receive a request from the RAS team to schedule a maintenance call within 24 hours after you submit your ticket.
Additionally, all maintenance is subject to calendar availability. The best plan is to contact Rackspace with notice as soon as you have a firm date and time preference so we can confirm availability. The technical details can come later.
To notify Rackspace of planned maintenance, open a ticket in the Rackspace login portal that describes the event and includes the proposed date, time, and time zone. Rackspace uses this ticket to communicate with you for the duration of the event. The ticket is also the best place to upload written plans and the files necessary for proper event execution.
Use APM to Identify when Deployment Occurred
In both New Relic and AppDynamics, you can set release points that enable you to compare application performance between releases. For example, if the checkout process is slower or generates more errors after you deploy changes to your environment, the APM tooling helps you identify precisely when and what kind of change you deployed. If required, you can use the APM tooling to roll back to a previous release.
To enable deployment tracking, create a ticket that requests it.
For AppDynamics, you can add an API call to your deployment process, which creates a custom event to track performance across multiple deployments. For more information, see Using AppDynamics to track performance between releases.
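As a rough sketch of what such an API call might look like, the following builds (but does not send) a request to the AppDynamics Controller events endpoint. The endpoint path and the `eventtype`, `customeventtype`, `propertynames`, and `propertyvalues` parameters reflect the AppDynamics events REST API as we understand it, so confirm them against the AppDynamics documentation for your Controller version. The controller URL, application name, custom event type, and build number are placeholders, and authentication is deliberately left out.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_deployment_event_request(controller_url: str, application: str,
                                   build_number: str, summary: str) -> Request:
    """Build a POST request that would record a custom deployment event.

    Assumption: the Controller exposes
    /controller/rest/applications/<application>/events and accepts these
    query parameters. Verify this against your Controller's documentation.
    """
    query = urlencode({
        "severity": "INFO",
        "summary": summary,
        "eventtype": "CUSTOM",
        "customeventtype": "deployment",   # hypothetical custom event name
        "propertynames": "build",
        "propertyvalues": build_number,
    })
    url = (
        f"{controller_url.rstrip('/')}/controller/rest/applications/"
        f"{quote(application, safe='')}/events?{query}"
    )
    # Credentials (for example, an Authorization header) must be added
    # before the request is sent; they are omitted from this sketch.
    return Request(url, method="POST")


if __name__ == "__main__":
    request = build_deployment_event_request(
        "https://example.saas.appdynamics.com",  # placeholder controller URL
        "Storefront",                            # placeholder application name
        "build-42",
        "Deployed build-42",
    )
    print(request.get_method(), request.full_url)
```

In a deployment pipeline, a final step could add credentials and pass the request to `urllib.request.urlopen`, so that each release leaves a marker that you can compare performance against.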
Prepare for High Volume Events
Use the following considerations and recommendations to plan for high seasonal traffic. Your Rackspace Account team and RAS Digital Experience team can work with you to understand and implement these recommendations. Together, they develop a Seasonal Readiness Plan for your environment to ensure that your end users are not impacted by poor performance during peak seasonal traffic periods.
General guidelines:
- Limit code deployments leading up to peak traffic periods to minimize changes in the environment.
- Consider limiting or disabling back-end administrative functionality, such as imports, catalog updates, and so on, during peak traffic periods.
- Consider scheduling regular, short touchpoints with all appropriate parties throughout the peak period.
Two to four months before peak traffic begins
Perform load testing: Engage the Rackspace Customer Success team, RAS, and Network Security to participate in load testing to observe, analyze, and take action on the results.
Depending on your environment’s size and complexity, use marketing analytics data to calculate the expected peak flow of traffic and perform load testing. Refer to the following guidelines when you plan for and execute application load testing:
- Many environments can expect a 20-30 percent increase in traffic year over year.
- Develop a load testing plan that tests all application functions and tests for at least 125 percent of expected peak traffic.
- Use a load-testing tool that incorporates realistic think time to gauge the impact of real users more accurately. If you do not have a preferred vendor or tool, Rackspace has load testing partners with whom you can engage.
- Test against an environment that uses the same codebase and is as close to production as possible. Modern, complex applications do not scale linearly, and you cannot always accurately extrapolate real-world performance.
- Use metrics to establish the success or failure of a test. For example, a test is successful when at least 95 percent of transactions complete without errors.
- You might need to conduct multiple load tests to ensure that you can iterate through the results and achieve stability at peak user traffic levels.
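The arithmetic behind these guidelines can be worked through in a few lines. The sketch below assumes that you know last year's peak (here, an illustrative 1,000 requests per second), applies the upper end of the 20-30 percent growth estimate, adds the 125 percent test margin, and then checks a test run against the 95 percent success example. The function names and sample numbers are hypothetical.

```python
def target_load(last_year_peak: float, growth: float = 0.30,
                test_margin: float = 1.25) -> float:
    """Return the load to test for: expected peak times the test margin.

    growth is the assumed year-over-year increase (0.20 to 0.30 above);
    test_margin of 1.25 means testing at 125 percent of the expected peak.
    """
    expected_peak = last_year_peak * (1 + growth)
    return expected_peak * test_margin


def meets_success_threshold(total: int, errors: int,
                            threshold_percent: int = 95) -> bool:
    """Return True when at least threshold_percent of transactions had no errors."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Integer comparison avoids floating-point rounding at the boundary.
    return (total - errors) * 100 >= threshold_percent * total


if __name__ == "__main__":
    # 1,000 rps last year -> 1,300 rps expected -> test at 1,625 rps.
    print(f"test at {target_load(1000):.0f} requests per second")
    print("passed" if meets_success_threshold(10_000, 400) else "failed")
```

Substitute your own analytics data for the sample peak, and agree on the growth figure and success threshold with your teams before testing.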
One to two months before the peak traffic period
Integration points review: Work with the RAS team to review integration points approximately one to two months before the peak traffic period. Include internal and external integrations such as search, payment gateways, OMS, inventory, mobile, and so on.
- Ensure that load testing includes integration points.
- Where possible, review integration point architectures to determine if they are highly available.
- Understand the impact on users if the integration service is degraded or unavailable.
Configuration review: Work with the RAS team to proactively review your environment’s configuration and APM metrics, including the application, web servers, caching layers, databases, and so on. This review can uncover bottlenecks that developed during load testing.
Ensure that you allow for time to make changes and then test again.
APM configuration review: During the APM configuration review, engage with the Rackspace team to:
- Review baselines
- Review business transactions configurations
- Ensure that you configure health rules appropriately for the environment.
- Review dashboards to reduce troubleshooting time in the event of issues.
Two to four weeks before the peak traffic period
Network layer review: Engage the Rackspace Account team and Network Security team to review performance of firewalls, load balancers, and any inline devices, such as IDS, to ensure that they can handle increased traffic.
CDN and caching tuning: Engage the Rackspace Account team and RAS to ensure as many static assets as possible are cached. Ideally, the CDN or cache layer should take as much traffic as possible, which reduces calls back to the application.
One to three weeks before the peak traffic period
Scheduled jobs review: Engage the RAS team to review any scheduled jobs, such as cron jobs, scheduled tasks, and backups, to minimize impact during peak hours.
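One way to prepare for this review is to list which jobs overlap your peak window. The following sketch assumes a simple hand-maintained inventory of jobs with a start time and typical duration. The job names, times, and peak window are made up for illustration, and the peak window is assumed not to cross midnight.

```python
MINUTES_PER_DAY = 24 * 60


def to_minutes(hhmm: str) -> int:
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def overlaps_peak(start: str, duration_minutes: int,
                  peak_start: str, peak_end: str) -> bool:
    """Return True if a daily job overlaps the daily peak window.

    The job may run past midnight; the peak window may not.
    """
    job_start = to_minutes(start)
    job_end = job_start + duration_minutes
    window_start, window_end = to_minutes(peak_start), to_minutes(peak_end)
    # Check today's run and the tail of a run that started the previous day.
    runs = ((job_start, job_end),
            (job_start - MINUTES_PER_DAY, job_end - MINUTES_PER_DAY))
    return any(a < window_end and window_start < b for a, b in runs)


if __name__ == "__main__":
    # Hypothetical job inventory: (name, start time, duration in minutes).
    jobs = [
        ("nightly backup", "02:00", 90),
        ("catalog import", "08:30", 45),
        ("report export", "23:30", 60),
    ]
    for name, start, duration in jobs:
        if overlaps_peak(start, duration, "09:00", "21:00"):
            print(f"{name} overlaps the peak window")
```

Jobs that the check flags are candidates to reschedule, shorten, or pause for the duration of the peak period.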
Review URL monitoring: Engage Rackspace RAS to ensure synthetic transactions are in place to monitor the availability and functionality of critical business paths (for example, add to cart, checkout, search, and so on).
Review account guidelines: Engage the Rackspace Account team to verify that escalation paths, return-to-service instructions, and runbooks are up-to-date and technically valid.
Resolve Emergency Incidents
Each production environment has a monitor that constantly assesses the availability of that environment. If the environment becomes unavailable, the monitor automatically creates an Emergency ticket that notifies the Rackspace RAS engineer of the issue. We handle Emergency tickets, which are the highest level of severity, with urgency.
The Rackspace RAS engineer takes the following actions upon notification of the Emergency incident:
- Acknowledges the alert in the Rackspace Customer Portal.
- Reviews the Account Management Guidelines and the Operations Runbook to determine how to proceed.
- Opens a call bridge and includes the appropriate personnel on your team, such as the DBA or network administrator.
- Uses a tie-in hitch to notify all stakeholders that an issue has occurred, what the issue is, whether the issue is ongoing, and that Rackspace is actively troubleshooting it. The tie-in hitch mechanism ensures that all Emergency incidents get the proper review and attention.
The Emergency incident handling process is more efficient when there is an Operations Runbook that provides the RAS engineer with the procedures to follow. The runbook can help the RAS engineer determine the steps to take in a site-down scenario.