Keeping Customer Service Up and Running, Even During Major Cloud Outages
In early December 2021, Amazon Web Services (AWS) suffered a historic outage that affected a major portion of its cloud hosting infrastructure. Popular entertainment services like Netflix and Disney+ were impacted. Tragically, many users couldn’t finish streaming the new Beatles documentary! OK, maybe that is not so tragic, but suppose all of your customer engagement systems ground to a halt too. Loan officers are no longer able to engage with prospective clients to close business. Customers are unable to contact their insurance carrier to open a new policy or file a claim. The list of potential trouble goes on and on. When remote customer service is the gateway to your organization for so many of your customers, a service outage can affect your business on multiple levels. A breakdown in customer communication leads directly to lost revenue, as well as customer dissatisfaction that can lead to churn.
So how can companies better prepare for business or network interruptions? First, it is important to choose solutions that have adequate recoverability and failover options built into their platforms. To remain operational, it is key to choose cloud products that include:
- Hot redundancy: Ensuring adequate resources to immediately take over if others fail
- Self-healing capabilities: Ability to automatically detect and self-correct without manual intervention
- Disaster recovery: Having people, processes and technology to quickly respond to emergencies 24/7
For SaaS providers, being hosted on a major cloud infrastructure provider such as AWS is just the beginning. These companies need to invest in system architectures built on a strong, redundant, distributed infrastructure, which is very expensive but worthwhile. Distributing key components and functions will not only increase reliability but also improve system performance. Many vendor contingency plans include tools that allow for manual adjustments when a problem is identified. Unfortunately, in the case of this AWS outage, companies were blocked from accessing their services to make such changes. Think of it this way: your kitchen is burning and you have hoses, but you cannot get inside the house. So how do you address the issues and keep your business on track?
One safeguard is to build a self-healing architecture that can automatically adjust to issues as they arise. Extending the above analogy, it’s like having an expensive, high-end, heat-detecting sprinkler system in your house in case you can’t get in there with the water hoses.
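To make the sprinkler idea a little more concrete, here is a minimal sketch of a self-healing control loop in Python. It is an illustration only, not a description of any vendor's implementation: the `Replica` and `SelfHealingPool` names, and the idea of a fixed "desired" replica count, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Replica:
    """A hypothetical service instance with a simple health flag."""
    name: str
    healthy: bool = True


@dataclass
class SelfHealingPool:
    """Keeps a desired number of healthy replicas with no operator involved."""
    desired: int
    make_replica: Callable[[int], Replica]
    replicas: List[Replica] = field(default_factory=list)
    _counter: int = 0

    def reconcile(self) -> List[str]:
        """One pass of the loop: drop failed replicas, then top up to `desired`."""
        actions = [f"removed {r.name}" for r in self.replicas if not r.healthy]
        self.replicas = [r for r in self.replicas if r.healthy]
        while len(self.replicas) < self.desired:
            self._counter += 1
            new = self.make_replica(self._counter)
            self.replicas.append(new)
            actions.append(f"started {new.name}")
        return actions
```

In a real system, `reconcile` would run on a timer and the health flag would come from actual probes. The point of the sketch is that detection and correction happen in the same loop, so nobody has to get "inside the house" to fix things.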
Architected for Availability
Glia’s Digital Customer Service (DCS) solution is built with a self-healing, redundant, self-correcting architecture to immediately address any disruptions. We have invested heavily in building a secure, resilient, financial-grade platform that delivers proven, consistent service for our clients and their customers.
Our system is built to be active across at least three availability zones, so that if there is a problem in more than one of them, Glia can continue operating. To avoid any downtime, we don’t “failover” to a secondary availability zone, but actually run simultaneously across all the zones – an architecture known as “triple-hot.” Additionally, the system is designed to be “self-healing,” so that if, for example, there is a sudden spike in volume, the system can dynamically scale to accommodate that. And if an issue with a subsystem is detected, we automatically self-correct as appropriate. The platform’s self-healing capabilities ensured that even when root access to the AWS East region was unavailable in December and we could not manually adjust things, our system was able to automatically adjust to minimize any issues. The result: no impact to our customers and their mission-critical operations.
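For readers who like to see the idea in code, the difference between "failing over" and running hot in every zone can be sketched as follows. This is a simplified illustration under stated assumptions, not Glia's actual routing logic: the `ActiveActiveRouter` class, the round-robin policy and the zone names are all hypothetical.

```python
import itertools
from typing import Dict, Iterator, List


class ActiveActiveRouter:
    """Round-robins requests over every healthy zone; all zones serve traffic."""

    def __init__(self, zones: List[str]) -> None:
        self.health: Dict[str, bool] = {z: True for z in zones}
        self._cycle: Iterator[str] = itertools.cycle(zones)
        self._zone_count = len(zones)

    def mark(self, zone: str, healthy: bool) -> None:
        """Record the latest health-check result for a zone."""
        self.health[zone] = healthy

    def route(self) -> str:
        """Return the next healthy zone, skipping any that are marked down."""
        for _ in range(self._zone_count):
            zone = next(self._cycle)
            if self.health[zone]:
                return zone
        raise RuntimeError("no healthy zone available")


# Hypothetical zone names, used only for illustration.
router = ActiveActiveRouter(["zone-a", "zone-b", "zone-c"])
router.mark("zone-b", False)   # a health check reports zone-b as down
next_zone = router.route()     # traffic keeps flowing to the remaining zones
```

Because every zone is already serving traffic, losing one simply removes it from the rotation. There is no standby to wake up and no switchover step that an operator has to trigger.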
Had the recent outage been worse, with the entire region going offline, Glia would have relied on its formally documented disaster recovery plan, which defines in depth the activities and resources required to provide recovery capabilities. The mostly automated failover includes dynamically copying data in near real time and storing it in multiple physical locations that are between 60 and 300 miles apart, in case it is needed as part of a recovery process. This plan is tested at least twice a year, is 99% automated and takes under 30 minutes to complete.
Deliver Consistent Customer Service in the Cloud
As we mentioned above, offering a seamless digital customer experience can mean the difference between retaining and losing customers. Your communication systems are your virtual “front door” to your business, and it is imperative that your digital doors remain unlocked and accessible during business hours. The ability to maintain connection with your customers will not only instill confidence in your services, but it will reinforce the value that you provide.