If your servers went down today, how long would it take your team to get your business up and running again without a BCDR plan? What if a ransomware attack encrypted your data and locked you out of it? How much does an hour of lost productivity cost your business?
If you’re B2B, being unavailable to your clients could cost you hard-earned relationships. And if you’re B2C, downtime could mean a silent cash register for who knows how long.
It’s one thing to lose a day’s worth of unsaved work to an unexpected power outage, but natural disasters like floods and fires can damage your office or cut you off from it entirely.
Finding answers and building solutions for these hypotheticals before it’s too late is what business continuity and disaster recovery (BCDR) is all about.
How do we define business continuity and disaster recovery?
Simply put, BCDR is the practice of making sure your company is able to continue to run with as little interruption as possible during “disasters” of all shapes and sizes.
We put this article together to bring you today’s best practices for BCDR, so that when disaster strikes (and it will), you’ll be able to roll with the punches.
Business Continuity vs Disaster Recovery
Although business continuity and disaster recovery tend to go hand in hand, they have important differences that you should be aware of.
Business continuity refers to a business’s ability to continue operations and functions in a time of distress or disruption. These disruptions can be natural disasters, cyberattacks, power outages, or even communication failures.
Business continuity is a proactive response to potential threats.
To make this idea more tangible, let’s take a closer look at one way your server could go down: a hardware failure.
One solid preemptive measure against this is a BCDR device that regularly backs up your servers and lets you create virtual instances from those backups.
With this technology in place, your business can be quickly switched over to operating from one of these virtualized backups and continue as though your physical server were still operational.
This way, you’ll be able to keep working while your IT team gets the physical server replaced.
Virtualizing your server with a BCDR device is also a viable solution for more dramatic disaster scenarios that force your organization to transition partially or fully to a remote work situation.
Because business continuity is proactive, careful planning is necessary to adequately prepare for potential disruptions.
Disaster recovery is what your business relies on when business continuity cannot be maintained.
Disaster recovery is reactive by nature and deals primarily with the restoration of functional business operations.
To give you an example, let’s say a transformer blows, causing power to go out in your main office building, thus preventing you from accessing your office workstation.
In this scenario, your disaster recovery plan would inform all relevant parties on how to reestablish access to your IT resources, data, and applications.
This process might involve sending your employees home and having them switch to using a redundant cloud-based server while you wait for power to be restored.
Creating Your Business Continuity & Disaster Recovery Plans
When creating your business continuity & disaster recovery plans, you’ll want to consider the different scenarios and levels of disruption that could impact your business operations.
Your organization will have its own unique variables that affect how you respond to data disasters and natural disasters alike, so your plan should reflect them.
Ask what your company would do for business continuity and disaster recovery if the disruptive event occurred:
- At a local level
- At a regional level
- At a national level
If the event is local, would you still be able to access office resources temporarily?
If a fiber line is cut in your area, do you have a backup source for internet and telecom access? What if a regional or national issue such as a pandemic displaced your employees from their workstations?
Questions like these are the kind you need to ask as you’re mapping out disruption scenarios.
Before you get to any actual planning, you should also conduct an analysis of your IT infrastructure.
This will help you determine which resources you need to keep things operational, which ones you can survive without, and how they relate to your overall infrastructure.
Business Continuity Planning
Your business continuity plan should include:
- A Threat Analysis – This part of the plan identifies potential disruptions as well as the damage they could cause to affected resources. For instance, a natural disaster (threat) could deal critical damage to your data infrastructure.
- Role Assignments – In an incident scenario, who does what? Your organization should have a well-defined chain of command that takes the absence of critical staff into consideration. This is one reason it’s useful to cross-train employees so that everyone, regardless of their official department, can act competently in an emergency.
- Communication Strategy – Your communication strategy should clarify how important information will be distributed during the disruptive incident and after it is resolved. It should specify who is responsible for communicating with employees and how employees will reach relevant parties.
- Your Backup Plans – This should at least detail where your backups are stored and how to access backup power resources like generators and inverters, and it should cover backup communications and any cloud servers that can substitute for on-premises servers.
- Infrastructure and Hardware Solutions – What infrastructure and hardware solutions are necessary to maintain business continuity? If the disaster caused a hardware failure or damaged your data infrastructure, how would you respond?
Disaster Recovery Planning
The reason your business needs a disaster recovery plan is that business continuity cannot always be maintained.
If disruption to day-to-day operations becomes unavoidable, your organization needs to know what to do to bring things back to normal, and the disaster recovery plan should make it clear how to do this.
When determining the particulars of your disaster recovery plan, you’ll want to define two primary factors:
- Your recovery time objective (RTO)
- Your recovery point objective (RPO)
Once you’ve decided on your RTOs and RPOs, you’ll be able to assess your process and infrastructure options to figure out what best meets your needs.
What do we mean by Recovery Time and Recovery Point Objectives?
Recovery Time Objective
Recovery Time Objectives are targets for how quickly your business should be back to normal operations after various kinds of failure.
You’ll determine your RTO by defining how long your business can tolerate downtime.
To do this, assess how your business’s operations work. This will help you identify which critical tasks can still be done even in the event of a disaster.
Different business components may require different RTOs since some systems take longer to restore than others.
For instance, if your Wi-Fi goes out, it may take only minutes to restore. But how long would it take to restore internet access if the whole office lost its connection?
Your organization’s answers to these kinds of specific questions will help you narrow down your RTOs and generate a sufficient disaster recovery plan.
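To see how per-component RTOs play out, here’s a minimal Python sketch that compares each component’s estimated restore time with its RTO and flags the ones that fall short. The `Component` class, the component names, and every number below are hypothetical placeholders; your own assessment supplies the real values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    name: str
    rto_minutes: int          # how long the business can tolerate this being down
    est_restore_minutes: int  # how long restoring it is expected to take


def components_missing_rto(components: List[Component]) -> List[str]:
    """Return the names of components whose expected restore time exceeds their RTO."""
    return [c.name for c in components if c.est_restore_minutes > c.rto_minutes]


# Hypothetical values for illustration only.
inventory = [
    Component("office Wi-Fi", rto_minutes=60, est_restore_minutes=15),
    Component("office internet connection", rto_minutes=120, est_restore_minutes=480),
    Component("file server", rto_minutes=240, est_restore_minutes=180),
]

print(components_missing_rto(inventory))  # ['office internet connection']
```

A flagged component means you either need a faster recovery option for it (such as a backup internet connection) or need to revisit whether its RTO is realistic.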
Recovery Point Objective
Recovery Point Objectives are targets for what state you should be able to restore your systems to after various kinds of failure.
Establishing your RPO clarifies how current your data restoration needs to be.
For example, if a business’s most recent copy of data available after an outage is from 18 hours ago, and the target RPO is 20 hours, then that copy would fall within the parameters of their target RPO.
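The RPO check in that example is a simple comparison. A minimal Python sketch, assuming backup age and RPO are both measured in hours (the function name `meets_rpo` is our own, not from any product):

```python
def meets_rpo(backup_age_hours: float, rpo_hours: float) -> bool:
    """A restore point satisfies the RPO if it is no older than the target."""
    return backup_age_hours <= rpo_hours


print(meets_rpo(18, 20))  # True: an 18-hour-old copy is within a 20-hour RPO
print(meets_rpo(26, 20))  # False: 26 hours of data would be lost
```

Keep in mind that with scheduled backups, the worst case is a failure just before the next backup runs, so your backup interval should be comfortably shorter than your RPO.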
Your RPOs can vary based on what you’re recovering. From a disaster recovery perspective, your main focus should be at the systems and infrastructure level.
Losing a few files is never good, but RPOs typically aren’t defined for individual users’ workstations or files. Instead, they’re defined for whole servers or key equipment like network firewalls.
Once again, it’s important to clearly define your RTOs and RPOs so that you can determine what technical solutions your business will need to reach those objectives.
That decision often comes down to choosing the right failover solution for your organization.
Systems and Data Failover Solutions
A failover solution is related more to business continuity than disaster recovery.
However, some failover solutions like Datto’s SIRIS device provide integrated continuity and disaster recovery options.
The Datto SIRIS device is a perfect example of a hardware failover solution.
If one of your critical servers fails, your SIRIS device can create a virtualized instance of that server and let you run your business operations from it temporarily while the issues affecting your primary system are resolved.
It’s important to understand that while your virtual server is running, changes will be made to your system both by users and from regular system updates.
When the issue with your primary server is resolved and you spin down the virtual server, the SIRIS device will merge the two copies together so that you get to keep working with the most updated version.
If the disaster destroyed the original server and you end up having to purchase new hardware, you’ll be able to reimage the virtual system onto the new servers.
Your company might be small enough to need only one or two servers, but if you have an entire room full of them, you’ll need more than a hardware failover solution for some scenarios.
If, for instance, a power surge or lightning strike took out your server room, you wouldn’t be able to run all the programs and workstations you normally would from a single piece of hardware like Datto’s SIRIS device.
The best practice in this scenario is to spin up the servers via cloud virtualization. This allows your business to leverage the scalable computational power of the cloud to run as many server instances as necessary to maintain normal operations while your local infrastructure is fixed.
Cloud virtualization also enables remote work. With your servers in the cloud, your team can connect to your cloud-based office over a VPN for added security.
If you end up needing more capacity, your cloud-virtualized infrastructure can expand to meet your needs.
Again, failover solutions are not meant to be permanent solutions because it can be costly to run your business from the cloud in this way.
They’re intended to keep your operations functioning as you wait for complete restoration of your primary servers.
A more permanent and cost-effective approach is a geo-redundant cloud solution, which lets you continue business operations despite any disruption one of your server locations might be experiencing.
Internet Failover
Where the failover solutions described above cover your server infrastructure, an internet failover solution activates whenever your primary internet connection goes down, keeping your business online.
If you use a BCDR-enabled network firewall like Sophos XG, it should support automatic failover, so that if one internet connection goes down, your devices quickly switch over to the backup.
We suggest having two separate connections, from two different providers, using two different types of technology.
For instance, one connection might be cable while the other is DSL. This way, you’re covered not only if one provider goes down but also if there’s a problem with networking infrastructure that both providers might share.
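Under the hood, automatic internet failover boils down to checking connections in priority order and routing traffic over the first healthy one. The Python sketch below illustrates only that decision logic; it is not how Sophos XG or any other firewall is configured, and the link names and health results are hypothetical.

```python
from typing import Callable, Optional, Sequence


def choose_active_link(
    links: Sequence[str], is_healthy: Callable[[str], bool]
) -> Optional[str]:
    """Return the highest-priority link that passes its health check, or None."""
    for link in links:  # links are listed in priority order
        if is_healthy(link):
            return link
    return None  # every connection is down


# Hypothetical setup: primary cable line from one provider, DSL backup from another.
links = ["cable (Provider A)", "DSL (Provider B)"]
health = {"cable (Provider A)": False, "DSL (Provider B)": True}

print(choose_active_link(links, lambda link: health[link]))  # DSL (Provider B)
```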
Power Failover
A power failover solution uses uninterruptible power supplies (UPSs) to provide backup power when your primary power source goes out.
For most businesses, we recommend having a power backup at every workstation. Even a 1,500 VA unit will typically give you enough time to keep working through brief power interruptions, or to save your work and shut down safely.
Keep in mind that UPS batteries degrade over time, so you’ll want to get replacements every few years.
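A UPS’s VA rating tells you how much load it can carry, not how long it will last, so runtime depends on the battery’s energy capacity and what’s plugged into it. Here’s a rough back-of-the-envelope sketch in Python. It assumes you know the battery capacity in watt-hours (from the spec sheet) and the connected load in watts; the 90% inverter efficiency and the `battery_health` derating factor are illustrative assumptions, and real runtime is usually lower than this estimate, so check the manufacturer’s runtime chart.

```python
def estimate_runtime_minutes(
    battery_wh: float,
    load_watts: float,
    inverter_efficiency: float = 0.9,  # assumed; check your UPS documentation
    battery_health: float = 1.0,       # hypothetical derating: 1.0 = new, lower as batteries age
) -> float:
    """Rough, optimistic UPS runtime estimate in minutes."""
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    usable_wh = battery_wh * battery_health * inverter_efficiency
    return usable_wh / load_watts * 60


# Hypothetical numbers: a 200 Wh battery supporting a 150 W workstation.
print(round(estimate_runtime_minutes(200, 150)))                      # 72
print(round(estimate_runtime_minutes(200, 150, battery_health=0.6)))  # 43
```

The drop from the first estimate to the second is why replacing aging UPS batteries matters.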
Testing Your BCDR Plans
Having detailed BCDR plans in place is good. Knowing that your plans will actually work when you need them to is better.
Short of living through an actual disaster, testing your plans is the only way to know they’ll hold up in a real crisis.
Testing your plans will also help keep your staff ready to respond by reminding them what their roles are and what they should do in the event of a disaster.
Testing Systems Failover
The process of testing your systems failover solutions involves spinning up virtualized instances of your servers and then attempting to conduct normal business operations.
Whether you’re virtualizing via the cloud or with a local device, you’ll want to test whether you can access your virtual servers from multiple workstations to verify that the process works as intended.
At the same time or separately, have your IT staff attempt to restore your server backups to make sure that process works properly.
Remember to check that the data and applications that return to the primary server are intact and meet your recovery point and recovery time objectives.
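One way to make that check concrete is to log timestamps during the test and compute the downtime and data-loss window you actually achieved. A minimal Python sketch, where the function name, field names, and drill timestamps are all hypothetical:

```python
from datetime import datetime, timedelta
from typing import Dict, Union


def evaluate_drill(
    outage_start: datetime,
    last_backup: datetime,
    service_restored: datetime,
    rto: timedelta,
    rpo: timedelta,
) -> Dict[str, Union[timedelta, bool]]:
    """Compare what a failover test achieved against the RTO and RPO targets."""
    downtime = service_restored - outage_start     # time operations were down
    data_loss_window = outage_start - last_backup  # recent data not in the backup
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


# Hypothetical drill log.
result = evaluate_drill(
    outage_start=datetime(2024, 3, 9, 18, 0),
    last_backup=datetime(2024, 3, 9, 14, 0),
    service_restored=datetime(2024, 3, 9, 21, 30),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=2),
)
print(result["rto_met"], result["rpo_met"])  # True False
```

In this made-up drill, service came back within the 4-hour RTO, but the newest backup was 4 hours old when the outage began, so the 2-hour RPO was missed and the backup schedule would need tightening.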
Testing Power Failover
The process of testing your power failover solutions is simple: unplug your UPS units from the wall and check that your devices stay on.
Testing Internet Failover
To test your internet failover solution, disconnect one of your internet connections at its source (where your ISP’s infrastructure meets your own) and make sure your firewall automatically switches to your backup connection.
When you test internet failover, your traffic switches to a different provider, which means your public IP address changes.
Because of this, active connections such as phone calls and video conferences could be temporarily disrupted, which is why we recommend conducting internet failover tests after business hours.
How Often Should You Conduct a Complete Failover Test?
At a minimum, you should conduct a test of all your failover solutions once a year. Larger organizations may consider doing a full test every six months, but typically no more than that.
For organizations with especially critical data or systems, quarterly testing may be appropriate, though that frequency is often better suited to partial systems testing.
And of course, if you ever run into an issue with your failover systems, a test right then and there will help you diagnose the issue.
As important as it is to test your various systems, it’s equally important to test your personnel so that relevant parties will know how to respond to a disaster incident.
You’ll want to make sure that key managers know what your BCDR plans are, where their offline and online copies are located, and how to implement them.
Key managers should assign specific individuals to handle tasks from the BCDR plans. Who they assign will depend on a variety of factors like the individual’s role and skill set.
Your key managers should also establish a hierarchy for personnel failover. If the person who was responsible for checking on the IT systems isn’t available, who will take their place? What if their replacement is also unavailable?
The plan should be comprehensive enough for you to know the primary, secondary, and tertiary backup personnel who will be responding during a disruptive incident.
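A written hierarchy like this is easy to check mechanically. This hypothetical Python sketch walks each role’s primary, secondary, and tertiary contacts, assigns the first available person, and lists any role nobody can cover; the role names and people are placeholders.

```python
from typing import Dict, List, Set, Tuple


def assign_roles(
    chains: Dict[str, List[str]], available: Set[str]
) -> Tuple[Dict[str, str], List[str]]:
    """For each role, pick the first available person in its backup chain.

    Returns the assignments plus any roles with no available person.
    """
    assignments: Dict[str, str] = {}
    uncovered: List[str] = []
    for role, chain in chains.items():
        # chain is ordered: primary, secondary, tertiary
        person = next((p for p in chain if p in available), None)
        if person is None:
            uncovered.append(role)
        else:
            assignments[role] = person
    return assignments, uncovered


# Hypothetical roles and people for illustration only.
chains = {
    "check IT systems": ["Dana", "Lee", "Sam"],
    "notify clients": ["Ari", "Dana", "Kim"],
}
assignments, uncovered = assign_roles(chains, available={"Lee", "Sam"})
print(assignments)  # {'check IT systems': 'Lee'}
print(uncovered)    # ['notify clients']
```

Any role that shows up as uncovered is a gap in your personnel failover hierarchy that the plan should close before a real incident exposes it.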
The Importance of Reviewing Your BCDR Plans
Technology changes and improves over time, and your BCDR plans should evolve with it, lest they go stale and lose their usefulness.
To accomplish this, review your plans regularly so that your company can keep tabs on what technologies within your organization need updating or replacement.
We suggest conducting these reviews at least on an annual basis as part of your failover testing process, and then analyzing the results.
It’s also worth reviewing your technology needs whenever there are critical changes in your operation, such as moving to a different sized office, seeing a dramatic increase or decrease in staff, or even a complete system migration to the cloud.
If you ever have to use your BCDR plans, don’t forget to assess how well they worked, where they fell short, and what your organization could do to improve them.
Keep Your Business Ready for Disruptive Events
Although it’d be great if nothing ever went wrong, reality has a way of throwing our businesses curve balls every so often.
Instead of burying your head in the sand like an ostrich, keep your business prepared for the unknown with a comprehensive BCDR plan.
- What is BCDR? – Business continuity and disaster recovery is the practice of ensuring that your business IT is capable of continually running without serious interruption, regardless of disaster incidents. The disasters can be natural, but they can also come in other forms such as hardware failure or power outages.
- Business Continuity vs Disaster Recovery – While business continuity refers to a business’s ability to remain operational during disruptive events, disaster recovery refers to the process of restoring your business to its pre-failure state.
- Create Scenario-Driven Plans – There’s no such thing as a one-size-fits-all BCDR plan. Design yours around the scenarios most likely to affect your organization at a local, offsite, regional, and national level.
- Business Continuity Plan Components – Your business continuity plan should contain a threat analysis, determine role assignments, have a communication strategy, contain your backup plans, and mention any infrastructure and hardware solutions necessary for continuity.
- Disaster Recovery Plan Components – Your disaster recovery plan will be defined primarily by your recovery time objectives, which clarify how long your business can tolerate downtime, and your recovery point objectives, which clarify how current the systems and data you restore from must be.
- System and Data Failover Solutions – Failover solutions automatically switch you to a backup system, connection, or power source when a disruptive incident takes out the primary one. They’re related more to business continuity than disaster recovery. The types of failover include:
- Hardware Failover – Creates a virtualized instance of your failed server from which you can temporarily run business operations.
- Cloud Failover – Similar to hardware failover solutions, but allows your business to leverage the scalable computational power of the cloud to run many server instances while your local infrastructure is fixed.
- Internet Failover – Activates when your primary internet network goes down to maintain internet continuity.
- Power Failover – Provides backup power via uninterruptible power supplies (UPSs) when your primary power source goes out.
- Testing Your BCDR Plans – There are two ways to know your BCDR plans work. The first way is to go through a disruptive event and hope for the best. The other is to test your plans. Conducting tests is also a great way to keep your team prepared and aware of what they should do during a disaster.
- Testing Frequency – You should conduct a complete test at least once a year, but if you have a larger organization, testing every six months may be justified. It’s rare for businesses to require quarterly testing, but if your company deals with a large amount of critical data, quarterly testing may be appropriate.
- Reviewing Your BCDR Plans – Technology isn’t stagnant. It changes and improves over time, and so should your BCDR plans. Failure to do so may leave your business unprepared to properly respond to a disaster incident, which could cost your business a great deal.
Who Should Take Care of BCDR?
The person in charge of creating your BCDR plan ought to be the CTO or CIO since they’re probably the people who know most about your company’s technology capabilities and needs.
They’ll know which scenarios need to be taken into consideration when planning for a disaster.
However, if your business doesn’t have those roles filled, you can still get an augmented version of a CIO with our Managed IT services.
Our virtual CIO offering would fill that specific leadership gap without breaking your budget.
Instead of paying a new executive a complete salary plus benefits to handle your occasional BCDR needs, you could use our virtual CIO for creating, reviewing, and testing your BCDR plan when you need to.
Your MSP services, like your BCDR plans, shouldn’t be “one size fits all,” which is why we prefer to invest time in understanding your company’s unique needs.
It lets us make your solutions personal so that things can easily stay business as usual.