Friday, October 1, 2010

Planning for disaster

Last month's blackout caught many companies, as well as the city, by surprise. More than ever, it brought home the importance of implementing a disaster recovery plan.

It was the beginning of a somewhat uneventful Thursday morning when power went out across Dubai. It did not cause much alarm until the blackout stretched on for several hours. As a result, many companies were caught off-guard, business transactions were halted, and the city as a whole was believed to have lost tens of millions of dollars in potential business.


The power outage that happened on June 9 was a simple reminder of how any disaster, man-made or natural, can have serious implications for businesses. Companies that had set aside putting a disaster recovery plan in place were suddenly jolted by the reality that in today's IT-centric world, even a minor incident can trigger a chain of events that can take a toll on their revenues.

"It was a wake-up call for customers who were putting disaster recovery on the side," says Qais Gharaibeh, district partner sales manager, EMC Middle East. "It also boosted the confidence of those who have already made the investments both into the disaster recovery and business continuity concepts."


One such company is the American Hospital Dubai. Because of the criticality of the services it provides, it is important for American Hospital Dubai to keep its vital systems and facilities up and running, says Rajan Chadha, group IT director, American Hospital Dubai. To ensure that the hospital functions no matter what, American Hospital Dubai has deployed fully redundant systems across the hospital's entire IT infrastructure.


"Our network was designed in such a way that there's a lot of redundancy. Our core switch is one into two. When one is being used, another similar core switch is on standby," Chadha says.


"There's one-to-one redundancy. When one goes down, the other one takes over automatically," he adds.


Even American Hospital Dubai's optic fibre backbone is redundant to make sure that connectivity is available at all times. Its data centre, too, is composed of several servers running the same functions to ensure that mission-critical applications, such as its patient administration services, are always available to its staff and customers. Inside those servers, components are also duplicated whenever possible.

"There are two local area network (LAN) cards in each server. If one LAN card goes down, the other one takes over. Similarly, if one power supply goes down within the server, another one takes over so that there is a redundant power supply also," Chadha adds.


As a rule of thumb, Chadha says American Hospital Dubai follows a 70-30 buying decision to spread risk and lessen its impact. "When we buy switches for end users, we make sure that we keep some spare capacity. So, if I need seven ports, I usually buy ten ports," Chadha explains.


American Hospital Dubai is also equipped with critical power (or generators) and uninterruptible power supply systems. Data is backed up periodically and several copies are kept, including one in an off-site location and another in a fireproof safe in the IT department's office.


"The system automatically picks up all the log files every six hours. So whatever was done since the last backup it would go and synchronize with our server located in our head office in Deira through a dedicated leased line. In addition, we have a cold backup of the system every week. One copy is stored off-site and another one is stored in our IT office," elaborates Chadha.


But only a few companies in the region have disaster recovery plans. As the recent power outage revealed, even some IT companies such as those in DIC (Dubai Internet City) did not have any disaster recovery measures in place.


"It surprises me how little planning is being done," says Omar Dajani, technical manager, Veritas Software Middle East. "This [planning] is true not only in the region but also worldwide."


Disaster recovery planning is not something that a company's IT department can or should do on its own. Senior management and business users should also be involved in order to develop compelling justifications, obtain complete support for the plans and ensure their success.


Dajani believes a company is serious about disaster recovery if the push is coming from the organisation's decision-makers.


"Whenever I go visit clients who are interested in disaster recovery, my first question to them is: 'Where is this [interest] coming from?' I believe that if the company is serious about protecting its data and establishing a disaster recovery plan, [the directive] has to come from the chief executive or at the senior management level because that tells me that there is a very serious concern and, therefore, a very strong interest in creating the plan," Dajani explains.


In American Hospital Dubai's case it was an easy discussion, since management had been made aware of the risks and understood their impact on the hospital's operations.


"We have a committee composed of medical and non-medical teams. Since the budgeting is efficiently accounted for, everyone lays on the table everything that is required. On my part, I tell them the advantages and disadvantages and, based on those recommendations, we sit and discuss with the chief executive officer and the chief financial officer what needs to be prioritised," Chadha says.


Convincing senior management to buy into disaster recovery is not a difficult task, says Dajani. He says it is more a question of awareness: making management realise how valuable data is and how much potential revenue will be lost if data is not protected. Questions like: 'If your IT infrastructure is down, what kind of impact will it have on your business?' or 'If you can no longer conduct financial transactions, what kind of impact will it have on your revenue?' should act as a stepping stone in persuading top management to look at investing in disaster recovery.

"Often times it surprises them and the answer is always: 'I can't afford any downtime'. I usually come up with follow-up questions like, 'How are your systems protected?' 'How are they being protected in a way that data is also protected and available all the time for you whenever you need them?'" Dajani says. 


"It's very surprising how just a ten-minute conversation can make them realise that if they suffer a failure for more than 15 minutes that can seriously jeopardise their entire operation. Usually, if they are receptive to the concept of disaster recovery, it's not often very hard to convince them that it is not only necessary but it is required in order for their business to survive," he adds.


"Criticality tends to be what a customer defines as important for his business. It takes into account the financial and customer aspects and the relationship aspects of the business," says Saeed al Barwani, chief executive officer, eHosting Datafort.


According to EMC's Gharaibeh, two cost factors have to be brought up whenever a discussion about disaster recovery takes place.


"Companies should look at two important factors. One of them is RPO (recovery point objective) and the second is RTO (recovery time objective)," Gharaibeh says. "Companies usually evaluate their capability to recover from disaster by measuring these two parameters so that they know how much loss their business can sustain in terms of a disaster."


RPO is the point in time that the restarted infrastructure will reflect. How far back in time does data need to be recovered in order for a business to re-launch operations after a disaster occurs? How far back can the data be restored? Essentially, this is the rollback that will be experienced as a result of the recovery.


RTO, on the other hand, is the time goal for the re-establishment and recovery of a business function or resource during the execution of disaster recovery or business continuity plans. It is about how much time it will take after a disaster before the data necessary for re-launching operations is recovered.
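The two measures can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes that the worst-case data loss equals the gap between backups (a disaster striking just before the next one runs), and the function names, targets and restore times are hypothetical figures, not numbers quoted by anyone in this article. The six-hour interval echoes the log-file schedule described earlier.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Assumption: the disaster hits just before the next backup runs,
    so everything since the previous backup is lost."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if the backup schedule keeps data loss within the RPO."""
    return worst_case_data_loss(backup_interval) <= rpo


def meets_rto(estimated_restore_time: timedelta, rto: timedelta) -> bool:
    """True if the estimated time to restore service is within the RTO."""
    return estimated_restore_time <= rto


# Hypothetical targets for illustration only.
six_hourly = timedelta(hours=6)
print(meets_rpo(six_hourly, rpo=timedelta(hours=8)))          # True
print(meets_rpo(six_hourly, rpo=timedelta(hours=1)))          # False
print(meets_rto(timedelta(hours=3), rto=timedelta(hours=4)))  # True
```

A company that can tolerate losing eight hours of transactions would be covered by a six-hourly backup in this simplified model; one that can tolerate only an hour would need more frequent copies or replication.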

Investment in a disaster recovery solution varies from company to company and from requirement to requirement, says Gharaibeh. It depends on the size of the business and on its understanding of the risks it faces in the event of a disaster.


Essentially it will depend on how much data you can afford to lose, Dajani adds. 
"Disaster recovery comes in different levels. It's a direct relation between how much data you have and how much data you can afford to lose. If you have lots of data, and you can't afford to lose any of it, the cost is going to be much higher. If you have a little bit of data and you can afford to lose a few hours of transaction, then the cost is going to be low," Dajani explains.

"The more availability you are looking for, the higher the cost," he adds. 


For companies that can't afford highly sophisticated disaster recovery tools, all is not lost. Dajani says that a simple backup of data, done on a regular basis, can go a long way, especially for smaller organisations.


"Backing up data once or twice a day on an external hard drive is a simple yet effective disaster recovery practice. It is not sophisticated but it works," suggests Dajani.
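For readers who want to automate that habit, a minimal sketch follows. It is not a tool recommended by anyone quoted here. The function name `backup_folder` and the paths in the usage comment are hypothetical, and it assumes the external drive is already connected and mounted.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_folder(source: Path, backup_root: Path) -> Path:
    """Copy `source` into a new timestamped folder under `backup_root`.

    Each run creates a separate copy, so older backups are kept.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    destination = backup_root / f"{source.name}-{stamp}"
    # copytree raises an error if the destination already exists,
    # which would only happen for two runs within the same second.
    shutil.copytree(source, destination)
    return destination


# Example call with hypothetical paths (adjust to your own machine):
# backup_folder(Path("C:/CompanyData"), Path("E:/Backups"))
```

Scheduling the script once or twice a day, and keeping the drive somewhere other than next to the machine it protects, would be left to the operating system's scheduler and the company's own routine.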


Outsourcing is another viable option. It is particularly popular with large international companies that have a set disaster recovery strategy but do not have the requisite IT infrastructure or skill sets in the Middle East. eHosting Datafort currently offers disaster recovery and business continuity services to more than 200 companies.


The benefits of such a service include delegating security issues to a specialist company and freeing up vital resources, which can then be focused on core business processes.


"Our services are customised for each customer, depending on their business needs and requirements," says Mark Lamb, director, technical operations, eHosting Datafort.


According to Col. Kuldeep Bhatnager, director of eHosting Datafort's corporate business, companies that employ eHosting Datafort's services can avail of its wide range of IT services and resources, including access to multiple data centres in different locations and the availability of parallel backup offices, or disaster recovery centres, which companies can use if it is not possible to work in their own offices because of a disaster.

However, for companies that do decide to buy their own disaster recovery solution, Gharaibeh suggests choosing vendors that offer complete solutions "because once you involve multiple vendors, integrating your whole disaster recovery environment becomes complicated when you synchronise between them," he explains.


He also recommends that companies demand a proof of concept. "Seeing is believing, and there's nothing better than setting your own data and seeing how it works for you," Gharaibeh claims.


Implementing a disaster recovery plan is not just about buying technology but also about making sure that processes are put in place. It is important that people in the company are made aware of the plan. Mock disasters and simulations must also be run to test how your staff or your crisis management team, if you have one, will react to a particular incident.

"Nine out of ten companies don't test their disaster recovery plan because of two things: they don't have the time because testing can take several hours, or they can't afford for their systems to be down, especially when the company is operating 24 hours a day," Dajani says.


"But it's important to test your plan because that way you can determine whether it will work according to your objectives and expectations," he adds.


The objective of the training is to make sure that the people involved are well versed in their roles and responsibilities. "Everything has to be in sync and measures have to be implemented like clockwork," says Al Barwani.


"It has to be almost an automatic reaction. Every person, process, engagement and deliverable should fall automatically into place when you need to engage in a business continuity environment. It should be second nature," Al Barwani claims.


According to Dajani, testing and evaluation are also important for checking whether the plan will still work once your business environment changes. "It is guaranteed that within two years any IT infrastructure gets a 100% makeover. That's the way it is. Everything changes (you get more data, more hardware, more software, more people, fewer people, etc.) and you need to determine that your plan will still work for your business," Dajani comments.

"It's an ongoing process. Each solution has to be reviewed from its entirety to the base up. Make sure that everything grows in the same way. Most failures have been because policies and procedures have not been kept up to date," says Lamb.


"Any solution will have to work in totality of the business. It isn't about producing a box. It has to have a continuous development process not only of the measures deployed but also of the people involved," adds Al Barwani.


One thing that most companies should remember is to provide proper documentation of their disaster recovery plan. It is no good for companies, even those with well-compiled plans, if the plan is stored on a PC that is down because of a disaster.

"After every point, document, document, document," stresses Dajani. "Devices crash and data become inaccessible so there has to be a hard copy of the plan." 


The documented plan needs to be as detailed as possible, including details about what type of data is stored on each tape used for backing up information.


"You should know where your data is stored. Identify what is in each tape because you might have to go to a different site to do disaster recovery. The last thing you want is to have a box full of tapes and say, 'What are in these tapes?'" Dajani explains.


Companies that are designing a disaster recovery plan should look at it in the wider context of business continuity.


"Think of business continuity. Disaster recovery and business continuity go together. Disaster only happens when the business continuity plan is not there. Everything crashing down at the same time is very remote but possible," says Chadha.


"Always hope for the best but be prepared for the worst. Don't work on the premise that the worst will not happen to you. Anything can happen at any time," advises Chadha.