IT Management

File Servers – The Business Case for High Availability

Introduction

You have probably heard of high-availability transaction processing servers. You have most likely read about the sophisticated systems the airlines use to sell tickets online. They have to be non-stop because downtime translates directly into lost orders and revenue. In this article I will discuss the economics of using non-stop technologies for everyday applications. I will show that even ordinary file sharing can benefit from inexpensive Linux-based Pacemaker clustering technology.

Availability Goal

What is our availability goal? It should be to take prudent and cost-effective measures to reduce computer downtime to nil within the required service window. I’m not talking about 99.999% (“five nines”) uptime, the popular (and very expensive) claim made by high availability vendors. I’m talking about maintaining enough uptime to service the application.

Take a simple example: for office document preparation, the service window is office hours (9 to 5). The rest of the time the desktop PCs can be turned off; nobody is there to operate them anyway. You only need the PCs 5 days a week, 8 hours a day, or 2,080 hours per desktop PC per year. That translates into an uptime requirement of about 24 percent. Ideally you want the desktop PCs available all the time during office hours, but you are willing to give up some availability for routine maintenance and for the infrequent breakdowns that may occur only once per workstation every five years or so. Perhaps you keep two spare desktop PCs for every 100. This extra capacity lets your office workers resume their work on a spare while their own workstation is being repaired. In this example the cost of maintaining adequate availability is the cost of maintaining two spare desktop PCs. You might adjust this to account for real-world conditions at the work site: wide swings in operating temperature or a poor-quality electricity supply might dictate more spares. It sounds like a low-stress, straightforward availability solution.
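
For the record, here is the arithmetic behind that 24 percent figure:

  8 hours/day × 5 days/week × 52 weeks/year = 2,080 hours/year
  2,080 hours ÷ 8,760 hours in a year ≈ 0.24, or about 24 percent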

Network Effects

The problem gets more complicated when the desktop PCs are networked together and all the documents are stored on a central file server rather than on each workstation’s hard drive. There is a multiplicative effect: if the file server is not available, all 100 document processing PCs are rendered unavailable. Then you have 98 workers (remember the 2 spares from above) being paid but not producing documents. A failure during office hours gets expensive fast. One hour of downtime can cost as much as $1,500 in lost worker wages; a day of downtime, $12,000. How long will it take for a hardware repair person to travel to your site? How long will it take for spare parts to arrive? How long will it take the repair person to replace the parts? How long will it take for damaged files to be restored from backup by your own people? A serious but not unlikely failure can take several days to resolve completely. It’s not unreasonable to assume that such a $24,000, two-day failure can occur once every 5 years. And this is a very simple example. We are not talking about a complicated order-entry or inventory control system. We are talking about 98 office workers saving files to a central file share so that they can be indexed and backed up.
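
Those figures assume a loaded labor cost of roughly $15 per hour per worker; the rate is illustrative, so substitute your own numbers:

  98 workers × $15/hour ≈ $1,470/hour, call it $1,500
  $1,500/hour × 8 hours = $12,000/day
  $12,000/day × 2 days of outage = $24,000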

The Effects of Time

I’m going to add another wrinkle to our office document processing example. The file sharing setup has now been in use for 4 years. Time flies, and the hardware is getting old faster than you realize. Old hardware is more likely to fail: it has been through more thunderstorms, more A/C breakdowns, more people knocking the server by accident. You’ve noticed that your hardware maintenance plan costs more every year. How long will the hardware vendor keep stocking spare parts for your obsolete equipment? Please forgive me for playing on your paranoia, but the real world can be rude.

Time for an Upgrade

In this scenario you conclude that you are going to have to replace that file server soon. It’s going to be a pain to migrate all the files to a new unit, and you will have to upgrade to a new version of Windows Server. How much is that going to cost? How much has Windows changed? If you are going to all this trouble, why not get some new improvement out of it? You know you can get bigger disks and more RAM (random access memory) for less money than you paid for the old server. Whoops. Windows is going to cost more: you pay a CAL (Client Access License) charge for every workstation attached to the server, and the CAL price has gone up. You read something about high availability clustering in Windows. Enterprise Server does that. Wow. Look at the price of that! Remember that $12,000-per-day downtime cost overhang? It’s more of an issue now that you are dealing with an old system.

A Debian Cluster Solution

Enough of this already. Since I asked so many questions and raised so many doubts, I owe you, the reader, some answers. Debian Linux provides a very nice high availability solution for file servers. You need two servers with directly attached storage, plus a third tie-breaker machine that can be little more than a glorified workstation. Each server runs the 64-bit edition of Debian Squeeze with the Pacemaker, Corosync, DRBD, and Samba packages installed. The software is free; you pay for the hardware and for a trustworthy Linux consultant who can set everything up for you. What you get is a fully redundant quorum cluster with fully redundant storage, multiple CPU cores on each node, much more RAM than you had before, and much more storage capacity.
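
To give you a feel for what the consultant would set up, here is a minimal sketch of the cluster resource configuration in Pacemaker’s crm shell syntax. The resource names, service IP address, mount point, and DRBD resource name “r0” are illustrative assumptions, not a drop-in configuration:

  # DRBD replicated volume, promoted to Master on exactly one node
  primitive p_drbd_r0 ocf:linbit:drbd params drbd_resource="r0" \
    op monitor interval="29s" role="Master" \
    op monitor interval="31s" role="Slave"
  ms ms_drbd_r0 p_drbd_r0 meta master-max="1" clone-max="2" notify="true"
  # Filesystem, service IP and Samba run together on whichever node is Master
  primitive p_fs ocf:heartbeat:Filesystem params device="/dev/drbd0" \
    directory="/srv/share" fstype="ext4"
  primitive p_ip ocf:heartbeat:IPaddr2 params ip="192.168.1.10" cidr_netmask="24"
  primitive p_samba lsb:samba
  group g_fileserver p_fs p_ip p_samba
  colocation col_fs_on_drbd inf: g_fileserver ms_drbd_r0:Master
  order ord_drbd_first inf: ms_drbd_r0:promote g_fileserver:start

Clients connect to the service IP address, so when one node fails and the other takes over, the outage looks to the office workers like a brief network hiccup rather than a dead server.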

Here are some rough hardware price estimates:

Tie-breaker node: two hard drives, 512 MB RAM: $500.00
Name-brand file server node: eight 2 TB SATA drives, 24 GB RAM, one 4-core CPU, 3-year on-site parts-and-labor warranty: $6,000.00
Second file server node, same as above: $6,000.00
Miscellaneous parts for the storage and control networks: $200.00
Total: $12,700.00

Each file server node runs software RAID 5 across its eight 2 TB drives, which yields 14 terabytes of usable disk storage per node (one drive’s worth of capacity goes to parity, so 7 × 2 TB = 14 TB). Because the storage is completely redundant across nodes, total cluster storage capacity is 14 terabytes. Performance of this unit will be much better than the old one: each file storage node effectively has 4 CPU cores and much more RAM for file buffering. Software updates from Debian are free; you just need someone to apply the security patches and version upgrades.
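
For the curious, assembling the per-node storage might look something like this (the device names are assumptions; in practice you would also carve out space for the operating system):

  # Build one software RAID 5 array from the eight 2 TB drives
  mdadm --create /dev/md0 --level=5 --raid-devices=8 /dev/sd[b-i]
  # /dev/md0 then serves as the backing device for the DRBD resource
  # that Pacemaker manages in the configuration sketched above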

The best feature is complete redundancy for file processing. In our file server example, any one of the nodes can fail completely and file serving will continue. Based on the lost labor cost estimates above, this system pays for itself if it eliminates a single day of downtime in a five-year period. You also save on hardware maintenance: whatever the yearly charge is for your old system, times 3, because the new hardware comes with 3 years of warranty coverage. You will have the consultant’s charges for converting to the new system, but remember, you were going to pay a similar fee for a new Windows system as well.

Conclusion

I hope I have stirred your interest in Linux Pacemaker-based clusters. I have shown a file server upgrade that pays for itself by reducing downtime, improves your file server’s performance, and reduces out-of-pocket expenses for software and hardware maintenance. Not a bad deal.

Posted by Laird Hariu in Debian, Eternally Regenerative Software Administration, Systems Management, Virtualization

One way to bring innovative FOSS and Linux solutions to your organization

Here is an effective way to try out Linux or FOSS (Free and Open Source Software) to help grow your organization. Hat tip to CIO magazine for Adam Hartung’s short little InnovationZone article “Outsource for Growth”.

Mr. Hartung recommends outsourcing to grow your organization … to do new things … to innovate … to be more flexible. Since your IT staff is probably too busy to take on growth projects like these, it makes a lot of sense to use consulting experts, like LinuxForce, to develop, test, and provision innovative FOSS infrastructure for you. How could your organization benefit from this outsourcing-for-innovation approach, building out a FOSS or Linux-based solution to grow your business?

Posted by CJ Fearnley in Development

Licensing Considerations When Integrating FOSS and Proprietary Software

Recently, I was looking for resources to help explain the implications of integrating FOSS (Free and Open Source Software) with proprietary software. This question is important for any organization that wants to build an “embedded” or “dedicated” system or product that includes either its own proprietary software or third-party applications. I discovered that most of the information available about FOSS licensing addresses this issue rather obliquely. This post will cover the basics so that such organizations can see how straightforward it can be to include FOSS in their projects. Standard disclaimers about my not being a lawyer apply.

Use of any software system requires understanding the software licensing involved. There are many dozens of FOSS licenses that could apply in your situation, so understanding the details is necessary to assess compatibility. In broad terms, these licenses can be effectively summarized by the Debian Free Software Guidelines and The Open Source Definition. Any software that complies with those specifications will freely (in both the “freedom” and money senses of the word) permit running proprietary and FOSS applications on the same system. Assessing compliance, however, is easier said than done. Since Debian has analyzed most common FOSS and quasi-FOSS licenses, their archive and their license page can be used to assess how a license might work in your situation (for example, anything in “main” would be OK for co-distribution and running in mixed environments). So we always start with Debian’s meticulous and well-documented analysis when assessing any license.

From a high-level perspective, the requirements that FOSS licensing imposes on an organization combining it with proprietary software come primarily in two forms: providing appropriate attribution (acknowledgement), and providing the source code for any FOSS (including any modifications) shipped as part of an integrated system. Typically the attribution requirement can be satisfied simply by referencing the source code of the FOSS components. The source code requirement can be met by providing the source for all of the FOSS distributed as part of the integrated system, for example by placing it on a documented FTP site.

There are some subtle issues that may arise during the software development process if proprietary and FOSS code are “linked” together. In such cases it is necessary to ensure that code over which one wants to assert proprietary control is only combined with FOSS code that explicitly supports such commingling. So it is necessary for a company wanting to keep its code proprietary to keep it “separate” from the FOSS code used in the integrated system. The use of the words “linked” and “separate” is intentionally vague because the terms of each FOSS license will need to be examined by legal counsel to understand the precise requirements. In these situations, there are license compliance management systems that can be adopted to help ensure that this separation is maintained.

Those are the basic issues. The references below describe these and related issues involved with Linux & FOSS licensing in much more depth:

Posted by CJ Fearnley in Development, Systems Management

Please Document the Shop: On the importance of good systems documentation

We have all heard it: you need to document the computer infrastructure, because you never know when you might be “hit by a bus”. We hear this, think many frightening things, reassure ourselves that it will never happen, and then put the request on the back burner. In this article I will expand on the phrase “hit by a bus” and then look at the consequences.

Things do happen to prevent people from coming in to work. The boss calls home, talks to Mike’s wife, and makes the sad discovery that Mike won’t be coming in anymore: he passed away last night in bed. People are struck by sudden, disabling illnesses. Car accidents happen.

More often than these tragedies occur, thank goodness, business conditions change without warning. In reorganizations, whole departments disappear, computer rooms are consolidated and moved, companies are bought, and whole workforces are replaced. I have had the unhappy experience of living through some of this.

Some organizations have highly transient workforces because of the environment they operate in. Companies located near universities benefit from an influx of young, upwardly mobile graduates. These workers are eager to gain experience but soon find higher-paying jobs in the “real world” further away from campus. Such companies have real turnover problems: people are moving up so quickly that they don’t have time to write things down.

Even when you keep people in place and maintain a fairly stable environment, what people have documented only in their heads can simply fade away. This is becoming more and more of an issue: networks, servers, and other infrastructure have been around for 20 years in many organizations. Maybe Fred the maintainer retired five years ago. Maybe Fred was transferred to sales. The longer systems are around, the more things can happen to Fred. Or Fred might be right where he was 20 years ago; he just can’t remember what he did.

What does all this mean? What are the consequences of losing organizational knowledge in a computer organization? To be blunt, it creates a hideous environment for your computer people. The system is a black box to them. They are paralyzed, and rightfully afraid: every small move they make can bring down the system in ways they cannot predict. Newcomers take much longer to train. Old-timers learn to survive by looking busy while doing nothing. The politics of the shop, and of the whole company, are made bloody by competing interpretations of the folklore of the black box. Whoever waves their arms hardest rules the day. This is no way for your people to live.

It is no way for the computer infrastructure to live, either. While the games are played, the infrastructure evolves more and more slowly. Before long it is frozen: nobody dares to touch it, and the only way to fix it is to replace it completely, at considerable expense. In elaborate infrastructures that is easier said than done. The productive lifetime of the platform is shortened because it was never allowed to grow and evolve. Think of the Hubble Telescope without all the repairs and enhancements over the years: it would have burned up on re-entry long ago.

Having made my case, I ask again: for your own good, please document the shop. Make the documents public and make them accurate. Record what actually is, rather than what you wish it to be; it is better to be a little embarrassed for a short while than to be misled later on. Update the documentation when changes occur, because an out-of-date document can be as bad as no document at all. Make an effort to record facts. At the same time, don’t leave out the general philosophies that guided the design and other qualitative information, because they help your successors interpret the facts when ambiguities arise.

Think of what you leave behind. Persuade your boss to make this a priority as well. Hopefully the people at your next workplace will do the same.

Posted by Laird Hariu in Eternally Regenerative Software Administration, Security, Systems Management

Introducing RemoteResponder.LinuxForce.Net

If you know the history of LinuxForce, you know that we’ve been doing remote systems administration using FOSS (Free and Open Source Software) since our founding in 1995. And we’ve called our remote systems administration service Remote Responder for a long time too. But the website RemoteResponder.LinuxForce.Net is new.

The new site is part of our educational initiative to explain the issues involved in administering FOSS-based IT infrastructures: achieving the promise of greater reliability and ever-improving functionality while keeping costs low and meeting an organization’s ever-evolving business needs. Check out our new website RemoteResponder.LinuxForce.Net and let us know what you think.

Posted by CJ Fearnley in News