Pacemaker

5 things about FOSS Linux virtualization you may not know

In January I attended the 10th annual Southern California Linux Expo. In addition to speaking and running the Ubuntu booth, I had an opportunity to talk to other sysadmins about everything from selection of distribution to the latest in configuration management tools and virtualization technology.

I ended up in a conversation with a fellow sysadmin who was using a proprietary virtualization technology on Red Hat Enterprise Linux. Not only did he have surprising misconceptions about the FOSS (Free and Open Source Software) virtualization tools available, he also assumed that some of the features he was paying extra for (or not, as the case may be) wouldn’t be available in the FOSS alternatives.

Here are five features that you might be surprised to find included in the space of FOSS virtualization tools:

1. Data replication with verification for storage in server clusters

When you consider storage for a cluster there are several things to keep in mind:

  • Storage is part of your cluster too, so you want it to be redundant
  • For immediate failover, you need replication between your storage devices
  • For data integrity, you want a verification mechanism to confirm the replication is working

Regardless of what you use for storage (a single hard drive, a RAID array, or an iSCSI device), the open source DRBD (Distributed Replicated Block Device) offers fast replication over a network backplane, plus verification tools you can run at regular intervals to ensure data integrity.
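DRBD’s online verification works by computing a checksum of each block on both nodes and comparing the checksums, rather than re-sending the data, and flags any blocks that disagree. In DRBD itself a run is started with drbdadm verify <resource>, and the checksum algorithm is chosen with the verify-alg option. The Python sketch below only illustrates the idea on two in-memory replicas; the 4 KiB block size and SHA-256 digest are assumptions for the example, not DRBD’s internals.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def block_digests(device: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return a SHA-256 digest for each fixed-size block of a replica."""
    return [
        hashlib.sha256(device[off:off + block_size]).hexdigest()
        for off in range(0, len(device), block_size)
    ]


def out_of_sync_blocks(primary: bytes, secondary: bytes,
                       block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the indices of blocks whose digests differ between replicas.

    Only the short digests would need to cross the network, which is what
    makes checksum-based verification cheap compared with re-copying data.
    """
    if len(primary) != len(secondary):
        raise ValueError("replicas have different sizes")
    a = block_digests(primary, block_size)
    b = block_digests(secondary, block_size)
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    primary = bytes(3 * BLOCK_SIZE)        # three zero-filled blocks
    secondary = bytearray(primary)
    secondary[BLOCK_SIZE + 10] = 0xFF      # corrupt one byte in block 1
    print(out_of_sync_blocks(primary, bytes(secondary)))  # [1]
```

Run periodically (from cron, for example), a check like this catches silent divergence before a failover would expose it.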

Looking to the future, the FOSS distributed object store and file system Ceph is showing great promise for more extensive data replication.

2. Automatic failover in cluster configurations

Whether you’re using KVM (Kernel-based Virtual Machine) or Xen, automatic failover can be handled by a pair of closely integrated FOSS tools, Pacemaker and Corosync. At the core, Corosync provides cluster messaging, membership and quorum, so it knows which hosts are alive; Pacemaker configures and monitors the resources themselves and decides where they should run, including when to move virtual machines off a failed host.
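The core failover decision, moving guests off a host that has dropped out of the cluster, can be sketched in a few lines of Python. This is a simplified model with hypothetical host and VM names, using a plain least-loaded rule; a real Pacemaker cluster decides placement from resource scores, constraints and stickiness, and only acts when it has quorum.

```python
def fail_over(placement: dict[str, str], alive: set[str]) -> dict[str, str]:
    """Return a new VM -> host placement with VMs on failed hosts moved.

    placement: maps each VM name to the host it currently runs on.
    alive:     hosts the cluster currently sees as up.
    Orphaned VMs go to the surviving host running the fewest VMs.
    """
    if not alive:
        raise RuntimeError("no surviving hosts to fail over to")
    # Keep every VM whose host is still up exactly where it is.
    new_placement = {vm: host for vm, host in placement.items() if host in alive}
    load = {host: 0 for host in alive}
    for host in new_placement.values():
        load[host] += 1
    for vm in sorted(placement):
        if vm not in new_placement:
            # min() keeps the first of equal candidates, so sorting the
            # host names makes tie-breaking deterministic.
            target = min(sorted(alive), key=lambda h: load[h])
            new_placement[vm] = target
            load[target] += 1
    return new_placement
```

For example, with web and dns on host-a and db on host-b, losing host-a moves both web and dns to host-b, while VMs on healthy hosts are left alone.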

3. Graphical interface for administration

While development of graphical interfaces for administration is an active area, many of the basic tasks (and, increasingly, more complicated ones) can be performed through the Virtual Machine Manager application. This manager is built on the libvirt toolkit, which can also be used to build custom management interfaces.
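Because libvirt has Python bindings (packaged in Debian as python3-libvirt), a small custom tool takes only a few lines. This sketch assumes a local KVM host reachable at the qemu:///system URI and lists each guest along with whether it is running. summarize() accepts any objects with name() and isActive() methods, so it can be tried without a hypervisor.

```python
def summarize(domains) -> list[str]:
    """Return one status line per guest, sorted by name.

    Works with libvirt virDomain objects or anything exposing the same
    name() and isActive() methods.
    """
    lines = []
    for dom in sorted(domains, key=lambda d: d.name()):
        state = "running" if dom.isActive() else "shut off"
        lines.append(f"{dom.name():20} {state}")
    return lines


def main(uri: str = "qemu:///system") -> None:
    # Imported here so summarize() can be used without the bindings installed.
    import libvirt

    # Read-only connection: this tool only inspects state. On failure the
    # bindings raise libvirt.libvirtError.
    conn = libvirt.openReadOnly(uri)
    try:
        for line in summarize(conn.listAllDomains()):
            print(line)
    finally:
        conn.close()
```

On a KVM host with the bindings installed, calling main() prints one line per guest. A read-only connection is enough here because nothing is modified.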

The KVM website has a list of other management tools, ranging from command-line (CLI) to Web-based: www.linux-kvm.org/page/Management_Tools

As does the Xen wiki: wiki.xen.org/wiki/Xen_Management_Tools

4. Live migrations to other hosts

In virtualized environments it’s common to reboot a virtual machine to move it from one host to another, but when shared storage is used it is also possible to do live migrations on KVM and Xen. During these live migrations, the virtual machine retains its state as it moves between the physical machines. Since there is no reboot, connections stay intact and sessions and services continue to run, with only a short blip of unavailability during the switchover.
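By default, KVM live migration (through QEMU) uses a pre-copy approach. Guest memory is copied while the VM keeps running, pages the guest dirties during each pass are re-sent, and the VM is paused only long enough to copy the final small set of pages. That final pause is the short blip of unavailability. The following Python function is a toy model of that loop. All rates and the stop threshold are illustrative parameters, not QEMU settings.

```python
def precopy_migration(total_pages: int, dirty_rate: float, bandwidth: float,
                      stop_threshold: float, max_rounds: int = 30):
    """Toy model of pre-copy live migration.

    total_pages:    guest RAM size, in pages
    dirty_rate:     pages the running guest dirties per second
    bandwidth:      pages per second the migration link can copy
    stop_threshold: once this few pages remain dirty, pause the guest
    Returns (rounds, downtime_seconds).
    """
    remaining = float(total_pages)  # the first pass must copy all of RAM
    rounds = 0
    while remaining > stop_threshold and rounds < max_rounds:
        copy_time = remaining / bandwidth
        # Pages dirtied while this pass ran must be re-sent on the next pass.
        remaining = min(float(total_pages), dirty_rate * copy_time)
        rounds += 1
    # Stop-and-copy: the guest is paused while the last pages are sent.
    downtime = remaining / bandwidth
    return rounds, downtime
```

For example, precopy_migration(1_000_000, 10_000, 100_000, 5_000) converges after three passes with about 0.01 seconds of downtime. If the guest dirties memory faster than the link can copy it, the loop never converges and ends at max_rounds with a long pause. This is why real implementations bound the pause with a configurable maximum downtime.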

Documentation for KVM, including hardware and software requirements for such support, can be found here: www.linux-kvm.org/page/Migration

5. Over-allocating shared hardware

KVM has the option to take full advantage of hardware resources by over-allocating both RAM (with adequate swap space available) and CPU. Details about over-allocation and key warnings can be found here: Overcommitting with KVM.
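Over-allocation is easiest to reason about as the ratio of what the guests are promised to what the host physically has. The Python sketch below uses hypothetical guest sizes to compute the vCPU and RAM ratios. It also checks whether host RAM plus swap could back every guest’s full allocation at once. It ignores memory the host itself needs and makes no claim about safe ratios; the guide referenced above covers those.

```python
def overcommit_report(guests, host_cores: int, host_ram_gib: float,
                      host_swap_gib: float) -> dict:
    """Summarize CPU and RAM over-allocation for a set of guests.

    guests: iterable of (vcpus, ram_gib) tuples, one per VM.
    Ratios above 1.0 mean the resource is over-allocated.
    """
    guests = list(guests)
    total_vcpus = sum(vcpus for vcpus, _ in guests)
    total_ram = sum(ram for _, ram in guests)
    return {
        "vcpu_ratio": total_vcpus / host_cores,
        "ram_ratio": total_ram / host_ram_gib,
        # If every guest touched all of its RAM at once, the excess over
        # physical memory would have to live in swap. Host overhead ignored.
        "swap_sufficient": total_ram <= host_ram_gib + host_swap_gib,
    }
```

Three guests with (2, 8), (2, 8) and (4, 12) vCPUs and GiB on a 4-core, 24 GiB host with 8 GiB of swap give a vCPU ratio of 2.0 and a RAM ratio of about 1.17, still within RAM plus swap.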

Conclusion

Data replication with verification for storage, automatic failover, a graphical interface for administration, live migration and over-allocation of shared hardware are all available today in the FOSS virtualization tools included with many modern Linux distributions. As with any move to a more virtualized environment, deployments require diligent testing and careful configuration, but there are many online resources available, as well as the friendly folks at LinuxForce, to help.

Posted by Elizabeth Krumbach in Development, Systems Management, Virtualization

An Infrastructure for Server Clusters for High Availability

As announced in our Cluster Services Built With FOSS post, LinuxForce’s Cluster Services are built exclusively with Free and Open Source Software (FOSS). Here is an expanded outline of the basic architecture of our approach to High-Availability (HA) clustering.

Overview

In any HA deployment there are two main components: hosts and guests. The hosts are the systems that form the core of the cluster itself. Each host runs only a minimal set of services, dedicated to operating the cluster. The hosts handle resource allocation, from persistent storage to RAM to the number of CPUs each guest gets. They also give an “outside” view of guest performance and make it possible to manipulate guests from outside the guest operating system. This offers significant advantages when there are boot or other failures that would traditionally require physical (or at least console) access to debug. The guests in this infrastructure are the virtual machines (VMs) that run the public-facing services.

On the host, we define a number of “resources” to manage the guest systems: ping checks of the hosts, bringing up shared storage or storage replication (like DRBD) as primary on one machine or the other, and launching the VMs.
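These resources have to come up in a particular order. A guest cannot start until the storage holding its disk image is primary on that host, and it must stop before that storage is demoted. Pacemaker expresses this with order and colocation constraints in its own configuration. The Python sketch below only illustrates the dependency reasoning, using the standard library’s graphlib (Python 3.9+) and hypothetical resource names. Ping checks are left out because they usually influence which host a resource prefers, through location rules, rather than the start order.

```python
from graphlib import TopologicalSorter

# Hypothetical resources: each maps to the set of resources it must start after.
EXAMPLE_DEPENDENCIES = {
    "drbd-primary": set(),          # promote replicated storage first
    "vm-web": {"drbd-primary"},     # guests need their disk images
    "vm-dns": {"drbd-primary"},
}


def start_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order resources so each starts only after its dependencies.

    graphlib raises CycleError if the dependencies form a loop.
    """
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Stop in reverse: guests shut down before their storage is demoted."""
    return list(reversed(start_order(depends_on)))
```

The two guests may start in either order relative to each other. Only their shared dependency on the storage is fixed.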

In the simplest case, the cluster infrastructure is used for new server deployments, in which case the operating system installs are fresh and the services are new. More likely a migration from an existing infrastructure will be necessary. Migrations from a variety of sources are possible including from physical hardware, other virtualization technologies (like Xen) or different KVM infrastructures which may already use many of the same core features, like shared storage. When a migration is required downtime can be kept to a minimum through several techniques.

Hardware configuration

The first consideration when you begin to build a cluster is the hardware. The basic requirement for a small cluster is 3 servers and a fast dedicated network backplane to connect the servers. The three servers can all be active as hosts, but we typically have a configuration where two machines are the hosts and a third, less powerful arbitrator system is available to make sure there is a way to break ties when there is resource confusion.

Two live resource hosts

These systems will be where the guests are run. They should be as similar as possible down to the selection of processor brand and amount of RAM and storage capabilities so that both machines are capable of fully taking over for the other in case of a failure, thus ensuring high availability.

The amount of resources required will be heavily dependent upon the services you’re running. When planning we recommend thinking about each guest as a physical machine and how many resources it needs, allowing room for inevitable expansion of services over time. You can over-commit both CPU and RAM on KVM, so you will want to read a best practices guide such as Chapter 6: Overcommitting with KVM. Disk space requirements and configuration will vary greatly depending upon your deployment, including the ability to use shared storage backplanes and replicated RAID arrays, but Linux Software RAID will typically be used for the core operating system install controlling each physical server. Additionally, using a thorough testing process so you know how your services will behave if they run out of resources is critical to any infrastructure change.
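Because either host must be able to take over for the other, a useful planning check is whether one host alone could carry every guest, with room to grow. The following minimal Python sketch assumes per-guest RAM and disk estimates and a 25 percent growth allowance chosen purely for illustration. RAM is compared with physical memory only, which is the conservative case even though KVM allows over-commit.

```python
def survives_host_failure(guests, host_ram_gib: float, host_disk_gib: float,
                          growth: float = 0.25) -> bool:
    """Return True if a single host could run every guest plus growth.

    guests: iterable of (ram_gib, disk_gib) tuples, one per VM.
    growth: fractional allowance for services expanding over time
            (0.25 is an illustrative assumption, not a recommendation).
    RAM is compared with physical memory, i.e. without over-commit.
    """
    guests = list(guests)
    need_ram = sum(ram for ram, _ in guests) * (1 + growth)
    need_disk = sum(disk for _, disk in guests) * (1 + growth)
    return need_ram <= host_ram_gib and need_disk <= host_disk_gib
```

Three guests needing 16 GiB of RAM in total fit on a 24 GiB host with growth included (20 GiB), but not on a 16 GiB host.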

Tie-breaker (arbitrator)

A third server is required to complete quorum for the cluster. In our configuration this machine doesn’t need to have high specs or a lot of storage space. We typically use at least RAID 1 so that this host has disk redundancy.
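Quorum here means a strict majority of votes, and the arithmetic shows why a third, modest machine matters. With two nodes, losing either one leaves one vote out of two. That is not a majority, so the survivor cannot tell a dead peer from a broken link. With three nodes, any single failure still leaves two of three. Here is a minimal Python sketch of the majority rule, assuming one vote per node:

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Majority quorum: strictly more than half of all configured votes."""
    return 2 * votes_present > total_votes


if __name__ == "__main__":
    for total in (2, 3):
        for present in range(total, 0, -1):
            print(f"{present} of {total} nodes up -> quorum: {has_quorum(present, total)}")
```

Running it shows that 1 of 2 nodes lacks quorum, while 2 of 3 keeps it.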

1000M switch

A fast switch whose only job is to handle traffic between the three machines is highly recommended for assured speed of these two vital resources:

  1. Storage backplane
  2. Corosync/Pacemaker communication

It’s best to keep these off a shared network, which may be prone to congestion or failure, since fast speeds for both these resources are important for a properly functioning cluster.
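Link speed matters most for the storage backplane during a full resynchronization, for example after a failed node’s disks are replaced and all replicated data must cross the link. The Python helper below estimates the best case. The efficiency factor is an assumption standing in for protocol overhead and disk limits.

```python
def transfer_hours(data_bytes: float, link_bits_per_s: float,
                   efficiency: float = 1.0) -> float:
    """Estimate hours needed to push data_bytes across a network link.

    efficiency: fraction of the raw link speed actually achieved; 1.0 is
    the theoretical best case (an assumption; real links do worse).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 3600
```

At a full 1 Gbit/s, transfer_hours(1e12, 1e9) gives about 2.2 hours per terabyte. Anything that shares or congests the link stretches that time further.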

Key software components

There are many options when it comes to selecting your HA stack, from which Linux distribution to use, to what storage replication system to use. We have selected the following:

Debian GNU/Linux

Like most LinuxForce solutions, we start with a base of Debian stable, currently Debian “Squeeze” 6.0. All of the software mentioned in this article comes from the standard Debian stable repository and is open source and completely free of charge.

Logical Volume Manager (LVM)

We use LVM extensively throughout our deployments for the flexibility of easy reallocation of filesystem resources. In a cluster infrastructure it is used to create separate disk images for each guest and then may be used again inside this disk image for partitioning.
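Creating a separate logical volume for each guest’s disk image takes a single lvcreate call. This Python sketch builds that command for a hypothetical volume group vg0 and guest volume web-disk. It only prints the command unless dry_run is turned off. Running it for real requires root and an existing volume group.

```python
import shlex
import subprocess


def lvcreate_cmd(vg: str, name: str, size: str) -> list[str]:
    """Build the lvcreate command for one guest's disk image.

    size uses LVM's size syntax, e.g. "20G".
    """
    return ["lvcreate", "-L", size, "-n", name, vg]


def create_guest_volume(vg: str, name: str, size: str,
                        dry_run: bool = True) -> list[str]:
    """Print (dry run) or execute the lvcreate command; return it either way."""
    cmd = lvcreate_cmd(vg, name, size)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)  # needs root and an existing VG
    return cmd
```

create_guest_volume("vg0", "web-disk", "20G") prints lvcreate -L 20G -n web-disk vg0. The guest can then partition that volume internally, with LVM again if desired.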

Distributed Replicated Block Device (DRBD)

DRBD is used for replicating storage between the two hosts which have their own storage. Storage needs could also be met by shared storage or other data replication mechanisms.

Kernel-based Virtual Machine (KVM)

Since hardware-assisted virtualization is now ubiquitous on modern server hardware, we use KVM for our virtualization technology. It allows fully virtualized VMs running their own unmodified kernels to run directly on the hardware, with the Linux kernel itself acting as the hypervisor and without the overhead of emulating the CPU in software.
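On x86 hardware, /proc/cpuinfo shows whether a server has the hardware extensions KVM relies on. Intel VT-x appears as the vmx flag and AMD-V as svm. The small Python check below works on the text of that file, so it can be tested anywhere.

```python
from typing import Optional


def hardware_virt_support(cpuinfo: str) -> Optional[str]:
    """Return "Intel VT-x", "AMD-V", or None based on /proc/cpuinfo text.

    Looks at the x86 "flags" line: Intel advertises "vmx", AMD "svm".
    """
    for line in cpuinfo.splitlines():
        if line.startswith("flags"):
            flags = line.split(":", 1)[1].split()
            if "vmx" in flags:
                return "Intel VT-x"
            if "svm" in flags:
                return "AMD-V"
    return None
```

On a host, pass it open("/proc/cpuinfo").read(). A present flag only shows that the CPU is capable. The feature may still need to be enabled in the firmware, and the existence of /dev/kvm confirms that the kvm module actually loaded.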

Pacemaker & Corosync

Pacemaker and Corosync are used together to do the heavy lifting of cluster management. The two services are deeply intertwined, but at the core Corosync handles cluster communication, membership and quorum (which hosts are alive), while Pacemaker handles configuration and monitoring of the resources themselves and determines where those resources should run.

Conclusion

We have deployed this infrastructure for mission critical services including DNS, FTP and web server infrastructures serving everything from internal ticketing systems to high-traffic public-facing websites. For a specific example of implementation of this infrastructure, see Laird Hariu’s report File Servers – The Business Case for High Availability where he covers the benefits of HA for file servers.

Image in this post by Jeannie Moberly, licensed under CC BY-SA 3.0.

Posted by Elizabeth Krumbach in Development, Systems Management, Tech Notes, Virtualization

File Servers – The Business Case for High Availability

Introduction

You have probably heard of high availability transaction processing servers.  You have most likely read about the sophisticated systems used by the airlines to sell tickets online.  They have to be non-stop because downtime translates to lost orders and revenue.  In this article I will discuss the economics of using non-stop technologies for everyday applications.  I will show that even ordinary file sharing applications can benefit from inexpensive Linux-based Pacemaker clustering technology.

Availability Goal

What is our availability goal?  Our goal should be to take prudent and cost-effective measures to reduce computer downtime to nil within the required service window.  I’m not talking about 99.999% (five 9s) uptime.  This is the popular (and very expensive) claim made by high availability vendors.  I’m talking about maintaining enough uptime to service the application.  Take a simple example: for office document preparation the service window is office hours (9-5).  The rest of the time the desktop PCs can be turned off; nobody is there to operate them anyway.  You only need the PCs 5 days a week, 8 hours a day, or 2080 hours per desktop PC per year.  This translates into an uptime requirement of 24 percent.  Ideally you want the desktop PCs to be available all the time during office hours, but you are willing to give up availability for routine maintenance and for the infrequent breakdowns that may occur only once per workstation every five years or so.  Perhaps you keep two spare desktop PC workstations for every 100.  This extra capacity allows your office workers to resume their work on a spare while their own workstation is being repaired.  In this example the cost of maintaining adequate availability is the cost of maintaining two spare desktop PCs.  You might adjust this cost to account for real-world conditions at the work site.  Wide swings in operating temperature or a poor-quality electricity supply might dictate that you increase the number of spare PCs.  Sounds like a low-stress, straightforward availability solution.
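The 24 percent figure works out as follows, assuming a 52-week year with no holidays:

```latex
\[
5\,\tfrac{\text{days}}{\text{week}} \times 8\,\tfrac{\text{h}}{\text{day}} \times 52\,\tfrac{\text{weeks}}{\text{year}} = 2080\ \text{h/year},
\qquad
\frac{2080\ \text{h}}{365 \times 24\ \text{h}} = \frac{2080}{8760} \approx 23.7\% \approx 24\%
\]
```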

Network Effects

The problem gets more complicated when the desktop PCs are networked together and all the documents are stored on a central file server rather than on each workstation’s hard drive.  There is a multiplicative effect.  If the file server is not available, then all 100 document processing PCs are rendered unavailable.  Then you have 98 (remember the 2 spares from above) workers being paid but not producing documents.  A failure during office hours can become expensive.  One hour of downtime can cost as much as $1500 in lost worker wages.  An 8-hour day of downtime can cost $12,000 in lost worker wages.  How long will it take for a hardware repair person to travel to your site?  How long will it take for spare parts to arrive?  How long will it take the repair person to replace the parts?  How long will it take your own people to restore damaged files from backup?  A serious but not unlikely failure can take several days to be completely resolved.  It’s not unreasonable to assume that a two-day, $24,000 failure of this kind can occur once every 5 years.  This is a very simple example.  We are not talking about a complicated order-entry or inventory control system.  We are talking about 98 office workers saving files to a central file share so that they can be indexed and backed up.
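The cost figures are consistent with one another. The per-worker wage is not stated in the text; it is simply what the $1500 hourly figure implies for 98 idle workers.

```latex
\[
\frac{\$1500/\text{h}}{98\ \text{workers}} \approx \$15.31\ \text{per worker-hour},
\qquad
8\ \text{h/day} \times \$1500/\text{h} = \$12{,}000/\text{day},
\qquad
\frac{\$24{,}000}{\$12{,}000/\text{day}} = 2\ \text{days}
\]
```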

The Effects of Time

I’m going to add another wrinkle to our office document processing example.  This file sharing setup has been in use for 4 years.  Time flies.  The hardware is getting old faster than you realize.  Old hardware is more likely to fail.  It has been through more thunderstorms, more A/C breakdowns, people knocking the server by accident and all that.  You’ve been noticing that your hardware maintenance plan is costing more every year.  How long is the hardware vendor going to stock spare parts for your obsolete office equipment?  Please forgive me for playing on your paranoia but the real world can be rude.

Time for an Upgrade

In this scenario you conclude that you are going to have to replace that file server soon.  It’s going to be a pain to migrate all the files to a new unit.  I am going to have to upgrade to a new version of Windows Server.  How much is that going to cost?  How much has Windows changed?  If I am going to go to all this trouble, why not get some improvement out of it?  I know I can get bigger disks and more RAM (random access memory) for less money than I paid for the old server.  Whoops.  Windows is going to cost more.  I have to pay a charge for every workstation attached to it.  That CAL (Client Access License) price has gone up.  I read something about high availability clustering in Windows.  Enterprise Server does that.  Wow.  Look at the price of that!  Remember that $12,000 per day of downtime cost overhang?  It’s more of an issue now that you are dealing with an old system.

A Debian Cluster Solution

Enough of this already.  Since I asked so many questions and raised so many doubts, I owe you, the reader, some answers.  Debian Linux provides a very nice high availability solution for file servers.  You need two servers with directly attached storage and also a third, small server that can be little more than a glorified workstation.  Each server runs the 64-bit edition of Debian Squeeze with the Pacemaker, Corosync, DRBD and Samba packages installed.  The software is free.  You pay for the hardware and a trustworthy Linux consultant who can set everything up for you.  What you get is a fully redundant quorum cluster with fully redundant storage, multiple CPU cores on each node, much more RAM than you had before and much more storage capacity.

Here are hardware price estimates:

Tie-breaker node (two hard drives, 512MB RAM): $500.00
Name-brand file server node (eight 2TB SATA drives, 24GB RAM, one 4-core CPU, 3-year on-site parts and labor warranty): $6,000.00
Second file server node, same as above: $6,000.00
Miscellaneous parts for storage and control networks: $200.00
Total: $12,700.00

Each file server node uses software RAID 5, giving it 14 terabytes of usable disk storage.  Because storage is completely redundant across nodes, total cluster storage capacity is also 14 terabytes.  Performance of this unit will be much better than the old unit.  Each file storage node effectively has 4 CPUs (one 4-core chip) and much more RAM for file buffering.  Software updates from Debian are free.  You just need someone to apply the security patches and version upgrades.
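The capacity figures follow from RAID 5 spending one drive’s worth of space on parity, and from each node holding a full replica of the other’s data. Here is a Python sketch of that arithmetic; the function and parameter names are illustrative.

```python
def cluster_capacity_tb(nodes: int, drives_per_node: int, drive_tb: float):
    """Return (raw_tb, usable_tb) for nodes that each use RAID 5 and hold a
    full replica of the same data (as with DRBD mirroring between nodes)."""
    if drives_per_node < 3:
        raise ValueError("RAID 5 needs at least three drives")
    raw = nodes * drives_per_node * drive_tb
    # RAID 5 gives up one drive's worth of space to parity on each node.
    per_node_usable = (drives_per_node - 1) * drive_tb
    # Every node stores the same data, so usable space is one node's worth.
    return raw, per_node_usable
```

cluster_capacity_tb(2, 8, 2) returns (32, 14), meaning 32 TB of raw disk yields 14 TB of usable, fully redundant storage.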

The best feature is complete redundancy for file processing.  In our file server example, any one of the nodes can completely fail and file server processing will continue.  Based on the lost labor cost estimates above, this system roughly pays for itself if it eliminates a single day of downtime in a five-year period, since one day of lost wages ($12,000) nearly equals the $12,700 hardware cost.  You also save three years of whatever you currently pay each year for hardware maintenance on your old system, because the new hardware comes with 3 years of warranty coverage.  You have the consultant’s charges for converting to the new system, but remember, you were going to have to pay that fee for a new Windows system as well.
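The payback claim can be checked directly against the figures above: $12,700 of hardware against $12,000 of lost wages per day of downtime. Here is a minimal Python sketch. The other_savings parameter is hypothetical and stands for offsets such as the old system’s maintenance fees, which the text does not quantify.

```python
def break_even_days(hardware_cost: float, downtime_cost_per_day: float,
                    other_savings: float = 0.0) -> float:
    """Days of avoided downtime needed to recover the hardware cost.

    other_savings: offsetting savings such as maintenance fees no longer
    paid on the old server (hypothetical; the text gives no figure).
    """
    net_cost = max(hardware_cost - other_savings, 0.0)
    return net_cost / downtime_cost_per_day
```

With no other savings, break_even_days(12_700, 12_000) is about 1.06 days. Any maintenance savings bring it down further.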

Conclusion

I hope I have stirred your interest in Linux Pacemaker based clusters.  I have shown a file server upgrade that pays for itself by reducing downtime.  You also upgrade your file server’s performance while reducing out of pocket expenses for software and hardware maintenance.  Not a bad deal.

Posted by Laird Hariu in Debian, Eternally Regenerative Software Administration, Systems Management, Virtualization