Tech Notes

An Infrastructure for Server Clusters for High Availability

As announced in our Cluster Services Built With FOSS post, LinuxForce’s Cluster Services are built exclusively with Free and Open Source Software (FOSS). Here is an expanded outline of the basic architecture of our approach to High-Availability (HA) clustering.

Overview

In any HA deployment there are two main components: hosts and guests. The hosts are the systems at the core of the cluster itself. Each host runs a very limited set of services dedicated to the functioning of the cluster. The hosts handle resource allocation, from persistent storage to RAM to the number of CPUs each guest gets. They also give an “outside” view of guest performance and make it possible to manipulate guests from outside the guest operating system. This offers significant advantages when there are boot or other failures which traditionally would require physical (or at least console) access to debug. The guests in this infrastructure are the virtual machines (VMs) which run the public-facing services.

On the hosts, we define a number of “resources” to manage the guest systems. Resources are defined for ping checking the hosts, for bringing up shared storage or storage replication (like DRBD) as primary on one machine or the other, and for launching the VMs.
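As a rough illustration of what such resource definitions might look like, the sketch below prints a configuration in crm shell syntax for one guest. It is not our production configuration: the resource IDs (p_ping, p_drbd_guest, p_vm_guest and so on), the DRBD resource name, the ping target and the path to the libvirt domain XML are all placeholder assumptions.

```shell
#!/bin/sh
# Sketch only: all resource IDs, names and addresses are placeholders.
# Prints crm shell configuration for one guest; nothing is applied.
# $1 = guest name (assumed to match the libvirt domain XML file name)
# $2 = DRBD resource name
# $3 = IP address to ping (for example the default gateway)
emit_guest_resources() {
    guest=$1
    drbd=$2
    ping_ip=$3
    cat <<EOF
primitive p_ping ocf:pacemaker:ping params host_list="${ping_ip}" multiplier="100" op monitor interval="15s"
clone cl_ping p_ping
primitive p_drbd_${drbd} ocf:linbit:drbd params drbd_resource="${drbd}" op monitor interval="29s" role="Master" op monitor interval="31s" role="Slave"
ms ms_drbd_${drbd} p_drbd_${drbd} meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
primitive p_vm_${drbd} ocf:heartbeat:VirtualDomain params config="/etc/libvirt/qemu/${guest}.xml" hypervisor="qemu:///system" op monitor interval="30s"
colocation col_vm_on_drbd inf: p_vm_${drbd} ms_drbd_${drbd}:Master
order ord_drbd_before_vm inf: ms_drbd_${drbd}:promote p_vm_${drbd}:start
location loc_vm_needs_ping p_vm_${drbd} rule -inf: not_defined pingd or pingd lte 0
EOF
}

# Example: print the configuration for review.
emit_guest_resources guest.example.com guest 192.0.2.1
```

The output could be saved to a file, reviewed, and then loaded with something like "crm configure load update /path/to/file". The colocation and order constraints express that the VM may only start where DRBD is primary, and the location rule keeps the VM off a host that cannot reach the ping target.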

In the simplest case, the cluster infrastructure is used for new server deployments, in which case the operating system installs are fresh and the services are new. More likely, a migration from an existing infrastructure will be necessary. Migrations from a variety of sources are possible, including from physical hardware, other virtualization technologies (like Xen) or different KVM infrastructures which may already use many of the same core features, like shared storage. When a migration is required, downtime can be kept to a minimum through several techniques.

Hardware configuration

The first consideration when you begin to build a cluster is the hardware. The basic requirement for a small cluster is 3 servers and a fast dedicated network backplane to connect the servers. The three servers can all be active as hosts, but we typically have a configuration where two machines are the hosts and a third, less powerful arbitrator system is available to make sure there is a way to break ties when there is resource confusion.

Two live resource hosts

These systems will be where the guests are run. They should be as similar as possible down to the selection of processor brand and amount of RAM and storage capabilities so that both machines are capable of fully taking over for the other in case of a failure, thus ensuring high availability.

The amount of resources required will be heavily dependent upon the services you’re running. When planning we recommend thinking about each guest as a physical machine and how many resources it needs, allowing room for inevitable expansion of services over time. You can over-commit both CPU and RAM on KVM, so you will want to read a best practices guide such as Chapter 6: Overcommitting with KVM. Disk space requirements and configuration will vary greatly depending upon your deployment, including the ability to use shared storage backplanes and replicated RAID arrays, but Linux Software RAID will typically be used for the core operating system install controlling each physical server. Additionally, using a thorough testing process so you know how your services will behave if they run out of resources is critical to any infrastructure change.
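When planning, a quick calculation of how much of a host's RAM the guests would commit can be useful. The following is a minimal sketch using made-up numbers (a 16384 MiB host and three guests); a result above 100 would mean RAM is over-committed.

```shell
#!/bin/sh
# Sketch: percentage of host RAM committed to guests (integer arithmetic).
# $1 = host RAM in MiB, remaining arguments = RAM per guest in MiB
ram_commit_percent() {
    host_mib=$1
    shift
    total=0
    for guest_mib in "$@"; do
        total=$(( total + guest_mib ))
    done
    echo $(( total * 100 / host_mib ))
}

# Hypothetical plan: 16384 MiB host with guests of 768, 2048 and 4096 MiB
ram_commit_percent 16384 768 2048 4096
```

Because either host must be able to run every guest after a failover, the calculation should include all guests in the cluster, not just those normally running on one host.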

Tie-breaker (arbitrator)

A third server is required to complete quorum for the cluster. In our configuration this machine doesn’t need to have high specs or a lot of storage space. We typically use at least RAID1 so we have file system redundancy for this host.
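The arithmetic behind the third server can be sketched as follows, assuming every node carries one equally weighted vote and quorum means a strict majority. With two nodes, losing either one loses quorum; with three, one node can fail.

```shell
#!/bin/sh
# Votes needed for a strict majority among $1 equally weighted nodes.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can fail while the rest still hold a majority.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 2 3; do
    echo "$n nodes: quorum=$(quorum_votes "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```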

1000M switch

A fast switch whose only job is to handle traffic between the three machines is highly recommended for assured speed of these two vital resources:

  1. Storage backplane
  2. Corosync/Pacemaker communication

It’s best to keep these off a shared network, which may be prone to congestion or failure, since fast speeds for both these resources are important for a properly functioning cluster.

Key software components

There are many options when it comes to selecting your HA stack, from which Linux distribution to use, to what storage replication system to use. We have selected the following:

Debian GNU/Linux

Like most LinuxForce solutions, we start with a base of Debian stable, currently Debian “Squeeze” 6.0. All of the software mentioned in this article comes from the standard Debian stable repository and is open source and completely free of charge.

Logical Volume Manager (LVM)

We use LVM extensively throughout our deployments for the flexibility of easy reallocation of filesystem resources. In a cluster infrastructure it is used to create separate disk images for each guest and then may be used again inside this disk image for partitioning.
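As a small sketch of the host-side half of this, the functions below print (rather than run) the LVM commands for creating and later growing one guest's disk image. The volume group name "VolumeGroup", the guest name and the sizes are example values only.

```shell
#!/bin/sh
# Sketch: print the host-side LVM commands for one guest's disk image.
# Commands are printed for review, not executed.

# $1 = host volume group, $2 = guest name, $3 = size in GiB
print_create_guest_volume() {
    echo "lvcreate -L ${3}G -n $2 $1"
}

# $1 = host volume group, $2 = guest name, $3 = GiB to add
print_grow_guest_volume() {
    echo "lvextend -L +${3}G /dev/$1/$2"
}

print_create_guest_volume VolumeGroup guest.example.com 12
print_grow_guest_volume VolumeGroup guest.example.com 4
```

Growing the host-side volume is only the first step; the partition and the LVM physical volume inside the disk image would also need to be grown before the guest can use the extra space.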

Distributed Replicated Block Device (DRBD)

DRBD is used for replicating storage between the two hosts which have their own storage. Storage needs could also be met by shared storage or other data replication mechanisms.

Kernel-based Virtual Machine (KVM)

Since hardware-based virtualization is now ubiquitous on modern server hardware, we use KVM for our virtualization technology. It allows fully virtualized VMs running their own unmodified kernels to run directly on the hardware without the overhead of a separate hypervisor layer or full CPU emulation.

Pacemaker & Corosync

Pacemaker and Corosync are used together to do the heavy lifting of the cluster management. The two services are deeply intertwined, but at the core Corosync handles messaging between the hosts, cluster membership (“aliveness” checks of the hosts) and quorum, while Pacemaker handles the configuration and monitoring of the resources themselves and the determination of where resources should go.

Conclusion

We have deployed this infrastructure for mission critical services including DNS, FTP and web server infrastructures serving everything from internal ticketing systems to high-traffic public-facing websites. For a specific example of implementation of this infrastructure, see Laird Hariu’s report File Servers – The Business Case for High Availability where he covers the benefits of HA for file servers.

Image in this post by Jeannie Moberly, licensed under CC BY-SA 3.0.

Posted by Elizabeth Krumbach in Development, Systems Management, Tech Notes, Virtualization

One way to migrate Xen virtual machines to KVM in Debian

There are dozens of virtualization products on the market. When we launched our first high-availability cluster in early 2008 we chose Xen due to its ability to run on hardware without virtualization support, its support in Debian 4.0 (Etch) and its general flexibility. We’ve learned a lot about the upstream of Xen, including the challenges that Debian maintainers face, and we were increasingly drawn to another free and open source virtualization technology, Kernel-based Virtual Machine (KVM). The primary downside to KVM is that it requires special CPU hardware support to run, but this hardware support is now almost ubiquitous on modern servers. KVM has the advantage of being supported upstream in the Linux kernel itself, removing the onus of difficult kernel patching from the Debian Developers, and it has become the supported virtualization option for Ubuntu, Fedora and Red Hat. Additionally, KVM allows the guests to run unaltered, meaning you don’t need a special kernel and can run many OSes, from Linux to FreeBSD to Windows 7.

We still work on a number of machines which lack hardware virtualization support, but as our customers upgrade hardware we’ve begun moving production virtual machines from Xen to KVM. In tackling the migrations of these production virtual machines we encountered several challenges, the major ones being:

  • In Xen, partitions were created in separate logical volumes on the host and mounted by Xen itself, so we didn’t require Logical Volume Manager (LVM) within our Xen guests. In KVM, the logical volume on the host for a virtual machine is a single disk image, not separate partitions.
  • In Xen, the kernel package is not installed in the guest; in KVM it is required.
  • In Xen, you don’t have an independent bootloader on the OS; in KVM you need one to boot.

The first step was to create a partition table on the new KVM image which is identical to the one on Xen. We wanted to use LVM within the guest, which required a Matryoshka (Russian) doll approach. First we’d create a volume group on the host to give us the typical flexibility of LVM host-side, and then we’d create one on the disk image giving us the flexibility required within the guest itself to expand any partitions. Finally we’d need to bootstrap the new system and copy the files over.

One way to go about all of this is a manual process. This solution would allow for scripting the procedure but requires a significant investment of time to get everything right so there is the least amount of downtime possible. Since we only have a half dozen of these VMs to migrate in total, we looked for an existing tool that handles all these steps in a familiar way, which is when we looked to the Debian Installer. Assuming a local mirror of core Debian packages (recommended), we could install a skeleton system on a test IP address which was properly partitioned, had LVM configured and was bootstrapped in less than 20 minutes. We could then take this skeleton system that we’re sure is a functioning, bootable system and move the files over from the Xen installs, arguably decreasing our risk with each migration and the downtime required.

To get started, you’ll first need to calculate the total size of the current VM and lvcreate a slice of that size on the KVM system, then launch the Debian installer against the image using virt-install, something like:

virt-install --connect qemu:///system -n guest.example.com -r 768 -f /dev/mapper/VolumeGroup-guest.example.com -s 12 -c /var/lib/libvirt/images/debian-506-i386-netinst.iso --vnc --noautoconsole --accelerate --os-type linux --os-variant debianLenny

In this example we assume the debian-506-i386-netinst.iso is in /var/lib/libvirt/images/ and we want 768M of RAM. The information you put here is similar to the information that you would have previously defined in your /etc/xen/guest.example.com.cfg file for Xen.
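The size calculation mentioned before the virt-install command could be sketched as below. The helper name total_gib is hypothetical; it reads logical volume sizes in MiB, one per line, and prints the total rounded up to whole GiB. The sample sizes are made up.

```shell
#!/bin/sh
# Sketch: sum LV sizes given in MiB (one per line on stdin) and print the
# total rounded up to whole GiB.
total_gib() {
    awk '{ sum += $1 }
         END {
             g = sum / 1024
             if (g > int(g)) g = int(g) + 1   # round up partial GiB
             print g
         }'
}

# On the Xen host the sizes could come from something like:
#   lvs --noheadings --nosuffix --units m -o lv_size VolumeGroup
# (filtered down to the volumes belonging to the one guest).
# Made-up sample input for four volumes:
printf '4096.00\n1024.00\n6144.00\n512.00\n' | total_gib
```

The result can then be used as the size for lvcreate on the KVM host, leaving whatever headroom you want for the separate /boot partition and future growth.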

Then use virt-manager to connect to the running install session (we actually connected from a remote desktop running virt-manager; the standard Debian installer does not provide serial access). You will need the root password to launch the installer. Proceed to install Debian.

When you get to the step to partition the disks, lay out the partition table to be identical to that of the VM you want to migrate, except put it on LVM and put /boot on a separate partition outside of LVM. Complete the install, including installing grub.

Confirm that the system boots and works on a test IP address, then copy its /etc/fstab to the host system; you’ll need this later.

You now have a skeleton install of Debian which runs on KVM with the LVM partitions you need.

To begin the actual data migration, you’ll want to mount the volumes within your new KVM disk on the host (shut down the skeleton guest first so the disk image is not in use). This can be done with the help of a great little mapping tool called kpartx. To map and activate the guest’s volume group, follow these steps:

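# Map the partitions inside the guest's disk image to devices in /dev/mapper/
kpartx -av /dev/VolumeGroup/guest.example.com
# Rescan so the host sees the volume group that lives inside the image
vgscan
# Activate the guest's volume group so its logical volumes can be mounted
vgchange -ay guest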

In this example we assume the logical volume on the host is called “guest.example.com” and the volume group within the guest is called “guest”.

Now that the host can see the volume group on the guest and all of its logical volumes in /dev/mapper/, you’ll want to mount them.
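A sketch of that mounting step follows. It prints the commands rather than running them, and it assumes a guest volume group called "guest" containing logical volumes named root, var and home, mounted under /mnt/guest; your layout will differ. Volumes must be listed parent-first so that / is mounted before /var.

```shell
#!/bin/sh
# Sketch: print mkdir/mount commands for the guest's logical volumes.
# $1 = guest volume group, $2 = target directory,
# remaining arguments = "lv:mountpoint" pairs, parent directories first.
print_mount_commands() {
    vg=$1
    target=$2
    shift 2
    for pair in "$@"; do
        lv=${pair%%:*}     # text before the first colon
        mp=${pair#*:}      # text after the first colon
        echo "mkdir -p $target$mp"
        echo "mount /dev/mapper/$vg-$lv $target$mp"
    done
}

# Hypothetical layout; LV names inside the guest are assumptions.
print_mount_commands guest /mnt/guest root:/ var:/var home:/home
```

The separate /boot partition is not part of the volume group; kpartx maps it as its own device under /dev/mapper/ (the exact name is shown in the kpartx -av output), and it would be mounted at /mnt/guest/boot in the same way.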

Once mounted, you’ll be able to start your rsync. To incur the least amount of downtime, you’ll want to rsync the large data partitions (like /srv, /home, /var, perhaps /usr) while your production Xen guest is still running. Note: All rsyncs completed during this process must be done with the “--numeric-ids” option so that file ownership is copied by numeric user and group ID rather than being remapped to the host’s user and group names!
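The rsync invocations might look something like the sketch below, which prints one command per data partition for review. The source hostname, the /mnt/guest mount root, and the choice of -aHx and --delete are assumptions; only --numeric-ids is a firm requirement of this procedure.

```shell
#!/bin/sh
# Sketch: print one rsync command per partition to copy.
# $1 = source host (the running Xen guest), $2 = local mount root,
# remaining arguments = absolute paths to copy.
print_rsync_commands() {
    src=$1
    dest=$2
    shift 2
    for path in "$@"; do
        # -a archive mode, -H keep hard links, -x stay on one filesystem
        echo "rsync -aHx --numeric-ids --delete root@$src:$path/ $dest$path/"
    done
}

print_rsync_commands xen-guest.example.com /mnt/guest /srv /home /var
```

Running the same commands a second time after the production guest is shut down transfers only what changed since the first pass, which is what keeps the downtime window short.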

While you’re rsyncing the data, you’ll want to go into your Xen system and install the following packages (installing these will not impact your Xen system; it doesn’t use them, so it will simply ignore them):

  • linux-image-2.6-686
  • grub
  • lvm2

When you’ve completed moving the largest portions of your Xen guest, bring down the production Xen guest (downtime starts now!) and mount its filesystems. Then begin rsyncing the data over (preferably over a crossover cable for the fastest transfer; remember to use --numeric-ids in your rsync).

Once the rsync is complete, edit the following files on the KVM host:

  • /mnt/guest/etc/fstab – use the version you saved to the host in a previous step
  • /mnt/guest/etc/inittab – uncomment the ttyS0 line to allow for serial access from the KVM host for virsh
  • /mnt/guest/etc/udev/rules.d/70-persistent-net.rules – comment out eth0 line so eth0 can come up on the new KVM MAC address
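The last two edits in the list above can be scripted with sed. This is a sketch that assumes the usual Debian formats: a commented-out serial getty line beginning with "#T0:" in inittab, and a persistent-net rule ending in NAME="eth0". Check your files before relying on the patterns.

```shell
#!/bin/sh
# Sketch: scripted versions of the inittab and udev rule edits.
# Both functions rewrite the file given as $1 via a temporary copy.

# Uncomment the serial console getty line, typically shipped as
# "#T0:23:respawn:/sbin/getty -L ttyS0 9600 vt100".
enable_serial_console() {
    sed 's/^#\(T0:.*ttyS0\)/\1/' "$1" > "$1.new" && mv "$1.new" "$1"
}

# Comment out the rule that pins eth0 to the old MAC address so udev
# can assign eth0 to the new KVM network interface.
unpin_eth0() {
    sed '/NAME="eth0"/ s/^[^#]/#&/' "$1" > "$1.new" && mv "$1.new" "$1"
}

# Usage on the mounted guest filesystem (not run here):
#   enable_serial_console /mnt/guest/etc/inittab
#   unpin_eth0 /mnt/guest/etc/udev/rules.d/70-persistent-net.rules
```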

Unmount the filesystems on both sides and on the KVM side disable the volume group and use kpartx again to unmap the filesystem from the host:

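# Deactivate the guest's volume group on the host
sudo vgchange -an guest
# Remove the partition mappings created earlier by kpartx
sudo kpartx -dv /dev/VolumeGroup/guest.example.com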

You’re now ready to boot the VM on the KVM side. This can be done with virt-manager or virsh.

Since you just moved the machines to a new server, and probably new MAC addresses, you will probably need to run the arping command to reclaim the IP address of the VM and all its service IP addresses.
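As a sketch, the function below prints one arping command per address for review. It assumes the iputils version of arping, whose -U option sends unsolicited ARP to update neighbours' caches, and that the commands are run inside the migrated guest where the addresses are configured. The interface name and addresses are examples.

```shell
#!/bin/sh
# Sketch: print arping commands to announce each IP from the new MAC.
# $1 = network interface, remaining arguments = IP addresses of the VM
print_arping_commands() {
    iface=$1
    shift
    for ip in "$@"; do
        echo "arping -U -c 3 -I $iface $ip"
    done
}

# Example addresses from the documentation range; substitute your own.
print_arping_commands eth0 192.0.2.10 192.0.2.11
```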

Some things to confirm are working:

  • Networking (confirm there are no lingering arp caching issues)
  • Email (where applicable, confirm system messages etc are being sent)
  • All services running (confirm key services, review monitoring dashboard, log in via ssh)
  • Confirm that you have contiguous logs

Now that we’ve completed one of these migrations we have a lot of ideas about how to improve the process, including the possibility of making the whole process more scriptable, but this quick method leveraging the Debian installer for easier disk configuration and bootstrapping worked very well.

Posted by Elizabeth Krumbach in Debian, Systems Management, Tech Notes, Virtualization

Some thoughts on best practices for SMTP blocking of e-mail spam

Blocking e-mail spam at the time of SMTP (Simple Mail Transfer Protocol) transfer has become a best practice. There is no point wasting precious bandwidth and disk space and spending time browsing a huge spambox when most of the incoming flow is clearly spam. At LinuxForce our e-mail hygiene service, LinuxForceMail, makes extensive use of SMTP blocking techniques (using free and open source software such as Exim, Clam AV, SpamAssassin and Policyd-weight). But we are extremely careful to only block sites and e-mails that are so “spammy” that we are justified in blocking them. That doesn’t prevent false positives, but it keeps them to a minimum.

Recently we investigated an incident where one of our users had their e-mail blocked by another company’s anti-spam system. In investigating the problem, we learned that some vendors support an option to block e-mail when an IP address in one of its Received headers is on a blacklist (in our case it was Barracuda, but other vendors are also guilty). Let me be blunt: this is boneheaded, but the reason is subtle so I can understand how the mistake might be made.

First, blocking senders appearing on a blacklist at SMTP time is good practice. But to understand why blocking Received headers at SMTP time is bad, it is important to understand how e-mail transport works. The sending system opens a TCP/IP connection from a particular IP address. That IP address should be checked against blacklists. And other tests on the envelope can help identify spam. But the message headers including the Received header are not so definite. We shall see that even a blacklisted IP in these headers may be legitimate. So blocking such e-mail incurs unnecessary risks.
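To make the distinction concrete, here is a sketch of how a DNS-based blacklist lookup is conventionally formed: the octets of the IPv4 address are reversed and prepended to the blacklist's zone. The zone dnsbl.example.org is a placeholder, not a real blacklist. The address that belongs in this lookup at SMTP time is the one the TCP connection actually came from, not addresses parsed out of Received headers.

```shell
#!/bin/sh
# Sketch: build the DNS name used to look up an IPv4 address in a DNSBL.
# $1 = IPv4 address of the connecting SMTP client, $2 = blacklist zone
# Prints nothing if $1 does not have four dot-separated parts.
dnsbl_query_name() {
    echo "$1" | awk -F. -v zone="$2" \
        'NF == 4 { print $4 "." $3 "." $2 "." $1 "." zone }'
}

# 192.0.2.99 is a documentation address; the zone is hypothetical.
dnsbl_query_name 192.0.2.99 dnsbl.example.org
```

A mail server would resolve the resulting name (for example with the host command): if it resolves, the address is listed; if the name does not exist, it is not.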

The problem occurs when a user of an ISP (Internet Service Provider) sends an e-mail from home. They are typically using a transient, “dynamic” IP address. Indeed it is possible that their IP address has just changed. Since the new address may have been previously used by someone infected with a virus sending out spam, this “new” IP address may be on the blacklists. So, through no fault of your own, you have a blacklisted IP address (I will suppress my urge to rant for IPv6 when everyone can finally have their own IP address and be responsible for its security).

Now, when you send an e-mail through your ISP’s mail server, it records your (blacklisted) IP as the first Received header. So your (presumably secure) system sending a legitimate message through your ISP’s legitimate, authenticating mail server is blacklisted by your recipients’ overambitious anti-spam system. Ouch. That is why blocking such an e-mail is just wrong. This kind of blocking creates annoying, unnecessary complications for the users and admins on both sides. Using e-mail filtering to put such e-mails into a spam folder would be a reasonable way to handle the situation. Filtering is able to handle false positives whereas blocking generates unrecoverable errors.

Do not block e-mail based on the Received header!

Posted by CJ Fearnley in Security, Systems Management, Tech Notes, Ubuntu

Crossroads in FOSS Projects: Some Business Considerations

At our Seminar last month, Managing FOSS to Lower Costs and Achieve Business Results, several participants asked about the dynamics of FOSS (Free and Open Source Software) projects that reach a crossroads (a failure, a merger, loss of key personnel, etc). I had not expected that concern because with commercial software, it seems to me, the problem is more severe. When you have the source code and the right to modify and redistribute it, the source gives many more options (and its freedoms provide many more protections) than when a commercial software vendor goes bankrupt or gets bought by a competitor, for instance.

But the reason for the questions may be a lack of understanding about how FOSS projects work. They involve individual human beings, perhaps just a single person or, more likely, several people from many organizations and even different cultures around the world joined in common purpose. For various cultural reasons, the project may be “owned” by an entity (usually a non-profit, but some are for-profit or even government owned), while others may simply be an “ad hoc initiative”. Some projects have explicit constitutions and defined processes for organizing the work and handling problems; others are more informal.

At any time, any human social structure can experience a crossroads that could lead it to fail suddenly or wither on the vine in a gradual descent into “oblivion”. The cause of the failure will shape the results, but a very common situation is that conflicting visions or approaches for the project result in a “fork”. Then a sub-group of the original project takes the source code and starts a “new” project to develop the code in a new direction. Sometimes the original project “dies” and sometimes both continue resulting in two projects. Since multiple FOSS projects serving the same function or market incur inefficiencies due to duplicate development, there is a strong cultural value in the FOSS world to try to find a way to accommodate everyone in the project and prevent forks. When it works, the result is great software that meets everyone’s needs. But the reality is that often it is more effective to have multiple implementations of the same functionality so that each can be optimized for distinct objectives. Frequently one cannot know which approach will be best until many years of development and evaluation have transpired.

I recently learned about a FOSS project that forked when a friend asked me to copy some files to his new “My Book Essential”, a Western Digital product that provides 1TB of USB (Universal Serial Bus) storage. The My Book uses the poorly documented, non-free NTFS (New Technology File System). Linux has three projects that support NTFS: an in-kernel driver, ntfsprogs (the Linux-NTFS project), and NTFS-3G. It turns out that all three were available for my Debian Lenny (5.0.3) system. First, I tried the in-kernel support and learned that it was still read-only. Then I tried ntfsprogs, which failed to mount the My Book:

NTFS-fs error (device sdc1): load_system_files(): Volume is dirty. Mounting read-only. Run chkdsk and mount in Windows.

I realized that since it was a new device it probably did not ship from the factory with a dirty volume. It was probably a bug. So I tried NTFS-3G which worked very well. In my research of the situation I was able to determine that both NTFS-3G and Linux-NTFS are under active development and have features missing from the other. So each has value and I’m glad my distribution included both. In Debian Lenny, the NTFS-3G driver has better support for writing files.

This illustrates one of the benefits of a crossroads in a FOSS project: you can end up with two good tools to add to your toolbox!

Posted by CJ Fearnley in Debian, FOSS Community, Tech Notes

Xen Virtualization: Migrating 32-bit domUs to 64-bit dom0s

Virtualization provides the facility to run multiple isolated computer operating systems on one piece of computing hardware. There has been a huge increase of interest in virtualization technology because recent advances in multi-core technology provide significantly more computing power in each machine with ever decreasing costs. Virtualization is one of the best ways to take advantage of these big changes in hardware.

Currently, Xen is the most mature FOSS (Free and Open Source Software) virtualization technology. Although we love the idea of KVM, it requires a special processor extension on x86 systems, so it cannot work on older hardware. So for at least another few years, we think Xen is the more flexible choice for FOSS virtualization projects.

The Xen infrastructure consists of the Xen hypervisor which “runs the show”, a domain 0 (dom0) which runs a special, privileged version of the operating system (typically Linux, but NetBSD and Solaris are also supported), and one or more domain U (domU) “guest” (or “User”) operating systems. We have found that Xen is easy to configure in many situations, but we encountered some complications in running a domU on a dom0 with a different architecture.

We recently migrated some 32-bit domUs running Debian Etch (4.0) from a 32-bit dom0 to a newer 64-bit dom0 running Debian Lenny (5.0). We did a direct move (using rsync) of the Logical Volume Manager (LVM) slices from the 32-bit dom0 to the 64-bit dom0. This means we’d now be running our 32-bit Etch domUs on a 64-bit Lenny dom0.

The first question was whether this would be possible. Absolutely! 32-bit domUs have no trouble running on 64-bit dom0s. We could even use the 64-bit Xen kernel in these 32-bit systems to avoid additional kernel installations we’d need to maintain on the dom0. The second question was whether we could properly load the 64-bit kernel modules inside our domU. Again, yes! But with a caveat: the domUs were 32-bit Etch, so the 64-bit Lenny kernel modules were not simply installable via apt. We realized that manually copying over the .deb package for the kernel modules and running dpkg -i --force-architecture linux-modules-2.6.26...deb would not be a maintainable way to handle the kernel module updates moving forward. So we weighed our options:

1. Serve these modules via a network file system (such as NFS) to each domU on bootup.

2. Deploy a script that would notify the domU and copy the new kernel modules .deb to it for installation. We could then install the new module package at our discretion.

We decided that the first option violated our strict security policy which calls for running as few services on the dom0 as possible. Since the second solution is scriptable and therefore automatable, it fit our vision of having easily maintainable systems regardless of the underlying complexity. So we installed the 64-bit modules prior to migration so that all the proper modules would be loaded as soon as we brought up the domU on the new dom0. The result was a flawless migration of our 32-bit domUs to the new 64-bit dom0.
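One piece of such a script could be a check, run inside the domU, for whether modules matching a given kernel release are installed. The sketch below is illustrative rather than the script we deployed; the function name and the optional root-directory argument are assumptions.

```shell
#!/bin/sh
# Sketch: succeed if kernel modules for release $1 exist.
# $2 = optional alternative root directory (defaults to the live system),
#      useful for checking a mounted domU filesystem from outside.
modules_present() {
    [ -d "${2:-}/lib/modules/$1" ]
}

# Inside a domU: compare the running kernel against the installed modules.
if modules_present "$(uname -r)"; then
    echo "modules for $(uname -r) are installed"
else
    echo "modules for $(uname -r) are missing; install the matching modules .deb"
fi
```

Running the same check before a migration, with the release of the kernel the new dom0 will boot the domU with, confirms that the modules are in place before the domU is brought up.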

Posted by Elizabeth Krumbach in Tech Notes, Virtualization