Archive for the ‘Systems Management’ Category

Cluster Services Built With FOSS

Thursday, January 12th, 2012

By Elizabeth Krumbach

Built on the Free/Open Source Software (FOSS) model for cluster deployments, LinuxForce staff has been hard at work over the past months developing and deploying LinuxForce Cluster Services built upon exclusively FOSS technologies and on December 15th we put out a press release:

Announcing LinuxForce Cluster Services

In September Laird Hariu wrote the article “File Servers – The Business Case for High Availability” where, in addition to building a case to use clusters, he also briefly outlined how Debian and other FOSS could be used to create a cluster for a file server. File servers are just the beginning, we have deployed clusters which host web, mail, DNS and more.

The core of this infrastructure uses Debian 6.0 (Squeeze) 64-bit and then depending upon the needs and budget of the customer, and whether they have a need for high availability, we use tools including Pacemaker, Corosync, rsync, drbd and KVM. Management of this infrastructure is handled remotely through the virtualization API libvirt using the virsh and Virtual Machine Manager.

The ability to use such high-quality tools directly from the repositories in the stable Debian distribution keeps our maintenance costs down, avoids vendor lock-in and gives companies like ours the ability offer these enterprise-level clustering solutions to small and medium size businesses for reasonable prices.

File Servers – The Business Case for High Availability

Tuesday, September 13th, 2011

By Laird Hariu

Introduction

You have probably heard of high availability transaction processing servers.  You have most likely read about the sophisticated systems used by the airlines to sell tickets online.  They have to be non-stop because downtime translates to lost orders and revenue.  In this article I will discuss the economics of using non-stop technologies for everyday applications.  I will show that even ordinary file sharing applications can benefit from inexpensive Linux based Pacemaker clustering technology.

Availability Goal

What is our availability goal?  Our goal should be to take prudent and cost effective measures to reduce computer downtime to nil in the required service window.  I’m not talking about 99.999 % (five 9s) up time.  This is the popular (and very expensive) claim made by high availability vendors.  I’m talking about maintaining enough up time to service the application.  Take a simple example, for office document preparation the service time window is office hours (9-5).  The rest of the time the desktop PCs can be turned off, nobody is there to operate them anyway.  You only need the PCs for 5 days a week for 8 hours a day or for 2080 hours per desktop PC per year.  This translates into an up time requirement of 24 percent.   Ideally you want the desktop PCs to be available all the time during office hours but are willing to give up availability for routine maintenance and for the infrequent breakdowns that may occur only once per workstation every five years or so.  Perhaps you have two spare desktop PC workstations for every 100.  This extra capacity allows your office workers to resume their work on a spare while their workstation is being repaired.  In this example the cost of maintaining adequate availability is the cost of maintaining two spare desktop PCs.  You might adjust this cost to account for real world conditions at the work site.  Wide swings in operating temperature or poor quality electricity supply, might dictate that you increase the number of spare PCs.  Sounds like a low stress, straightforward availability solution.

Network Effects

The problem gets more complicated when the desktop PCs are networked together and all the documents are stored on a central file server rather than on each workstation’s hard drive.  There is a multiplicative effect.  If the file server is not available then all 100 document processing PCs are rendered unavailable.  Then you have 98 (remember the 2 spares from above) workers being paid but not producing documents.  A failure during office hours can become expensive.  One hour of downtime can cost as much as $1500 in lost worker wages.  A day of downtime can cost $12,000 of lost worker wages.  How long will it take for a hardware repair person to travel to your site?  How long will it take for spare parts to arrive?  How long will it take the repair person to replace the parts?  How long will it take for damaged files to be replaced from backup by your own people?  A serious but not unlikely failure can take several days to be completely resolved.  Its not unreasonable to assume that such a $24,000 failure can occur once every 5 years.  This is a very simple example.  We are not talking about a complicated order-entry or inventory control system.  We are talking about 98 office workers saving files to a central file share so that they can be indexed and backed up.

The Effects of Time

I’m going to add another wrinkle to our office document processing example.  This file sharing setup has been in use for 4 years.  Time flies.  The hardware is getting old faster than you realize.  Old hardware is more likely to fail.  It has been through more thunderstorms, more A/C breakdowns, people knocking the server by accident and all that.  You’ve been noticing that your hardware maintenance plan is costing more every year.  How long is the hardware vendor going to stock spare parts for your obsolete office equipment?  Please forgive me for playing on your paranoia but the real world can be rude.

Time for an Upgrade

In this scenario you conclude that you are going to have to replace that file server soon.  Its going to be a pain to migrate all the files to a new unit.  I am going to have to upgrade to a new version of Windows server.  How much is that going to cost?  How much has Windows changed?  If I am going to have to go to all this trouble, why not get some new improvement out of it.  I know I can get bigger disks and more RAM (random access memory) for less money than I paid for the old server.  Whoops.  Windows is going to cost more.  I have to pay a charge for every workstation attached to it.  That CAL (Client Access License) price has gone up.  I read something about high availability clustering in Windows.  Enterprise Server does that.  Wow.  Look at the price of that!  Remember that $12,000 per day of downtime cost overhang?  It’s more of an issue now that you are dealing with an old system.

A Debian Cluster Solution

Enough of this already.  Since I asked so many questions and raised so many doubts, I owe you, the reader, some answers.  Debian Linux provides a very nice high availability solution for file servers.  You need two servers with directly attached storage and also a third little server that can be little more than a glorified workstation.  You need Debian Squeeze 64 bit edition that has the Pacemaker, Corosync, drbd and Samba packages installed for each server.  The software is free.  You pay for the hardware and a trustworthy Linux consultant who can set everything up for you.  What you get is a fully redundant quorum cluster with fully redundant storage, multiple CPU cores on each node, much more RAM than you had before and much more storage capacity.

Here are hardware price estimates:

Tie breaker node: Two hard drives, 512MB RAM $500.00
Name brand file server node: 8 2TB SATA drives, 24GB RAM, 1 4 core CPU chip,  3 year on site parts and labor warranty. $6,000.00
Second file server node like above. $6,000.00
Misc parts for storage and control networks. $200.00
Total: $12,700.00

Each file server node has software RAID 5 and each node holds 14 terabytes of disk storage.  Because it is completely redundant across nodes, total cluster storage capacity is 14 terabytes.  Performance of this unit will be much better than the old unit.  It effectively has 4 CPUs per file storage node and much more RAM for file buffering.  Software updates from Debian are free.  You just need someone to apply the security patches and version upgrades.

The best feature is complete redundancy for file processing.  In our file server example, any one of the nodes can completely fail and file server processing will continue.  Based on the lost labor time cost estimates above, this system pays for itself if it eliminates 1 day of downtime in a five year period.  You also have hardware maintenance savings of whatever the yearly charge is for your old system times 3 years because you get 3 years of warranty coverage on the new hardware.  You have the consultant’s charges for converting to the new system, but remember, you were going to have to pay that fee for a new Windows system as well.

Conclusion

I hope I have stirred your interest in Linux Pacemaker based clusters.  I have shown a file server upgrade that pays for itself by reducing downtime.  You also upgrade your file server’s performance while reducing out of pocket expenses for software and hardware maintenance.  Not a bad deal.

One way to migrate Xen virtual machines to KVM in Debian

Thursday, November 18th, 2010

By Elizabeth Krumbach

There are dozens of virtualization products on the market. When we launched our first high-availability cluster in early 2008 we chose Xen due to it’s ability to run on non-virtualized hardware, support in Debian 4.0 (Etch) and general flexibility. We’ve learned a lot about the upstream of Xen, including the challenges that Debian maintainers face, and we were increasingly drawn to another free and open source virtualization technology, Kernel Based Virtual Machine (KVM). The primary downside to KVM is that it requires special CPU hardware support to run, but this hardware support is now almost ubiquitous on modern servers. KVM has the advantage of being supported upstream in the Linux kernel itself, removing the onus of difficult kernel patching from the Debian Developers and has become the supported virtualization option for Ubuntu, Fedora and Red Hat. Additionally, KVM allows the guests to run unaltered, meaning you don’t need a special kernel and can run many OSes, from Linux to FreeBSD to Windows 7.

We still work on a number of machines which lack hardware virtualization support, but as our customers upgrade hardware we’ve begun moving production virtual machines from Xen to KVM. In tackling the migrations of these production virtual machines we encountered several challenges, the major ones being:

  • In Xen, partitions were created in separate logical volumes on the host and mounted by Xen itself and as a result we didn’t require Logical Volume Manager (LVM) within our Xen guests, in KVM the logical volume on the host for a virtual machine is a single disk image, not separate partitions.
  • In Xen, the kernel package is not installed, in KVM it is required
  • In Xen, you don’t have an independent bootloader on the OS, in KVM you need one to boot

The first step was to create a partition table on the new KVM image which is identical to the one on Xen. We wanted to use LVM within the guest, which required a Matryoshka (Russian) doll approach. First we’d create a volume group on the host to give us the typical flexibility of LVM host-side, and then we’d create one on the disk image giving us the flexibility required within the guest itself to expand any partitions. Finally we’d need to bootstrap the new system and copy the files over.

One way to go about all of this is a manual process. This solution would allow for scripting the procedure but requires a significant investment of time to get everything right so there is the least amount of down time possible. Since we only have a half dozen of these VMs to migrate in total, we looked for some way which already existed for handling all these steps in a familiar way, which is when we looked to the Debian Installer. Assuming a local mirror of core Debian packages (recommended), we could install a skeleton system on a test IP address which was properly partitioned, had LVM configured and bootstrapped in less than 20 minutes. We could then take this skeleton system that we’re sure is an functioning, bootable system and move the files over from the Xen installs, arguably decreasing our risk with each migration and the downtime required.

To get started, you’ll first need to calculate the total size of the current VM and lvcreate a slice of that size on the KVM system, then launch the Debian installer against the image using virt-install, something like:

virt-install --connect qemu:///system -n guest.example.com -r 768 -f /dev/mapper/VolumeGroup-guest.example.com -s 12 -c /var/lib/libvirt/images/debian-506-i386-netinst.iso --vnc --noautoconsole --accelerate --os-type linux --os-variant debianLenny

In this example we assume the debian-506-i386-netinst.iso is in /var/lib/libvirt/images/ and we want 768M of RAM, the information you put here is similar to the information that you would have previously defined in your /etc/xen/guest.example.com.cfg file for Xen.

Then use virt-manager to connect (we actually connected from a remote desktop running virt-manager) to the running install session (the standard Debian installer does not provide serial access) and install Debian, you will need the root password to launch the installer. Proceed to install.

When you get to the step to partition the disks, lay out the partition table to be identical to the VM you want to migrate to it except put it on LVM, put /boot on a separate partition outside of LVM. Complete the install, including installing grub.

Confirm that the system will boot and works on a test IP address and make a copy of your /etc/fstab to the host system, you’ll need this later.

You now have a skeleton install of Debian which runs on KVM with the LVM partitions you need.

To begin the actual data migration, you’ll want to mount the volumes within your new KVM disk, this can be done with the help of a great little mapping tool called kpartx. To map and activate the volume group on the guest, following these steps:

kpartx -av /dev/VolumeGroup/guest.example.com
vgscan
vgchange -ay guest

In this example we assume the Linux Volume on the host is called “guest.example.com” and the Linux Volume within the guest is called “guest”.

Now that the host can see the Volume Group on the guest, and all the Volumes in /dev/mapper/ you’ll want to mount them.

Once mounted, you’ll be able to start your rsync. To incur the least amount of downtime, you’ll want to rsync the large data partitions (like /srv, /home, /var, perhaps /usr) while your production host is still running. Note: All rsyncs completed during this process must be done with the “–numeric-ids” option so the permissions are not inherited from the host!

While you’re rsyncing the data, you’ll want to go into your Xen system and install the following packages (installing these will no impact your Xen system, it doesn’t strictly use them so it will simply ignore them):

  • linux-image-2.6-686
  • grub
  • lvm2

When you’ve completed moving the largest portions of your Xen guest, bring down the production Xen guest (downtime starts now!) and mount the filesystems. And begin rsyncing the data over (preferably over a crossover cable for the fastest transfer, remember to use –numeric-ids in your rsync).

Once the rsync is complete, edit the following files on the KVM host:

  • /mnt/guest/etc/fstab – use the version you saved to the host in a previous step
  • /mnt/guest/etc/inittab – uncomment the ttyS0 line to allow for serial access from the KVM host for virsh
  • /mnt/guest/etc/udev/rules.d/70-persistent-net.rules – comment out eth0 line so eth0 can come up on the new KVM MAC address

Unmount the filesystems on both sides and on the KVM side disable the volume group and use kpartx again to unmap the filesystem from the host:

sudo vgchange -an guest
sudo kpartx -dv /dev/VolumeGroup/guest.example.com

You’re now ready to boot the VM on the KVM side. This can be done with virt-manager or virsh.

Since you just moved the machines to a new server, and probably new MAC addresses, you will probably need to run the arping command to reclaim the IP address of the VM and all its service IP addresses.

Some things to confirm are working:

  • Networking (confirm there are no lingering arp caching issues)
  • Email (where applicable, confirm system messages etc are being sent)
  • All services running (confirm key services, review monitoring dashboard, log in via ssh)
  • Confirm that you have contiguous logs

Now that we’ve completed one of these migrations we have a lot of ideas about how to improve the process, including the possibility of making the whole process more scriptable, but this quick method leveraging the Debian installer for easier disk configuration and bootstrapping worked very well.

Licensing Considerations When Integrating FOSS and Proprietary Software

Friday, October 29th, 2010

By CJ Fearnley

Recently, I was looking for resources to help explain the implications of integrating FOSS (Free and Open Source Software) with proprietary software. This question is important for any organization who might want to build an “embedded” or “dedicated” system or product which might include either their own proprietary software or third party applications. I discovered that most of the information available about FOSS licensing addresses this issue rather obliquely. This post will cover the basics so that such organizations can see how straightforward it would be to include FOSS in their projects. Standard disclaimers about my not being a lawyer apply.

Use of any software system requires understanding the software licensing involved. There are many dozens of FOSS licenses that could apply in your situation, so understanding the details is necessary to assess compatibility. In broad terms, these can be effectively summarized by the Debian Free Software Guidelines and The Open Source Definition. Any software that complies with those specifications will freely (in both the “freedom” and money senses of the word) permit running proprietary and FOSS applications on the same system. This is easier said than done. Since Debian has analyzed most common FOSS and quasi-FOSS licenses, their archive and their license page can be used to assess how the license might work in your situation (for example, anything in “main” would be OK for co-distribution and running in mixed environments). So we always start with Debian’s meticulous and well-documented analysis to assess any license.

From a high-level perspective, the requirements that FOSS licensing will impose on an organization wanting to include it with proprietary software will primarily be in the form of providing appropriate attribution (acknowledgement) and providing the source code for any FOSS software (including any modifications) shipped as part of an integrated system. Typically the attribution requirement can be satisfied by simply referencing the source code of the FOSS components. The source code requirement can be met by providing the source code for all of the FOSS distributed as part of the integrated system, for example, by placing it on a documented ftp site.

There are some subtle issues that may arise during the software development process if proprietary and FOSS code are “linked” together. In such cases it is necessary to ensure that code over which one wants to assert proprietary control is only combined with FOSS code that explicitly supports such commingling. So it is necessary for a company wanting to keep its code proprietary to keep it “separate” from the FOSS code used in the integrated system. The use of the words “linked” and “separate” is intentionally vague because the terms of each FOSS license will need to be examined by legal counsel to understand the precise requirements. In these situations, there are license compliance management systems that can be adopted to help ensure that this separation is maintained.

Those are the basic issues. The references below describe these and related issues involved with Linux & FOSS licensing in much more depth:

Please Document the Shop: On the importance of good systems documentation

Friday, May 21st, 2010

By Laird Hariu

We have all heard this: You need to document the computer infrastructure. You never know when you might be “hit by a bus”. We hear this and think many frightening things, reassure ourselves that it will never happen and then put the request on the back burner. In this article I will expand on the phrase “hit by a bus” and then look at the consequences.

Things do happen to prevent people from coming into work. The boss calls home. Talks to the wife and makes the sad discovery that Mike wont be coming in anymore. He passed away last night in bed. People get sudden illnesses that disable them. Car accidents happen.

More often than these tragedies occur, thank goodness, business conditions change without warning. In reorganizations whole departments disappear, computer rooms are consolidated and moved, companies are bought and whole workforces replaced. I have had the unhappy experience of living through some of this.

Some organizations have highly transient workforces because of the environment that they operate in. Companies located near universities benefit from an influx of eager young, upwardly mobile university graduates. These workers are eager to gain experience but soon find higher paying jobs in the “real world” further away from campus. These companies have real turnover problems. People are moving up so quickly, they don’t have time to write things down.

Even when you keep people in place and maintain a fairly stable environment, people discover that what they have documented in their heads can just fade away. This is getting to be more and more of an issue. Networks and servers and other such infrastructure functions have been around for 20 years in many organizations. Fred the maintainer retired five years ago. Fred the maintainer was transferred to sales. The longer systems are around, the more things can happen to Fred. Fred might be right where he was 20 years ago. He just can’t remember what he did.

What does all this mean? What are the consequences of losing organizational knowledge in a computer organization? To be blunt, it creates a hideous environment for your computer people. The system is a black box to them. They are paralyzed. They are rightfully afraid. Every small move they make can bring down the system in ways they cannot predict. Newcomers take much longer to train. Old-timers learn to survive by looking busy while doing nothing. The politics of the shop and the whole company is made bloody by the various interpretations of the folklore of the black box. He/she who waves their arms hardest rules the day. This is no way for your people to live.

This is no way for the computer infrastructure to live as well. While the games are played the infrastructure evolves more slowly and slowly. Before long the infrastructure is frozen. Nobody dares to touch it. The only way to fix it is to completely replace it at considerable expense. In elaborate infrastructures this is easier said than done. The productive lifetime of the platform is shortened. It was not allowed to grow and evolve to lengthen its lifetime. Think of the Hubble Telescope without all the repairs and enhancements over the years. It would have burned out in re-entry long ago.

Having made my case, I ask again; for your own good, please document the shop. Make these documents public and make them accurate. Record what actually is rather than what you wish it to be. It is better to be a little embarrassed for a short while than to be mislead later on. Update the documentation when changes occur. An out of date document can be as bad as no document at all. Make an effort to record facts. At the same time don’t leave out general philosophies that guided the design and other qualitative information because it helps your successors interpret the facts when ambiguities occur.

Think of what you leave behind. Persuade your boss to make this a priority as well. Hopefully the people at your next workplace will do the same.

The Nature and Importance of Source Code and Learning Programming with Python

Thursday, February 25th, 2010

By CJ Fearnley

Last year a client asked us for advice on getting started with programming. So I thought I’d share some thoughts about programming, its relationship with FOSS (Free and Open Source Software) management and why Python is a good language for learning programming including some great on-line resources. But first I want to make sure our business-oriented readers understand the nature and importance of source code.

The “source” aka “the code” provides a language in which computer users can create or change software. One does not have to be a programmer to work on the code. In fact, every computer user is, ipso facto, a programmer! Menus, web interfaces, and graphical user interfaces (GUIs) are some of the more facile “languages” for computer programming that everyone, even children, can readily learn and use. Of course, building complex software systems requires a more expressive specification language than a web form, for instance, can provide.

Although all computer software is specified with source code, FOSS systems are unique in that the source code is made available with the software. In contradistinction, software lock-in or vendor lock-in describes the unfortunately all too common practice of many organizations to block access to their source code.

Having access to the source code provides huge operational benefits. For one, the source can be used to understand how the software works: it is a form of software documentation (indeed, it is the most definitive form of software documentation possible!). Also, code can be easily changed to add diagnostics or to test a possible solution to a problem or to modify or add functionality. In addition, the source is a language both for specifying features to the computer and for discussing computing with others. So most mature FOSS languages have vibrant support communities in which one can participate, learn and get help.

The source is a tool: a powerful, multi-purpose, critically important tool.

Since LinuxForce focuses on FOSS, we are able to take full advantage of the availability of the code. We are always working with the source! Since most of our work is systems administration, we usually “program” configuration files. However, we also write systems software and scripts and we support software developers extensively, so we have a persistent, deep, and productive relationship with code.

But what to suggest to someone like our customer who wants to learn programming?

I remembered seeing a blurb in Linux Journal referencing an article they published in May 2000 by Eric Raymond entitled "Why Python" which argues persuasively for the virtues of the programming language Python. I had often felt that Perl‘s idiosyncrasies made it difficult to use, so Eric’s critique of Perl and accolades for Python were convincing to me. In addition, I follow FOSS mathematics software and I was aware that Sage is a Python “glue” to more than fifty FOSS math libraries. I’ve been meaning to look into Python so I could use Sage. Another pull comes from my work at LinuxForce where we use a lot of Python-based software including mailman, fail2ban, Plone, and several tools used for virtual machine management such as kvm, virtinst and xen-tools. Python has a huge software repository and community. So one is likely to find good libraries to build upon (thus avoiding the extra learning curve of building everything from scratch). Python is an interpreted language which makes it easier to debug and use so the learning process is smoother.

To finish the recommendation, I just needed to find some on-line resources. First, Kirby Urner suggested these two: Wikieducator’s Python Tutorials and "Mathematics for the Digital Age and Programming in Python". Then, I checked out the Massachusetts Institute of Technology’s (MIT) OpenCourseWare which provides extensive course materials for many of their classes (I’ve already watched the full video set for a couple of MIT’s courses including the legendary Walter Lewin’s "Classical Mechanics" and have been very impressed by the quality and content of their materials). After nearly 30 years of introducing students to programming with Scheme, MIT switched to Python in 2008! The materials for their introductory Python-based course "6.00 Introduction to Computer Science and Programming" are very thorough, accessible and helpful. Their free on-line materials include the full video lectures of the class plus assignments, sample test problems, class handouts, and an excellent Readings section with references to "the Python Tutorial" and a very good free on-line textbook "How to Think Like a Computer Scientist: Learning with Python".

In conclusion, if you or anyone you know wants to learn how to program computers, I recommend starting with Python using MIT’s on-line course materials supplemented with the other on-line resources mentioned above (and summarized in the table below). I’ve now watched more than half of the videos from the MIT 6.00 course and I’ve worked through several of their assignments: this is a great course! Even with nearly three decades experience programming including a couple of college-level courses in the 1980s, I’m finding the class is more than just good review for me: I’ve learned a few new things (in particular, dynamic programming and the knapsack problem). Python’s clean syntax and elegant design will help as one delves into writing code for the first time. Its extensive libraries and repositories will support the application of one’s newly acquired computing skills to solve problems in the area of the student’s special interests whatever they may be … and that’s the way we learn best: by doing something that we personally care about!

Summary of On-Line Resources for Learning Python

Some thoughts on best practices for SMTP blocking of e-mail spam

Thursday, January 21st, 2010

By CJ Fearnley

Blocking e-mail spam at the time of SMTP (Simple Mail Transfer Protocol) transfer has become a best practice. There is no point wasting precious bandwidth & disk space and spending time browsing a huge spambox when most of the incoming flow is clearly spam. At LinuxForce our e-mail hygiene service, LinuxForceMail, makes extensive use of SMTP blocking techniques (using free and open source software such as Exim, Clam AV, SpamAssassin and Policyd-weight). But we are extremely careful to only block sites and e-mails that are so “spammy” that we are justified in blocking it. That doesn’t prevent false positives, but it keeps them to a minimum.

Recently we investigated an incident where one of our users had their e-mail blocked by another company’s anti-spam system. In investigating the problem, we learned that some vendors support an option to block e-mail whose Received header is on a blacklist (in our case it was Barracuda, but other vendors are also guilty). Let me be blunt: this is boneheaded, but the reason is subtle so I can understand how the mistake might be made.

First, blocking senders appearing on a blacklist at SMTP time is good practice. But to understand why blocking Received headers at SMTP time is bad, it is important to understand how e-mail transport works. The sending system opens a TCP/IP connection from a particular IP address. That IP address should be checked against blacklists. And other tests on the envelope can help identify spam. But the message headers including the Received header are not so definite. We shall see that even a blacklisted IP in these headers may be legitimate. So blocking such e-mail incurs unnecessary risks.

The problem occurs when a user of an ISP (Internet Service Provider) sends an e-mail from home, they are typically using a transient, “dynamic” IP address. Indeed it is possible that their IP address has just changed. Since the new address may have been previously used by someone infected with a virus sending out spam, this “new” IP address may be on the blacklists. So, due to no fault of your own, you have a blacklisted IP address (I will suppress my urge to rant for IPv6 when everyone can finally have their own IP address and be responsible for its security).

Now, when you send an e-mail through your ISP’s mail server, it records your (blacklisted) IP as the first Received header. So your (presumably secure) system sending a legitimate message through your ISP’s legitimate, authenticating mail server is blacklisted by your recipients’ overambitious anti-spam system. Ouch. That is why blocking such an e-mail is just wrong. This kind of blocking creates annoying unnecessary complications for the users and admins at both sides. Using e-mail filtering to put such e-mails into a spam folder would be a reasonable way to handle the situation. Filtering is able to handle false positives whereas blocking generates unrecoverable errors.

Do not block e-mail based on the Received header!

Given 250,000 tools on the shelf, how do you manage them?

Monday, December 21st, 2009

By CJ Fearnley

Although I haven’t seen a thoroughly researched study, I figure there must be at least 250,000 FOSS (Free and Open Source Software) tools available to every systems administrator on the planet (230,000 at SourceForge + 15,000 at Launchpad + 12,000 at CodePlex + 5,000 at Google Code and that doesn’t count the Linux kernel or any of the myriad other self-hosted projects). These 250,000+ resources comprise the full “toolbox” that admins can use for building solutions with FOSS; they represent the FOSS equivalent of COTS (Commercial Off-The-Shelf). Of course, if you add open source but non-free or commercial tools, the problem explodes combinatorially.

How can a systems administrator support the largest possible subset of these “on the shelf” resources to best service the next need from a stakeholder (like the boss or a new client)?

First let me emphasize the difficulty of the task with a list of items that systems administrators and systems management firms like LinuxForce are expected to do whenever a stakeholder presents a software need:

  • Find and Evaluate software that can meet the need:
    • Identify several candidate applications that might meet the business requirements for a given project, function, or need
    • Research the options to assess their ability to meet the requirements (actually we, the systems administrators of the world, are actually expected to know which tool is “best of breed”: just from our past experience. The false assumption is, if it isn’t well known it must not be any good. The long tail applies to the 250,000+ FOSS tools also!). In our experience such research is essential, unfortunately, there is rarely enough budget to carefully explore the options.
    • Install the tool(s) in a “sandbox” to allow the stakeholder to “try it out”
    • Select a tool to use or look for more options
  • Put the tool into production
    • Read the docs to identify best practices for the software’s configuration
    • Prepare an installation plan that will address (as best as possible) any upgrade glitches (yes, you have to anticipate them now or suffer the consequences later!) so that you’re prepared for when a security advisory is released (or when the stakeholder starts begging for features from a new release)
    • Figure out a support plan to handle the inevitable questions that will arise during operations
    • Integrate these considerations into the process of either installing a package or using the “make, configure, make install” steps that most FOSS tools provide for installation
    • Carefully document the “as built” configuration including all assumptions and anticipated glitches to help yourself or future admins during the maintenance phase
  • On-Going Maintenance
    • Monitor the software
    • Subscribe to any relevant security mailing lists for the software so that you are apprised when a security (or other major) problem is detected
    • Track general trends relating to the software and its alternatives so that you are ready to respond if the project goes dormant or is eclipsed by newer, superior technology.
    • Upgrade routinely

About 15 years ago I noticed that the explosion of ready to use FOSS tools plus the trend toward general purpose tools and away from custom software was leading to a combinatorial crisis in software maintenance. I saw that it was the systems administrator’s responsibility to address the situation.

It has become apparent to me that the solution would require use of convention, standards and policy to reduce the complexity of the problem to manageable proportions. I searched for the most “standardized” conventions and policy-enforcing environment that would also provide the most flexible access to the most FOSS tools. The solution I found is Debian/GNU Linux, the universal operating system (although Ubuntu and other Debian derivatives also provide most of these benefits as well).

Debian simplifies the software evaluation process (apt-get [search|show]). Debian simplifies installation (apt-get install), security and new version upgrades (apt-get [upgrade|dist-upgrade]). Debian uses conventions and packages to simplify identifying best practices for administering the software (/usr/share/doc/[package]/, /var/lib/dpkg/info/[package].postinst, and wikis, mailings lists, bug reports, etc.). But the key benefit for managing the combinatorial explosion of FOSS tools is the Debian community’s value of striving to configure each package to automatically support the most common use cases while also providing support for unusual configurations (so you save tons of time in configuring the software).

In summary, the Debian/GNU Linux system provides the infrastructure needed to manage the combinatorial explosion of off the shelf FOSS tools cost effectively. If you have to service a lot of users, customers, or clients with challenging, diverse needs, I think Debian is the most cost effective way to meet their needs and deliver quality maintenance on an on-going basis year after year after year.

A FOSS Perspective On Richard Schaeffer’s Three Tactics For Computer Security

Friday, December 11th, 2009

By CJ Fearnley

Federal Computer Week published a great, succinct quote from Richard Schaeffer Jr., the NSA’s (National Security Agency) information assurance director, on three approaches that are effective in protecting systems from security attacks:

We believe that if one institutes best practices, proper configurations
[and] good network monitoring that a system ought to be able to
withstand about 80 percent of the commonly known attack mechanisms
against systems today, Schaeffer said in his testimony. You can
actually harden your network environment to raise the bar such that
the adversary has to resort to much, much more sophisticated means,
thereby raising the risk of detection.”

Taking Schaeffer’s three tactics as our lead, here is a FOSS perspective on these protection mechanisms:

Best practices implies community effort: discussing, sharing and collectively building understanding and techniques for managing systems and their software components. FOSS (Free and Open Source Software) communities develop, discuss and share these best practices in their project support and development forums. Debian’s package management system implements some of these best practices in the operating system itself thereby allowing users who do not participate in the development and support communities to realize the benefits of best practices without understanding or even knowing that they exist. This is one of the important benefits of policy- and package-based operating systems like Debian and Ubuntu.

Proper configuration is the tactical implementation of best practices. Audit is a critical element here. Debian packages can use their postinst scripts (which are run after a package is installed, upgraded, or re-installed) to audit and sometimes even automatically fix configuration problems. Right now, attentive, diligent systems administrators, i.e., human beings, are required to ensure proper configuration as no vendor — not even Debian — has managed to automate the validation let alone automatically fix bad configurations. I think this is an area where the FOSS community can lead by considering and adopting innovations for ensuring proper configuration of software.

Good network monitoring invokes the discipline of knowing what services are running and investigating when service interruptions occur. Monitoring can contribute to configuration auditing and can help focus one’s efforts on any best practices that should be considered. That is, monitoring helps by engaging critical thinking and building a tactile awareness of the network — what it does and what is exposed to the activities of a frequently malicious Internet. So, like proper configurations, monitoring requires diligent, attentive systems administrators to maintain security. LinuxForce’s Remote Responder services builds best practices around three essential FOSS tools for good network monitoring: Nagios, Munin, and Logcheck.

Crossroads in FOSS Projects: Some Business Considerations

Tuesday, November 24th, 2009

By CJ Fearnley

At our Seminar last month, Managing FOSS to Lower Costs and Achieve Business Results, several participants asked about the dynamics of FOSS (Free and Open Source Software) projects that reach a crossroads (a failure, a merger, loss of key personnel, etc). I had not expected that concern because with commercial software, it seems to me, the problem is more severe. When you have the source code and the right to modify and redistribute it, the source gives many more options (and its freedoms provide many more protections) than when commercial software goes bankrupt or gets bought by a competitor for instance.

But the reason for the questions may be a lack of understanding about how FOSS projects work. They involve individual human beings, perhaps just a single person or, more likely, several people from many organizations and even different cultures around the world joined in common purpose. For various cultural reasons, the project may be “owned” by an entity — usually a non-profit, but some are for-profit or even government owned, while others may simply be an “ad hoc initiative”. Some projects have explicit constitutions and defined processes for organizing the work and handling problems others are more informal.

At any time, any human social structure can experience a crossroads that could lead it to fail suddenly or wither on the vine in a gradual descent into “oblivion”. The cause of the failure will shape the results, but a very common situation is that conflicting visions or approaches for the project result in a “fork”. Then a sub-group of the original project takes the source code and starts a “new” project to develop the code in a new direction. Sometimes the original project “dies” and sometimes both continue resulting in two projects. Since multiple FOSS projects serving the same function or market incur inefficiencies due to duplicate development, there is a strong cultural value in the FOSS world to try to find a way to accommodate everyone in the project and prevent forks. When it works, the result is great software that meets everyone’s needs. But the reality is that often it is more effective to have multiple implementations of the same functionality so that each can be optimized for distinct objectives. Frequently one cannot know which approach will be best until many years of development and evaluation have transpired.

I recently learned about a FOSS project that forked when a friend asked me to copy some files to his new “My Book Essential”, a Western Digital product that provides 1TB of USB (Universal Serial Bus) storage. The My Book uses the poorly documented, non-free NTFS (New Technology File System). Linux has three projects that support NTFS: an in kernel driver, ntfsprogs (the Linux-NTFS project), and NTFS-3G. It turns out that all three were available for my Debian Lenny (5.0.3) system. First, I tried the in kernel support and learned that it was still read-only. Then I tried ntfsprogs which failed to mount the My Book:

NTFS-fs error (device sdc1): load_system_files(): Volume is dirty. Mounting read-only. Run chkdsk and mount in Windows.

I realized that since it was a new device it probably did not ship from the factory with a dirty volume. It was probably a bug. So I tried NTFS-3G which worked very well. In my research of the situation I was able to determine that both NTFS-3G and Linux-NTFS are under active development and have features missing from the other. So each has value and I’m glad my distribution included both. In Debian Lenny, the NTFS-3G driver has better support for writing files.

This illustrates one of the benefits of a crossroads in a FOSS project: you can end up with two good tools to add to your toolbox!