
Debian-Powered Drupal Configuration Policy

LinuxForce’s web hosting services are designed to provide our customers with strong security, simple ongoing administration and maintenance, and web software that works well in a wide range of situations, including multi-site support for applications.

We achieve these objectives by operating a Debian infrastructure with close adherence to Debian’s policy, web application policy and PHP policy.

In particular, we use the Debian package for Drupal 6. This provides community supported upgrades with a strong, well-documented policy. Like other software which ships in Debian, the software version is often somewhat older than the most recent upstream release, but it is regularly patched for security by the maintainer and the Debian security team.

In addition, this infrastructure also offers:

  • Only one package needs to be upgraded whenever there is a security patch, instead of every site individually
  • Strong separation of site-specific themes, modules, libraries, and uploaded files keeps the files you want to edit away from the Drupal core files (which you don’t want to edit)
  • The automation the infrastructure provides allows disk, memory, and sysadmin time to be minimized, thus reducing costs
  • The benefits of code maturity, as the Debian Drupal maintainers have thought through many boundary cases which it would take our staff time and trial-and-error to re-discover

Site layout

Experienced Drupal admins may find some of the file locations for the package confusing at first, so let’s clarify the differences to ease the transition.

Our infrastructure supports multiple sites per server. This is implemented by providing each site with its own dbconfig.php and settings.php files, plus the following directories:

  • files/
  • themes/
  • modules/
  • libraries/

All of these are configurable by the user, and are located in /etc/drupal/6/sites/drupal.example.com/. The site also inherits the contents of these default directories from core Drupal.

Access to these files via FTP is discussed below.

Core Drupal files are located in /usr/share/drupal6/ and are shared between all the Drupal sites on the server. They should not be edited since all changes to these files will be lost upon upgrade of the Debian Drupal package.
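To make the split concrete, here is a rough sketch of the two locations described above (drupal.example.com is the placeholder site name used throughout; your listings will differ):

# Site-specific, editable files (owned by userdrupal):
ls /etc/drupal/6/sites/drupal.example.com/
#   dbconfig.php  files/  libraries/  modules/  settings.php  themes/

# Shared Drupal core, managed by the drupal6 package (do not edit; replaced on upgrade):
ls /usr/share/drupal6/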

Users & Permissions

A jailed userdrupal FTP account is created to manage the files for the Drupal install.

The userdrupal account is the owner of all the configurable files located in /etc/drupal/6/sites/drupal.example.com/.

The system-wide www-data user must have access to the following by having the www-data group own them and be given the appropriate permissions:

  • Read access to dbconfig.php
  • Write access to the files/ directory (this is where Drupal typically stores uploaded files)

All files (with the exception of dbconfig.php) should be writable by userdrupal. The userdrupal group ownership is enforced by the system, but all users should also be configured client-side to respect group read-write permissions in order to maintain strict security; this corresponds to a umask of 002.
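As a rough illustration only, the ownership and permission scheme described above might be established with commands along these lines (paths follow the example site; exact modes will depend on the deployment):

cd /etc/drupal/6/sites/drupal.example.com
chown -R userdrupal:userdrupal .     # userdrupal owns the configurable files
chgrp www-data dbconfig.php          # www-data needs read access to dbconfig.php...
chmod 440 dbconfig.php               # ...but dbconfig.php stays read-only
chgrp -R www-data files              # www-data needs write access to uploaded files
chmod -R g+rwX files
# FTP and shell clients should use a umask of 002 so group read-write is preserved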

Additional Users and non-Drupal Directories

If there is a non-developer who needs access to the files directory, for instance, a dedicated FTP user for that purpose may be created and added to the userdrupal group.

For ease of administration, if multiple users exist and we are able to support an ssh account for handling administration (a static IP from the client is required), sudo can be configured to allow that user to execute commands on behalf of the userdrupal FTP user. Remember to add “umask 002” to the .bashrc to respect group read-write permissions.
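For example (a sketch; the admin username alice is hypothetical), the sudoers entry and umask setting might look like:

# /etc/sudoers (edit with visudo): allow alice to run commands as userdrupal
alice   ALL=(userdrupal) ALL

# alice's ~/.bashrc: respect group read-write permissions
umask 002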

If additional directories outside the Drupal infrastructure are required for a site, they will be placed in /srv/www/example.com/ and a separate jailed ftp user created to manage the files here. If a cgi-bin is required, it will be placed in /srv/www/example.com/cgi-bin/

Caveats and Cautions

Because the core Drupal files in /usr/share/drupal6/ are shared between sites, you may experience problems with the following:

  • Some drush commands and plugins assume the Drupal core files are editable and live in the same document root as the rest of your files, and thus may not work as expected
  • When you upgrade the drupal6 package, all sites are upgraded at once without testing. Customers who are concerned about the impact of the upgrade changes and require very high uptime can be accommodated by testing the site against the upgraded versions of PHP and Drupal in a testing VM
  • Since the Debian package name does not change, you cannot install the drupal6 package from an older Debian release alongside one from a current release; a separate virtualized Debian environment may be needed for testing upgrades if support is uncertain (this issue does not exist with upgrades from drupal6 to drupal7, as those packages can be installed alongside each other)
  • A policy for handling root-level .htaccess files should be developed if they are to be used

Conclusion

Although the Debian way differs from the Drupal tarball approach, it makes it possible to scale the service to many sites while saving disk, memory, and sysadmin effort. By leveraging this Drupal infrastructure provided by Debian, LinuxForce offers one-off Debian package deployments to dedicated systems, shared arrangements for small businesses running several sites, and infrastructure deployments for businesses who provide hosting services. We also offer a boutique hosting service for select customers on one of our systems.

Posted by Elizabeth Krumbach in Debian, Eternally Regenerative Software Administration, Hosting, Systems Management, 4 comments

File Servers – The Business Case for High Availability

Introduction

You have probably heard of high availability transaction processing servers.  You have most likely read about the sophisticated systems used by the airlines to sell tickets online.  They have to be non-stop because downtime translates to lost orders and revenue.  In this article I will discuss the economics of using non-stop technologies for everyday applications.  I will show that even ordinary file sharing applications can benefit from inexpensive Linux based Pacemaker clustering technology.

Availability Goal

What is our availability goal? Our goal should be to take prudent and cost-effective measures to reduce computer downtime to nil within the required service window. I’m not talking about 99.999% (five 9s) uptime, the popular (and very expensive) claim made by high availability vendors. I’m talking about maintaining enough uptime to service the application. Take a simple example: for office document preparation, the service window is office hours (9-5). The rest of the time the desktop PCs can be turned off; nobody is there to operate them anyway. You only need the PCs 5 days a week for 8 hours a day, or 2,080 hours per desktop PC per year. This translates into an uptime requirement of about 24 percent. Ideally you want the desktop PCs to be available all the time during office hours, but you are willing to give up availability for routine maintenance and for the infrequent breakdowns that may occur only once per workstation every five years or so. Perhaps you have two spare desktop PC workstations for every 100. This extra capacity allows your office workers to resume their work on a spare while their workstation is being repaired. In this example the cost of maintaining adequate availability is the cost of maintaining two spare desktop PCs. You might adjust this cost to account for real-world conditions at the work site. Wide swings in operating temperature or a poor quality electricity supply might dictate that you increase the number of spare PCs. It sounds like a low stress, straightforward availability solution.

Network Effects

The problem gets more complicated when the desktop PCs are networked together and all the documents are stored on a central file server rather than on each workstation’s hard drive. There is a multiplicative effect. If the file server is not available then all 100 document processing PCs are rendered unavailable. Then you have 98 (remember the 2 spares from above) workers being paid but not producing documents. A failure during office hours can become expensive. One hour of downtime can cost as much as $1500 in lost worker wages. A day of downtime can cost $12,000 of lost worker wages. How long will it take for a hardware repair person to travel to your site? How long will it take for spare parts to arrive? How long will it take the repair person to replace the parts? How long will it take for damaged files to be replaced from backup by your own people? A serious but not unlikely failure can take several days to be completely resolved. It’s not unreasonable to assume that such a failure (two days of lost wages alone comes to $24,000) can occur once every 5 years. This is a very simple example. We are not talking about a complicated order-entry or inventory control system. We are talking about 98 office workers saving files to a central file share so that they can be indexed and backed up.

The Effects of Time

I’m going to add another wrinkle to our office document processing example.  This file sharing setup has been in use for 4 years.  Time flies.  The hardware is getting old faster than you realize.  Old hardware is more likely to fail.  It has been through more thunderstorms, more A/C breakdowns, people knocking the server by accident and all that.  You’ve been noticing that your hardware maintenance plan is costing more every year.  How long is the hardware vendor going to stock spare parts for your obsolete office equipment?  Please forgive me for playing on your paranoia but the real world can be rude.

Time for an Upgrade

In this scenario you conclude that you are going to have to replace that file server soon. It’s going to be a pain to migrate all the files to a new unit. I am going to have to upgrade to a new version of Windows server. How much is that going to cost? How much has Windows changed? If I am going to have to go to all this trouble, why not get some new improvement out of it? I know I can get bigger disks and more RAM (random access memory) for less money than I paid for the old server. Whoops. Windows is going to cost more. I have to pay a charge for every workstation attached to it. That CAL (Client Access License) price has gone up. I read something about high availability clustering in Windows. Enterprise Server does that. Wow. Look at the price of that! Remember that $12,000 per day of downtime cost overhang? It’s more of an issue now that you are dealing with an old system.

A Debian Cluster Solution

Enough of this already. Since I asked so many questions and raised so many doubts, I owe you, the reader, some answers. Debian Linux provides a very nice high availability solution for file servers. You need two servers with directly attached storage, plus a third small server that can be little more than a glorified workstation. Each server runs the 64-bit edition of Debian Squeeze with the Pacemaker, Corosync, DRBD and Samba packages installed. The software is free. You pay for the hardware and a trustworthy Linux consultant who can set everything up for you. What you get is a fully redundant quorum cluster with fully redundant storage, multiple CPU cores on each node, much more RAM than you had before and much more storage capacity.
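On Squeeze, installing the pieces is a one-line command per server (package names as of Debian 6.0; a minimal starting point, not a complete cluster configuration):

apt-get install pacemaker corosync drbd8-utils samba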

Here are hardware price estimates:

Tie breaker node: Two hard drives, 512MB RAM $500.00
Name brand file server node: eight 2TB SATA drives, 24GB RAM, one 4-core CPU, 3-year on-site parts and labor warranty. $6,000.00
Second file server node like above. $6,000.00
Misc parts for storage and control networks. $200.00
Total: $12,700.00

Each file server node uses software RAID 5 and holds 14 terabytes of disk storage. Because the storage is fully replicated across the two nodes (each node holds a complete copy), total cluster storage capacity is 14 terabytes. Performance of this unit will be much better than the old unit. It effectively has 4 CPU cores per file storage node and much more RAM for file buffering. Software updates from Debian are free. You just need someone to apply the security patches and version upgrades.
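To give a flavor of the configuration involved, here is a minimal DRBD resource sketch for mirroring storage between the two file server nodes (hostnames, device names and addresses are hypothetical, and a real setup also needs Corosync/Pacemaker resources for Samba and the service IP):

resource r0 {
    protocol C;                    # synchronous replication
    device    /dev/drbd0;
    meta-disk internal;
    on filesrv1 {
        disk    /dev/md0;          # the software RAID 5 array on this node
        address 10.0.0.1:7788;     # dedicated storage network
    }
    on filesrv2 {
        disk    /dev/md0;
        address 10.0.0.2:7788;
    }
}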

The best feature is complete redundancy for file processing.  In our file server example, any one of the nodes can completely fail and file server processing will continue.  Based on the lost labor time cost estimates above, this system pays for itself if it eliminates 1 day of downtime in a five year period.  You also have hardware maintenance savings of whatever the yearly charge is for your old system times 3 years because you get 3 years of warranty coverage on the new hardware.  You have the consultant’s charges for converting to the new system, but remember, you were going to have to pay that fee for a new Windows system as well.

Conclusion

I hope I have stirred your interest in Linux Pacemaker based clusters.  I have shown a file server upgrade that pays for itself by reducing downtime.  You also upgrade your file server’s performance while reducing out of pocket expenses for software and hardware maintenance.  Not a bad deal.

Posted by Laird Hariu in Debian, Eternally Regenerative Software Administration, Systems Management, Virtualization, 0 comments

One way to migrate Xen virtual machines to KVM in Debian

There are dozens of virtualization products on the market. When we launched our first high-availability cluster in early 2008 we chose Xen due to its ability to run on hardware without virtualization extensions, its support in Debian 4.0 (Etch), and its general flexibility. We’ve learned a lot about the upstream of Xen, including the challenges that Debian maintainers face, and we were increasingly drawn to another free and open source virtualization technology, the Kernel-based Virtual Machine (KVM). The primary downside to KVM is that it requires CPU hardware virtualization support to run, but this support is now almost ubiquitous on modern servers. KVM has the advantage of being supported upstream in the Linux kernel itself, removing the onus of difficult kernel patching from the Debian Developers, and it has become the supported virtualization option for Ubuntu, Fedora and Red Hat. Additionally, KVM allows guests to run unaltered, meaning you don’t need a special kernel and can run many OSes, from Linux to FreeBSD to Windows 7.

We still work on a number of machines which lack hardware virtualization support, but as our customers upgrade hardware we’ve begun moving production virtual machines from Xen to KVM. In tackling the migrations of these production virtual machines we encountered several challenges, the major ones being:

  • In Xen, partitions were created as separate logical volumes on the host and mounted by Xen itself, so we didn’t require Logical Volume Manager (LVM) within our Xen guests; in KVM, the logical volume on the host for a virtual machine is a single disk image, not separate partitions
  • In Xen, the kernel package is not installed in the guest; in KVM it is required
  • In Xen, you don’t have an independent bootloader on the guest OS; in KVM you need one to boot

The first step was to create a partition table on the new KVM image identical to the one on Xen. We wanted to use LVM within the guest, which required a Matryoshka (Russian nesting doll) approach. First we’d create a volume group on the host to give us the typical flexibility of LVM host-side, and then we’d create another within the disk image, giving us the flexibility required within the guest itself to expand any partitions. Finally we’d need to bootstrap the new system and copy the files over.

One way to go about all of this is a manual process. That would allow for scripting the procedure, but it requires a significant investment of time to get everything right so that downtime is kept to a minimum. Since we only had a half dozen of these VMs to migrate in total, we looked for an existing way of handling all these steps in a familiar manner, which is when we turned to the Debian Installer. Assuming a local mirror of core Debian packages (recommended), we could install a skeleton system on a test IP address that was properly partitioned, had LVM configured, and was bootstrapped in less than 20 minutes. We could then take this skeleton system that we’re sure is a functioning, bootable system and move the files over from the Xen installs, arguably decreasing our risk with each migration and the downtime required.

To get started, you’ll first need to calculate the total size of the current VM and use lvcreate to carve out a slice of that size on the KVM system, then launch the Debian installer against the image using virt-install, something like:

virt-install --connect qemu:///system -n guest.example.com -r 768 -f /dev/mapper/VolumeGroup-guest.example.com -s 12 -c /var/lib/libvirt/images/debian-506-i386-netinst.iso --vnc --noautoconsole --accelerate --os-type linux --os-variant debianLenny

In this example we assume the debian-506-i386-netinst.iso is in /var/lib/libvirt/images/ and that we want 768M of RAM. The information you put here is similar to the information you would previously have defined in your /etc/xen/guest.example.com.cfg file for Xen.
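For reference, the lvcreate step mentioned above might look something like the following (assuming the host volume group from the example is named VolumeGroup and the guest needs roughly 12GB, matching the -s 12 above):

lvcreate -L 12G -n guest.example.com VolumeGroup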

Then use virt-manager to connect to the running install session (we actually connected from a remote desktop running virt-manager, since the standard Debian installer does not provide serial access) and install Debian; you will need the root password to launch the installer. Proceed with the install.

When you get to the step to partition the disks, lay out the partition table to be identical to the VM you want to migrate, except put it on LVM, with /boot on a separate partition outside of LVM. Complete the install, including installing grub.

Confirm that the system boots and works on a test IP address, and copy its /etc/fstab to the host system; you’ll need this later.

You now have a skeleton install of Debian which runs on KVM with the LVM partitions you need.

To begin the actual data migration, you’ll want to mount the volumes within your new KVM disk; this can be done with the help of a great little mapping tool called kpartx. To map and activate the volume group on the guest, follow these steps:

kpartx -av /dev/VolumeGroup/guest.example.com
vgscan
vgchange -ay guest

In this example we assume the logical volume on the host is called “guest.example.com” and the volume group within the guest is called “guest”.

Now that the host can see the volume group on the guest and all of its logical volumes in /dev/mapper/, you’ll want to mount them.
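For example (the logical volume names within the guest are hypothetical; adjust them to match your layout):

mkdir -p /mnt/guest
mount /dev/mapper/guest-root /mnt/guest
mount /dev/mapper/guest-home /mnt/guest/home
mount /dev/mapper/guest-srv /mnt/guest/srv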

Once mounted, you’ll be able to start your rsync. To incur the least amount of downtime, you’ll want to rsync the large data partitions (like /srv, /home, /var, perhaps /usr) while your production host is still running. Note: all rsyncs completed during this process must be done with the “--numeric-ids” option so that ownership is preserved by numeric user and group ID rather than being remapped to names on the host!
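A typical invocation, run on the KVM host and pulling from the still-running Xen guest, might look like this (hostname and paths are illustrative):

rsync -aH --numeric-ids root@guest.example.com:/srv/ /mnt/guest/srv/
rsync -aH --numeric-ids root@guest.example.com:/home/ /mnt/guest/home/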

While you’re rsyncing the data, you’ll want to go into your Xen system and install the following packages (see the command after this list; installing these will not impact your Xen system, which doesn’t strictly use them and will simply ignore them):

  • linux-image-2.6-686
  • grub
  • lvm2
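Within the Xen guest, that amounts to:

apt-get install linux-image-2.6-686 grub lvm2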

When you’ve completed moving the largest portions of your Xen guest, bring down the production Xen guest (downtime starts now!), mount the filesystems, and begin rsyncing the remaining data over (preferably over a crossover cable for the fastest transfer; remember to use --numeric-ids in your rsync).

Once the rsync is complete, edit the following files on the KVM host:

  • /mnt/guest/etc/fstab – use the version you saved to the host in a previous step
  • /mnt/guest/etc/inittab – uncomment the ttyS0 line to allow for serial access from the KVM host for virsh
  • /mnt/guest/etc/udev/rules.d/70-persistent-net.rules – comment out the eth0 line so eth0 can come up on the new KVM MAC address

Unmount the filesystems on both sides and on the KVM side disable the volume group and use kpartx again to unmap the filesystem from the host:

sudo vgchange -an guest
sudo kpartx -dv /dev/VolumeGroup/guest.example.com

You’re now ready to boot the VM on the KVM side. This can be done with virt-manager or virsh.
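For example, using the domain name given to virt-install earlier:

virsh start guest.example.com
virsh console guest.example.com     # works once the ttyS0 line in inittab is enabled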

Since you just moved the machine to a new server, and probably to new MAC addresses, you will probably need to run the arping command to reclaim the IP address of the VM and all of its service IP addresses.
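A sketch of that step, using the arping from the iputils package with an illustrative interface and address (repeat for each service IP):

arping -U -c 3 -I eth0 192.0.2.10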

Some things to confirm are working:

  • Networking (confirm there are no lingering arp caching issues)
  • Email (where applicable, confirm system messages etc are being sent)
  • All services running (confirm key services, review monitoring dashboard, log in via ssh)
  • Confirm that you have contiguous logs

Now that we’ve completed one of these migrations we have a lot of ideas about how to improve the process, including the possibility of making the whole process more scriptable, but this quick method leveraging the Debian installer for easier disk configuration and bootstrapping worked very well.

Posted by Elizabeth Krumbach in Debian, Systems Management, Tech Notes, Virtualization, 0 comments

Some Initiatives Resulting From DebConf10

I attended DebConf10 in early August at Columbia University in New York City. I thought I’d document some of the initiatives that resulted from that event.

I attended the Debian Policy BoF. It inspired me to start reviewing Debian Policy and led to my submission of a bug report to improve the description of the archive areas in Debian. In the BoF (Birds of a Feather) session, Russ Allbery requested help from everyone including non DDs (Debian Developers) to make policy clearer and more descriptive, so I obliged. I see that the negotiated text has been accepted and included in the git repository for the next release.

During the Debian Science sessions that I attended (see especially the video from the Debian Science Round Table), the idea of engaging upstream providers (software projects that are packaged by integrators like Debian) more effectively was discussed. This led me to draft a proposal that I posted to the Debian Project mailing list: Improving coordination / support for upstreams. There was precious little feedback, but I think it is a profoundly important issue: how do we improve coordination of upstreams with projects like Debian that integrate their software. So far there is very little infrastructure or knowledge about this important issue in the management of FOSS (Free and Open Source Software). How do you think we should start to address this problem?

I helped with the herculean task of getting Sage, a FOSS mathematics system, back into Debian. After discussing the issue during the Debian Science track at DebConf10, I met Lucas Nussbaum in the hacklab and he (with help from Luca Falavigna) managed to get the old buggy version of Sage removed from unstable (apparently, this version was causing support issues for the upstream Sage community, so this was a positive step forward). I also submitted five bugs (#592349, #592354, #592425, #592426, and #592429) about new upstream versions that are needed. I posted two detailed reports of work needed on packaging Sage for Debian to appropriate mailing lists. Getting Sage into Debian is the kind of big FOSS management challenge that I’m excited about. But I will need a lot of help to make progress. Let me know if you are interested in contributing to the effort!

Finally, for the record, here are links to the sessions where I participated in the discussion (video is available): Bits from the DPL, SPI BOF, Enterprise Infrastructure BOF, Mathematical Software in Debian, Overall presentation of Debian Science, and Debian Science Round Table.

Posted by CJ Fearnley in Conference, Debian, 0 comments

Attending Debian Day and DebConf10 Next Week

Since I’ve been involved with Debian GNU/Linux for over 15 years, it is exciting that I will be able to attend the first two and a half days of DebConf10 including Debian Day from Sunday to Tuesday August 1–3.

I am particularly looking forward to the following sessions: Pedagogical Freedom: Debian, Free Software, and Education; Beyond Sharing: Open Source Design (What are the challenges for the collaborative design process?); FLOSS Manuals: A Vibrant Community for Documentation Development; Bits from the DPL; The Java Packaging Nightmare; Collaboration between Ubuntu and Debian; How We Can Be the Silver Lining of the Cloud; Enterprise Infrastructure BOF (How enterprise technologies such as Kerberos, LDAP, Samba, etc. can work better together in Debian); Using Debian for Enterprise Infrastructure (Stanford University: A Case Study); and more (see the schedule for each day).

I’m also hoping I can attend on Thursday when the math and science focused sessions will be held, but I’ll have to see how next week’s schedule works out in the office. If you are coming to DebConf, I’ll see you there!

Posted by CJ Fearnley in Conference, Debian, 0 comments

Attending the Linux Foundation Collaboration Summit 2010

On the heels of the 5th Annual Emerging Technologies for the Enterprise Conference (ETE 2010) in Philadelphia that CJ attended last week, I’ll be attending the 4th Annual Linux Foundation Collaboration Summit tomorrow through Friday in San Francisco.

The Linux Foundation Collaboration Summit is an exclusive, invitation-only summit gathering core kernel developers, distribution maintainers, ISVs, end users, system vendors and other community organizations for plenary sessions and workgroup meetings to meet face-to-face to tackle and solve the most pressing issues facing Linux today.

My attendance will be in my capacity as a member of the Ubuntu Community Council as well as my role as a Debian Systems Administrator. As such, my attention will be split at the summit between community and governance interests, like the FOSSBazaar Workgroup and Josh Berkus’ How to Prevent Community: Making Sure Your Pond Stays Small, and talks and panels like Does Open Source Mean Open Cloud? where Ubuntu founder Mark Shuttleworth will be a panelist, and the Linux Standard Base Workgroup and Virtualization discussions.

It’s shaping up to be an exciting summit. If you are also attending, be sure to say “Hello”!

Posted by Elizabeth Krumbach in Conference, Debian, FOSS Community, News, Ubuntu, 0 comments

How and why contributing to FOSS can benefit your organization

At first glance, the ecosystem in the Free and Open Source Software (FOSS) world can seem a bit complicated. There are several ways to get software: download it directly from project websites, use a software management tool that your Linux distribution provides, or install a Linux distribution that includes everything you need right out of the box! Once you understand this ecosystem, you can find where your contributions would be most useful, and why contributing is beneficial to your organization and the FOSS community.

So, where does this all begin? FOSS often originates with a project which maintains the source code for the software and provides its own development and support infrastructure.

A Linux distribution is a carefully culled collection of software from these upstream projects which makes a complete operating system and even includes a lot of application software. This collection of software is tested and prepared to run securely and maintainably together. Debian is built upon this model.

Some distributions of Linux use Debian as a source project unto itself. There are a number of Linux distributions based on Debian, including the popular KNOPPIX and Ubuntu distributions. Being “based on Debian” can mean several things, but it primarily means they draw from the software repository at some point in the release cycle, and they use the Advanced Packaging Tool (apt) to manage this software. In these cases Debian is an intermediary between the original FOSS project and the “children” distributions which may also pull from original software projects to expand upon what Debian provides to target their particular focus.

So where in this software ecosystem should your organization contribute? Why would your organization choose to contribute to Debian rather than to the original project (“upstream” of Debian) or a project like Ubuntu (“downstream” of Debian)? It really depends on your goals.

If your organization is interested in using FOSS in a way which requires rapid development, new and diverse features released quickly, or specializations that the distribution may not easily support, you will probably want to work directly on the upstream project. Frequently this requires programming experience, but many projects need other kinds of help such as bug reports in the form of feature requests which they may be able to satisfy in later releases. In these cases, contributing to development in these projects directly is the best way to meet your needs in using and building upon the software.

If your organization needs to use FOSS in a stable, maintainable and secure way, you should probably work directly with Debian. The primary duty of most developers within the Debian community is working on the “packages” which make up the operating system: creating, updating, patching, tracking their security and handling bugs, forwarding details and patches to the upstream projects when applicable. This is what maintains the solid, core operating system that makes up not only Debian, but the child distributions which depend on it, and which could not exist without it. By contributing to Debian you’re also contributing to Ubuntu, Knoppix, and dozens more, improving the tool shelf for everyone (related: Given 250,000 tools on the shelf, how do you manage them?). Contributing to Debian also helps the upstream projects, taking the burden off of them to provide installation documents and support on Debian and placing that upon you, plus making their software more readily available to users through a simple search through the Debian repository.

If the target of one of Debian’s children better meets your organization’s needs in a way that cannot be achieved through Debian directly, then by all means contribute directly to it. Child distributions already exist which focus on everything from being an Open Source LiveCD toolbox (like KNOPPIX) to being a polished desktop operating system (like Ubuntu). As an example, even within Ubuntu’s family there are targeted projects: Edubuntu, focused on education by packaging and shipping a collection of educational software, and Mythbuntu, a project devoted to making your computer a PVR like TiVo, which works with the MythTV project to easily deliver their software on a ready-made platform. Contributing to projects like these also expands the open source ecosystem and may be the preferred way to reach your organization’s goals.

Understanding the way in which these projects and distributions work together and selecting a place in the workflow for your organization to contribute is the first step. But perhaps a more important question is why you’d want to work on a FOSS project instead of doing in-house development. The benefits for the FOSS community are obvious: they reap the rewards of your expertise and of having the packages in Debian and beyond. But are there benefits for your organization?

I believe there are big benefits, which include:

  • Peer review of packages and software now and in the future
  • Processes for asking the community for assistance
  • Bug reporting infrastructure, which may include patches submitted by community members
  • Procedures to become informed about security problems and policy changes
  • Free collaborative infrastructure provided for FOSS projects (Alioth for Debian, SourceForge, Launchpad, the Apache Foundation, etc.), including development mailing lists and hosted revision control systems such as git, bazaar, and svn
  • Opportunity to learn key FOSS development strategies and industry “best practices” via freely available documentation, chat rooms, forums and mailing lists

In short, by putting in the time to release software, package for Debian, or work in child distributions, you are not only doing good for the FOSS community, you also get to take advantage of the plethora of tools, resources and people available to assist in the development process.

Posted by Elizabeth Krumbach in Debian, FOSS Community, Ubuntu, 0 comments

Given 250,000 tools on the shelf, how do you manage them?

Although I haven’t seen a thoroughly researched study, I figure there must be at least 250,000 FOSS (Free and Open Source Software) tools available to every systems administrator on the planet (230,000 at SourceForge + 15,000 at Launchpad + 12,000 at CodePlex + 5,000 at Google Code and that doesn’t count the Linux kernel or any of the myriad other self-hosted projects). These 250,000+ resources comprise the full “toolbox” that admins can use for building solutions with FOSS; they represent the FOSS equivalent of COTS (Commercial Off-The-Shelf). Of course, if you add open source but non-free or commercial tools, the problem explodes combinatorially.

How can a systems administrator support the largest possible subset of these “on the shelf” resources to best service the next need from a stakeholder (like the boss or a new client)?

First let me emphasize the difficulty of the task with a list of items that systems administrators and systems management firms like LinuxForce are expected to do whenever a stakeholder presents a software need:

  • Find and Evaluate software that can meet the need:
    • Identify several candidate applications that might meet the business requirements for a given project, function, or need
    • Research the options to assess their ability to meet the requirements (actually we, the systems administrators of the world, are expected to know which tool is “best of breed” just from our past experience. The false assumption is that if it isn’t well known it must not be any good. The long tail applies to the 250,000+ FOSS tools too!). In our experience such research is essential; unfortunately, there is rarely enough budget to carefully explore the options.
    • Install the tool(s) in a “sandbox” to allow the stakeholder to “try it out”
    • Select a tool to use or look for more options
  • Put the tool into production
    • Read the docs to identify best practices for the software’s configuration
    • Prepare an installation plan that will address (as best as possible) any upgrade glitches (yes, you have to anticipate them now or suffer the consequences later!) so that you’re prepared for when a security advisory is released (or when the stakeholder starts begging for features from a new release)
    • Figure out a support plan to handle the inevitable questions that will arise during operations
    • Integrate these considerations into the process of either installing a package or using the “configure, make, make install” steps that most FOSS tools provide for installation
    • Carefully document the “as built” configuration including all assumptions and anticipated glitches to help yourself or future admins during the maintenance phase
  • On-Going Maintenance
    • Monitor the software
    • Subscribe to any relevant security mailing lists for the software so that you are apprised when a security (or other major) problem is detected
    • Track general trends relating to the software and its alternatives so that you are ready to respond if the project goes dormant or is eclipsed by newer, superior technology.
    • Upgrade routinely

About 15 years ago I noticed that the explosion of ready to use FOSS tools plus the trend toward general purpose tools and away from custom software was leading to a combinatorial crisis in software maintenance. I saw that it was the systems administrator’s responsibility to address the situation.

It has become apparent to me that the solution would require the use of conventions, standards and policy to reduce the complexity of the problem to manageable proportions. I searched for the most “standardized”, convention- and policy-enforcing environment that would also provide the most flexible access to the most FOSS tools. The solution I found is Debian GNU/Linux, the universal operating system (although Ubuntu and other Debian derivatives provide most of these benefits as well).

Debian simplifies the software evaluation process (apt-cache search, apt-cache show). Debian simplifies installation (apt-get install) as well as security updates and new version upgrades (apt-get [upgrade|dist-upgrade]). Debian uses conventions and packaging to simplify identifying best practices for administering the software (/usr/share/doc/[package]/, /var/lib/dpkg/info/[package].postinst, plus wikis, mailing lists, bug reports, etc.). But the key benefit for managing the combinatorial explosion of FOSS tools is the Debian community’s value of striving to configure each package to automatically support the most common use cases while also providing support for unusual configurations (so you save tons of time configuring the software).
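As a concrete sketch of that workflow (the package name bacula is only an example):

apt-cache search backup              # find candidate tools by keyword
apt-cache show bacula                # evaluate a candidate's description and dependencies
apt-get install bacula               # install it, with dependencies, using sane defaults
ls /usr/share/doc/bacula/            # package documentation and Debian-specific notes
apt-get update && apt-get upgrade    # routine security updates
apt-get dist-upgrade                 # release upgrades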

In summary, the Debian GNU/Linux system provides the infrastructure needed to manage the combinatorial explosion of off-the-shelf FOSS tools cost-effectively. If you have to service a lot of users, customers, or clients with challenging, diverse needs, I think Debian is the most cost-effective way to meet their needs and deliver quality maintenance on an ongoing basis year after year after year.

Posted by CJ Fearnley in Debian, Systems Management, 1 comment

Seven Observations On Software Maintenance And FOSS

The November 2009 issue of Communications of the ACM (CACM) has a very interesting article by Paul Stachour and David Collier-Brown entitled “You Don’t Know Jack About Software Maintenance”. The authors argue energetically for using versioned data structures and “continuous upgrading” to improve the state of the art of software maintenance.

The piece got me thinking about FOSS (Free and Open Source Software) and “continuous upgrading”. Here are seven observations on FOSS software maintenance that occurred to me as I reflected on the CACM article:

  1. FOSS projects “continuously” apply bug fixes and feature enhancements at no additional cost to their users. By applying these improvements “continuously”, the user reaps a steady stream of “interest payments” providing ever-improving security, performance, and functionality.
  2. Since FOSS incurs no licensing or license management costs, upgrading FOSS is not hindered by capital expenses.
  3. Typically support in FOSS projects is focused on the current stable version. Therefore, upgrading to the current stable version is the preferred way to receive the best support from FOSS communities.
  4. One of the key reasons behind Debian’s strong track record of “continuous upgrading” is its way of handling the tricky issues involved with dependent library upgrades (such as libc6, libssl.so.0.9.8, etc.). The chapter on Shared Libraries in the Debian Policy Manual details a proven method to effectively handle library upgrade issues (including its sophisticated handling of versions).
  5. When upgrading is applied routinely and “continuously”, it becomes crucial to support customizations across upgrades which can be one of the biggest obstacles to a smooth upgrade (see my earlier post on customization and upgradeability). One reason for Debian’s effectiveness in this regard is its robust configuration file handling policy.
  6. It is worth noting that the “continuous” implied here is not the one emphasized in dictionaries (which takes its nuances from the mathematical / physics concept of “no interruptions” and the epsilon-delta definition that students of Calculus learn). That concept of “continuous” is impossible in systems administration which is necessarily discrete as are all computer operations. The connotation required here is, perhaps, “unending”, or “eternal” or somesuch.
  7. The “right” frequency for “continuous” upgrades is a complex tradeoff between business requirements and upgrade infrastructure maturity. Debian and Ubuntu provide very mature support for “continuous upgrading”. They support the upgrade of production servers through major release after major release with minimal downtime or risk of a glitch that could affect users. Their current release frequency of about 2 years may be the best we can do given the current state of the art of software maintenance. I hope we can learn to increase the frequency as better engineered upgrade policies are developed.

I prefer the name “eternally regenerative software administration” over “continuous upgrading”. It avoids the philosophical problems with the word “continuous” and emphasizes the active, “ecological” approach needed to envision the engineering of “regenerativity” in software. By that I mean software maintenance should involve building the system so each new version enables installation of the next while facilitating management of any customizations and integration with other software (including libraries and other “helper” applications). Regenerativity is the process of growth and change used by Nature itself. Software maintenance needs to follow similar principles.

Posted by CJ Fearnley in Debian, Eternally Regenerative Software Administration, Ubuntu, 0 comments