Archive for the ‘Systems Management’ Category

Please Document the Shop: On the importance of good systems documentation

Friday, May 21st, 2010

By Laird Hariu

We have all heard this: You need to document the computer infrastructure. You never know when you might be “hit by a bus”. We hear this and think many frightening things, reassure ourselves that it will never happen and then put the request on the back burner. In this article I will expand on the phrase “hit by a bus” and then look at the consequences.

Things do happen to prevent people from coming into work. The boss calls home. Talks to the wife and makes the sad discovery that Mike wont be coming in anymore. He passed away last night in bed. People get sudden illnesses that disable them. Car accidents happen.

More often than these tragedies occur, thank goodness, business conditions change without warning. In reorganizations whole departments disappear, computer rooms are consolidated and moved, companies are bought and whole workforces replaced. I have had the unhappy experience of living through some of this.

Some organizations have highly transient workforces because of the environment that they operate in. Companies located near universities benefit from an influx of eager young, upwardly mobile university graduates. These workers are eager to gain experience but soon find higher paying jobs in the “real world” further away from campus. These companies have real turnover problems. People are moving up so quickly, they don’t have time to write things down.

Even when you keep people in place and maintain a fairly stable environment, people discover that what they have documented in their heads can just fade away. This is getting to be more and more of an issue. Networks and servers and other such infrastructure functions have been around for 20 years in many organizations. Fred the maintainer retired five years ago. Fred the maintainer was transferred to sales. The longer systems are around, the more things can happen to Fred. Fred might be right where he was 20 years ago. He just can’t remember what he did.

What does all this mean? What are the consequences of losing organizational knowledge in a computer organization? To be blunt, it creates a hideous environment for your computer people. The system is a black box to them. They are paralyzed. They are rightfully afraid. Every small move they make can bring down the system in ways they cannot predict. Newcomers take much longer to train. Old-timers learn to survive by looking busy while doing nothing. The politics of the shop and the whole company is made bloody by the various interpretations of the folklore of the black box. He/she who waves their arms hardest rules the day. This is no way for your people to live.

This is no way for the computer infrastructure to live as well. While the games are played the infrastructure evolves more slowly and slowly. Before long the infrastructure is frozen. Nobody dares to touch it. The only way to fix it is to completely replace it at considerable expense. In elaborate infrastructures this is easier said than done. The productive lifetime of the platform is shortened. It was not allowed to grow and evolve to lengthen its lifetime. Think of the Hubble Telescope without all the repairs and enhancements over the years. It would have burned out in re-entry long ago.

Having made my case, I ask again; for your own good, please document the shop. Make these documents public and make them accurate. Record what actually is rather than what you wish it to be. It is better to be a little embarrassed for a short while than to be mislead later on. Update the documentation when changes occur. An out of date document can be as bad as no document at all. Make an effort to record facts. At the same time don’t leave out general philosophies that guided the design and other qualitative information because it helps your successors interpret the facts when ambiguities occur.

Think of what you leave behind. Persuade your boss to make this a priority as well. Hopefully the people at your next workplace will do the same.

The Nature and Importance of Source Code and Learning Programming with Python

Thursday, February 25th, 2010

By CJ Fearnley

Last year a client asked us for advice on getting started with programming. So I thought I’d share some thoughts about programming, its relationship with FOSS (Free and Open Source Software) management and why Python is a good language for learning programming including some great on-line resources. But first I want to make sure our business-oriented readers understand the nature and importance of source code.

The “source” aka “the code” provides a language in which computer users can create or change software. One does not have to be a programmer to work on the code. In fact, every computer user is, ipso facto, a programmer! Menus, web interfaces, and graphical user interfaces (GUIs) are some of the more facile “languages” for computer programming that everyone, even children, can readily learn and use. Of course, building complex software systems requires a more expressive specification language than a web form, for instance, can provide.

Although all computer software is specified with source code, FOSS systems are unique in that the source code is made available with the software. In contradistinction, software lock-in or vendor lock-in describes the unfortunately all too common practice of many organizations to block access to their source code.

Having access to the source code provides huge operational benefits. For one, the source can be used to understand how the software works: it is a form of software documentation (indeed, it is the most definitive form of software documentation possible!). Also, code can be easily changed to add diagnostics or to test a possible solution to a problem or to modify or add functionality. In addition, the source is a language both for specifying features to the computer and for discussing computing with others. So most mature FOSS languages have vibrant support communities in which one can participate, learn and get help.

The source is a tool: a powerful, multi-purpose, critically important tool.

Since LinuxForce focuses on FOSS, we are able to take full advantage of the availability of the code. We are always working with the source! Since most of our work is systems administration, we usually “program” configuration files. However, we also write systems software and scripts and we support software developers extensively, so we have a persistent, deep, and productive relationship with code.

But what to suggest to someone like our customer who wants to learn programming?

I remembered seeing a blurb in Linux Journal referencing an article they published in May 2000 by Eric Raymond entitled "Why Python" which argues persuasively for the virtues of the programming language Python. I had often felt that Perl’s idiosyncrasies made it difficult to use, so Eric’s critique of Perl and accolades for Python were convincing to me. In addition, I follow FOSS mathematics software and I was aware that Sage is a Python “glue” to more than fifty FOSS math libraries. I’ve been meaning to look into Python so I could use Sage. Another pull comes from my work at LinuxForce where we use a lot of Python-based software including mailman, fail2ban, Plone, and several tools used for virtual machine management such as kvm, virtinst and xen-tools. Python has a huge software repository and community. So one is likely to find good libraries to build upon (thus avoiding the extra learning curve of building everything from scratch). Python is an interpreted language which makes it easier to debug and use so the learning process is smoother.

To finish the recommendation, I just needed to find some on-line resources. First, Kirby Urner suggested these two: Wikieducator’s Python Tutorials and "Mathematics for the Digital Age and Programming in Python". Then, I checked out the Massachusetts Institute of Technology’s (MIT) OpenCourseWare which provides extensive course materials for many of their classes (I’ve already watched the full video set for a couple of MIT’s courses including the legendary Walter Lewin’s "Classical Mechanics" and have been very impressed by the quality and content of their materials). After nearly 30 years of introducing students to programming with Scheme, MIT switched to Python in 2008! The materials for their introductory Python-based course "6.00 Introduction to Computer Science and Programming" are very thorough, accessible and helpful. Their free on-line materials include the full video lectures of the class plus assignments, sample test problems, class handouts, and an excellent Readings section with references to "the Python Tutorial" and a very good free on-line textbook "How to Think Like a Computer Scientist: Learning with Python".

In conclusion, if you or anyone you know wants to learn how to program computers, I recommend starting with Python using MIT’s on-line course materials supplemented with the other on-line resources mentioned above (and summarized in the table below). I’ve now watched more than half of the videos from the MIT 6.00 course and I’ve worked through several of their assignments: this is a great course! Even with nearly three decades experience programming including a couple of college-level courses in the 1980s, I’m finding the class is more than just good review for me: I’ve learned a few new things (in particular, dynamic programming and the knapsack problem). Python’s clean syntax and elegant design will help as one delves into writing code for the first time. Its extensive libraries and repositories will support the application of one’s newly acquired computing skills to solve problems in the area of the student’s special interests whatever they may be … and that’s the way we learn best: by doing something that we personally care about!

Summary of On-Line Resources for Learning Python

Some thoughts on best practices for SMTP blocking of e-mail spam

Thursday, January 21st, 2010

By CJ Fearnley

Blocking e-mail spam at the time of SMTP (Simple Mail Transfer Protocol) transfer has become a best practice. There is no point wasting precious bandwidth & disk space and spending time browsing a huge spambox when most of the incoming flow is clearly spam. At LinuxForce our e-mail hygiene service, LinuxForceMail, makes extensive use of SMTP blocking techniques (using free and open source software such as Exim, Clam AV, SpamAssassin and Policyd-weight). But we are extremely careful to only block sites and e-mails that are so “spammy” that we are justified in blocking it. That doesn’t prevent false positives, but it keeps them to a minimum.

Recently we investigated an incident where one of our users had their e-mail blocked by another company’s anti-spam system. In investigating the problem, we learned that some vendors support an option to block e-mail whose Received header is on a blacklist (in our case it was Barracuda, but other vendors are also guilty). Let me be blunt: this is boneheaded, but the reason is subtle so I can understand how the mistake might be made.

First, blocking senders appearing on a blacklist at SMTP time is good practice. But to understand why blocking Received headers at SMTP time is bad, it is important to understand how e-mail transport works. The sending system opens a TCP/IP connection from a particular IP address. That IP address should be checked against blacklists. And other tests on the envelope can help identify spam. But the message headers including the Received header are not so definite. We shall see that even a blacklisted IP in these headers may be legitimate. So blocking such e-mail incurs unnecessary risks.

The problem occurs when a user of an ISP (Internet Service Provider) sends an e-mail from home, they are typically using a transient, “dynamic” IP address. Indeed it is possible that their IP address has just changed. Since the new address may have been previously used by someone infected with a virus sending out spam, this “new” IP address may be on the blacklists. So, due to no fault of your own, you have a blacklisted IP address (I will suppress my urge to rant for IPv6 when everyone can finally have their own IP address and be responsible for its security).

Now, when you send an e-mail through your ISP’s mail server, it records your (blacklisted) IP as the first Received header. So your (presumably secure) system sending a legitimate message through your ISP’s legitimate, authenticating mail server is blacklisted by your recipients’ overambitious anti-spam system. Ouch. That is why blocking such an e-mail is just wrong. This kind of blocking creates annoying unnecessary complications for the users and admins at both sides. Using e-mail filtering to put such e-mails into a spam folder would be a reasonable way to handle the situation. Filtering is able to handle false positives whereas blocking generates unrecoverable errors.

Do not block e-mail based on the Received header!

Given 250,000 tools on the shelf, how do you manage them?

Monday, December 21st, 2009

By CJ Fearnley

Although I haven’t seen a thoroughly researched study, I figure there must be at least 250,000 FOSS (Free and Open Source Software) tools available to every systems administrator on the planet (230,000 at SourceForge + 15,000 at Launchpad + 12,000 at CodePlex + 5,000 at Google Code and that doesn’t count the Linux kernel or any of the myriad other self-hosted projects). These 250,000+ resources comprise the full “toolbox” that admins can use for building solutions with FOSS; they represent the FOSS equivalent of COTS (Commercial Off-The-Shelf). Of course, if you add open source but non-free or commercial tools, the problem explodes combinatorially.

How can a systems administrator support the largest possible subset of these “on the shelf” resources to best service the next need from a stakeholder (like the boss or a new client)?

First let me emphasize the difficulty of the task with a list of items that systems administrators and systems management firms like LinuxForce are expected to do whenever a stakeholder presents a software need:

  • Find and Evaluate software that can meet the need:
    • Identify several candidate applications that might meet the business requirements for a given project, function, or need
    • Research the options to assess their ability to meet the requirements (actually we, the systems administrators of the world, are actually expected to know which tool is “best of breed”: just from our past experience. The false assumption is, if it isn’t well known it must not be any good. The long tail applies to the 250,000+ FOSS tools also!). In our experience such research is essential, unfortunately, there is rarely enough budget to carefully explore the options.
    • Install the tool(s) in a “sandbox” to allow the stakeholder to “try it out”
    • Select a tool to use or look for more options
  • Put the tool into production
    • Read the docs to identify best practices for the software’s configuration
    • Prepare an installation plan that will address (as best as possible) any upgrade glitches (yes, you have to anticipate them now or suffer the consequences later!) so that you’re prepared for when a security advisory is released (or when the stakeholder starts begging for features from a new release)
    • Figure out a support plan to handle the inevitable questions that will arise during operations
    • Integrate these considerations into the process of either installing a package or using the “make, configure, make install” steps that most FOSS tools provide for installation
    • Carefully document the “as built” configuration including all assumptions and anticipated glitches to help yourself or future admins during the maintenance phase
  • On-Going Maintenance
    • Monitor the software
    • Subscribe to any relevant security mailing lists for the software so that you are apprised when a security (or other major) problem is detected
    • Track general trends relating to the software and its alternatives so that you are ready to respond if the project goes dormant or is eclipsed by newer, superior technology.
    • Upgrade routinely

About 15 years ago I noticed that the explosion of ready to use FOSS tools plus the trend toward general purpose tools and away from custom software was leading to a combinatorial crisis in software maintenance. I saw that it was the systems administrator’s responsibility to address the situation.

It has become apparent to me that the solution would require use of convention, standards and policy to reduce the complexity of the problem to manageable proportions. I searched for the most “standardized” conventions and policy-enforcing environment that would also provide the most flexible access to the most FOSS tools. The solution I found is Debian/GNU Linux, the universal operating system (although Ubuntu and other Debian derivatives also provide most of these benefits as well).

Debian simplifies the software evaluation process (apt-get [search|show]). Debian simplifies installation (apt-get install), security and new version upgrades (apt-get [upgrade|dist-upgrade]). Debian uses conventions and packages to simplify identifying best practices for administering the software (/usr/share/doc/[package]/, /var/lib/dpkg/info/[package].postinst, and wikis, mailings lists, bug reports, etc.). But the key benefit for managing the combinatorial explosion of FOSS tools is the Debian community’s value of striving to configure each package to automatically support the most common use cases while also providing support for unusual configurations (so you save tons of time in configuring the software).

In summary, the Debian/GNU Linux system provides the infrastructure needed to manage the combinatorial explosion of off the shelf FOSS tools cost effectively. If you have to service a lot of users, customers, or clients with challenging, diverse needs, I think Debian is the most cost effective way to meet their needs and deliver quality maintenance on an on-going basis year after year after year.

A FOSS Perspective On Richard Schaeffer’s Three Tactics For Computer Security

Friday, December 11th, 2009

By CJ Fearnley

Federal Computer Week published a great, succinct quote from Richard Schaeffer Jr., the NSA’s (National Security Agency) information assurance director, on three approaches that are effective in protecting systems from security attacks:

We believe that if one institutes best practices, proper configurations
[and] good network monitoring that a system ought to be able to
withstand about 80 percent of the commonly known attack mechanisms
against systems today, Schaeffer said in his testimony. You can
actually harden your network environment to raise the bar such that
the adversary has to resort to much, much more sophisticated means,
thereby raising the risk of detection.”

Taking Schaeffer’s three tactics as our lead, here is a FOSS perspective on these protection mechanisms:

Best practices implies community effort: discussing, sharing and collectively building understanding and techniques for managing systems and their software components. FOSS (Free and Open Source Software) communities develop, discuss and share these best practices in their project support and development forums. Debian’s package management system implements some of these best practices in the operating system itself thereby allowing users who do not participate in the development and support communities to realize the benefits of best practices without understanding or even knowing that they exist. This is one of the important benefits of policy- and package-based operating systems like Debian and Ubuntu.

Proper configuration is the tactical implementation of best practices. Audit is a critical element here. Debian packages can use their postinst scripts (which are run after a package is installed, upgraded, or re-installed) to audit and sometimes even automatically fix configuration problems. Right now, attentive, diligent systems administrators, i.e., human beings, are required to ensure proper configuration as no vendor — not even Debian — has managed to automate the validation let alone automatically fix bad configurations. I think this is an area where the FOSS community can lead by considering and adopting innovations for ensuring proper configuration of software.

Good network monitoring invokes the discipline of knowing what services are running and investigating when service interruptions occur. Monitoring can contribute to configuration auditing and can help focus one’s efforts on any best practices that should be considered. That is, monitoring helps by engaging critical thinking and building a tactile awareness of the network — what it does and what is exposed to the activities of a frequently malicious Internet. So, like proper configurations, monitoring requires diligent, attentive systems administrators to maintain security. LinuxForce’s Remote Responder services builds best practices around three essential FOSS tools for good network monitoring: Nagios, Munin, and Logcheck.

Crossroads in FOSS Projects: Some Business Considerations

Tuesday, November 24th, 2009

By CJ Fearnley

At our Seminar last month, Managing FOSS to Lower Costs and Achieve Business Results, several participants asked about the dynamics of FOSS (Free and Open Source Software) projects that reach a crossroads (a failure, a merger, loss of key personnel, etc). I had not expected that concern because with commercial software, it seems to me, the problem is more severe. When you have the source code and the right to modify and redistribute it, the source gives many more options (and its freedoms provide many more protections) than when commercial software goes bankrupt or gets bought by a competitor for instance.

But the reason for the questions may be a lack of understanding about how FOSS projects work. They involve individual human beings, perhaps just a single person or, more likely, several people from many organizations and even different cultures around the world joined in common purpose. For various cultural reasons, the project may be “owned” by an entity — usually a non-profit, but some are for-profit or even government owned, while others may simply be an “ad hoc initiative”. Some projects have explicit constitutions and defined processes for organizing the work and handling problems others are more informal.

At any time, any human social structure can experience a crossroads that could lead it to fail suddenly or wither on the vine in a gradual descent into “oblivion”. The cause of the failure will shape the results, but a very common situation is that conflicting visions or approaches for the project result in a “fork”. Then a sub-group of the original project takes the source code and starts a “new” project to develop the code in a new direction. Sometimes the original project “dies” and sometimes both continue resulting in two projects. Since multiple FOSS projects serving the same function or market incur inefficiencies due to duplicate development, there is a strong cultural value in the FOSS world to try to find a way to accommodate everyone in the project and prevent forks. When it works, the result is great software that meets everyone’s needs. But the reality is that often it is more effective to have multiple implementations of the same functionality so that each can be optimized for distinct objectives. Frequently one cannot know which approach will be best until many years of development and evaluation have transpired.

I recently learned about a FOSS project that forked when a friend asked me to copy some files to his new “My Book Essential”, a Western Digital product that provides 1TB of USB (Universal Serial Bus) storage. The My Book uses the poorly documented, non-free NTFS (New Technology File System). Linux has three projects that support NTFS: an in kernel driver, ntfsprogs (the Linux-NTFS project), and NTFS-3G. It turns out that all three were available for my Debian Lenny (5.0.3) system. First, I tried the in kernel support and learned that it was still read-only. Then I tried ntfsprogs which failed to mount the My Book:

NTFS-fs error (device sdc1): load_system_files(): Volume is dirty. Mounting read-only. Run chkdsk and mount in Windows.

I realized that since it was a new device it probably did not ship from the factory with a dirty volume. It was probably a bug. So I tried NTFS-3G which worked very well. In my research of the situation I was able to determine that both NTFS-3G and Linux-NTFS are under active development and have features missing from the other. So each has value and I’m glad my distribution included both. In Debian Lenny, the NTFS-3G driver has better support for writing files.

This illustrates one of the benefits of a crossroads in a FOSS project: you can end up with two good tools to add to your toolbox!

Xen Virtualization: Migrating 32-bit domUs to 64-bit dom0s

Friday, October 9th, 2009

By Elizabeth Krumbach

Virtualization provides the facility to run multiple isolated computer operating systems on one piece of computing hardware. There has been a huge increase of interest in virtualization technology because recent advances in multi-core technology provide significantly more computing power in each machine with ever decreasing costs. Virtualization is one of the best ways to take advantage of these big changes in hardware.

Currently, Xen is the most mature FOSS (Free and Open Source Software) virtualization technology. Although we love the idea of KVM, since it requires a special processor extension on X86 systems, it cannot work on older hardware. So for at least another few years, we think Xen is the more flexible choice for FOSS virtualization projects.

The Xen infrastructure consists of the Xen hypervisor which “runs the show”, a domain 0 (dom0) which runs a special, privileged version of the operating system (typically Linux, but NetBSD and Solaris are also supported), and one or more domain U (domU) “guest” (or “User”) operating systems. We have found that Xen is easy to configure in many situations, but we encountered some complications in running a domU on a dom0 with a different architecture.

We recently migrated some 32-bit domUs running Debian Etch (4.0) from a 32-bit dom0 to a newer 64-bit dom0 running Debian Lenny (5.0). We did a direct move (using rsync) of the Logical Volume Manager (LVM) slices from the 32-bit dom0 to the 64-bit dom0. This means we’d now be running our 32-bit Etch domUs on a 64-bit Lenny dom0.

The first question was whether this would be possible. Absolutely! 32-bit domUs have no trouble running on 64-bit dom0s, we could even use the 64-bit Xen kernel in these 32-bit systems to avoid additional kernel installations we’d need to maintain on the dom0. The second question was whether we could properly load the 64-bit kernel modules inside our domU. Again, yes! But with a caveat: the domUs were 32-bit Etch, so the 64-bit Lenny kernel modules were not simply installable via apt. We realized that copying over the .deb package for the kernel modules and running dpkg -i --force-architecture linux-modules-2.6.26...deb would not be a maintainable way to handle the kernel module updates moving forward. So we weighed our options:

1. Serve these modules via a network file system (such as NFS) to each domU on bootup.

2. Deploy a script that would notify the domU and copy the new kernel modules .deb to it for installation. We could then install the new module package at our discretion.

We decided that the first option violated our strict security policy which calls for running as few services on the dom0 as possible. Since the second solution is scriptable and therefore automatable, it fit our vision of having easily maintainable systems regardless of the underlying complexity. So we installed the 64-bit modules prior to migration so that all the proper modules would be loaded as soon as we brought up the domU on the new Dom0. The result was a flawless migration of our 32-bit domUs to the new 64-bit dom0.