Author Archive

Learn Open Source Database Tools from Stanford for Free

Thursday, January 19th, 2012

By CJ Fearnley

I recently finished Stanford’s excellent free on-line course Introduction to Databases with Jennifer Widom. The course is a broad survey of database technology including XML, Relational Database Management Systems (RDBMS) from many angles (SQL forms the centerpiece of the course), OLAP (OnLine Analytical Processing) and NoSQL.

I was very impressed with the breadth of Widom’s approach to the subject: it was a major reason I decided to spend time on the course. Another strength is its nuts-n-bolts approach: some theoretical topics are covered but for the most part this is a course for practitioners. Finally, I particularly appreciated the extensive use of FOSS (Free and Open Source Software) in the course.

Why study databases? I will merely say that data is a core tool pervading the information resources of modern civilization. Databases are where data is housed. For example, the data constituting this blog is stored in a database and the same can be said for much (if not all) of the Internet. Databases are a profoundly vital, big picture subject.

Widom’s course is still open for enrollment in “archival mode” meaning you can watch the videos, work through the exercises, quizzes and exams, and track your progress, but the deadlines have expired and no more “Statements of Accomplishment” will be awarded (at least until the course is offered again). To complete the simple enrollment go to db-class.org and start learning about databases with FOSS today!

Although the course is broadly useful for anyone wanting to learn the basics of databases from a broad perspective, I found it to be particularly good for learning about the FOSS tools that can support database systems. So let’s start there.

FOSS Tools Covered

For traditional RDBMS, the course uses PostgreSQL, SQLite, and MySQL. Widom mentions some limitations of each database (DB) in regards to the SQL (Structured Query Language) standard including important distinctions about using each system with triggers, transactions (PostgreSQL has the best support), Views (MySQL uses updatable views whereas PostgreSQL & SQLite use triggers to modify views), recursion (only PostgreSQL and only in newer versions), and OLAP (only MySQL supports with rollup).

On the XML side, I learned xmllint and SAXON for XML validation, querying and transformation. The course covers the basics of DTDs, XML Schema, XPath, XQuery and XSLT. I used xmllint, saxonb-xquery, and saxonb-xslt to work through the exercises (the searchable Q&A forum provides usage details).

Finally, for NoSQL there are two videos which survey the state of the art. There is some depth on the Map Reduce framework provided by Hadoop. Several other FOSS systems are briefly discussed: Cassandra, Voldemort, CouchDB, MongoDB, and more. The NoSQL portion of the course is a good overview of the technology, but there are no exercises and hence little depth of a concrete nature.

From a FOSS perspective, the course is exquisite: FOSS utilities were front and center for the duration and some guidance in using these tools is provided. Help was readily available: I answered a few questions in the Q&A forum to help people overcome hurdles and I used its search feature to overcome some of my own. In sum, studying this course will give you the lay of the land for FOSS database technology including some advice about the limitations and strengths of its best database tools.

Stanford’s innovative platform for free on-line video courses

I am a big fan of so-called Open Educational Resources (OER) including free on-line video courses. Stanford’s Databases course is the 12th I’ve completed, but only the second in which I did a “deep dive” by reinforcing learning with exercises, quizzes and exams. In general, I use OER video courses as edutainment as I usually find the extra work too time-consuming: my goal is to broadly understand how the world works, not to build expertise in every subject I study! So, conceptually, I prefer the traditional form of video courses pioneered by MIT’s OpenCourseWare which in contrast with Stanford’s new approach might be called archived courses. Archived courses make the material available without (m)any social tools. So, working through the materials in traditional OER courses usually requires extra self-discipline and commitment (unless you just watch the videos for fun as I often do).

Stanford’s OER system, online at coursera.org, builds on the basic idea of OER video courses by adding deadlines, interactive feedback from automatically evaluated work, and some, including the Databases course, offer the ability to earn a “Statement of Accomplishment” for demonstrating basic proficiency. It is precisely these social enhancements that makes Stanford’s initiative so noteworthy. Together these social tools provide a shared experience with a clear set of tasks for a cohort of students working through the course at the same time.

The extra interactivity and the focus of deadlines give the Stanford approach to OER a special excitement and sense of goal accomplishment which is absent in archived courses. Even though I prefer the archived courses whose videos can be more entertaining than Stanford’s tutorial-focused approach, I have to admit I was enthralled by the deadlines: they kept me focused. It should be emphasized that Stanford’s courses like the more traditional OER archival courses can be pursued at a pace that suits your time and interest: there’s no imperative to follow the deadlines or earn kudos for accomplishments.

How I used the course materials

Although I have been using databases professionally for many years, I had not read nor studied the subject in any depth previously. I decided to take this opportunity for a deep dive. Widom’s Databases course includes a simple enrollment process, tutorial-style videos (for download or in a browser with Flash support), automatically graded exercises, quizzes, and exams each with hard deadlines, a Course Materials section with many goodies, Optional Exercises, a FAQ, a Q&A forum, and a “Statement of Accomplishment” from the instructor if you completed a substantial portion of the coursework by the deadline (6,513 of the 91,000 students enrolled in the Fall 2011 Databases course earned one; here is mine).

First, I watched each video twice taking detailed handwritten notes on the second viewing (22 pages worth!). I then checked the Flash version of the videos which often included inline questions that were very useful (Stanford’s Flash video viewer is the best I’ve seen: it even supports speeding up the video by 1.2 or 1.5 times while automatically adjusting the pitch!).

Then I worked through the quizzes and exercises. One nice feature of both was that you could attempt them many times. Different variants were provided to many of the quiz questions to make it harder to apply a blind trial and error approach and you can continue to work on them after getting them all correct (which might be useful as a way to practice for the exams or to see if you remember anything of the course when you check back in 1 or 10 years). I found the quizzes and exams to be very challenging and not so rewarding. Of course some mastery is required and acquired from the quizzes and for other learning styles they may prove more valuable than they were for me: judge for yourself.

The course included many supplementary exercises to provide extra practice. I used them for Relational Algebra and they were very helpful. However due to time constraints, I was unable to use them further. Most of my time was spent working through the exercises. I did all the exercises in an xterm window running sqlite, psql, xmllint, saxonb-xquery, or saxonb-xslt and pasting the results into the query workbench. This allowed me to really experience the FOSS tools “in the wild” which gave me a strong sense of their ins and outs. For me the interactive exercises were fantastic: they really helped me learn the material by directly engaging my problem solving faculty. They were like a real project with deadlines! Although I occasionally got ruffled with some of the difficult ones, they were engaging and fun!

Although the course web site is very simple and well-designed, it was still possible to have difficulty finding some of the gems provided for the students. For example, it took me awhile to find the code used in the demos (which was extremely useful by the way): the Course Materials section of the site has all the goodies you need but you have to mouse over the icons to see treasures that appear hidden at first (remember to right click to download). Also look carefully at the prescripts or postscripts affixed to some sections: more treasures.

The Q&A forum was helpful for finding things that were not at first apparent. A couple of times I scoured the Internet or Wikipedia looking for other angles on the material to understand a point I was struggling with. All work for the class is “open book”, so I only prepared for the exams by simply reviewing my notes.

Advice for students

I recommend taking the course now even though the deadlines have expired. Feel free to skip any part of the course that your interests, time constraints, and patience warrant. If the course is ever offered again, you will already know much of the material which should help you earn a “Statement of Accomplishment”. If not, you will better understand a broadly useful, important and interesting subject.

If you want to do a “lite” version of the course, I recommend skipping the quizzes and exams. In addition, many of the topics can be skipped if you are short on time or find them uninteresting. To her credit Jennifer Widom recommended as much in her screen side chats which provided a wonderful human dimension to the course. Although some of the material is cumulative, there are several parts of the course for which skipping is a real option. For example, the Relational Algebra (I thoroughly enjoyed doing those exercises!!!) and the Relational Design Theory topics are, I think, less important especially if you just want to acquire basic DBA skills. I deal with enough XML, that I found that part of the course extremely useful, but I can imagine someone who just needs SQL might skip those parts. This is a free course: you can be creative in how you use the materials so that you get what you want out of it!

I recommend doing the exercises associated with each topic (some did not have exercises). Some of them are quite challenging and some took me quite a bit of time. If necessary, skip some of the harder ones. If time is really pressing, just watch the videos that particularly interest you: remember this is a free resource: you can tailor your work on the course in whatever way suits your interests and time.

I did not buy nor borrow a textbook for the course (I was very impressed that Widom prepared reading assignments for four separate texts in the Course Materials sections of the site. Wow, that must have been a lot of work!). Having been through the course, I think a text is unnecessary for most students. You may find a few topics that are hard for you or for which the videos were insufficient to master the material. Since textbooks are more comprehensive and more detailed, they could help fill in the gaps. In particular diligent students may want a text. I prefer to learn iteratively, that is, I would prefer a shallow course today and another later that goes in more depth (I might even prefer to take two lite courses to build my mastery of a subject by degrees). But if you want a more complete experience now, then by all means get a textbook and dig in!

Other OER Database Resources

In addition to the OER resources below, I found occasion to reference Widom’s course site for Stanford students in the allied CS145 Introduction to Databases and her colleague Jeff Ullman’s offering of CS145 from Autumn 2002. MIT OCW has 6.814/6.830 on Database Systems which looks a bit too advanced for an introductory course and MIT OCW has 1.264J/ESD.264J Database, Internet, and Systems Integration Technologies which I found useful to supplement Widom’s course (especially for Relational Design Theory). Finally, the Indian Institutes of Technology (IIT) has a complete video course (with 43 videos totaling 40 hours and 17 minutes) on Database Management Systems or watch its YouTube playlist (I did not look at these videos, but I’ve seen other IIT material and they are usually very informative and accurate).

Conclusion

Stanford’s exciting new system for on-line courses is remarkable in its use of social tools to engage students. This is a boon to the OER movement! In today’s rapidly changing world, refreshing and expanding one’s skills is essential to apprehending the needs and opportunities that abound if you are curious enough!

I’m hooked! Although I still love and will continue to use archived courses, I will be checking to see if Stanford has any courses of interest regularly from now on! I’ve signed up for several of Stanford’s offerings (there are 16 of them!) for Winter term with the intention of “dropping” or doing a “shallow dive” (maybe just watch a few videos and do some exercises as interest permits). But I am eying the Model Thinking course for a possible deep dive. To see the full list of offerings go to Class Central: Summary of Stanford’s online course offerings and plan your learning for the Winter term which starts next Monday, January 23rd!

Stanford’s Fall 2011 edition of Introduction to Databases was a great course! Kudos to Jennifer Widom and everyone at Stanford who made this possible.

If you haven’t started or finished it yet, head over to db-class.org now and get to it! It is one of the best resources on the Internet for learning about databases. Moreover, it includes the special benefit of covering a broad range of important FOSS database tools.

Other Reviews of Widom’s Databases Course

Elizabeth Krumbach Keynoting at Fosscon in Philadelphia Saturday 23 July 2011

Monday, July 18th, 2011

By CJ Fearnley

LinuxForce Systems Administrator, Elizabeth Krumbach will deliver the keynote address this Saturday, 23 July 2011 at FOSSCON. Her talk entitled “Make a Difference for Millions: Getting Involved with FOSS” will help attendees better understand how to contribute to the greater good by participating more actively in FOSS (Free and Open Source Software).

FOSSCON will be held at Basekamp, 723 Chestnut street, 2nd floor; Philadelphia, PA. The doors open at 8 AM on Saturday. Elizabeth’s talk starts at 10 AM. The other talks follow her keynote. A listing of speakers at FOSSCON and the schedule is on-line here. Register to attend FOSSCON for free (no charge!) to learn about the excitement of FOSS in Philadelphia!

I am looking forward to the event. I hope to see you there!

LinuxForce is a sponsor of FOSSCON.

VIDEO: Elizabeth Krumbach on the SDForum Panel Discussion on Women and Open Source

Monday, April 25th, 2011

By CJ Fearnley

On March 31, LinuxForce’s Elizabeth Krumbach participated in the SDForum panel discussion on “Tech Women: Women and Open Source”. There was a wide ranging discussion starting with a basic introduction to “open source” and how to get involved in open source. Other issues covered included special issues with the involvement of women, mentoring, business, and entrepreneurship. The Google Open Source Blog also reviewed the session.

Video of the panel discussion is online. Part 1 is 52 minutes.

Part 2 is 21 minutes.

Slides for my talk on “Automating X11 Keystrokes”

Friday, March 18th, 2011

By CJ Fearnley

X11 is the graphical user interface most widely used on Linux operating systems. My slides and video demo for a short talk given at the Philadelphia area Linux Users Group (PLUG) on March 2nd are on-line. The slides briefly cover xrandr (which can also be used to set the screen resolution), xset, xwd / xwud, xdotool, and xautomation including xte. You can get the slides and watch the video at my page on Automating X11 Keystrokes.

Finding Help in Ubuntu (SCALE, Feb 25-7 in LA); Ubuntu Diversity; Debian Squeeze AKA 6.0

Thursday, February 24th, 2011

By CJ Fearnley

Here are some news items:

• Elizabeth Krumbach will give a talk on “Finding Help in Ubuntu” at UbuCon in Los Angeles

Tomorrow, February 25th at 9AM, LinuxForce Remote Responder Administrator Elizabeth Krumbach will give the opening talk at UbuCon at the Hilton Los Angeles Airport Hotel. Her talk is entitled Finding Help in Ubuntu. UbuCon is part of SCALE: Southern California Linux Expo which runs 2/25-2/27 2011.

• Elizabeth Krumbach interviewed in Linux Pro Magazine

Linux Pro Magazine interviewed LinuxForce Remote Responder Administrator Elizabeth Krumbach in an article on Ubuntu Increasing Its Diversity. Elizabeth is helping with the effort to increase diversity at the Ubuntu Developer Summit in Budapest in May.

• A new stable version of Debian known as Squeeze or 6.0 was released

The Debian project announced the release of Squeeze on 6 February 2011. eWeek reviewed the new release. We have already upgraded 3 systems and installed several new systems running Debian Squeeze. It is very reliable as we’ve come to expect from Debian.

In addition, the Debian web sites received a new design, a new “look”, that was released together with Squeeze. Check it out at http://www.debian.org.

One way to bring Innovative FOSS and Linux solutions to your organization

Friday, November 5th, 2010

By CJ Fearnley

Here is an effective way to try out Linux or FOSS (Free and Open Source Software) to help grow your organization. Hat tip to CIO magazine for

Adam Hartung’s short little InnovationZone article “Outsource for Growth”.

Mr. Hartung recommends outsourcing to grow your organization … to do new things … to innovate … to be more flexible. Since your IT staff is probably too busy to take on growth projects like these, it makes a lot of sense to use consulting experts, like LinuxForce, to develop, test, and provision innovative FOSS infrastructure for you. How could your organization benefit from the outsourcing for innovation approach to build out a FOSS or Linux-based solution to grow your business?

Licensing Considerations When Integrating FOSS and Proprietary Software

Friday, October 29th, 2010

By CJ Fearnley

Recently, I was looking for resources to help explain the implications of integrating FOSS (Free and Open Source Software) with proprietary software. This question is important for any organization who might want to build an “embedded” or “dedicated” system or product which might include either their own proprietary software or third party applications. I discovered that most of the information available about FOSS licensing addresses this issue rather obliquely. This post will cover the basics so that such organizations can see how straightforward it would be to include FOSS in their projects. Standard disclaimers about my not being a lawyer apply.

Use of any software system requires understanding the software licensing involved. There are many dozens of FOSS licenses that could apply in your situation, so understanding the details is necessary to assess compatibility. In broad terms, these can be effectively summarized by the Debian Free Software Guidelines and The Open Source Definition. Any software that complies with those specifications will freely (in both the “freedom” and money senses of the word) permit running proprietary and FOSS applications on the same system. This is easier said than done. Since Debian has analyzed most common FOSS and quasi-FOSS licenses, their archive and their license page can be used to assess how the license might work in your situation (for example, anything in “main” would be OK for co-distribution and running in mixed environments). So we always start with Debian’s meticulous and well-documented analysis to assess any license.

From a high-level perspective, the requirements that FOSS licensing will impose on an organization wanting to include it with proprietary software will primarily be in the form of providing appropriate attribution (acknowledgement) and providing the source code for any FOSS software (including any modifications) shipped as part of an integrated system. Typically the attribution requirement can be satisfied by simply referencing the source code of the FOSS components. The source code requirement can be met by providing the source code for all of the FOSS distributed as part of the integrated system, for example, by placing it on a documented ftp site.

There are some subtle issues that may arise during the software development process if proprietary and FOSS code are “linked” together. In such cases it is necessary to ensure that code over which one wants to assert proprietary control is only combined with FOSS code that explicitly supports such commingling. So it is necessary for a company wanting to keep its code proprietary to keep it “separate” from the FOSS code used in the integrated system. The use of the words “linked” and “separate” is intentionally vague because the terms of each FOSS license will need to be examined by legal counsel to understand the precise requirements. In these situations, there are license compliance management systems that can be adopted to help ensure that this separation is maintained.

Those are the basic issues. The references below describe these and related issues involved with Linux & FOSS licensing in much more depth:

Some Initiatives Resulting From DebConf10

Thursday, September 30th, 2010

By CJ Fearnley

I attended DebConf10 in early August at Columbia University in New York City. I thought I’d document some of the initiatives that resulted from that event.

I attended the Debian Policy BoF. It inspired me to start reviewing Debian Policy and led to my submission of a bug report to improve the description of the archive areas in Debian. In the BoF (Birds of a Feather) session, Russ Allbery requested help from everyone including non DDs (Debian Developers) to make policy clearer and more descriptive, so I obliged. I see that the negotiated text has been accepted and included in the git repository for the next release.

During the Debian Science sessions that I attended (see especially the video from the Debian Science Round Table), the idea of engaging upstream providers (software projects that are packaged by integrators like Debian) more effectively was discussed. This led me to draft a proposal that I posted to the Debian Project mailing list: Improving coordination / support for upstreams. There was precious little feedback, but I think it is a profoundly important issue: how do we improve coordination of upstreams with projects like Debian that integrate their software. So far there is very little infrastructure or knowledge about this important issue in the management of FOSS (Free and Open Source Software). How do you think we should start to address this problem?

I helped with the herculean task of getting Sage, a FOSS mathematics system, back into Debian. After discussing the issue during the Debian Science track at DebConf10, I met Lucas Nussbaum in the hacklab and he (with help from Luca Falavigna) managed to get the old buggy version of Sage removed from unstable (apparently, this version was causing support issues for the upstream Sage community, so this was a positive step forward). I also submitted five bugs (#592349, #592354, #592425, #592426, and #592429) about new upstream versions that are needed. I posted two detailed reports of work needed on packaging Sage for Debian to appropriate mailing lists. Getting Sage into Debian is the kind of big FOSS management challenge that I’m excited about. But I will need a lot of help to make progress. Let me know if you are interested in contributing to the effort!

Finally, for the record, here are links to the sessions where I participated in the discussion (video is available): Bits from the DPL, SPI BOF, Enterprise Infrastructure BOF, Mathematical Software in Debian, Overall presentation of Debian Science, and Debian Science Round Table.

Attending Debian Day and DebConf10 Next Week

Thursday, July 29th, 2010

By CJ Fearnley

Since I’ve been involved with Debian GNU/Linux for over 15 years, it is exciting that I will be able to attend the first two and a half days of DebConf10 including Debian Day from Sunday to Tuesday August 1–3.

I am particularly looking forward to the following sessions: Pedagogical Freedom: Debian, Free Software, and Education, Beyond Sharing: Open Source Design What are the challenges for the collaborative design process?, FLOSS Manuals: A Vibrant Community for Documentation Development, Bits from the DPL, The Java Packaging Nightmare, Collaboration between Ubuntu and Debian, How We Can Be the Silver Lining of the Cloud, Enterprise Infrastructure BOF How enterprise technologies such as Kerberos, LDAP, Samba, etc can work better together in Debian, Using Debian for Enterprise Infrastructure Stanford University: A Case Study, and more (see the schedule for each day).

I’m also hoping I can also attend on Thursday when the math and science focused sessions will be held, but I’ll have to see how next week’s schedule works out in the office. If you are coming to DebConf, I’ll see you there!

Beyond the Cloud: The Comprehensive Flexibility of FOSS May Bring Clearer Skies

Tuesday, June 29th, 2010

By CJ Fearnley

Last week’s InformationWeek has a good article on cloud computing, Cloud ROI: A Grounded View.  It seems that even with all the hype (or because of it?) most are not “running blindly” to adopt “the cloud”.  I must admit the cloud metaphor has a powerful poetic charm to it.  That is probably why it has grabbed the attention of so many over the past few years. Everything in our world is ephemeral, so there is an aptness to the concept of a “cloud”. Moreover, I too like and use cloud analogies. But I am now looking for clearer skies!  Here is a short list of my gripes about "the cloud":

  • What does “cloud computing” mean? It isn’t at all clear! Here is some data: CIO magazine cites a Forrester report that says "the number one challenge in cloud computing today is determining what it really is". CIO also reported on a McKinsey study that "found 22 separate definitions of cloud computing"! And that leads to my second point:
  • The word “cloud” is so … vacuous and amorphous …  ”A cloud:  it looks like Zeus!” only to transform in shape before your very eyes “Wait, now it looks like Aphrodite!” … and then its gone!  Is this the kind of model people should entrust with their business data?  It has no structural stability:  inherently:  it is just rapidly moving gases … far out of reach … away in the sky. What kind of business model is that?
  • Although RADLab (Reliable Adaptive Distributed Systems Laboratory) has put out some interesting papers, I was a little surprised when I read their acknowledgements in the CACM article A View of Cloud Computing.  It reads like a who’s who in cloud computing: Amazon, Google, Microsoft, Sun, eBay, Cloudera, Facebook, and so on. The original Berkeley paper has a shorter list of cloud companies funding their work. I’m sure they are maintaining their academic integrity, but it does show that they are not wholly independent. Remember what Kitty Foyle said:

    I’ve taught myself a lesson, or I hope I have: when I find myself thinking something I stop a minute and ask myself, Now who had it all figured out beforehand that was the way they wanted me to think?
    — From Christopher Morley’s novel “Kitty Foyle

  • Although the Berkeley papers raise a number of very interesting issues, none of them requires vacuous meaningless jargon to further obscure the subtleties and complexities of emerging technology trends! So my final gripe is that the name “cloud” tends to obscure what is really important even when I agree with “the cloud thinkers”.

Perhaps the most important issue “the cloud people” are missing is what might be called comprehensive flexibility. As a user of software technology, I want my computing functionality everywhere … in every imaginable format. For example, I’d like to be able to use the software that I’ve invested the time to learn to be available on my desktop (32-bit, 64-bit, Mac, Windows, or Linux), and I’d like it to work whether the Net is working or not, on my cell phone and other portable devices (again with network or not), in the data center (clustered or not), in the WAN (Wide Area Network, note that the Internet is our shared, global WAN), perhaps distributed among several hosting providers, and perhaps even provided by “utilities” (to save the trouble of maintenance and scaling costs). But I think software should be so flexible that it can live in each of those environments. Talk about utility computing: wouldn’t software have so much more utility if it worked everywhere instead of being beholden to whatever your software provider offers or what hardware you happen to have in front of you right now?

Fortunately this type of flexible software does exist. It is called Free and Open Source Softare (FOSS) and it is becoming ubiquitous. In fact, whether you know it or not, you are using FOSS software: Apache, the FOSS web server, runs this web site and indeed the majority of all web sites. WordPress, the blogging software we use here is also “everywhere” and you can purchase it from “cloud” utility providers or install, run, and modify it yourself. The list of important FOSS software goes on and on and this blog is dedicated to helping elucidate its importance as well as the issues involved in managing it.

So I would argue that instead of letting our heads go to the “clouds” we need to ask how can we make software that works in all environments, on all hardware, and for all people? … how can we make software that is comprehensively flexible?