Software appliances: rPath

By joe

December 20, 2006

Now that I have complained aloud about Conary, which is the package management bit in rPath, let me praise the idea behind rPath.

No, no one prompted me. No nastygrams. My major issue is with Conary, the distribution builder, and the decisions that must have gone into it.

Punchline: rPath works, though Conary is proving to be more of a pain than RPM, and it is even less apparent to me how to build an appliance than I would like it to be. Integrating the things we need, such as lots of Perl modules and lots of other bits, is a non-starter due to the issues in dealing with Conary and their distribution build system. This is a shame, as I think rPath could be really good if it were able to easily incorporate what we need. I don’t mind doing the packaging; it is just that our stuff has a large dependency radius, which makes it somewhat harder to build a Conary package easily (which we don’t seem to be able to do anyway).

Now here is the history, and what I like about rPath.

In 1996 my then employer asked me to solve a problem. They wanted an easy method to load Irix onto many workstations at once. The constituted technical authorities in the company could not understand why anyone wanted to do anything other than walk around with a CD, and the customers could not understand why we couldn’t do anything like Sun’s JumpStart: 4-5 lines of typing and off it went. I developed something that was later redeveloped into SGI’s Roboinst. We could load thousands of machines by running a 3 line shell script, with two of the 3 lines being shell loop control structures. The customer liked it, and used it (over Roboinst!) until they ditched Irix a few years ago. It was also open source, way before this was considered “cool”.

During this time, I became enamored of minimal installations. I wanted to find the minimum footprint I could install on a disk, have it be completely functional, and yet avoid extraneous garbage. Minimal is good: less possibility for nasty things to interact, for collisions, and so on. It was hard, but I eventually got Irix 6.2 down to about a 400 MB footprint.
From there the customer loaded their own stuff, and off they went. I had thought of the machine as a blank slate upon which to put packages: just what was needed for the task, nothing more. If they wanted more, though, I wanted it to be really easy to add more. BTW: as much as I complained about Conary, SGI’s package format was far worse. But this is a digression.

The idea that stuck with me was as indicated: load the minimum possible on each machine to do the work you need. Make it an appliance if at all possible. The idea wasn’t popular when I described it in the late 90s, but the IT folks who used the installer that implemented it appreciated not having to go through a myriad of conflicts and dependency hell. Any of that sound familiar?

Packages come with dependencies. If you list out all of the first level dependencies of a package, i.e. the packages it cannot do without, you will find a second set of packages that those depend upon in turn. Unfortunately you need to recurse this algorithm to find out everything you need to install in order to make sure your needed package runs on a “base system”. A “base system” is a minimally configured OS, with enough to boot, find itself on the net, talk to and listen to others, and talk to and control its hardware. Base systems should have small footprints. Tiny, really.

A dependency radius is how many additional packages you need to install to satisfy the requirements for your package to install properly and function. A small dependency radius is always better than a large one, if for no other reason than a smaller number of “moving parts” and a smaller number of potential conflicts. A conflict arises when there is an overlap of the provisioned files or controls that two packages want; the more you have to install, the higher the probability of such things.

But wait, you say, what has this to do with rPath?
Let me get there; I am laying the groundwork for why I believe the concept of rPath is brilliant. As sort of a side effect of what they claim to do, it also has lots of interesting management and support benefits.

Smaller dependency radii mean fewer package installs, less software to maintain, and fewer potential security holes (with fewer packages). It also means that you can generally customize the system to focus upon what you want it to focus upon. You can build it to be an appliance.

This is what rPath touts. Put in the packages you need, and only the ones you need, and it will build a custom distribution for you with just what you need to support your mission. The custom distribution is not really custom, as it is reusing lots of components. Think of it as object oriented distribution management: only include the objects you need. This is what they say is the brilliant part. I disagree. This is a good part: it does something that many of us have wanted to do for a while, to gain the benefits listed below:

  * reduce the disk footprint of a distribution
  * reduce the package radius of a distribution
  * remove ancillary things that users can get into big trouble with if they play with them
  * lower the installation time due to a smaller image
  * gain control over what actually gets onto disk
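
To make the "dependency radius" and "conflict" ideas from earlier a bit more concrete, here is a minimal Python sketch. It is only an illustration: the package names, the `DEPENDS` and `FILES` tables, and the function names are all made up for this example, and it has nothing to do with how Conary or RPM actually resolve dependencies. It simply recurses over first level dependencies until nothing new turns up, then looks for files claimed by more than one package.

```python
from collections import deque

# Hypothetical metadata: package -> its first level dependencies.
DEPENDS = {
    "myapp": ["perl", "libxml2"],
    "perl": ["libc"],
    "libxml2": ["zlib", "libc"],
    "zlib": ["libc"],
    "libc": [],
}

# Hypothetical metadata: package -> files it wants to put on disk.
FILES = {
    "perl": {"/usr/bin/perl"},
    "libxml2": {"/usr/lib/libxml2.so", "/usr/share/doc/README"},
    "zlib": {"/usr/lib/libz.so", "/usr/share/doc/README"},
}

# Packages assumed to be present already in the minimal base system.
BASE_SYSTEM = {"libc"}


def dependency_closure(package, depends, base):
    """Return every extra package needed, beyond the base system,
    for `package` to install and run (the package itself excluded)."""
    seen = {package}
    needed = set()
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for dep in depends.get(current, []):
            if dep in base or dep in seen:
                continue
            seen.add(dep)
            needed.add(dep)
            queue.append(dep)
    return needed


def dependency_radius(package, depends, base):
    """The dependency radius: how many additional packages are required."""
    return len(dependency_closure(package, depends, base))


def file_conflicts(packages, files):
    """Map each path claimed by more than one package to its claimants."""
    owners = {}      # path -> first package seen claiming it
    conflicts = {}   # path -> all packages claiming it
    for pkg in sorted(packages):
        for path in files.get(pkg, ()):
            if path in owners:
                conflicts.setdefault(path, [owners[path]]).append(pkg)
            else:
                owners[path] = pkg
    return conflicts


extra = dependency_closure("myapp", DEPENDS, BASE_SYSTEM)
print(sorted(extra))                                   # ['libxml2', 'perl', 'zlib']
print(dependency_radius("myapp", DEPENDS, BASE_SYSTEM))  # 3
print(file_conflicts(extra, FILES))
```

In this toy data set the radius of `myapp` is 3, and two of those three packages both want to own the same README. Every package added to the closure is another chance for that kind of overlap, which is the whole argument for keeping the radius small.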

Now here is the brilliant part. This concept allows us to

  * Lower support costs by supporting less stuff
  * Lower support costs by reducing potential conflicts
  * **Easily** play "what-if" scenarios, and have controlled branches off baseline configs as business requirements dictate
  * Ship pre-configured _really working and pretested_ environments to end users and customers
  * Lower hacker risks by not simply protecting resources with blanket firewalls, but physically avoiding the installation of potentially compromisable systems
  * Raise security levels by again avoiding installing additional potential entryways into a system
  * Raise security levels by allowing the unit to run in a VMware session as part of the build

That is, while rPath allows you to ship an appliance, the concept brings along lots of baggage, and most of it is very good.

So why did I “dis” rPath in my article yesterday? Well, I didn’t “dis” it; we use rPath in our OpenFiler based products. I do have two problems with rPath though: Conary, and their package build system. I cannot wrap my mind around either one, and I don’t seem to be able to generate a working system. This is enormously frustrating. They have a wizard there which I have used, and it generates something that looks like a distribution, but I have no fine grain control, or do not understand how to get fine grain control, over what gets in there. We have a project we are trying to build there (xluster) that I am really not having much luck with. I want to incorporate several specific OSS packages for which I need to have Conary packages built. I have tarballs, but no concrete examples of Conary packages, no “do this, that, and the other thing” documentation.

Add to this that Conary packages are Python source. While I don’t have a visceral dislike of Python, I find many things about the language extremely counter-intuitive … I can and have easily destroyed a Python program’s structure with a simple block paragraph reformatting, something I regularly do with Perl, C, and others without trouble. This should never happen with a modern programming language. That in and of itself usually has me running screaming away from such languages. Having grown up with Fortran 77 compilers and other formatting woes in the past, I vowed “never again”. Having block-flowed large chunks of Fortran code in the past, I developed a dislike of rigidly formatted languages that continues unabated to this day. Basing Conary atop Python as the language one writes a package in, as compared to XML, YAML, …, with Conary being the thing which parses/interprets the file, is IMO wrong.
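
For anyone who has not been bitten by this, here is a small sketch of the reformatting problem. It uses Python's standard `textwrap.fill` as a stand-in for an editor's "reflow paragraph" command, which is an assumption on my part about what such a command does; the `total` function and the `compiles` helper are made up for the example.

```python
import textwrap

# A tiny, valid Python function whose structure lives entirely
# in its line breaks and indentation.
source = (
    "def total(values):\n"
    "    result = 0\n"
    "    for v in values:\n"
    "        result += v\n"
    "    return result\n"
)


def compiles(code):
    """Return True if `code` is syntactically valid Python."""
    try:
        compile(code, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


# Reflow the source as if it were a paragraph of prose.
reflowed = textwrap.fill(source, width=72)

print(compiles(source))    # True
print(compiles(reflowed))  # False
print(reflowed)
```

The reflow runs the statements together onto shared lines, so the block structure is gone and the result no longer parses. The same operation on brace-delimited code leaves it ugly but intact, which is the difference I am complaining about.
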
I am aware of lots of good reasons to add programmability to package management. I believe in this; it is a good thing, and something RPM does horribly, horribly wrong. There is a need for clarity and control, for transparency. Yesterday I had to ask on the RPM list how I could stop RPM from stripping symbols out of a kernel module we were building an RPM for. It turns out most of the posted solutions did not work, and I had to work around RPM’s “generous help” here by compressing my module before packaging and uncompressing it in the %post section. This is wrong; I shouldn’t have to do this.

That said, it is not apparent to me how to package up my Perl modules with Conary, or how to take an existing Conary “trove”, edit it, change its config, and return it to our repository. It isn’t clear how to take the baseline kernel and build modules against it which are inserted into the kernel and initrd, as needed for some of the booting we are planning on doing. In short, there is still much that is missing from this, both in their package tools and in their distribution build tools.

Yes, some people have successfully built a number of interesting appliances. I would love to see how they packaged up their unique content. A nice example would help. And step by step instructions. Last I checked (within the last few weeks) no such thing existed. I don’t have time to reverse engineer their system to figure out how it works; I have stuff I have to deliver.

For the moment we are using stripped down OpenSuSE distributions: I can get everything into about 700 MB of space after install. I would use RedHat, though as noted, RedHat is missing xfs, modern kernels, and lots of other bits. We could use Fedora, though as it is the testbed for RedHat it is changing rapidly, and I am reluctant to suggest customers use a moving target for a supportable system. I would likely switch from using OpenSuSE as our primary platform if I could make rPath do what I want.
So to wrap this up, I like the ideas behind rPath. Some of the benefits are the real value I would suggest they talk about. The appliance bit is a nice element for developers, but the developers need to sell the product to customers, and customers are concerned with ease of support, security, and so on. The OS is simply something to run the application. This is what lots of people forget, and rPath has got that right. What I think they have not gotten right is the packaging system and the build system.

I make a point about great technologies to lots of people who tell me how wonderful those technologies are. The point I make is that if you erect barriers of any sort to usage, you are going to reduce the utilization of your product, and you are helping your competitors. Barriers include high prices (FPGA parts and development environments), painful development tools, poor security models and implementations, risk enhancing operations, and the like. From an ISV perspective, we want to ship appliances, not for lock in, but for ease of support. The barrier to using their system for us is the Conary tool and the system builder. I think they are solvable problems, though it is quite likely that they disagree that these are problems.

Technologists love building things. They fall in love with what they build. They grasp what they build. They may not grasp that others don’t quite understand how to use what they have built and often need working examples to beat on and play with. That frustrates the others, who will then go away and use something else. And that is sad, as the concept behind rPath is just so good.