Today I read an essay by Sean Russell called “RPM Hell. A Perfect Example of Good Software Crippled by Bad Design”. In it, Sean explains why he dislikes RPM so much that he is prepared to switch to a different Linux distribution.
I, on the other hand, do like RPM. It has its problems, but what doesn’t? I disagree with some of Sean’s arguments, hence this post. Before reading it, please read Sean’s original essay, since I quote only the relevant parts.
P.S.: I have notified Sean Russell via email about this post.
This is a rant. A long, bitter rant about the inequities of life and an unjust, unfeeling Universe.
My post is a rant too. It is a rant about people who should learn their software before complaining about it, and about people who misuse or misunderstand the purpose of the software they are talking about.
It is a rant about how poorly thought-out software can kill a really great Linux distribution, and turn away proponents.
All Linux distributions are built from roughly the same software. They differ in minor things, packaging being the main one. RPM is used by a few distributions, some of which are very successful, such as Red Hat Linux and Fedora Linux.
I’ve started articles on Why RPM Sucks a dozen times. Each time, I’ve never finished the article, in part because I didn’t have an answer for RPM, and when I’m a good boy, I avoid complaining about things when I don’t have a solution. Eventually, I realized that there are some inherent problems with modern operating systems that cause problems for all package management systems. Linux suffers from these problems as much as any other operating system, and I haven’t seen any OS that solves these particular problems. Perhaps there is no solution.
Perhaps you are misunderstanding the purpose of the package management system.
I’ve finally written this paper because, over the course of seven months, my Mandrake-based laptop system has slowly degraded to the point where it is broken, and will be easier to fix by re-installing the operating system than by trying to fix the RPMs.
This part suggests that you never really learned and understood what RPM is, what it does, and how it does it. If you had, your system would be as clean and shining as the many systems out there that use RPM to stay that way.
For those of you who use RPM and think you like it, stop and think for a moment, and ask yourself if this sounds familiar:
* You want to install some software package. You look for a package, and find several dozen packages, of varying versions, for different Linux distributions. From experience, you know that none of the RPMs built for other distributions will work on your system. In fact, if it isn’t built specifically for the version of the distribution you have installed, it ain’t gonna work.
Why are you even considering RPMs packaged for other distributions, or for other versions of your distribution? I find it rather illogical to try installing RPMs built for Fedora Linux on Mandrake Linux and then complain that they don’t work. Did anyone promise that they would? How did you come up with that idea anyway?
The reason there are so many Linux distributions is that people like choice. They like having things done differently. Some people prefer to install additional software into /usr/local, others like /opt. Some people prefer web sites to live in /home/httpd, others think that /var/www is more appropriate, and yet others are sure that /srv/httpd is the right place.
Maybe you are confused by the fact that several distributions use RPM for packaging. Don’t be. Just think about it. For example, the traditional way of distributing and installing software on UNIX systems was to download a compressed tar archive, extract the files from it, and build it with the “./configure && make && make install” routine. For a while, no one even read the installation instructions. It just worked. Now, would you try to do the same on a Windows machine? I am pretty sure you wouldn’t. There is, of course, a way to make a lot of software work there by compiling it: you just have to install tar, gzip, make, a C compiler, and a shell that can run the configure script. Would you call that Tar.gz Hell?
Use the packages that were built for the system you are using and you will be OK. Your dependencies will be satisfied and your system will stay clean and shining.
* You manage to find the exact RPM you need. You try to install it, and discover that you have to either install some other software, or upgrade some part of your system.
If you are using a package that was built for your system, you will rarely see this. But it is possible, for example if you don’t have the complete system installed. In that case it is quite easy to add the missing packages. In another scenario, the authors of the software you want to install decided to use a component that is not part of your system. Good practice in this case is, again, to point users to a packaged version of that component built for the appropriate version of the distribution. Alternatively, you can fall back to building from source. But all of this has little to do with RPM. If the authors of the software you need decided not to package it properly for your specific distribution, you may run into trouble. This can happen with any package management software.
You go find the other pieces that you need, and download and try to install them. You discover that you need yet some more packages upgraded or installed to satisfy dependencies on those RPMs.
Again, this is not a problem of packaging. It is a problem of software distribution. Searching manually for packages that satisfy dependencies is very inefficient. That is why there are programs that take care of it, for example apt and yum. These tools are separate from the package manager because the two handle different tasks.
A package manager must provide ways to query the package database: which packages are installed, which files were installed by which package, whether the files on disk are the same as those the packages installed, and so on. A package manager should also provide ways to query package files about what they provide and require.
On the other hand, software like apt and yum (I call them distribution managers for simplicity) must make it easy to update and upgrade the system. Distribution managers should resolve dependencies when installing new software or upgrading existing software. They should pick the best option when several are available. And most of them do. Nicely.
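To make these query tasks concrete, here is a minimal Python sketch of a toy package database. It is only an illustration under my own assumptions: the class `ToyPackageDB`, its methods, and the sample package `hello` are hypothetical and have nothing to do with how RPM stores its data. With the real tool, queries of this kind are made with options such as `rpm -qa`, `rpm -qf`, and `rpm -V`.

```python
import hashlib


def digest(data: bytes) -> str:
    """Checksum used to detect files that changed after installation."""
    return hashlib.sha256(data).hexdigest()


class ToyPackageDB:
    """Hypothetical in-memory package database (not RPM's real format)."""

    def __init__(self):
        # package name -> {file path: checksum recorded at install time}
        self.packages = {}

    def install(self, name, files):
        """Record a package; 'files' maps each path to its content in bytes."""
        self.packages[name] = {path: digest(content) for path, content in files.items()}

    def installed(self):
        """Which packages are installed?"""
        return sorted(self.packages)

    def owner_of(self, path):
        """Which package installed this file? None if no package owns it."""
        for name, files in self.packages.items():
            if path in files:
                return name
        return None

    def verify(self, name, disk):
        """Return the paths of 'name' that are missing or modified on 'disk'."""
        changed = []
        for path, recorded in self.packages[name].items():
            current = disk.get(path)
            if current is None or digest(current) != recorded:
                changed.append(path)
        return sorted(changed)


db = ToyPackageDB()
db.install("hello", {"/usr/bin/hello": b"v1", "/etc/hello.conf": b"greeting=hi\n"})

# Simulated file system state: the config file was edited after installation.
disk = {"/usr/bin/hello": b"v1", "/etc/hello.conf": b"greeting=hey\n"}

print(db.installed())
print(db.owner_of("/usr/bin/hello"))
print(db.verify("hello", disk))
```

The point of the sketch is that all three questions are answered from the database alone, which is exactly why the database has to be kept in sync with what is actually installed.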
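The job of a distribution manager can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm that apt, yum, or urpmi actually use: the function `resolve` and the repository dictionary `REPO` are hypothetical, versions are ignored, and every dependency is assumed to name exactly one package. It shows how, given repository metadata, the full list of packages can be worked out in dependency order before anything is downloaded.

```python
def resolve(target, repo, installed=()):
    """Return the packages to install for 'target', dependencies first.

    'repo' maps each available package name to the list of packages it requires.
    Packages listed in 'installed' are treated as already satisfied.
    """
    order = []
    done = set(installed)
    in_progress = set()

    def visit(name):
        if name in done:
            return
        if name not in repo:
            raise KeyError(f"no package named {name!r} in the repository")
        if name in in_progress:
            raise ValueError(f"dependency cycle involving {name!r}")
        in_progress.add(name)
        for dep in repo[name]:
            visit(dep)
        in_progress.discard(name)
        done.add(name)
        order.append(name)

    visit(target)
    return order


# Hypothetical repository metadata: A needs B, and B needs C.
REPO = {"A": ["B"], "B": ["C"], "C": [], "D": []}

print(resolve("A", REPO))                   # full plan on an empty system
print(resolve("A", REPO, installed=["C"]))  # C is already there, so it is skipped
```

Because the plan is computed from metadata alone, the user never has to chase dependencies by hand, one failed install at a time.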
* Repeat the previous step ad nauseam. Applications such as urpmi can alleviate this, but they can’t fix the following problem, which is:
Is this just a way to make four points out of three? :)
* Eventually, if you’re lucky, you get the software installed. More commonly, you’ve broken some other piece of your system — “A” depends on “B” version 2, but “C” depends on “B” version 1, and there isn’t a newer version of “C” that is happy with “B” version 2. You have to choose between “A” and “C”…
This happens because you are trying to install a package that was not built for your distribution, or for the version of the distribution that you are using. Use a distribution manager such as apt or yum, as I said above.
…Even more commonly, you find yourself installing software that you never wanted or needed, but is a required install because of a bad dependency tree. “A”, a purely text-based application, depends on “B”, which depends on “C”, which depends on “D”, which depends on X11. This happens so often, it can only be seen as an architectural design flaw in RPM.
And again you are on the wrong track. The main purpose of a package manager is to provide simplicity and integrity. Installing software that you don’t need is the price of maintaining that simplicity. If you don’t like the choices the packager made, you are always welcome to package the software yourself or to build from source. People who want simplicity and people who want everything to fit their own choices usually fall into two different groups of users.
Again, RPM has nothing to do with this. If you don’t like the dependencies brought in by an RPM packaged at Red Hat, you can repackage the software the way you like. It will be a different RPM that fits you perfectly. Does it have any problem now? Nope.
All this is fine — it provides enough infrastructure to build a system that is reasonably robust and efficient. There hasn’t been an OS that doesn’t have dynamically linked libraries in a long time, and they all tend to work pretty much the same way. The problem is that most operating systems are set up so that they can’t take advantage of this. Linux is no exception, and much of that is the fault of the Linux Standard Base, or LSB.
I will get to the LSB in a moment. What you are talking about here is control. This is to be expected from a Java guy (no offense). The only way to make everything work is to control it tightly, exactly like Sun Microsystems controls Java. A better place to read about this is Eric Raymond’s “The Cathedral and the Bazaar”. You are talking about the cathedral, while there is a good reason things are done the bazaar way. The operating system provides the basics, the bare minimum. Whatever else you need is your problem to solve. It is ugly in some sense, yes. But you do have a way of solving the problem. With the cathedral approach, you will die quickly. There is simply no way for a single entity to satisfy all needs. And when people have no way to satisfy their own needs, you are left only with the people whose needs you happen to satisfy, which is usually far fewer than you would get with the bazaar.
The LSB is intended to be a way of standardizing how Linux software distribution file systems are laid out. The goal is to provide an common environment so that application authors can build software that will install on multiple target distributions. Unfortunately, the LSB is working from a bad foundation, the standard Unix file system layout. The LSB itself is a fine project; given what they’re working with, they’re doing the right thing. However, they’re just adding support to a bad design — it may work, but it isn’t right.
The LSB is much wider than just a file system structure. You are confusing the LSB with one of its parts, the FHS (Filesystem Hierarchy Standard). Here is a quote from the introduction to the LSB version 2 candidate 2:
The LSB defines a binary interface for application programs that are compiled and packaged for LSB-conforming implementations on many different hardware architectures. Since a binary specification shall include information specific to the computer processor architecture for which it is intended, it is not possible for a single document to specify the interface for all possible LSB-conforming implementations. Therefore, the LSB is a family of specifications, rather than a single one.
This document should be used in conjunction with the documents it references. This document enumerates the system components it includes, but descriptions of those components may be included entirely or partly in this document, partly in other documents, or entirely in other reference documents. For example, the section that describes system service routines includes a list of the system routines supported in this interface, formal declarations of the data structures they use that are visible to applications, and a pointer to the underlying referenced specification for information about the syntax and semantics of each call. Only those routines not described in standards referenced by this document, or extensions to those standards, are described in the detail. Information referenced in this way is as much a part of this document as is the information explicitly included here.
It has long been known that file system locations alone are not enough for software interoperability. Many more things must be defined, such as library versions, API specifications, data structures, etc.
* RPM specs are complicated and are difficult to build correctly. As a consequence, almost nobody does build them correctly. The most commonly used switch god Linux admins use is “--badreloc”.
What is so difficult about RPM specs? My guess is that it is the flexibility that you find difficult. But if that is so, then programming languages are difficult to use too. And unless you know at least one programming language, you have no reason whatsoever to write RPM specs. RPM specs are just programs. The goal of these programs is to compile your software properly and package the resulting files. As with any other program, there are several “correct” ways to do it. As with any other program, you should go from simple to complex. Study the documentation and examples from other packages. As usual, it is all there.
And I have yet to see a “god Linux admin” who uses the “--badreloc” option…
* RPM has only one type of dependency: hard. Much software has optional support for some features. For example, PostgreSQL can be compiled with some GUI tools. On a headless server, these GUI tools may not be desired. With RPM, your options are binary: you can either build the RPM with a dependency on the GUI components, or not. There is no option at install time to resolve the dependencies and “do the right thing” with the install. As a result, packages such as PostgreSQL are distributed as a bunch of separate RPMs, each providing a different feature. This quickly becomes unwieldy.
RPM packages contain binary files that were already compiled by the packager with a certain set of options. If certain functionality is compiled into a program, users can pretty much be expected to use it. Therefore the RPM package must require anything additional that this functionality needs in order to work. I have always considered this a Good Thing™.
Splitting complex software into several packages is also a Good Thing™ in my opinion, since it minimizes the amount of unneeded software that you have to install. This is something you have complained about too, remember? For example, why should I install the PostgreSQL server if I only need a client? Why should I install development files when I am not going to develop or compile anything? Seems pretty clear to me.
Still, if you are stubborn and you know what you are doing, RPM does provide a couple of options for you to mess around with. One is called “--nodeps” and, as you can guess, it skips all the dependency checks and just installs the software. Another is called “--force”. With this option RPM will overwrite existing files, including ones that belong to other packages, and do other crazy things. But you get what you ask for.
* RPM has only rudimentary dependency resolution. If package A depends on package B and package B depends on C, when you try to install A, it only tells you about the dependency on B
Why does this matter? Distribution managers take care of the dependency checks. Why do you care what these programs tell each other?
* RPM has a low-grain dependency version mechanism. An RPM spec builder can’t, for instance, say that package A requires some version of package B, where the version is greater that 2.0 but less than 3.0, and not version 2.3. In fact, you can’t specify that any version less than 2.0 is acceptable.
You are wrong again. Here is a quote from the book “Maximum RPM: Taking the Red Hat Package Manager to the Limit” by Edward C. Bailey of Red Hat, Inc. (the whole book can be read online for free):
Adding Version Requirements
When a package has slightly more stringent needs, it’s possible to require certain versions of a package. All that’s necessary is to add the desired version number, preceded by one of the following comparison operators:
- < : Requires package with a version less than the specified version.
- <= : Requires package with a version less than or equal to the specified version.
- = : Requires package with a version equal to the specified version.
- >= : Requires package with a version equal to or greater than the specified version.
- > : Requires package with a version greater than the specified version.
In fact, the same chapter of the book describes even tighter control for situations where version numbers are not so easy to compare (e.g. 7.6a vs. 7.6).
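To show that Sean’s example (a version greater than 2.0, less than 3.0, and not 2.3) can be expressed with nothing more than these comparison operators, here is a small Python sketch. It is my own illustration, not RPM’s version comparison code: the function names are hypothetical, and versions are assumed to be purely numeric and dot-separated, so cases like 7.6a are out of scope. In a spec file I would expect this to look something like the tags shown in the comments, though I have not tested that exact fragment.

```python
import operator

# The five comparison operators listed in the quote above.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def parse(version):
    """Turn '2.3' into (2, 3). Assumes numeric, dot-separated versions only."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, requires=(), conflicts=()):
    """True if 'version' meets every requirement and matches no conflict."""
    v = parse(version)
    meets_all = all(OPS[op](v, parse(ref)) for op, ref in requires)
    hits_conflict = any(OPS[op](v, parse(ref)) for op, ref in conflicts)
    return meets_all and not hits_conflict


# Roughly what a spec file might say (an assumption, not a tested fragment):
#   Requires:  B > 2.0, B < 3.0
#   Conflicts: B = 2.3
REQUIRES = [(">", "2.0"), ("<", "3.0")]
CONFLICTS = [("=", "2.3")]

for candidate in ["1.9", "2.0", "2.1", "2.3", "2.9", "3.0"]:
    print(candidate, satisfies(candidate, REQUIRES, CONFLICTS))
```

The range comes from combining two requirements, and the excluded version comes from a conflict, so no special “range” syntax is needed.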
As a result of this weak dependency management, package builders either choose the simplest dependency possible, or build their own complex, verbose dependency rules into RPM. However, no matter how nice your RPM dependencies are specified, the whole system can fail if just one package you depend on has a bad dependency specification. In short, RPM dependencies are fragile.
There are ways to generate dependency lists automatically. This is a great help to developers when dealing with complex software packages.
As for the fragility of RPM dependencies, they were made that way on purpose, because software dependencies themselves are very fragile. Better safe than sorry.
* RPM has no built in dependency resolution mechanism. That is, RPM doesn’t know where to go to get packages to resolve dependencies. Now, there are tools that sit on top of RPM to do this; urpmi, rupdate, and so on. However, in practice, this lack of awareness of package resolution in RPM shows up as strongly crippling any third-party resolution tool. This is because:
As I suggested before, distribution managers like urpmi, apt, and yum should be used to resolve dependencies. This is part of the bazaar. And that is in fact a Good Thing™.
* RPM packages are monolithic. That is to say, the package specifications are part of the entire package. Why is this a problem? Because to be able to resolve dependency trees, you have to have access to all of the packages in the tree. Here is the main failure of RPM: you can’t get intelligent queries out of it. The O() number of packages you need to query any dependency tree is all of the RPM packages in the entire world.
That is why, once again, you should use distribution managers. These programs work with repositories. Repositories provide full information about all available packages, separately from the packages themselves. Distribution managers know exactly which packages they need to download before they start downloading them.
* RPM makes it hard to install multiple versions of the same package on a system. This isn’t the sole fault of RPM; the LSB — the legacy left to Linux by Unix contributes much to this. Since two versions of the same package tend to install the same files in the same place, conflicts are common.
Again, package managers are designed for simplicity and integrity. Flexibility always comes second. But you still have it. If you are in the rare situation where you need two versions of a program installed (usually you have just one), you can rebuild the package with a different installation path. Or you can install it using a different RPM database (for example, one in some user’s home directory).
* RPM is fundamentally a global software installation mechanism. It can’t be used (not without a lot of pain) by non-admin users. This makes it useless for a distribution mechanism for shared systems.
Once again you are wrong. You can have as many RPM databases on a system as you want. You can manage a normal user’s software with RPM. It is quite trivial. All you have to do is create a certain (well-documented, by the way) directory structure and a configuration file. In fact, many people do this. A huge part of RPM packaging is done with regular user access rights. Administrator (superuser, root) access rights are required for building only a small share of programs. And those programs usually require administrator access rights to be used anyway. So what’s the problem?
* Finally, RPM is stupid. If a dependency isn’t in the database, it doesn’t exist. For example, if I have library X version Y installed, RPM will insist that it isn’t installed merely because it isn’t in the database. I’m sorry, but this is just retarded. It would be trivial to check ldconfig and see if the library exists.
Programs are supposed to be stupid. They must do only what they are told to do. With your example, one could easily construct a case where you would complain that RPM is too smart, because it would be sure that you have something when in fact you don’t (a broken installation, a development version, etc.). Again, for complicated cases you have the “--justdb” option, which can come in handy.
The end result is that RPM-based installations go one of two routes, which both end up at the same place. Either they get upgraded regularly, slowly degrading as various pieces of the system fall out of RPM sync, until an entire re-install of a newer version is needed; or, they don’t get any extra software installed on them until, at some point, the system is so old that it needs to be re-installed with a new version. In both cases, woe to the systems admin who has to go back and re-install all of the third-party software that was previously on the system.
Your conclusion is based on false arguments, and it is incorrect. The only way to mess up an RPM-based system is to do things that are documented as “do not do”. When you ask for a mess, you get a mess. There is nothing wrong with that.
I have skipped the Portage part of your essay, since I haven’t worked with Portage and cannot comment on it. As far as I understand it, that part also contains nothing that contradicts what has been said here.
RPM does have its problems. Everything does. But RPM’s problems are not the ones that you mentioned. I think I have demonstrated that you haven’t familiarized yourself with RPM enough to complain about it. I do understand that you might like and prefer other systems to RPM. I respect that. But, please, do not complain about things not being there when they are. And do not complain about things RPM cannot do when it was never supposed to do them. You might accidentally confuse someone. :)
I am open to discussion on the issue, so if you feel like it, please let me know.