No more print for Encyclopaedia Britannica

The New York Times reports, somewhat sadly, that Encyclopaedia Britannica will no longer publish a printed edition.

After 244 years, the Encyclopaedia Britannica is going out of print.

[…]

In an acknowledgment of the realities of the digital age — and of competition from the Web site Wikipedia — Encyclopaedia Britannica will focus primarily on its online encyclopedias and educational curriculum for schools. The last print version is the 32-volume 2010 edition, which weighs 129 pounds and includes new entries on global warming and the Human Genome Project.

Via Slashdot.

New University of Cyprus library

Cyprus Mail reports that construction of the new University of Cyprus library has begun. It will take a while: the doors are expected to open some time in September 2014. While reading through the article, I hit one particular paragraph that took me a while to understand.

The library’s collection, which will be housed in an impressive dome-shaped building holding  around 600,000 books, more than 30,000 magazines and 40,000 books all in digital format plus 10,000 audio books and 150 databases. Its contents will be accessible to all Cypriots.

My first thought was that the library will hold 600,000 books in digital format and that the new building is being constructed to accommodate that storage. I thought that was a bit excessive. After all, I used to have an e-book library of more than 1,000 titles and they were living nicely on a single hard disk. Digital storage is cheap these days and the size of drives keeps growing. How much space does one need to store 600,000 books in digital form? – I thought.

The books in my collection are somewhere between 500 kilobytes and a couple of megabytes each. Let’s assume 1 megabyte for an average book. How much space is there on a modern hard drive? I’ll assume 2 TB (terabytes). How many average books can we store on such a disk? 2 TB / 1 MB = 2,000,000,000,000 / 1,000,000 = 2,000,000. I know, I’m approximating things a lot with terabytes, megabytes, and average book sizes. But with a single 2 TB disk holding 2,000,000 books, give or take, I don’t think a new building is in order. 3 TB and 4 TB hard disks exist already. By September 2014 we’ll probably have way more than that. Even a few of those connected together for backups, “150 databases” and such will provide a lot of storage while still being small enough to hide at home. New building? Really?
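Here is the same back-of-the-envelope arithmetic as a small Python sketch. The 1 MB average book and the 2 TB disk are my assumptions from above, not measured figures, and I'm using decimal units (1 TB = 10^12 bytes).

```python
# Back-of-the-envelope storage estimate, decimal units (1 TB = 10**12 bytes).
AVG_BOOK_BYTES = 1_000_000      # assumed average e-book size: 1 MB
DISK_BYTES = 2 * 10**12         # assumed disk size: 2 TB
LIBRARY_BOOKS = 600_000         # book count quoted in the article

books_per_disk = DISK_BYTES // AVG_BOOK_BYTES
library_bytes = LIBRARY_BOOKS * AVG_BOOK_BYTES
disk_fraction = library_bytes / DISK_BYTES

print(books_per_disk)            # 2000000
print(library_bytes / 10**12)    # 0.6 (terabytes)
print(f"{disk_fraction:.0%}")    # 30%
```

Under these assumptions, 600,000 digital books would take about 0.6 TB, which is less than a third of that one disk.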

Of course, once I re-read the paragraph a few times, I realized that I had got off on totally the wrong foot here. It reads more like:

  • 600,000 books (print)
  • 30,000 magazines and 40,000 books (digital)
  • 10,000 audio books (digital)
  • 150 databases (digital?)

While the digital part of that library will easily fit on one or two hard drives, the collection of 600,000 printed books does indeed need some storage space.

I am all for knowledge and education, and I’m glad that this effort is being made and that all these books will be available to all Cypriots. But if I were to express a wish, I’d say: please push for digitizing all those books and making them available online. Cyprus is good, but why not share with the rest of the world? Especially now that we do have the technology.

Learning about Markov chain

I’ve been hearing about Markov chains for long enough – it was time I learned something. Wikipedia seemed like a good starting point. I have to warn you though: be careful with scrolling on that page, because you can easily end up looking at something like this:

[image: partial Markov chain]

If you aren’t a rocket scientist or someone who solves integrals for fun, by all means use the contents menu or jump directly to the Applications section. That’s where all the fun is. Here are some quotes to get you interested and for me to remember.

Physics:

Markovian systems appear extensively in physics, particularly statistical mechanics, whenever probabilities are used to represent unknown or unmodelled details of the system, if it can be assumed that the dynamics are time-invariant, and that no relevant history need be considered which is not already included in the state description.

Testing:

Several theorists have proposed the idea of the Markov chain statistical test, a method of conjoining Markov chains to form a ‘Markov blanket’, arranging these chains in several recursive layers (‘wafering’) and producing more efficient test sets — samples — as a replacement for exhaustive testing.

Information sciences:

Claude Shannon’s famous 1948 paper A mathematical theory of communication, which at a single step created the field of information theory, opens by introducing the concept of entropy through Markov modeling of the English language. Such idealised models can capture many of the statistical regularities of systems. Even without describing the full structure of the system perfectly, such signal models can make possible very effective data compression through entropy coding techniques such as arithmetic coding. They also allow effective state estimation and pattern recognition

Internet applications:

The PageRank of a webpage as used by Google is defined by a Markov chain.

and

Markov models have also been used to analyze web navigation behavior of users. A user’s web link transition on a particular website can be modeled using first or second order Markov models and can be used to make predictions regarding future navigation and to personalize the web page for an individual user.
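To make the PageRank quote a bit more concrete for myself, here is a minimal sketch of the "random surfer" idea: a surfer follows a random link most of the time and jumps to a random page otherwise, and a page's rank is how often the surfer ends up there in the long run. The three-page web, the `pagerank` function, and the 0.85 damping value are assumptions for illustration; this is not Google's actual implementation.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate the long-run distribution of a random surfer.

    `links` maps each page to the list of pages it links to.
    Every linked page must also appear as a key.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}   # start from a uniform guess
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page without links is treated as linking to every page.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web, C comes out on top because both other pages link to it, A follows because C passes all of its rank on to A, and B trails with a single incoming link.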

Statistics:

Markov chain methods have also become very important for generating sequences of random numbers to accurately reflect very complicated desired probability distributions – a process called Markov chain Monte Carlo or MCMC for short. In recent years this has revolutionised the practicability of Bayesian inference methods.

Gambling:

Markov chains can be used to model many games of chance. The children’s games Snakes and Ladders, Candy Land, and “Hi Ho! Cherry-O”, for example, are represented exactly by Markov chains. At each turn, the player starts in a given state (on a given square) and from there has fixed odds of moving to certain other states (squares).
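The board game description maps to code quite directly, so here is a small sketch. The 13-square board, the single ladder and the single snake are made up for illustration; the point is only that each square is a state and each row of the matrix holds the fixed odds of where one turn takes you.

```python
from fractions import Fraction

LAST_SQUARE = 12
JUMPS = {3: 9, 10: 4}        # made-up ladder (3 -> 9) and snake (10 -> 4)
DIE_FACES = range(1, 7)      # ordinary six-sided die


def transition_row(square):
    """Probabilities of where a single turn takes you from `square`."""
    row = [Fraction(0)] * (LAST_SQUARE + 1)
    if square == LAST_SQUARE:
        row[square] = Fraction(1)            # game over: stay on the last square
        return row
    for roll in DIE_FACES:
        target = min(square + roll, LAST_SQUARE)   # overshooting still finishes
        target = JUMPS.get(target, target)         # follow a ladder or snake
        row[target] += Fraction(1, 6)
    return row


MATRIX = [transition_row(square) for square in range(LAST_SQUARE + 1)]


def after_turns(turns):
    """Distribution over squares after `turns` turns, starting on square 0."""
    dist = [Fraction(0)] * (LAST_SQUARE + 1)
    dist[0] = Fraction(1)
    for _ in range(turns):
        dist = [
            sum(dist[s] * MATRIX[s][t] for s in range(LAST_SQUARE + 1))
            for t in range(LAST_SQUARE + 1)
        ]
    return dist


for turns in (1, 5, 10, 20):
    print(turns, float(after_turns(turns)[LAST_SQUARE]))
```

The loop prints the chance of having finished the game after a given number of turns. Nothing about how the player got to a square matters, only the square itself, which is exactly the "no memory" property that makes it a Markov chain.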

Music:

Markov chains are employed in algorithmic music composition, particularly in software programs such as CSound or Max. In a first-order chain, the states of the system become note or pitch values, and a probability vector for each note is constructed, completing a transition probability matrix

Markov parody generators:

Markov processes can also be used to generate superficially “real-looking” text given a sample document: they are used in a variety of recreational “parody generator” software
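A parody generator turns out to be only a few lines. Below is a minimal sketch of a first-order, word-level one: it records which words follow which in a sample, then walks those transitions at random. The sample text and the function names are my own, chosen for illustration; real generators typically look at more than one previous word to get more convincing output.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in `text`."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)   # repeats keep the observed odds
    return chain


def generate(chain, start, length, seed=None):
    """Walk the chain from `start`, producing at most `length` words."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:                  # dead end: word never had a successor
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Made-up sample document.
SAMPLE = (
    "the library holds books and the library holds magazines "
    "and the books are digital and the magazines are digital"
)
chain = build_chain(SAMPLE)
print(generate(chain, "the", 12, seed=1))
```

Every pair of neighbouring words in the output also appears somewhere in the sample, which is why the result looks locally plausible while making no sense overall.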

Markov chains for spammers and black hat SEO:

Since a Markov chain can be used to generate real-looking text, spam websites without content use Markov-generated text to give the illusion of having content.

This is one of those topics that make me feel sorry for sucking at math so badly. Is there a “Markov Chains for Dummies” book somewhere? I haven’t found one yet, but Google provides quite a few results for the “markov chain” query.