BayesDB – a Bayesian database table for querying the probable implications of data

BayesDB – a Bayesian database table for querying the probable implications of data

BayesDB, a Bayesian database, lets users query the probable implications of their data as easily as a SQL database lets them query the data itself. Using the built-in Bayesian Query Language (BQL), users with no statistics training can solve basic data science problems, such as detecting predictive relationships between variables, inferring missing values, simulating probable observations, and identifying statistically similar database entries.

BayesDB is suitable for analyzing complex, heterogeneous data tables with up to tens of thousands of rows and hundreds of variables. No preprocessing or parameter adjustment is required, though experts can override BayesDB’s default assumptions when appropriate.

BayesDB’s inferences are based in part on CrossCat, a new, nonparametric Bayesian machine learning method, that automatically estimates the full joint distribution behind arbitrary data tables.

Bayes theorem history

A fascinating read on the Bayes theorem history:

The German codes, produced by Enigma machines with customizable wheel positions that allowed the codes to be changed rapidly, were considered unbreakable, so nobody was working on them. This attracted Alan Turing to the problem, because he liked solitude. He built a machine that could test different code possibilities, but it was slow. The machine might need four days to test all 336 wheel positions on a particular Enigma code. Until more machines could be built, Turing had to find a way for reducing the burden on the machine.

He used a Bayesian system to guess the letters in an Enigma message, and add more clues as they arrived with new data. With this method he could reduce the number of wheel settings to be tested by his machine from 336 to as few as 18. But soon, Turing realized that he couldn’t compare the probabilities of his hunches without a standard unit of measurement. So, he invented the ‘ban’, defined as “about the smallest change in weight of evidence that is directly perceptible to human intuition.” This unit turned out to be very similar to the bit, the measure of information discovered using Bayes’ Theorem while working for Bell Telephone.

If the whole thing is too much for you, at least read the “Bayes at War” section.