RethinkDB: why we failed

Startups are born and die every single day, and even more so in the technology sector.  Most of them just disappear into the ether.  RethinkDB at least leaves behind a useful analysis of what happened and why it failed.

When we announced that RethinkDB is shutting down, I promised to write a post-mortem. I took some time to process the experience, and I can now write about it clearly.

In the HN discussion thread people proposed many reasons for why RethinkDB failed, from inexplicable perversity of human nature and clever machinations of MongoDB’s marketing people, to failure to build an experienced go-to-market team, to lack of numeric type support beyond 64-bit float. I aggregated the comments into a list of proposed failure reasons here.

Some of these reasons have a ring of truth to them, but they’re symptoms rather than causes. For example, saying that we failed to monetize is tautological. It doesn’t illuminate the reasons for why we failed.

In hindsight, two things went wrong – we picked a terrible market and optimized the product for the wrong metrics of goodness. Each mistake likely cut RethinkDB’s valuation by one to two orders of magnitude. So if we got either of these right, RethinkDB would have been the size of MongoDB, and if we got both of them right, we eventually could have been the size of Red Hat[1].

Thank you, guys.  There are valuable lessons in there.  And three points, of course:

If you remember anything about this post, remember these:

  • Pick a large market but build for specific users.
  • Learn to recognize the talents you’re missing, then work like hell to get them on your team.
  • Read The Economist religiously. It will make you better faster.

Ekisto – visualizing online habitats

Slashdot is linking to Ekisto – a project that visualizes online communities as if they were cities.  So far it covers only GitHub, StackOverflow and Friendfeed (really? Friendfeed?).  I’ve seen plenty of data visualizations, especially for GitHub, but I have to say that this is one of the most interesting ones ever.


Here is a quote from the About page that explains how it works:

Ekisto comes from ekistics, the science of human settlements.

Ekisto is an interactive visualization of three online communities: StackOverflow, Github and Friendfeed. Ekisto tries to imagine and map our online habitats using graph algorithms and the city as a metaphor.

A graph layout algorithm arranges users in 2D space based on their similarity. Cosine similarity is computed based on the users’ network (Friendfeed), collaborate, watch, fork and follow relationships (Github), or based on the tags of posts contributed by users (StackOverflow). The height of each user represents the normalized value of the user’s Pagerank (Github, Friendfeed) or their reputation points (StackOverflow).
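
To make the similarity step a bit more concrete, here is a minimal Python sketch of cosine similarity between two users. It assumes each user is represented as a dictionary of tag counts (roughly the StackOverflow case); the user data and the function name are made up for illustration, and this is not Ekisto's actual code.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    # Only keys present in both vectors contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical tag counts for three users
alice = {"python": 5, "django": 3}
bob = {"python": 4, "django": 1, "sql": 2}
carol = {"haskell": 6}

print(cosine_similarity(alice, bob))    # close to 1: many shared tags
print(cosine_similarity(alice, carol))  # 0.0: no shared tags
```

A layout algorithm would then try to place users with high similarity close together, which is how the "neighbourhoods" in the picture would emerge.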

BayesDB – a Bayesian database table for querying the probable implications of data

BayesDB, a Bayesian database, lets users query the probable implications of their data as easily as a SQL database lets them query the data itself. Using the built-in Bayesian Query Language (BQL), users with no statistics training can solve basic data science problems, such as detecting predictive relationships between variables, inferring missing values, simulating probable observations, and identifying statistically similar database entries.

BayesDB is suitable for analyzing complex, heterogeneous data tables with up to tens of thousands of rows and hundreds of variables. No preprocessing or parameter adjustment is required, though experts can override BayesDB’s default assumptions when appropriate.

BayesDB’s inferences are based in part on CrossCat, a new, nonparametric Bayesian machine learning method, that automatically estimates the full joint distribution behind arbitrary data tables.

Reportr – Your life’s personal dashboard

Reportr is a complete application which works like a dashboard for tracking events in your life (using a very simple API). With a simple interface, it helps you track and display your online activity (with trackers for Facebook, Twitter, GitHub, …) or your real-life activity (with hardware trackers or applications like Runkeeper).

The project is entirely open source and you can host your own Reportr instance on your own server or Heroku.

TEDxNicosia speakers report

TEDxNicosia 2013 is just a few short hours away.   As I mentioned previously, I am very excited, and I keep thinking about it.  One particular thought was bugging me all day today – how are the speakers being selected, and is there anything common among them? Do they share any specific knowledge or experience, or personal characteristics?  Not knowing any of the speakers personally, I decided to go for some fun PHP scripting rather than any serious research.  It’s Friday after all!

The result is this little project.  I basically took the 12 speaker profiles directly from the TEDxNicosia speakers page, and used them as my source data.  Each profile is saved into a text file with the name of the speaker.  Then I ran some simple analysis on those text files.   First, I wanted to see if their profile texts were sharing any common words.  That would be an indication, right?  Obviously, I had to filter out some words like ‘as’, ‘and’, and ‘he’ (see a full list of filtered out words).  For the rest, here are the 20 most common words (by the way, the script reports the names of speakers as well, but I took them out for clarity and simplicity):

  1. Cyprus, shared by 11 out of 12 profiles;
  2. years, shared by 10 / 12;
  3. university, shared by 8 / 12;
  4. international, shared by 7 / 12;
  5. world, shared by 7 / 12;
  6. work, shared by 6 / 12;
  7. national, shared by 5 / 12;
  8. media, shared by 5 / 12;
  9. currently, shared by 5 / 12;
  10. including, shared by 5 / 12;
  11. well, shared by 5 / 12;
  12. all, shared by 5 / 12;
  13. life, shared by 5 / 12;
  14. first, shared by 5 / 12;
  15. people, shared by 5 / 12;
  16. USA, shared by 5 / 12;
  17. development, shared by 4 / 12;
  18. London, shared by 4 / 12;
  19. business, shared by 4 / 12;
  20. experience, shared by 4 / 12;
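
The counting step described above can be sketched roughly as follows. This is a Python illustration rather than the original PHP script; the sample profiles, the stop-word list, and the function name `shared_words` are all made up for the example.

```python
import re
from collections import Counter

# Illustrative stop-word list; the real filter list was longer.
STOP_WORDS = {"as", "and", "he", "the", "a", "in", "of", "is", "at", "has", "for"}

def shared_words(profiles, stop_words=STOP_WORDS):
    """Count, for each word, how many profiles contain it at least once."""
    counts = Counter()
    for text in profiles.values():
        # A set, so a word repeated within one profile counts only once.
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(words - stop_words)
    return counts

# Hypothetical profiles standing in for the 12 text files
profiles = {
    "Speaker A": "Born in Cyprus, he has worked in media for ten years.",
    "Speaker B": "A university lecturer in Cyprus for many years.",
    "Speaker C": "She studied at a university in London.",
}

for word, n in shared_words(profiles).most_common(3):
    print(f"{word}, shared by {n} / {len(profiles)}")
```

Note that this sketch lowercases everything, so "Cyprus" is reported as "cyprus", and words with equal counts come out in no particular order.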

Interesting, isn’t it?  The easiest to notice for me is geography.  The most shared word is Cyprus, which is not surprising, because the TEDxNicosia event is happening here in Cyprus, and because most of the speakers either live here, or were born here, or moved here.  The other two geographical highlights are the USA and UK (London specifically).  These are the most influential; however, there are indications of other travel (national, international, world).

One other thing which stands out is hard work.  It is suggested by work, all, life, development, business, and experience.  It sounds like all these people know what they are talking about.  Especially if you throw university in there.  Also, first is indicative of either trying new things or of leading somewhere.

The rest might also mean something, but they don’t stand out so much.  At least not to me.   Except maybe if I put together media and people.  Then there is a sort of social suggestion.

After reading the speakers’ profiles, I think the above is pretty accurate.  Even if it wasn’t, accuracy wasn’t exactly the point.  The whole thing is more of a technical entertainment piece.  Oh, by the way, that reminds me.  What does TED stand for?  Technology, Entertainment, Design.  While we are looking at speaker profile words, why don’t we try and see if the TED words are in there too.  A bit more coding, and here is what I get:

  • technology is represented by 3 out of 12 speakers;
  • entertainment is not represented by anyone;
  • design is represented by 2 out of 12;

Doesn’t sound too good?  Well, that’s because these numbers have very little to do with the actual speakers.  The source data were speaker profiles, which are only a few words long.  If these were worded even slightly differently, the results would be completely different.  Just to give you an indication – even though the word ‘entertainment’ hasn’t been used, a few other words, such as ‘music’, ‘dance’, ‘film’, ‘book’ were used plenty, and these can easily be used near entertainment.

Now that Friday night is quickly turning into Saturday morning, I think I should grab a few hours of sleep and drive out to Nicosia.  See you all there, or see you all after!

Economic impact of open source on small business

Here are a few of the findings we derived from Bluehost data (an EIG company) and follow-on research:

  • 60% of web hosting usage is by SMBs, 71% if you include non-profits. Only 22% of hosted sites are for personal use.
  • WordPress is a far more important open source product than most people give it credit for. In the SMB hosting market, it is as widely used as MySQL and PHP, far ahead of Joomla and Drupal, the other leading content management systems.
  • Languages commonly used by high-tech startups, such as Ruby and Python, have little usage in the SMB hosting market, which is dominated by PHP for server-side scripting and JavaScript for client-side scripting.
  • Open source hosting alternatives have at least a 2:1 cost advantage relative to proprietary solutions.

Given that SMBs are widely thought to generate as much as 50% of GDP, the productivity gains to the economy as a whole that can be attributed to open source software are significant. The most important open source programs contributing to this expansion of opportunity for small businesses include Linux, Apache, MySQL, PHP, JavaScript, and WordPress. The developers of these open source projects and the communities that support them are truly unsung heroes of the economy!

Via Matt Mullenweg.

Android global market share is at 48%

Canalys did a worldwide study of mobile markets and published the results.  Make sure to read the whole article – there are many other numbers and trends.

Canalys today published its final worldwide country-level Q2 2011 smart phone market estimates, showing substantial market growth in all regions. Globally, the market grew 73% year-on-year, with in excess of 107.7 million units shipping in the second quarter of 2011. Of the 56 countries Canalys tracks around the world, Android led in 35 of them and achieved a global market share of 48%. Asia Pacific (APAC) remained the largest regional market, with 39.8 million units shipping there, compared with 35.0 million in Europe, the Middle East and Africa (EMEA), and 32.9 million in the Americas.

Android, the number one platform by shipments since Q4 2010, was also the strongest growth driver this quarter, with Android-based smart phone shipments up 379% over a year ago to 51.9 million units. Growth was bolstered by strong Android product performances from a number of vendors, including Samsung, HTC, LG, Motorola, Sony Ericsson, ZTE and Huawei. The final country-level data delivered to clients today shows there were particularly strong performances from Android devices in APAC countries, such as South Korea, where Android holds an 85% platform share, and Taiwan, where it has 71%.

With shipments of 20.3 million iPhones and a market share of 19%, iOS overtook Nokia’s Symbian platform during the quarter to take second place worldwide. In doing so, Apple also became the world’s leading individual smart phone vendor, stripping Nokia of its long-held leadership position.
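
As a quick sanity check, the quoted shipment figures can be turned back into shares with a few lines of Python. The variable names are my own, and the total is treated as exactly 107.7 million even though the report says "in excess of", so the results are approximate.

```python
# Figures quoted above, in millions of units shipped in Q2 2011
total = 107.7      # worldwide (the report says "in excess of")
android = 51.9
iphone = 20.3
regions = {"APAC": 39.8, "EMEA": 35.0, "Americas": 32.9}

android_share = android / total
ios_share = iphone / total

print(f"Android share: {android_share:.1%}")   # 48.2%
print(f"iOS share: {ios_share:.1%}")           # 18.8%
print(f"Regional sum: {sum(regions.values()):.1f} million")  # 107.7 million
```

The numbers line up with the reported 48% and 19%, and the three regions add up to the worldwide total.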