Charlottesville was in the news a lot recently. I didn’t pay much attention, but now I see why. This is crazy. It almost feels unreal, like a really long trailer or a promotional video for a new movie. But it’s not. It’s real life and it’s happening now.
It’s far from funny, but standup comedians are often some of the smartest people, with excellent observation skills and an unbeatable way with words. So here’s Jim Jefferies’ take on this, with which I agree wholeheartedly.
This GitHub repository contains a list of the 10,000 most common English words, sorted by frequency, as seen by the Google Machine Translation Team.
Here at Google Research we have been using word n-gram models for a variety of R&D projects, such as statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, and others. While such models have usually been estimated from training corpora containing at most a few billion words, we have been harnessing the vast power of Google’s datacenters and distributed processing infrastructure to process larger and larger training corpora. We found that there’s no data like more data, and scaled up the size of our data by one order of magnitude, and then another, and then one more – resulting in a training corpus of one trillion words from public Web pages.
We believe that the entire research community can benefit from access to such massive amounts of data. It will advance the state of the art, it will focus research in the promising direction of large-scale, data-driven approaches, and it will allow all research groups, no matter how large or small their computing resources, to play together. That’s why we decided to share this enormous dataset with everyone. We processed 1,024,908,267,229 words of running text and are publishing the counts for all 1,176,470,663 five-word sequences that appear at least 40 times. There are 13,588,391 unique words, after discarding words that appear less than 200 times.
There are a few variations of the list – with and without the swear words and such. I took a quick look at it and was surprised to find that “cyprus” is at position 4,993 (pretty high), immediately after the word “emails”. Weird!
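For anyone who wants to check a word’s position themselves, here is a minimal Python sketch. It assumes the list is a plain text file with one word per line, most frequent first. The function names and the file path are my own placeholders, not something taken from the repository.

```python
from typing import Iterable, Optional


def word_rank(lines: Iterable[str], word: str) -> Optional[int]:
    """Return the 1-based position of `word` in a frequency-sorted list.

    `lines` is any iterable of strings, one word per entry (an open file
    works too). Returns None when the word is not in the list.
    """
    target = word.strip().lower()
    for position, line in enumerate(lines, start=1):
        if line.strip().lower() == target:
            return position
    return None


def rank_in_file(path: str, word: str) -> Optional[int]:
    """Same lookup, reading from a text file (the path is an assumption)."""
    with open(path, encoding="utf-8") as handle:
        return word_rank(handle, word)


# Tiny made-up sample, just to show the call; not the real list.
sample = ["the", "of", "and", "emails", "cyprus"]
print(word_rank(sample, "cyprus"))
```

With a local copy of the list, `rank_in_file("path/to/wordlist.txt", "cyprus")` should give the position in that particular variation of the list.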
Every election matters and every vote counts. The American democracy relies on everyone’s participation in the political process. This November, Americans all across the country will line up at the polls to cast their ballots for the President of the United States.
It sounds like a true effort to make things better and enhance democracy and whatnot. But in practice, is it really an election? One by one, the candidates are falling off the ballot. Day by day it becomes more obvious that Hillary Clinton will be the next president of the USA.
The more tools and technologies we have to enhance our lives, the worse the content on which we can apply those tools becomes. The better the home cinemas became, the worse the movies got. The better audio systems we have, the worse the music gets. And politics just follows the same trend, unfortunately.
The National Cancer Institute has an interesting update on cannabis … Basically, marijuana is not yet universally approved as a medical treatment for cancer (only in a few states for now), but quite a few large studies suggest that it’s not only harmless, but quite helpful for both cancer treatment and post-treatment relief.
I think this is a good step in the direction of “the world is not black and white”. We’ve been tagging everything as just good or bad for way too long. It’s time to start looking at benefits and side effects in a bit more detail.
Here is something that I don’t need now, but I’m sure the day will come when I’ll be looking for a resource like this – 800-Numbers. It’s a categorized listing of a whole lot of companies with their 1-800 toll-free numbers.