Steven Black hosts files

StevenBlack/hosts repository:

Extending and consolidating hosts files from a variety of sources. You can optionally invoke extensions to block additional sites by category.

Categories include: adware, malware, gambling, porn, and social networks.
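
To make the idea of consolidation concrete, here is a minimal sketch of merging several hosts files into one de-duplicated blocklist. It is not the repository's actual script; the function names, the sample domains, and the choice of 0.0.0.0 as the sink address are all assumptions made for illustration.

```python
def parse_hosts(text):
    """Yield (ip, hostname) pairs from hosts-file text, skipping comments and blanks."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop full-line and inline comments
        if not line:
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        ip, names = parts[0], parts[1:]
        for name in names:
            yield ip, name.lower()


def merge_hosts(sources, sink_ip="0.0.0.0"):
    """Merge several hosts-file texts into one sorted, de-duplicated blocklist."""
    keep_out = {"localhost", "localhost.localdomain"}
    blocked = set()
    for text in sources:
        for ip, name in parse_hosts(text):
            # Keep only blocking entries; never block the machine's own names.
            if ip in ("0.0.0.0", "127.0.0.1") and name not in keep_out:
                blocked.add(name)
    return "\n".join(f"{sink_ip} {name}" for name in sorted(blocked))


# Hypothetical inputs: a base list plus one optional category extension.
base = """
# base list (made-up domains)
127.0.0.1 localhost
0.0.0.0 ads.example.com
0.0.0.0 tracker.example.net  # inline comment
"""
gambling_extension = """
127.0.0.1 ads.example.com
0.0.0.0 casino.example.org
"""

merged = merge_hosts([base, gambling_extension])
print(merged)
```

The duplicate entry for ads.example.com collapses into a single line, and the optional category list is simply one more input to the merge.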

Google Street View vs. captcha

Google Online Security Blog shares the news on the innovation in image recognition technology used in Google Street View:

Translating a street address to an exact location on a map is harder than it seems. To take on this challenge and make Google Maps even more useful, we’ve been working on a new system to help locate addresses even more accurately, using some of the technology from the Street View and reCAPTCHA teams.

This technology finds and reads street numbers in Street View, and correlates those numbers with existing addresses to pinpoint their exact location on Google Maps. We’ve described these findings in a scientific paper at the International Conference on Learning Representations (ICLR). In this paper, we show that this system is able to accurately detect and read difficult numbers in Street View with 90% accuracy.

Here are some examples of correctly identified street numbers – quite impressive!

[Image: street numbers]

What’s even more interesting is that pushing this technology for good uses also empowers the evil side of things:

Turns out that this new algorithm can also be used to read CAPTCHA puzzles—we found that it can decipher the hardest distorted text puzzles from reCAPTCHA with over 99% accuracy.


Sendy – Amazon SES-based bulk email software

Sendy is a self hosted email newsletter application that lets you send trackable emails via Amazon Simple Email Service (SES). This makes it possible for you to send authenticated bulk emails at an insanely low price without sacrificing deliverability.

The RedHat of Drupal

Matt Mullenweg shares a piece of hilarious SPAM he received.  This. Is.  CLASSIC!

I apologize for the cold email. I was researching Automattic , Inc. and wanted to ask you if there was any gaps/pains within your CMS and website. I work for the “Redhat of Drupal”, (Acquia) and we have seen an explosion of Drupal use in the Media, News, and Entertainment Industry.

Some companies using Drupal/Acquia include Warner Music, Maxim, NBC Universal, and NPR.
If you are evaluating your current system or are looking into new web projects, I would love to connect and discuss Drupal as an option.

Would it make sense to connect on this? If there is someone better at Automattic , Inc. to speak with, perhaps you could point me in the right direction?

Money scam via Skype

It’s been a long while since someone tried to scam me online.  But today I got lucky.  Someone knocked at my Skype door and I opened it.  Here is the full transcript of the conversation.  Pardon me for having some fun in the process.

[2:26:57 PM EEST] micheal2455: hello
[2:27:06 PM EEST] micheal2455: how are you

Before, when most of my online friends were technical people, a username with numbers in it pretty much guaranteed that you were talking to a spammer or scammer of some sort.  But in recent years a lot of non-technical people have come online, and all bets are off.  So I let the person in.

[2:27:09 PM EEST] micheal2455: my name is micheal ofori,a regional manager of almal bank limited i discover a domant account what of 5.6MILLION UNITED STATE DOLLARS.Iam looking for a honest person who can help me to move this money out of were i kept it in self keeping custdy.i agree to give you 20% for your mutual help.i do not want my c0-worker to raise eyebrow toward this fund.

That’s a very standard, direct, to-the-point proposition.  That’s all you need to see to know with absolute certainty that you are being scammed.  You have two options from now on.  Either end the discussion immediately and block the person from ever talking to you again, or try to scam them back, for fun, and see what they have to say.  I’ve chosen the scam path.

Spam Clock shares shocking numbers

Spam Clock runs a counter of the SPAM websites created since January 1st, 2011.  The data is provided by the blekko search engine.  And the numbers are staggering.  Every hour, a million new SPAM pages are created.  And there I was, thinking that we mostly have a problem with email, where, as any ISP in the world will tell you, SPAM messages account for roughly 99.99% of all emails.

Via Download Squad.

SPAM: It should be opt in, not opt out

Cyprus Mail reports that the environmental commissioner has turned his attention towards piles of SPAM – advertising leaflets distributed by numerous companies to people’s houses, mailboxes, and cars.  The initiative to regulate this is very welcome.  However:

Theopemptou insists that a law should be passed to regulate leaflet distribution in streets, cars and post boxes in order to protect the public and prevent the pile-up of waste. One possible measure he recommended was the creation of a special stamp that people could put on their cars, which would indicate that they do not wish to receive advertising material.

I think that SPAM should be opt in, not opt out.  In other words, it’s the people who WISH to receive the advertising leaflets who should indicate that they want to, not the other way around.  You can see how well it works in email vs. RSS and Twitter.  In email, people just send you loads of junk with an option to unsubscribe from it.  First of all, you have already received the junk.  Secondly, you need to receive the junk to get the option to unsubscribe.  That’s just not fair, and it doesn’t work.  That’s opt out.  RSS and Twitter are opt in.  You don’t get anything until you actually subscribe or follow, which is all up to you.  And that’s how it should be.

Dear Mr.Spammer

Dear Mr.Spammer,

you are the ugliest and stinkiest piece of human trash, abusing the technology, annoying people, and occupying their valuable time with your silly activities.  Please stop. Now.  If you feel like you have nothing much else to do, feel free to go to the darkest corner of our planet, and die!  I hate you.  People hate you.  Computers hate you.  The universe hates you. So, pretty please, with sugar on top, cease to exist.

P.S.: None of your comments will make it to my blog.  Thanks to Akismet, Gmail, and PHP and Perl.

P.P.S: I really hate you very much.

Fighting the spinning spammers

Lorelle has an excellent post covering spinning spam – “Spinning Spammers Steal Our Blog Content“.  As always, the article is full of useful links and insightful quotes.

Here is a quote from a linked article – “Protecting Your Content From the Spinning Spammers” – describing the issue:

 […] process of modifying the content before reposting it is often called “spinning”. Spinning a work before republication has several advantages, the largest of which is that Google is less likely to detect the work as a duplicate and, thus rank it higher. However, almost equally important is that it is much harder for victims of plagiarism to detect and follow up on the misuse, making this kind of abuse much harder to stop […]

Here are some helpful tips for detecting the stolen content:

  1. Digital Fingerprinting: Digital fingerprinting is a process by which you append a unique word or phrase to the end of your posts in your RSS feed. If the feed is scraped, so is the fingerprint and searching for that string of characters tells you which sites have taken your content. Since fingerprints don’t have easy translations or synonyms, they remain intact through the spinning process. Plugins such as the Digital Fingerprint Plugin and Copyfeed can automate the process.
  2.  Trackback Monitoring: As was the case with Tony’s original post, spam blogs often leave links in the scraped post intact, even as they modify the copy. They often send trackbacks to those URLs in a bid to get extra incoming links to the spam blog. If you link to your own articles when writing, you can watch the trackbacks and get an idea for who is using your content, even if it is spun.
  3.  FeedBurner Tracking: FeedBurner offers a very powerful “uncommon uses” feature that tracks where your feed is published. Since FeedBurner does not depend upon the post content to track the feed, spinning the text will not fool the system.
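
The first tip, digital fingerprinting, can be sketched in a few lines. This is only an illustration of the idea, not how the plugins mentioned above work internally; the secret string, the function names, and the sample pages are all made up for the example.

```python
import hashlib


def make_fingerprint(site_secret):
    """Derive a stable token that is very unlikely to occur in normal text."""
    digest = hashlib.sha256(site_secret.encode("utf-8")).hexdigest()
    return "fp" + digest[:12]


def add_fingerprint(item_text, fingerprint):
    """Append the fingerprint to the end of a feed item."""
    return f"{item_text}\n{fingerprint}"


def find_copies(pages, fingerprint):
    """Return the URLs of pages whose text contains the fingerprint."""
    return [url for url, text in pages.items() if fingerprint in text]


fingerprint = make_fingerprint("my-blog-secret")  # hypothetical secret
feed_item = add_fingerprint("All software has bugs.", fingerprint)

# Hypothetical search results: the first page is a spun copy of the post.
pages = {
    "https://scraper.example/post-1": "Every program contains defects. " + fingerprint,
    "https://innocent.example/about": "Nothing copied here.",
}
copies = find_copies(pages, fingerprint)
print(copies)
```

The spun copy rewrote the sentence but kept the token, because a made-up string has no synonym to swap in. In practice the search step would be done by a search engine or an alert service rather than a local dictionary.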

I tried digital fingerprinting coupled with monitoring a few times and I have to say it works pretty well.  The way I was doing it, though, was on a per-article basis, not for the whole feed.  I noticed that when my content is stolen, usually just a few articles are taken – presumably those with high-ranking keywords.

So, what I do sometimes is invent a new word (wordativity anyone? blogalerting?), stick it in the post, and then set up Google Alerts for this word.  The moment Google indexes something with this word, I am notified either via an RSS feed or an email.  (If you feel really paranoid, you can create a new Twitter account, pipe the RSS feed to that account, and follow it with your main account, so that you get an SMS when stealing occurs.)
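
I make these words up by hand, but the per-article variant could also be automated. Below is a rough sketch, assuming each post has a unique slug; the letter sets, the function name, and the slugs are invented for the example, and nothing here talks to Google Alerts.

```python
import hashlib

CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"


def invent_word(post_slug, syllables=4):
    """Build a pronounceable nonsense word that is always the same for a given post."""
    digest = hashlib.sha256(post_slug.encode("utf-8")).digest()
    parts = []
    for i in range(syllables):
        # Two hash bytes per syllable: one picks the consonant, one the vowel.
        consonant = CONSONANTS[digest[2 * i] % len(CONSONANTS)]
        vowel = VOWELS[digest[2 * i + 1] % len(VOWELS)]
        parts.append(consonant + vowel)
    return "".join(parts)


# Hypothetical slugs; the lookup maps an alert hit back to the stolen article.
slugs = ["all-software-has-bugs", "dear-mr-spammer"]
word_to_post = {invent_word(slug): slug for slug in slugs}

for word, slug in word_to_post.items():
    print(word, "->", slug)
```

Because the word is derived from the slug, there is no list of invented words to keep anywhere: when an alert fires, the lookup tells you which article was taken.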

Anyway, check the above links for more information about the problem, some insight into the legal point of view, and advice on how to handle it when this happens.  And spread the word too.

All software has bugs

Anyone who has ever written more than three lines of code will tell you that all software has bugs. That’s just the way it is.

And while I don’t need any reminders of this fact (mainly because I write a lot of code in any given week), I got a special one today.

A SPAM comment was posted to this blog (you haven’t seen it because it went to moderation) that was clearly the result of a bug in the SPAM software. The message contained a long list of phrases like ‘Thank you’, ‘Very interesting’, and ‘I bookmarked your blog’. Obviously these are intended for link SPAM, but they were supposed to be used one at a time. Oops.