Fighting the spinning spammers

Lorelle has an excellent post covering spinning spam – “Spinning Spammers Steal Our Blog Content”.  As always, the article is full of useful links and insightful quotes.

Here is a quote from a linked article – “Protecting Your Content From the Spinning Spammers” – describing the issue:

 […] process of modifying the content before reposting it is often called “spinning”. Spinning a work before republication has several advantages, the largest of which is that Google is less likely to detect the work as a duplicate and, thus rank it higher. However, almost equally important is that it is much harder for victims of plagiarism to detect and follow up on the misuse, making this kind of abuse much harder to stop […]

Here are some helpful tips for detecting the stolen content:

  1. Digital Fingerprinting: Digital fingerprinting is a process by which you append a unique word or phrase to the end of your posts in your RSS feed. If the feed is scraped, so is the fingerprint and searching for that string of characters tells you which sites have taken your content. Since fingerprints don’t have easy translations or synonyms, they remain intact through the spinning process. Plugins such as the Digital Fingerprint Plugin and Copyfeed can automate the process.
  2.  Trackback Monitoring: As was the case with Tony’s original post, spam blogs often leave links in the scraped post intact, even as they modify the copy. They often send trackbacks to those URLs in a bid to get extra incoming links to the spam blog. If you link to your own articles when writing, you can watch the trackbacks and get an idea for who is using your content, even if it is spun.
  3.  FeedBurner Tracking: FeedBurner offers a very powerful “uncommon uses” feature that tracks where your feed is published. Since FeedBurner does not depend upon the post content to track the feed, spinning the text will not fool the system.

I tried digital fingerprinting coupled with monitoring a few times and I have to say it works pretty well.  The way I was doing it, though, was on a per-article basis, not for the whole feed.  I noticed that when my content is stolen, usually just a few articles are taken – presumably those with high-ranking keywords.

So, what I do sometimes is invent a new word (wordativity anyone? blogalerting?), stick it in the post, and then set up Google Alerts for this word.  The moment Google indexes something with this word, I am notified either via an RSS feed or an email.  (If you feel really paranoid, you can create a new Twitter account, pipe the RSS feed to that account, and follow it with your main account, so that you get an SMS when stealing occurs.)
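If inventing words by hand gets tedious, the same idea can be sketched in a few lines of Python.  This is only an illustration of the per-article approach, not what any of the plugins mentioned above actually do: the function names, the "fp" prefix, and the token length are all made up for the example.  Setting up the alert for the generated word is still a manual step.

```python
import secrets
import string


def make_fingerprint(length: int = 12, prefix: str = "fp") -> str:
    """Return a random lowercase token that is very unlikely to be a real word.

    The prefix and length are arbitrary choices for this sketch.
    """
    alphabet = string.ascii_lowercase
    return prefix + "".join(secrets.choice(alphabet) for _ in range(length))


def add_fingerprint(post_text: str, fingerprint: str) -> str:
    """Append the fingerprint as its own paragraph at the end of a post."""
    return f"{post_text.rstrip()}\n\n{fingerprint}\n"


def contains_fingerprint(page_text: str, fingerprint: str) -> bool:
    """Check whether some page text (however it was fetched) carries the token.

    The comparison ignores case, since a scraper may change capitalization.
    """
    return fingerprint.lower() in page_text.lower()
```

A random token has no synonym for a spinner to swap in, which is the property that makes an invented word useful in the first place.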

Anyway, check the above links for more information about the problem, some insight into the legal point of view, and advice on how to handle such cases when they happen.  And spread the word too.

How to handle web rudeness?

Web Worker Daily asks an interesting question – “How do you handle web rudeness?”

This is probably a no-brainer for people who have been on the web for some time, but newcomers, especially those to the blogosphere and the world of forums, are often puzzled.  It’s very easy to insult someone on the web.  Pick a forum or a blog.  Write a comment.  You’re done.  And the sad fact of life is that many people do just that.

Having run a blog (albeit not the most popular one) for a few years now, I’ve seen some of that rudeness and some of those insults, both private and public.  Here is how I go about them.

First of all, I treat both private and public insults equally. I don’t differentiate.  If I can think of a way to make fun of it, I respond publicly. If I can’t, I just delete and ignore the comment.  If I get two or more insults in a row from the same person, I ban, blacklist, and filter the originating username, IP address, and email.  And then I don’t care.

My thinking is that there is enough crap going on already without taking some more from the Web.  I consider the Internet to be the best thing since… since… since… since the beginning of time.  If something bad is coming out of it, I either convert it into good (humor, smile, good mood), or I totally get rid of it.  It’s as simple as that.

P.S.: Just to make something crystal clear – with non-insulting comments I respond in the same scope.  If the message was private, I reply in private. If it was public, I reply in public.  Sometimes, if I feel like the public can benefit from a private discussion, I’d ask the permission of the other party to publish the conversation.  To be on the safe side, I’d often forward the preview of the post too, to clarify what exactly will be published.  Insulting comments never get a private reply – it’s either nothing, or a public joke.

The Nerd Handbook – quick guide to the unknown

Via Mark Fletcher’s post I came across The Nerd Handbook.  This is a really nice post explaining a few things about nerds.  While details may vary from person to person, the overall picture is a pretty accurate portrait of many people I’ve seen in the IT industry and in some science-related areas (mathematics, physics).  Here are a few quotes:

A nerd has a mental model of the hardware and the software in his head. While the rest of the world sees magic, your nerd knows how the magic works, he knows the magic is a long series of ones and zeros moving across your screen with impressive speed, and he knows how to make those bits move faster.

Your nerd lives in a monospaced typeface world. Whereas everyone else is traipsing around picking dazzling fonts to describe their world, your nerd has carefully selected a monospace typeface, which he avidly uses to manipulate the world deftly via a command line interface while the rest fumble around with a mouse.
The reason for this typeface selection is, of course, practicality. Monospace typefaces have a knowable width. Ten letters on one line are same width as ten other letters, which puts the world into a pleasant grid construction where X and Y mean something.

Your nerd loves toys and puzzles. The joy your nerd finds in his project is one of problem solving and discovery. As each part of the project is completed, your nerd receives an adrenaline rush that we’re going to call The High. Every profession has this — the moment when you’ve moved significantly closer to done. In many jobs, it’s easy to discern when progress is being made: “Look, now we have a door”. But in nerds’ bit-based work, progress is measured mentally and invisibly in code, algorithms, efficiency, and small mental victories that don’t exist in a world of atoms.

This post is better written and more accurate than most of those endless “You are a nerd if …” lists.  If you know somebody really weird, working in IT or scientific research, I strongly recommend reading the article.

Java chapter in Android story

The blogosphere keeps providing more and more insights into the Google Android story.  As I mentioned in my previous post, the Android platform has a lot to do with Java.  In fact, many people consider the level to which Java is integrated into the platform to be the “big news”, unique and all.  Here is a quote from Simon Brocklehurst’s post titled “Putting The Android SDK In Perspective” (read the whole piece, it’s very good):

Android has integrated the Java platform deeply into the phone. In other words, it’s a native application platform for Android phones. No-one has done this before, and it will allow new types of application to be developed (Google has set aside $10M to give away to developers to stimulate development of such software – I hope young entrepreneurs use this opportunity, some great little companies could be started by following this path). It should be noted that Sun’s forthcoming mobile OS platform, JavaFX Mobile, is based around almost exactly the same concept.

After I read the last sentence, I realized that the story is even deeper than I thought.  Google is jumping into competition with Sun, using Sun’s own Java technology.  How is that possible?  Sun was never known for its generosity.  Did it suddenly change?  And what about Microsoft, who invests heavily in both Java and the mobile industry?  How did they let this happen?  And what about all those licenses, alliances, and competition?

Google Blogoscoped has an insightful post titled “How Google Android Routes Around Java Restrictions” which explains a few things.  Here are a few quotes to get you started:

Sun released their “free java” source code under the GPLv2 to both win the free software crowd and capture peripheral innovation and bug fixing from the community. For the java standard edition (aka “the cat is out of the bag”) there is an exception to the GPLv2 that makes it “reciprocal” only for the Java platform code itself but not for the user code running on it (or most people wouldn’t even dare touching it with a pole).
But such exception to the GPLv2 is not there for the mobile edition (aka “where the money is”).
This brilliant move allows Sun to play “free software paladin” on one hand and still enjoy complete control of the licensing and income creation for the Java ME platform on mobile and embedded devices on the other

Dalvik is a virtual machine, just like Java’s or .NET’s.. but it’s Google’s own and they’re making it open source without having to ask permission to anyone

Android uses the syntax of the Java platform (the Java “language”, if you wish, which is enough to make java programmers feel at home and IDEs to support the editing smoothly) and the java SE class library but not the Java bytecode or the Java virtual machine to execute it on the phone (and, note, Android’s implementation of the Java SE class library is, indeed, Apache Harmony’s!)

So, here we are: Apple makes the iPhone, incredibly sweet, slick and game-changing and yet incredibly locked. Google makes Android and not only unlocks development abilities on the mobile phone but also unlocks millions of potential Java mobile programmers from Sun’s grip on it.

This is fascinating stuff.  Even if it is a bit technical for a non-IT audience, it is still fun to read through…

Going for Fedora 8

A new version of my favorite Linux distribution has been released recently – Fedora 8.  I got my hands on the installation DVD (thanks bro!) and tried it straight away.

It didn’t go very well – the installer kept hanging during the dependency check.  I thought maybe it was something simple to fix and checked it with strace, which showed that the installer was stuck in a loop, constantly creating some temporary files and then removing them.  I tried to create these files by hand, but they were immediately removed.  I asked around on the #fedora IRC channel, but it was a weekend and the channel was rather empty.  No tips were given.

Then I came across Michael’s post that reminded me that I could do an upgrade using the Yum package manager, bypassing the installer altogether.  Following the steps in the guide was simple and soon yum started downloading the new packages.  But my Internet connection is pretty slow, and it would have taken me about two days just to get the files.  Not much fun to wait.  Instead I decided to copy the files from the DVD to the /var/cache/yum/fedora/packages/ directory and restart the upgrade process.  Now all I needed to download were the updates that had been released since the distribution went public.
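For the record, the copying step boils down to something like the Python sketch below.  I did it by hand, so treat this as an illustration rather than the exact procedure: the function name is made up, and the DVD mount point in the usage comment is an assumption that depends on your system.  Only the cache directory comes from the steps above.

```python
import shutil
from pathlib import Path


def seed_yum_cache(dvd_root, cache_dir):
    """Copy every .rpm found under dvd_root into cache_dir.

    Files already present with the same size are skipped, so the function
    can be re-run safely.  Returns the names of the files it copied.
    """
    dvd_root = Path(dvd_root)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for rpm in sorted(dvd_root.rglob("*.rpm")):
        target = cache_dir / rpm.name
        if target.exists() and target.stat().st_size == rpm.stat().st_size:
            continue  # already cached, nothing to do
        shutil.copy2(rpm, target)  # copy2 also keeps timestamps
        copied.append(rpm.name)
    return copied


# Example call (run as root; "/media/dvd" is a hypothetical mount point):
# seed_yum_cache("/media/dvd", "/var/cache/yum/fedora/packages/")
```

With the packages already in its cache, yum only has to fetch whatever it does not find there.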

A couple of hours later I rebooted into Fedora 8, running the new tickless kernel (the biggest reason for me to upgrade).  I also noticed that a few font packages were updated – fonts are sharper and cleaner.  NetworkManager was upgraded.  And a few other things improved.

I’ve heard a lot of people complaining about sound problems due to a new sound server, but I haven’t had a chance to test it yet.  Other than that, though, everything seems to be running just fine.