Fighting the spinning spammers

Lorelle has an excellent post covering spinning spam – “Spinning Spammers Steal Our Blog Content”. As always, the article is full of useful links and insightful quotes.

Here is a quote from a linked article – “Protecting Your Content From the Spinning Spammers” – describing the issue:

 […] process of modifying the content before reposting it is often called “spinning”. Spinning a work before republication has several advantages, the largest of which is that Google is less likely to detect the work as a duplicate and, thus rank it higher. However, almost equally important is that it is much harder for victims of plagiarism to detect and follow up on the misuse, making this kind of abuse much harder to stop […]

Here are some helpful tips for detecting the stolen content:

  1. Digital Fingerprinting: Digital fingerprinting is a process by which you append a unique word or phrase to the end of your posts in your RSS feed. If the feed is scraped, so is the fingerprint and searching for that string of characters tells you which sites have taken your content. Since fingerprints don’t have easy translations or synonyms, they remain intact through the spinning process. Plugins such as the Digital Fingerprint Plugin and Copyfeed can automate the process.
  2.  Trackback Monitoring: As was the case with Tony’s original post, spam blogs often leave links in the scraped post intact, even as they modify the copy. They often send trackbacks to those URLs in a bid to get extra incoming links to the spam blog. If you link to your own articles when writing, you can watch the trackbacks and get an idea for who is using your content, even if it is spun.
  3.  FeedBurner Tracking: FeedBurner offers a very powerful “uncommon uses” feature that tracks where your feed is published. Since FeedBurner does not depend upon the post content to track the feed, spinning the text will not fool the system.

I tried digital fingerprinting coupled with monitoring a few times, and I have to say it works pretty well. The way I was doing it, though, was on a per-article basis, not for the whole feed. I noticed that when my content is stolen, usually just a few articles are taken – presumably those with high-ranking keywords.

So, what I do sometimes is invent a new word (wordativity anyone? blogalerting?), stick it in the post, and then set up Google Alerts for this word. The moment Google indexes something with this word, I am notified either via an RSS feed or an email. (If you feel really paranoid, you can create a new Twitter account, pipe the RSS feed to that account, and follow it with your main account, so that you get an SMS when stealing occurs.)
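If inventing words by hand gets tedious, a random token works just as well as a fingerprint. Here is a minimal sketch, assuming a Unix-like system with /dev/urandom; the function name make_fingerprint and the 12-letter length are my own choices for illustration, not something taken from the linked articles or plugins.

```shell
# make_fingerprint: print a random 12-letter lowercase token.
# Hypothetical helper; the name and the length are arbitrary choices.
make_fingerprint() {
    # Keep only the letters a-z from the random byte stream, stop after 12.
    LC_ALL=C tr -dc 'a-z' < /dev/urandom | head -c 12
    echo
}

# Generate one token per post, paste it into the post body,
# then set up the alert for that exact string.
fingerprint="$(make_fingerprint)"
printf 'Fingerprint for this post: %s\n' "$fingerprint"
```

A random string of this length is very unlikely to appear anywhere else on the web, so any alert on it almost certainly points at a copy of the post.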

Anyway, check the above links for more information about the problem, some insight into the legal point of view, and advice on how to handle such cases when they happen. And spread the word too.

Productivity tip

Here is a productivity tip from the you-don’t-want-to-do-this department:

You don’t want to wait for a filesystem check to finish, when it’s working on a 200 GByte partition.

Hopefully, this tip explains the 5-hour downtime that the server experienced from this morning until now.

RegExp reminder

I was just reminded about this small thing, which is so easy to forget – regular expressions that have markers of line start (^) and/or line end ($) are often much faster than those regexps that don’t have these markers. The thing is that with a line start/end marker the regexp engine can attempt the match/substitution at just one position, whereas when there are no such markers, it has to retry the match/substitution operation at every character of the string until it succeeds or runs out of input.

In practice, it’s unbelievable how much difference this can make. Especially when using complex regular expressions over large data sets.

P.S.: I understand that it is not always possible to use these markers, but I think that they can be used much more often than they are. Everywhere.
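A rough way to check this on your own machine is sketched below. It uses grep -E purely as a stand-in for whatever regexp engine you actually work with; the sample data, the pattern and the helper names make_sample and count_matches are all made up for illustration. How big the difference is (if any) depends heavily on the engine, so treat this as a measuring aid rather than proof.

```shell
# make_sample FILE: write 2000 long lines, none of which contain "needle",
# so every match attempt has to fail (the slowest case for the engine).
make_sample() {
    awk 'BEGIN {
        line = ""
        for (j = 0; j < 50; j++) line = line "lorem ipsum dolor sit amet "
        for (i = 0; i < 2000; i++) print line
    }' > "$1"
}

# count_matches PATTERN FILE: print how many lines of FILE match PATTERN.
# grep exits non-zero when nothing matches, hence the "|| true".
count_matches() {
    grep -c -E -- "$1" "$2" || true
}

sample="$(mktemp)"
make_sample "$sample"

# Unanchored: the match may be retried at every character of each line.
count_matches 'needle[0-9]+' "$sample"
# Anchored: a single attempt at the start of each line is enough.
count_matches '^needle[0-9]+' "$sample"
```

Both calls print 0 here, since nothing matches. To compare the speed in bash, prefix each count_matches call with the time keyword, and delete the sample file with rm when you are done.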

Preventing hangovers

Curing hangovers is one of the most popular topics to talk and read about. I did my share of the research too, but it helped me about as little as it helps everyone else. The fact is, if you’ve got yourself a nice hangover, there isn’t much you can do to get rid of it. You can make it slightly easier on yourself by following one of the billion pieces of advice out there.

Luckily, I found something better. Instead of fighting the hangover, I simply prevent it. It turned out to work much better, and it is simpler too. How do I do it? Two simple steps.

  1. Try not to mix different drinks. If there is no choice, then always drink the stronger one next. In other words – if you have to drink both beer and vodka, then drink beer first, and then follow it up with vodka. This way you won’t get as drunk, and you won’t have as terrible a hangover in the morning. But this step is a minor one compared to the next.
  2. Drink lots (and I really mean LOTS) of water before going to bed. The more water you drink, the better you will feel in the morning. Dehydration is the main component of the hangover, and you can’t fix it while you’re sleeping. So, just take care of it before you go to sleep, and the water you drank will last you through the night.

That’s it. These are basically the only two things I care about when I drink alcohol. And I haven’t had a hangover in years now. Except that one time, when I didn’t have any water before falling into bed. And that was one of the most horrible days of my life.

Yesterday, I had more than half a liter of vodka (with friends and food). When I came back home, I think I drank up the whole plastic bottle (1.5L) of water. Five hours later, when I went for a walk with Maxim, I didn’t feel a thing. Like I hadn’t even been drinking the day before. It was so good that it actually felt weird.

Best shell alias ever

I came across the best shell alias ever:

alias up="cd .."

This is one of those things that make me go “Why didn’t I think of it earlier? And by myself?”.

In order to add some value to this post, here are my two most used aliases:

alias pd="perldoc"
alias pdf="perldoc -f"