I’ve been hearing a lot about typosquatting recently. Typosquatting is a method bad guys use to make money on the Internet. They take a list of popular domain names, like Google.com and Yahoo.com, and figure out the most common ways people mistype these addresses. Then they register the mistyped domain names and make money from them by displaying advertising banners and redirecting visitors to other web sites.
If you think about it for a second, some typing mistakes are much easier to make than others. You might miss a character, type a couple of characters in the wrong order (‘teh’ instead of ‘the’), type a sticky character (‘nn’ instead of ‘n’), or hit a neighboring key on the keyboard (‘u’ instead of ‘i’). All these mistakes are easy to predict and, thus, easy to exploit for typosquatting.
While I was thinking about it, I decided to try it out – write a small script that checks how many mistyped variants of a domain there are and how many of them are already registered. The script turned out to be extremely easy to write. I started it with my morning coffee and finished before the coffee got cold. It took about 8 minutes altogether, so don’t be too hard on it.
domain_finder.pl
Requirements
You won’t need much to try it out: a Perl interpreter, the Net::Domain::ExpireDate module (get it from CPAN), and an Internet connection.
How to use
In the simplest form you can just run the script like this:
./domain_finder.pl google
You’ll see a whole bunch of ways to mistype “google”, along with the status of the .com domain for each variation.
For more control, check the script’s source code. You can easily make it quieter or more verbose, check domains in other TLDs, and create your own rules for typing mistakes.
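As an illustration of what a custom rule might look like, here is a hypothetical look-alike rule. It replaces one letter at a time with a digit that resembles it, which is the hax0r-style typing the script’s own rules leave out. This is a Python sketch, not code from the Perl script. The `LEET` substitution table and the `leet_variants` name are assumptions made up for this example.

```python
# Hypothetical look-alike substitutions (letter -> digit); extend or trim to taste.
LEET = {"a": "4", "e": "3", "i": "1", "l": "1", "o": "0", "s": "5", "t": "7"}


def leet_variants(name, table=LEET):
    """Return sorted variants of name with exactly one letter swapped for its look-alike."""
    variants = set()
    for i, ch in enumerate(name):
        sub = table.get(ch)
        if sub:
            # Replace only position i; every other character stays as typed.
            variants.add(name[:i] + sub + name[i + 1:])
    return sorted(variants)
```

For example, `leet_variants("google")` gives `g0ogle`, `go0gle`, `goog1e` and `googl3`. A rule like this would feed its output into the same pool of variants before duplicates are removed and the TLD is appended.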
How it works
The script takes a single parameter – the domain that you want to check, without the TLD part. It then creates all variations of this domain with the following mistakes:
- Missing character. For each character in the domain, the script will generate a variant without it.
- Swapped characters. For each character in the domain, the script will generate a variant with this character and the next one swapped.
- Sticky character. For each character in the domain, the script will generate a variant with this character entered twice in a row.
- Wrong keyboard key. For each character in the domain, the script will generate one variant for each of its neighboring keys on a QWERTY keyboard, with that key typed instead.
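The four rules are simple enough to sketch in a few lines. What follows is a Python approximation, not the author’s Perl script, whose source isn’t reproduced here. The QWERTY adjacency table is an assumption. It is built from a simplified staggered layout of letter and digit keys, so the variant counts it produces won’t necessarily match the ones reported in the Conclusion.

```python
# Rows of a US QWERTY keyboard (letters and digits only). Assumption: each row
# sits roughly half a key to the right of the row above it.
QWERTY_ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_neighbors(rows):
    """Map each key to the set of keys physically adjacent to it."""
    neighbors = {}
    for r, row in enumerate(rows):
        for i, key in enumerate(row):
            candidates = [(r, i - 1), (r, i + 1),      # same row
                          (r - 1, i), (r - 1, i + 1),  # row above
                          (r + 1, i - 1), (r + 1, i)]  # row below
            adjacent = set()
            for rr, j in candidates:
                if 0 <= rr < len(rows) and 0 <= j < len(rows[rr]):
                    adjacent.add(rows[rr][j])
            neighbors[key] = adjacent
    return neighbors


NEIGHBORS = build_neighbors(QWERTY_ROWS)


def typo_variants(name, neighbors=NEIGHBORS):
    """Return sorted, de-duplicated misspellings of name under the four rules."""
    variants = set()
    for i, ch in enumerate(name):
        # Missing character: drop position i.
        variants.add(name[:i] + name[i + 1:])
        # Swapped characters: exchange positions i and i + 1.
        if i + 1 < len(name):
            variants.add(name[:i] + name[i + 1] + ch + name[i + 2:])
        # Sticky character: type position i twice.
        variants.add(name[:i] + ch + ch + name[i + 1:])
        # Wrong key: replace position i with each adjacent key.
        for key in neighbors.get(ch, ()):
            variants.add(name[:i] + key + name[i + 1:])
    # Swapping two identical letters ("oo") gives back the original name, and
    # deleting the only letter of a one-letter name gives "", so drop both.
    variants.discard(name)
    variants.discard("")
    return sorted(variants)
```

`typo_variants("google")` returns a sorted list of variants that includes `gogle` (missing), `googel` (swapped), `gooogle` (sticky) and `giogle` (wrong key). Using a set makes the "sort and remove duplicates" step automatic. For example, dropping either `o` in "google" yields the same `gogle`.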
All these variants will be sorted and duplicates removed. After that, each variant will be checked with the (pre-configured) TLD appended to it. If the resulting domain is registered, then its expiration date will be printed out. If it is not registered, it will be reported as such.
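The checking step can also be approximated without Net::Domain::ExpireDate by querying the .com registry’s WHOIS server directly. WHOIS is the plain-text protocol described in RFC 3912 (TCP port 43). This sketch assumes Verisign’s server at `whois.verisign-grs.com` and its current reply format. In that format, an unregistered name produces a line starting with `No match for`, and a registered one carries a `Registry Expiry Date:` line. Other registries word their replies differently, so `parse_reply` would need adjusting for other TLDs. All function names here are hypothetical.

```python
import socket

WHOIS_SERVER = "whois.verisign-grs.com"  # assumption: registry WHOIS for .com
WHOIS_PORT = 43


def whois_query(domain, server=WHOIS_SERVER, timeout=10.0):
    """Send one WHOIS query (RFC 3912) and return the raw text reply."""
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as sock:
        sock.sendall((domain + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when the reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_reply(reply):
    """Return (registered, expiry) from a Verisign-style WHOIS reply."""
    if "No match for" in reply:
        return False, None
    for line in reply.splitlines():
        line = line.strip()
        if line.startswith("Registry Expiry Date:"):
            return True, line.split(":", 1)[1].strip()
    # A record without a recognizable expiry line is treated as registered.
    return True, None


def check_candidates(candidates, tld="com", query=whois_query):
    """Sort and de-duplicate names, append the TLD, and look each one up.

    Returns a list of (domain, registered, expiry) tuples.
    """
    results = []
    for name in sorted(set(candidates)):
        domain = f"{name}.{tld}"
        registered, expiry = parse_reply(query(domain))
        results.append((domain, registered, expiry))
    return results
```

For example, looping over `check_candidates(["gogle", "googel", "gogle"])` and printing each tuple queries each distinct name once, in sorted order. WHOIS servers commonly rate-limit clients, so adding a short pause between queries (for instance with `time.sleep`) is sensible for long lists. Passing a different `query` function also makes the logic testable without a network connection.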
Conclusion
With this tool in my hands, I tried a whole bunch of domains, from “google” to “mamchenkov”. What can I say? I suspected that typosquatting was a big problem, but I never imagined how big it was.
Here are some numbers to give you an idea (we all love stats, don’t we?):
- “google” generates 48 variants. All registered.
- “yahoo” generates 41 variants. All registered.
- “microsoft” generates 78 variants. All registered.
- “slashdot” generates 68 variants. 42 registered.
- “digg” generates 33 variants. All registered.
- “cnn” generates 17 variants. All registered.
- “wikipedia” generates 80 variants. 78 registered.
- “linux” generates 39 variants. 28 registered.
- “blogging” generates 62 variants. 24 registered.
- “cyprus” generates 51 variants. 18 registered.
NOTE: I only checked these in the .com TLD, and I used pretty simple typing mistakes. For example, hax0r-style typing is not included in my rules.
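To put the numbers above into one figure: across the ten names, 407 of the 517 generated variants (about 79%) were already registered in .com. The snippet below simply redoes that arithmetic from the reported figures; nothing is looked up. It also prints each name’s share, so the big brands and the everyday words can be compared side by side.

```python
# Figures copied from the list above: name -> (variants generated, variants registered).
RESULTS = {
    "google": (48, 48), "yahoo": (41, 41), "microsoft": (78, 78),
    "slashdot": (68, 42), "digg": (33, 33), "cnn": (17, 17),
    "wikipedia": (80, 78), "linux": (39, 28), "blogging": (62, 24),
    "cyprus": (51, 18),
}


def registered_share(results):
    """Return (registered, total, fraction) summed over all names."""
    total = sum(generated for generated, _ in results.values())
    registered = sum(taken for _, taken in results.values())
    return registered, total, registered / total


if __name__ == "__main__":
    registered, total, share = registered_share(RESULTS)
    print(f"{registered} of {total} variants registered ({share:.1%})")
    # Per-name breakdown, least-squatted first.
    for name, (generated, taken) in sorted(RESULTS.items(), key=lambda kv: kv[1][1] / kv[1][0]):
        print(f"{name:10} {taken:3}/{generated:3} {taken / generated:6.1%}")
```

The first line printed is `407 of 517 variants registered (78.7%)`.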
The tool turned out to be quite handy. I might even convert it into a web service, so that domain owners can easily check whether they are victims of typosquatting or not (yet).
Feel free to use the script for good causes.