HTML5 splits into two standards

Just when web developers had a little bit of hope, Slashdot reports bad news.

Until now the two standards bodies working on HTML5 (WHATWG and W3C) have cooperated. An announcement by WHATWG makes it clear that this is no longer true. WHATWG is going to work on a living standard for HTML which will continue to evolve as more technologies are added. W3C is going the traditional and much more time-consuming route of creating a traditional standard, which WHATWG refers to as a ‘snapshot’ of their living standard. Of course, now being free of W3C’s slower methods, WHATWG can accelerate the pace of introducing new technologies to HTML5. Whatever happens, the future has just become more complicated: now you have to ask yourself ‘Which HTML5?’

Even if it sounds good, it is actually really bad.  HTML5 is already complicated enough: all major browsers support a different subset of it, and even the features they do support behave differently from browser to browser.  Splitting the standard just complicates things further.  The fact that this is not exactly new doesn’t really matter.  Saying that it won’t be harmful is silly, as is the whole idea of a “living standard”.  As a few people mentioned in the Slashdot comments, “living standard” is an oxymoron: the whole point of a standard is to provide a static point of reference.  Splitting is not a solution to the problem.  It’s quite the opposite.  Consider this xkcd comic for illustration, which is nothing but the truth.

boilerpipe – Boilerplate Removal and Fulltext Extraction from HTML pages

The boilerpipe library provides algorithms to detect and remove the surplus “clutter” (boilerplate, templates) around the main textual content of a web page.

The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.

Extracting content is very fast (milliseconds), just needs the input document (no global or site-level information required) and is usually quite accurate.

Ultimate guide for CSS support in email clients

Designing an HTML email that renders consistently across the major email clients can be very time consuming. Support for even simple CSS varies considerably between clients, and even different versions of the same client.

We’ve put together this guide to save you the time and frustration of figuring it out for yourself. With 24 different email clients tested, we cover all the popular applications across desktop, web and mobile email.

As the number of email clients continues to grow, we’ve decided to simplify the web-based version of the guide to focus on the 10 most popular email clients on the market. For the complete report on all 24 email clients across the desktop, web and mobile email world, download the complete guide in PDF format.

Linking to favicons

Favicons have been around for a few years now.  But they were mostly used by the browsers – in multi-tab environments and in bookmark managers.  Recently I’ve noticed a trend of using favicons in web design – next to external links, near a blog comment author’s name, etc.

Adding a favicon to the design is a simple thing for the designer, but a totally different story for the web developer.  Favicons can be either dropped into the root folder of the site or linked to from the page’s HTML.  On top of that, the days of the single favicon.ico format are long gone too.  These days the icon could just as well be a GIF or PNG image.
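To make those two cases concrete, here is a minimal sketch of what doing it by hand might look like, using only the Python standard library.  The names `FaviconLinkParser` and `find_favicon` are made up for this illustration, and the sketch assumes you have already downloaded the page’s HTML.  It looks for a `<link>` tag whose `rel` mentions `icon` and falls back to `/favicon.ico` in the root folder; it does not check that the icon actually exists.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FaviconLinkParser(HTMLParser):
    """Collects href values of <link> tags whose rel mentions "icon"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        # Matches rel="icon" as well as the older rel="shortcut icon".
        if "icon" in rel_tokens and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_favicon(page_url, html):
    """Return the favicon URL linked from the HTML, else the root-folder default."""
    parser = FaviconLinkParser()
    parser.feed(html)
    if parser.hrefs:
        # Resolve relative hrefs against the page URL.
        return urljoin(page_url, parser.hrefs[0])
    # No <link> found: assume the icon sits in the site's root folder.
    return urljoin(page_url, "/favicon.ico")
```

Even this small version has to deal with relative paths and a fallback, and it still ignores multiple icons, sizes and redirects, which is a good argument for letting somebody else do the work.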

So, how would you reliably find the favicon of a site?  It turns out, you don’t really have to work too hard, since someone has already solved your problem.  From comments to this article (in Russian) I’ve learned of the Google web service.  So, all you’ll need to do is this (with whatever domain name that you need):

<img src="http://www.google.com/s2/favicons?domain=mamchenkov.net">

Works and sounds good, right?  Wrong!  As I mentioned already, there is a way to link to favicons from HTML, and this service doesn’t seem to take that into account.  Well, not to worry anyway.  There is another one that does – getFavicon.  This one works in a very similar way, but supports the full URL as a parameter.  For example:

<img src="http://g.etfv.co/https://mamchenkov.net/wordpress/">

On top of that, you can include properly encoded GET parameters, and avoid the browser’s per-server connection limit by using multiple sub-domains.  Brilliant, I say.
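If you generate these image URLs from code, a small helper along the following lines might do.  This is only a sketch under my own assumptions: the function name `getfavicon_url` is made up, the encoding scheme (percent-encoding everything in the target URL except `:` and `/`) is my guess at what “properly encoded” means, and the list of hosts is something you would have to supply yourself, since I am not listing the service’s actual sub-domain names here.

```python
import zlib
from urllib.parse import quote


def getfavicon_url(target_url, hosts=("g.etfv.co",)):
    """Build a getFavicon-style image URL for target_url using one of hosts."""
    # Hash the target so the same page always maps to the same host,
    # which keeps the browser cache effective across page loads.
    index = zlib.crc32(target_url.encode("utf-8")) % len(hosts)
    # Percent-encode "?", "&" and "=" so that the target's query string
    # travels as part of the path (an assumption about the service).
    encoded = quote(target_url, safe=":/")
    return "http://{}/{}".format(hosts[index], encoded)
```

With the default host, `getfavicon_url("https://mamchenkov.net/wordpress/")` produces the same URL as in the example above.  Passing more than one host spreads the requests between them, which is the whole point of the sub-domain trick.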