Archiving web sites

LWN has an interesting article covering different ways of archiving a website. It sounds trivial, but it’s not. Even the simplest approach, wget, will probably take you a few dozen attempts before you arrive at something like the following:

$ wget --mirror --execute robots=off --no-verbose --convert-links \
       --backup-converted --page-requisites --adjust-extension \
       --base=./ --directory-prefix=./ --span-hosts \
       --domains=www.example.com,example.com http://www.example.com/
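Since it is easy to forget what each of those options does, here is a rough sketch of the same invocation wrapped in a shell function, with one comment per flag. The function name `mirror_site` and the `WGET` override variable are made up for this illustration; the flags are exactly the ones from the command above, and the comments are my reading of the wget manual.

```shell
# mirror_site is a hypothetical helper name, not part of wget.
# Usage: mirror_site www.example.com
mirror_site() {
    host=$1                            # e.g. www.example.com
    bare=${host#www.}                  # same host without a leading "www."
    set --                             # start with an empty argument list
    set -- "$@" --mirror               # recursive download with timestamping
    set -- "$@" --execute robots=off   # do not honour robots.txt
    set -- "$@" --no-verbose           # print only errors and basic progress
    set -- "$@" --convert-links        # rewrite links so the copy browses locally
    set -- "$@" --backup-converted     # keep a .orig copy of each converted file
    set -- "$@" --page-requisites      # also fetch images, CSS and similar assets
    set -- "$@" --adjust-extension     # add .html/.css suffixes where missing
    set -- "$@" --base=./ --directory-prefix=./   # save into the current directory
    set -- "$@" --span-hosts "--domains=$host,$bare"  # follow only these hosts
    # WGET is an assumed override hook: set WGET=echo for a dry run.
    ${WGET:-wget} "$@" "http://$host/"
}
```

Running it with `WGET=echo` set prints the arguments instead of downloading anything, which is handy for checking the command before pointing it at a real site. If the host has no leading "www.", the `--domains` list simply contains the same name twice, which should be harmless.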

The article also mentions a few other interesting tools, such as pywb.
