LWN runs an interesting article covering different ways of archiving a website. It sounds trivial, but it’s not. Even the simplest of ways – wget – will probably take you a few dozen attempts before you figure out the following:
$ wget --mirror --execute robots=off --no-verbose --convert-links \
--backup-converted --page-requisites --adjust-extension \
--base=./ --directory-prefix=./ --span-hosts \
--domains=example.com,www.example.com http://www.example.com/
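For replaying the archive later, it also helps to capture a WARC file. Here is a rough sketch of a similar crawl that writes one; the flag combination is mine rather than the article’s, and the domain is just a placeholder:

$ wget --mirror --page-requisites --adjust-extension --no-verbose \
    --execute robots=off --directory-prefix=./ \
    --warc-file=example.com http://www.example.com/

Besides the usual mirrored directory tree, this leaves you with an example.com.warc.gz that replay tools understand.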
There are a few other interesting tools (like pywb) mentioned as well.
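Since pywb is the one I keep hearing about, here is a minimal sketch of replaying that WARC with it; this is from memory, assumes a recent pywb from PyPI, and the collection name is just an example:

$ pip install pywb
$ wb-manager init my-archive
$ wb-manager add my-archive ./example.com.warc.gz
$ wayback

The archived copy should then be browsable at http://localhost:8080/my-archive/ (8080 being pywb’s default port).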
GitHub to MySQL is a handy little app in PHP that pulls labels, milestones and issues from GitHub into your local MySQL database. This is useful for analysis and backup purposes.
There are a few example queries provided that show issues vs. pull requests, average number of days to merge a pull request over the past weeks, average number of pull requests open every day, and total number of issues.
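To give a flavour of such analysis, here is a minimal sketch of a query of my own, not one of the bundled examples; the database name github, the issues table, and the is_pull_request and created_at columns are all assumptions about the schema:

$ mysql -u root -p github <<'SQL'
-- Issues vs. pull requests opened per month (hypothetical schema)
SELECT DATE_FORMAT(created_at, '%Y-%m') AS month,
       SUM(is_pull_request = 0)         AS issues,
       SUM(is_pull_request = 1)         AS pull_requests
  FROM issues
 GROUP BY month
 ORDER BY month;
SQL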
I think this tool can be easily extended to pull other information from GitHub, such as release notes, projects, and webhooks. Also, if you are using multiple version control services, such as BitBucket and GitLab, extending this tool can help with merging data from multiple sources and cross-referencing it with the company’s internal tools (bug trackers, support ticketing systems, CRM, etc).
This is not something I’ll be doing right now, but I’m sure that future is not too far away.
As I am reading this story – GitLab.com melts down after wrong directory deleted, backups fail – and these details, every single hair I have stands on end… I don’t (and didn’t) have any data on GitLab, so I haven’t lost anything. But as somebody who worked as a system administrator (and backup administrator) for years, I can imagine the physical and psychological state of the team all too well.
Sure, things could have been done better. But that’s easier said than done. Modern technology is very complex. And it changes fast. And businesses want to move fast too. And the proper resources (time, money, people) are not always allocated for mission-critical tasks. One thing is for sure: the responsibility for a whole bunch of decisions lies with a whole bunch of people. But the hardest job right now falls on the tech people trying to bring back whatever they can. There’s no sleep. Probably no food. No fun. And tremendous pressure all around.
I wish the guys and gals at GitLab the very best of luck. Hopefully they will find a snapshot to restore from, and this whole thing will calm down and sort itself out. Stay strong!
And I guess I’ll be doing test restores all night tonight, making sure that all my things are covered…
Update: you can now read the full post-mortem as well.
Back in my college days, I had a professor who frequently used Andrew Tanenbaum‘s quote in the networking class:
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.
I guess he wasn’t the only one, as during this year’s Amazon re:Invent 2016 conference, the company announced, among other things, the AWS Snowmobile:
Moving large amounts of on-premises data to the cloud as part of a migration effort is still more challenging than it should be! Even with high-end connections, moving petabytes or exabytes of film vaults, financial records, satellite imagery, or scientific data across the Internet can take years or decades. On the business side, adding new networking or better connectivity to data centers that are scheduled to be decommissioned after a migration is expensive and hard to justify.
In order to meet the needs of these customers, we are launching Snowmobile today. This secure data truck stores up to 100 PB of data and can help you to move exabytes to AWS in a matter of weeks (you can get more than one if necessary). Designed to meet the needs of our customers in the financial services, media & entertainment, scientific, and other industries, Snowmobile attaches to your network and appears as a local, NFS-mounted volume. You can use your existing backup and archiving tools to fill it up with data destined for Amazon Simple Storage Service (S3) or Amazon Glacier.
Thanks to this VentureBeat page, we even have a picture of the monster:
100 Petabytes on wheels!
I know, I know, it looks like a regular truck with a shipping container on it. But I’m pretty sure it’s VERY different on the inside. With all the storage, networking, power, and cooling needed, it would be awesome to take a peek inside this thing.
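Just for fun, a back-of-envelope calculation of the effective bandwidth; the 100 PB figure is from the announcement, while the six-week turnaround is purely my assumption:

# 100 PB over an assumed 6 weeks, expressed in Gbit/s
$ echo 'scale=1; 100 * 10^15 * 8 / (6 * 7 * 86400) / 10^9' | bc
220.4

That is roughly 220 Gbit/s sustained, which is more than an order of magnitude above a fully saturated 10 Gbit/s line – pretty much Tanenbaum’s point.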
If you are running a Magento-based website, make sure you add the database log maintenance script to cron. For example, append this to /etc/crontab:
# Magento log maintenance, as per
0 23 * * 0 root (cd /var/www/html/mysite.com && php -f shell/log.php clean)
Thanks to this page, obviously. You’ll be surprised how much leaner your database will be, especially if you get any kind of traffic to the site. Your database backups will also appreciate the trim.
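If you want to see the effect before scheduling it, the same script can be run by hand; the status action is from memory, so double-check it against your Magento version:

$ cd /var/www/html/mysite.com
$ php -f shell/log.php status
$ php -f shell/log.php clean

status prints the size of each log table, and clean trims them, which makes for a nice before/after comparison.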