Here goes the story of me learning a few new swear words and pulling out nearly all my hair. Grab a cup of coffee, this will take a while to tell…
First of all, here is a diagram to make things a little bit more visual.
As you can see, we have an office network with NAT on the gateway. We have an Amazon VPC with NAT on the bastion host. And then there’s the rest of the Internet.
The setup is pretty straightforward. There are no outgoing firewalls anywhere, no VLANs, no network equipment – all of the involved machines are a variety of Linux boxes. The whole thing has been working fine for a while now.
A couple of weeks ago we had an issue with our ISP in the office. The Internet connection was alive, but we were getting extremely high packet loss – around 80%. A technician came by, changed the cables, and rebooted the ADSL modem, and we also rebooted the gateway. The problem was fixed, except for one annoying bit. We could access all of the Internet just fine, except our Amazon VPC bastion host. Here’s where it gets interesting.
Continue reading “WTF with Amazon and TCP”
httpdiff – perform the same request against two HTTP servers and diff the results
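The idea behind such a tool can be sketched in a few lines of Python. This is not httpdiff's own code, just a minimal illustration using only the standard library; the function names `fetch` and `diff_bodies` are made up for this example.

```python
import difflib
import urllib.request


def fetch(url, timeout=10.0):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Replace undecodable bytes so a binary response does not crash the diff.
        return resp.read().decode("utf-8", errors="replace")


def diff_bodies(body_a, body_b, label_a="a", label_b="b"):
    """Return a unified diff (list of lines) between two response bodies."""
    return list(
        difflib.unified_diff(
            body_a.splitlines(),
            body_b.splitlines(),
            fromfile=label_a,
            tofile=label_b,
            lineterm="",
        )
    )
```

To compare two servers, fetch the same path from each and print the result, for example `print("\n".join(diff_bodies(fetch(url_a), fetch(url_b), url_a, url_b)))`, where `url_a` and `url_b` are whatever hosts you want to compare. An empty list means the bodies are identical. Headers and status codes are ignored in this sketch.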
I read this story a while ago, but it is a beautiful piece of system administration reality, so here it goes again.
“We’re having a problem sending email out of the department.”
“What’s the problem?” I asked.
“We can’t send mail more than 500 miles,” the chairman explained.
I choked on my latte. “Come again?”
“We can’t send mail farther than 500 miles from here,” he repeated. “A little bit more, actually. Call it 520 miles. But no farther.”
More stories here.
morgue – post mortem tracker
Tools of the Trade – a huge collection of tools (mostly software as a service) for all kinds of web work: development, troubleshooting, project management, testing, emails, etc.
Sentry – an event logging platform focused on capturing and aggregating exceptions. Most of the code is Open Source (except for a few proprietary plugins), in case you want to run your own hosted version.
Easylogging++ – single header only, extremely light-weight high performance logging library for C++ applications
sysdig – system troubleshooting for Linux
Six Ways to Make Your Production Logs More Useful
- Log structured data in a readable format
- Add a dash of color
- Logs let your app communicate with you and your team
- Seriously though, don’t put exception stack traces in your logs!
- Log URLs for easy access to more context
- Add emotional context to your logs
Most of these are somewhat expected, but emotional context in logs was definitely new to me. I wonder why I’ve never even thought of this.
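Two of the tips, structured data and URLs for context, are easy to try with Python's standard `logging` module. The following is a rough sketch rather than anything taken from the linked article; the `JsonFormatter` class and the `context` attribute name are my own assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "context" is a hypothetical attribute, passed via extra={"context": {...}}.
        context = getattr(record, "context", None)
        if isinstance(context, dict):
            payload.update(context)
        return json.dumps(payload, sort_keys=True)


def make_logger(name):
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `make_logger("shop").warning("payment failed", extra={"context": {"url": "https://example.com/orders/42"}})` then produces a single JSON line that includes the URL, so whoever reads the log can jump straight to the affected page. The URL here is a placeholder.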
HTTP Archive Viewer – a handy tool for troubleshooting web pages. Here is how to use it:
- Open Google Chrome browser (new tab).
- Press F12 to open Developer Tools.
- Switch to Network tab.
- Load any page in the tab.
- Right-click anywhere over network requests to get a menu.
- Select ‘Save as HAR with content’.
- Choose the location for the HAR file.
Now you can drag-n-drop this file into the HTTP Archive Viewer and study how the page loaded, which requests were made, how much time was spent and how it was spent. This is particularly useful for the following scenarios:
- You are about to make some changes to your site, and you want to compare ‘before’ and ‘after’.
- You are troubleshooting a session of a non-technical user, who can’t provide you access to their desktop environment.
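A HAR file is plain JSON, so for a quick ‘before’ and ‘after’ comparison you can also inspect it with a short script. The sketch below is not part of HTTP Archive Viewer. It assumes the standard HAR layout, where each item in `log.entries` has a total `time` in milliseconds, a `request.url` and a `response.status`; the function names are made up for this example.

```python
import json


def load_har(path):
    """Load a HAR file saved from the browser's Network tab."""
    # utf-8-sig tolerates a leading byte order mark if one is present.
    with open(path, encoding="utf-8-sig") as fh:
        return json.load(fh)


def slowest_requests(har, limit=5):
    """Return up to `limit` (time_ms, status, url) tuples, slowest first."""
    rows = [
        (entry["time"], entry["response"]["status"], entry["request"]["url"])
        for entry in har["log"]["entries"]
    ]
    return sorted(rows, key=lambda row: row[0], reverse=True)[:limit]


def total_time_ms(har):
    """Sum of the per-request times; requests overlap, so this is not page load time."""
    return sum(entry["time"] for entry in har["log"]["entries"])
```

Running `slowest_requests(load_har("before.har"))` and the same for `after.har` (both file names are placeholders) gives a rough picture of which requests got faster or slower between the two captures.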