Exporting messages from Gmail with fetchmail and procmail

One of the projects that I am involved in has a requirement to import all the historical emails from a number of Gmail accounts into another system.  It’s not the most challenging of tasks, but since I spent a bit of time on it, I figured I should blog it here too, just in case a similar need arises in the future.

In my particular case, I need two different solutions.  One for exporting all of the messages from all folders of all the Gmail accounts in question (Gmail for Work).  And the other for exporting only the messages from the “Sent Mail” folder that were sent on specific dates.

The solution that I derived is based on the classic tools for this purpose – fetchmail and procmail.  Fetchmail is awesome at fetching emails using all kinds of protocols.  Procmail is amazing at sorting, filtering, and otherwise processing the email messages.

So, here we go.  First of all, we need to tell fetchmail where to get the messages from.  I didn’t want to create two separate configurations for each of my tasks, so I left only the options common between them in the configuration file; the rest I will be passing as command line arguments, depending on the scenario.

Note that I’ve been running these tests from a dedicated environment, where I only had the root user.  You don’t have to run it as root – it’ll work just fine as any other user.  Also, keep in mind that I used the “/root/fetchmail-test/” folder for my test runs.  You might need to adjust the paths if your setup is any different.

Here’s my fetchmail.rc file, which I used to test a single mailbox.  A new “poll” section will go into this file later, for each mailbox that I’ll need to export.

poll imap.gmail.com proto imap
    username "someuser@gmail.com" is root here
    password "somepass"
    fetchall
    keep
    ssl

If you are not root, you might need to adjust the second line, replacing “root” with your username. Also, for testing purposes, you can use “fetchlimit 1” instead of “fetchall”.
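Later on, when more mailboxes need to be exported, each one should be able to get its own “poll” section in the same file; the “via” keyword keeps the entries distinct even though they all point at the same server.  Here’s a sketch of what a second section might look like (the poll name and the credentials are, of course, made up):

poll gmail-second via imap.gmail.com proto imap
    username "anotheruser@gmail.com" is root here
    password "anotherpass"
    fetchall
    keep
    ssl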

Now, we need two configuration files for procmail.  The first one is super simple – I’ll use it to push all downloaded messages into a single giant mbox file.  Here’s the procmail-all.rc:

VERBOSE=0
DEFAULT=/root/fetchmail-test/fetchmail.all.mbox

As you can see, it only defines the verbosity level and the default mailbox.  The second configuration file is a bit more complicated.  I’ll use it for the sent items only.  Limiting the fetch to the “Sent Mail” folder will be done with fetchmail.  What I want to do further is disregard all messages that were not sent on specific dates.  Here is my procmail-sent.rc:

VERBOSE=0
DEFAULT=/dev/null

:0
* ^Date: .*28 Jul 2016.*|\
^Date: .*27 Jul 2016.*
/root/fetchmail-test/fetchmail.sent.mbox

Again, we have the verbosity level and the default mailbox to save messages to.  Since I want to disregard messages unless they match a certain condition, I specify /dev/null.  Then, I specify my condition, which is simply a bunch of regular expressions for the Date header.  Usually, the Date header is not very reliable, as different MUAs (Mail User Agents) use different formats, time zones, etc.  In this particular case the test results seemed consistent (maybe Gmail fixes the header), and I didn’t have any other, more reliable criteria to use.

As you can see, I use a very basic condition for date matching. So, if the Date header matches either “28 Jul 2016” or “27 Jul 2016”, the message is saved in the mbox file, rather than being thrown into the default mailbox.
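By the way, if the list of dates grows, stacking up alternations like that gets unwieldy.  Since procmail conditions are egrep-style regular expressions, a character class can sometimes compress consecutive dates into a single condition.  A minimal sketch, matching the same two dates as above:

:0
* ^Date: .*2[78] Jul 2016
/root/fetchmail-test/fetchmail.sent.mbox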

Now, all I need is a way to tie fetchmail and procmail together, as well as provide some additional options.  For that, I created two one-liner shell scripts, just so that I won’t need to figure out the command line arguments again if I look at this whole thing six months later.

Here is the check-all.sh script (multi-line for readability):

#!/bin/bash
fetchmail -f fetchmail.rc \
    -r "[Gmail]/All Mail" \
    --mda "procmail /root/fetchmail-test/procmail-all.rc"

and here is the check-sent.sh script (multi-line for readability):

#!/bin/bash
fetchmail -f fetchmail.rc \
    -r "[Gmail]/Sent Mail" \
    --mda "procmail /root/fetchmail-test/procmail-sent.rc"

If you run either one of these scripts, you’ll see output similar to this:

$ ./check-all.sh
fetchmail: WARNING: Running as root is discouraged.
410 messages for someuser@gmail.com at imap.gmail.com (folder [Gmail]/All Mail).
reading message someuser@gmail.com@gmail-imap.l.google.com:1 of 410 (446 header octets) (222 body octets) not flushed
reading message someuser@gmail.com@gmail-imap.l.google.com:2 of 410 (869 header octets) (230 body octets) not flushed
reading message someuser@gmail.com@gmail-imap.l.google.com:3 of 410 (865 header octets) (230 body octets) not flushed
...
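Once a run completes, a quick sanity check is to count the messages that ended up in the mbox file.  In mbox format every message starts with a “From ” line, so something like this should roughly agree with the count that fetchmail reports (roughly, because with “keep” in the configuration, repeated runs will download and append the same messages again):

$ grep -c '^From ' /root/fetchmail-test/fetchmail.all.mbox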


WTF: The Inner JSON Effect

I’ve seen my share of horrible systems, but I haven’t seen anything this bad:

“So you have ‘customers.json’ and ‘customers.js’. The JSON file is the metadata and the JS file has all the code. So the list of functions in the JSON file tells JDSL to look up those revisions of the JS file to find what functions are available. In this case the actual code is in revisions 568, 899, 900, 901, and so on.”

Although I’ve seen a system before that breaks when adding code comments to certain files (as it was parsing source code with regular expressions, rather than with the language parser):

“Well, yes. I added a few code comments, trying to–”

“You can’t use comments in JDSL!” Tom shouted. “THAT’S WHAT BROKE IT!!”

Jake stayed silent, trying to process how code comments could wipe out a customer database. Tom continued after a pause. “I haven’t added comment support to JDSL, so the runtime executes comments like normal code! You must have had database updates in some comments?!”

“Well, yeah, I put a couple short syntax examples in a comment to clarify–”

Tom burst to his feet. “I knew it! You BROKE IT!” He turned to face the VPs. “I can’t deal with coders who don’t understand the system! You will either fire Jake…or I quit!” And he stormed out of the room.

21st century is finally here with PrimeTel Fibernet

The apartment building I have lived in for the last few years has some cabling issues.  Those prevented me from joining the rest of the world in the 21st century when it comes to home Internet connectivity.  Here’s what I’ve been on until today:

PrimeTel (before)

Today, I got my connection upgraded.  PrimeTel Fibernet, which is currently only available to select buildings, brought the modern age of technology into my house.  Here’s how it looks:

PrimeTel (after)

Yup, that’s a 50 Mbps download with an 8 Mbps upload connection.  Nearly a 10x speed increase, but not only that.  Have a look at the 1 ms ping now vs. the 35 ms ping before.  And all that is for the same price.  And nothing else had to change – I still have the same TV channels and the same landline number.  Ah, no, wait, my home IP address changed, but who cares about that, right?

This thing is so fast indeed that to fully utilize it I need to use an Ethernet cable.  Luckily, that’s how both my PlayStation 3 and the home media server are connected.  With my laptop’s WiFi, I get numbers like this:

PrimeTel (WiFi)

I’m not yet sure why, but I’ll probably need to look into my wireless card drivers or something.

Anyways, WiFi or not, it’s way faster than it used to be, both in bandwidth and latency.  Which is amazing news!

P.S.: Thanks to SpeedTest.net for the cool graphics, and for the years of service too.

Serverlessconf 2016 – New York City: a personal report

Serverlessconf 2016 – New York City: a personal report – is a fascinating read.  Let me get you hooked:

This event left me with the impression (or the confirmation) that there are two paces and speeds at which people are moving.

There is the so called “legacy” pace. This is often characterized by the notion of VMs and virtualization. This market is typically on-prem, owned by VMware and where the majority of workloads (as of today) are running. Very steady.

The second “industry block” is the “new stuff” and this is a truly moving target. #Serverless is yet another model that we are seeing emerging in the last few years. We have moved from Cloud (i.e. IaaS) to opinionated PaaS, to un-opinionated PaaS, to DIY Containers, to CaaS (Containers as a Service) to now #Serverless. There is no way this is going to be the end of it as it’s a frenetic moving target and in every iteration more and more people will be left behind.

This time around was all about the DevOps people being “industry dinosaurs”. So if you are a DevOps persona, know you are legacy already.

Sometimes I feel like I am living on a different planet.  All these people are so close, yet so far away…