Exporting messages from Gmail with fetchmail and procmail

One of the projects that I am involved in requires importing all the historical emails from a number of Gmail accounts into another system.  It’s not the most challenging of tasks, but since I spent a bit of time on it, I figured I should blog it here too, just in case a similar need arises in the future.

In my particular case, I need two different solutions.  One is for exporting all of the messages from all folders of all the Gmail accounts in question (Gmail for Work).  The other is for exporting only those messages from the “Sent Mail” folder that were sent on specific dates.

The solution that I derived is based on the classic tools for this purpose – fetchmail and procmail.  Fetchmail is awesome at fetching emails using all kinds of protocols.  Procmail is amazing at sorting, filtering, and otherwise processing the email messages.

So, here we go.  First of all, we need to tell fetchmail where to get the messages from.  I didn’t want to create a separate configuration for each of my two tasks, so I left only the options common to both in the configuration file, and the rest I will be passing as command line arguments, depending on the scenario.

Note that I’ve been running these tests from a dedicated environment, where I only had the root user.  You don’t have to run it as root – it’ll work just fine as any other user.  Also, keep in mind that I used the “/root/fetchmail-test/” folder for my test runs.  You might need to adjust the paths if your setup is different.

Here’s my fetchmail.rc file, which I used to test a single mailbox.  A new “poll” section will go into this file later, for each mailbox that I’ll need to export.

poll imap.gmail.com proto imap:
  username "someuser@gmail.com" is root here
  password "somepass"
  fetchall
  keep
  ssl

If you are not root, you might need to adjust the second line, replacing “root” with your username.  Also, for testing purposes, you can use “fetchlimit 1” instead of “fetchall”.
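One thing worth checking before the first run: the file holds a plain-text password, and, as far as I know, fetchmail refuses to use a run control file that other users can access.  Here is a minimal sketch of tightening the permissions.  The secure_rc function and the temporary stand-in file are made up for illustration; in real use you would point it at your own fetchmail.rc.

```shell
#!/bin/bash
# Restrict an rc file to its owner and print the resulting permission string.
# secure_rc is a hypothetical helper, not something fetchmail provides.
secure_rc() {
  chmod 600 "$1" && ls -l "$1" | cut -c1-10
}

# Stand-in for fetchmail.rc, made world-readable on purpose for the demo.
demo_rc="$(mktemp)"
printf 'poll imap.gmail.com proto imap:\n' > "$demo_rc"
chmod 644 "$demo_rc"

secure_rc "$demo_rc"
```

After this, only the owner can read or write the file, which is what you want for anything with a password in it.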

Now, we need two configuration files for procmail.  The first one is super simple – I’ll use this for simply pushing all downloaded messages into a single giant mbox file.  Here’s the procmail-all.rc:

VERBOSE=0
DEFAULT=/root/fetchmail-test/fetchmail.all.mbox

As you can see, it only defines the verbosity level and the default mailbox.  The second configuration file is a bit more complicated.  I’ll use it for the sent items only.  Limiting the export to the sent items folder will be done by fetchmail.  What I want to do on top of that is discard all messages that were not sent on specific dates.  Here is my procmail-sent.rc:

VERBOSE=0
DEFAULT=/dev/null
:0
* ^Date: .*28 Jul 2016.*|\
  ^Date: .*27 Jul 2016.*
/root/fetchmail-test/fetchmail.sent.mbox

Again, we have the verbosity level and the default mailbox to save messages to.  Since I want to discard messages unless they match a certain condition, I specify /dev/null.  Then, I specify my condition, which is simply a couple of regular expressions for the Date header.  Usually, the Date header is not very reliable, as different MUAs (Mail User Agents) use different formats, time zones, etc.  In this particular case, test results seemed consistent (maybe Gmail fixes the header), and I didn’t have any more reliable criteria to use.

As you can see, I use a very basic condition for date matching.  So, if the Date header matches either “28 Jul 2016” or “27 Jul 2016”, the message is saved in the mbox file, rather than being thrown into the default mailbox.
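If you want to try the date condition on a few sample messages without going through procmail, something like the following sketch can help.  It is only an approximation of the recipe above: it uses grep instead of procmail, and the sent_on_wanted_day function and the sample messages are made up for illustration.  The -i flag is there because procmail matches case-insensitively by default, and sed cuts the input at the first empty line so that only the headers are checked.

```shell
#!/bin/bash
# Succeeds when the Date header of the message on stdin names one of the wanted days.
# Hypothetical helper that roughly mirrors the procmail condition; it does not call procmail.
sent_on_wanted_day() {
  # Keep only the header block (everything up to the first empty line), then test it.
  sed '/^$/q' | grep -Eiq '^Date: .*(27|28) Jul 2016'
}

# Made-up sample messages: the first should be kept, the second dropped.
printf 'Date: Thu, 28 Jul 2016 10:15:00 +0300\nSubject: hi\n\nbody\n' \
  | sent_on_wanted_day && echo "keep"
printf 'Date: Fri, 29 Jul 2016 09:00:00 +0300\nSubject: hi\n\nbody\n' \
  | sent_on_wanted_day || echo "drop"
```

Keep in mind that this style of pattern is loose.  A single-digit day such as “7 Jul 2016” would also match “17 Jul 2016” and “27 Jul 2016”, so the expression needs a bit more care for dates like that.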

Now, all I need is a way to tie fetchmail and procmail together, as well as provide some additional options.  For that I created two one-liner shell scripts, just so that I won’t need to figure out the command line arguments if I look at this whole thing six months later.

Here is the check-all.sh script (multi-line for readability):

#!/bin/bash
fetchmail -f fetchmail.rc \
          -r "[Gmail]/All Mail" \
          --mda "procmail /root/fetchmail-test/procmail-all.rc"

and here is the check-sent.sh script (multi-line for readability):

#!/bin/bash
fetchmail -f fetchmail.rc \
          -r "[Gmail]/Sent Mail" \
          --mda "procmail /root/fetchmail-test/procmail-sent.rc"

If you run either one of these scripts, you’ll see output similar to this:

$ ./check-all.sh 
fetchmail: WARNING: Running as root is discouraged.
410 messages for someuser@gmail.com at imap.gmail.com (folder [Gmail]/All Mail).
reading message someuser@gmail.com@gmail-imap.l.google.com:1 of 410 (446 header octets) (222 body octets) not flushed
reading message someuser@gmail.com@gmail-imap.l.google.com:2 of 410 (869 header octets) (230 body octets) not flushed
reading message someuser@gmail.com@gmail-imap.l.google.com:3 of 410 (865 header octets) (230 body octets) not flushed
...
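Once the run is finished, a quick sanity check is to count the messages that ended up in the mbox file and compare that with the number fetchmail reported.  Here is a minimal sketch, assuming the usual mbox layout where every message starts with a “From ” separator line.  The count_mbox_messages function and the tiny sample mbox are made up for illustration; in real use you would pass the path to fetchmail.all.mbox or fetchmail.sent.mbox.

```shell
#!/bin/bash
# Count messages in an mbox file by counting its "From " separator lines.
# Hypothetical helper; it assumes body lines starting with "From " are escaped as ">From ".
count_mbox_messages() {
  grep -c '^From ' "$1"
}

# Tiny two-message mbox with made-up content, standing in for the real file.
demo_mbox="$(mktemp)"
cat > "$demo_mbox" <<'EOF'
From someuser@gmail.com Thu Jul 28 10:15:00 2016
Date: Thu, 28 Jul 2016 10:15:00 +0300
Subject: first

hello

From someuser@gmail.com Wed Jul 27 09:00:00 2016
Date: Wed, 27 Jul 2016 09:00:00 +0300
Subject: second

>From the body, escaped so it is not counted

EOF

count_mbox_messages "$demo_mbox"   # prints 2
```

For the sent items run the count will normally be lower than what fetchmail reports, since everything outside the chosen dates goes to /dev/null.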

Here are a few resources that you might find helpful:

Global email in Gmail. Bad idea.

Gmail blog reports that Google is working on a more global email.  The first step is internationalized email addresses, like this:

[Image: internationalized email address]

As someone who worked in international environments for years, I strongly dislike this idea.  There is a whole array of issues related to this: readability of the email address (yes, read it!), display issues (do you have the font with all the necessary characters?), writing the email address (searching through the addressbook, for example), or even copy-pasting an email address (have you tried copy-pasting English strings from Hebrew or Arabic documents?  Now you’ll be copy-pasting international email addresses from English documents – so much fun!).  On top of that, there are all the usual things related to spam filters, trust issues (is this a company, free email hosting, or a personal domain?), etc.  Can you spell out this email address over the phone?  How about typing it on a mobile phone?  Do you even know which language it is in?

Using non-accented Latin characters is a pain for all those people who don’t speak English.  But it has worked nonetheless for the last few decades.  Now we are heading towards a future where that pain won’t be limited to those who don’t read English, but will extend to everyone, as you can’t really learn all the languages of the world, or control which languages the email addresses making it into your inbox are in.  Remember that just because the email address is in a given language, it doesn’t mean that the content of the email is in the same language.

On top of that, we’ve tried that already with the international URLs.  See how well that worked out.  Yeah, some people sure use them.  But try copy-pasting this URL around and I guarantee you’ll end up with a whole bunch of long and cumbersome escaped strings.  The same or similar fate will hit the emails…

Google introduces Gmail API

Google is introducing the new Gmail API:

While IMAP is great at what it was designed for (connecting email clients to email servers in a standard way), it wasn’t really designed to do all of the cool things that you have been working on, which is why this week at Google I/O, we’re launching the beta of the new Gmail API.

This is somewhat expected:

Designed to let you easily deliver Gmail-enabled features, this new API is a standard Google API, which gives RESTful access to a user’s mailbox under OAuth 2.0 authorization. It supports CRUD operations on true Gmail datatypes such as messages, threads, labels and drafts.

As a standard Google API, you make simple HTTPS calls and get your responses in JSON, XML or Google Protobuf formats.

This is a nice bonus:

In contrast to IMAP, which requires access to all of a user’s messages for all operations, the new API gives fine-grained control to a user’s mailbox. For example, if your app only needs to send mail on behalf of a user and does not need to read mail, you can limit your permission request to send-only.

To keep in sync, the API allows you to query the inbox change history, thereby avoiding the need to do “archaeology” to figure out what changed.

They are also saying that it’s fast.  This is very welcome news indeed.

Safe display of external images in Gmail

The Official Gmail Blog lets us know that the latest update to Gmail now safely shows external images.  Most other email programs and services disable image display by default, because images can either carry all kinds of malware, or be used for tracking.  Gmail now solves this by downloading those images and serving them to users from its own servers.

But thanks to new improvements in how Gmail handles images, you’ll soon see all images displayed in your messages automatically across desktop, iOS and Android. Instead of serving images directly from their original external host servers, Gmail will now serve all images through Google’s own secure proxy servers.

So what does this mean for you? Simple: your messages are more safe and secure, your images are checked for known viruses or malware, and you’ll never have to press that pesky “display images below” link again. With this new change, your email will now be safer, faster and more beautiful than ever.

I’m not the biggest fan of HTML emails, but since I don’t have much choice in this area, I’d rather receive emails with images – at least I won’t be trying to make sense of empty layouts with no text anymore.

Download your Gmail and Google Calendar data … soon or now

I am a well-known Google fan.  But even those who call it an Evil Corporation and a Global Spy can’t argue with the awesomeness of this news:

Starting today we’re rolling out the ability to export a copy of your Gmail and Google Calendar data, making it easy to back up your data or move to another service.

You can download all of your mail and calendars or choose a subset of labels and calendars. You can also download a single archive file for multiple products with a copy of your Gmail, Calendar, Google+, YouTube, Drive, and other Google data.

[Image: Gmail data export]

Most of the 20 GB of data I store on Google Drive is actually my email archive.  I’ve imported email into my Gmail from as early as 1998 – much, much earlier than Gmail was even born.  Having a way to export it all in one go, without using clunky POP or IMAP, is much appreciated.

Actions in Gmail

[Image: Gmail actions]

I think this is the greatest innovation in web-based email since Gmail’s own release of large mailboxes (what was it? 1 GB?).  Web mail has all the benefits of a website, but offers greater contextual focus.  Adding specific actions to messages has been possible with extensions and plugins for a long time, but those were traditionally added by the recipient.  Giving such power to the sender is quite interesting.

Of course, there will be a variety of misuses – spam, phishing, etc – but, I’m sure there will be an even greater variety of useful functionality.  Like this “Send money with Gmail” example.  Here is more information on what’s possible.