ETLA

Filtering Email with Perl and Exim


NAME

Filtering email with perl and exim


ARTICLE

I get a lot of email. This, in a day and age where people often measure their self-worth by how many attempts there are to communicate with them in a day, could be seen as a good thing.

But the vast majority of it comes from mailing lists. This is a bad thing. I like mailing lists, and they're all well and good, but it can be incredibly irritating to wade through them looking for your personal email. So, like many people, I filter them automatically into various mailboxes, one for each mailing list. This is very handy, but can be tedious to set up - I need to sit down, look at a typical mail from a mailing list, work out some characteristic shared by all mails on the list, and encode that using some kind of programming language.

Finally, I gave in, remembered that I knew perl, and decided that it was time to use scripts for what they do best - automating common tasks.

After a bit of thought, I decided that I didn't want to write my own mail handling tools. Many clever people have done this already, and there are all sorts of issues with locking, parsing email messages, and other things. Far better to avoid the whole affair, and just configure existing tools appropriately.

I decided the rule to implement was this:

  1. See if a particular header contains a certain string.
  2. If so, save that email to a specified folder.

Taking the approach that simplicity is a virtue, I decided on a simple config file, with three fields on each line - header, string, and folder name.

 Reply-To void@slab.org void

But then, after writing a small set of these records to a file, to see if this was a workable approach, I realised it could get a little hard to remember what was what. So I added the idea of comments - any line beginning with a # character would be ignored by my programs. This would let me document my configuration.

After an hour or two of playing, I came up with a script that would take my source file, and generate an Exim filter file as output. Exim filter files are written in a simple programming language that can be used to specify the details of mail delivery on appropriately configured systems. I could just as easily have used other tools like procmail or Mail::Audit, but I happen to have exim already set up, and I like the way it works. If I'd gone the other way, I'd have used Mail::Audit, a fine module by Simon Cozens, and probably ended up reading in my config file directly, and doing the work entirely in perl without using an external tool.

Then, when it came to generating the files, I found another wrinkle - maybe I only want to handle some mailing lists via this script, and I have a whole host of other filtering requirements. To let me do both, I decided the script would be able to generate complete filter files, or fragments of them, suitable for inclusion in a larger file. A 'complete' file is requested with the -x option.

  #!/usr/bin/perl -w

This line simply points at my perl interpreter, and asks perl to warn me about certain perl constructions that can easily be wrong.

  # Exim filter file generation
  # Michael Stevens
  # michael@etla.org

I try to always start any script with a few lines of comments, describing what it does, and who I am. It's amazing how handy this is when you come back to stuff 6 months later.

  use strict;

Ask perl to do various things, like insisting I declare my variables. This lets me get better error checking for the programs I write.

  use Getopt::Std;

Load the Getopt::Std library, a standard module for parsing command line arguments. This is supplied with all complete and recent perl releases. It is overkill for our application, but we don't need efficiency, and it makes the program simpler.

  use vars qw($opt_x);

We use a package global because getopts, which lives in a different package from our main code, sets the variable for us, and it cannot see a lexical declared with my. Under use strict, undeclared globals are normally an error, which is what we want, so we need the use vars pragma to tell perl explicitly that this one is allowed.

  getopts('x');

We use the simple getopts interface. $opt_x will be set to 1 if the program was run with the -x argument on the command line.
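If the package global feels awkward, Getopt::Std can also store the flags in a hash when it is given a hash reference as a second argument. The following is only a sketch of that alternative, not what the script does; parse_flags is a hypothetical helper name, and the usage message is made up for illustration.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: parse the given argument list and return a hash
# reference holding the flags that were present.
sub parse_flags {
    my @args = @_;
    local @ARGV = @args;    # getopts always reads from @ARGV
    my %opts;
    getopts('x', \%opts) or die "usage: make_filter [-x]\n";
    return \%opts;
}

my $opts = parse_flags('-x');
print "complete file requested\n" if $opts->{x};
```

With this form there is no need for use vars, at the cost of writing $opts->{x} instead of $opt_x.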

  my $string_time = localtime;

Get the current local time into a string, so we can easily interpolate it later.
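The reason this gives a string is context: assigning to a scalar makes localtime return one formatted timestamp, while assigning to an array returns the individual fields. A small sketch of the difference, with variable names chosen just for this example:

```perl
use strict;
use warnings;

my $as_string = localtime;    # scalar context: one formatted string
my @as_fields = localtime;    # list context: sec, min, hour, mday, ...

print "Generated at $as_string\n";
print "Field count: ", scalar(@as_fields), "\n";
```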

  if ($opt_x) {

If we're generating a complete filter file, print some initialisation code.

    my $name = getpwuid $<;
    if (!defined($name)) {
      $name = "UID: $<";
    }

Take the userid of the person running this script, try to get a username to match it, and if we can't, store the user id as the name.
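The same lookup and fallback can be wrapped in a small subroutine, which makes it easier to reuse and to test. This is a sketch only; user_label is a hypothetical name that does not appear in the script.

```perl
use strict;
use warnings;

# Hypothetical helper: a printable label for the given numeric user id.
sub user_label {
    my ($uid) = @_;
    my $name = getpwuid($uid);    # scalar context returns the login name
    return defined $name ? $name : "UID: $uid";
}

print user_label($<), "\n";
```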

    print STDOUT <<ENDIT;
  # Exim filter

This line indicates to exim that we're dealing with a filter file, and it should handle the rest of the contents appropriately.

  # Automatically generated at ${string_time} by ${name}

This is just a simple reminder, in case I get confused who generated what, and when.

  # don't filter error messages. aids debugging.
  if error_message then finish endif

This is a standard part of an exim filter file. It says to stop processing here if the message is a delivery error report (a bounce), so error messages are never filed away by our rules, which is very useful for debugging.

  ENDIT
  }
  while (<STDIN>) {

The simplest way to get input into our program is on standard input. So we read each line of the input.

    # allow comments
    next if /^\#/;

If the input line is a comment, skip it.

    # allow blanks
    next if /^\s*$/;

Ignore lines that consist only of whitespace.
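To see the two skip rules working together, here is a small sketch that runs them over some sample lines held in an array instead of standard input. The sample data and the @rules array are assumptions made for this example.

```perl
use strict;
use warnings;

# Sample input in the config format described earlier.
my @lines = (
    '# file format is headername value destination',
    '',
    'Reply-To void@slab.org void',
    '   ',
    'X-Mailing-List linux-kernel@vger.kernel.org linux-kernel',
);

my @rules;
for my $line (@lines) {
    next if $line =~ /^\#/;      # comment line
    next if $line =~ /^\s*$/;    # empty or whitespace-only line
    push @rules, $line;
}

print scalar(@rules), " rule lines kept\n";
```

Note that the comment pattern is anchored to the start of the line, so a # only counts as a comment marker in the first column.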

    chomp;

Remove the trailing newline from the line we are processing.

    # get headername, value, destination
    my @values = split ' ', $_, 4;

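Split the line on runs of whitespace, ignoring any leading whitespace, and store the results in the array @values. The limit of 4 means a line with too many fields yields a fourth element, which the check below can detect.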
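A quick sketch of how split behaves here, using one well-formed line and one with too many words. The sample lines are made up for illustration.

```perl
use strict;
use warnings;

# A well-formed line (leading whitespace is ignored) and an over-long one.
my @ok   = split ' ', '  Reply-To void@slab.org void', 4;
my @long = split ' ', 'Reply-To void@slab.org void extra words', 4;

printf "%d fields, %d fields\n", scalar(@ok), scalar(@long);
print "last field of the long line: '$long[3]'\n";
```

Because of the limit, everything after the third field ends up in a single fourth element, so the field count is never more than 4 however long the line is.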

    if (@values != 3) {
      die "FATAL ERROR: Incorrect number of fields on line $. - expected 3, found " .
        (scalar @values) . "\n";
    }

Each line should have exactly 3 fields. If we find anything else, report an error and give up.

    for (my $i = 0; $i <= 2; $i++) {

Look through each of the expected elements.

      if ($values[$i] =~ /^\s*$/) {

Check if they are whitespace.

        die "FATAL ERROR: Field " . ($i + 1) . " may not be left blank\n";

If they are, indicate an error and give up.

      }
    }
    my ($headername, $value, $destination) = @values;

Now we know the input is ok, we can put it into some variables with user-friendly names.
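The splitting and checking could also be gathered into one subroutine that either returns a rule or dies with a message. This is a sketch under that assumption; parse_rule and the hash keys are hypothetical names, and the error text is simplified compared with the script's.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one config line into a rule, or die with a
# message naming the line number supplied by the caller.
sub parse_rule {
    my ($line, $lineno) = @_;
    my @values = split ' ', $line, 4;
    die "line $lineno: expected 3 fields, found " . scalar(@values) . "\n"
        if @values != 3;
    my %rule;
    @rule{qw(header value destination)} = @values;
    return \%rule;
}

my $rule = parse_rule('Reply-To void@slab.org void', 1);
print "$rule->{header} -> $rule->{destination}\n";

# A malformed line is reported rather than silently accepted.
my $ok    = eval { parse_rule('Reply-To void@slab.org', 2); 1 };
my $error = $ok ? '' : $@;
print "rejected: $error" unless $ok;
```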

    print STDOUT <<ENDIT;
  if \$h_${headername} contains "${value}" 
  then
  save \$home/mail/${destination} 0600
  endif
  ENDIT

Output a suitable fragment of exim filter code, to save any mails that contain our chosen string, in our chosen header, into a specific file. We specify some sensible and minimal permissions for the file.
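To check what a fragment looks like without running the whole script, the same heredoc can be wrapped in a subroutine that returns a string. A minimal sketch, where filter_fragment is a hypothetical name and the arguments are the example values from earlier:

```perl
use strict;
use warnings;

# Hypothetical helper: build the filter fragment for one rule as a string.
sub filter_fragment {
    my ($headername, $value, $destination) = @_;
    # \$ keeps a literal dollar sign so exim, not perl, expands it.
    return <<ENDIT;
if \$h_${headername} contains "${value}"
then
save \$home/mail/${destination} 0600
endif
ENDIT
}

print filter_fragment('Reply-To', 'void@slab.org', 'void');
```

The backslashes matter: $h_ and $home are exim variables and must reach the output untouched, while ${headername}, ${value} and ${destination} are filled in by perl.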

  }
  if ($opt_x) {
    print STDOUT <<ENDIT;
  # End of generated file.
  ENDIT
  }

This block adds a comment marking the end of each complete generated file.

  __END__

  =pod

  =head1 NAME

  make_filter - automatic exim filter file generation

  =head1 SYNOPSIS

  make_filter [-x] <sourcefile >generatedfile

  =head1 DESCRIPTION

  This script takes a list of mailing list definitions on standard input, and
  processes them to generate an exim filter file fragment on standard output.
  If the C<-x> option is used, generates a complete exim filter file, suitable
  for use as your forward file. To use this utility, you must have filter
  files enabled in exim, and use exim as your mail transport agent.

  The input file permits comments - any line beginning with the C<#>
  character is ignored. Each real input line should consist of 3 fields
  separated by whitespace - the header name to look for, the value it
  must contain, and the file to save the message in. It is assumed you
  use a subdirectory 'mail' under your home directory to store all
  messages.

  An example input file is given below:

    # file format is headername value destination

  A comment line, to remind me of what goes in the file.

    Reply-To void@slab.org void

  Looks for postings to the void mailing list, which happens to always
  set a Reply-To header.

    X-Mailing-List linux-kernel@vger.kernel.org linux-kernel
    X-Mailing-List pgsql-general@postgresql.org pgsql-general

  Look for postings to two mailing lists which use the popular
  C<X-Mailing-List> header to mark all mailings.

  =head1 BUGS

  =over 4

  =item *

  Should handle more types of mailing list.

  =item *

  Work with Mail::Audit, somehow.

  =item *

  Mail storage directory should be configurable.

  =back

  =head1 AUTHOR

  Michael Stevens - michael@etla.org.

  =head1 SEE ALSO

  Exim's User Interface to Mail Filtering.

  =cut

All good programs have their own documentation, indicating methods of use, problems, who wrote them, and any general notes that might be relevant to the user.

And that's that. From now on, you can perform simple filtering tasks the way god intended - very simply.

