Filtering email with perl and exim
I get a lot of email. This, in a day and age where people often measure their self-worth by how many attempts there are to communicate with them in a day, could be seen as a good thing.
But the vast vast majority of it is for mailing lists. This is a bad thing. I like mailing lists, they're all well and good, but it can be incredibly irritating wading through them looking for your personal email. So, like many people, I filter them automatically into various mailboxes, one for each mailing list. This is very handy, but can be tedious to set up - I need to sit down, look at a typical mail from a mailing list, work out some characteristic of all mails on the list, and encode that using some kind of programming language.
Finally, I gave in, remembered that I knew perl, and decided that it was time to use scripts for what they do best - automating common tasks.
After a bit of thought, I decided that I didn't want to write my own mail handling tools - many clever people have done this already, there are all sorts of issues with locking, parsing email messages, and other things. Far better to avoid the whole affair, and just configure existing tools appropriately.
I decided the rule to implement was this: if a given header contains a given string, save the message in a given folder.
Taking the approach that simplicity is a virtue, I decided on a simple config file, with three fields on each line - header, string, and folder name.
Reply-To void@slab.org void
But then, after writing a small set of these records to a file, to see if this was a workable approach, I realised it could get a little hard to remember what was what. So I added the idea of comments - any line beginning with a # character would be ignored by my programs. This would let me document my configuration.
After an hour or two of playing, I came up with a script that would take my source file, and generate an Exim filter file as output. Exim filter files are written in a simple programming language that can be used to specify the details of mail delivery on appropriately configured systems. I could just as easily have used other tools like procmail, or Mail::Audit, but I happen to have exim already set up, and I like the way it works. If I'd chosen to go the other way, I'd have used Mail::Audit, a fine module by Simon Cozens, and probably ended up reading in my config file directly, and doing the work entirely in perl without using an external tool.
Then, when it came to generating the files, I found another wrinkle - maybe I only want to handle some mailing lists via this script, and I have a whole host of other filtering requirements. To let me do both, I decided the script would be able to generate complete config files, or fragments of them, suitable for inclusion in a larger file. The desire to generate a 'complete' file would be indicated with the -x option.
#!/usr/bin/perl -w
This line simply points at my perl interpreter, and asks perl to warn me about certain perl constructions that can easily be wrong.
# Exim filter file generation
# Michael Stevens
# michael@etla.org
I try to always start any script with a few lines of comments, describing what it does, and who I am. It becomes amazing how handy this is when you come back to stuff 6 months later.
use strict;
Ask perl to do various things, like insisting I declare my variables. This lets me get better error checking for the programs I write.
use Getopt::Std;
Load the Getopt::Std library, a standard module for parsing command line arguments. This is supplied with all complete and recent perl releases. It is overkill for our application, but we don't need efficiency, and it makes the program simpler.
use vars qw($opt_x);
We use a global variable because the getopts subroutine, which runs in a different package from our main code, needs something it can modify. We're running under use strict, which normally stops us from doing this, so we need the use vars pragma to explicitly tell perl this global variable is allowed.
getopts('x');
We use the simple getopts interface. $opt_x will be set to '1' if the program was run with the -x argument on the command line.
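As an aside, Getopt::Std can also store its flags in a hash, which avoids the package global and the use vars line altogether. The sketch below is not what the script does, just a minimal illustration of that alternative; the @ARGV contents are made up so the example is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Pretend the script was run as: make_filter -x
# (hypothetical command line, set here only for illustration)
local @ARGV = ('-x');

# With a hash reference as the second argument, getopts stores the
# flags in the hash rather than in $opt_* globals.
my %opts;
getopts('x', \%opts) or die "Usage: $0 [-x]\n";

my $complete = $opts{x} ? 1 : 0;
print "complete file requested: $complete\n";
```

Either form behaves the same from the command line; the hash version just keeps everything in lexical variables.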
my $string_time = localtime;
Get the current local time into a string, so we can easily interpolate it later.
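This works because localtime behaves differently depending on context. A small sketch of the difference, assuming nothing beyond core perl:

```perl
use strict;
use warnings;

# In scalar context localtime returns a human-readable string.
my $string_time = localtime;

# In list context it returns the individual date and time fields.
my @fields = localtime;

print "scalar: $string_time\n";
print "list has ", scalar(@fields), " elements\n";
```

Assigning to a plain scalar, as the script does, is what selects the string form.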
if ($opt_x) {
If we're generating a complete filter file, print some initialisation code.
my $name = getpwuid $<;
if (!defined($name)) {
$name = "UID: $<";
}
Take the userid of the person running this script, try to get a username to match it, and if we can't, store the user id as the name.
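The same lookup-with-fallback can be wrapped in a small subroutine. This is only a sketch; name_for_uid is a hypothetical helper name, not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: return a login name for a numeric uid, or a
# "UID: n" placeholder when the password database has no entry.
sub name_for_uid {
    my ($uid) = @_;
    my $name = getpwuid($uid);    # scalar context: just the login name
    return defined $name ? $name : "UID: $uid";
}

print name_for_uid($<), "\n";
```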
print STDOUT <<ENDIT;
# Exim filter
This line indicates to exim that we're dealing with a filter file, and it should handle the rest of the contents appropriately.
# Automatically generated at ${string_time} by ${name}
This is just a simple reminder, in case I get confused who generated what, and when.
# don't filter error messages. aids debugging.
if error_message then finish endif
This is a standard part of an exim filter file - it says to stop processing here if the message is recognised as indicating an error of some kind, and is very useful for debugging.
ENDIT
}
while (<STDIN>) {
The simplest way to get input into our program is on standard input. So we read each line of the input.
# allow comments
next if /^\#/;
If the input line is a comment, ignore it unprocessed.
# allow blanks
next if /^\s*$/;
Ignore lines that consist only of whitespace.
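To see the two skip rules in isolation, here is a minimal sketch that runs them over a few made-up input lines held in an array instead of reading standard input.

```perl
use strict;
use warnings;

# Made-up input lines, for illustration only.
my @lines = (
    "# file format is headername value destination\n",
    "\n",
    "   \n",
    "Reply-To void\@slab.org void\n",
);

my @kept;
for (@lines) {
    next if /^\#/;      # comment line
    next if /^\s*$/;    # blank or whitespace-only line
    chomp;
    push @kept, $_;
}

print scalar(@kept), " rule line(s) kept\n";
```

Note that the comment test is anchored at the start of the line, so a # preceded by whitespace is not treated as a comment.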
chomp;
Remove any trailing newlines from the line we are processing.
# get headername, value, destination
my @values = split ' ', $_, 4;
Split the line up on whitespace, and store the results in the array @values.
if (@values != 3) {
die "FATAL ERROR: Incorrect number of fields on line $. - expected 3, found " . (scalar @values) . "\n";
}
Each line should have only 3 elements. If we find anything else, indicate an error and give up.
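The limit of 4 passed to split is what makes the count check work for over-long lines: any surplus text ends up in a fourth element. A rough sketch of the split-and-check step on its own, where parse_rule is a hypothetical helper name and the error text is shortened:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the split-and-check step.
# Returns the three fields, or dies if the count is wrong.
sub parse_rule {
    my ($line) = @_;
    # A limit of 4 is enough: any surplus text lands in a fourth
    # element, so lines with too many fields still fail the check.
    my @values = split ' ', $line, 4;
    die "expected 3 fields, found " . scalar(@values) . "\n"
        if @values != 3;
    return @values;
}

my @ok = parse_rule('Reply-To void@slab.org void');
print join('|', @ok), "\n";

# Too few and too many fields are both rejected.
for my $bad ('Reply-To void@slab.org', 'a b c d e') {
    my @got = eval { parse_rule($bad) };
    print "rejected: $bad\n" if $@;
}
```

One side effect of the limit is that a line with five or more fields is still reported as having 4.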
for (my $i = 0; $i <= 2; $i++) {
Look through each of the expected elements.
if ($values[$i] =~ /^\s*$/) {
Check if they are whitespace.
die "FATAL ERROR: Field " . ($i + 1) . " may not be left blank\n";
If they are, indicate an error and give up.
}
}
my ($headername, $value, $destination) = @values;
Now we know the input is ok, we can put it into some variables with user-friendly names.
print STDOUT <<ENDIT;
if \$h_${headername} contains "${value}" then
    save \$home/mail/${destination} 0600
endif
ENDIT
Output a suitable fragment of exim filter code, to save any mails that contain our chosen string, in our chosen header, into a specific file. We specify some sensible and minimal permissions for the file.
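To make the output concrete, here is a small sketch that builds the fragment for one rule and prints it. filter_fragment is a hypothetical helper name; the script itself prints the here-document directly.

```perl
use strict;
use warnings;

# Hypothetical helper: build the Exim filter fragment for one rule.
sub filter_fragment {
    my ($headername, $value, $destination) = @_;
    # \$ keeps $h_... and $home literal, so Exim expands them, not perl.
    return <<ENDIT;
if \$h_${headername} contains "${value}" then
    save \$home/mail/${destination} 0600
endif
ENDIT
}

print filter_fragment('Reply-To', 'void@slab.org', 'void');
# prints:
# if $h_Reply-To contains "void@slab.org" then
#     save $home/mail/void 0600
# endif
```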
}
if ($opt_x) {
print STDOUT <<ENDIT;
# End of generated file.
ENDIT
}
This block simply puts a comment marking the end of every complete generated file.
__END__
=pod
=head1 NAME
make_filter - automatic exim filter file generation
=head1 SYNOPSIS
make_filter <sourcefile >generatedfile
=head1 DESCRIPTION
This script takes a list of mailing list definitions on standard input, and processes them to generate an exim filter file fragment on standard output. If the C<-x> option is used, generates a complete exim filter file, suitable for use as your forward file. To use this utility, you must have filter files enabled in exim, and use exim as your mail transport agent.
The input file permits comments - any line beginning with the C<#> character is ignored. Each real input line should consist of 3 fields separated by whitespace - the header name to look for, the value it must contain, and the file to save the message in. It is assumed you use a subdirectory 'mail' under your home directory to store all messages.
An example input file is given below:
# file format is headername value destination
A comment line, to remind me of what goes in the file.
Reply-To void@slab.org void
Looks for postings to the void mailing list, which happens to always set a Reply-To header.
X-Mailing-List linux-kernel@vger.kernel.org linux-kernel
X-Mailing-List pgsql-general@postgresql.org pgsql-general
Look for postings to two mailing lists which use the popular C<X-Mailing-List> header to mark all mailings.
=head1 BUGS
=over 4
=item *
Should handle more types of mailing list.
=item *
Work with Mail::Audit, somehow.
=item *
Mail storage directory should be configurable.
=back
=head1 AUTHOR
Michael Stevens - michael@etla.org.
=head1 SEE ALSO
Exim's User Interface to Mail Filtering.
=cut
All good programs have their own documentation, indicating methods of use, problems, who wrote them, and any general notes that might be relevant to the user.
And that's that. From now on, you can perform simple filtering tasks the way god intended - very simply.