Eric S. Raymond's former Design notes on fetchmail (2024)

These notes are for the benefit of future hackers and maintainers. The following sections are both functional and narrative, read from beginning to end.

A direct ancestor of the fetchmail program was originally authored (under the name popclient) by Carl Harris <ceharris@mal.com>. I took over development in June 1996 and subsequently renamed the program `fetchmail' to reflect the addition of IMAP support and SMTP delivery. In early November 1996 Carl officially ended support for the last popclient versions.

Before accepting responsibility for the popclient sources from Carl, I had investigated and used and tinkered with every other UNIX remote-mail forwarder I could find, including fetchpop1.9, PopTart-0.9.3, get-mail, gwpop, pimp-1.0, pop-perl5-1.2, popc, popmail-1.6 and upop. My major goal was to get a header-rewrite feature like fetchmail's working so I wouldn't have reply problems anymore.

Despite having done a good bit of work on fetchpop1.9, when I found popclient I quickly concluded that it offered the solidest base for future development. I was convinced of this primarily by the presence of multiple-protocol support. The competition didn't do POP2/RPOP/APOP, and I was already having vague thoughts of maybe adding IMAP. (This would advance two other goals: learn IMAP and get comfortable writing TCP/IP client software.)

Until popclient 3.05 I was simply following out the implications of Carl's basic design. He already had daemon.c in the distribution, and I wanted daemon mode almost as badly as I wanted the header rewrite feature. The other things I added were bug fixes or minor extensions.

After 3.1, when I put in SMTP-forwarding support (more about this below) the nature of the project changed -- it became a carefully-thought-out attempt to render obsolete every other program in its class. The name change quickly followed.

MTAs ought to canonicalize the addresses of outgoing non-local mail so that From:, To:, Cc:, Bcc: and other address headers contain only fully qualified domain names. Failure to do so can break the reply function on many mailers. (Sendmail has an option to do this.)

This problem only becomes obvious when a reply is generated on a machine different from where the message was delivered. The two machines will have different local username spaces, potentially leading to misrouted mail.

Most MTAs (and sendmail in particular) do not canonicalize address headers in this way (violating RFC 1123). Fetchmail therefore has to do it. This is the first feature I added to the ancestral popclient.
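A minimal sketch of the rewrite idea, not fetchmail's actual code: a bare local name taken from an address header is qualified with the mail server's host name, and an address that already carries a domain is left alone. The function name `qualify_address` and the example host names are assumptions made for illustration; real header rewriting must also cope with the full RFC822 address syntax (comments, quoted strings, route addresses), which this sketch ignores.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: qualify a bare local name with the mail server's
 * host name. Returns 0 on success, -1 if the result does not fit in out. */
int qualify_address(const char *addr, const char *host,
                    char *out, size_t outlen)
{
    int n;

    if (strchr(addr, '@') != NULL)
        n = snprintf(out, outlen, "%s", addr);          /* already qualified */
    else
        n = snprintf(out, outlen, "%s@%s", addr, host); /* append server host */

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With this in place, a reply composed on the client machine goes back to the sender's mailbox on the server's host rather than to whatever local user happens to share the name.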

The second thing I did was reorganize and simplify popclient a lot. Carl Harris's implementation was very sound, but exhibited a kind of unnecessary complexity common to many C programmers. He treated the code as central and the data structures as support for the code. As a result, the code was beautiful but the data structure design ad-hoc and rather ugly (at least to this old LISP hacker).

I was able to improve matters significantly by reorganizing most of the program around the `query' data structure and eliminating a bunch of global context. This especially simplified the main sequence in fetchmail.c and was critical in enabling the daemon mode changes.

The next step was IMAP support. I initially wrote the IMAP code as a generic query driver and a method table. The idea was to have all the protocol-independent setup logic and flow of control in the driver, and the protocol-specific stuff in the method table.

Once this worked, I rewrote the POP3 code to use the same organization. The POP2 code kept its own driver for a couple more releases, until I found sources of a POP2 server to test against (the breed seems to be nearly extinct).

The purpose of this reorganization, of course, is to trivialize the development of support for future protocols as much as possible. All mail-retrieval protocols have to have pretty similar logical design by the nature of the task. By abstracting out that common logic and its interface to the rest of the program, both the common and protocol-specific parts become easier to understand.

Furthermore, many kinds of new features can instantly be supported across all protocols by modifying the one driver module.
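The driver-plus-method-table organization can be sketched roughly as follows. Everything here is illustrative: the names `struct method`, `run_query` and the toy in-memory protocol are assumptions, not the structures in the fetchmail sources. The point is only that the flow of control lives in one function while each protocol supplies a table of function pointers.

```c
#include <stddef.h>

/* Hypothetical method table: one entry per protocol-specific step. */
struct method {
    const char *name;                        /* protocol name */
    int (*login)(void *session);             /* authenticate to the server */
    int (*count)(void *session, int *n);     /* how many messages are waiting */
    int (*fetch)(void *session, int msgno);  /* retrieve one message */
    int (*logout)(void *session);            /* end the session */
};

/* Protocol-independent driver. Returns the number of messages fetched,
 * or -1 if login or the message count failed. */
int run_query(const struct method *m, void *session)
{
    int n = 0, fetched = 0, i;

    if (m->login(session) != 0)
        return -1;
    if (m->count(session, &n) != 0) {
        m->logout(session);
        return -1;
    }
    for (i = 1; i <= n; i++) {
        if (m->fetch(session, i) != 0)
            break;                           /* stop at the first failure */
        fetched++;
    }
    m->logout(session);
    return fetched;
}

/* A toy in-memory "protocol" used only to exercise the driver. */
struct toy_session { int waiting; int retrieved; };

static int toy_login(void *s)  { (void)s; return 0; }
static int toy_logout(void *s) { (void)s; return 0; }
static int toy_count(void *s, int *n)
{
    *n = ((struct toy_session *)s)->waiting;
    return 0;
}
static int toy_fetch(void *s, int msgno)
{
    (void)msgno;
    ((struct toy_session *)s)->retrieved++;
    return 0;
}

const struct method toy_protocol = {
    "TOY", toy_login, toy_count, toy_fetch, toy_logout
};
```

Under this arrangement a feature such as a per-poll message limit would be added once, inside `run_query`, and every protocol table would pick it up without change.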

The direction of the project changed radically when Harry Hochheiser sent me his scratch code for forwarding fetched mail to the SMTP port. I realized almost immediately that a reliable implementation of this feature would make all the other delivery modes obsolete.

Why mess with all the complexity of configuring an MDA or setting up lock-and-append on a mailbox when port 25 is guaranteed to be there on any platform with TCP/IP support in the first place? Especially when this means retrieved mail is guaranteed to look like normal sender-initiated SMTP mail, which is really what we want anyway.

Password encryption in .fetchmailrc

The reason there's no facility to store passwords encrypted in the .fetchmailrc file is that this doesn't actually add protection.

Anyone who's acquired the 0600 permissions needed to read your .fetchmailrc file will be able to run fetchmail as you anyway -- and if it's your password they're after, they'd be able to rip the necessary decoder out of the fetchmail code itself to get it.

All .fetchmailrc encryption would do is give a false sense of security to people who don't think very hard.

Truly concurrent queries to multiple hosts

Occasionally I get a request for this on "efficiency" grounds. These people aren't thinking either. True concurrency would do nothing to lessen fetchmail's total IP volume. The best it could possibly do is change the usage profile to shorten the duration of the active part of a poll cycle at the cost of increasing its demand on IP volume per unit time.

If one could thread the protocol code so that fetchmail didn't block on waiting for a protocol response, but rather switched to trying to process another host query, one might get an efficiency gain (close to constant loading at the single-host level).

Fortunately, I've only seldom seen a server that incurred significant wait time on an individual response. I judge the gain from this not worth the hideous complexity increase it would require in the code.

Multiple concurrent instances of fetchmail

Fetchmail locking is on a per-invoking-user basis because finer-grained locks would be really hard to implement in a portable way. The problem is that you don't want two fetchmails querying the same site for the same remote user at the same time.

To handle this optimally, multiple fetchmails would have to associate a system-wide semaphore with each active pair of a remote user and host canonical address. A fetchmail would have to block until getting this semaphore at the start of a query, and release it at the end of a query.

This would be way too complicated to do just for an "it might be nice" feature. Instead, you can run a single root fetchmail polling for multiple users in either single-drop or multidrop mode.

The fundamental problem here is how an instance of fetchmail polling host foo can assert that it's doing so in a way visible to all other fetchmails. System V semaphores would be ideal for this purpose, but they're not portable.

I've thought about this a lot and roughed up several designs. All are complicated and fragile, with a bunch of the standard problems (what happens if a fetchmail aborts before clearing its semaphore, and how do we recover reliably?).

I'm just not satisfied that there's enough functional gain here to pay for the large increase in complexity that adding these semaphores would entail.

I decided to add the multidrop support partly because some users were clamoring for it, but mostly because I thought it would shake bugs out of the single-drop code by forcing me to deal with addressing in full generality. And so it proved.

There are three important aspects of the features for handling multiple-drop aliases and mailing lists which future hackers should be careful to preserve.

The logic path for single-recipient mailboxes doesn't involve header parsing or DNS lookups at all. This is important -- it means the code for the most common case can be much simpler and more robust.
The multidrop handling does not rely on doing the equivalent of passing the message to sendmail -t. Instead, it explicitly mines members of a specified set of local usernames out of the header.
We do not attempt delivery to multidrop mailboxes in the presence of DNS errors. Before each multidrop poll we probe DNS to see if we have a nameserver handy. If not, the poll is skipped. If DNS crashes during a poll, the error return from the next nameserver lookup aborts message delivery and ends the poll. The daemon mode will then quietly spin until DNS comes up again, at which point it will resume delivering mail.

When I designed this support, I was terrified of doing anything that could conceivably cause a mail loop (you should be too). That's why the code as written can only append local names (never @-addresses) to the recipients list.
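The "mine local names, never @-addresses" rule can be illustrated with a deliberately simplified sketch. The function `mine_local_names` and its parameters are hypothetical; it splits a header value on commas, whitespace and angle brackets, and compares only the part before any `@` against the configured local names. It does not check the domain part against the mail server (the real code involves DNS for that), and it does not understand quoted display names, so a display-name word that equals a local name would also match.

```c
#include <string.h>

/* Hypothetical sketch: scan an address-list header value and collect the
 * local parts that appear in locals[]. Only the bare local name is
 * stored, never an @-address. Returns the number of names collected. */
size_t mine_local_names(const char *header,
                        const char *const locals[], size_t nlocals,
                        const char *out[], size_t maxout)
{
    size_t found = 0;
    const char *p = header;

    while (*p != '\0' && found < maxout) {
        size_t len, i;

        p += strspn(p, ", \t<>");       /* skip separators */
        len = strcspn(p, "@, \t<>");    /* local part ends at '@' or separator */
        for (i = 0; len > 0 && i < nlocals; i++) {
            if (strlen(locals[i]) == len && strncmp(p, locals[i], len) == 0) {
                out[found++] = locals[i];   /* record the local name only */
                break;
            }
        }
        p += len;
        p += strcspn(p, ", \t<>");      /* skip any @domain that follows */
    }
    return found;
}
```

Because the output array can only ever hold entries from `locals[]`, nothing taken verbatim from the message can end up as a delivery address, which is the property that guards against mail loops.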

The code in mxget.c is nasty, no two ways about it. But it's utterly necessary; there are a lot of MX pointers out there. It really ought to be a (documented!) entry point in the bind library.

Fetchmail's behavior on DNS errors is to suppress forwarding and deletion of the individual message that each occurs in, leaving it queued on the server for retrieval on a subsequent poll. The assumption is that DNS errors are transient, due to temporary server outages.

Unfortunately this means that if a DNS error is permanent a message can be perpetually stuck in the server mailbox. We've had a couple bug reports of this kind due to subtle RFC822 parsing errors in the fetchmail code that resulted in impossible things getting passed to the DNS lookup routines.

Alternative ways to handle the problem: ignore DNS errors (treating them as a non-match on the mailserver domain), or forward messages with errors to fetchmail's invoking user in addition to any other recipients. These would fit an assumption that DNS lookup errors are likely to be permanent problems associated with an address.

The IPv6 support patches are really more protocol-family independence patches. Because of this, in most places, "ports" (numbers) have been replaced with "services" (strings, that may be digits). This allows us to run with certain protocols that use strings as "service names" where we in the IP world think of port numbers. Someday we'll plumb strings all over and then, if inet6 is not enabled, do a getservbyname() down in SocketOpen. The IPv6 support patches use getaddrinfo(), which is a POSIX p1003.1g mandated function. So, in the not too distant future, we'll zap the ifdefs and just let autoconf check for getaddrinfo. IPv6 support comes pretty much automatically once you have protocol family independence.
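As a rough illustration of why passing services as strings gives protocol-family independence, the sketch below hands both the host and the service string to getaddrinfo() and reads the port back from whichever address family comes out. The helper name `lookup_port` is an assumption for this example and is not part of fetchmail; a real client would go on to call socket() and connect() with the returned address instead of just extracting the port.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: resolve host and service (a service name or a
 * digit string such as "110") and return the port of the first result
 * in host byte order, or -1 on failure. */
int lookup_port(const char *host, const char *service)
{
    struct addrinfo hints, *res;
    int port = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* mail retrieval runs over TCP */

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    if (res->ai_family == AF_INET)
        port = ntohs(((struct sockaddr_in *)res->ai_addr)->sin_port);
    else if (res->ai_family == AF_INET6)
        port = ntohs(((struct sockaddr_in6 *)res->ai_addr)->sin6_port);

    freeaddrinfo(res);
    return port;
}
```

The caller never mentions an address family or a numeric port; both decisions are deferred to the resolver, which is what lets IPv6 come along almost for free.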

Internationalization is handled using GNU gettext (see the file ABOUT_NLS in the source distribution). This places some minor constraints on the code.

Strings that must be subject to translation should be wrapped with GT_() or N_() -- the former in function arguments, the latter in static initializers and other non-function-argument contexts.
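The distinction between the two wrappers can be sketched as follows. The macro definitions shown are assumptions written in the conventional gettext style (a run-time lookup for GT_(), a pure marker for N_()); check the fetchmail sources for the real ones. The table and function names are hypothetical.

```c
#include <libintl.h>

/* Assumed definitions for this sketch only. */
#define GT_(s) gettext(s)   /* look up the translation at run time */
#define N_(s)  (s)          /* mark the string for extraction, no call */

/* Static initializer: a function call is not allowed here, so N_() only
 * marks the strings so they land in the message catalog. */
static const char *const state_names[] = {
    N_("idle"),
    N_("polling"),
};

/* The marked strings are translated at the point of use. */
const char *state_name(int state)
{
    if (state < 0 || state > 1)
        return GT_("unknown");
    return GT_(state_names[state]);
}
```

When no message catalog is loaded, gettext() returns its argument unchanged, so code written this way behaves identically in an untranslated build.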

Adding a control option is not complicated in principle, but there are a lot of fiddly details in the process. You'll need to do the following minimum steps.

Add a field to represent the control in struct run, struct query, or struct hostdata.
Go to rcfile_y.y. Add the token to the grammar. Don't forget the %token declaration.
Pick an actual string to declare the option in the .fetchmailrc file. Add the token to rcfile_l.
Pick a long-form option name, and a one-letter short option if any are left. Go to options.c. Pick a new LA_ value. Hack the longoptions table to set up the association. Hack the big switch statement to set the option. Hack the `?' message to describe it.
If the default is nonzero, set it in def_opts near the top of load_params in fetchmail.c.
Add code to dump the option value in fetchmail.c:dump_params.
For a per-site or per-user option, add proper FLAG_MERGE actions in fetchmail.c's optmerge() function. For a global option, add an override at the end of load_params; this will involve copying a "cmd_run." field to a corresponding "run." field, see the existing code for models.
Document the option in fetchmail.man. This will require at least two changes; one to the collected table of options, and one full text description of the option.
Hack fetchmailconf to configure it. Bump the fetchmailconf version.
Hack conf.c to dump the option so we won't have a version-skew problem.
Add an entry to NEWS.
If the option implements a new feature, add a note to the feature list.

There may be other things you have to do in the way of logic, of course.
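The global-option override step in the list above might look roughly like this. It is a sketch under stated assumptions: `struct run_opts`, its two fields and `override_run` are invented for illustration, and a zero value is taken to mean "not given on the command line". The existing code in load_params remains the model to follow.

```c
/* Hypothetical stand-in for the global-options structure. */
struct run_opts {
    int poll_interval;   /* seconds between polls; 0 means unset */
    int verbose;         /* nonzero to enable chatty output */
};

/* Copy each field that was set on the command line (cmd_run) over the
 * value that came from the rc file (run). Unset fields are left alone. */
void override_run(struct run_opts *run, const struct run_opts *cmd_run)
{
    if (cmd_run->poll_interval)
        run->poll_interval = cmd_run->poll_interval;
    if (cmd_run->verbose)
        run->verbose = cmd_run->verbose;
}
```

One consequence of using zero as the "unset" marker is that the command line cannot force a field back to zero; an option that needs that would require a separate flag recording whether it was given.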

Before you implement an option, though, think hard. Is there any way to make fetchmail automatically detect the circumstances under which it should change its behavior? If so, don't write an option. Just do the check!

1. Server-side state is essential

The person(s) responsible for removing LAST from POP3 deserve to suffer. Without it, a client has no way to know which messages in a box have been read by other means, such as an MUA running on the server.

The POP3 UID feature described in RFC1725 to replace LAST is insufficient. The only problem it solves is tracking which messages have been read by this client -- and even that requires tricky, fragile implementation.

The underlying lesson is that maintaining accessible server-side `seen' state bits associated with Status headers is indispensable in a Unix/RFC822 mail server protocol. IMAP gets this right.

2. Readable text protocol transactions are a Good Thing

A nice thing about the general class of text-based protocols that SMTP, POP2, POP3, and IMAP belong to is that client/server transactions are easy to watch and transaction code correspondingly easy to debug. Given a decent layer of socket utility functions (which Carl provided) it's easy to write protocol engines and not hard to show that they're working correctly.

This is an advantage not to be despised! Because of it, this project has been interesting and fun -- no serious or persistent bugs, no long hours spent looking for subtle pathologies.

3. IMAP is a Good Thing.

Now that there is a standard IMAP equivalent of the POP3 APOP validation in CRAM-MD5, POP3 is completely obsolete.

4. SMTP is the Right Thing

In retrospect it seems clear that this program (and others like it) should have been designed to forward via SMTP from the beginning. This lesson may be applicable to other Unix programs that now call the local MDA/MTA as a program.

5. Syntactic noise can be your friend

The optional `noise' keywords in the rc file syntax started out as a late-night experiment. The English-like syntax they allow is considerably more readable than the traditional terse keyword-value pairs you get when you strip them all out. I think there may be a wider lesson here.

It is truly written: the best hacks start out as personal solutions to the author's everyday problems, and spread because the problem turns out to be typical for a large class of users. So it was with Carl Harris and the ancestral popclient, and so with me and fetchmail.

It's gratifying that fetchmail has become so popular. Until just before 1.9 I was designing strictly to my own taste. The multi-drop mailbox support and the new --limit option were the first features to go in that I didn't need myself.

By 1.9, four months after I started hacking on popclient and a month after the first fetchmail release, there were literally a hundred people on the fetchmail-friends contact list. That's pretty powerful motivation. And they were a good crowd, too, sending fixes and intelligent bug reports in volume. A user population like that is a gift from the gods, and this is my expression of gratitude.

The beta testers didn't know it at the time, but they were also the subjects of a sociological experiment. The results are described in my paper, The Cathedral And The Bazaar.

Special thanks go to Carl Harris, who built a good solid codebase and then tolerated me hacking it out of recognition. And to Harry Hochheiser, who gave me the idea of the SMTP-forwarding delivery mode.

Other significant contributors to the code have included Dave Bodenstab (error.c code and --syslog), George Sipe (--monitor and --interface), Gordon Matzigkeit (netrc.c), Al Longyear (UIDL support), Chris Hanson (Kerberos V4 support), and Craig Metz (OPIE, IPv6, IPSEC).

At this point, the fetchmail code appears to be pretty stable. It will probably undergo substantial change only if and when support for a new retrieval protocol or authentication method is added.

Not all of these describe standards explicitly used in fetchmail, but they all shaped the design in one way or another.

RFC821: SMTP protocol
RFC822: Mail header format
RFC937: Post Office Protocol - Version 2
RFC974: MX routing
RFC976: UUCP mail format
RFC1081: Post Office Protocol - Version 3
RFC1123: Host requirements (modifies 821, 822, and 974)
RFC1176: Interactive Mail Access Protocol - Version 2
RFC1203: Interactive Mail Access Protocol - Version 3
RFC1225: Post Office Protocol - Version 3
RFC1344: Implications of MIME for Internet Mail Gateways
RFC1413: Identification server
RFC1428: Transition of Internet Mail from Just-Send-8 to 8-bit SMTP/MIME
RFC1460: Post Office Protocol - Version 3
RFC1508: Generic Security Service Application Program Interface
RFC1521: MIME: Multipurpose Internet Mail Extensions
RFC1869: SMTP Service Extensions (ESMTP spec)
RFC1652: SMTP Service Extension for 8bit-MIME transport
RFC1725: Post Office Protocol - Version 3
RFC1730: Interactive Mail Access Protocol - Version 4
RFC1731: IMAP4 Authentication Mechanisms
RFC1732: IMAP4 Compatibility With IMAP2 And IMAP2bis
RFC1734: POP3 AUTHentication command
RFC1870: SMTP Service Extension for Message Size Declaration
RFC1891: SMTP Service Extension for Delivery Status Notifications
RFC1892: The Multipart/Report Content Type for the Reporting of Mail System Administrative Messages
RFC1894: An Extensible Message Format for Delivery Status Notifications
RFC1893: Enhanced Mail System Status Codes
RFC1938: A One-Time Password System
RFC1939: Post Office Protocol - Version 3
RFC1957: Some Observations on Implementations of the Post Office Protocol (POP3)
RFC1985: SMTP Service Extension for Remote Message Queue Starting
RFC2033: Local Mail Transfer Protocol
RFC2060: Internet Message Access Protocol - Version 4rev1
RFC2061: IMAP4 Compatibility With IMAP2bis
RFC2062: Internet Message Access Protocol - Obsolete Syntax
RFC2195: IMAP/POP AUTHorize Extension for Simple Challenge/Response
RFC2177: IMAP IDLE command
RFC2449: POP3 Extension Mechanism
RFC2554: SMTP Service Extension for Authentication
RFC2595: Using TLS with IMAP, POP3 and ACAP
RFC2645: On-Demand Mail Relay: SMTP with Dynamic IP Addresses
RFC2683: IMAP4 Implementation Recommendations
RFC2821: Simple Mail Transfer Protocol
RFC2822: Internet Message Format

http://www.faqs.org/faqs/LANs/mail-protocols/: LAN Mail Protocols Summary