This section describes the configuration file in detail.
There is one point that should be made clear immediately: the syntax of the configuration file is designed to be reasonably easy to parse, since this is done every time sendmail starts up, rather than easy for a human to read or write. On the future project list is a configuration-file compiler.
The configuration file is organized as a series of lines, each of which begins with a single character defining the semantics for the rest of the line. Lines beginning with a space or a tab are continuation lines (although the semantics are not well defined in many places). Blank lines and lines beginning with a sharp symbol (`#') are comments.
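The line structure described above can be sketched in a few lines of Python. This is only a toy reader, not sendmail's own parser; the function name `logical_lines` and the sample lines are illustrative assumptions, and, like the text, it makes no attempt to define continuation semantics precisely.

```python
def logical_lines(text):
    """Join continuation lines, drop comments, return (command, rest) pairs."""
    joined = []
    for raw in text.splitlines():
        if raw.strip() == "" or raw.startswith("#"):
            continue  # blank lines and '#' lines are comments
        if raw[0] in " \t" and joined:
            # A leading space or tab continues the previous logical line.
            joined[-1] += raw
        else:
            joined.append(raw)
    # The first character selects the semantics of the rest of the line.
    return [(line[0], line[1:]) for line in joined]


# Hypothetical sample input, for illustration only.
sample = "# local settings\nDwvangogh\n\nCwlocalhost\n\tvangogh\n"
print(logical_lines(sample))
```

The continuation line is simply appended to the line before it, whitespace included; a real reader would have to decide what that whitespace means for each command.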
The core of address parsing is the set of rewriting rules. These form an ordered production system. Sendmail scans through the set of rewriting rules looking for a match on the left hand side (LHS) of the rule. When a rule matches, the address is replaced by the right hand side (RHS) of the rule.
There are several sets of rewriting rules. Some of the rewriting sets are used internally and must have specific semantics. Other rewriting sets do not have specifically assigned semantics, and may be referenced by the mailer definitions or by other rewriting sets.
The syntax of these two commands is:
Macro expansions of the form $x are performed when the configuration file is read. Expansions of the form $&x are performed at run time using a somewhat less general algorithm. This form is intended only for referencing internally defined macros such as $h that are changed at run time.
The left hand side of rewriting rules contains a pattern. Normal words are simply matched directly. Metasyntax is introduced using a dollar sign. The metasymbols are:
Additionally, the LHS can include $@ to match zero tokens. This is not bound to a $n on the RHS, and is normally only used when it stands alone in order to match the null input.
When the left hand side of a rewriting rule matches, the input is deleted and replaced by the right hand side. Tokens are copied directly from the RHS unless they begin with a dollar sign. Metasymbols are:
The $n syntax substitutes the corresponding value from a $+, $-, $*, $=, or $~ match on the LHS. It may be used anywhere.
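To make the LHS/RHS mechanics concrete, here is a small Python model of matching and substitution. It is a sketch under stated assumptions: it takes $* to match zero or more tokens, $+ one or more, and $- exactly one, it tries the shortest match first, and it leaves out the class forms $= and $~. The helper names `match` and `substitute` are invented for this example.

```python
def match(pattern, tokens):
    """Return the list of matched token groups, or None if there is no match."""
    if not pattern:
        return [] if not tokens else None
    head, rest = pattern[0], pattern[1:]
    if head in ("$*", "$+", "$-"):
        low = 0 if head == "$*" else 1
        high = 1 if head == "$-" else len(tokens)
        for n in range(low, high + 1):      # assumption: shortest match first
            if n > len(tokens):
                break
            tail = match(rest, tokens[n:])
            if tail is not None:
                return [tokens[:n]] + tail
        return None
    # Ordinary words are matched directly (case-insensitively in this sketch).
    if tokens and tokens[0].lower() == head.lower():
        return match(rest, tokens[1:])
    return None


def substitute(rhs, groups):
    """Replace $1..$9 in the RHS with the corresponding LHS match."""
    out = []
    for tok in rhs:
        if len(tok) == 2 and tok[0] == "$" and tok[1].isdigit():
            out.extend(groups[int(tok[1]) - 1])
        else:
            out.append(tok)                 # other tokens are copied directly
    return out


address = ["eric", "@", "vangogh", ".", "CS"]
groups = match(["$+", "@", "$+"], address)
print(groups)                               # [['eric'], ['vangogh', '.', 'CS']]
print(substitute(["$2", "!", "$1"], groups))
```

Each metasymbol on the LHS binds one numbered group, which is why $2 on the RHS picks up everything after the @ sign.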
A host name enclosed between $[ and $] is looked up in the host database(s) and replaced by the canonical name[14]. For example, $[ftp$] might become ftp.CS.Berkeley.EDU and $[[128.32.130.2]$] would become vangogh.CS.Berkeley.EDU. Sendmail recognizes its numeric IP address without calling the name server and replaces it with its canonical name.
The $( ... $) syntax is a more general form of lookup; it uses a named map instead of an implicit map. If no lookup is found, the indicated default is inserted; if no default is specified and no lookup matches, the value is left unchanged. The arguments are passed to the map for possible use.
The $>n syntax causes the remainder of the line to be substituted as usual and then passed as the argument to ruleset n. The final value of ruleset n then becomes the substitution for this rule. The $> syntax can only be used at the beginning of the right hand side; it can only be preceded by $@ or $:.
The $# syntax should only be used in ruleset zero or a subroutine of ruleset zero. It causes evaluation of the ruleset to terminate immediately, and signals to sendmail that the address has completely resolved. The complete syntax is:
Normally, a rule that matches is retried, that is, the rule loops until it fails. An RHS may also be preceded by a $@ or a $: to change this behavior. A $@ prefix causes the ruleset to return with the remainder of the RHS as the value. A $: prefix causes the rule to terminate immediately, but the ruleset to continue; this can be used to avoid continued application of a rule. The prefix is stripped before continuing.
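The three control behaviors (retry, $@ return, $: apply once) can be modeled as follows. This is a deliberately simplified sketch: Python regular expressions stand in for real LHS patterns, `\1`-style templates stand in for $n, and the iteration limit is an assumption added only so the toy cannot loop forever.

```python
import re


def run_ruleset(rules, address, max_iterations=100):
    """Apply (regex, rhs) rules in order; rhs may start with '$@' or '$:'."""
    for lhs, rhs in rules:
        for _ in range(max_iterations):      # guard against endless loops
            m = re.fullmatch(lhs, address)
            if m is None:
                break                         # rule failed: try the next rule
            if rhs.startswith("$@"):
                return m.expand(rhs[2:])      # return from the whole ruleset
            if rhs.startswith("$:"):
                address = m.expand(rhs[2:])   # apply once, then move on
                break
            address = m.expand(rhs)           # plain rule: retry until it fails
    return address


rules = [
    (r"(.*)\.\.(.*)", r"\1.\2"),          # loops until no ".." remains
    (r"(.+)", r"$:<\1>"),                 # would loop forever without $:
    (r"<(.+)@localhost>", r"$@\1"),       # $@ ends the ruleset here
    (r"<(.+)>", r"$:\1"),
]
print(run_ruleset(rules, "eric@localhost"))      # eric
print(run_ruleset(rules, "eric@vangogh..CS"))    # eric@vangogh.CS
```

The second rule shows why $: exists: its pattern matches its own output, so without the prefix the rule would never fail.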
The $@ and $: prefixes may precede a $> spec; for example:
Substitution occurs in the order described, that is, parameters from the LHS are substituted, hostnames are canonicalized, subroutines are called, and finally $#, $@, and $: are processed.
There are five rewriting sets that have specific semantics. Four of these are related as depicted by Figure 1.
                    +---+
                 -->| 0 |-->resolved address
                /   +---+
               /            +---+   +---+
              /        ---->| 1 |-->| S |--
       +---+ / +---+  /     +---+   +---+  \    +---+
addr-->| 3 |-->| D |--                      --->| 4 |-->msg
       +---+   +---+  \     +---+   +---+  /    +---+
                        --->| 2 |-->| R |--
                            +---+   +---+

Ruleset three should turn the address into canonical form. This form should have the basic syntax:
If no @ sign is specified, then the host-domain-spec may be appended (box D in Figure 1) from the sender address (if the C flag is set in the mailer definition corresponding to the sending mailer).
Ruleset zero is applied after ruleset three to addresses that are going to actually specify recipients. It must resolve to a {mailer, host, user} triple. The mailer must be defined in the mailer definitions from the configuration file. The host is defined into the $h macro for use in the argv expansion of the specified mailer.
Rulesets one and two are applied to all sender and recipient addresses respectively. They are applied before any specification in the mailer definition. They must never resolve.
Ruleset four is applied to all addresses in the message. It is typically used to translate internal to external form.
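The flow in Figure 1 can be summarized as an ordered list of steps per kind of address. The sketch below is one reading of the figure, not sendmail's code: boxes S and R are taken to be the mailer-specific sender and recipient rewriting sets, the names `PATHS` and `rewrite` are invented, and each ruleset is represented by an ordinary Python function supplied by the caller.

```python
# Order of rewriting steps for each kind of address, following Figure 1.
PATHS = {
    "resolve":   ["3", "0"],                  # envelope recipient resolution
    "sender":    ["3", "D", "1", "S", "4"],
    "recipient": ["3", "D", "2", "R", "4"],
}


def rewrite(address, kind, rulesets):
    """Pass the address through each step; missing steps leave it unchanged."""
    for name in PATHS[kind]:
        step = rulesets.get(name, lambda a: a)
        address = step(address)
    return address


# Hypothetical stand-ins for two of the steps.
rulesets = {
    "3": lambda a: a.strip("<>"),
    "D": lambda a: a if "@" in a else a + "@vangogh.CS.Berkeley.EDU",
}
print(rewrite("<eric>", "sender", rulesets))   # eric@vangogh.CS.Berkeley.EDU
```

Here the D step appends the sender's domain only when the address has no @ sign, which mirrors the description of box D.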
In addition, ruleset 5 is applied to all local addresses (specifically, those that resolve to a mailer with the `F=5' flag set) that do not have aliases. This allows a last minute hook for local names.
A few extra rulesets are defined as hooks that can be defined to get special features. They are all named rulesets. The check_* forms all give accept/reject status; falling off the end or returning normally is an accept, and resolving to $#error is a reject.
The check_relay ruleset is called after a connection is accepted. It is passed
The check_mail ruleset is passed the user name parameter of the SMTP MAIL command. It can accept or reject the address.
The check_rcpt ruleset is passed the user name parameter of the SMTP RCPT command. It can accept or reject the address.
The check_compat ruleset is passed
Some special processing occurs if ruleset zero resolves to an IPC mailer (that is, a mailer that has [IPC] listed as the Path in the M configuration line). The host name passed after $@ has MX expansion performed; this looks the name up in DNS to find alternate delivery sites.
The host name can also be provided as a dotted quad in square brackets; for example:
The host name passed in after the $@ may also be a colon-separated list of hosts. Each is separately MX expanded and the results are concatenated to make (essentially) one long MX list. The intent here is to create fake MX records that are not published in DNS for private internal networks.
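The effect of a colon-separated host list can be pictured with a short Python sketch. No DNS is consulted here: `mx_table` is a hypothetical dictionary standing in for MX lookups, the host names are made up, and falling back to the host itself when it has no entry is an assumption of this example.

```python
def expand_hosts(host_spec, mx_table):
    """Expand a colon-separated host list into one combined delivery list."""
    combined = []
    for host in host_spec.split(":"):
        # Each host is expanded separately and the results are concatenated.
        combined.extend(mx_table.get(host, [host]))
    return combined


# Hypothetical MX data, for illustration only.
mx_table = {"mail.example.com": ["mx1.example.com", "mx2.example.com"]}
print(expand_hosts("mail.example.com:backup.example.net", mx_table))
```

The order of the hosts in the list is preserved, so the first host's entries are tried before the second's.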
As a final special case, the host name can be passed in as a text string in square brackets:
Macros are named with a single character or with a word in {braces}. Single character names may be selected from the entire ASCII set, but user-defined macros should be selected from the set of upper case letters only. Lower case letters and special symbols are used internally. Long names beginning with a lower case letter or a punctuation character are reserved for use by sendmail, so user-defined long macro names should begin with an upper case letter.
The syntax for macro definitions is:
Macros are interpolated using the construct $x, where x is the name of the macro to be interpolated. This interpolation is done when the configuration file is read, except in M lines. The special construct $&x can be used in R lines to get deferred interpolation.
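Read-time interpolation can be approximated with a small Python function. This is a minimal sketch, assuming macro values are held in a dictionary keyed by name without braces; it does not re-expand macros inside macro values, and it leaves anything it does not know, including the deferred form, exactly as written. The names `expand_macros` and `_MACRO` are invented for the example.

```python
import re

# A macro reference is '$' followed by a {long name} or a single character.
_MACRO = re.compile(r"\$(\{[^}]+\}|.)")


def expand_macros(text, macros):
    """Replace $x and ${name} with values from the macros dictionary."""
    def replace(m):
        name = m.group(1)
        key = name[1:-1] if name.startswith("{") else name
        # Unknown names (and the deferred form $&x) are left untouched.
        return macros.get(key, m.group(0))
    return _MACRO.sub(replace, text)


macros = {"j": "vangogh.CS.Berkeley.EDU", "client_name": "ftp.CS.Berkeley.EDU"}
print(expand_macros("$j relay for ${client_name} on $&h", macros))
# vangogh.CS.Berkeley.EDU relay for ftp.CS.Berkeley.EDU on $&h
```

Leaving $&h alone at read time is the whole point of the deferred form: its value is only meaningful once a message is being processed.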
Conditionals can be specified using the syntax:
Lower case macro names are reserved to have special semantics, used to pass information in or out of sendmail, and special characters are reserved to provide conditionals, etc. Upper case names (that is, $A through $Z) are specifically reserved for configuration file authors.
The following macros are defined and/or used internally by sendmail for interpolation into argv's for mailers or for other contexts. The ones marked * are information passed into sendmail[16], the ones marked are information passed both in and out of sendmail, and the unmarked macros are passed out of sendmail but are not otherwise used internally. These macros are:
There are three types of dates that can be used. The $a and $b macros are in RFC 822 format; $a is the time as extracted from the Date: line of the message (if there was one), and $b is the current date and time (used for postmarks). If no Date: line is found in the incoming message, $a is set to the current time also. The $d macro is equivalent to the $b macro in UNIX (ctime) format.
The macros $w, $j, and $m are set to the identity of this host. Sendmail tries to find the fully qualified name of the host if at all possible; it does this by calling gethostname(2) to get the current hostname and then passing that to gethostbyname(3) which is supposed to return the canonical version of that host name.[17] Assuming this is successful, $j is set to the fully qualified name and $m is set to the domain part of the name (everything after the first dot). The $w macro is set to the first word (everything before the first dot) if you have a level 5 or higher configuration file; otherwise, it is set to the same value as $j. If the canonification is not successful, it is imperative that the config file set $j to the fully qualified domain name[18].
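The relationship between the three macros is easy to state in code. The Python sketch below assumes the fully qualified name has already been obtained (it performs no lookups itself); the function name `host_macros` and the `config_level` parameter are illustrative.

```python
def host_macros(fqdn, config_level):
    """Derive $j, $m and $w from a fully qualified host name."""
    first, _, domain = fqdn.partition(".")
    return {
        "j": fqdn,                                    # fully qualified name
        "m": domain,                                  # everything after the first dot
        "w": first if config_level >= 5 else fqdn,    # first word from level 5 on
    }


print(host_macros("vangogh.CS.Berkeley.EDU", 5))
```

With a level 5 configuration file this yields vangogh for $w and CS.Berkeley.EDU for $m; with a lower level $w is the same as $j.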
The $f macro is the id of the sender as originally determined; when mailing to a specific host the $g macro is set to the address of the sender relative to the recipient. For example, if I send to bollard@matisse.CS.Berkeley.EDU from the machine vangogh.CS.Berkeley.EDU, the $f macro will be eric and the $g macro will be eric@vangogh.CS.Berkeley.EDU.
The $x macro is set to the full name of the sender. This can be determined in several ways. It can be passed as a flag to sendmail. It can be defined in the NAME environment variable. The third choice is the value of the Full-Name: line in the header if it exists, and the fourth choice is the comment field of a From: line. If all of these fail, and if the message is being originated locally, the full name is looked up in the /etc/passwd file.
When sending, the $h, $u, and $z macros get set to the host, user, and home directory (if local) of the recipient. The first two are set from the $@ and $: part of the rewriting rules, respectively.
The $p and $t macros are used to create unique strings (e.g., for the Message-Id: field). The $i macro is set to the queue id on this host; if put into the timestamp line it can be extremely useful for tracking messages. The $v macro is set to be the version number of sendmail; this is normally put in timestamps and has proven extremely useful for debugging.
The $c field is set to the hop count, i.e., the number of times this message has been processed. This can be determined by the -h flag on the command line or by counting the timestamps in the message.
The $r and $s fields are set to the protocol used to communicate with sendmail and the sending hostname. They can be set together using the -p command line flag or separately using the -M or -oM flags.
The $_ macro is set to a validated sender host name. If the sender is running an RFC 1413 compliant IDENT server and the receiver has the IDENT protocol turned on, it will include the user name on that host.
The ${client_name}, ${client_addr}, and ${client_port} macros are set to the name, address, and port number of the SMTP client who is invoking sendmail as a server. These can be used in the check_* rulesets (using the $& deferred evaluation form, of course!).
Classes of phrases may be defined to match on the left hand side of rewriting rules, where a phrase is a sequence of characters that does not contain space characters. For example, a class of all local names for this site might be created so that attempts to send to oneself can be eliminated. These can either be defined directly in the configuration file or read in from another file. Classes are named as a single letter or a word in {braces}. Class names beginning with lower case letters and special characters are reserved for system use. Classes defined in config files may be given names from the set of upper case letters for short names or beginning with an upper case letter for long names.
The syntax is:
Elements of classes can be accessed in rules using $= or $~. The $~ (match entries not in class) only matches a single word; multi-word entries in the class are ignored in this context.
Some classes have internal meaning to sendmail:
Sendmail can be compiled to allow a scanf(3) string on the F line. This lets you do simplistic parsing of text files. For example, to read all the user names in your system /etc/passwd file into a class, use
Programs and interfaces to mailers are defined in this line. The format is:
The following flags may be set in the mailer description. Any other flags may be used freely to conditionally assign headers to messages destined for particular mailers. Flags marked with * are not interpreted by the sendmail binary; these are conventionally used to correlate with the flags portion of the H line. Flags marked with apply to the mailers for the sender address rather than the usual recipient mailers.
Configuration files prior to level 6 assume the `A', `w', `5', `:', `|', `/', and `@' options on the mailer named local.
The mailer with the special name error can be used to generate a user error. The (optional) host field is an exit status to be returned, and the user field is a message to be printed. The exit status may be numeric or one of the values USAGE, NOUSER, NOHOST, UNAVAILABLE, SOFTWARE, TEMPFAIL, PROTOCOL, or CONFIG to return the corresponding EX_ exit code, or an enhanced error code as described in RFC 1893, Enhanced Mail System Status Codes. For example, the entry:
The mailer named local must be defined in every configuration file. This is used to deliver local mail, and is treated specially in several ways. Additionally, three other mailers named prog, *file*, and *include* may be defined to tune the delivery of messages to programs, files, and :include: lists respectively. They default to:
The Sender and Recipient rewriting sets may either be a simple ruleset id or may be two ids separated by a slash; if so, the first rewriting set is applied to envelope addresses and the second is applied to headers.
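Splitting such a field is straightforward; the following Python sketch shows one way to read it. The function name `parse_rewrite_sets` is invented, and treating a value without a slash as applying to both envelope and headers is this example's reading of the text above.

```python
def parse_rewrite_sets(field):
    """Split a rewriting-set field into (envelope, header) ruleset ids."""
    envelope, slash, header = field.partition("/")
    # Without a slash, the single ruleset id serves both purposes.
    return (envelope, header if slash else envelope)


print(parse_rewrite_sets("10"))      # ('10', '10')
print(parse_rewrite_sets("11/31"))   # ('11', '31')
```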
The Directory is actually a colon-separated path of directories to try. For example, the definition D=$z:/ first tries to execute in the recipient's home directory; if that is not available, it tries to execute in the root of the filesystem. This is intended to be used only on the prog mailer, since some shells (such as csh) refuse to execute if they cannot read the home directory. Since the queue directory is not normally readable by unprivileged users, csh scripts as recipients can fail.
The Userid specifies the default user and group id to run as, overriding the DefaultUser option (q.v.). If the S mailer flag is also specified, this is the user and group to run as in all circumstances. This may be given as user:group to set both the user and group id; either may be an integer or a symbolic name to be looked up in the passwd and group files respectively. If only a symbolic user name is specified, the group id in the passwd file for that user is used as the group id.
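The rules for resolving a Userid value can be sketched in Python. To keep the example self-contained, the passwd and group files are replaced by two hypothetical dictionaries (`passwd` maps a user name to a (uid, gid) pair, `group` maps a group name to a gid); a numeric user with no group is returned with a gid of None, since the text does not say what happens in that case.

```python
def parse_userid(field, passwd, group):
    """Resolve 'user' or 'user:group' to a (uid, gid) pair."""
    user, colon, grp = field.partition(":")
    if user.isdigit():
        uid, default_gid = int(user), None     # numeric uid: no passwd lookup
    else:
        uid, default_gid = passwd[user]        # symbolic name: look it up
    if colon:
        gid = int(grp) if grp.isdigit() else group[grp]
    else:
        gid = default_gid                      # fall back to the passwd gid
    return uid, gid


# Hypothetical stand-ins for the passwd and group files.
passwd = {"daemon": (1, 1)}
group = {"mail": 8}
print(parse_userid("daemon:mail", passwd, group))   # (1, 8)
print(parse_userid("daemon", passwd, group))        # (1, 1)
```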
The Charset field is used when converting a message to MIME; this is the character set used in the Content-Type: header. If this is not set, the DefaultCharset option is used, and if that is not set, the value unknown-8bit is used. WARNING: this field applies to the sender's mailer, not the recipient's mailer. For example, if the envelope sender address lists an address on the local network and the recipient is on an external network, the character set will be set from the Charset= field for the local network mailer, not that of the external network mailer.
The Type= field sets the type information used in MIME error messages as defined by RFC 1894. It is actually three values separated by slashes: the MTA-type (that is, the description of how hosts are named), the address type (the description of e-mail addresses), and the diagnostic type (the description of error diagnostic codes). Each of these must be a registered value or begin with X-. The default is dns/rfc822/smtp.
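As a small illustration, the three parts of a Type= value can be pulled apart like this in Python. The function and key names are invented for the example, and the check that each part is a registered value or begins with X- is left out because no registry is available here.

```python
def parse_type(field=None):
    """Split a Type= value into its three slash-separated parts."""
    value = field or "dns/rfc822/smtp"          # documented default
    parts = value.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected three slash-separated values: {value!r}")
    keys = ("mta_type", "address_type", "diagnostic_type")
    return dict(zip(keys, parts))


print(parse_type())
print(parse_type("dns/rfc822/X-unix"))
```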
The format of the header lines that sendmail inserts into the message is defined by