[tpop3d-discuss] tpop3d, version 1.5.3

Chris Lightfoot chris at ex-parrot.com
Tue, 25 Nov 2003 12:28:20 +0000


On Tue, Nov 25, 2003 at 01:53:11PM +0300, Odhiambo Washington wrote:
> A few questions: 
>               1. Where does tpop3d write the cached info? As in what is
>                  the path of the destination file?

The authentication cache is in-memory only.

>               2. How do I measure the improvement created by using the
>                  new auth caching system? I have one of those busy
>                  servers and I've enabled this cache mechanism.

Well... you're only likely to see an improvement in
situations where there are several incoming connections
per second. It should be possible to measure the run time
of tpop3d (as reported by ps or whatever) as a function of
the number of authentications, but I haven't tried
measuring this.

>               3. On my servers, tpop3d has never been a resource hog
>                  before. Now, how do I gauge what a 'heavy load' situation is?
>                  Chris has noted that one of the functionality pluses in the new
>                  version is "Improved performance under heavy load".

This merits some discussion (and it's too technical to go
on my web log...).


The daemon process in tpop3d is single-threaded. It spends
its time running round a select loop, visiting each file
descriptor in turn. It needs to pay attention to sockets
connected to clients, and to the listening sockets on
which new connections are accepted. Basically the main
loop is this:

    while (1) {
        int s;
        ssize_t n;

        /* wait until a listening or client socket is ready */
        select(...);

        /* first half: accept every pending connection */
        while (-1 != (s = accept(listening_socket, ...))) {
            send_banner_to_client(s);
            do_something_with_new_client(s);
        }

        /* second half: process commands from connected clients */
        for each connected client_socket
            while ((n = read(client_socket, ...)) > 0)
                do_something_with_client_commands(client_socket, ...);
    }

Now suppose that we have lots of connected clients, who
are doing authentication, and lots of clients whose
connections have not been completed with accept(2). 

Now, because authentication can be a time-consuming
process (functions like crypt(3) are *designed* to be time
consuming, to make brute force password searches hard),
the second part of the loop can take a while when there
are many clients connected who have sent USER and PASS
commands. While the second part is running, tpop3d accepts
no new connections, though the kernel still establishes
the TCP connections themselves. So the queue of
connections waiting for accept(2) keeps growing while
commands from existing clients are being processed.

So under very heavy load, tpop3d could alternate between
spending a long time accepting new connections, and a long
time processing commands from existing connections. This
can be confusing for users, especially if (as Larry Chance
was reporting) the machine is so busy that there is a
delay of many tens of seconds between connecting and
getting a prompt from tpop3d.

The fix in 1.5.3 is to allow the two halves of the above
loop to run for only a fixed amount of time (two seconds,
by default). So tpop3d will spend up to two seconds
accepting connections, then up to two seconds processing
commands, then up to two seconds accepting connections,
....

(Actually that's not quite true. It spends up to two
seconds per listener accepting connections.)
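
To make the idea concrete, here is a minimal sketch in C of
running one half of the loop against a time budget. It is not
tpop3d's actual code: run_for_at_most, accept_step, struct
acceptor, countdown_step and SKETCH_LATENCY_SECONDS are all
names made up for this illustration, and the listening socket
is assumed to be non-blocking.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <time.h>

/* Hypothetical name; the post gives two seconds as the default budget. */
#define SKETCH_LATENCY_SECONDS 2.0

/* Current monotonic time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Call step(ctx) until it returns 0 (no work left) or until `budget`
 * seconds have passed. Returns the number of steps that did work. */
static int run_for_at_most(double budget, int (*step)(void *), void *ctx)
{
    double deadline;
    int done = 0;

    assert(budget > 0.0 && step != NULL);
    deadline = now_seconds() + budget;
    do {
        if (!step(ctx))
            break;
        ++done;
    } while (now_seconds() < deadline);
    return done;
}

/* Hypothetical context for the "accept" half of the loop. */
struct acceptor {
    int listener;               /* non-blocking listening socket */
    void (*on_client)(int fd);  /* e.g. send banner, register client */
};

/* One step of the "accept" half: take one pending connection, if any. */
static int accept_step(void *ctx)
{
    struct acceptor *a = ctx;
    int s;

    do {
        s = accept(a->listener, NULL, NULL);
    } while (s == -1 && errno == EINTR);
    if (s == -1)
        return 0;   /* queue empty (EAGAIN/EWOULDBLOCK) or a real error */
    a->on_client(s);
    return 1;
}

/* Toy step for trying the helper without sockets: counts *ctx down. */
static int countdown_step(void *ctx)
{
    int *remaining = ctx;

    if (*remaining <= 0)
        return 0;
    --*remaining;
    return 1;
}
```

Calling run_for_at_most(SKETCH_LATENCY_SECONDS, accept_step, &a)
once per listener, and then once more with a step that reads
client commands, would give the alternation described above.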

Note that this isn't a performance improvement in the
sense that the authentication cache is -- tpop3d still
does exactly as much work (slightly more if you count
checking elapsed time) -- but in the sense that it
improves *apparent* performance for clients. We do just as
much work, but get to choose when we do it.

(These changes are in netloop.c. Look for references to
the macro LATENCY.)

The authentication cache should yield a real improvement.
The way this works is that we save a message digest of the
information supplied by the user, and save a copy of the
authentication information in a table under that digest
value. When a new connection comes along, we look up the
results in the cache first, and return them instead of
actually going to the database or PAM or whatever. When
this happens you will log messages like

  fork_child:... began session for `...' with pam+cache:...

or mysql+cache or whatever.

(Observe that this whole approach is pretty broken, since
if a user changes their password, tpop3d won't know
anything about it until the cache entry expires, by
default an hour later. But typically users spend their
time checking mail, not changing their passwords, so this
isn't a very great problem.)

This code is in authcache.c.
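
For illustration only, the lookup-by-digest idea with expiry
might be sketched in C roughly as follows. This is not the
code from authcache.c. The names cache_lookup, cache_store,
credential_digest, CACHE_SLOTS and CACHE_TTL are invented
here, the cached "result" is reduced to a single uid, and
FNV-1a stands in for a proper message digest purely to keep
the sketch self-contained (it is not cryptographic, so a
collision could return the wrong result).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define CACHE_SLOTS 1024   /* hypothetical table size */
#define CACHE_TTL   3600   /* seconds; the post says an hour by default */

struct cache_entry {
    uint64_t digest;   /* digest of (user, password); 0 marks an empty slot */
    time_t   expires;  /* entry is ignored from this time onwards */
    int      uid;      /* stand-in for the saved authentication result */
};

static struct cache_entry cache[CACHE_SLOTS];

#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME  1099511628211ULL

/* Feed a NUL-terminated string into a running FNV-1a hash. */
static uint64_t fnv1a(uint64_t h, const char *s)
{
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= FNV_PRIME;
    }
    return h;
}

/* Digest of the information supplied by the user. */
static uint64_t credential_digest(const char *user, const char *pass)
{
    uint64_t h;

    assert(user != NULL && pass != NULL);
    h = fnv1a(FNV_OFFSET, user);
    h *= FNV_PRIME;        /* hash a NUL byte as a user/password separator */
    h = fnv1a(h, pass);
    return h ? h : 1;      /* keep 0 reserved for "empty slot" */
}

/* Returns 1 and fills *uid on a fresh hit; 0 on a miss or expired entry. */
static int cache_lookup(const char *user, const char *pass, time_t now,
                        int *uid)
{
    uint64_t d = credential_digest(user, pass);
    const struct cache_entry *e = &cache[d % CACHE_SLOTS];

    if (e->digest != d || now >= e->expires)
        return 0;
    *uid = e->uid;
    return 1;
}

/* Save a successful authentication result, replacing whatever was in
 * the slot before. */
static void cache_store(const char *user, const char *pass, time_t now,
                        int uid)
{
    uint64_t d = credential_digest(user, pass);
    struct cache_entry *e = &cache[d % CACHE_SLOTS];

    e->digest = d;
    e->expires = now + CACHE_TTL;
    e->uid = uid;
}
```

Because the digest covers the password as well as the user
name, a wrong password simply misses the cache and falls
through to the real authenticator. The flip side is the
problem mentioned above: an old password keeps hitting the
cache until its entry expires.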

I should repeat that the authentication cache code is
experimental.

-- 
     Roy Hudd: I've just done this radio show where I never met any of the
               other actors and I didn't understand what any of it was about.
Stephen Moore: Ah, yes, I expect that's the thing I'm in.
                             (describing `The Hitchhikers' Guide To The Galaxy')