[tpop3d-discuss] tpop unable to handle load

Chris Lightfoot chris at ex-parrot.com
Thu, 9 Sep 2004 01:28:52 +0100


On Wed, Sep 08, 2004 at 05:22:19PM -0700, Richard Stockton wrote:
    [...]
> Yes, but it's a pretty beefy server, and postfix (which is working
> MUCH harder than tpop3d) is not having any issues.  Also, when I
> run "systat -vmstat" during the problem times it looks pretty much
> like it does during the non-problem times, leading me to believe
> the system is not out of resources (but I could easily be wrong).

The messages tpop3d is logging don't indicate a true
resource exhaustion, just that it's hit the configured
limit.

> > Are you sure that it's only handling c. 20 connections?
> 
> When it's having problems I keep checking with;
> 
>         ps -ajxwww | grep tpop | wc
> 
> and I have never seen more than 22.

ok... that's fairly conclusive.
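
One caveat on that count: the pipeline above may also count the
grep process itself, and the total includes the parent tpop3d
daemon. A minimal sketch of a stricter count follows; the function
name count_tpop3d is made up for illustration, and it assumes that
ps can print bare command names with `-o comm`.

```shell
# Count the lines on stdin that start with the command name "tpop3d".
# Intended input: one command name per line, e.g. from `ps -A -o comm`.
# (count_tpop3d is a hypothetical helper, not part of tpop3d.)
count_tpop3d() {
    grep -c '^tpop3d'
}
```

Usage would be something like `ps -A -o comm | count_tpop3d`.
Because only the command name is matched, a `grep tpop` process
in the listing is not counted.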

> > What does lsof give?
> 
> A lot of stuff.  It looks like each tpop process opens a bunch
> of files.  Here is example output for one user.  Note that we
> don't allow shell access, so this is only the pop connection.

ok, just going through these:

> tpop3d  34621     mike  cwd   VDIR       4,30      512  423952 /m2/m/i/mike/Maildir

user's mailbox;

> tpop3d  34621     mike  rtd   VDIR       4,15      512       2 /

root directory of the process (the current directory is the
maildir shown above);

    [... various mappings of libraries etc.... ]
> tpop3d  34621     mike    0u  VCHR        2,2      0t0       7 /dev/null
> tpop3d  34621     mike    1u  VCHR        2,2      0t0       7 /dev/null
> tpop3d  34621     mike    2u  VCHR        2,2      0t0       7 /dev/null

standard input, output and error;

> tpop3d  34621     mike    3u  unix 0xc7547300      0t0         ->0xc64e0400

unknown, perhaps something to do with the resolver or name
service switch;

> tpop3d  34621     mike    5u  IPv4 0xc7273ccc      0t0     TCP mail.example.com:54060->10.nnn.n.n:3306 (ESTABLISHED)

connection to MySQL (port 3306);

> tpop3d  34621     mike    7u  IPv4 0xcadb05b8      0t0     TCP pop3->ip-nn-nn-nnn-nn.dialup.example2.com:51580 (ESTABLISHED)

TCP connection to the POP3 client.

> > What timeout have you set?
> 
> 120 seconds.
> 
> > Is it possible that you have lots of quiescent connections
> > waiting to be timed out?
> 
> I don't think so.  My ps command above would show those, right?

yes, it should.

> > Another quick check you can do (on a test server, probably...):
> >    for i in `seq 1 N` ; do nc localhost 110 & done
> >
> > where N = 2 × max-children + 1
> >
> > this should give you exactly 2 × max-children `connected'
> > log messages, and exactly one `rejected connection...' log
> > message.
> 
> I don't really have a test server to try this on.  Other than
> momentarily shutting out new tpop users, what effect would this
> test have on my live server?

If it has any other effect, it's a bug!

If you do the USER+PASS version I sent, you'll get a bit
of a load spike as tpop3d opens all those maildirs.
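
For reference, the connection-count test quoted above could be
scripted roughly as below. This is only a sketch: the function
names and the 30-second hold time are made up for illustration,
and feeding nc from `sleep` is my own addition to keep each
connection open for a while (some nc variants exit as soon as
their standard input closes). The argument must be whatever
max-children is set to in your tpop3d.conf.

```shell
# Number of connections to open for a given max-children value:
# N = 2 * max-children + 1, as in the quoted test.
conn_test_count() {
    echo $(( 2 * $1 + 1 ))
}

# Open N idle connections to the local POP3 port in the background.
# $1 is max-children (an assumption: it must match tpop3d.conf).
run_conn_test() {
    n=$(conn_test_count "$1")
    i=1
    while [ "$i" -le "$n" ]; do
        # sleep holds nc's stdin open so the connection stays up.
        sleep 30 | nc localhost 110 >/dev/null &
        i=$(( i + 1 ))
    done
    wait
}
```

With max-children set to 10, for example, `run_conn_test 10`
opens 21 connections, and the log should then show 20 `connected'
messages and one `rejected connection...' message.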

> Again, I really appreciate the help.

np.

-- 
``Crash programs fail because they are based on theory that, 
  with nine women pregnant, you can get a baby in a month.''
  (Wernher von Braun)