[tpop3d-discuss] tpop unable to handle load

Richard Stockton tpop3d at richardleestockton.org
Wed, 08 Sep 2004 17:22:19 -0700


Hi Rich and Chris (thanks for your replies),

At 02:55 PM 9/8/2004, Rich, Whidbey Telecom NOC wrote:
>Hi Richard,
>
>We process an average of 400,000 POP3 sessions per day using tpop3d 1.4.2
>(a version we found to have no serious issues).

We had problems with that version sending duplicate emails to
Outlook clients, which is why we updated.  We appear to be
doing 200,000-250,000 POP3 sessions per day.

>Our tpop3d config specifies max-children at 30, and timeout at 120
>seconds. We have 2 load-balanced Dell dual-2.4 GHz 2650s dedicated to
>POP, connected to an NFS storage cluster.

We have max-children set to 100 and the timeout set to 120 seconds.
We use multiple drives (currently 7 of them), with the user maildirs
and queues spread roughly evenly across them.  We have seen that
it is usually drive contention that slows down both SMTP and POP,
and this setup has worked very well for us, at least until this
problem "popped" up. [weak grin]
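A rough way to compare these figures is Little's law, L = λW: the average number of sessions open at once equals the arrival rate times the average session length. This is illustrative arithmetic only, not something measured on either server. It assumes sessions arrive evenly across 24 hours, which real POP3 traffic does not, so peak-hour concurrency will be higher. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def mean_concurrency(sessions_per_day: float, mean_session_seconds: float) -> float:
    """Little's law L = lambda * W: average number of sessions open at once.

    Assumption: arrivals are spread evenly over 24 hours; real peaks exceed this.
    """
    arrival_rate = sessions_per_day / SECONDS_PER_DAY  # sessions per second
    return arrival_rate * mean_session_seconds


def max_mean_session_seconds(sessions_per_day: float, children: int) -> float:
    """Longest average session length that `children` concurrent slots can sustain."""
    return children * SECONDS_PER_DAY / sessions_per_day


if __name__ == "__main__":
    for children in (20, 100):
        limit = max_mean_session_seconds(250_000, children)
        print(f"{children} children: average session must stay under {limit:.1f} s")
```

At 250,000 sessions per day (about 2.9 per second on average), roughly 20 busy children keep up only if the average session lasts under about 6.9 seconds, while 100 children would allow about 35 seconds. Assuming each of the two Whidbey servers runs with max-children 30, their 400,000 sessions per day over 60 slots gives about 13 seconds.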

By the way, we had "max_children=100" in the /etc/tpop3d.conf file,
but before I edited the source, it would die at 16 connections.
That's why I searched the source, found where it was set to 16, and
changed it to 100.

>Did you say you only have a single server handling all incoming email,
>POP3, and storage?

Yes, but it's a pretty beefy server, and postfix (which is working
MUCH harder than tpop3d) is not having any issues.  Also, when I
run "systat -vmstat" during the problem times it looks pretty much
like it does during the non-problem times, leading me to believe
the system is not out of resources (but I could easily be wrong).


To Chris:

 > Are you sure that it's only handling c. 20 connections?

When it's having problems, I keep checking with:

         ps -ajxwww | grep tpop | wc

and I have never seen more than 22.
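Two details affect how that number reads: `wc` with no flags prints three numbers (lines, words, bytes), and the `grep tpop` in the pipeline usually shows up in the `ps` listing itself, so the line count can be one higher than the number of tpop3d processes. If tpop3d runs as a listening parent that forks one child per connection (the per-user lsof entry below suggests children run as the mail user), the parent is in the count too. A minimal sketch that counts only processes whose program name is `tpop3d`, assuming a `ps` header line with COMMAND as the last column; `count_processes` is a hypothetical helper:

```python
def count_processes(ps_text: str, name: str) -> int:
    """Count processes in `ps` output whose program name is exactly `name`.

    Assumes a header line first, with COMMAND as the last column. The grep in
    `ps ... | grep tpop` is not counted, because its program name is "grep".
    """
    lines = ps_text.strip().splitlines()
    if not lines:
        return 0
    ncols = len(lines[0].split())
    count = 0
    for line in lines[1:]:
        # Columns before COMMAND contain no spaces, so split only ncols - 1 times
        # and keep the full command line (with arguments) as the last field.
        parts = line.split(None, ncols - 1)
        if len(parts) < ncols:
            continue
        program = parts[-1].split()[0]
        # Compare the basename, tolerating a trailing colon in a rewritten title.
        if program.rsplit("/", 1)[-1].rstrip(":") == name:
            count += 1
    return count
```

Fed with `subprocess.run(["ps", "-ajxwww"], capture_output=True, text=True).stdout`, it returns a count that excludes the grep; subtract one more if a parent listener is present.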

 > What does lsof give?

A lot of stuff.  It looks like each tpop process opens a bunch
of files.  Here is example output for one user.  Note that we
don't allow shell access, so this is only the pop connection.

tpop3d  34621     mike  cwd   VDIR       4,30      512  423952 /m2/m/i/mike/Maildir
tpop3d  34621     mike  rtd   VDIR       4,15      512       2 /
tpop3d  34621     mike  txt   VREG       4,19   397617  158158 /usr/local/sbin/tpop3d
tpop3d  34621     mike  txt   VREG       4,19   101144  212005 /usr/libexec/ld-elf.so.1
tpop3d  34621     mike  txt   VREG       4,19   132730  155704 /usr/local/lib/mysql/libmysqlclient.so.10
tpop3d  34621     mike  txt   VREG       4,19    28628  123745 /usr/lib/libpam.so.2
tpop3d  34621     mike  txt   VREG       4,19    32532  117772 /usr/lib/libcrypt.so.2
tpop3d  34621     mike  txt   VREG       4,19   836892  117827 /usr/lib/libc.so.5
tpop3d  34621     mike  txt   VREG       4,19    54748  117964 /usr/lib/libz.so.2
tpop3d  34621     mike  txt   VREG       4,19   125836  117778 /usr/lib/libm.so.2
tpop3d  34621     mike    0u  VCHR        2,2      0t0       7 /dev/null
tpop3d  34621     mike    1u  VCHR        2,2      0t0       7 /dev/null
tpop3d  34621     mike    2u  VCHR        2,2      0t0       7 /dev/null
tpop3d  34621     mike    3u  unix 0xc7547300      0t0         ->0xc64e0400
tpop3d  34621     mike    5u  IPv4 0xc7273ccc      0t0     TCP mail.example.com:54060->10.nnn.n.n:3306 (ESTABLISHED)
tpop3d  34621     mike    7u  IPv4 0xcadb05b8      0t0     TCP pop3->ip-nn-nn-nnn-nn.dialup.example2.com:51580 (ESTABLISHED)
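This child holds two established TCP sockets: the POP3 client, and a connection to port 3306 on an internal host, which is MySQL's default port (consistent with libmysqlclient being mapped). To get the same picture across every child at once, a small parser can tally each process's TCP peers. This is a sketch assuming lsof's default column layout as shown above; `tcp_peers` and `count_pids_with_remote_port` are hypothetical names.

```python
from collections import defaultdict


def tcp_peers(lsof_text: str) -> dict[int, list[tuple[str, str]]]:
    """Map each PID to the (local, remote) ends of its established TCP sockets.

    Assumes lsof's default columns:
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME [STATE]
    """
    peers: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for line in lsof_text.splitlines():
        fields = line.split()
        # Skip the header, regular files, devices and unix sockets; keep IP sockets.
        if len(fields) < 10 or fields[4] not in ("IPv4", "IPv6"):
            continue
        if fields[9] != "(ESTABLISHED)":
            continue
        local, _, remote = fields[8].partition("->")
        peers[int(fields[1])].append((local, remote))
    return dict(peers)


def count_pids_with_remote_port(peers: dict[int, list[tuple[str, str]]], port: int) -> int:
    """Count PIDs with at least one established connection to remote `port`."""
    return sum(
        1
        for conns in peers.values()
        if any(remote.rsplit(":", 1)[-1] == str(port) for _, remote in conns)
    )
```

Run against the output of `lsof -c tpop3d`, `count_pids_with_remote_port(tcp_peers(out), 3306)` shows how many children were holding a connection to port 3306 at that moment, and `len(tcp_peers(out))` how many had any established TCP socket.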

 > What timeout have you set?

120 seconds.

 > Is it possible that you have lots of quiescent connections
 > waiting to be timed out?

I don't think so.  My ps command above would show those, right?

 > Another quick check you can do (on a test server, probably...):
 >    for i in `seq 1 N` ; do nc localhost 110 & done
 >
 > where N = 2 × max-children + 1
 >
 > this should give you exactly 2 × max-children `connected'
 > log messages, and exactly one `rejected connection...' log
 > message.
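The same check can be scripted without nc. This Python sketch opens n sockets, keeps them all open at once like the backgrounded nc processes, and records the first line each one receives. `hold_connections` is a hypothetical helper, not part of tpop3d, and the sketch assumes a connection tpop3d turns away never receives a `+OK` greeting; the thread only says such connections produce a `rejected connection...` log message.

```python
import socket


def hold_connections(host: str, port: int, n: int, timeout: float = 5.0) -> list[str]:
    """Open n connections, keep them all open, and return each one's first reply.

    Mirrors the `nc localhost 110 &` loop: every socket stays open until all n
    have been attempted, so the server sees n simultaneous clients.
    Returns n strings in no particular order; '' means refused, closed, or silent.
    """
    socks: list[socket.socket] = []
    replies: list[str] = []
    try:
        for _ in range(n):
            try:
                socks.append(socket.create_connection((host, port), timeout=timeout))
            except OSError:
                replies.append("")  # connection refused or timed out
        for s in socks:
            try:
                data = s.recv(512)  # b"" if the server closed without a greeting
            except OSError:
                data = b""  # timed out or reset before any greeting arrived
            replies.append(data.decode("ascii", "replace").strip())
    finally:
        for s in socks:
            s.close()
    return replies
```

For max-children = 100, N = 2 × 100 + 1 = 201, so `replies = hold_connections("localhost", 110, 201)` followed by `sum(r.startswith("+OK") for r in replies)` counts the sessions that got a POP3 greeting (RFC 1939 greetings begin with `+OK`). Each silent socket waits out the timeout, so a run against a saturated server can take a while.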

I don't really have a test server to try this on.  Other than
momentarily shutting out new tpop users, what effect would this
test have on my live server?

Again, I really appreciate the help.  We have been very happy with
tpop3d up to now, and would like to continue to use it, if we can
figure out how to get this problem resolved.
  - Richard