[tpop3d-discuss]tpop unable to handle load
Richard Stockton
tpop3d at richardleestockton.org
Wed, 08 Sep 2004 17:22:19 -0700
Hi Rich and Chris (thanks for your replies),
At 02:55 PM 9/8/2004, Rich, Whidbey Telecom NOC wrote:
>Hi Richard,
>
>We process an average of 400,000 POP3 sessions per day using tpop3d 1.4.2
>(a version we found to have no serious issues).
We had problems with that version sending duplicate emails to
Outlook clients, which is why we updated. We appear to be
doing 200,000-250,000 POP3 sessions per day.
>Our tpop3d config specifies max-children at 30, and timeout at 120
>seconds. We have 2 load-balanced Dell dual-2.4 GHz 2650s dedicated to
>POP, connected to an NFS storage cluster.
We have max-children set to 100 and timeout set to 120 seconds.
We use multiple drives (currently 7 of them) with the user maildirs
and queues spread somewhat equally across them. We have seen that
it is usually drive contention that slows down both SMTP and POP,
and this setup has worked very well for us, at least until this
problem "popped" up. [weak grin]
By the way, we had "max_children=100" in the /etc/tpop3d.conf file,
but before I edited the source, it would die at 16 connections.
That's why I searched the source, found where it was set to 16, and
changed it to 100.
>Did you say you only have a single server handling all incoming email,
>POP3, and storage?
Yes, but it's a pretty beefy server, and postfix (which is working
MUCH harder than tpop3d) is not having any issues. Also, when I
run "systat -vmstat" during the problem times it looks pretty much
like it does during the non-problem times, leading me to believe
the system is not out of resources (but I could easily be wrong).
To Chris:
> Are you sure that it's only handling c. 20 connections?
When it's having problems I keep checking with:
ps -ajxwww | grep tpop | wc
and I have never seen more than 22.
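A slightly tighter version of that check, offered as a sketch rather than anything from the thread: plain `grep tpop` also matches the grep process itself, and plain `wc` prints word and byte counts alongside the line count. The `[t]pop3d` pattern avoids the self-match, and `grep -c` prints a single number. The total presumably still includes the parent tpop3d daemon, so the number of serving children would be one less. The function name `count_tpop3d` is hypothetical.

```shell
#!/bin/sh
# Sketch: count tpop3d processes in a ps listing read from stdin.
# The pattern "[t]pop3d" matches "tpop3d" but not grep's own command
# line ("grep -c [t]pop3d"), and "grep -c" prints only the match count.
count_tpop3d() {
    # grep -c prints 0 but exits 1 when nothing matches; "|| true"
    # keeps that from being treated as a failure.
    grep -c '[t]pop3d' || true
}

# BSD-style ps flags: a = all users, x = include processes without a
# controlling tty, ww = do not truncate command lines.
ps axww | count_tpop3d
```

If this reports about 20 during a problem period, that would be consistent with the earlier figure of 22 once the grep line and the parent daemon are excluded.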
> What does lsof give?
A lot of stuff. It looks like each tpop process opens a bunch
of files. Here is example output for one user. Note that we
don't allow shell access, so this is only the pop connection.
tpop3d 34621 mike cwd VDIR 4,30 512 423952 /m2/m/i/mike/Maildir
tpop3d 34621 mike rtd VDIR 4,15 512 2 /
tpop3d 34621 mike txt VREG 4,19 397617 158158 /usr/local/sbin/tpop3d
tpop3d 34621 mike txt VREG 4,19 101144 212005 /usr/libexec/ld-elf.so.1
tpop3d 34621 mike txt VREG 4,19 132730 155704 /usr/local/lib/mysql/libmysqlclient.so.10
tpop3d 34621 mike txt VREG 4,19 28628 123745 /usr/lib/libpam.so.2
tpop3d 34621 mike txt VREG 4,19 32532 117772 /usr/lib/libcrypt.so.2
tpop3d 34621 mike txt VREG 4,19 836892 117827 /usr/lib/libc.so.5
tpop3d 34621 mike txt VREG 4,19 54748 117964 /usr/lib/libz.so.2
tpop3d 34621 mike txt VREG 4,19 125836 117778 /usr/lib/libm.so.2
tpop3d 34621 mike 0u VCHR 2,2 0t0 7 /dev/null
tpop3d 34621 mike 1u VCHR 2,2 0t0 7 /dev/null
tpop3d 34621 mike 2u VCHR 2,2 0t0 7 /dev/null
tpop3d 34621 mike 3u unix 0xc7547300 0t0 ->0xc64e0400
tpop3d 34621 mike 5u IPv4 0xc7273ccc 0t0 TCP mail.example.com:54060->10.nnn.n.n:3306 (ESTABLISHED)
tpop3d 34621 mike 7u IPv4 0xcadb05b8 0t0 TCP pop3->ip-nn-nn-nnn-nn.dialup.example2.com:51580 (ESTABLISHED)
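In the listing above, each child holds two established TCP sockets: the POP3 client (fd 7u) and a MySQL connection to port 3306 (fd 5u). A sketch for counting how many processes are actually serving POP3 clients, assuming lsof's default column layout (PID in the second column) and that it is run as root so every user's tpop3d children are visible; `count_pop3_pids` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: count distinct PIDs that hold an ESTABLISHED TCP socket,
# reading lsof output on stdin. Column 2 of lsof's output is the PID;
# the header line and LISTEN sockets do not contain "ESTABLISHED".
count_pop3_pids() {
    awk '/ESTABLISHED/ { print $2 }' | sort -u | wc -l | tr -d ' '
}

# -n and -P skip host and port name lookups; -iTCP:110 restricts the
# listing to sockets on port 110, so the per-child MySQL connections
# (port 3306) are not counted.
if command -v lsof >/dev/null 2>&1; then
    lsof -nP -iTCP:110 | count_pop3_pids
fi
```

Comparing this number with the ps count during a problem period would show whether every tpop3d process is really attached to a client.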
> What timeout have you set?
120 seconds.
> Is it possible that you have lots of quiescent connections
> waiting to be timed out?
I don't think so. My ps command above would show those, right?
> Another quick check you can do (on a test server, probably...):
> for i in `seq 1 N` ; do nc localhost 110 & done
>
> where N = 2 × max-children + 1
>
> this should give you exactly 2 × max-children `connected'
> log messages, and exactly one `rejected connection...' log
> message.
I don't really have a test server to try this on. Other than
momentarily shutting out new tpop users, what effect would this
test have on my live server?
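For a sense of scale, the arithmetic behind Chris's test can be written out. This is a sketch only: `plan_limit_test` is a hypothetical helper that just applies his formula N = 2 × max-children + 1 and restates the log lines he predicts; it does not open any connections. With max-children at 100, the test would open 201 simultaneous connections to port 110.

```shell
#!/bin/sh
# Hypothetical helper: for a given max-children value, print the number
# of nc connections the test opens (N = 2 * max-children + 1) and the
# log lines Chris predicts: 2 * max-children "connected" messages and
# exactly one "rejected connection" message.
plan_limit_test() {
    max=$1
    echo "N=$((2 * max + 1)) connected=$((2 * max)) rejected=1"
}

plan_limit_test 100   # our max-children setting
plan_limit_test 30    # the Whidbey Telecom setting
```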
Again, I really appreciate the help. We have been very happy with
tpop3d up to now, and would like to continue to use it, if we can
figure out how to get this problem resolved.
- Richard