RES: RES: RES: [tpop3d-discuss]owing to high load - tpop3d

ana.weidlich at procempa.com.br
Tue, 24 Jan 2006 10:09:54 -0300


Hi Chris,

The problem is back. At 9 PM the load from users increased and the
clients are getting a "timeout" message. We made the 2 changes in netloop.c
(10*, and the new test line).

The "pgrep tpop3d | wc -l" is returning 10.
The "netstat -an|grep 110 |wc" is returning     425    2551   33980

The messages "owing to high load" and "net_loop: accept: Interrupted system
call" don't appear anymore, but the timeout in the client software still
occurs.

The tpop3d log shows:

Jan 24 09:44:10 pwmail tpop3d[19006]: net_loop: timed out client [10]201.40.148.161/pwmail.procempa.com.br
Jan 24 09:44:44 pwmail tpop3d[13842]: net_loop: timed out client [168]joritter(200.169.25.133)
Jan 24 09:45:50 pwmail tpop3d[19006]: net_loop: timed out client [44]201.11.235.109/pwmail.procempa.com.br
Jan 24 09:47:00 pwmail tpop3d[14503]: net_loop: timed out client [213]alvicio(200.169.24.102)
Jan 24 09:47:29 pwmail tpop3d[19006]: net_loop: accept: Success

Do you have any other idea of what I should do? On the old Red Hat server we
used the same version of the POP server. The timeout problem began on the
Debian server.

Is there a newer version of netloop.c? Our version, from the site, is the
following.

/*
 * netloop.c:
 * Network event loop for tpop3d.
 *
 * Copyright (c) 2002 Chris Lightfoot. All rights reserved.
 * Email: chris@ex-parrot.com; WWW: http://www.ex-parrot.com/~chris/
 *
 */

static const char rcsid[] = "$Id: netloop.c,v 1.10 2003/11/24 19:58:28 chris Exp $";

Tks,
Ana.


-----Original Message-----
From: Chris Lightfoot [mailto:chris@sphinx.mythic-beasts.com] On Behalf Of
Chris Lightfoot
Sent: Monday, 23 January 2006 22:20
To: ana.weidlich@procempa.com.br
Cc: tpop3d-discuss@lists.beasts.org
Subject: Re: RES: RES: [tpop3d-discuss]owing to high load - tpop3d


On Mon, Jan 23, 2006 at 11:01:55PM -0300, ana.weidlich@procempa.com.br
wrote:
> Chris,
> We made the change in netloop.c, but it is now 11 PM and the load is not so
> high...
> Another question: what does the message "net_loop: accept: Interrupted
> system call" mean? It still appears after the change to 10* in
> netloop.c. Is that normal?

should be harmless -- it just means that a signal
(presumably SIGCHLD) was received in accept(2). Apply this
patch:

diff -u -r1.13 netloop.c
--- netloop.c   5 Oct 2004 11:51:21 -0000       1.13
+++ netloop.c   24 Jan 2006 01:19:35 -0000
@@ -183,7 +183,7 @@
                 }
             }
 
-            if (errno != EAGAIN)
+            if (errno != EAGAIN && errno != EINTR)
                 log_print(LOG_ERR, "net_loop: accept: %m");
             
         }
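
The effect of the patch above can be sketched in isolation. This is only a
minimal illustration, not tpop3d's code: the names accept_error_is_benign and
accept_one are hypothetical, and errors go to stderr here rather than through
log_print. The idea is that EAGAIN (nothing waiting on a non-blocking
listener) and EINTR (a signal arrived during accept(2)) are routine and need
no log entry, while anything else is still reported.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper, not part of tpop3d: returns 1 when an accept(2)
 * failure is routine (no pending connection, or interrupted by a signal)
 * and therefore not worth logging, 0 otherwise. */
static int accept_error_is_benign(int err)
{
    return err == EAGAIN || err == EINTR;
}

/* Hypothetical sketch: try to accept one connection on listen_fd.
 * Returns the new descriptor, or -1 if nothing was accepted. Only
 * unexpected errors are reported, mirroring the patched condition. */
static int accept_one(int listen_fd)
{
    int fd = accept(listen_fd, NULL, NULL);
    if (fd == -1 && !accept_error_is_benign(errno))
        fprintf(stderr, "accept: %s\n", strerror(errno));
    return fd;
}
```

A caller would simply try accept_one again on the next pass through its event
loop when it returns -1; with this check an interrupted call costs nothing but
a retry.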


> Jan 23 22:57:13 pwmail tpop3d[6682]: connections_post_select: client
> [6]glaicon(201.11.245.134): disconnected; 81/10241 bytes read/written
> Jan 23 22:57:14 pwmail tpop3d[6710]: connections_post_select: client
> [3]phasecom(200.213.42.217): finished session for `phasecom' with
> passwd+cache
> Jan 23 22:57:14 pwmail tpop3d[6710]: connections_post_select: client
> [3]phasecom(200.213.42.217): disconnected; 109/2251 bytes read/written
> Jan 23 22:57:14 pwmail tpop3d[29188]: listeners_post_select: client
> [6]200.169.22.120/pwmail.procempa.com.br: connected to local address
> 200.248.222.108:110
> Jan 23 22:57:14 pwmail tpop3d[29188]: net_loop: accept: Interrupted system
> call
> Jan 23 22:57:14 pwmail tpop3d[29188]: listeners_post_select: client
> [7]200.169.22.120/pwmail.procempa.com.br: connected to local address
> 200.248.222.108:110
> Jan 23 22:57:14 pwmail tpop3d[29188]: listeners_post_select: client
> [8]200.169.31.17/pwmail.procempa.com.br: connected to local address
> 200.248.222.108:110
> 
> Tks,
> Ana.
> 
> -----Original Message-----
> From: Chris Lightfoot [mailto:chris@sphinx.mythic-beasts.com] On Behalf Of
> Chris Lightfoot
> Sent: Monday, 23 January 2006 21:06
> To: ana.weidlich@procempa.com.br
> Cc: tpop3d-discuss@lists.beasts.org
> Subject: Re: RES: [tpop3d-discuss]owing to high load - tpop3d
> 
> 
> On Mon, Jan 23, 2006 at 10:00:13PM -0300, ana.weidlich@procempa.com.br
> wrote:
> > Hi Chris,
> > The command "pgrep tpop3d | wc -l" returns between 10 and 20.
> > But the command "netstat -an|grep 110 |wc" returns     509    3054   40720
> > We think those values are very high, but we don't see a direct relation
> > with max-children.
> > We have about 14,000 mailboxes.
> 
> hmm... So that suggests that you have ~100 users connected
> and authenticated, and about 400 in the authentication
> phase. Try bumping up the 2* to 10* in netloop.c as
> described, and see what happens.
> 

-- 
``You have to be careful with referendums;
  they don't always give the result you want''
  (Trevor Phillips, in a London mayoral debate)