RES: RES: RES: [tpop3d-discuss]owing to high load - tpop3d
ana.weidlich@procempa.com.br
ana.weidlich at procempa.com.br
Tue, 24 Jan 2006 15:51:37 -0300
Chris,
We are using the latest patches from your site and, following your suggestion,
changed the 2* to 10* in netloop.c.
But that doesn't solve the problem.
In the log we have many "connection closed by peer" messages:
Jan 24 15:48:12 pwmail tpop3d[29797]: ioabs_tcp_post_select: client
[14]200.208.131.203/pwmail.procempa.com.br: connection closed by peer
Jan 24 15:48:12 pwmail tpop3d[29797]: ioabs_tcp_post_select: client
[20]201.25.46.126/pwmail.procempa.com.br: connection closed by peer
Jan 24 15:48:13 pwmail tpop3d[29797]: ioabs_tcp_post_select: client
[22]200.175.93.198/pwmail.procempa.com.br: connection closed by peer
Jan 24 15:48:13 pwmail tpop3d[10818]: ioabs_tcp_post_select: client
[18]snetto(200.203.33.136): connection closed by peer
"pgrep tpop3d | wc" is 6 6 34
"netstat -an|grep 110 |wc" is 575 3451 45980
The number 575 is increasing.
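A note on the count above: "grep 110" matches any netstat line that contains the text 110, including IP addresses and other port numbers, and it mixes all TCP states together. As a rough sketch only, assuming a Linux host where /proc/net/tcp is readable (IPv4 only), connections whose local port is 110 could be counted per state as below. The names state_if_local_port, count_port_states and the ST_* constants are made up for this illustration and are not part of tpop3d.

```c
#include <stdio.h>

/* TCP state codes as they appear (in hex) in the "st" column of /proc/net/tcp. */
enum { ST_ESTABLISHED = 0x01, ST_TIME_WAIT = 0x06, ST_CLOSE_WAIT = 0x08, ST_LISTEN = 0x0A };

/*
 * Parse one line of /proc/net/tcp. Returns the TCP state code if the
 * line is a socket entry whose local port equals want_port, else -1
 * (this also covers the header line, which does not parse).
 */
int state_if_local_port(const char *line, unsigned want_port)
{
    unsigned local_addr, local_port, rem_addr, rem_port, state;

    if (sscanf(line, " %*d: %8x:%4x %8x:%4x %2x",
               &local_addr, &local_port, &rem_addr, &rem_port, &state) != 5)
        return -1;
    return local_port == want_port ? (int)state : -1;
}

/*
 * Read every line from fp (for example an open /proc/net/tcp) and bump
 * counts[state] for each socket on the given local port. Returns the
 * total number of matching sockets.
 */
int count_port_states(FILE *fp, unsigned port, unsigned counts[16])
{
    char line[512];
    int total = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        int st = state_if_local_port(line, port);
        if (st >= 0 && st < 16) {
            counts[st]++;
            total++;
        }
    }
    return total;
}
```

To use it, open "/proc/net/tcp" with fopen, pass the stream to count_port_states with port 110 and a zeroed counts array, then compare counts[ST_ESTABLISHED] with counts[ST_TIME_WAIT] and counts[ST_CLOSE_WAIT]. That shows whether the growing number is made of live sessions or of sockets that are already closing.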
The only change we made was migrating from a Red Hat server to a better Debian
server. Could that be the problem? Does tpop3d need some special
configuration on Debian?
Tks,
Ana.
-----Original Message-----
From: Ana Luiza Moura Weidlich
Sent: Tuesday, 24 January 2006 10:10
To: 'Chris Lightfoot'
Cc: tpop3d-discuss@lists.beasts.org
Subject: RES: RES: RES: [tpop3d-discuss]owing to high load - tpop3d
Priority: High
Hi Chris,
The problem is back. At 9 PM the load from users increased and the
clients return a "timeout message". We made the 2 changes in netloop.c
(10*, and the new test line).
The "pgrep tpop3d | wc -l" is returning 10.
The "netstat -an|grep 110 |wc" is returning 425 2551 33980
The messages "owing to high load" and "net_loop: accept: Interrupted system
call" don't appear anymore, but the timeout in the client software still
occurs.
The tpop3d log shows:
Jan 24 09:44:10 pwmail tpop3d[19006]: net_loop: timed out client
[10]201.40.148.161/pwmail.procempa.com.br
Jan 24 09:44:44 pwmail tpop3d[13842]: net_loop: timed out client
[168]joritter(200.169.25.133)
Jan 24 09:45:50 pwmail tpop3d[19006]: net_loop: timed out client
[44]201.11.235.109/pwmail.procempa.com.br
Jan 24 09:47:00 pwmail tpop3d[14503]: net_loop: timed out client
[213]alvicio(200.169.24.102)
Jan 24 09:47:29 pwmail tpop3d[19006]: net_loop: accept: Success
Do you have any other idea of what I need to do? On the old Red Hat server we
used the same version of the POP server. The timeout problem began on the
Debian server.
Is there a newer version of netloop.c? Our version, taken from the site, is
the following:
/*
* netloop.c:
* Network event loop for tpop3d.
*
* Copyright (c) 2002 Chris Lightfoot. All rights reserved.
* Email: chris@ex-parrot.com; WWW: http://www.ex-parrot.com/~chris/
*
*/
static const char rcsid[] = "$Id: netloop.c,v 1.10 2003/11/24 19:58:28 chris Exp $";
Tks,
Ana.
-----Original Message-----
From: Chris Lightfoot [mailto:chris@sphinx.mythic-beasts.com] On behalf of
Chris Lightfoot
Sent: Monday, 23 January 2006 22:20
To: ana.weidlich@procempa.com.br
Cc: tpop3d-discuss@lists.beasts.org
Subject: Re: RES: RES: [tpop3d-discuss]owing to high load - tpop3d
On Mon, Jan 23, 2006 at 11:01:55PM -0300, ana.weidlich@procempa.com.br
wrote:
> Chris,
> We made the change in netloop.c. But it is now 11 PM and the load is not so
> high...
> Another question: what does the message "net_loop: accept: Interrupted
> system call" mean? This message still appears after the change to 10* in
> netloop.c. Is that normal?
should be harmless -- it just means that a signal
(presumably SIGCHLD) was received in accept(2). Apply this
patch:
diff -u -r1.13 netloop.c
--- netloop.c 5 Oct 2004 11:51:21 -0000 1.13
+++ netloop.c 24 Jan 2006 01:19:35 -0000
@@ -183,7 +183,7 @@
}
}
- if (errno != EAGAIN)
+ if (errno != EAGAIN && errno != EINTR)
log_print(LOG_ERR, "net_loop: accept: %m");
}
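To illustrate what the patch changes, here is a minimal standalone sketch of an accept loop that treats EINTR like EAGAIN instead of logging it. It is not tpop3d's actual code: accept_error_is_benign and accept_pending are hypothetical names, the listening socket is assumed to be non-blocking, and errors go to stderr rather than through log_print.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 for accept(2) errors that are expected in normal operation:
 * no connection pending (EAGAIN/EWOULDBLOCK) or the call was interrupted
 * by a signal such as SIGCHLD (EINTR).
 */
int accept_error_is_benign(int err)
{
    return err == EAGAIN || err == EWOULDBLOCK || err == EINTR;
}

/*
 * Accept every connection currently pending on a non-blocking listening
 * socket. Each new descriptor is handed to on_client, which takes
 * ownership of it; if on_client is NULL the descriptor is closed.
 * Returns the number of connections accepted.
 */
int accept_pending(int listen_fd, void (*on_client)(int fd))
{
    int accepted = 0;

    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd >= 0) {
            if (on_client != NULL)
                on_client(fd);
            else
                close(fd);
            accepted++;
            continue;
        }
        if (errno == EINTR)
            continue;               /* interrupted by a signal: just retry */
        if (!accept_error_is_benign(errno))
            fprintf(stderr, "accept: %s\n", strerror(errno));
        break;                      /* nothing more to accept for now */
    }
    return accepted;
}
```

The sketch retries immediately on EINTR, whereas the patch above only suppresses the log message and leaves the retry to the next pass of the event loop. Either way the interrupted call is not treated as an error.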
> Jan 23 22:57:13 pwmail tpop3d[6682]: connections_post_select: client
> [6]glaicon(201.11.245.134): disconnected; 81/10241 bytes read/written
> Jan 23 22:57:14 pwmail tpop3d[6710]: connections_post_select: client
> [3]phasecom(200.213.42.217): finished session for `phasecom' with
> passwd+cache
> Jan 23 22:57:14 pwmail tpop3d[6710]: connections_post_select: client
> [3]phasecom(200.213.42.217): disconnected; 109/2251 bytes read/written
> Jan 23 22:57:14 pwmail tpop3d[29188]: listeners_post_select: client
> [6]200.169.22.120/pwmail.procempa.com.br: connected to local address
> 200.248.222.108:110
> Jan 23 22:57:14 pwmail tpop3d[29188]: net_loop: accept: Interrupted system
> call
> Jan 23 22:57:14 pwmail tpop3d[29188]: listeners_post_select: client
> [7]200.169.22.120/pwmail.procempa.com.br: connected to local address
> 200.248.222.108:110
> Jan 23 22:57:14 pwmail tpop3d[29188]: listeners_post_select: client
> [8]200.169.31.17/pwmail.procempa.com.br: connected to local address
> 200.248.222.108:110
>
> Tks,
> Ana.
>
> -----Original Message-----
> From: Chris Lightfoot [mailto:chris@sphinx.mythic-beasts.com] On behalf of
> Chris Lightfoot
> Sent: Monday, 23 January 2006 21:06
> To: ana.weidlich@procempa.com.br
> Cc: tpop3d-discuss@lists.beasts.org
> Subject: Re: RES: [tpop3d-discuss]owing to high load - tpop3d
>
>
> On Mon, Jan 23, 2006 at 10:00:13PM -0300, ana.weidlich@procempa.com.br
> wrote:
> > Hi Chris,
> > The command "pgrep tpop3d | wc -l" returns between 10 and 20.
> > But the command "netstat -an|grep 110 |wc" returns 509 3054 40720
> > We think those values are too high, but we don't see a direct relation with
> > max-children.
> > We have about 14,000 mailboxes.
>
> hmm... So that suggests that you have ~100 users connected
> and authenticated, and about 400 in the authentication
> phase. Try bumping up the 2* to 10* in netloop.c as
> described, and see what happens.
>
--
``You have to be careful with referendums;
they don't always give the result you want''
(Trevor Phillips, in a London mayoral debate)