[tpop3d-discuss] Tpop 1.4.2 memory leak?: fork_child: fork: Not enough space
Rich, WhidbeyNet NOC
richs at whidbey.net
Thu, 9 Jan 2003 13:04:01 -0800
This is a multi-part message in MIME format.
------=_NextPart_000_0023_01C2B7DF.94830640
Content-Type: text/plain;
charset="Windows-1252"
Content-Transfer-Encoding: 7bit
Hi list,
Chris thanks for the reply as always.
Both machines are Solaris 9, which access Maildirs over NFS, and use NIS
for authentication. Tpop3d has been configured in the following way:
listen-address: 0.0.0.0
max-children: 30
log-facility: mail
mailbox: maildir:$(home)/Maildir
auth-pam-enable: yes
auth-pam-facility: login
It was compiled with (we plan to use ldap eventually):
./configure --with-openldap-root=/usr/local --enable-fcntl-locking \
--disable-flock-locking --disable-dotfile-locking
--disable-cclient-locking \
--enable-auth-pam --enable-auth-ldap --enable-mbox-maildir
--disable-mbox-bsd \
--disable-snide-comments
We've been using PAM, so I ran the pamtest.c (gcc -lpam -o pamtest
pamtest.c) for an hour on a test Solaris 9 machine. "vmstat" and
"/usr/ucb/ps" showed that the "pamtest" process consumed an average of
15 Kb of memory per second, about 1 Mb/minute. After an hour, it was
using 60mb of memory, and continued to grow.
The memory was released back to the system when the process was killed.
If left to continue, would that steady consumption of memory by
"pamtest" constitute a memory leak in pam?
I created a memory and cpu monitor for our system. It ran from 6pm last
night, to 8:30pm this morning (when the tpop3d on both machines stopped
responding again. The statistics are attached as files,
"server1stats.txt" and "server2stats.txt". However, they show no steady
progressive memory consumption that I can see.
For now, I think we may disable pam and try "auth-passwd", to see if
that helps. Are the changes in the CVS (1.5x) stable and significant
enough to put in place of our 1.4.2? Do the attached files shed any
light?
Rich Sandberg
richs@whidbey.net
Network Administrator
WhidbeyNet Network Operations
-----Original Message-----
From: Chris Lightfoot [mailto:chris@ex-parrot.com]
Sent: Thursday, January 09, 2003 1:38 AM
To: Rich, WhidbeyNET
Cc: tpop3d-discuss@lists.beasts.org
Subject: Re: [tpop3d-discuss] Tpop 1.4.2 memory leak?: fork_child: fork:
Not enough space
On Wed, Jan 08, 2003 at 05:37:57PM -0800, Rich, WhidbeyNET wrote:
> Both machines ran fine for 6.5 hours, before the tpop3d daemon could
> no
> longer fork child processes. All remaining connections were
> "disconnected" safely by tpop3d, and tpop3d continued to run, but
would
> no longer answer requests on port 110. The last error in the log was:
>
> "fork_child: fork: Not enough space"
[...]
> From looking at the code, the error is generated from a -1 return code
> when trying to call fork(). Has enyone encountered a memory leak with
> tpop3d, and has it been corrected in the CVS?
>
> This is pretty serious, as it affected two Solaris 9 machines at the
> same time, so it's unlikely a machine-specific problem. We were/are
> ready to implement tpop3d systemwide.
What authenticators are you using? While it's certainly possible that
there are memory leaks in tpop3d -- there have been in the past -- one
source of problems has also been memory leaks in the PAM libraries. But
that's mostly been on Linux; I can't remember whether I found that
Solaris PAM leaks or not.
If you are using PAM,can you download
http://ex-parrot.com/junk/pamtest.c
substitute a correct username and password for `user' and `password' in
the call to auth_pam_new_user_pass, and run the code. If it leaks
memory, you have a problem with PAM on your system.
Of course, if you're not using PAM, this may not be the problem. More
details, please.
--
``Think about the average person. Now, half the people
out there, by definition, are even dumber than that.''
(L. Ron Hubbard, attrib.)
------=_NextPart_000_0023_01C2B7DF.94830640
Content-Type: text/plain;
name="server1stats.txt"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
filename="server1stats.txt"
MAIL2 TPOP3d TROUBLE INFORMATION=0A=
=0A=
All decimal figures are percentages=0A=
=0A=
SCPU =3D Total cpu usage on the system=0A=
SMEM =3D Total memory usage on the system=0A=
Memfree =3D Total real memory free on the system (out of 512mb)=0A=
SwpFree =3D Total swap space free on the system (out of 1024mb)=0A=
TpCPU =3D Total CPU used by all Tpop processes combined=0A=
TpMem =3D Total memory used by all Tpop processed combined =0A=
TMAXCPU =3D CPU used by the parent Tpop process=0A=
TMAXMEM =3D Memory used by the parent Tpop process=0A=
#Tpop =3D Total number of Tpop processes running=0A=
=0A=
Time SCPU SMEM MemFree SwpFree TpCPU TpMEM TMAXCPU =
TMAXMEM #Tpop=0A=
01/08-20:00:01 14.2 38.3 192112 656884 9.5 6.9 8.5 4.5 7=0A=
01/08-20:00:10 13.7 38.6 191808 636264 9.8 7.0 8.2 4.5 7=0A=
01/08-20:10:11 13.6 36.4 184468 636132 10.4 6.9 8.7 4.7 8=0A=
01/08-20:15:39 10.9 37.2 184300 665652 6.9 6.6 6.4 4.9 6=0A=
01/08-20:20:10 12.6 40.2 181348 568980 8.7 7.3 7.5 4.9 9=0A=
01/08-20:30:10 10.5 35.7 187228 602696 8.1 6.9 7.3 5.1 7=0A=
01/08-20:40:11 10.6 38.0 179352 681012 7.7 7.3 7.0 5.2 7=0A=
01/08-20:50:10 9.6 37.7 184200 645388 5.9 7.1 5.1 5.4 6=0A=
01/08-21:00:11 12.8 38.8 168128 558260 8.6 8.4 7.5 5.6 9=0A=
01/08-21:10:10 11.3 38.7 166652 628328 7.3 8.2 6.6 5.9 5=0A=
01/08-21:20:10 10.1 34.9 164980 644124 7.2 7.5 6.6 5.9 6=0A=
01/08-21:30:11 9.5 44.9 158872 594612 6.2 7.9 5.1 6.1 5=0A=
01/08-21:40:10 9.6 40.5 170060 585128 6.2 8.4 4.9 6.3 7=0A=
01/08-21:50:10 11.8 35.3 170828 523624 8.7 9.3 7.2 6.4 9=0A=
01/08-22:00:11 8.0 37.4 162636 512904 6.0 8.9 4.9 6.6 8=0A=
01/08-22:10:10 8.3 37.8 169872 533240 5.7 9.7 4.6 6.7 7=0A=
01/08-22:20:10 7.9 37.2 172984 651732 5.5 7.8 4.7 6.8 4=0A=
01/08-22:30:11 8.4 37.5 169696 686824 4.9 7.5 4.5 6.9 3=0A=
01/08-22:40:10 7.0 36.9 168120 681344 4.9 7.9 4.0 7.0 4=0A=
01/08-22:50:10 9.7 43.6 169604 684668 6.0 7.8 5.9 7.2 3=0A=
01/08-23:00:11 8.1 40.5 181988 697876 5.0 8.2 4.6 7.3 4=0A=
01/08-23:10:10 4.9 35.5 178612 681648 2.8 8.2 2.6 7.4 3=0A=
01/08-23:20:10 10.6 41.0 174448 711908 5.8 7.8 5.7 7.5 2=0A=
01/08-23:30:11 6.7 39.1 174632 746076 4.2 7.9 4.2 7.6 2=0A=
01/08-23:40:10 5.1 35.8 180316 645140 3.4 8.6 2.8 7.7 4=0A=
01/08-23:50:10 5.8 35.1 174336 681360 3.6 8.3 3.3 7.7 3=0A=
01/09-00:00:11 6.2 35.2 168900 708120 3.8 8.8 3.5 7.8 3=0A=
01/09-00:10:11 5.5 36.9 187528 601696 3.8 10.0 3.2 7.9 5=0A=
01/09-00:20:10 5.8 37.0 167948 590312 4.1 10.5 3.3 8.0 5=0A=
01/09-00:30:10 7.0 34.5 163948 727204 4.7 8.4 4.7 8.0 2=0A=
01/09-00:40:11 7.4 46.6 158784 723600 3.9 8.1 3.9 8.1 1=0A=
01/09-00:50:10 6.2 36.3 182616 718752 3.8 8.6 3.3 8.2 2=0A=
01/09-01:00:10 5.9 36.5 181744 720708 3.8 8.7 3.3 8.3 2=0A=
01/09-01:10:11 5.5 35.0 177300 687732 3.7 9.0 2.9 8.3 3=0A=
01/09-01:20:10 6.8 39.7 174312 631028 3.9 9.7 2.5 8.4 5=0A=
01/09-01:30:10 5.1 35.3 190372 685136 2.6 9.2 2.2 8.5 3=0A=
01/09-01:40:10 4.9 37.5 188992 647060 2.5 9.2 2.2 8.5 3=0A=
01/09-01:50:10 5.9 37.0 188772 672844 3.9 9.6 2.9 8.6 4=0A=
01/09-02:00:10 5.5 36.9 189584 738576 3.2 8.7 3.2 8.7 1=0A=
01/09-02:10:10 6.8 41.2 188248 748612 3.3 8.8 3.3 8.8 1=0A=
01/09-02:20:11 5.1 36.5 194752 742684 3.3 9.2 3.3 8.9 2=0A=
01/09-02:30:10 5.0 40.3 194016 686640 3.2 9.5 3.0 8.9 3=0A=
01/09-02:40:10 5.1 38.5 198156 709084 2.9 9.6 2.8 9.0 3=0A=
01/09-02:50:11 5.9 35.8 199852 690388 3.1 10.0 2.9 9.1 4=0A=
01/09-03:00:10 5.3 39.9 196524 653896 3.6 9.7 3.4 9.1 3=0A=
01/09-03:10:10 6.8 37.5 197820 650904 4.6 9.5 4.1 9.2 2=0A=
01/09-03:20:11 5.9 40.4 198528 738272 3.1 9.3 3.1 9.3 1=0A=
01/09-03:30:10 4.9 35.8 200912 743276 2.9 9.3 2.9 9.3 1=0A=
01/09-03:40:10 5.9 40.6 201296 728648 3.5 9.4 3.5 9.4 1=0A=
01/09-03:50:11 4.6 40.4 199164 691540 2.0 9.8 1.9 9.5 2=0A=
01/09-04:00:10 5.7 38.3 200844 752520 2.6 9.5 2.6 9.5 1=0A=
01/09-04:10:10 5.6 37.0 203016 677836 3.0 10.4 2.7 9.6 4=0A=
01/09-04:20:10 5.4 31.3 212072 701600 3.5 10.3 3.4 9.7 3=0A=
01/09-04:30:11 4.0 29.9 219744 744828 2.2 10.2 2.1 9.7 3=0A=
01/09-04:40:10 4.9 31.6 220208 656048 3.5 11.0 3.3 9.8 5=0A=
01/09-04:50:10 4.9 35.6 218636 745980 3.1 10.2 2.9 9.9 2=0A=
01/09-05:00:11 3.7 35.1 219452 758312 2.4 10.9 2.2 10.1 3=0A=
01/09-05:10:10 5.6 34.1 225532 750068 3.3 10.1 3.3 10.1 1=0A=
01/09-05:20:10 5.2 33.6 227940 709116 3.3 10.2 3.3 10.2 1=0A=
01/09-05:30:10 4.2 37.3 231456 646508 2.9 12.3 2.5 10.3 6=0A=
01/09-05:40:11 6.4 36.0 232484 748460 3.6 10.3 3.6 10.3 1=0A=
01/09-05:50:10 6.9 36.6 228532 727636 3.8 10.4 3.8 10.4 1=0A=
01/09-06:00:10 5.4 39.3 226628 640312 3.3 11.4 3.2 10.5 3=0A=
01/09-06:10:11 8.3 39.2 225308 602504 5.0 12.2 4.6 10.6 5=0A=
01/09-06:20:10 7.2 36.7 230868 650108 5.2 11.5 5.0 10.7 3=0A=
01/09-06:30:10 6.2 38.7 224344 612556 4.4 11.2 3.9 10.8 2=0A=
01/09-06:40:11 6.1 38.7 218124 594140 4.2 12.2 3.4 10.9 4=0A=
01/09-06:50:10 6.9 42.6 213300 650732 3.9 11.4 3.9 11.0 2=0A=
01/09-07:00:10 7.6 43.5 213532 512452 4.4 12.8 3.4 11.1 5=0A=
01/09-07:10:11 9.9 43.6 211040 505692 6.1 12.9 4.8 11.3 5=0A=
01/09-07:20:10 8.7 43.2 206624 450464 6.2 14.3 5.3 11.3 7=0A=
01/09-07:30:10 8.3 41.8 184928 413116 6.7 15.6 4.4 11.5 7=0A=
01/09-07:40:11 12.7 48.4 166836 477312 9.6 17.2 6.5 11.6 8=0A=
01/09-07:50:10 9.0 41.4 160088 458252 6.8 13.9 5.6 11.7 6=0A=
01/09-08:00:11 12.7 45.2 154256 343844 9.2 14.0 8.5 11.9 6=0A=
01/09-08:10:10 12.2 50.2 149908 435616 7.5 13.8 6.8 12.0 5=0A=
=0A=
At 8:11, Tpop reported "fork_child: fork: Not enough space" but =
continued to run=0A=
and answer pop3 requests:=0A=
=0A=
01/09-08:20:10 8.5 39.7 153060 511348 6.2 13.8 5.8 12.1 5=0A=
=0A=
At 8:26, Tpop reported the error again, but then permanently stopped =
responding=0A=
to requests.=0A=
=0A=
It was 16 hours since the last failure:=0A=
=0A=
Maildir Scans: 45,170 Total bytes received: 2,439,782 Total bytes sent: =
533,903,104=0A=
=0A=
Before that, it had been 7 hours since the last failure. During that 7 =
hours, it=0A=
was our first time with tpop3d in production, so many people had to =
download=0A=
previously saved email:=0A=
=0A=
Maildir Scans: 47,115 Total bytes received: 2,673,760 Total bytes sent: =
739,178,518=0A=
------=_NextPart_000_0023_01C2B7DF.94830640
Content-Type: text/plain;
name="server2stats.txt"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
filename="server2stats.txt"
MAIL3 TPOP3d TROUBLE INFORMATION=0A=
=0A=
All decimal figures are percentages=0A=
=0A=
SCPU =3D Total cpu usage on the system=0A=
SMEM =3D Total memory usage on the system=0A=
Memfree =3D Total real memory free on the system (out of 512mb)=0A=
SwpFree =3D Total swap space free on the system (out of 1024mb)=0A=
TpCPU =3D Total CPU used by all Tpop processes combined=0A=
TpMem =3D Total memory used by all Tpop processed combined=0A=
TMAXCPU =3D CPU used by the parent Tpop process=0A=
TMAXMEM =3D Memory used by the parent Tpop process=0A=
#Tpop =3D Total number of Tpop processes running=0A=
=0A=
Time SCPU SMEM MemFree SwpFree TpCPU TpMEM TMAXCPU =
TMAXMEM #Tpop=0A=
01/08-20:02:57 9.9 41.3 107560 618048 7.2 7.2 4.7 3.1 12=0A=
01/08-20:10:10 8.2 45.3 105592 584604 6.1 7.8 3.5 3.2 14=0A=
01/08-20:15:45 10.0 38.1 116296 686852 6.5 6.3 5.8 3.3 8=0A=
01/08-20:20:11 10.5 38.2 117040 684580 6.6 6.7 5.7 3.4 8=0A=
01/08-20:30:10 10.3 39.5 106140 666300 7.7 7.5 5.8 3.5 10=0A=
01/08-20:40:11 9.9 39.9 103696 685612 6.7 6.6 5.6 3.6 7=0A=
01/08-20:50:10 13.2 38.7 102712 712640 7.3 4.9 5.8 3.7 5=0A=
01/08-21:00:10 9.0 38.1 107808 722328 5.7 4.5 5.2 3.9 3=0A=
01/08-21:10:11 9.4 37.1 108512 697420 6.3 5.5 6.0 4.0 6=0A=
01/08-21:20:10 8.4 37.4 111840 672748 5.6 6.3 4.0 4.1 8=0A=
01/08-21:30:11 9.9 39.1 104168 640404 7.7 6.1 6.6 4.2 7=0A=
01/08-21:40:10 9.1 35.1 108232 645132 6.6 6.6 5.2 4.4 8=0A=
01/08-21:50:10 9.6 44.3 93176 663928 5.4 6.0 4.8 4.5 5=0A=
01/08-22:00:11 8.8 32.2 94616 767060 6.1 4.9 6.0 4.6 2=0A=
01/08-22:10:10 8.9 31.6 96304 682276 5.6 7.2 4.3 4.7 9=0A=
01/08-22:20:10 6.5 29.7 117124 747176 4.2 5.4 3.9 4.8 3=0A=
01/08-22:30:11 7.6 34.5 120100 722296 5.2 5.8 5.0 4.9 4=0A=
01/08-22:40:10 7.8 36.4 129544 729708 4.5 5.6 4.4 5.0 3=0A=
01/08-22:50:10 7.8 39.1 120884 674736 4.3 6.4 3.9 5.1 5=0A=
01/08-23:00:11 7.5 34.7 131072 650432 5.2 6.8 3.9 5.2 6=0A=
01/08-23:10:10 6.8 37.8 127780 679172 4.7 6.7 4.4 5.3 5=0A=
01/08-23:20:10 7.1 43.5 118536 653344 3.9 6.9 2.9 5.4 6=0A=
01/08-23:30:11 5.9 34.2 121192 709192 3.8 6.7 3.3 5.5 5=0A=
01/08-23:40:10 9.0 36.5 120828 717872 5.9 6.7 4.1 5.6 3=0A=
01/08-23:50:11 9.0 38.9 112228 721204 6.2 8.0 4.5 5.7 3=0A=
01/09-00:00:10 6.1 33.9 93860 741816 3.9 6.3 3.7 5.7 3=0A=
01/09-00:10:10 5.8 34.1 126288 709164 3.7 7.0 3.1 5.8 5=0A=
01/09-00:20:11 5.7 36.2 120872 700712 3.2 6.5 2.9 5.9 3=0A=
01/09-00:30:10 3.6 34.1 115228 721184 2.3 6.6 2.1 6.0 3=0A=
01/09-00:40:10 6.0 37.5 119508 735552 3.2 6.6 2.7 6.0 3=0A=
01/09-00:50:11 4.7 35.0 121824 770260 2.4 6.4 2.3 6.1 2=0A=
01/09-01:00:10 3.7 35.4 123688 732060 1.6 6.4 1.6 6.1 2=0A=
01/09-01:10:10 3.8 33.3 131320 768088 2.3 6.2 2.3 6.2 1=0A=
01/09-01:20:10 5.9 36.6 148168 763112 3.5 6.3 3.5 6.3 1=0A=
01/09-01:30:11 5.5 34.5 147296 773176 3.8 6.3 3.8 6.3 1=0A=
01/09-01:40:10 5.8 34.6 144492 762932 3.7 6.4 3.7 6.4 1=0A=
01/09-01:50:10 5.4 33.1 144412 729136 3.5 7.1 3.3 6.5 3=0A=
01/09-02:00:11 6.3 37.9 140356 725428 4.3 6.9 3.3 6.5 2=0A=
01/09-02:10:10 5.4 34.2 143268 739460 3.3 7.2 3.1 6.6 3=0A=
01/09-02:20:10 4.2 36.3 143396 725784 1.9 7.0 1.9 6.7 2=0A=
01/09-02:30:11 7.0 37.3 143996 755844 3.4 6.7 3.4 6.7 1=0A=
01/09-02:40:10 7.0 36.4 145052 757796 4.0 7.1 3.8 6.8 2=0A=
01/09-02:50:10 4.7 37.5 143956 727892 3.0 7.1 2.7 6.8 2=0A=
01/09-03:00:11 6.1 36.3 147284 755160 2.7 6.9 2.7 6.9 1=0A=
01/09-03:10:10 4.9 34.9 146200 722660 3.1 7.3 3.1 7.0 2=0A=
01/09-03:20:10 5.9 40.3 144324 747300 2.2 7.0 2.2 7.0 1=0A=
01/09-03:30:11 4.2 36.3 143812 701296 2.1 7.7 2.1 7.1 3=0A=
01/09-03:40:10 5.2 34.0 145096 700756 3.2 8.0 2.3 7.1 4=0A=
01/09-03:50:10 4.7 36.0 143704 764952 2.5 7.4 2.5 7.2 2=0A=
01/09-04:00:11 3.2 37.4 141816 751792 1.4 7.2 1.4 7.2 1=0A=
01/09-04:10:10 4.6 37.2 142076 726680 2.9 7.9 2.5 7.3 3=0A=
01/09-04:20:10 5.7 38.4 141492 720620 2.9 7.6 2.8 7.3 2=0A=
01/09-04:30:10 4.6 39.0 144672 751040 2.4 7.4 2.4 7.4 1=0A=
01/09-04:40:11 5.4 30.9 152672 687820 2.9 8.0 1.8 7.4 3=0A=
01/09-04:50:10 3.5 35.0 157552 727144 1.7 7.8 1.7 7.5 2=0A=
01/09-05:00:10 3.9 36.7 160448 746784 2.3 7.5 2.3 7.5 1=0A=
01/09-05:10:11 4.7 36.2 161660 708020 3.0 8.5 2.7 7.6 4=0A=
01/09-05:20:10 5.6 34.8 162980 746308 3.0 7.7 3.0 7.7 1=0A=
01/09-05:30:10 4.8 35.5 160856 692216 3.2 8.3 2.9 7.7 3=0A=
01/09-05:40:11 5.6 39.1 162072 712592 3.0 8.2 3.0 7.8 2=0A=
01/09-05:50:10 5.2 36.0 161684 617940 2.6 9.2 1.3 7.8 5=0A=
01/09-06:00:10 5.1 38.2 160660 644996 2.8 8.9 2.1 7.9 4=0A=
01/09-06:10:10 6.6 37.0 161160 676452 4.6 8.9 4.2 8.0 4=0A=
01/09-06:20:11 4.4 35.6 189556 756224 2.2 8.0 2.2 8.0 1=0A=
01/09-06:30:10 5.7 39.7 187404 721988 2.5 8.4 2.4 8.1 2=0A=
01/09-06:40:10 6.5 40.5 186456 602884 4.5 9.1 3.9 8.2 4=0A=
01/09-06:50:11 6.7 41.4 190240 637064 3.5 8.9 3.4 8.3 3=0A=
01/09-07:00:10 7.8 38.8 190248 487100 4.9 10.6 3.6 8.4 7=0A=
01/09-07:10:10 7.5 43.5 179244 535592 5.1 11.6 3.5 8.4 7=0A=
01/09-07:20:11 8.9 42.8 172116 577400 5.4 9.5 4.7 8.6 4=0A=
01/09-07:30:10 8.3 38.8 164936 660680 6.0 9.6 5.5 8.6 4=0A=
01/09-07:40:10 8.8 38.0 160412 569252 6.1 10.5 5.0 8.7 6=0A=
01/09-07:50:11 9.1 39.4 153904 467380 6.7 11.2 5.4 8.9 8=0A=
01/09-08:00:10 11.6 45.6 141816 465108 8.7 12.3 7.3 9.0 9=0A=
01/09-08:10:11 12.8 40.6 135628 616040 8.4 10.0 8.1 9.1 4=0A=
01/09-08:20:10 13.6 44.6 147776 610744 9.2 9.6 9.0 9.3 2=0A=
=0A=
8:28 Tpop3d stopped responding:=0A=
=0A=
Jan 9 08:28:10 mail2 tpop3d[29107]: [ID 702911 mail.error] fork_child: =
fork: Not enough space=0A=
=0A=
Tpop3d statistics over 16 hours, since the last crash:=0A=
=0A=
Maildir Scans: 47,765 Total bytes received: 2,610,901 Total bytes =
written: 586,167,434=0A=
=0A=
Before that, it had been 7 hours since the last failure. During that 7 =
hours, it=0A=
was our first time with tpop3d in production, so many people had to =
download=0A=
previously saved email:=0A=
=0A=
Sorry, log file was overwritten.=0A=
------=_NextPart_000_0023_01C2B7DF.94830640--