[tpop3d-discuss] Tpop 1.4.2 memory leak?: fork_child: fork: Not enough space

Rich, WhidbeyNet NOC richs at whidbey.net
Thu, 9 Jan 2003 13:04:01 -0800


This is a multi-part message in MIME format.

------=_NextPart_000_0023_01C2B7DF.94830640
Content-Type: text/plain;
	charset="Windows-1252"
Content-Transfer-Encoding: 7bit

Hi list, 

Chris thanks for the reply as always.

Both machines are Solaris 9, which access Maildirs over NFS, and use NIS
for authentication. Tpop3d has been configured in the following way:

listen-address: 0.0.0.0
max-children: 30
log-facility: mail
mailbox: maildir:$(home)/Maildir
auth-pam-enable: yes
auth-pam-facility: login

It was compiled with (we plan to use ldap eventually):

./configure --with-openldap-root=/usr/local --enable-fcntl-locking \
--disable-flock-locking --disable-dotfile-locking
--disable-cclient-locking \
--enable-auth-pam --enable-auth-ldap --enable-mbox-maildir
--disable-mbox-bsd \
--disable-snide-comments

We've been using PAM, so I ran the pamtest.c (gcc -lpam -o pamtest
pamtest.c) for an hour on a test Solaris 9 machine. "vmstat" and
"/usr/ucb/ps" showed that the "pamtest" process consumed an average of
15 Kb of memory per second, about 1 Mb/minute. After an hour, it was
using 60mb of memory, and continued to grow.

The memory was released back to the system when the process was killed.
If left to continue, would that steady consumption of memory by
"pamtest" constitute a memory leak in pam?

I created a memory and cpu monitor for our system. It ran from 6pm last
night, to 8:30pm this morning (when the tpop3d on both machines stopped
responding again. The statistics are attached as files,
"server1stats.txt" and "server2stats.txt". However, they show no steady
progressive memory consumption that I can see.

For now, I think we may disable pam and try "auth-passwd", to see if
that helps. Are the changes in the CVS (1.5x) stable and significant
enough to put in place of our 1.4.2? Do the attached files shed any
light?

Rich Sandberg
richs@whidbey.net
Network Administrator
WhidbeyNet Network Operations


-----Original Message-----
From: Chris Lightfoot [mailto:chris@ex-parrot.com] 
Sent: Thursday, January 09, 2003 1:38 AM
To: Rich, WhidbeyNET
Cc: tpop3d-discuss@lists.beasts.org
Subject: Re: [tpop3d-discuss] Tpop 1.4.2 memory leak?: fork_child: fork:
Not enough space


On Wed, Jan 08, 2003 at 05:37:57PM -0800, Rich, WhidbeyNET wrote:
> Both machines ran fine for 6.5 hours, before the tpop3d daemon could 
> no
> longer fork child processes. All remaining connections were 
> "disconnected" safely by tpop3d, and tpop3d continued to run, but
would 
> no longer answer requests on port 110. The last error in the log was:
> 
> "fork_child: fork: Not enough space"
    [...]
> From looking at the code, the error is generated from a -1 return code
> when trying to call fork().  Has enyone encountered a memory leak with

> tpop3d, and has it been corrected in the CVS?
> 
> This is pretty serious, as it affected two Solaris 9 machines at the
> same time, so it's unlikely a machine-specific problem. We were/are 
> ready to implement tpop3d systemwide.

What authenticators are you using? While it's certainly possible that
there are memory leaks in tpop3d -- there have been in the past -- one
source of problems has also been memory leaks in the PAM libraries. But
that's mostly been on Linux; I can't remember whether I found that
Solaris PAM leaks or not.

If you are using PAM,can you download
    http://ex-parrot.com/junk/pamtest.c
substitute a correct username and password for `user' and `password' in
the call to auth_pam_new_user_pass, and run the code. If it leaks
memory, you have a problem with PAM on your system.

Of course, if you're not using PAM, this may not be the problem. More
details, please.

-- 
``Think about the average person. Now, half the people
  out there, by definition, are even dumber than that.''
  (L. Ron Hubbard, attrib.)

------=_NextPart_000_0023_01C2B7DF.94830640
Content-Type: text/plain;
	name="server1stats.txt"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="server1stats.txt"

MAIL2 TPOP3d TROUBLE INFORMATION=0A=
=0A=
All decimal figures are percentages=0A=
=0A=
SCPU =3D Total cpu usage on the system=0A=
SMEM =3D Total memory usage on the system=0A=
Memfree =3D Total real memory free on the system (out of 512mb)=0A=
SwpFree =3D Total swap space free on the system (out of 1024mb)=0A=
TpCPU =3D Total CPU used by all Tpop processes combined=0A=
TpMem =3D Total memory used by all Tpop processed combined =0A=
TMAXCPU =3D CPU used by the parent Tpop process=0A=
TMAXMEM =3D Memory used by the parent Tpop process=0A=
#Tpop =3D Total number of Tpop processes running=0A=
=0A=
Time            SCPU    SMEM    MemFree SwpFree TpCPU   TpMEM   TMAXCPU =
TMAXMEM	#Tpop=0A=
01/08-20:00:01	14.2	38.3	192112	656884	9.5	6.9	8.5	4.5	7=0A=
01/08-20:00:10	13.7	38.6	191808	636264	9.8	7.0	8.2	4.5	7=0A=
01/08-20:10:11	13.6	36.4	184468	636132	10.4	6.9	8.7	4.7	8=0A=
01/08-20:15:39	10.9	37.2	184300	665652	6.9	6.6	6.4	4.9	6=0A=
01/08-20:20:10	12.6	40.2	181348	568980	8.7	7.3	7.5	4.9	9=0A=
01/08-20:30:10	10.5	35.7	187228	602696	8.1	6.9	7.3	5.1	7=0A=
01/08-20:40:11	10.6	38.0	179352	681012	7.7	7.3	7.0	5.2	7=0A=
01/08-20:50:10	9.6	37.7	184200	645388	5.9	7.1	5.1	5.4	6=0A=
01/08-21:00:11	12.8	38.8	168128	558260	8.6	8.4	7.5	5.6	9=0A=
01/08-21:10:10	11.3	38.7	166652	628328	7.3	8.2	6.6	5.9	5=0A=
01/08-21:20:10	10.1	34.9	164980	644124	7.2	7.5	6.6	5.9	6=0A=
01/08-21:30:11	9.5	44.9	158872	594612	6.2	7.9	5.1	6.1	5=0A=
01/08-21:40:10	9.6	40.5	170060	585128	6.2	8.4	4.9	6.3	7=0A=
01/08-21:50:10	11.8	35.3	170828	523624	8.7	9.3	7.2	6.4	9=0A=
01/08-22:00:11	8.0	37.4	162636	512904	6.0	8.9	4.9	6.6	8=0A=
01/08-22:10:10	8.3	37.8	169872	533240	5.7	9.7	4.6	6.7	7=0A=
01/08-22:20:10	7.9	37.2	172984	651732	5.5	7.8	4.7	6.8	4=0A=
01/08-22:30:11	8.4	37.5	169696	686824	4.9	7.5	4.5	6.9	3=0A=
01/08-22:40:10	7.0	36.9	168120	681344	4.9	7.9	4.0	7.0	4=0A=
01/08-22:50:10	9.7	43.6	169604	684668	6.0	7.8	5.9	7.2	3=0A=
01/08-23:00:11	8.1	40.5	181988	697876	5.0	8.2	4.6	7.3	4=0A=
01/08-23:10:10	4.9	35.5	178612	681648	2.8	8.2	2.6	7.4	3=0A=
01/08-23:20:10	10.6	41.0	174448	711908	5.8	7.8	5.7	7.5	2=0A=
01/08-23:30:11	6.7	39.1	174632	746076	4.2	7.9	4.2	7.6	2=0A=
01/08-23:40:10	5.1	35.8	180316	645140	3.4	8.6	2.8	7.7	4=0A=
01/08-23:50:10	5.8	35.1	174336	681360	3.6	8.3	3.3	7.7	3=0A=
01/09-00:00:11	6.2	35.2	168900	708120	3.8	8.8	3.5	7.8	3=0A=
01/09-00:10:11	5.5	36.9	187528	601696	3.8	10.0	3.2	7.9	5=0A=
01/09-00:20:10	5.8	37.0	167948	590312	4.1	10.5	3.3	8.0	5=0A=
01/09-00:30:10	7.0	34.5	163948	727204	4.7	8.4	4.7	8.0	2=0A=
01/09-00:40:11	7.4	46.6	158784	723600	3.9	8.1	3.9	8.1	1=0A=
01/09-00:50:10	6.2	36.3	182616	718752	3.8	8.6	3.3	8.2	2=0A=
01/09-01:00:10	5.9	36.5	181744	720708	3.8	8.7	3.3	8.3	2=0A=
01/09-01:10:11	5.5	35.0	177300	687732	3.7	9.0	2.9	8.3	3=0A=
01/09-01:20:10	6.8	39.7	174312	631028	3.9	9.7	2.5	8.4	5=0A=
01/09-01:30:10	5.1	35.3	190372	685136	2.6	9.2	2.2	8.5	3=0A=
01/09-01:40:10	4.9	37.5	188992	647060	2.5	9.2	2.2	8.5	3=0A=
01/09-01:50:10	5.9	37.0	188772	672844	3.9	9.6	2.9	8.6	4=0A=
01/09-02:00:10	5.5	36.9	189584	738576	3.2	8.7	3.2	8.7	1=0A=
01/09-02:10:10	6.8	41.2	188248	748612	3.3	8.8	3.3	8.8	1=0A=
01/09-02:20:11	5.1	36.5	194752	742684	3.3	9.2	3.3	8.9	2=0A=
01/09-02:30:10	5.0	40.3	194016	686640	3.2	9.5	3.0	8.9	3=0A=
01/09-02:40:10	5.1	38.5	198156	709084	2.9	9.6	2.8	9.0	3=0A=
01/09-02:50:11	5.9	35.8	199852	690388	3.1	10.0	2.9	9.1	4=0A=
01/09-03:00:10	5.3	39.9	196524	653896	3.6	9.7	3.4	9.1	3=0A=
01/09-03:10:10	6.8	37.5	197820	650904	4.6	9.5	4.1	9.2	2=0A=
01/09-03:20:11	5.9	40.4	198528	738272	3.1	9.3	3.1	9.3	1=0A=
01/09-03:30:10	4.9	35.8	200912	743276	2.9	9.3	2.9	9.3	1=0A=
01/09-03:40:10	5.9	40.6	201296	728648	3.5	9.4	3.5	9.4	1=0A=
01/09-03:50:11	4.6	40.4	199164	691540	2.0	9.8	1.9	9.5	2=0A=
01/09-04:00:10	5.7	38.3	200844	752520	2.6	9.5	2.6	9.5	1=0A=
01/09-04:10:10	5.6	37.0	203016	677836	3.0	10.4	2.7	9.6	4=0A=
01/09-04:20:10	5.4	31.3	212072	701600	3.5	10.3	3.4	9.7	3=0A=
01/09-04:30:11	4.0	29.9	219744	744828	2.2	10.2	2.1	9.7	3=0A=
01/09-04:40:10	4.9	31.6	220208	656048	3.5	11.0	3.3	9.8	5=0A=
01/09-04:50:10	4.9	35.6	218636	745980	3.1	10.2	2.9	9.9	2=0A=
01/09-05:00:11	3.7	35.1	219452	758312	2.4	10.9	2.2	10.1	3=0A=
01/09-05:10:10	5.6	34.1	225532	750068	3.3	10.1	3.3	10.1	1=0A=
01/09-05:20:10	5.2	33.6	227940	709116	3.3	10.2	3.3	10.2	1=0A=
01/09-05:30:10	4.2	37.3	231456	646508	2.9	12.3	2.5	10.3	6=0A=
01/09-05:40:11	6.4	36.0	232484	748460	3.6	10.3	3.6	10.3	1=0A=
01/09-05:50:10	6.9	36.6	228532	727636	3.8	10.4	3.8	10.4	1=0A=
01/09-06:00:10	5.4	39.3	226628	640312	3.3	11.4	3.2	10.5	3=0A=
01/09-06:10:11	8.3	39.2	225308	602504	5.0	12.2	4.6	10.6	5=0A=
01/09-06:20:10	7.2	36.7	230868	650108	5.2	11.5	5.0	10.7	3=0A=
01/09-06:30:10	6.2	38.7	224344	612556	4.4	11.2	3.9	10.8	2=0A=
01/09-06:40:11	6.1	38.7	218124	594140	4.2	12.2	3.4	10.9	4=0A=
01/09-06:50:10	6.9	42.6	213300	650732	3.9	11.4	3.9	11.0	2=0A=
01/09-07:00:10	7.6	43.5	213532	512452	4.4	12.8	3.4	11.1	5=0A=
01/09-07:10:11	9.9	43.6	211040	505692	6.1	12.9	4.8	11.3	5=0A=
01/09-07:20:10	8.7	43.2	206624	450464	6.2	14.3	5.3	11.3	7=0A=
01/09-07:30:10	8.3	41.8	184928	413116	6.7	15.6	4.4	11.5	7=0A=
01/09-07:40:11	12.7	48.4	166836	477312	9.6	17.2	6.5	11.6	8=0A=
01/09-07:50:10	9.0	41.4	160088	458252	6.8	13.9	5.6	11.7	6=0A=
01/09-08:00:11	12.7	45.2	154256	343844	9.2	14.0	8.5	11.9	6=0A=
01/09-08:10:10	12.2	50.2	149908	435616	7.5	13.8	6.8	12.0	5=0A=
=0A=
At 8:11, Tpop reported "fork_child: fork: Not enough space" but =
continued to run=0A=
and answer pop3 requests:=0A=
=0A=
01/09-08:20:10	8.5	39.7	153060	511348	6.2	13.8	5.8	12.1	5=0A=
=0A=
At 8:26, Tpop reported the error again, but then permanently stopped =
responding=0A=
to requests.=0A=
=0A=
It was 16 hours since the last failure:=0A=
=0A=
Maildir Scans: 45,170 Total bytes received: 2,439,782 Total bytes sent: =
533,903,104=0A=
=0A=
Before that, it had been 7 hours since the last failure. During that 7 =
hours, it=0A=
was our first time with tpop3d in production, so many people had to =
download=0A=
previously saved email:=0A=
=0A=
Maildir Scans: 47,115 Total bytes received: 2,673,760 Total bytes sent: =
739,178,518=0A=

------=_NextPart_000_0023_01C2B7DF.94830640
Content-Type: text/plain;
	name="server2stats.txt"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="server2stats.txt"

MAIL3 TPOP3d TROUBLE INFORMATION=0A=
=0A=
All decimal figures are percentages=0A=
=0A=
SCPU =3D Total cpu usage on the system=0A=
SMEM =3D Total memory usage on the system=0A=
Memfree =3D Total real memory free on the system (out of 512mb)=0A=
SwpFree =3D Total swap space free on the system (out of 1024mb)=0A=
TpCPU =3D Total CPU used by all Tpop processes combined=0A=
TpMem =3D Total memory used by all Tpop processed combined=0A=
TMAXCPU =3D CPU used by the parent Tpop process=0A=
TMAXMEM =3D Memory used by the parent Tpop process=0A=
#Tpop =3D Total number of Tpop processes running=0A=
=0A=
Time            SCPU    SMEM    MemFree SwpFree TpCPU   TpMEM   TMAXCPU =
TMAXMEM #Tpop=0A=
01/08-20:02:57	9.9	41.3	107560	618048	7.2	7.2	4.7	3.1	12=0A=
01/08-20:10:10	8.2	45.3	105592	584604	6.1	7.8	3.5	3.2	14=0A=
01/08-20:15:45	10.0	38.1	116296	686852	6.5	6.3	5.8	3.3	8=0A=
01/08-20:20:11	10.5	38.2	117040	684580	6.6	6.7	5.7	3.4	8=0A=
01/08-20:30:10	10.3	39.5	106140	666300	7.7	7.5	5.8	3.5	10=0A=
01/08-20:40:11	9.9	39.9	103696	685612	6.7	6.6	5.6	3.6	7=0A=
01/08-20:50:10	13.2	38.7	102712	712640	7.3	4.9	5.8	3.7	5=0A=
01/08-21:00:10	9.0	38.1	107808	722328	5.7	4.5	5.2	3.9	3=0A=
01/08-21:10:11	9.4	37.1	108512	697420	6.3	5.5	6.0	4.0	6=0A=
01/08-21:20:10	8.4	37.4	111840	672748	5.6	6.3	4.0	4.1	8=0A=
01/08-21:30:11	9.9	39.1	104168	640404	7.7	6.1	6.6	4.2	7=0A=
01/08-21:40:10	9.1	35.1	108232	645132	6.6	6.6	5.2	4.4	8=0A=
01/08-21:50:10	9.6	44.3	93176	663928	5.4	6.0	4.8	4.5	5=0A=
01/08-22:00:11	8.8	32.2	94616	767060	6.1	4.9	6.0	4.6	2=0A=
01/08-22:10:10	8.9	31.6	96304	682276	5.6	7.2	4.3	4.7	9=0A=
01/08-22:20:10	6.5	29.7	117124	747176	4.2	5.4	3.9	4.8	3=0A=
01/08-22:30:11	7.6	34.5	120100	722296	5.2	5.8	5.0	4.9	4=0A=
01/08-22:40:10	7.8	36.4	129544	729708	4.5	5.6	4.4	5.0	3=0A=
01/08-22:50:10	7.8	39.1	120884	674736	4.3	6.4	3.9	5.1	5=0A=
01/08-23:00:11	7.5	34.7	131072	650432	5.2	6.8	3.9	5.2	6=0A=
01/08-23:10:10	6.8	37.8	127780	679172	4.7	6.7	4.4	5.3	5=0A=
01/08-23:20:10	7.1	43.5	118536	653344	3.9	6.9	2.9	5.4	6=0A=
01/08-23:30:11	5.9	34.2	121192	709192	3.8	6.7	3.3	5.5	5=0A=
01/08-23:40:10	9.0	36.5	120828	717872	5.9	6.7	4.1	5.6	3=0A=
01/08-23:50:11	9.0	38.9	112228	721204	6.2	8.0	4.5	5.7	3=0A=
01/09-00:00:10	6.1	33.9	93860	741816	3.9	6.3	3.7	5.7	3=0A=
01/09-00:10:10	5.8	34.1	126288	709164	3.7	7.0	3.1	5.8	5=0A=
01/09-00:20:11	5.7	36.2	120872	700712	3.2	6.5	2.9	5.9	3=0A=
01/09-00:30:10	3.6	34.1	115228	721184	2.3	6.6	2.1	6.0	3=0A=
01/09-00:40:10	6.0	37.5	119508	735552	3.2	6.6	2.7	6.0	3=0A=
01/09-00:50:11	4.7	35.0	121824	770260	2.4	6.4	2.3	6.1	2=0A=
01/09-01:00:10	3.7	35.4	123688	732060	1.6	6.4	1.6	6.1	2=0A=
01/09-01:10:10	3.8	33.3	131320	768088	2.3	6.2	2.3	6.2	1=0A=
01/09-01:20:10	5.9	36.6	148168	763112	3.5	6.3	3.5	6.3	1=0A=
01/09-01:30:11	5.5	34.5	147296	773176	3.8	6.3	3.8	6.3	1=0A=
01/09-01:40:10	5.8	34.6	144492	762932	3.7	6.4	3.7	6.4	1=0A=
01/09-01:50:10	5.4	33.1	144412	729136	3.5	7.1	3.3	6.5	3=0A=
01/09-02:00:11	6.3	37.9	140356	725428	4.3	6.9	3.3	6.5	2=0A=
01/09-02:10:10	5.4	34.2	143268	739460	3.3	7.2	3.1	6.6	3=0A=
01/09-02:20:10	4.2	36.3	143396	725784	1.9	7.0	1.9	6.7	2=0A=
01/09-02:30:11	7.0	37.3	143996	755844	3.4	6.7	3.4	6.7	1=0A=
01/09-02:40:10	7.0	36.4	145052	757796	4.0	7.1	3.8	6.8	2=0A=
01/09-02:50:10	4.7	37.5	143956	727892	3.0	7.1	2.7	6.8	2=0A=
01/09-03:00:11	6.1	36.3	147284	755160	2.7	6.9	2.7	6.9	1=0A=
01/09-03:10:10	4.9	34.9	146200	722660	3.1	7.3	3.1	7.0	2=0A=
01/09-03:20:10	5.9	40.3	144324	747300	2.2	7.0	2.2	7.0	1=0A=
01/09-03:30:11	4.2	36.3	143812	701296	2.1	7.7	2.1	7.1	3=0A=
01/09-03:40:10	5.2	34.0	145096	700756	3.2	8.0	2.3	7.1	4=0A=
01/09-03:50:10	4.7	36.0	143704	764952	2.5	7.4	2.5	7.2	2=0A=
01/09-04:00:11	3.2	37.4	141816	751792	1.4	7.2	1.4	7.2	1=0A=
01/09-04:10:10	4.6	37.2	142076	726680	2.9	7.9	2.5	7.3	3=0A=
01/09-04:20:10	5.7	38.4	141492	720620	2.9	7.6	2.8	7.3	2=0A=
01/09-04:30:10	4.6	39.0	144672	751040	2.4	7.4	2.4	7.4	1=0A=
01/09-04:40:11	5.4	30.9	152672	687820	2.9	8.0	1.8	7.4	3=0A=
01/09-04:50:10	3.5	35.0	157552	727144	1.7	7.8	1.7	7.5	2=0A=
01/09-05:00:10	3.9	36.7	160448	746784	2.3	7.5	2.3	7.5	1=0A=
01/09-05:10:11	4.7	36.2	161660	708020	3.0	8.5	2.7	7.6	4=0A=
01/09-05:20:10	5.6	34.8	162980	746308	3.0	7.7	3.0	7.7	1=0A=
01/09-05:30:10	4.8	35.5	160856	692216	3.2	8.3	2.9	7.7	3=0A=
01/09-05:40:11	5.6	39.1	162072	712592	3.0	8.2	3.0	7.8	2=0A=
01/09-05:50:10	5.2	36.0	161684	617940	2.6	9.2	1.3	7.8	5=0A=
01/09-06:00:10	5.1	38.2	160660	644996	2.8	8.9	2.1	7.9	4=0A=
01/09-06:10:10	6.6	37.0	161160	676452	4.6	8.9	4.2	8.0	4=0A=
01/09-06:20:11	4.4	35.6	189556	756224	2.2	8.0	2.2	8.0	1=0A=
01/09-06:30:10	5.7	39.7	187404	721988	2.5	8.4	2.4	8.1	2=0A=
01/09-06:40:10	6.5	40.5	186456	602884	4.5	9.1	3.9	8.2	4=0A=
01/09-06:50:11	6.7	41.4	190240	637064	3.5	8.9	3.4	8.3	3=0A=
01/09-07:00:10	7.8	38.8	190248	487100	4.9	10.6	3.6	8.4	7=0A=
01/09-07:10:10	7.5	43.5	179244	535592	5.1	11.6	3.5	8.4	7=0A=
01/09-07:20:11	8.9	42.8	172116	577400	5.4	9.5	4.7	8.6	4=0A=
01/09-07:30:10	8.3	38.8	164936	660680	6.0	9.6	5.5	8.6	4=0A=
01/09-07:40:10	8.8	38.0	160412	569252	6.1	10.5	5.0	8.7	6=0A=
01/09-07:50:11	9.1	39.4	153904	467380	6.7	11.2	5.4	8.9	8=0A=
01/09-08:00:10	11.6	45.6	141816	465108	8.7	12.3	7.3	9.0	9=0A=
01/09-08:10:11	12.8	40.6	135628	616040	8.4	10.0	8.1	9.1	4=0A=
01/09-08:20:10	13.6	44.6	147776	610744	9.2	9.6	9.0	9.3	2=0A=
=0A=
8:28 Tpop3d stopped responding:=0A=
=0A=
Jan  9 08:28:10 mail2 tpop3d[29107]: [ID 702911 mail.error] fork_child: =
fork: Not enough space=0A=
=0A=
Tpop3d statistics over 16 hours, since the last crash:=0A=
=0A=
Maildir Scans: 47,765 Total bytes received: 2,610,901 Total bytes =
written: 586,167,434=0A=
=0A=
Before that, it had been 7 hours since the last failure. During that 7 =
hours, it=0A=
was our first time with tpop3d in production, so many people had to =
download=0A=
previously saved email:=0A=
=0A=
Sorry, log file was overwritten.=0A=

------=_NextPart_000_0023_01C2B7DF.94830640--