[tpop3d-discuss] Memory leak?

Marc Lewis marc at blarg.net
Fri, 10 May 2002 12:06:45 -0700


On Fri, May 10, 2002 at 07:19:00PM +0100, Chris Lightfoot wrote:
> On Fri, May 10, 2002 at 11:12:28AM -0700, Marc Lewis wrote:
> > On Thu, May 09, 2002 at 11:36:00PM +0100, Chris Lightfoot wrote:
>     [...]
> > > I still don't understand this. The example code in the
> > > openldap distribution doesn't seem to handle this
> > > specially.
> > > 
> > > > tpop3d: getentry.c:46: ldap_next_entry: Assertion `entry != ((void *)0)'
> > > > failed.quit: signal 6 post_fork = 0
> > > > Aborted
> > > 
> > > Well, the patch was a bit bogus. Try the following instead
> > > (it's against the original source code):
> > 
> > Tried applying both patches, and it's better, but it still fails after a
> > bit.  It always seems to fail in the call to ldap_search_s.  The re-bind
> > patch did seem to slow it down a bit, which is fine by me as long as valid
> > passwords never get rejected.
> > 
> > This patch, against your patched auth_ldap.c, fixed it for me.  It's
> > probably the totally wrong way to do it, but I beat the snot out of it with
> > multiple expect scripts doing nothing but logging in and logging out for 30
> > minutes, generating between 100 and 150 connections per minute, and had not
> > a single failure to authenticate.  There were quite a few ldap timeouts,
> > but it retries and succeeds on the second try.  Not sure why, but it looks
> > like it needs this on our system.
> 
> How bizarre. I see that you've put both the call to
> ldap_search_s and to ldap_simple_bind_s in loops -- is the
> latter necessary? Perhaps I should alter the code so that
> all calls to the ldap library are attempted multiple
> times?

The failures varied in location.  After your patch, all of the errors
ended up in the ldap_simple_bind_s call when binding as the user, which
really bothered me because logically they should not have been there.

> Did you try your patch without reconnecting to the LDAP
> server on each authentication?

I believe so...No wait, I must have -- I split them out into two different
files.

Before the reconnect patch, it would fail in the search and the bind calls.
After your patch, they all ended up in the simple_bind_s.
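
For illustration only, the retry idea discussed above can be sketched
roughly as below.  This is not the actual patch to auth_ldap.c: retry_op,
ldap_op_fn and flaky_op are hypothetical names, and the callback stands in
for a wrapper around ldap_search_s or ldap_simple_bind_s.  It assumes only
that the wrapped call returns 0 on success, as LDAP_SUCCESS does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: returns 0 on success and non-zero on
 * failure, mirroring LDAP_SUCCESS == 0 in the OpenLDAP C API. */
typedef int (*ldap_op_fn)(void *arg);

/* Call op(arg) up to max_tries times, stopping at the first success.
 * Returns the result of the last attempt; if tries_used is non-NULL it
 * receives the number of attempts actually made. */
static int retry_op(ldap_op_fn op, void *arg, int max_tries, int *tries_used)
{
    int rc = -1;
    int i;

    assert(op != NULL && max_tries > 0);
    for (i = 1; i <= max_tries; ++i) {
        rc = op(arg);
        if (rc == 0)
            break;              /* success: no further attempts */
    }
    if (tries_used != NULL)
        *tries_used = (i <= max_tries) ? i : max_tries;
    return rc;
}

/* Stand-in for a flaky library call (an assumption for the example):
 * fails until the counter pointed to by arg reaches zero. */
static int flaky_op(void *arg)
{
    int *remaining_failures = arg;

    if (*remaining_failures > 0) {
        --*remaining_failures;
        return 1;               /* simulated timeout */
    }
    return 0;                   /* simulated success */
}
```

With one simulated timeout and a limit of three tries, retry_op succeeds
on the second attempt, which matches the behaviour I saw in testing.  A
real version would probably want to retry only on timeout-type errors and
pass anything else (bad credentials, for instance) straight back.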

I know that our servers aren't the busiest in the world, but they do get
hit between 30 and 50 times per minute.  They are fairly beefy machines,
too (dual 1GHz PIIIs w/1G RAM and 160MB SCSI drives), but between the
Postfix, procmail and ldap searches, there can be some minor pauses, and
I'm guessing that is where the timeouts are occurring.  We do have on
average 200-250 processes always running on the machines and a consistent
load of 2 to 3. Not excessive, but certainly not idle.  Most of the
load is in the slapd process.  It is configured for 64 threads, a cachesize
of 10000 and a dbcachesize of 1000000.  I may try increasing the cache size
to 2MB or even 5MB, but reducing the number of threads back down to 48 or
64.  The purpose of mentioning all of this is that I don't think that it is
necessarily tpop3d that is having the problem, but the way that OpenLDAP
handles connections on a busy server.  I haven't looked at the source for
the other modules from padl.com to see if they also have loops in their
bind or how they handle retry issues; I just know they haven't ever failed
like the direct ldap calls have.

I'll be making this patched version live on our system later today, and
will let you know if I run into issues.

Thanks again.

 - Marc

-- 
Marc Lewis
Network Administrator
Blarg! Online Services, Inc.
http://www.blarg.net/~marc