Sep 28, 2010

Debugging memory leaks with DTrace and mdb

Some time ago I wrote about debugging memory problems with libumem.

Now I've a different approach using dtrace and mdb to find the source code position.

First a let the program execute and finish running under dtrace.
The dtrace-Script is from http://blogs.sun.com/sanjeevb/.


# cat ./memleak.d

#!/usr/sbin/dtrace -s

pid$1:libc.so.1:malloc:entry
{
        self->trace = 1;
        self->size = arg0;
}
pid$1:libc.so.1:malloc:return
/self->trace == 1/
{
        printf("Ptr=0x%p Size=%d", arg1, self->size);
        ustack();
        self->trace = 0;
        self->size = 0;
}

pid$1:libc.so.1:realloc:entry
{
        self->trace = 1;
        self->size = arg1;
        self->oldptr = arg0;
}

pid$1:libc.so.1:realloc:return
/self->trace == 1/
{
        printf("Ptr=0x%p Oldptr=0x%p Size=%d", arg1, self->oldptr, self->size);
        ustack();
        self->trace = 0;
        self->size = 0;
}

pid$1:libc.so.1:calloc:entry
/self->trace == 1/
{
        self->trace = 1;
        self->size = arg1;
}

pid$1:libc.so.1:calloc:return
/self->trace == 1/
{
        printf("Ptr=0x%p Size=%d", arg1, self->size);
        ustack();
        self->trace = 0;
        self->size = 0;
}

pid$1:libc.so.1:free:entry
{
        printf("Ptr=0x%p ", arg0);
}

The result output from the dtrace-Script is postprocessed with a
Perl-Script checking if alloced memory is freed again:


#!/usr/bin/perl

# findleaks.pl

use Data::Dumper;

my %hash = ();

while (<>) {
        if ((/malloc:return Ptr=([^ ]*) Size=(.*)/) || 
            (/calloc:return Ptr=([^ ]*) Size=(.*)/)) {
                $hash{$1} = { size => $2 };
  while (<>) {
   last if /^$/;
   $hash{$1}->{stack} .= $_;
  }
        }
        elsif (/free:entry Ptr=([^ ]*)/) {
                if (exists $hash{$1} and $hash{$1}) {
                        $hash{$1} = '';
                }
        }
 elsif (/realloc:return Ptr=([^ ]*) Oldptr=([^ ]*) Size=(.*)/) {
  if ($1 eq $2) {
   if (exists $hash{$1} and $hash{$1}) {
    $hash{$1} = { size => $3 };
    $hash{$1}->{stack} = '';
    while (<>) {
                          last if /^$/;
                           $hash{$1}->{stack} .= $_;
                  }
   } 
  } else {
    $hash{$1} = '';
    $hash{$2}= { size => $3 };
    $hash{$2}->{stack} = '';
    while (<>) {
                          last if /^$/;
                           $hash{$2}->{stack} .= $_;
                  }
  }
 }

}

foreach my $key (keys %hash) {
 next if not $hash{$key}->{size};
 print "Ptr=$key Size=", $hash{$key}->{size}, "\n";
 print $hash{$key}->{stack}, "\n---------\n";
}

Now we get a list of memory allocations which are not freed after closing
the application.

Such an entry might look like
---------
Ptr=0x81c64f8 Size=20
              libumem.so.1`malloc+0x3e
              libstdc++.so.6`_Znaj+0x1b
              libxerces-c.so.28`_ZN11xercesc_2_818IconvLCPTranscoder9transcodeEPKc+0xc4
              libxerces-c.so.28`_ZN11xercesc_2_89XMLString9transcodeEPKc+0x22
              libvsEAI.so404`readFunction+0x39
              libvsEAI.so404`bcfEAIRepositoryInit+0x5a9
              libvsopCsiBdOOPool.so404`ctrl_call_bcf+0x106
              libvsopCsiBdOOPool.so404`ctrl_call_list+0x1bc
              libvsopCsiBdOOPool.so404`ctrl_gosub_list+0xb2
              libvsopCsiBdOOPool.so404`ctrl_call_bcf+0x7dd
              libvsopCsiBdOOPool.so404`ctrl_call_list+0x1bc
              libvsopCsiBdOOPool.so404`ctrl_gosub_list+0xb2
              libvsopCsiBdOOPool.so404`ctrl_call_bcf+0x7dd
              libvsopCsiBdOOPool.so404`ctrl_call_bcf+0x795
              libvsopCsiBdOOPool.so404`ctrl_call_bcf+0x795
              libvsopCsiBdOOPool.so404`ctrl_call_list+0x1bc
              libvsopCsiBdOOPool.so404`prog_control+0x9a
              progvs_prog`vsop_main+0x5a2
              libvszwflx2Bd.so404`main+0x196
              progvs_prog`_start+0x80

To get the position in the readFunction  method, make a core of the application by using
gcore [-F]

and start in mdb with mdb
mdb <core file name >
> libvsEAI.so404`readFunction::dis
libvsEAI.so404`readFunction:    pushl  %ebp
libvsEAI.so404`readFunction+1:  movl   %esp,%ebp
libvsEAI.so404`readFunction+3:  pushl  %edi
libvsEAI.so404`readFunction+4:  pushl  %esi
libvsEAI.so404`readFunction+5:  pushl  %ebx
libvsEAI.so404`readFunction+6:  call   +0x5    
libvsEAI.so404`readFunction+0xb:popl   %ebx
libvsEAI.so404`readFunction+0xc:addl   $0x1527d,%ebx
libvsEAI.so404`readFunction+0x12:       subl   $0xd8,%esp
libvsEAI.so404`readFunction+0x18:       leal   0xfffef969(%ebx),%eax
libvsEAI.so404`readFunction+0x1e:       pushl  %eax
libvsEAI.so404`readFunction+0x1f:       movl   0x8(%ebp),%edi
libvsEAI.so404`readFunction+0x22:       call   -0x620e 
libvsEAI.so404`readFunction+0x27:       movl   %eax,-0x88(%ebp)
libvsEAI.so404`readFunction+0x2d:       leal   0xfffef7a1(%ebx),%eax
libvsEAI.so404`readFunction+0x33:       pushl  %eax
libvsEAI.so404`readFunction+0x34:       call   -0x6220 
libvsEAI.so404`readFunction+0x39:       movl   %eax,-0x8c(%ebp)
libvsEAI.so404`readFunction+0x3f:       leal   0xfffef972(%ebx),%eax
libvsEAI.so404`readFunction+0x45:       pushl  %eax
libvsEAI.so404`readFunction+0x46:       call   -0x6232 
libvsEAI.so404`readFunction+0x4b:       movl   %eax,-0x90(%ebp)
libvsEAI.so404`readFunction+0x51:       leal   0xfffef978(%ebx),%eax
libvsEAI.so404`readFunction+0x57:       pushl  %eax
libvsEAI.so404`readFunction+0x58:       movl   %eax,-0xb0(%ebp)
libvsEAI.so404`readFunction+0x5e:       call   -0x624a 
libvsEAI.so404`readFunction+0x63:       leal   0xfffef97b(%ebx),%edx
Because of the offset from the stack trace  libvsEAI.so404`readFunction+0x39 we see in the disassembly, that the second call to libxerces-c.so.28`_ZN11xercesc_2_89XMLString9transcodeEPKc+0x22 is the problem.


The method called is
>echo "_ZN11xercesc_2_89XMLString9transcodeEPKc" | c++filt
xercesc_2_8::XMLString::transcode(char const*)