hubertf's NetBSD Blog
 
[20061115] Post mortem debugging, or: what happened before it crashed? (Updated)
So your machine panicked, and because you were running X you have no clue what went on? Here's a nice way to find out, assuming you have a kernel crash dump. To make sure you get one, set kern.dump_on_panic=1 in /etc/sysctl.conf. Now, what to do with those crashdumps?
% ls -l /var/crash/
total 3183838
-rw-r--r--  1 root  wheel          3 Nov  2 02:09 bounds
-rw-r--r--  1 root  wheel          5 Jun 30  2004 minfree
...
-rw-------  1 root  wheel  181265401 Nov  2 02:11 netbsd.26.core.gz
-rw-------  1 root  wheel    2162696 Nov  2 02:11 netbsd.26.gz 
In /var/crash, "bounds" contains an increasing counter for the crashdump number (it would be "27" in the above example), and "minfree" contains the minimum amount of space, in kilobytes, that should be kept free on the file system. Both files are read by savecore(8) when /etc/rc.conf has "savecore=yes", which is the default.
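
To find the most recent dump without eyeballing the listing, the counter can be read back from "bounds". This is only a small sketch: latest_dump_number is a made-up helper name, and it assumes "bounds" holds the number of the next dump, as described above.

```shell
# Sketch: print the number of the most recently saved crash dump.
# Assumption: "bounds" holds the NEXT dump number, so the latest is one less.
# latest_dump_number is a hypothetical helper, not part of NetBSD.
latest_dump_number() {
    dir=${1:-/var/crash}
    next=$(cat "$dir/bounds") || return 1
    # Refuse anything that is not a plain positive decimal number.
    case $next in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$next" -gt 0 ] || return 1
    echo $(($next - 1))
}
```

With the listing above, latest_dump_number /var/crash should print 26, which matches netbsd.26.gz and netbsd.26.core.gz.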

The actual crashdump consists of two gzipped files: the memory dump "netbsd.XX.core.gz" and a copy of the running kernel "netbsd.XX.gz". After uncompressing, the files can be used to look at the system at the point of its panic:

# gunzip netbsd.26*.gz
#
Note that the crashdump may contain sensitive data and as such is only readable by root!
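
If you'd rather keep the compressed originals in /var/crash and work on a copy, something like the following sketch may help. unpack_dump is a hypothetical helper name, and the work directory is whatever you pass in; the umask keeps the uncompressed copies private, for the reason given above.

```shell
# Sketch: uncompress one crash dump pair into a private work directory,
# leaving the compressed originals untouched.
# unpack_dump is a hypothetical helper, not part of NetBSD.
# Arguments: source directory, dump number, destination directory.
unpack_dump() {
    src=$1
    num=$2
    dest=$3
    (
        # Subshell, so the restrictive umask does not leak to the caller.
        umask 077
        mkdir -p "$dest" &&
        gunzip -c "$src/netbsd.$num.gz" > "$dest/netbsd.$num" &&
        gunzip -c "$src/netbsd.$num.core.gz" > "$dest/netbsd.$num.core"
    )
}
```

For the dump above that would be: unpack_dump /var/crash 26 /root/crash26 (run as root). Keep in mind that the uncompressed core is much larger than the .gz file.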

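The crashdump can be read by programs that use libkvm to read through the crashdump's kernel memory, e.g. gdb(1), dmesg(8), ps(1), fstat(1), ipcs(1), netstat(1), nfsstat(1), pmap(1), w(1), pstat(8), vmstat(1) etc. Most of them take the -M switch to name the memory dump and the -N switch to name the kernel image; gdb(1) uses its own commands instead, as shown below.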

Some examples:

  • To show the system's message buffer at the time of the crash:
    % dmesg -M netbsd.26.core -N netbsd.26
    ...
    unmounting /home (/dev/wd1e)...
    unmounting /tmp (mfs:371)...warning: mfs read during shutdown
    dev = 0xff00, block = 10496, fs = /tmp
    panic: blkfree: freeing free block
    Begin traceback...
    uvm_fault(0xcbfd07f0, 0x2000, 1) -> 0xe
    fatal page fault in supervisor mode
    trap type 6 code 0 eip c0305083 cs 8 eflags 10246 cr2 2900 ilevel 0
    panic: trap
    Faulted in mid-traceback; aborting...
    dumping to dev 0,1 offset 2024327
    dump 511 510 509 508 507 506 505 504 503 502 501 500 499 498 497 496
    495 494 493 ...
    Apparently the system tried to free a block that was already freed when unmounting /tmp.

  • Display virtual memory parameters:
    % vmstat -M netbsd.26.core -N netbsd.26 -s
         4096 bytes per page
            8 page colors
       127888 pages managed
              ...  

  • Attach the GNU debugger gdb(1) to the system crash dump, to poke around deeply:
    % gdb netbsd.26
    ...
    (gdb) target kcore netbsd.26.core
    panic: blkfree: freeing free block
    #0  0x0ac04000 in ?? ()
    (gdb) bt
    #0  0x0ac04000 in ?? ()
    #1  0xc03084b5 in cpu_reboot ()
    #2  0xc02a57aa in panic ()
    #3  0xc0313127 in trap ()
    #4  0xc0102dfd in calltrap ()
    #5  0xc0182544 in db_get_value ()
    #6  0xc03058f1 in db_stack_trace_print ()
    #7  0xc02a577c in panic ()
    #8  0xc0205db7 in ffs_blkfree ()
    #9  0xc020b8d5 in ffs_indirtrunc ()
    ...  
  • Unfortunately there are a number of programs that I didn't get to work with my crashdump, but that may be because the crash happened during or after system shutdown. For example, ps(1) didn't work.
Still, that should give a start for poking around...
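
Since the -M/-N pair is the same for every tool in the examples above, a tiny wrapper saves some typing. This is just a sketch: on_dump is a made-up name, and it assumes the uncompressed netbsd.XX and netbsd.XX.core files are in the current directory.

```shell
# Sketch: run a libkvm-based tool against dump number $1.
# on_dump is a hypothetical helper, not part of NetBSD.
# Usage: on_dump NUMBER TOOL [TOOL-ARGS...]
on_dump() {
    num=$1
    tool=$2
    shift 2
    # -M names the memory dump, -N the matching kernel image.
    "$tool" -M "netbsd.$num.core" -N "netbsd.$num" "$@"
}
```

With that, "on_dump 26 dmesg" and "on_dump 26 vmstat -s" run the same commands as the examples above. It won't help with gdb(1), which doesn't take these switches.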

Update: Apparently 'target kcore' was renamed to 'target kvm' in gdb6, see this posting.
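
If you script this for machines with different gdb versions, the target name could be picked from the major version number. The sketch below only encodes the rename mentioned in the update (kcore before gdb 6, kvm from gdb 6 on), which I haven't checked against every gdb release; kernel_core_target is a made-up helper name.

```shell
# Sketch: pick the gdb target name for kernel crash dumps.
# Assumption (from the update above): gdb 6 and later call it "kvm",
# older versions call it "kcore".
# kernel_core_target is a hypothetical helper; pass it a version like "6.5".
kernel_core_target() {
    major=${1%%.*}
    # Reject anything whose major part is not a plain decimal number.
    case $major in
        ''|*[!0-9]*) return 1 ;;
    esac
    if [ "$major" -ge 6 ]; then
        echo kvm
    else
        echo kcore
    fi
}
```

For example, kernel_core_target 6.5 prints "kvm", so at the (gdb) prompt you would type "target kvm netbsd.26.core".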



Disclaimer: All opinion expressed here is purely my own. No responsibility is taken for anything.

Copyright (c) Hubert Feyrer