[20061115]
Post mortem debugging, or: what happened before it crashed? (Updated)
So your machine panicked, and because you were running X you have no
clue what went on? Here's a nice way to find out, assuming you have
a kernel crash dump. To make sure you get one, set
kern.dump_on_panic=1 in /etc/sysctl.conf.
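A minimal sketch of making that setting stick and also applying it right away, without waiting for a reboot. It assumes you are root and that your sysctl(8) takes the usual -w flag for writing a value:

```shell
# Persist the setting across reboots (appends one line to /etc/sysctl.conf)
echo 'kern.dump_on_panic=1' >> /etc/sysctl.conf

# Apply it to the running kernel immediately
sysctl -w kern.dump_on_panic=1

# Read it back to verify
sysctl kern.dump_on_panic
```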
Now, what to do with those crashdumps?
% ls -l /var/crash/
total 3183838
-rw-r--r-- 1 root wheel 3 Nov 2 02:09 bounds
-rw-r--r-- 1 root wheel 5 Jun 30 2004 minfree
...
-rw------- 1 root wheel 181265401 Nov 2 02:11 netbsd.26.core.gz
-rw------- 1 root wheel 2162696 Nov 2 02:11 netbsd.26.gz
In /var/crash, "bounds" contains an increasing counter for the
next crashdump number (it would be "27" in the above example),
and "minfree" contains the minimum amount of free space
in kilobytes that should be kept free on the filesystem. Both files
are read by savecore(8) when /etc/rc.conf has "savecore=yes", which
is the default.
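Since "bounds" holds the number the next dump will get, the most recent dump is one less. A small sketch of that arithmetic follows; the function name last_dump_number is made up for illustration, and it assumes "bounds" contains a plain decimal number as in the listing above:

```shell
#!/bin/sh
# last_dump_number [DIR]: print the number of the most recent dump in DIR.
# Assumes DIR/bounds holds the number that the *next* dump will get.
last_dump_number() {
    dir=${1:-/var/crash}
    next=$(cat "$dir/bounds") || return 1
    # A counter of 0 means no dump has been saved yet
    [ "$next" -gt 0 ] || return 1
    echo $((next - 1))
}
```

With the directory shown above, last_dump_number /var/crash would print 26, matching netbsd.26.core.gz.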
The actual crashdump consists of two gzipped files: the
memory dump "netbsd.XX.core.gz" and a copy of the
running kernel "netbsd.XX.gz". After uncompressing, the
files can be used to look at the system at the point
of its panic:
# gunzip netbsd.26*.gz
#
Note that the crashdump may contain sensitive data and is therefore
only readable by root!
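If you'd rather keep the compressed originals untouched, here is a sketch that uncompresses a copy into a private working directory instead. The function name unpack_dump and the default destination /root/crash.N are assumptions for illustration, not anything savecore(8) sets up:

```shell
#!/bin/sh
# unpack_dump N [SRC] [DEST]: uncompress dump number N from SRC into DEST,
# leaving the .gz files in SRC as they are.
unpack_dump() {
    n=$1
    src=${2:-/var/crash}
    dest=${3:-/root/crash.$n}    # hypothetical working directory
    (
        umask 077                # new files and directories: owner only
        mkdir -p "$dest" || exit 1
        for f in "netbsd.$n" "netbsd.$n.core"; do
            gunzip -c "$src/$f.gz" > "$dest/$f" || exit 1
        done
    )
}
```

The umask keeps the uncompressed copies as private as the originals, and the subshell keeps that umask from leaking into the calling shell.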
The crashdump can be read by programs that use libkvm to
read through the crashdump's kernel memory, e.g.
gdb(1), dmesg(8), ps(1), fstat(8), ipcs(1), netstat(8), nfsstat(8),
pmap(1), w(1), pstat(8), vmstat(8) etc.,
using the -M and -N switches.
Some examples:
- To show the system's message buffer at the time
of the crash:
% dmesg -M netbsd.26.core -N netbsd.26
...
unmounting /home (/dev/wd1e)...
unmounting /tmp (mfs:371)...warning: mfs read during shutdown
dev = 0xff00, block = 10496, fs = /tmp
panic: blkfree: freeing free block
Begin traceback...
uvm_fault(0xcbfd07f0, 0x2000, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c0305083 cs 8 eflags 10246 cr2 2900 ilevel 0
panic: trap
Faulted in mid-traceback; aborting...
dumping to dev 0,1 offset 2024327
dump 511 510 509 508 507 506 505 504 503 502 501 500 499 498 497 496
495 494 493 ...
Apparently the system tried to free a block that was
already freed when unmounting /tmp.
- Display virtual memory parameters:
% vmstat -M netbsd.26.core -N netbsd.26 -s
4096 bytes per page
8 page colors
127888 pages managed
...
- Attach the GNU debugger gdb(1) to the system crash dump,
to poke around deeply:
% gdb netbsd.26
...
(gdb) target kcore netbsd.26.core
panic: blkfree: freeing free block
#0 0x0ac04000 in ?? ()
(gdb) bt
#0 0x0ac04000 in ?? ()
#1 0xc03084b5 in cpu_reboot ()
#2 0xc02a57aa in panic ()
#3 0xc0313127 in trap ()
#4 0xc0102dfd in calltrap ()
#5 0xc0182544 in db_get_value ()
#6 0xc03058f1 in db_stack_trace_print ()
#7 0xc02a577c in panic ()
#8 0xc0205db7 in ffs_blkfree ()
#9 0xc020b8d5 in ffs_indirtrunc ()
...
- Unfortunately there are a number of programs that I
didn't get to work with my crashdump, e.g. ps(1) didn't
work. That may be because this dump was taken during/after
system shutdown.
Still that should give some start for poking around...
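To collect the output of several of these tools in one go, something like the following sketch may help. The function name dump_report and the choice of tools and options are only an illustrative selection; since some tools may fail on a given dump (as ps(1) did here), failures are reported and the loop carries on:

```shell
#!/bin/sh
# dump_report KERNEL CORE: run a few libkvm-aware tools against a crash dump
# and keep going when one of them fails.
dump_report() {
    kernel=$1
    core=$2
    # Illustrative selection of tools; extend or trim as needed
    for cmd in "dmesg" "vmstat -s" "ps -ax" "netstat -an"; do
        echo "=== $cmd ==="
        # $cmd is left unquoted on purpose so its options split into words
        $cmd -M "$core" -N "$kernel" 2>&1 || echo "($cmd failed with status $?)"
    done
}
```

Run as root in the directory holding the uncompressed files, e.g. dump_report netbsd.26 netbsd.26.core > report.txt.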
Update: Apparently 'target kcore' was renamed
to 'target kvm' in gdb6, see
this posting.
[Tags: debugging, dmesg, gdb]