Tru64 UNIX
Kernel Debugging

Part Number: AA-RH99A-TE
July 1999
Product Version: Tru64 UNIX Version 5.0 or higher

This manual explains how to use tools to debug a kernel and analyze a crash dump of the Tru64 UNIX (formerly DIGITAL UNIX) operating system. It also explains how to write extensions to the kernel debugging tools.
© 1999 Compaq Computer Corporation

COMPAQ and the Compaq logo Registered in U.S. Patent and Trademark Office. Alpha and Tru64 are trademarks of Compaq Information Technologies Group, L.P. in the United States and other countries. Microsoft and Windows are trademarks of Microsoft Corporation in the United States and other countries. UNIX and The Open Group are trademarks of The Open Group. All other product names mentioned herein may be trademarks of their respective companies. Confidential computer software.
Contents

About This Manual

1 Introduction to Kernel Debugging
1.1 Linking a Kernel Image for Debugging
1.2 Debugging Kernel Programs
1.3 Debugging the Running Kernel
1.4 Analyzing a Crash Dump File

2 Kernel Debugging Utilities
2.2.3.9 Disassembling Instructions
2.2.3.10 Displaying Remote Exported Entries
2.2.3.11 Displaying the File Table
2.2.3.12 Displaying the udb and tcb Tables
2.2.3.13 Performing Commands on Lists
2.2.3.14 Displaying the lockstats Structures
3.2.7 Checking Arguments Passed to an Extension
3.2.8 Checking the Fields in a Structure
3.2.9 Setting the kdbx Context
3.2.10 Passing Commands to the dbx Debugger
3.2.11 Dereferencing a Pointer
3.2.12 Displaying the Error Messages Stored in Fields
A Output from the crashdc Command

Index

Examples
3–1 Template Extension Using Lists
3–2 Extension That Uses Linked Lists: callout.c
3–3 Template Extensions Using Arrays
3–4 Extension That Uses Arrays: file.c
3–5 Extension That Uses Global Symbols: sum.c
About This Manual

This manual provides information on the tools used to debug a kernel and analyze a crash dump file of the Tru64™ UNIX (formerly DIGITAL UNIX) operating system. It also explains how to write extensions to the kernel debugging tools. You can use extensions to display customized information from kernel data structures or a crash dump file.

Audience

This manual is intended for system programmers who write programs that use kernel data structures and are built into the kernel.
Organization

This manual consists of four chapters and one appendix:

Chapter 1   Introduces the concepts of kernel debugging and crash dump analysis.
Chapter 2   Describes the tools used to debug kernels and analyze crash dump files.
Chapter 3   Describes how to write a kdbx debugger extension. This chapter assumes you have purchased and installed a Tru64 UNIX Source Kit and so have access to source files.
Chapter 4   Provides background information useful for analyzing crash dump files, along with examples of such analysis.
G   Manuals for general users
S   Manuals for system and network administrators
P   Manuals for programmers
R   Manuals for reference page users

Some manuals in the documentation help meet the needs of several audiences. For example, the information in some system manuals is also used by programmers. Keep this in mind when searching for information on specific topics. The Documentation Overview provides information on all of the manuals in the Tru64 UNIX documentation set.
%       A percent sign represents the C shell system prompt.
$       A dollar sign represents the system prompt for the Bourne, Korn, and POSIX shells.
#       A number sign represents the superuser prompt.
% cat   Boldface type in interactive examples indicates typed user input.
file    Italic (slanted) type indicates variable values, placeholders, and function argument names.
[|]
{|}     In syntax definitions, brackets indicate items that are optional and braces indicate items that are required.
1 Introduction to Kernel Debugging Kernel debugging is a task normally performed by systems engineers writing kernel programs. A kernel program is one that is built as part of the kernel and that references kernel data structures.
text file that describes the hardware and software that will be present on the running system. Using this information, the bootstrap linker links the modules that are needed to support this hardware and software. The linker builds the kernel directly into memory. You cannot directly debug a bootstrap-linked kernel because you must supply the name of an image to the kernel debugging tools. Without the image, the tools have no access to symbol names, variable names, and so on.
1.2 Debugging Kernel Programs Kernel programs can be difficult to debug because you normally cannot control kernel execution. To make debugging kernel programs more convenient, the system provides the kdebug debugger. The kdebug debugger is code that resides inside the kernel and allows you to use the dbx debugger to control execution of a running kernel in the same manner as you control execution of a user space program. To debug a kernel program in this manner, follow these steps: 1.
# dbx -k /vmunix /dev/mem

This command invokes dbx with the kernel debugging flag, -k, which maps kernel addresses to make kernel debugging easier. The /vmunix and /dev/mem parameters cause the debugger to operate on the running kernel. Once in the dbx environment, you use dbx commands to display process IDs and trace execution of processes. You can perform the same tasks using the kdbx debugger.
1.4 Analyzing a Crash Dump File If your system crashes, you can often find the cause of the crash by using dbx or kdbx to debug or analyze a crash dump file. The operating system can crash because one of the following occurs: • Hardware exception • Software panic • Hung system When a system hangs, it is often necessary to force the system to create dumps that you can analyze to determine why the system hung.
At system reboot time, the copy of core memory saved in the swap partitions is copied into a file, called a crash dump file. You can analyze the crash dump file to determine what caused the crash. By default, the crash dump is a partial (rather than full) dump and is in compressed form. For complete information about managing crash dumps and crash dump files, including how to change default settings, see the System Administration manual. For examples of analyzing crash dump files, see Chapter 4.
2 Kernel Debugging Utilities The Tru64 UNIX system provides several tools you can use to debug the kernel and kernel programs. The Ladebug debugger (available as an option) is also capable of debugging the kernel. This chapter describes three debuggers and a utility for analyzing crash dumps: • The dbx debugger, which is described for kernel debugging in Section 2.1. (For general dbx user information, see the Programmer’s Guide.
______________________ Note _______________________ Starting with Tru64 UNIX Version 5.0, all the previously mentioned tools can be used with compressed (vmzcore.n) and uncompressed (vmcore.n) crash dump files. Older versions of these tools can read only vmcore.n files. If you are using an older version of a tool, use the expand_dump utility to produce a vmcore.n file from a vmzcore.n file.
• The system core memory image These files may be files from a running system, such as /vmunix and /dev/mem, or dump files, such as vmunix.n and vmzcore.n (compressed) or vmcore.n (uncompressed). By default, crash dump files are created in the /var/adm/crash directory (see the System Administration manual). ______________________ Note _______________________ You might need to be the superuser (root login) to examine the running system or crash dump files produced by savecore.
want to add a symbol table to your current debugging session rather than end the session and start a new one. To add a symbol table to your current debugging session, follow these steps: 1. Go to a window other than the one in which the debugger is running, or put the debugger in the background, and rebuild the modules for which you need a symbol table. 2. Once the modules build correctly, use the ostrip command to strip a symbol table out of the resulting executable file.
• During the dbx session, if you want to load a module dynamically, first set the $module_path dbx variable and then use the addobj command to load the module, as in the following example:

(dbx) set $module_path /project4/mod_dir
(dbx) addobj kmodC

To verify that modules are being loaded from the correct location, turn on verbose module-loading using any one of the following methods:

• Specify the -module_verbose dbx command option.
The following examples show how to use dbx to examine kernel images:

(dbx) _realstart/X
fffffc00002a4008: c020000243c4153e
(dbx) _realstart/i
[_realstart:153, 0xfffffc00002a4008] subq sp, 0x20, sp
(dbx) _realstart/10i
[_realstart:153, 0xfffffc00002a4008] subq sp, 0x20, sp
[_realstart:154, 0xfffffc00002a400c] br r1, 0xfffffc00002a4018
[_realstart:156, 0xfffffc00002a4010] call_pal 0x4994e0
[_realstart:157, 0xfffffc00002a4014] bgt r31, 0xfffffc00002a3018
[_realstart:171, 0xfffffc00002a4018] ldq gp, 0(r1)
[_rea
struct timeval { int tv_sec; int tv_usec; } it_value; }; 2.1.7 Debugging Multiple Threads You can use the dbx debugger to examine the state of the kernel’s threads with the querying and scoping commands described in this section. You use these commands to show process and thread lists and to change the debugger’s context (by setting its current process and thread variables) so that a stack trace for a particular thread can be displayed.
/usr/include/machine/reg.h header file to determine where registers are stored in the exception frame. The savedefp variable contains the location of the exception frame. (Note that no exception frames are created when you force a system to dump, as described in the System Administration manual.
To specify a full crash dump permanently so that this setting remains in effect after a reboot, use the patch command in dbx, as shown in the following example: (dbx) patch partial_dump=0 With either command, a partial_dump value of 1 specifies a partial dump. The following example shows how to examine the state of a user program named test1 that purposely precipitated a kernel crash with a syscall after several recursive calls: # dbx -k vmunix.1 vmzcore.1 /usr/proj7/test1 dbx version 5.
2 The up 8 command moves the debugging context 8 activation levels up the stack to one of the recursive calls within the user program code. 3 The print r command displays the current value of the variable r, which is a structure of array elements. Full symbolization is available for the user program, assuming it was compiled with the -g option. 4 The print r.a[511] command displays the current value of array element 511 of structure r. 2.1.
For debugging purposes, set the lockmode attribute to 4, as follows:

1. Create a stanza-formatted file named, for example, generic.stanza that appears as follows:

   generic:
           lockmode=4

   The contents of this file indicate that you are modifying the lockmode attribute of the generic subsystem.

2. Add the new definition of lockmode to the /etc/sysconfigtab database:

   # sysconfigdb -a -f generic.stanza generic

3. Reboot your system.
[0] 0xfffffc000065c580 [1] 0xfffffc000065c780 } • Lock statistics are recorded to allow you to determine what kind of contention you have on a particular lock. Use the kdbx lockstats extension as shown in the following example to display lock statistics: # kdbx /vmunix (kdbx) lockstats Lockstats li_name cpu count tries misses %misses waitsum waitmax waitmin trmax =========== ===================== === ====== ========== ======= ====== ============ ======= ======= ====== k0x00657d40 inode.
/dev/mem, respectively. By default, crash dump files are created in the /var/adm/crash directory (see the System Administration manual).

Use the following kdbx command to examine a running system:

# kdbx -k /vmunix /dev/mem

Use a kdbx command similar to the following to examine a compressed or uncompressed crash dump file, respectively:

# kdbx -k vmunix.1 vmzcore.1
# kdbx -k vmunix.1 vmcore.1

The version number (.
context proc | user
    Sets context to the user’s aliases or the extension’s aliases. This command is used only by the extensions.

coredata start_address end_address
    Dumps, in hexadecimal, the contents of the core file starting at start_address and ending before end_address.

dbx command-string
    Passes the command-string to dbx. Specifying dbx is optional; if kdbx does not recognize a command, it automatically passes that command to dbx. See the dbx(1) reference page for a complete description of dbx commands.
print string
    Displays string on the terminal. If this command is used by an extension, the terminal receives no output.

quit
    Exits the kdbx debugger.

source [-x] [file(s)]
    Reads and interprets files as kdbx commands in the context of the current aliases. If you specify the -x flag, the debugger displays commands as they are executed.

unalias name
    Removes the alias, if any, from name.

The kdbx debugger contains many predefined aliases, which are defined in the kdbx startup file /var/kdbx/system.kdbxrc.
Notation   Address Type                  Replaces   Example
k          kseg                          fffffc00   k0x00487c48
u          user space                    00000000   u0x86406200
?          Unrecognized or random type              ?0x3782cc33

The sections that follow describe the kdbx extensions that are supplied with your system.

2.2.3.1 Displaying the Address Resolution Protocol Table

The arp extension displays the contents of the address resolution protocol (arp) table.
to the start_address is usually of the form &arrayname[0].

flags
    If you specify the -head flag, the next argument appears as the table header. If you specify the -size flag, the next argument is used as the array element size; otherwise, the size is calculated from the element type. If you specify the -cond flag, the next argument is used as a filter. It is evaluated by dbx for each array element, and if it evaluates to TRUE, the action is taken on the element.
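The way the -size and -cond flags shape the traversal can be illustrated outside of kdbx. The following is a minimal sketch in plain C, not the array_action implementation: the names sketch_array_action, cond_fn, and action_fn are hypothetical, and the filter is an ordinary C callback, whereas kdbx has dbx evaluate the condition.

```c
#include <stddef.h>

/* Hypothetical callback types for this sketch only. In kdbx, the -cond
 * filter is an expression evaluated by dbx, not a C function. */
typedef int (*cond_fn)(const void *element);
typedef void (*action_fn)(const void *element, void *ctx);

/* Visit `count` elements of `elem_size` bytes each, starting at `start`.
 * The action runs only on elements for which `cond` returns nonzero;
 * a NULL `cond` means "no filter". Returns the number of elements acted on. */
size_t sketch_array_action(const void *start, size_t count, size_t elem_size,
                           cond_fn cond, action_fn action, void *ctx)
{
    const unsigned char *p = start;
    size_t taken = 0;

    for (size_t i = 0; i < count; i++, p += elem_size) {
        if (cond == NULL || cond(p)) {
            action(p, ctx);
            taken++;
        }
    }
    return taken;
}

/* Example filter: select elements whose long value is nonzero. */
static int is_nonzero(const void *element)
{
    return *(const long *)element != 0;
}

/* Example action: accumulate the element into a running sum. */
static void add_to_sum(const void *element, void *ctx)
{
    *(long *)ctx += *(const long *)element;
}
```

Passing the element size explicitly mirrors the role of the -size flag; when the flag is omitted, kdbx derives the size from the element type instead.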
0xfffffc0000473838 = "syscalltrace"
0xfffffc0000473848 = "boothowto"
0xfffffc0000473858 = "do_virtual_tables"
0xfffffc0000473870 = "netblk"
0xfffffc0000473878 = "zalloc_physical"
0xfffffc0000473888 = "trap_debug"
(kdbx)

2.2.3.3 Displaying the Buffer Table

The buf extension displays the buffer table. This extension has the following format:

buf [ addresses -free | -all]

If you omit arguments, the debugger displays the buffers on the hash list.
wakeup            k0x0187a220     374923
thread_timeout    k0x010ee950     376286
thread_timeout    k0x0132f220   40724481
realitexpire      k0x01069950   80436086
thread_timeout    k0x01bba950   82582849

The abscallout extension displays the absolute callout table. This table contains callout entries with the absolute time in fractions of seconds.
config

For example:

(kdbx) config
Bus #0 (0xfffffc000048c6a0): Name - "tc" Connected to - "nexus"
        Config 1 - tcconfl1 Config 2 - tcconfl2
        Controller "scc" (0xfffffc000048c970)
(kdbx)

2.2.3.7 Converting the Base of Numbers

The convert extension converts numbers from one base to another. This extension has the following format:

convert [-in [ 8 | 10 | 16] ] [-out [ 2 | 8 | 10 | 16] ] [ args]

The -in and -out flags specify the input and output bases, respectively.
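The base conversion itself is simple to sketch in standalone C. The following is an illustration only, not the source of the convert extension: the function name sketch_convert and its return convention are hypothetical, and it accepts only the input and output bases listed in the format above.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert `text`, written in base `in_base` (8, 10, or 16), to a string in
 * base `out_base` (2, 8, 10, or 16). Returns 0 on success, or -1 if a base
 * is not one of the listed values, the text does not parse, or the result
 * does not fit in `out`. Like strtoul, it tolerates a leading "0x" when
 * in_base is 16. */
int sketch_convert(const char *text, int in_base, int out_base,
                   char *out, size_t out_len)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[sizeof(unsigned long) * 8 + 1];  /* enough for base 2 */
    char *end;
    unsigned long value;
    size_t n = 0;

    if ((in_base != 8 && in_base != 10 && in_base != 16) ||
        (out_base != 2 && out_base != 8 && out_base != 10 && out_base != 16))
        return -1;

    errno = 0;
    value = strtoul(text, &end, in_base);
    if (errno != 0 || end == text || *end != '\0')
        return -1;

    /* Produce digits least-significant first, then reverse into `out`. */
    do {
        tmp[n++] = digits[value % (unsigned long)out_base];
        value /= (unsigned long)out_base;
    } while (value != 0);

    if (n + 1 > out_len)
        return -1;
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return 0;
}
```

For instance, calling sketch_convert("ff", 16, 10, buf, sizeof buf) stores the string 255 in buf. How the real extension formats its output is not shown in this section, so the sketch makes no attempt to match it.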
2.2.3.9 Disassembling Instructions

The dis extension disassembles some number of instructions. This extension has the following format:

dis start-address [ num-instructions]

The start-address argument specifies the starting address of the instructions. The num-instructions argument specifies the number of instructions to be disassembled. If you omit the num-instructions argument, 1 is assumed. For example:

(kdbx) dis 0xffffffff864c2a08 5
[., 0xffffffff864c2a08] call_pal 0x20001
[.
=========== v0x90406000 v0x90406058 v0x904060b0 v0x90406108 v0x90406160 v0x904061b8 v0x90406210 v0x90406268 v0x904062c0 v0x90406318 v0x90406370 . . .
list_action "type" next-field end-addr start-addr [ flags] command

The arguments to the list_action extension are as follows:

"type"
    The type of an element in the specified list.

next-field
    The name of the field that points to the next element.

end-addr
    The value of the next field that terminates the list. If the list is NULL-terminated, the value of the end-addr argument is zero (0). If the list is circular, the value of the end-addr argument is equal to the start-addr argument.
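The role of end-addr is easiest to see in a small traversal loop. The following standalone C sketch is an illustration under stated assumptions, not the list_action source: struct node, sketch_list_action, and the callback type are hypothetical names, and the list lives in the program's own memory rather than in a kernel image.

```c
#include <stddef.h>

/* Hypothetical list element; `next` plays the role of next-field. */
struct node {
    int pid;
    struct node *next;
};

typedef void (*node_action)(const struct node *n, void *ctx);

/* Walk the list from `start`, running `action` on each element, and stop
 * when the next pointer equals `end`. Pass NULL as `end` for a
 * NULL-terminated list, or `start` itself for a circular list.
 * Returns the number of elements visited. */
size_t sketch_list_action(const struct node *start, const struct node *end,
                          node_action action, void *ctx)
{
    const struct node *n = start;
    size_t visited = 0;

    if (n == NULL)
        return 0;
    do {
        action(n, ctx);
        visited++;
        n = n->next;
    } while (n != end && n != NULL);  /* the NULL check guards a bad end value */
    return visited;
}

/* Example action: add each element's pid to a running total. */
static void sum_pids(const struct node *n, void *ctx)
{
    *(int *)ctx += n->pid;
}
```

The do-while form matters for circular lists: the first element is processed before the termination test, so passing the start address as the end value still visits every element exactly once.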
For example:

(kdbx) list_action "struct proc *" p_nxt 0 allproc p \
%c.task.u_address.uu_comm %c.p_pid
"list_action" 1382
"dbx" 1380
"kdbx" 1379
"dbx" 1301
"kdbx" 1300
"sh" 1296
"ksh" 1294
"csh" 1288
"rlogind" 1287
. . .

2.2.3.14 Displaying the lockstats Structures

The lockstats extension displays the lock statistics contained in the lockstats structures. Statistics are kept for each lock class on each CPU in the system.
-class name
    Displays the lockstats structures for the specified lock class. (Use the lockinfo command to display information about the names of lock classes.)

-cpu number
    Displays the lockstats structures for the specified CPU.

-read
    Displays the reads, sleeps attributes, and waitsums or misses.

-sum
    Displays summary data for all CPUs and all lock types.

-total
    Displays summary data for all CPUs.

-update n
    Updates the display every n seconds.
This extension has the following format: lockinfo [ -class name ] The −class flag allows you to display the lockinfo structure for a particular class of locks. If you omit the flag, lockinfo displays the lockinfo structures for all classes of locks.
2.2.3.
2.2.3.19 Converting the Contents of Memory to Symbols The paddr extension converts a range of memory to symbolic references and has the following format: paddr address number-of-longwords The arguments to the paddr extension are as follows: address The starting address. number-of-longwords The number of longwords to display. For example: (kdbx) paddr 0xffffffff90be36d8 20 [., 0xffffffff90be36d8]: [h_kmem_free_memory_:824, 0xfffffc000037f47c] 0x0000000000000000 [., 0xffffffff90be36e8]: [.
substitution, which the dbx debugger’s printf command does not. This extension has the following format: printf format-string [ args] The arguments to the printf extension are as follows: format-string A character string combining literal characters with conversion specifications. args The arguments for which you want kdbx to display values. For example: (kdbx) printf "allproc = 0x%lx" allproc allproc = 0xffffffff902356b0 2.2.3.
v0x81a28210 5301 in pagv ctty exec v0x819aad80 195 v0x8197c210 6346 v0x819c4210 204 : 5276 5301 1138 0 00080002 00000000 NULL 1 1 1 195 6346 0 0 0 0 0 00080628 00000000 0 00004006 00000000 0 00086efe 00000000 NULL in pagv NULL in pagv exec NULL in pagv 2.2.3.23 Converting an Address to a Procedure name The procaddr extension converts the specified address to a procedure name. This extension has the following format: procaddr [ address ] For example: (kdbx) procaddr callout.
(kdbx) sum
Hostname : system.dec.com
cpu: DEC3000 - M500 avail: 1
Boot-time: Tue Nov 3 15:01:37 1992
Time: Fri Nov 6 09:59:00 1998
Kernel : OSF1 release 1.2 version 1.2 (alpha)
(kdbx)

2.2.3.
v0x81a0ab70 v0x81a26b70 v0x819f2b70 v0x81a14b70 v0x81a3cb70 v0x81a28000 v0x819aab70 v0x8197c000 v0x819c4000 . . .
-u
    Displays the stack trace of all user threads

For example:

(kdbx) trace
*** stack trace of thread 0xffffffff819af590 pid=0 ***
> 0 thread_run(new_thread = 0xffffffff819af928) ["../../../../src/kernel/kern/sched_prim.c":1637, 0xfffffc00002f9368]
  1 idle_thread() ["../../../../src/kernel/kern/sched_prim.c":2717, 0xfffffc00002fa32c]
*** stack trace of thread 0xffffffff819af1f8 pid=0 ***
> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":1455, 0xfffffc00002f9084]
  1 softclock_main() ["../../../..
u [ proc-addr] If you omit arguments, the extension displays the u structure of the currently running process.
-uthread
    Displays all ucreds referenced by the uthread structures

-file
    Displays all ucreds referenced by the file structures

-buf
    Displays all ucreds referenced by the buf structures

-ref address
    Displays all references to a given ucred

-check address
    Checks the reference count of a particular ucred

-checkall
    Checks the reference count of all ucreds, with mismatches marked by an asterisk (*)

For example:

(kdbx) ucred
ADDR OF UCRED =================== 0xffffffff863d4960 0xffffffff8651fb80 0xfff
=================== 0xffffffff863d5a40 ====== 4 ======= 4 2.2.3.32 Removing Aliases The unaliasall extension removes all aliases, including the predefined aliases. This extension has the following format: unaliasall For example: (kdbx) unaliasall 2.2.3.
For example:

(kdbx) vnode
ADDR_VNODE  V_TYPE
=========== ======
v0x9021e000 VBLK
v0x9021e1e8 VBLK
v0x9021e3d0 VBLK
v0x9021e5b8 VDIR
v0x9021e7a0 VDIR
v0x9021ed58 VBLK
v0x9021ef40 VBLK
v0x9021f128 VREG
v0x9021f310 VDIR
v0x9021f8c8 VREG
v0x9021fe80 VREG
v0x902209f0 VDIR
v0x90220fa8 VBLK
v0x90221190 VBLK
v0x90221560 VREG
v0x90221748 VBLK
. . .
and perform other debugging tasks, just as you would when debugging user space programs. The ability to debug a running kernel is provided through remote debugging. The kernel code you are debugging runs on a test system. The dbx debugger runs on a remote build system. The debugger communicates with the kernel code you are debugging over a serial communication line or through a gateway system. You use a gateway system when you cannot physically connect the test and build systems.
To use the kdebug debugger, you must set up your build, gateway, and test systems as described in Section 2.3.1. Once you complete the setup, you invoke dbx as described in Section 2.3.2 and enter commands as you normally would. Refer to Section 2.3.3 if you have problems with the setup of your remote kdebug debugging session. 2.3.1 Getting Ready to Use the kdebug Debugger To use the kdebug debugger, you must do the following: 1. Attach the test system and the build (or gateway) system.
The $kdebug_host variable is the name of the gateway system. By default, $kdebug_host is set to localhost, assuming no gateway system is being used. The $kdebug_line variable selects the serial line definition to use in the /etc/remote file of the build system (or the gateway system, if one is being used). By default, $kdebug_line is set to kdebug.
Setting this system attribute makes debugging on an SMP system easier. For information about the advantages provided see Section 2.1.11. 8. Set the OPTIONS KDEBUG configuration file option in your test kernel. To set this option, run the doconfig command without flags, as shown: # doconfig Choose KERNEL BREAKPOINT DEBUGGING from the kernel options menu when it is displayed by doconfig. Once doconfig finishes building a new kernel, copy that kernel to the /vmunix file and reboot your system.
dbx debugging commands. See Section 2.1, the dbx(1) reference page, or the Programmer’s Guide for information on dbx debugging commands.

If you are unable to bring your test kernel up to a fully operational mode, you can reboot the halted system running the generic kernel, as follows:

>>> set boot_osflags "S"
>>> set boot_file "/genvmunix"
>>> boot

Once the system is running, you can run the bcheckrc script manually to check and mount your local file systems.
– Check the /etc/inittab file to see if any processes are using that line. If so, disable these lines until you finish with the kdebug session. See the inittab(4) reference page for information on disabling lines. – Examine your /etc/remote file to determine which serial line is associated with the kdebug label. Then, use the ps command to see if any processes are using the line.
– Remove any settings of the $kdebug_line variable as follows: set $kdebug_line= • Start dbx on the build system. You should see informational messages on the terminal line /dev/ttyp2 that kdebug is starting. • If you are using a gateway system, ensure that the inetd daemon is running on the gateway system. Also, check the TCP/IP connection between the build and gateway systems using one of the following commands: rlogin, rsh, or rcp. 2.3.
uses existing system tools and utilities to extract information from crash dumps. The information garnered from crash dump files or from the running kernel includes the hardware and software configuration, current processes, the panic string (if any), and swap information. The crashdc utility is invoked each time the system is booted. If it finds a current crash dump, crashdc creates a data collection file with the same numerical file name extension as the crash dump (see Section 2.1.
3 Writing Extensions to the kdbx Debugger To assist in debugging kernel code, you can write an extension to the kdbx debugger. Extensions interact with kdbx and enable you to examine kernel data relevant to debugging the source program. This chapter provides the following: • A list of considerations before you begin writing extensions (Section 3.1) • A description of the kdbx library routines that you can use to write extensions (Section 3.2) • Examples of kdbx extensions (Section 3.
discussed in Section 3.3 can help you understand what is involved in writing an extension and provide good examples of using the kdbx library functions. 3.2 Standard kdbx Library Functions The kdbx debugger provides a number of library functions that are used by the resident extensions. You can use these functions, which are declared in the ./usr/include/kdbx.h header file, to develop customized extensions for your application. To use the functions, you must include the ./usr/include/kdbx.
The values in comm and local provide the error code interpreted by print_status.

• The FieldRec data type, which is used to declare a field of interest in a data structure. The following is the type definition for the FieldRec data type:

typedef struct {
    char *name;
    int type;
    caddr_t data;
    char *error;
} FieldRec;

The char *name declaration is the name of the field in question. The int type declaration is the type of the field, for example, NUMBER, STRUCTURE, POINTER.
3.2.3 Getting a Representation of an Array Element

The array_element function returns a representation of one element of an array. The function returns non-NULL if it succeeds or NULL if an error occurs. When the function returns NULL, the error argument is set to point to the error message.
data type. It returns the pointer value if the data type of the array element is a pointer data type. This function returns TRUE if it is successful, FALSE otherwise. When the return value is FALSE, an error message is returned in an argument to the function.
3.2.5 Returning the Size of an Array The array_size function returns the size of the specified array.
For example:

if(!cast(addr, "struct file", &fil, &error)){
    fprintf(stderr, "Couldn't cast address to a file:\n");
    fprintf(stderr, "%s\n", error);
    quit(1);
}

3.2.7 Checking Arguments Passed to an Extension

The check_args function checks the arguments passed to an extension or displays a help message. The function displays a help message when the user specifies the -help flag on the command line.
char** hints);

Argument   Input/Output   Description
symbol     Input          Names the structure to be checked
fields     Input          Describes the fields to be checked
nfields    Input          Specifies the size of the fields argument
hints      Input          Unused and should always be set to NULL

You should check the structure type using the check_fields function before using the read_field_vals function to read field values. For example:

FieldRec fields[] = {
    { ".sc_sysid", NUMBER, NULL, NULL },
    { ".
3.2.10 Passing Commands to the dbx Debugger The dbx function passes a command to the dbx debugger. The function has an argument, expect_output, that controls when it returns. If you set the expect_output argument to TRUE, the function returns after the command is sent, and expects the extension to read the output from dbx. If you set the expect_output argument to FALSE, the function waits for the command to complete execution, reads the acknowledgement from kdbx, and then returns.
structure = deref_pointer(struct_pointer); 3.2.12 Displaying the Error Messages Stored in Fields The field_errors function displays the error messages stored in fields by the check_fields function.
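The kind of loop such a routine performs can be sketched in standalone C. This is an illustration only, not the kdbx library source: SketchFieldRec mirrors the FieldRec layout given in this chapter (with char * standing in for caddr_t so that the sketch needs no system headers), and sketch_field_errors, its FILE * parameter, and its return value are hypothetical.

```c
#include <stdio.h>

/* Local stand-in for the FieldRec type; `data` is caddr_t in kdbx.h. */
typedef struct {
    char *name;
    int type;
    char *data;
    char *error;
} SketchFieldRec;

/* Print "name: message" for every field whose error pointer is set, as a
 * field-checking pass might leave it. Returns the number of fields that
 * carried an error message. */
int sketch_field_errors(FILE *out, const SketchFieldRec *fields, int nfields)
{
    int reported = 0;

    for (int i = 0; i < nfields; i++) {
        if (fields[i].error != NULL) {
            fprintf(out, "%s: %s\n", fields[i].name, fields[i].error);
            reported++;
        }
    }
    return reported;
}
```

A field with a NULL error pointer is treated as valid and skipped, which is why an extension can call a routine like this unconditionally after a failed check and still get only the relevant messages.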
format_addr((long) struct_addr(ele), address);
format_addr((long) fields[2].data, cred);
format_addr((long) fields[3].data, data);
/* Append a label for each flag bit that is set, or an empty string if not. */
sprintf(buf, "%s %s %4d %4d %s %s %s %6d %s%s%s%s%s%s%s%s%s",
        address, get_type((int) fields[0].data), fields[1].data,
        fields[2].data, ops, cred, data, fields[6].data,
        ((long) fields[7].data) & FREAD ? " read" : "",
        ((long) fields[7].data) & FWRITE ? " write" : "",
        ((long) fields[7].data) & FAPPEND ? " append" : "",
        ((long) fields[7].data) & FNDELAY ? " ndelay" : "",
        ((long) fields[7].
Argument        Input/Output   Description
command         Input          Names the command to be executed
quote           Input          If set to TRUE causes the quote character, apostrophe, and backslash to be appropriately quoted so that they are treated normally, instead of as special characters
expect_output   Input          Indicates whether the extension expects output and determines when the function returns

For example:

do {
    :
    if(doit){
        format(command, buf, type, addr, last, i, next);
        context(True);
        krash(buf, False, True);
        while((line =
• The next argument is the address of the next node in the list; for example, the next node might be at address 0xffffffff8196d050. 3.2.16 Getting the Address of an Item in a Linked List The list_nth_cell function returns the address of one of the items in a linked list.
3.2.17 Passing an Extension to kdbx The new_proc function directs kdbx to execute a proc command with arguments specified in args. The args arguments can name an extension that is included with the operating system or an extension that you create.
For example:

resp = read_response_status();
next_number(resp, NULL, &size);
ret->size = size;

3.2.19 Getting the Next Token as a String

The next_token function returns a pointer to the next token in the specified pointer to a string. A token is a sequence of nonspace characters.
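The token rule stated above (a token is a run of nonspace characters) can be sketched in standalone C. This is not the kdbx next_token function, whose format is not reproduced here: the name sketch_next_token, its parameters, and its cursor-advancing behavior are assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Find the next token in the string that *cursor points to. Leading
 * whitespace is skipped; the token runs until the next whitespace
 * character or the end of the string. On success, returns a pointer to
 * the first character of the token, stores its length in *len, and
 * advances *cursor past the token. Returns NULL when no token remains. */
const char *sketch_next_token(const char **cursor, size_t *len)
{
    const char *p = *cursor;
    const char *start;

    while (*p != '\0' && isspace((unsigned char)*p))
        p++;
    if (*p == '\0') {
        *cursor = p;
        *len = 0;
        return NULL;
    }
    start = p;
    while (*p != '\0' && !isspace((unsigned char)*p))
        p++;
    *len = (size_t)(p - start);
    *cursor = p;
    return start;
}
```

Returning a pointer and a length, rather than copying the token, leaves the input line unmodified; a caller that needs a terminated string can copy len characters into its own buffer.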
} 3.2.20 Displaying a Message The print function displays a message on the terminal screen. Because of the input and output redirection done by kdbx, all output to stdout from a kdbx extension goes to dbx. As a result, a kdbx extension cannot use normal C output functions such as printf and fprintf(stdout,...) to display information on the screen. Although the fprintf(stderr,...
if(status.type != OK){
    print_status("read_line failed", &status);
    quit(1);
}

3.2.22 Exiting from an Extension

The quit function sends a quit command to kdbx. This function has the following format:

void quit(
    int i);

Argument   Input/Output   Description
i          Input          The status at the time of the exit from the extension

For example:

if (!read_sym_val("vm_swap_head", NUMBER, &end, &error)) {
    fprintf(stderr, "Couldn't read vm_swap_head:\n");
    fprintf(stderr, "%s\n", error);
    quit(1);
}

3.2.
return(False); } 3.2.24 Returning a Line of kdbx Output The read_line function returns the next line of the output from the last kdbx command executed. If the end of the output is reached, this function returns NULL and a status of OK. If the status is something other than OK when the function returns NULL, an error occurred.
You can use this function to look up any type of value; however, it is most useful for retrieving the value of pointers that point to other pointers.

For example:

start_addr = (long) ((long *)utask_fields[7].data + i-NOFILE_IN_U);
if(!read_memory(start_addr , sizeof(long *), (char *)&val1, &error) ||
   !read_memory((long)utask_fields[8].data , sizeof(long *), (char *)&val2, &error)){
    fprintf(stderr, "Couldn't read_memory\n");
    fprintf(stderr, "%s\n", error);
    quit(1);
}

3.2.
3.2.27 Reading Symbol Representations

The read_sym function returns a representation of the named symbol. This function has the following format:

DataStruct read_sym( char* name);

Argument   Input/Output   Description
name       Input          Names the symbol, which is normally a pointer to a structure or an array of structures inside the kernel

Often you use the result returned by the read_sym function as the input argument of the array_element, array_element_val, or read_field_vals function.
3.2.29 Reading the Value of a Symbol

The read_sym_val function returns the value of the specified symbol.
print(buf);
sprintf(buf, "\tConfig 1 - %s\tConfig 2 - %s",
        addr_to_proc((long) bus_fields[3].data),
        addr_to_proc((long) bus_fields[4].data));
print(buf);
if(!prctlr((long) bus_fields[0].data))
    quit(1);
print();
}

3.2.31 Converting a String to a Number

The to_number function converts a string to a number. The function returns TRUE if successful, or FALSE if conversion was not possible.
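A conversion that reports success or failure separately from the converted value can be sketched with the standard strtoull function. The sketch_to_number function below is a hypothetical stand-in for illustration only; its name, its parameter types, and the set of accepted notations are assumptions and do not describe the library's to_number function.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in, not the libkdbx to_number function.
 * Converts str to an unsigned value, accepting decimal, octal
 * (leading 0), and hexadecimal (leading 0x) notation. Returns 1 and
 * stores the result in *val on success. Returns 0 and leaves *val
 * untouched if str holds no digits, has trailing characters, or is
 * out of range.
 */
int sketch_to_number(const char *str, unsigned long long *val)
{
    char *end;
    unsigned long long parsed;

    assert(str != NULL && val != NULL);

    errno = 0;
    parsed = strtoull(str, &end, 0);   /* base 0: infer base from prefix */
    if (end == str || *end != '\0' || errno == ERANGE)
        return 0;

    *val = parsed;
    return 1;
}
```

An unsigned 64-bit type is used so that a kernel address such as 0xffffffff8196d050 converts without overflow. Keeping the status in the return value lets the caller test the conversion directly in an if statement.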
• Extensions that use arrays. Example 3–3 provides a C language template, and Example 3–4 is the source code for the /var/kdbx/file extension, which shows how to develop an extension using arrays.
• Extensions that use global symbols. Example 3–5 is the source code for the /var/kdbx/sum extension, which shows how to pull global symbols from the kernel. A template is not provided because the means for pulling global symbols from a kernel can vary greatly, depending upon the desired output.
Example 3–1: Template Extension Using Lists (cont.)

quit(0);
}

1 The help string is output by the check_args function if the user enters the help extension_name command at the kdbx prompt. The first line of the help string should be a one-line description of the extension. The rest should be a complete description of the arguments. Also, each line should end with the string \\\n\.
2 Every structure field to be extracted needs an entry.
Example 3–2: Extension That Uses Linked Lists: callout.c (cont.
Example 3–2: Extension That Uses Linked Lists: callout.c (cont.)

}
} /* end of for */
quit(0);
} /* end of main() */

Example 3–3: Template Extensions Using Arrays

#include #include
Example 3–3: Template Extensions Using Arrays (cont.)

}
}

1 The help string is output by the check_args function if the user enters the help extension_name command at the kdbx prompt. The first line of the help string should be a one-line description of the extension. The rest should be a complete description of the arguments. Also, each line should end with the string \\\n\.
2 Every structure field to be extracted needs an entry.
Example 3–4: Extension That Uses Arrays: file.c (cont.)

char buffer[256];

/* *** Implement addresses *** */

FieldRec fields[] = {
    { ".f_type", NUMBER, NULL, NULL },
    { ".f_count", NUMBER, NULL, NULL },
    { ".f_msgcount", NUMBER, NULL, NULL },
    { ".f_cred", NUMBER, NULL, NULL },
    { ".f_data", NUMBER, NULL, NULL },
    { ".f_ops", NUMBER, NULL, NULL },
    { ".f_u.fu_offset", NUMBER, NULL, NULL },
    { ".f_flag", NUMBER, NULL, NULL }
};

FieldRec fields_pid[] = {
    { ".pe_pid", NUMBER, NULL, NULL },
    { ".
Example 3–4: Extension That Uses Arrays: file.c (cont.)

else if((long) (fields[5].data) == vn_addr) ops = " vnops";
else if((long) (fields[5].data) == socket_addr) ops = "sockops";
else format_addr((long) fields[5].data, op_buf);
format_addr((long) struct_addr(ele), address);
format_addr((long) fields[3].data, cred);
format_addr((long) fields[4].data, data);
sprintf(buf, "%s %s %4d %4d %s %11s %11s %6d%s%s%s%s%s%s%s%s%s",
        address, get_type((int) fields[0].data), fields[1].data, fields[2].
Example 3–4: Extension That Uses Arrays: file.c (cont.)

{
    if((pid_entry_ele = array_element(pid_entry_struct, index, &error))==NULL){
        fprintf(stderr, "Couldn't get pid array element %d\n", index);
        fprintf(stderr, "%s\n", error);
        continue;
    }
    if(!read_field_vals(pid_entry_ele, fields_pid, 2)) {
        fprintf(stderr, "Couldn't get values of pid array element %d\n", index);
        field_errors(fields_pid, 2);
        continue;
    }
    addr_of_proc = (long)fields_pid[1].
Example 3–4: Extension That Uses Arrays: file.c (cont.)

    quit(1);
}
if (first_file) {
    sprintf(buf, "[Process ID: %d]", fields_pid[0].data);
    print(buf);
    first_file = False;
}
if(!prfile(fil))
    fprintf(stderr, "Continuing with next file address.\n");
}
} /* for loop */
return(True);
} /* end */

Example 3–5: Extension That Uses Global Symbols: sum.c

#include #include
Example 3–5: Extension That Uses Global Symbols: sum.c (cont.) * cpup no longer exists, emulate platform_string(), * a.k.a. get_system_type_string(). read_var("cpup.
The following example shows how to invoke the test extension from within the kdbx debugger:

# kdbx -k /vmunix
dbx version 5.0
Type ’help’ for help.

(kdbx) test
Hostname : system.dec.com
cpu: DEC3000 - M500 avail: 1
Boot-time: Fri Nov 6 16:09:10 1992
Time: Mon Nov 9 10:51:48 1992
Kernel : OSF1 release 1.2 version 1.2 (alpha)
(kdbx)

3.5 Debugging Custom Extensions

The kdbx debugger and the dbx debugger can communicate with each other through two named pipes.
2. Set up kdbx and dbx to communicate with each other. In the kdbx session, enter the procpd alias to create the files /tmp/pipein and /tmp/pipeout as follows:

   (kdbx) procpd

   The file pipein directs output from the dbx session to the kdbx session. The file pipeout directs output from the kdbx session to the dbx session.

3.
4 Crash Analysis Examples

Finding problems in crash dump files is a task that takes practice and experience to do well. Exactly how you determine what caused a crash varies depending on how the system crashed. The cause of some crashes is relatively easy to determine, while finding the cause of other crashes is difficult and time-consuming. This chapter helps you analyze crash dump files by providing the following information:

• Guidelines for examining crash dump files (Section 4.
related data structures and functions that appear earlier in the stack. An earlier function might have passed corrupt data to the function that caused a crash.

6. Determine whether you can fix the problem. If the system crashed because of a hardware problem (for example, because a memory board became corrupt), correcting the problem probably requires repairing or replacing the hardware.
c":753, 0xfffffc00003c4ae4]
1 panic(s = 0xfffffc000044b618 = "mode = 0%o, inum = %d, pref = %d fs = %s\n")\
["../../../../src/kernel/bsd/subr_prf.c":1119, 0xfffffc00002bdbb0]
2 ialloc(pip = 0xffffffff8c6acc40, ipref = 57664, mode = 0, ipp = 0xffffffff8c\
f95af8) ["../../../../src/kernel/ufs/ufs_alloc.c":501, 0xfffffc00002dab48]
3 maknode(vap = 0xffffffff8cf95c50, ndp = 0xffffffff8cf922f8, ipp = 0xffffffff\
8cf95b60) ["../../../../src/kernel/ufs/ufs_vnops.
c3200]
(kdbx) q
dbx (pid 29939) died. Exiting...

1 Use the sum command to get a summary of the system.
2 Display the panic string (panicstr).
3 Perform a stack trace of the current thread block. The stack trace shows that the direnter function, at line 986 in file ufs_lookup.c, called the panic function.

4.3 Identifying a Hardware Exception

Occasionally, your system might crash due to a hardware error. During a hardware exception, the hardware encounters a situation from which it cannot continue.
4, 0xfffffc00003e0c78] 3 _XentMM() ["/usr/sde/alpha/build/alpha.nightly/src/kernel/arch/alpha/locore.\ s":702, 0xfffffc00003d4ff4] 5 (dbx) kps PID COMM 00000 kernel idle 00001 init 00002 device server 00003 exception hdlr 00663 ypbind 00018 cfgmgr 00219 automount . . .
(dbx) px savedefp[28] 0xfffffc000032972c 8 9 (dbx) savedefp[28]/i [nfs_putpage:2344, 0xfffffc000032972c] 10 (dbx) savedefp[23]/i [ubc_invalidate:1768, 0xfffffc0000315fe0] ldl r5, 40(r1) stl r0, 84(sp) 11 (dbx) func nfs_putpage (dbx) file 12 /usr/sde/alpha/build/alpha.nightly/src/kernel/kern/sched_prim.c 13 (dbx) func ubc_invalidate ubc_invalidate: Source not available 14 (dbx) file /usr/sde/alpha/build/alpha.nightly/src/kernel/vfs/vfs_ubc.
11 Point the dbx debugger to the nfs_putpage function.
12 Display the name of the source file that contains the nfs_putpage function.
13 Point the dbx debugger to the ubc_invalidate function.
14 Display the name of the source file that contains the ubc_invalidate function.

The result from this example shows that the ubc_invalidate function, which resides in the /vfs/vfs_ubc.c file at line number 1768, called the nfs_putpage function at line number 2344 in the /kern/sched_prim.
4.4 Finding a Panic String in a Thread Other Than the Current Thread

The dbx and kdbx debuggers have the concept of the current thread. In many cases, when you invoke one of the debuggers to analyze a crash dump, the panic string is in the current thread. At times, however, the current thread contains no panic string and so is probably not the thread that caused the crash. The following example shows a method for stepping through kernel threads to identify the events that led to the crash:

# dbx -k .
Source not available thread 0x8d42dd70 stopped at [thread_block:1289 +0x18,0xfffffc00003394b8] \ Source not available 4 (dbx) tset 0x8d42f5d0 thread 0x8d42f5d0 stopped at [boot:696 ,0xfffffc00003e119c] Source not ava\ ilable 5 (dbx) t > 0 boot(paniced = 0, arghowto = 0) ["../../../../src/kernel/arch/alpha/mac\ hdep.c":694, 0xfffffc00003e1198] 1 panic(s = 0xfffffc000048a098 = " sp contents at time of fault: 0x%l01\ 6x\r\n\n") ["../../../../src/kernel/bsd/subr_prf.c":1110, 0xfffffc00002beef4] 2 trap() ["../..
% dbx -k ./vmunix.1 ./vmzcore.1 dbx version 5.0 Type ’help’ for help. stopped at [boot:1494 ,0xfffffc0000442918] 1 (dbx) p ustsname struct { sysname = "OSF1" nodename = "system.dec.com" release = "V5.0" version = "688.
11 pgrp_ref(0xffffffffa52c6000, 0x0, 0xfffffc000023ee20, 0x6b7, 0xfffffc000\
05e1080) ["../../../../src/kernel/bsd/kern_proc.c":561, 0xfffffc00003734c4]
12 exit(0xffffffffb53ef740, 0x100, 0x1, 0xffffffffa42e5e80, 0x1) ["../../..\
/../src/kernel/bsd/kern_exit.c":868, 0xfffffc000023ef30]
13 rexit(0xffffffff814d2d80, 0xffffffffb53ef758, 0xffffffffb53ef8b8, 0x1000\
00001, 0x0) ["../../../../src/kernel/bsd/kern_exit.
panic function were performed after the system was corrupt and during an attempt to save data. Normally, any events that occur after the initial call to the panic function will not help you determine why the system crashed. In this example, the problem is in the pgrp_ref function on line 561 in the kern_proc.c file. If you follow the stack trace after the pgrp_ref function, you can see that the pgrp_ref function calls the simple_lock_valid_violation function.
A Output from the crashdc Command This appendix contains a sample crash-data.n file created by the crashdc command (using a compressed crash-dump file, vmzcore.0). The output is explained in the list following the example. # # Crash Data Collection (Version 1.4) # 1 _crash_data_collection_time: Fri Jul 10 01:25:31 EDT 1998 _current_directory: / _crash_kernel: /var/adm/crash/vmunix.0 _crash_core: /var/adm/crash/vmzcore.0 _crash_arch: alpha _crash_os: Tru64 UNIX _host_version: Tru64 UNIX V5.0 (Rev.
asc1 at tcds0 slot 1 rz8 at asc1 bus 1 target 0 lun 0 (DEC RZ57 (C) DEC 5000) rz9 at asc1 bus 1 target 1 lun 0 (DEC RZ56 (C) DEC 0300) fb0 at tc0 slot 8 1280X1024 bba0 at tc0 slot 7 ln0: DEC LANCE Module Name: PMAD-BA ln0 at tc0 slot 7 ln0: DEC LANCE Ethernet Interface, hardware address: 08-00-2b-2c-f3-83 Firmware revision: 5.1 PALcode: Tru64 UNIX version 1.21 AlphaServer 4100 5/400 4MB lvm0: configured. lvm1: configured.
_proc_thread_list_end: _dump_begin: 7 > 0 boot(reason = 0, howto = 0) ["../../../../src/kernel/arch/alpha/machdep.c": 1118, 0xfffffc0000374a08] mp = 0xffffffff961962f8 nmp = 0xffffffff86333ab8 fsp = (nil) rs = 5368785696 error = -1776721160 ind = 2424676 nbusy = 4643880 1 panic(s = 0xfffffc000043cf70 = "kernel memory fault") ["../../../../src\ /kernel/bsd/subr_prf.c"\ :616, 0xfffffc000024ff60] bootopt = 0 2 trap() ["../../../../src/kernel/arch/alpha/trap.
_kernel_memory_fault_data_begin: 10
struct {
    fault_va = 0x0
    fault_pc = 0x0
    fault_ra = 0xfffffc000028951c
    fault_sp = 0xffffffff96199a48
    access = 0xffffffffffffffff
    status = 0x0
    cpunum = 0x0
    count = 0x1
    pcb = 0xffffffff96196000
    thread = 0xffffffff863d36c0
    task = 0xffffffff86306b80
    proc = 0xffffffff95aaf6a0
}
_kernel_memory_fault_data_end:
Invalid character in input
_uptime: .88 hours
_stack_trace_begin: 11
> 0 boot(reason = 0, howto = 0) ["../../../../src/kernel/arch/alpha/machdep.
_kdbx_sum_start:
Hostname : system.dec.com
cpu: AlphaServer 4100 5/400 avail: 1
Boot-time: Tues Jul 7 10:33:25 1998
Time: Mon Jul 13 13:58:52 1998
Kernel : OSF1 release V5.0 version 688.
trap: invalid memory ifetch access from kernel mode

This message describes the kernel memory fault and indicates that the kernel was unable to fetch a needed instruction. The preserved message buffer also contains the faulting virtual address, the pc of the instruction that failed, the contents of the return address register, and the stack pointer at the time of the memory fault.

4 The kernel process status list shows the processes that were active at the time of the crash.
11 The _stack_trace_begin line begins a trace of the current thread block’s stack at the time of the crash. In this case, the _XentMM function called the trap function. The trap function called the panic function, which called the boot function, and the system crashed.
12 The exception frame is a stack frame created to store the state of the process running at the time of the exception. It stores the registers and pc associated with the process.