13–15 Nov 2023
America/New_York timezone

Session

Linux Kernel Debugging MC

14 Nov 2023, 09:30

Description

When things go wrong, we need to debug the kernel. There are about as many ways to do that as you can imagine: printk, kdb/kgdb over serial, tracing, attaching debuggers to /proc/kcore, and post-mortem debugging using core dumps, just to name a few. Frequently, tools and approaches used by userspace debuggers aren't enough for the requirements of the kernel, so special tools are created to handle them: crash, drgn, makedumpfile, libkdumpfile, and many, many others.

Presentation materials

  1. Petr Tesařík
    14/11/2023, 09:30

    In the past few years, much attention has been paid to various tools that enable live debugging and post-mortem analysis. Some of these tools access the underlying data and metadata through the libkdumpfile library. But not (yet) all.

    This talk is a tour of the kernel dump file format zoo, how these formats can be handled in a unified way, and what needs to be done to make libkdumpfile...

  2. Omar Sandoval
    14/11/2023, 10:00

    drgn is currently read-only: it can attach to the running kernel and read memory, but it can't modify memory or modify the flow of execution. These read-write features would clearly be useful for development (for example, in a virtual machine or a lab). If done safely, they could also be useful for modifying the kernel in production. There are many potential...

  3. Stephen Brennan (Oracle)
    14/11/2023, 10:30

    Kernel debugging takes a variety of forms, but when a "real debugger" is required, you usually need to have debuginfo, and the standard kind of debuginfo is usually DWARF.

    While DWARF is very powerful, it's not always the right choice for every situation. Fortunately, the kernel already contains nearly enough introspection information to power basic debugging operations. Kallsyms can...

  4. Guilherme Piccoli (Igalia)
    14/11/2023, 11:30

    For some lightweight systems, triggering a kdump can be painful: it requires a generous amount of RAM to be pre-reserved, which is then unavailable for regular use at kernel runtime. Also, the panic kernel boot process takes time and is prone to non-deterministic failures due to hardware state or to the cause of the panic event itself. So, despite kdump being a pretty standard way for collecting...

  5. Elliot Berman (Qualcomm), Mukesh Ojha
    14/11/2023, 12:00

    Qualcomm devices in engineering mode provide a mechanism for generating full system RAM dumps from the field or a test farm for post-mortem debugging, even in the case of non-kernel system crashes. On end-user devices, however, taking a complete RAM dump at the moment of failure has a substantial storage requirement, and transferring the dump electronically is time-consuming. So, instead of copying and...

  6. Lukáš Hruška
    14/11/2023, 12:30

    The Linux kernel currently has a mechanism to create a dump of the whole memory for further debugging of an observed issue with the help of crashkernel. Unfortunately, this cannot be done without restarting the host, which is a problem when a high-availability service is running on a system experiencing a complex issue that cannot be debugged without the complete memory...

  7. Omar Sandoval

    Kernel core dumps are usually saved to disk by the kdump capture kernel, then inspected once the system reboots into the normal kernel. However, saving a core dump to disk may not be desirable for many reasons: security/compliance requirements, inadequate disk space, excessive downtime, etc. One potential alternative is to use drgn directly in the kdump capture kernel to run a script that...
