Linux Plumbers Conference 2023

Name: Linux Plumbers Conference 2023
Start: 2023-11-13T09:00:00-05:00
End: 2023-11-15T23:30:00-05:00
13–15 Nov 2023

America/New_York timezone

contact@linuxplumbersconf.org

Kernel Livedump

14 Nov 2023, 12:30

30m

"James River Salon B" (Omni Richmond Hotel)

Linux Kernel Debugging MC

Lukáš Hruška

The Linux kernel currently has a mechanism, based on crashkernel, for dumping
the whole of memory so that an observed issue can be debugged later.
Unfortunately, this cannot be done without restarting the host. That is a
problem when a high-availability service is running on a system that is
experiencing a complex issue which cannot be debugged without a complete memory
dump, and hypervisor-assisted dumps are not an option on bare-metal setups. For
this purpose, a live dump mechanism is being developed; it was initially
introduced by Yoshida Masanori [1] in 2012. That PoC was already able to create
a consistent image of memory and supported dumping the data to a reserved raw
block device.

The PoC then sat idle, and because the ever-growing Linux community introduces
dozens or even hundreds of new features in every release, the work became
obsolete, especially due to MM changes. I've spent time adapting the patchset to
make it work again on Linux v6.4, and I've added a few more features (such as
vmcore formatting).

Getting the patchset upstream again will take a lot of research and work,
because a few tradeoffs must be better described and understood. Like the
crashkernel method, which required preallocated space specific to each running
instance (a sizing problem that was resolved through approximations), live dump
also uses reserved, preallocated memory. If this memory is not large enough, it
may, in certain cases, compromise the consistency of the dumped state. One
option for maintaining consistency is to wait inside the page fault on kernel
memory, but this approach could introduce failures in the original kernel due
to synchronization or deadlines in different parts of the kernel.
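
The tradeoff described above can be illustrated with a toy model. This is a
minimal sketch and not the patchset's code: it assumes a copy-on-write scheme
in which a write to a page that has not been dumped yet first saves the old
contents into a fixed-size reserve. The class LiveDumpModel, its fields, and
the wait_when_full flag are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LiveDumpModel:
    """Toy model only: 'memory' is a list of page values, not kernel memory."""
    memory: list
    pool_pages: int                                  # size of the reserve (assumed fixed)
    saved: dict = field(default_factory=dict)        # page -> contents at snapshot time
    dumped: dict = field(default_factory=dict)       # page -> contents written to the dump
    inconsistent: set = field(default_factory=set)   # pages whose old contents were lost

    def write(self, page, value, wait_when_full=False):
        """Simulate a write fault on a protected page.

        Returns True if the write went through, False if the writer has to wait.
        """
        needs_copy = (page not in self.saved
                      and page not in self.dumped
                      and page not in self.inconsistent)
        if needs_copy:
            if len(self.saved) < self.pool_pages:
                self.saved[page] = self.memory[page]  # preserve the old contents
            elif wait_when_full:
                return False                          # writer stalls in the fault path
            else:
                self.inconsistent.add(page)           # snapshot of this page is lost
        self.memory[page] = value
        return True

    def dump_page(self, page):
        """Write one page to the dump and release its reserve slot, if it has one."""
        if page not in self.dumped:
            self.dumped[page] = self.saved.pop(page, self.memory[page])


# Reserve of one page, two pages modified before the dumper reaches them.
model = LiveDumpModel(memory=[0, 0, 0], pool_pages=1)
model.write(0, 1)   # old contents of page 0 fit in the reserve
model.write(1, 1)   # reserve is full, page 1 can no longer be preserved
for p in range(3):
    model.dump_page(p)
print(model.dumped, model.inconsistent)
```

With wait_when_full=True the second write returns False instead, which models
the writer waiting in the page fault until the dumper frees a slot. The dump
stays consistent, but the waiting writer is exactly where synchronization or
deadline problems could appear.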

At LPC I would like to gather as much feedback as possible on my current
approach and to discuss other possible use cases.

[1] https://lore.kernel.org/all/20121011055356.6719.46214.stgit@t3500.sdl.hitachi.co.jp/

Plumbers2023_Kernel_Livedump.pdf
