diff --git a/DC-SLES-kdump b/DC-SLES-kdump new file mode 100644 index 000000000..f64295e9d --- /dev/null +++ b/DC-SLES-kdump @@ -0,0 +1,16 @@ +# This file originates from the project https://github.com/openSUSE/doc-kit +# This file can be edited downstream. + +MAIN="kdump.asm.xml" +# Point to the ID of your assembly +#ROOTID="article-example" +SRC_DIR="articles" +IMG_SRC_DIR="images" + +PROFOS="sles" +PROFCONDITION="suse-product" +#PROFCONDITION="suse-product;beta" +#PROFCONDITION="community-project" + +STYLEROOT="/usr/share/xml/docbook/stylesheet/suse2022-ns" +FALLBACK_STYLEROOT="/usr/share/xml/docbook/stylesheet/suse-ns" \ No newline at end of file diff --git a/articles/kdump.asm.xml b/articles/kdump.asm.xml new file mode 100644 index 000000000..3f3476246 --- /dev/null +++ b/articles/kdump.asm.xml @@ -0,0 +1,132 @@ + + + + %entities; +]> + + + + + + + + + + + + + + Introduction to &kdump; + + 2026-01-12 + + + Initial version + + + + + + + + + + Smart Docs + + + + Administration + Configuration + Security + + + + + + https://bugzilla.suse.com/enter_bug.cgi + Documentation + SUSE Linux Enterprise Server 16.0 + amrita.sakthivel@suse.com + + yes + + + + + &x86-64; + &power; + &zseries; + &aarch64; + + + + + &productname; + + + + Introduction to kdump + Learn how to configure kdump in case your system crashes. kdump is a + kernel crash dumping mechanism. When a system encounters a fatal error, kdump allows the system to save the contents of its memory to a file for expert analysis. + + + Use kdump to capture data on system crashes + + + + WHAT? + + + Configure &kdump; in case your system crashes. The primary purpose of &kdump; is to capture a snapshot of the system's memory (a vmcore file) at the exact moment a kernel crashes (kernel panic). + + + + + WHY? + + +Correctly setting up kdump and obtaining the memory dump may help SUSE support or kernel developers to debug a potential kernel crash. 
+ + + + + EFFORT + + +The average reading time of this article is approximately 40 minutes. + + + + + REQUIREMENTS + + + + +Linux fundamentals: Understanding basic Linux commands, file permissions, directory structures, +and use of the command line. + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/concepts/about-kdump.xml b/concepts/about-kdump.xml new file mode 100644 index 000000000..58d3e8cbc --- /dev/null +++ b/concepts/about-kdump.xml @@ -0,0 +1,82 @@ + + + %entities; +]> + + + + + + + About &kdump; + + + +&kdump; is a kernel crash dumping mechanism that captures the system’s memory state into a vmcore file when +a system crash occurs. A vmcore file is a snapshot of your computer's system memory (RAM) taken at the exact moment the Linux kernel crashed. + + + +
+ Why is &kdump; important? + The primary importance of &kdump; lies in its ability to capture a snapshot of a system's memory at the exact moment of a critical failure. + When a Linux kernel experiences a fatal error, syslog or journald usually fail along with it, often leaving no record of what went wrong. &kdump; bypasses this limitation by using &kexec; to boot a secondary capture kernel in a reserved slice of RAM. + The unstable crashed system is replaced with a freshly started and stable capture kernel. Without this tool, administrators are often left with nothing but a blank screen or a frozen console, making it nearly impossible to diagnose the root cause of intermittent or silent system crashes. +
+
+ Understanding the dual-kernel model + &kdump; uses a second, isolated kernel, referred to as the capture or crash kernel, to handle the crash. When the main system kernel fails, you cannot trust it to write its own crash logs to disk, because the kernel memory might be corrupted and the kernel itself is no longer reliable. The dual-kernel approach solves this by jumping into a completely different environment. + The model relies on two distinct kernels residing in memory simultaneously: + + The production (primary) kernel: The kernel you use every day. It runs your applications and services. + The capture (crash) kernel: A second copy of the kernel loaded in a small reserved area of RAM. It only starts when the primary kernel panics. Alongside the crash kernel, a special-purpose initramfs image is loaded into the reserved RAM. +It is built by the &kdump; tool and includes all the drivers, settings, and programs needed to store the vmcore file. + + +
+
+ About the vmcore file + +A vmcore file is a snapshot of your system's physical memory (RAM) taken at the exact moment the Linux kernel crashed. +When &kdump; is set up on a system, the kdump service uses the kexec program to load the capture kernel and initramfs into a pre-reserved area of RAM, the crash kernel area, on system boot. +If at some point the system crashes, the capture kernel is started. Its memory is restricted to the pre-reserved crash kernel area, so none of the memory used by the crashed production kernel is overwritten. +The job of the capture kernel and initramfs is to save the contents of the production kernel's memory into a vmcore file. + + The vmcore file includes: + +Kernel state: All active kernel data structures, global variables and the call stack, which shows what the CPU was doing when the kernel crashed. +Process information: A list of every process that was running, including their individual stacks and registers. +Memory pages: Depending on your settings, the actual data held in RAM by applications. +VMCOREINFO: A special section that tells analysis tools how the kernel's memory was laid out so they can make sense of the raw data. + +
+
+ What is &kexec;? + &kexec; is a system call that functions as a software-defined boot loader, allowing a running kernel to bypass the hardware BIOS/UEFI stage and directly hand over control to a new kernel. + By loading the secondary kernel's image and parameters into memory while the system is still active, &kexec; performs a warm boot that preserves the state of RAM and significantly reduces downtime. + This mechanism is the backbone of the &kdump; dual-kernel model, as it provides a reliable way to jump from a crashing production environment into a clean recovery environment for data capture. + The most important component of &kexec; is the kexec command. You can load a kernel with &kexec; in two ways: + +Load the kernel to the address space of a production kernel for a regular reboot: +&prompt.sudo; kexec -l KERNEL_IMAGE +You can later boot to this kernel with the command kexec -e. +Instead of calling &kexec; directly, you can use a program called kexec-bootloader. It reads the default kernel, +initrd and command line options from the boot loader configuration and passes them to &kexec; so that the default kernel is loaded properly. + + + +Load the kernel to a reserved area of memory: + &prompt.sudo; kexec -p KERNEL_IMAGE + This kernel is booted automatically when the system crashes. This is what &kdump; uses to load the capture kernel. + + +
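As a rough illustration of the second method, the following shell sketch reports whether a capture kernel is currently loaded and how much memory is reserved for it. It reads the sysfs files kexec_crash_loaded and kexec_crash_size under /sys/kernel. The SYSFS_KERNEL variable and the function name crash_kernel_state are hypothetical and exist only so that the sketch can be pointed at a different directory.

```shell
# Sketch only: SYSFS_KERNEL is a hypothetical override, not a kdump setting.
SYSFS_KERNEL="${SYSFS_KERNEL:-/sys/kernel}"

# Print "loaded <N>MiB" or "not-loaded <N>MiB" for the crash kernel area.
crash_kernel_state() {
    # kexec_crash_loaded contains 1 when a capture kernel was loaded (kexec -p)
    loaded=$(cat "$SYSFS_KERNEL/kexec_crash_loaded" 2>/dev/null || echo 0)
    # kexec_crash_size contains the size of the reserved area in bytes
    size_bytes=$(cat "$SYSFS_KERNEL/kexec_crash_size" 2>/dev/null || echo 0)
    size_mib=$((size_bytes / 1024 / 1024))
    if [ "$loaded" = "1" ]; then
        echo "loaded ${size_mib}MiB"
    else
        echo "not-loaded ${size_mib}MiB"
    fi
}
```

Calling crash_kernel_state on a running system prints a single status line. A result starting with not-loaded suggests that no kernel has been loaded with kexec -p yet.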
+
+ \ No newline at end of file diff --git a/glues/more-info-kdump.xml b/glues/more-info-kdump.xml new file mode 100644 index 000000000..7628e5def --- /dev/null +++ b/glues/more-info-kdump.xml @@ -0,0 +1,51 @@ + + + + + %entities; +]> + + + For more information + + + + + + + + For more information on &kdump;, refer to the following resources: + + + + + Man pages: + + + man 7 kdump + man 5 kdump + + + + + Official Linux kernel documentation: + + + + + + Man page for &kexec;: + + + + + diff --git a/tasks/configure-kdump.xml b/tasks/configure-kdump.xml new file mode 100644 index 000000000..129604747 --- /dev/null +++ b/tasks/configure-kdump.xml @@ -0,0 +1,90 @@ + + + %entities; +]> + + + + + + + + Installing and configuring &kdump; + +You can install &kdump; with the following command: + &prompt.sudo; zypper install kdump + This command installs the following packages: + +kdump +kexec-tools +makedumpfile + + + To boot another kernel and preserve the data of the production kernel when the system crashes, you need to reserve a dedicated area of the system memory. + The production kernel never loads to this area because it must always be available. It is used for the capture kernel so that the memory pages of the production kernel can be preserved. + + + + To use &kexec; with a capture kernel and to use &kdump; in any way, RAM needs to be allocated for the capture kernel. + To configure the reserved memory for the capture kernel, you must modify the crashkernel= parameter within the GRUB configuration file. + This value defines the block of RAM reserved for the capture kernel. Its optimal size typically depends on the total physical memory available in the system. + + + Calculating the allocation size + To find the base value for your system, run: +&prompt.sudo; kdumptool calibrate + Total: 49074 + Low: 72 + High: 180 + MinLow: 72 + MaxLow: 3085 + MinHigh: 0 + MaxHigh: 45824 + +Total: Your total system RAM. 
+Low: The minimum memory required in the low memory zone (first 4 GB) for the kernel to boot. +High: The recommended amount for the high memory zone. This covers the actual work of saving the crash dump. +MinLow/MaxLow: The safe range for the low reservation. In the example output, Low equals MinLow, the absolute minimum. +MinHigh/MaxHigh: The range available for the high reservation. + +All values are in megabytes. Note the Low and High values. + +Based on your system architecture, adapt the Low or High value from the previous step for the number of LUN kernel paths (paths to storage devices) attached to the system. + A sensible value in megabytes can be calculated using this formula: +SIZE_LOW = RECOMMENDATION + (LUNs / 2) +SIZE_HIGH = RECOMMENDATION + (LUNs / 2) + +SIZE_LOW/SIZE_HIGH: The resulting value for Low/High. +RECOMMENDATION: The value recommended by the command kdumptool calibrate for Low/High. +LUNs: The maximum number of LUN kernel paths that you expect to ever create on your system. + Exclude multipath devices from this number, as these are ignored. To get the current number of LUNs available on your system, run: +cat /proc/scsi/scsi | grep Lun | wc -l + + + +Set the values in the correct location. Append the following kernel option to your boot loader configuration: +crashkernel=SIZE_HIGH,high crashkernel=SIZE_LOW,low +crashkernel=SIZE_LOW + +The changes do not take effect until the boot loader configuration is regenerated and the system is restarted to reserve the memory. +&prompt.sudo; grub2-mkconfig -o /boot/grub2/grub.cfg + +After restarting, confirm that the primary kernel has successfully allocated the memory for the secondary capture kernel. 
+cat /sys/kernel/kexec_crash_size + + +Ensure the &kdump; service is ready to catch a crash: +&prompt.sudo; systemctl enable --now kdump +&prompt.sudo; kdumpctl status + + + \ No newline at end of file diff --git a/tasks/troubleshoot-kdump.xml b/tasks/troubleshoot-kdump.xml new file mode 100644 index 000000000..99b8b6d42 --- /dev/null +++ b/tasks/troubleshoot-kdump.xml @@ -0,0 +1,104 @@ + + + %entities; +]> + + + + + + + + Troubleshooting common &kdump; issues + + +Testing and troubleshooting &kdump; ensures that your system can successfully capture a vmcore file during a kernel crash, which is often the only way to diagnose system crashes. +&kdump; usually fails at one of three stages: during boot (memory reservation), during the crash (the dump does not start), or during the save process (the file is not written). + + + +
+ Testing &kdump; + It is advisable to test &kdump; after configuring it by simulating a kernel crash. +Otherwise you may only find out that it does not work when an actual kernel crash occurs, leaving you with no possibility to debug the crash. +Ensure no critical workloads are running and no unsaved data is present on the system. Additionally, sync all file systems and remount them read-only: +echo s > /proc/sysrq-trigger +echo u > /proc/sysrq-trigger +Then you can simulate a kernel crash: +echo c > /proc/sysrq-trigger +Afterwards, verify that a new directory was created under your KDUMP_SAVEDIR, which is /var/crash by default. It contains the dmesg and +vmcore of the crashed kernel. + +
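The verification step can be sketched as a small shell helper that prints the most recently modified subdirectory of the save directory that contains a dump. This is only an illustration: the function name newest_dump is hypothetical, and it assumes that the dump file inside each directory is named vmcore, as described above.

```shell
# Sketch only: newest_dump is a hypothetical helper, not part of kdump.
# Usage: newest_dump [SAVEDIR]   (SAVEDIR defaults to /var/crash)
newest_dump() {
    savedir="${1:-/var/crash}"
    newest=""
    for d in "$savedir"/*/; do
        # Skip directories without a dump (and the unexpanded glob pattern)
        [ -f "${d}vmcore" ] || continue
        # Keep the directory with the most recent modification time
        if [ -z "$newest" ] || [ "$d" -nt "$newest" ]; then
            newest="$d"
        fi
    done
    # Print the result without the trailing slash; fail if nothing was found
    [ -n "$newest" ] && printf '%s\n' "${newest%/}"
}
```

If the function prints nothing and returns a non-zero status after a test crash, no dump was saved and the troubleshooting steps apply.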
+
+ Troubleshooting &kdump; + One of the most common reasons &kdump; fails is that the amount of crash kernel + memory reserved is insufficient. Different system configurations may require + more memory than estimated by kdumptool calibrate and set up automatically in + the boot loader config by the kdump-commandline.service. + During &kdump;, if you see error messages mentioning low memory and invoking the +Out of Memory (OOM) killer, this is the likely cause. Even if you do not see such messages, trying an increased crash kernel reservation is a good +first step. +To rectify this, proceed as follows: + + Find the size of the current automatically configured crash kernel reservation: +&prompt.sudo; cat /proc/cmdline +This should contain one or two parameters in the form of: + +crashkernel=XM +crashkernel=YM,low +crashkernel=ZM,high + +Add up the values of X, Y and Z. The sum is the size of the current reservation in MiB (current). + + +Manually set the reservation by editing /etc/sysconfig/kdump and changing the value of +KDUMP_CRASHKERNEL: +KDUMP_CRASHKERNEL="crashkernel=<2 * current>M" +Then restart the &kdump; service and reboot: +&prompt.sudo; systemctl restart kdump +&prompt.sudo; reboot + +Repeat, doubling the value each time, until &kdump; works. +Fine-tuning crash kernel memory involves finding the smallest RAM reservation that successfully captures a crash dump without triggering "Out of Memory" errors in the capture kernel. + Fine-tune between the last non-working value and the working value. + +Other common issues include: + + +Issues with switching to a text virtual terminal + + + + makedumpfile or network config errors + + + + Dracut errors + + + + kdump initramfs does not generate correctly + + + +Once you get some output, you can increase &kdump; verbosity. Setting + KDUMP_VERBOSE to 11 turns on debugging output during all stages of the &kdump; + process. It: + + Removes the quiet option from the capture kernel command line. + Runs the kdump-save script with -x. 
+ Runs makedumpfile with debugging. + + +
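The arithmetic in the memory steps above (add up the current crashkernel values, then double the sum) can be sketched in shell. The function names sum_crashkernel_mib and doubled_crashkernel are hypothetical, and the sketch only understands values given in megabytes with an M suffix. Other forms are skipped rather than guessed.

```shell
# Sketch only: both function names are hypothetical helpers.
# sum_crashkernel_mib CMDLINE: add up all crashkernel=<N>M values
# (plain, ",low" and ",high") found in a kernel command line string.
sum_crashkernel_mib() {
    total=0
    for word in $1; do
        case "$word" in
            crashkernel=*)
                value=${word#crashkernel=}   # for example "180M,high"
                value=${value%%M*}           # keep the digits before "M"
                case "$value" in
                    ''|*[!0-9]*) continue ;; # skip forms this sketch does not parse
                esac
                total=$((total + value))
                ;;
        esac
    done
    echo "$total"
}

# doubled_crashkernel CMDLINE: print a crashkernel value with twice the current size.
doubled_crashkernel() {
    current=$(sum_crashkernel_mib "$1")
    echo "crashkernel=$((current * 2))M"
}
```

On a running system, doubled_crashkernel "$(cat /proc/cmdline)" prints a candidate value for KDUMP_CRASHKERNEL in /etc/sysconfig/kdump. Review it before using it.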
+
\ No newline at end of file