Technical White Paper

NVMe Surprise Removal on Dell EMC PowerEdge servers running Linux operating systems

Abstract

This white paper describes the support for Non-Volatile Memory Express (NVMe) Surprise Removal on Dell EMC PowerEdge servers running supported Enterprise Linux operating systems.
Revisions

Date            Description
October 2020    Initial release
December 2020   Document updated with NVMe surprise removal information for Ubuntu LTS 20.04.01 Server
March 2021      Document updated with NVMe surprise removal information for Red Hat Enterprise Linux 8.2

Acknowledgements

Author: Narendra K
Support: Austin Bolen, Gurupreet Kaushik, Sherry Keller

The information in this publication is provided “as is.” Dell Inc.
Executive summary

NVMe devices are being used more widely, and features such as surprise removal are important to the continuous availability and serviceability of the server. Surprise removal allows you to remove a device from the server without prior notification. This white paper outlines the best practices to follow for the surprise removal of NVMe devices on supported Dell EMC PowerEdge servers running supported Linux operating systems.
1 Introduction

As NVMe devices are used more widely, they must provide the enterprise functionality that you rely on, such as surprise removal. Surprise removal improves the serviceability of NVMe devices by eliminating the additional steps required to prepare a device for orderly removal, and it preserves server availability by eliminating server downtime.

1.
2 Surprise removal of NVMe devices

2.1 Supported and unsupported scenarios for surprise removal of NVMe devices

The following table describes the supported and unsupported scenarios while performing surprise removal of NVMe devices.

Supported and unsupported scenarios for surprise removal of NVMe devices

Supported scenarios                                                  Unsupported scenarios
Surprise removal of a single NVMe device at a time is supported.
Determining the PCIe slot number of the /dev/nvme0n1

4. To verify that the operating system successfully unregisters the device:
   a. Use the command nvme list to list the connected devices and verify that /dev/nvme0n1 is not listed.
   b. Use the command lspci to verify that PCIe device 0000:3d:00.0 is not listed.
   c. Use the command lsblk to verify that /dev/nvme0n1 is not listed.
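The three verification checks above can be scripted. The following is a minimal sketch, not part of the documented procedure: the function names are hypothetical, the device names are the ones used in the example steps, and `lspci -D` is used (instead of plain `lspci`) so that the PCI domain prefix `0000:` is always printed.

```python
import re
import subprocess


def mentions(line, token):
    """True if `token` appears in `line` as a whole name.

    The lookarounds stop nvme0n1 from matching nvme0n10 or nvme0n1p1.
    """
    return re.search(r"(?<!\w)" + re.escape(token) + r"(?!\w)", line) is not None


def removal_report(nvme_out, lspci_out, lsblk_out,
                   namespace="nvme0n1", bdf="0000:3d:00.0"):
    """Map each check to True when the removed device is no longer listed."""
    def gone(output, token):
        return not any(mentions(line, token) for line in output.splitlines())

    return {
        "nvme list": gone(nvme_out, "/dev/" + namespace),
        "lspci": gone(lspci_out, bdf),
        "lsblk": gone(lsblk_out, namespace),
    }


def collect_outputs():
    """Hypothetical helper: run the three commands and return their output.

    Assumes nvme-cli, pciutils, and util-linux are installed; `nvme list`
    usually needs root privileges.
    """
    commands = (["nvme", "list"], ["lspci", "-D"], ["lsblk"])
    return [
        subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        for cmd in commands
    ]
```

On a live system, `removal_report(*collect_outputs())` returns one boolean per command; every value should be `True` once the operating system has unregistered the device.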
*Note: The minimum kernel version required for surprise removal is kernel-4.18.0-193.13.2.el8_2.x86_64.
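The minimum-kernel requirement in the note can be checked programmatically. The sketch below assumes that the minimum release is written in the usual `version-release` form, 4.18.0-193.13.2.el8_2.x86_64, and that the running kernel uses the same Red Hat Enterprise Linux 8 naming scheme. The function names are hypothetical, and the comparison is not meaningful for kernels from other distributions.

```python
import re

# Assumption: the minimum release from the note, in version-release form.
MIN_KERNEL = "4.18.0-193.13.2.el8_2.x86_64"


def kernel_key(release):
    """Split a release string into comparable integer tuples.

    '4.18.0-193.13.2.el8_2.x86_64' becomes ((4, 18, 0), (193, 13, 2)).
    """
    match = re.match(r"(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: " + release)
    return tuple(
        tuple(int(part) for part in group.split("."))
        for group in match.groups()
    )


def meets_minimum(release, minimum=MIN_KERNEL):
    """True if `release` is at least as new as `minimum`."""
    return kernel_key(release) >= kernel_key(minimum)
```

The running kernel's release string (the output of `uname -r`) is available in Python as `platform.release()`, so `meets_minimum(platform.release())` performs the check on the local server.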
3 Known issues with NVMe surprise removal

The following section describes the known issues encountered when surprise removal is performed on servers running supported Linux operating systems.

3.1 SUSE Linux Enterprise Server Service Pack 2

3.1.
3.1.4 /proc/mdstat and mdadm -D commands display incorrect statuses when two NVMe devices are surprise removed from a RAID 5 MD array

Description: When two of three NVMe devices are surprise removed from a RAID 5 MD array, the command cat /proc/mdstat incorrectly displays the array status as active. Similarly, when the status of the MD RAID is queried using the mdadm -D /dev/mdN command, the number of active and working devices displayed is two.
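Because the MD status output can be misleading in this situation, one possible cross-check is to compare the members that /proc/mdstat still lists against the block device nodes that actually exist. This check is not taken from this paper. It is a sketch that assumes the device node of a surprise-removed NVMe device disappears from /dev (consistent with lsblk no longer listing it), and the function names are hypothetical.

```python
import os
import re


def parse_mdstat(text):
    """Return {array_name: [member_name, ...]} from /proc/mdstat content."""
    arrays = {}
    for line in text.splitlines():
        match = re.match(r"(md\w+)\s*:\s*(.*)", line)
        if match:
            name, rest = match.groups()
            # Members appear as name[index], for example nvme0n1[0].
            arrays[name] = re.findall(r"([\w-]+)\[\d+\]", rest)
    return arrays


def missing_members(text, exists=os.path.exists):
    """Return {array_name: [members whose /dev node is absent]}.

    `exists` can be replaced for testing; by default it checks the real
    file system.
    """
    return {
        name: [m for m in members if not exists("/dev/" + m)]
        for name, members in parse_mdstat(text).items()
    }
```

For example, `missing_members(open("/proc/mdstat").read())` reports, for each array, the members that are still listed but no longer present as device nodes. A RAID 5 array with more than one missing member cannot be operational, whatever state is displayed.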
3.2.3 /proc/mdstat and mdadm -D commands display incorrect statuses when two NVMe devices are surprise removed from a RAID 5 MD array

Description: When two of three NVMe devices are surprise removed from a RAID 5 MD array, the command cat /proc/mdstat incorrectly displays the array status as active. Similarly, when the status of the MD RAID is queried using the mdadm -D /dev/mdN command, the number of active and working devices displayed is two.
3.3.3 Status of the RAID 0 logical volume is displayed as Available when one of the members of the RAID array is surprise removed

Description: When Logical Volume Manager (LVM) is used to create a RAID 0 array and a member of the RAID array is surprise removed, the lvdisplay command shows the logical volume (LV) status as ‘Available’.

Solution: Use the command lvs -o +lv_health_status to check the status of the RAID array.
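The health check in the solution can be automated. The sketch below is an illustration rather than the documented procedure: it requests only the `vg_name`, `lv_name`, and `lv_health_status` fields with a `|` separator so that an empty health field is easy to detect, and it treats any non-empty health value (such as `partial`) as a problem. The function names and the chosen separator are assumptions.

```python
import subprocess

# Assumption: a fixed field list and separator chosen for easy parsing.
LVS_COMMAND = [
    "lvs", "--noheadings", "--separator", "|",
    "-o", "vg_name,lv_name,lv_health_status",
]


def parse_lvs(output):
    """Return a list of (vg, lv, health) tuples from the lvs output."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split("|")]
        fields += [""] * (3 - len(fields))  # pad if a trailing field is absent
        rows.append(tuple(fields[:3]))
    return rows


def unhealthy(rows):
    """Keep only the logical volumes that report a health status."""
    return [row for row in rows if row[2]]


def check_local_volumes():
    """Hypothetical helper: run lvs (usually as root) and list unhealthy LVs."""
    result = subprocess.run(LVS_COMMAND, capture_output=True, text=True, check=True)
    return unhealthy(parse_lvs(result.stdout))
```

An empty list from `check_local_volumes()` means that no logical volume reported a health problem.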
4 Summary

This white paper describes the concept of NVMe surprise removal and provides guidance on how to perform surprise removal on supported enterprise Linux operating systems on supported Dell EMC PowerEdge servers. The step-by-step instructions for performing NVMe surprise removal are documented with guidelines to be followed for successful surprise removal of NVMe devices.