In this article, we'll show in practice how to replace the NVMe drives of a dynamic mirrored array on the fly.
We have a dynamic mirrored array on Windows Server 2016 built from two identical NVMe drives. The array holds about 3 TB and is used for QuickBooks. Naturally, the client began to run out of space, so two new 6 TB NVMe drives were purchased. We need to move the dynamic array to these new drives.
The problem is that the server running everything can only accommodate two NVMe drives, and both slots are taken. We will have to replace the drives in the array one at a time and then expand it.
Another challenge is that neither the server nor QuickBooks can be stopped. We managed to negotiate a brief service interruption of no more than 5 minutes, should one become necessary.
Our array is visible in Disk Management.
Drive D: spans two physical NVMe drives.
The array's status is Healthy.
Jumping ahead: we managed to replace the drives on two such servers without stopping services. However, there's a catch, which we'll explain later: the chance of getting through this without stopping services is about 50/50. The reason is an annoying behavior of dynamic arrays that I find puzzling, disturbing, and simply harmful.
Drive D: has 49% free space, but only because we were able to temporarily move some of the databases to a neighboring server.
We're using the following drives: Intel SSD 6.4 TB U.2 - SSDPF2KE064T1.
Preparing to Replace the First Drive
Before removing the drive, we need to break our dynamic array.
Right-click either part of the D: volume; it doesn't matter which one. Select Break Mirrored Volume.
We are warned that after this operation the data on the two drives will no longer be identical. Click Yes.
Our dynamic mirrored volume is no longer an array: it has split into two separate simple volumes on the two dynamic disks. One of them kept the letter D:, while the other is now E:.
This seems very odd to us. Which half keeps the letter D: appears to be random and does not depend on which drive you right-clicked when breaking the mirror. We broke the mirror on the same array several times, and each time the letter assignment was random. This is such an inconvenient oversight that it makes dynamic mirrored arrays practically unusable in critical environments.
In our case, it doesn't matter which drive kept D:. This time it's Disk 1, which the database continues to use, and we won't touch it.
We don't need the E: volume.
Right-click E: and select "Delete Volume...".
The data on E: will be deleted, which is fine since we don't need it. Click Yes.
Disk 0 now contains only unallocated space; this is the physical drive we'll replace first.
This is where another difficulty arises: we need the serial number of the drive we'll be pulling out, that is, Disk 0. Unfortunately, we couldn't find it with the OS's standard tools and had to resort to third-party software.
The drive we need is listed as Disk 0; we note down its serial number, A07B8F5A.
Windows Server 2016 supports hot-swapping drives. In the system tray, we click the hardware removal icon and tell Windows to eject the drive. Be careful not to confuse it with the drive holding D:; although the model is the same, the two entries are labeled differently. Eject.
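The GUI steps above (break the mirror, then delete E:) can also be scripted with diskpart. This is a minimal sketch, not what we did in this article, assuming the volume is D: and the half to discard is on Disk 0; the Python helper `break_mirror_script` is hypothetical and only builds the script text. Our understanding is that diskpart's `break disk=<n> nokeep` splits off the half of the mirror stored on disk <n> and deletes it. If that holds, it would let you choose which disk keeps D: instead of leaving it to chance, but verify it against the diskpart documentation and on a test system before running it on a production server.

```python
import string


def break_mirror_script(volume_letter: str, disk_to_free: int) -> str:
    """Return diskpart commands that break the mirror on `volume_letter`.

    Hypothetical helper. ASSUMPTION: `break disk=<n> nokeep` discards the
    half of the mirror stored on disk <n>; verify on a test system first.
    """
    letter = volume_letter.upper()
    if len(letter) != 1 or letter not in string.ascii_uppercase:
        raise ValueError(f"not a drive letter: {volume_letter!r}")
    if disk_to_free < 0:
        raise ValueError("disk number must be non-negative")
    commands = [
        f"select volume {letter}",            # focus on the mirrored volume
        f"break disk={disk_to_free} nokeep",  # split off and delete the copy on that disk
        "list volume",                        # print volumes so the result can be checked
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    # Save the output as break_mirror.txt and run it from an elevated prompt:
    #   diskpart /s break_mirror.txt
    print(break_mirror_script("D", 0), end="")
```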
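One query worth trying before reaching for third-party software is PowerShell's `Get-Disk`, whose `Number` property matches the disk number in Disk Management and which also exposes a `SerialNumber` property. This is a sketch under assumptions: for NVMe drives, Windows may report an identifier that differs from the serial printed on the label (which could explain why the standard tools didn't help us), so always cross-check the result against the BMC or the drive label. The function names and the sample JSON below are illustrative; only `parse_disks` is testable off the server.

```python
from __future__ import annotations

import json
import subprocess

# Get-Disk is part of the Storage module; Number is the Disk Management number.
PS_COMMAND = "Get-Disk | Select-Object Number, FriendlyName, SerialNumber | ConvertTo-Json"


def parse_disks(json_text: str) -> dict[int, str]:
    """Map disk number -> reported serial number from ConvertTo-Json output."""
    data = json.loads(json_text)
    if isinstance(data, dict):  # a single disk is emitted as an object, not a list
        data = [data]
    # Some drives pad serials with spaces; a missing serial becomes "".
    return {int(d["Number"]): (d.get("SerialNumber") or "").strip() for d in data}


def query_disks() -> dict[int, str]:
    """Run the PowerShell query on the server itself (Windows only)."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", PS_COMMAND],
        capture_output=True, text=True, check=True,
    )
    return parse_disks(result.stdout)


if __name__ == "__main__":
    # Illustrative sample in the shape ConvertTo-Json produces.
    sample = '[{"Number": 0, "FriendlyName": "example", "SerialNumber": "A07B8F5A "}]'
    print(parse_disks(sample))
```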
We are notified that the device can be safely removed.
And here we run into another bug: the drive remains Online, which isn't good. It's unclear why the drive doesn't disconnect, so manual intervention is required.
Right-click on Disk 0 and switch it to Offline.
Disk 0 is now in Offline status. The drive can now be safely removed.
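The manual Offline step can be sketched as a diskpart script too, assuming the disk to take offline is Disk 0. The helper and its `disks_in_use` parameter are hypothetical: it is a simple guard that refuses to generate a script for a disk still carrying the live D: volume (Disk 1 in our case), since offlining the wrong disk would stop QuickBooks.

```python
from __future__ import annotations


def offline_disk_script(disk_number: int, disks_in_use: set[int]) -> str:
    """Return diskpart commands that take one disk offline.

    Hypothetical helper. `disks_in_use` lists disk numbers that still carry
    a live volume and must never be taken offline.
    """
    if disk_number < 0:
        raise ValueError("disk number must be non-negative")
    if disk_number in disks_in_use:
        raise ValueError(f"disk {disk_number} still carries a live volume")
    commands = [
        f"select disk {disk_number}",  # focus on the emptied disk (Disk 0 here)
        "offline disk",                # the diskpart counterpart of Offline in Disk Management
        "list disk",                   # confirm its status now reads Offline
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    print(offline_disk_script(0, disks_in_use={1}), end="")
```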
Replacing the First Drive
Standing in the data center with the new drive in hand, you face the question: which drive do you pull out?
Both drives look identical. Their LEDs blink differently because of different loads, of course, but you don't want to make a mistake. If we could shut down the server, we would simply remove the drives and identify the right one by its serial number. However, shutting down the server is not an option, so we need to light up the drive's locator LED.
Almost all servers and storage arrays have a way to "illuminate" a drive. Some expose this feature in the server management web interface (IMM, iLO, IPMI, and other BMCs); others let you turn the LED on with a CLI command.
On recent Supermicro servers, the IPMI web interface has a dedicated section for managing NVMe drives under Server Health → Storage Monitoring. The Physical View tab lists the available drives and their details: model, manufacturer, serial number, temperature, etc. It also lets you perform various operations on these drives. We simply find the desired drive by its serial number and make its LED blink.
In our case, there was an issue with this server model: the drive's LED wouldn't light up.
We locate the necessary drive by its serial number.
Then we make it blink by clicking Blink.
The drive started flashing; now we know its location.
From the Available Actions drop-down list, select Eject, check the desired disk, and click Apply. However, there's a catch: the Apply button is inactive and can't be clicked. It's the same issue as with the LED, and it has to be resolved first.
Eject. Yes.
The drive indicator will turn green, and it's safe to remove the drive.
Drive indicators:
- blue, solid on: drive is in place
- blue, blinking: I/O activity
- red, solid on: failure
- red, blinking at 1 Hz: rebuilding
- red, blinking in a 2+1 pattern at 1 Hz: hot spare
- red, blinking every 5 seconds: drive power on
- red, blinking at 4 Hz: identification
- green, solid on: safe to remove
- yellow, blinking at 1 Hz: warning, do not remove
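If you track drive states in a runbook or monitoring script, the indicator table above maps naturally onto a lookup keyed by color and pattern. This is a minimal sketch: the pattern strings are our own shorthand, not Supermicro identifiers, and the only rule encoded is the one from the table, that a drive may be pulled only when its LED is solid green.

```python
from __future__ import annotations

# LED states from the indicator table above, keyed by (color, pattern).
# The pattern strings are hypothetical shorthand, not vendor identifiers.
LED_STATES: dict[tuple[str, str], str] = {
    ("blue", "solid"): "drive is in place",
    ("blue", "blinking"): "I/O activity",
    ("red", "solid"): "failure",
    ("red", "1 hz"): "rebuilding",
    ("red", "2+1 at 1 hz"): "hot spare",
    ("red", "every 5 s"): "drive power on",
    ("red", "4 hz"): "identification",
    ("green", "solid"): "safe to remove",
    ("yellow", "1 hz"): "warning, do not remove",
}

SAFE_TO_REMOVE = "safe to remove"


def describe(color: str, pattern: str) -> str:
    """Look up an LED state; matching is case-insensitive."""
    return LED_STATES.get((color.lower(), pattern.lower()), "unknown state")


def may_pull(color: str, pattern: str) -> bool:
    """Only a solid green LED means the drive may be removed."""
    return describe(color, pattern) == SAFE_TO_REMOVE
```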
Remove the drive and check that we haven't made a mistake: the drive's serial number is the one we need, and the system hasn't crashed.
Disk 0 has disappeared from the system.
Wait 5 minutes, move the drive caddy to the new drive, and insert it.
Make sure the drive appears in the IPMI web interface. If it doesn't and the green LED on the inserted drive stays lit, remove the drive and re-insert it a couple of minutes later. This happened to me on one of the servers.
In the Disk Management snap-in, a new Disk 0 appears with a larger volume.
Right-click on the drive and initialize it.
Since the drive is larger than 2 TB, we choose GPT. OK.
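The 2 TB threshold comes from MBR storing partition start and length as 32-bit sector counts: with 512-byte sectors, the largest addressable size is 2^32 × 512 bytes = 2,199,023,255,552 bytes (2 TiB, about 2.2 TB). A quick check, assuming 512-byte logical sectors; drives with 4 KB logical sectors raise the MBR limit to 16 TiB, so pass the actual sector size if yours differ.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors (common, not universal)


def mbr_limit_bytes(sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest size MBR can address: 2**32 sectors (32-bit LBA fields)."""
    return 2**32 * sector_bytes


def needs_gpt(capacity_bytes: int, sector_bytes: int = SECTOR_BYTES) -> bool:
    """True if MBR could not address the whole disk."""
    return capacity_bytes > mbr_limit_bytes(sector_bytes)


if __name__ == "__main__":
    print(mbr_limit_bytes())             # 2199023255552
    print(needs_gpt(6_400_000_000_000))  # True for a 6.4 TB drive
```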
The drive is initialized.
Now we need to recreate the mirrored dynamic array. Right-click drive D: and select Add Mirror.
Choose Disk 0 and click Add Mirror.
This operation will convert Disk 0 to a dynamic disk. Click Yes.
A RAID 1 mirrored array is created, but the data isn't in sync yet. Resynchronization starts, and its progress is shown as a percentage. The process takes quite a long time. Disk 0 is marked with an exclamation mark because its data doesn't yet match the primary drive.
After the synchronization is complete, we have a software RAID 1 array with two drives.
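Recreating the mirror maps onto a few diskpart commands: make the new disk dynamic, select the surviving volume, and add the mirror. This is a sketch assuming the volume is D: and the new drive is Disk 0; the helper is hypothetical. Set `initialize_gpt=True` only for a brand-new, uninitialized disk; our understanding is that `convert gpt` is expected to fail on a disk that is already GPT, such as one initialized in Disk Management as described above.

```python
def add_mirror_script(volume_letter: str, new_disk: int, initialize_gpt: bool = False) -> str:
    """Return diskpart commands that mirror `volume_letter` onto `new_disk`.

    Hypothetical helper; review the generated commands before running them.
    """
    if new_disk < 0:
        raise ValueError("disk number must be non-negative")
    commands = [f"select disk {new_disk}"]
    if initialize_gpt:
        commands.append("convert gpt")     # GPT partition style for a >2 TB disk
    commands += [
        "convert dynamic",                 # mirrored volumes require dynamic disks
        f"select volume {volume_letter}",  # focus on the surviving simple volume
        f"add disk={new_disk}",            # create the mirror; resync starts afterwards
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    # Run with: diskpart /s add_mirror.txt (after saving this output)
    print(add_mirror_script("D", 0), end="")
```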
The first drive has been replaced, half the job is done.
Preparing to Replace the Second Drive
And once again, we need to break our dynamic array.
Right-click either part of the D: volume; it doesn't matter which one. Choose Break Mirrored Volume.
We're warned that after this process, the data on the drives will no longer be identical. Yes.
Our mirrored dynamic volume is no longer an array and has split into two separate simple volumes; one remains D:, and the other is now E:. Unfortunately, this time D: stayed on the drive we intended to remove.
Replacing the Second Drive
Now we're back in the data center with the second drive.
In the IPMI web interface, eject the drive (Eject). This time we don't need the serial number, since the drives can no longer be mixed up.
Remove the drive from the server. The drive disappears from the system. All services continue to run smoothly.
Move the caddy to the new drive and insert it into the slot.
The drive shows up in the system.
Initialize it.
GPT. OK.
Both physical drives have been replaced. From here on, it's straightforward:
- Extend the volume to cover the entire Disk 0.
- Recreate the mirror by adding Disk 1.
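The two final steps can be sketched as one diskpart script, assuming the data volume now lives on Disk 0 and the empty new drive is Disk 1. The helper is hypothetical, and the volume letter is a parameter because, after the second break, the surviving volume may carry a different letter. In diskpart, `extend` without a `size=` uses all contiguous free space on the given disk.

```python
def finish_script(volume_letter: str, data_disk: int, new_disk: int) -> str:
    """Return diskpart commands that extend the volume, then re-mirror it.

    Hypothetical helper; check the letter and disk numbers with
    `list volume` / `list disk` before running the generated script.
    """
    if data_disk == new_disk:
        raise ValueError("data disk and new disk must differ")
    commands = [
        f"select volume {volume_letter}",  # the surviving simple volume
        f"extend disk={data_disk}",        # no size given: use all free space on that disk
        f"select disk {new_disk}",
        "convert dynamic",                 # the new disk must be dynamic to hold a mirror
        f"select volume {volume_letter}",
        f"add disk={new_disk}",            # recreate the mirror; resync starts afterwards
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    print(finish_script("D", data_disk=0, new_disk=1), end="")
```

The order follows the steps above: the volume is extended while it is still a simple volume, since, to our knowledge, Windows can't extend a mirrored volume in place, and Disk 1 then needs at least as much free space as the extended volume.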
In this example, we demonstrated how to hot-swap two NVMe drives on a server and expand the array without downtime.
If you need a refurbished server, the experts at Newserverlife can help you choose one, ensure its quality, and deliver it promptly.
Our specialists are ready to help you buy a server and select the right configuration for any task.