NS4600 stuck in offline state

  • 187 Views
  • Last Post 28 January 2019
Мидхат Ижбулатов posted this 14 January 2019

After a few power failures, my NS4600 stopped working (I was able to ping it over the network, but file access didn't work, and neither did its web interface). I followed the advice given in https://forum.promise.com/thread/ns4600-cannot-boot/ by switching it off, taking out the HDDs, then resetting it to factory defaults and putting the drives back in. But I was unable to get the RAID manager to recover the array. Instead, it now sees two HDDs (bays 3 and 4) as parts of a 4-disk RAID10 array, and the two other HDDs (bays 1 and 2) as "Free Disks".

The status of the RAID10 volume looks like this:

RAID Status Off line

Action Status Rebuilding

Background Activity Running

And the progress indicator is stuck at 1%. The only other active tabs in RAID management are "Create" and "Delete", and I want to do neither.

Is it possible to recover correctly from this state (assuming that the data on the disks is still intact; I have no way to verify that at the moment) with just the NS4600 itself?

If not, would it be possible to take out the disks, put them somewhere else (would a normal PC suffice?) and recover the data (and then possibly wipe the disks and re-create the array)?

PROMISE Technology Inc. posted this 14 January 2019

Hi,

First, a general statement: a rebuild can be very slow. I have an NS4300 that's rebuilding, and in 4 days it has only reached 50%.

Second, the status is inconsistent: if the array is offline, a rebuild cannot start. One possibility is that a rebuild started after one power outage and that status still shows, but since then another disk has become unconfigured.

I'd say that this is a very bad state, and I can only think of one hope. The drives are probably not bad, but the DDF (the on-disk metadata that describes the array) has been lost on two of them. The only hope is that if you manually create a new array+LUN exactly the same as it was before, you will be able to access the data. In order to create a new array+LUN, you will have to delete the offline LUN.

This is a dangerous move: if you do not re-create the array+LUN exactly the same, the synchronization will overwrite the disks.
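To illustrate why the layout has to match, here is a minimal sketch of a textbook RAID 10 address mapping (striping across mirrored pairs). This is an assumption for illustration only, not the NS4600's actual on-disk format; the function name `raid10_locate` and the stripe sizes used are hypothetical.

```python
def raid10_locate(lba, stripe_blocks, pairs=2):
    """Map a logical block to (mirror pair, block offset on that pair).

    Textbook layout assumed: consecutive stripes alternate between
    mirrored pairs; both disks of a pair hold identical copies.
    """
    stripe_index, within = divmod(lba, stripe_blocks)
    pair = stripe_index % pairs        # which mirrored pair holds this stripe
    row = stripe_index // pairs        # how many stripes precede it on that pair
    return pair, row * stripe_blocks + within


# The same logical block lands in different places under different stripe sizes.
print(raid10_locate(100, stripe_blocks=128))  # (0, 100)
print(raid10_locate(100, stripe_blocks=64))   # (1, 36)
```

With a mismatched stripe size (or disk order), the RAID engine would look for each block in the wrong place, so the volume would appear to contain garbage even though the data blocks themselves are untouched.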

Before you do that, you might try hot-reseating the unconfigured drives. Unseat HDD1 and HDD2, then plug one drive back in and wait to see that it's recognized in the GUI. Then try the next. If one drive's DDF can be read (and you have a redundant array), then the LUN should come back online, but it will be critical and you should start a rebuild on the remaining drive. If both drives can be read this way, then you are back in business.

Мидхат Ижбулатов posted this 21 January 2019

Nope, that didn't work. Drives 1 and 2 are still recognized only as "free".

Now, about re-creating the array. Should I use the "Delete" function first? The GUI does say that deleting will erase all the data (or does it mean that the data will be "lost" implicitly and not actually deleted?). If I do go through with that, which parameters should I write down to ensure that everything is re-created the same way it was?

PROMISE Technology Inc. posted this 21 January 2019

Hi,

RAID arrays are a lot like files in a way. When you delete a file, the operating system generally does not erase the data on disk; it just deletes the pointer to the file in the directory structure. This is why deleted files can often be recovered (the data blocks are marked unused and can be overwritten by other files). When you delete an array+LUN, nothing is done to the data on disk; only the DDF that records the array and LUN structure is deleted. If you recreate the array+LUN the same as it was before, it's like undeleting a file. It recreates the DDF that was there previously, at which point the RAID engine can read the data that's still on disk.
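The analogy can be sketched as a toy model. Everything here is hypothetical (`ArrayConfig`, `ToyArray`, and the parameter values are made up for illustration), and it deliberately leaves out any sync or format step that could overwrite the disks after creation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayConfig:
    """Hypothetical stand-in for what the DDF records."""
    level: str
    disk_order: tuple
    stripe_kb: int


class ToyArray:
    def __init__(self, config, data):
        self.written_with = config   # layout in effect when the data was written
        self.blocks = list(data)     # data blocks; delete() never touches these
        self.metadata = config       # the "DDF"

    def delete(self):
        self.metadata = None         # only the description is removed

    def create(self, config):
        self.metadata = config       # write a new description

    def read(self):
        # Data is only interpretable when the description matches the layout
        # that was used to write it.
        if self.metadata is None or self.metadata != self.written_with:
            return None
        return list(self.blocks)


original = ArrayConfig("RAID10", (1, 2, 3, 4), 64)
array = ToyArray(original, [b"file-a", b"file-b"])

array.delete()
print(array.read())    # None: no metadata, but array.blocks is unchanged

array.create(ArrayConfig("RAID10", (1, 2, 3, 4), 64))
print(array.read())    # [b'file-a', b'file-b']
```

Recreating with any differing parameter leaves `read()` returning `None` in this model, which corresponds to the "data not visible" case.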

In order to recreate the array, you will have to delete the existing array+LUN. Since you have an incomplete array to start with, please look and make sure you know exactly how the array and LUN were configured. From the above, your configuration is a 4-disk RAID 10. Make sure that you note the stripe size and any other tunable parameters; the new LUN will have to match exactly. I'd take screenshots of the RAID details as a safety net.
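If you prefer notes over screenshots, a minimal sketch like the following could compare what you wrote down with what you are about to create. The field names and values are hypothetical placeholders; use whatever parameters your GUI actually shows.

```python
def config_mismatches(recorded, recreated):
    """Return {parameter: (recorded value, recreated value)} for every difference."""
    keys = sorted(set(recorded) | set(recreated))
    return {
        key: (recorded.get(key), recreated.get(key))
        for key in keys
        if recorded.get(key) != recreated.get(key)
    }


# Hypothetical values, written down before deleting the array.
recorded = {"raid_level": "RAID10", "disks": [1, 2, 3, 4], "stripe_kb": 64}
# Hypothetical values entered on the "Create" page.
recreated = {"raid_level": "RAID10", "disks": [1, 2, 3, 4], "stripe_kb": 128}

print(config_mismatches(recorded, recreated))  # {'stripe_kb': (64, 128)}
```

An empty result means every recorded parameter matches; anything else should be fixed before creating the array.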

Then delete the existing array and create a new array + LUN exactly as it was originally.

If your recreated array+LUN is the same as it was previously, you will then see your data, but it will have to sync again.

If the recreated array+LUN is not the same, then that's very bad. Promise subsystems have a 5-minute wait for a sync to start after creating a LUN, but I'm not sure if the Pegasus also does. If it does, then you have 5 minutes to check whether your data is visible. If it is not visible, then the array+LUN you created is not the same; you should delete the array+LUN immediately, before a sync starts, and refer to your screenshots to make sure the LUN configuration is correct.

Мидхат Ижбулатов posted this 28 January 2019

Well, that didn't go as well as I had hoped.

First, there is no stripe size or any other tunable parameter in the web interface, so there was practically nothing to write down and/or adjust.

Second, after I deleted and re-created the array, it immediately showed "Format Status: Formatting..." in the info screen, which seriously freaked me out.

Third, the web interface does not allow me to delete the newly created array. My guess is that I have to wait until it finishes formatting the disks.

Fourth, other administration pages (the ones that control filesystem access, etc.) are still inactive (the web interface claims that the device is still busy, probably because it's formatting the disks), so I couldn't set it up to give me access to the files (which might not even be there anymore...).

I quickly turned the device off, but I have no idea how to recover the data now, or whether it's even recoverable.

Are you sure that the solution you suggested even applies to the NS4600?
