Pegasus R4 multiple drive failures

  • Last Post 15 April 2020
Martin Godleman posted this 14 April 2020

I have a Promise Pegasus R4 with four 4 TB drives, which I bought from a friend five years ago, in 2015.

I am running OS X Yosemite on my iMac (27-inch, Late 2013). When I first set up the Pegasus R4, one of the drives began to fail almost immediately, but I could still mount the drives and I managed to replace it.

The four drives have worked perfectly since then, until this morning (14 April 2020), when I booted up and TWO drives (1 and 4) failed. I can't mount the drives at all, so I now have no access to my backup. There is a little over 12 TB of material on there that I can't get to. As you can imagine, I'd like to save some of it; some is vital.

Promise Utility describes drives one and four as 'dead'. Can I recover anything?

R P posted this 14 April 2020

Hi Martin,

Can you post the event logs? To determine the correct recovery procedure, it is necessary to know the order of events.

This is best done from the CLI: open a terminal and type 'promiseutil', then enter 'event' and 'phydrv', and copy and paste the results.

Unless both drives have indeed totally failed, recovery seems likely.
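
To illustrate why the order of events matters: in a RAID 5 set with two dropped drives, the drive that dropped last holds the most current data, so it is normally the one to force online, while the earlier one is stale and gets rebuilt. The sketch below is only an illustration under that assumption. The failure events and the function name plan_recovery are hypothetical and do not come from this thread; only the timestamp format mirrors the TimeStamp column of the 'event' output.

```python
from datetime import datetime

# Timestamp layout as printed in the 'event' TimeStamp column.
FMT = "%b %d, %Y %H:%M:%S"

def plan_recovery(failures):
    """Given (pd_id, timestamp) pairs, return (force_online_id, rebuild_ids).

    The drive that failed last is the freshest and is forced online;
    every drive that failed earlier is treated as stale and rebuilt.
    """
    ordered = sorted(failures, key=lambda f: datetime.strptime(f[1], FMT))
    *stale, freshest = ordered
    return freshest[0], [pd_id for pd_id, _ in stale]

# Hypothetical failure events, NOT taken from the log in this thread.
events = [(1, "Apr 10, 2020 09:15:00"), (4, "Apr 14, 2020 17:06:58")]
print(plan_recovery(events))  # (4, [1])
```

If both drives drop at the same moment, the timestamps give no preference and either drive can be tried first.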

Martin Godleman posted this 14 April 2020

Hi RP,

Here are the results you requested. I am running RAID5, so what if I took the two 'dead' drives out of their respective slots and put a new one in the first slot? Would I be able to recover anything?

Thanks for your help - much appreciated...

Martin G

Last login: Tue Apr 14 13:18:09 on console
Martins-iMac:~ martingodleman$ promiseutil
-------------------------------------------------------------
Promise Utility
Version: 4.00.0000.08 Build Date: Apr 10, 2017
-------------------------------------------------------------

List available RAID HBAs and Subsystems
===============================================================================
Type  #    Model         Alias                         WWN                 Seq
===============================================================================
hba   1  * Pegasus R4                                  2000-0001-5553-3e10  1

Totally 1 HBA(s) and 0 Subsystem(s)

-------------------------------------------------------------
The row with '*' sign refers the current working HBA/Subsystem path
To change the current HBA/Subsystem path, you may use the following command:

  spath -a chgpath -t hba|subsys -p <path #>.

Type help or ? to display all the available commands
-------------------------------------------------------------

cliib> event
===============================================================================
Seq   Device          Severity TimeStamp             Description
===============================================================================
0     Ctrl 1          Info     Apr 14, 2020 17:07:00 The system is started
1     SEP 1           Info     Apr 14, 2020 17:07:00 SEP is found
2     LD 0            Major    Apr 14, 2020 17:07:00 Logical drive has been
                                                     placed offline. Possible
                                                     Data Loss

cliib> phydrv
===============================================================================
PdId Model        Type      Capacity  Location      OpStatus  ConfigStatus
===============================================================================
1    ST4000DM000- SATA HDD  4TB       Encl1 Slot1   Dead      Array0 No.0
2    ST4000DM000- SATA HDD  4TB       Encl1 Slot2   OK        Array0 No.1
3    ST4000DM000- SATA HDD  4TB       Encl1 Slot3   OK        Array0 No.2
4    ST4000DM004- SATA HDD  4TB       Encl1 Slot4   Dead      Array0 No.3

 

 

cliib> 
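
For anyone who wants to pick the dead drives out of a pasted 'phydrv' table, here is a minimal sketch. It assumes the columns are separated by whitespace and that OpStatus is a single word, which holds for the output above but would not hold for a multi-word status such as 'OK, Forced Online'. The name parse_phydrv is made up for illustration and is not part of promiseutil.

```python
# Data rows copied from the 'phydrv' output in this thread.
PHYDRV_OUTPUT = """\
1    ST4000DM000- SATA HDD  4TB       Encl1 Slot1   Dead      Array0 No.0
2    ST4000DM000- SATA HDD  4TB       Encl1 Slot2   OK        Array0 No.1
3    ST4000DM000- SATA HDD  4TB       Encl1 Slot3   OK        Array0 No.2
4    ST4000DM004- SATA HDD  4TB       Encl1 Slot4   Dead      Array0 No.3
"""

def parse_phydrv(text):
    """Parse phydrv data rows into dicts; skip headers, rules, and blank lines."""
    drives = []
    for line in text.splitlines():
        fields = line.split()
        # A data row starts with a numeric PdId and has at least 8 fields.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        drives.append({
            "pd_id": int(fields[0]),
            "model": fields[1],
            "slot": fields[6],
            "status": fields[7],  # assumes a single-word OpStatus
        })
    return drives

dead_ids = [d["pd_id"] for d in parse_phydrv(PHYDRV_OUTPUT) if d["status"] == "Dead"]
print(dead_ids)  # [1, 4]
```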

 

R P posted this 14 April 2020

Hi Martin,

OK, it looks like the drives did not go offline at different times. Even so, the safest approach is to force one drive online and start a rebuild on the other. From the CLI again, you can force PD1 online with

phydrv -a online -p 1

This should bring the array online, but degraded.

After forcing the drive online, please send me the output of the array command.

array

Martin Godleman posted this 14 April 2020

Hi -

This is what I now have - let me know your thoughts.

Martin

 

Last login: Tue Apr 14 18:24:35 on ttys000

Martins-iMac:~ martingodleman$ promiseutil


cliib> phydrv -a online -p 1

cliib> array
===============================================================================
DaId Alias     OpStatus      CfgCapacity FreeCapacity   MaxContiguousCapacity
===============================================================================
0              Degraded      16TB        0Byte          0Byte

 

 

cliib> 

 

R P posted this 14 April 2020

Hi Martin,

I think sometime in the recent past we had an issue like this. The Pegasus won't allow you to start a rebuild to a dead drive, so you have to remove the DDF from PD4 before you can start a rebuild.

** Just to be safe, can you send me the output of the

array -v

command before following the procedure below? I should have asked for the verbose output earlier.

----

This thread might be of interest.

https://forum.promise.com/thread/array-recovery/

I'm just going to copy and paste the part you need.

----

Be very careful with the next steps. We need to make PD4 unconfigured, and we will do this by removing all drives except PD4 and deleting the array information on PD4 only.

First, power the Pegasus down (the easiest way is to disconnect the Thunderbolt cable) and wait 30 seconds for the drives to spin down. Then unseat all the drives except PD4, reboot, and verify that only PD4 is seen. Then force it online and delete the array.

phydrv -a online -p 4
array -a del -d 0

This won't delete your array, only the stale array data on PD4; the other (unseated) drives contain the actual array. That's why you need to verify with the 'phydrv' command that only PD4 is seen.
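
The check described above can be written down as a simple guard. This is only a sketch of the reasoning, not something promiseutil provides: the function name safe_to_clear is hypothetical, and the list of visible drive IDs is assumed to have been read off the 'phydrv' output by hand or by a script.

```python
def safe_to_clear(visible_pd_ids, target_pd_id=4):
    """Return True only if the target drive is the sole drive the controller reports.

    If any other member drive is still seated, deleting the array would
    also wipe the array information on that drive, so the answer is False.
    """
    return set(visible_pd_ids) == {target_pd_id}

print(safe_to_clear([4]))           # True: only PD4 is seated
print(safe_to_clear([1, 2, 3, 4]))  # False: the real array members are still visible
print(safe_to_clear([]))            # False: PD4 is not seen at all
```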

Once PD4 has been cleaned of array information, 'phydrv' should show it unconfigured.

Then power off the Pegasus again, give it 30 seconds or so to let the drives spin down, then reseat all the other drives and unseat PD4. Power the Pegasus back on and let it boot. Verify that all the drives except PD4 are seen, that the array and LD are present, and that the LD is online, then reseat PD4. If PD4 is unconfigured, an automatic rebuild will start. You might want to change settings so that the Mac does not sleep or power off until the rebuild completes.

Martin Godleman posted this 14 April 2020

Hi RP

This is the output of the array -v command

Let me know if it's okay to proceed once you've checked -

Best wishes,

Martin G

 

Last login: Tue Apr 14 18:26:26 on ttys000

Martins-iMac:~ martingodleman$ promiseutil


cliib> array -v

-------------------------------------------------------------------------------
DaId: 0
OperationalStatus: Degraded
Alias:
PhysicalCapacity: 16TB                 ConfigurableCapacity: 16TB
FreeCapacity: 0Byte                    MaxContiguousCapacity: 0Byte
AvailableRAIDLevels: 0 5 6 10 1E
PDM: Enabled                           MediaPatrol: Enabled
NumberOfPhysicalDrives: 4              NumberOfLogicalDrives: 1
NumberOfDedicatedSpares: 0
UserSetPowerSavingLevel: 1             CurrentPowerSavingLevel: 0
PowerManagement: Enabled

Physical Drives in the Array:
===============================================================================
SeqNo PdId CfgCapacity FreeCapacity OpStatus
===============================================================================
0     1    4TB         278.53KB     OK, Forced Online
1     2    4TB         278.53KB     OK
2     3    4TB         278.53KB     OK
3     4    4TB         278.53KB     Dead

Logical Drives in the Array:
===============================================================================
LdId Alias          RAIDLevel Capacity  OpStatus
===============================================================================
0                   RAID5     12TB      Critical

Available Spares to the Array:
===============================================================================
Id  OpStatus  PdId CfgCapacity Revertible Type      DedicatedToArray
===============================================================================

No spare drive available in the array

R P posted this 14 April 2020

Hi Martin,

OK, the array is online but critical, as expected. You can follow the procedure to clear the DDF from PD4.
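
The numbers in the 'array -v' output follow from how RAID 5 works: one drive's worth of capacity goes to parity, and the logical drive survives the loss of exactly one member. A minimal sketch of that arithmetic, with function names invented for illustration and state labels borrowed loosely from the CLI output in this thread:

```python
def raid5_usable_tb(drive_count, drive_tb):
    """Usable capacity of a RAID 5 set: one drive's worth is used for parity."""
    return (drive_count - 1) * drive_tb

def raid5_state(unavailable_drives):
    """Logical drive state for a given number of missing or dead members."""
    if unavailable_drives == 0:
        return "OK"
    if unavailable_drives == 1:
        return "Critical"  # still readable, but with no redundancy left
    return "Offline"       # two or more members missing: data is not accessible

print(raid5_usable_tb(4, 4))  # 12, matching the 12TB logical drive on 16TB of disks
print(raid5_state(2))         # Offline, the situation at the start of this thread
print(raid5_state(1))         # Critical, the situation after forcing PD1 online
```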

Martin Godleman posted this 14 April 2020

Thank you so much.

Will let you know how it goes...

MG

Martin Godleman posted this 15 April 2020

Hi R P

I have followed the instructions, and the drive has now been rebuilding for twelve hours. I can see and access all my files on the other drives, but the fourth drive is still rebuilding. Does this sound about right? I have no idea how long it might take to rebuild...

Thanks again for all your help

Martin G
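
As a rough guide to rebuild time, the whole 4 TB drive has to be rewritten, so the duration is the capacity divided by the sustained rebuild rate. The rates below are illustrative assumptions only, not Pegasus specifications; the real rate depends on the controller's rebuild priority and on how busy the array is while it rebuilds.

```python
def rebuild_hours(capacity_bytes, rate_bytes_per_s):
    """Hours needed to rewrite a whole drive at a constant rate."""
    return capacity_bytes / rate_bytes_per_s / 3600

CAPACITY = 4 * 10**12  # one 4 TB drive (decimal terabytes)

# Assumed sustained rebuild rates in MB/s, chosen only for illustration.
for rate_mb in (25, 50, 100):
    hours = rebuild_hours(CAPACITY, rate_mb * 10**6)
    print(f"{rate_mb} MB/s -> {hours:.1f} hours")
# 25 MB/s -> 44.4 hours
# 50 MB/s -> 22.2 hours
# 100 MB/s -> 11.1 hours
```

On these assumptions, a rebuild that is still running after twelve hours is not unusual for a 4 TB drive.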

Martin Godleman posted this 15 April 2020

I am also in touch with Promise technical support, whom I wrote to at the same time as I first posted on the forum.

I sent them a subsysteminfo file generated from Promise Utility, and they said they'd investigate...

I told them you had made the recommendations and that I was now rebuilding the fourth drive.

They have come back to me with this:

Dear Martin Godleman

Thank you for contacting Promise Technical Support.
There is a response to your tech support case # 20200414070545S

Current Response : Hi Martin, Yes the drives are marked as dead due to removal, they do not have any errors, let us know once the rebuild is complete. Regards, Gautham

To view details, please go to https://support.promise.com and login.

Thank you!

Martin Godleman posted this 15 April 2020

Hi R P

The drive is now rebuilt and everything appears to be working perfectly (according to Promise Utility).

Is there a need for a diagnostics test that will confirm that?

Thank you so much for your help - and in the middle of a pandemic too!

Best wishes,

Martin G
