Drives going down like flies

Alexander Snelling posted this 2 weeks ago

Hi there. I've got terrible problems with a Pegasus 2 R4 2TB x 4 array in RAID 5 (ie 6TB).

It's quite old but never had many problems with it apart from the occasional drive going down (every two years or so).

A week or so ago a drive appeared dead so I ordered a new one and replaced it. Turned the RAID off until the new drive arrived, replaced it and rebuilt (I've done this several times before and am very familiar with the process). Rebuild was successful and all drives appeared to be good until this morning when I swapped all the drives into another R4 unit (I use them interchangeably and have been doing this for five years or more). This time, THREE drives appeared with a red light and were flagged up as dead. After some swapping out and restarting etc, three are now appearing as StaleConfig and the other one is Unconfigured. Terminal says they are online even though they are not so I cannot attempt to force them online to transfer data off.

Have taken each drive into a separate USB bay and run EaseUS recovery software on them all. EaseUS can see most if not all the files that were on the array by just looking at a single drive, but I haven't tried recovering it. I can't see how it can be recovered from a single drive, but it can see this data on a single drive, even the drive that was reported as dead, so there is something there and all hope is not lost!

My main concern is to get this data transferred off as there is no real backup (yes I know, but this happened between backups) - but I cannot read the files on the drives even though they are clearly all there.

Is there any hope? It appears the partition map is corrupted or missing and I cannot find a way to recover this.

 

Thank you

R P posted this 2 weeks ago

Hi Alexander,

If you have the drives inserted in the same slots as before it should be possible to recreate the array/LUN. But if the disks were moved often, then it's possible that the proper sequence is not known.

Do you have any service reports from before this happened to show what was configured previously? Or can you be sure that they are in the proper sequence order now (this means that all the disks are in the slots where they were when the RAID was created)?

The files are striped across all the disks, so you won't be able to recover them with any normal recovery software.
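To illustrate why a single member disk is not enough, here is a minimal sketch of RAID 5 style striping. The chunk size, the parity rotation and the function name stripe_raid5 are assumptions made for the example and do not describe the actual on-disk layout the Pegasus uses.

```python
from typing import List


def stripe_raid5(data: bytes, n_disks: int, chunk: int) -> List[List[bytes]]:
    """Spread data over n_disks, one rotating XOR parity chunk per stripe.

    Illustrative layout only; a real controller's rotation may differ.
    """
    disks: List[List[bytes]] = [[] for _ in range(n_disks)]
    per_stripe = (n_disks - 1) * chunk
    # Pad the data so that it fills a whole number of stripes.
    if len(data) % per_stripe:
        data += b"\x00" * (per_stripe - len(data) % per_stripe)
    for stripe, off in enumerate(range(0, len(data), per_stripe)):
        chunks = [data[off + i * chunk: off + (i + 1) * chunk]
                  for i in range(n_disks - 1)]
        # Parity is the byte-wise XOR of the data chunks in this stripe.
        parity = bytes(chunk)
        for c in chunks:
            parity = bytes(a ^ b for a, b in zip(parity, c))
        parity_disk = (n_disks - 1 - stripe) % n_disks
        remaining = iter(chunks)
        for d in range(n_disks):
            disks[d].append(parity if d == parity_disk else next(remaining))
    return disks


disks = stripe_raid5(b"ABCDEFGHIJKLMNOPQRSTUVWX", n_disks=4, chunk=4)
print(disks[0])  # [b'ABCD', b'MNOP']
```

Disk 0 only ever holds every third data chunk, so a file larger than one chunk cannot be read back from that disk alone, and the result also depends on reading the disks in the right order.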

Alexander Snelling posted this 2 weeks ago

Hi RP. Thanks for your reply. 

Unfortunately I don't have this info (unless I have an old service report - I'll look but I doubt this exists). In Promise Utility there appears to be an array number on each drive (0,1,2,3) - is that the sequence? I'll insert using that order - what is the next step?

(Failing that, as it's an R4 there are only a manageable number of permutations (24 if my brain is working) but how can I tell if it's the right sequence chosen?)

 

thanks Alex
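The count of 24 can be checked with a short sketch. The drive labels and the helper candidate_orders are hypothetical and only serve the illustration; the optional fixed argument shows how the search shrinks if some slot positions are already known.

```python
from itertools import permutations


def candidate_orders(drives, fixed=None):
    """Return every slot order of drives that respects known positions.

    fixed maps a slot index (0-based) to the drive known to sit there.
    """
    fixed = fixed or {}
    return [order for order in permutations(drives)
            if all(order[slot] == drive for slot, drive in fixed.items())]


drives = ["A", "B", "C", "D"]  # hypothetical labels for the four disks
print(len(candidate_orders(drives)))                    # 24
print(len(candidate_orders(drives, {2: "C", 3: "D"})))  # 2
```

With four disks there are 4! = 24 orders, and knowing the position of two disks leaves only two orders to try.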

Alexander Snelling posted this 2 weeks ago

Hi - I've had another look at the drives in Promise Util and also Terminal promiseutil and there appears to be no array present or information available. Is there anything I can do? Surely there must be some sort of flag present to ID which drive is which?

I've seen a couple of threads here where people were told how to revive a staleconfig - is this possible?

I enclose a service report from when this first happened but this also reports no arrays present.


Alexander Snelling posted this 2 weeks ago

Just uploaded the status of my drives in PU. Seems PU is recognising the array phy drives (3 out of 4) but reporting them as dead, which they are almost certainly not. Feel there is a fix here but I don't know what it is.

These drives contain some unrepeatable footage from an as yet unproduced film (I really don't need a lecture on backing up here, I am more than aware of that). 

R P posted this 2 weeks ago

Hi Alexander,

First: This is not guaranteed to work. Also, if there is damage to the partition map, it may work but you still may not see the volume on the desktop and will need to repair the volume with diskutil or Disk Warrior.

----

The image shows the sequence numbers, it's clear that PD3 and PD4 are in the same slots, but PD1 and PD2 have the same drive model number and it's not clear whether they are in the same position or reversed.

Assuming that everything is in the same order, we can proceed...

This will have to be done from the promise CLI.

First, we need to clear the staleconfig status.

cliib> phydrv -a clear -p 1 -t staleconfig

cliib> phydrv -a clear -p 2 -t staleconfig

cliib> phydrv -a clear -p 3 -t staleconfig

PD4 is showing PFA status, we will need to clear that.

cliib> phydrv -a clear -p 4 -t pfa

Now the array can be recreated...

cliib> array -a add -p 1,2,3,4 -l "raid=5,forcesynchronized=yes"

If everything is correct you should see your drive appear on the Mac desktop shortly. Please verify some files, ideally if you have any videos on the drive play one and make sure they are OK.

If everything is not right then the array should be deleted immediately.

Do not try to repair the drive with diskutil, if the disk order is wrong a repair will do damage to the filesystem. If we have the disk order wrong (don't know about PD1 and PD2) the solution is to delete the array and try again with a different sequence.

Lastly, PD4 was in PFA condition, most likely SMART said it was failing, it may go out again. If you don't have a spare drive it would be a good idea to have one on hand. 2TB drives are pretty inexpensive today.

Alexander Snelling posted this 2 weeks ago

Hi RP

 

Thank you so much for offering some hope here. I've tried the above and get the following (see attached).

Don't want to mess further without knowing what I might be doing.

Thanks

Alex

R P posted this 2 weeks ago

Hi Alexander,

I don't see an attachment.

Alexander Snelling posted this 2 weeks ago

Just to give some more background that might be helpful. Originally this array was made of 4x Toshiba drives. One went down a while ago - can't remember when but less than a year ago. I still have the old one (now labelled "wrong"). This still has data on it. The second Toshiba went down a week ago and that was replaced too - I still have that and again it still has data on it. In other words I have 6x drives from the same array until they all disappeared. Seems 3 drives are labelled dead and one unconfigured (as I just cleared it). Of the other two older drives, one is labelled Unconfigured (I cleared the stale status using PU) and the second one is labelled as stale. The second spare is almost certainly up to date in terms of data as nothing was done after I rebuilt the latest time and I am not convinced there is anything wrong with it.

 

I also have another Pegasus chassis with drive cases, so could use this to test if that is useful.

R P posted this 2 weeks ago

Hi Alexander,

Did you replace the drives after they failed and let a new drive rebuild? Did the rebuilds complete?

The default configuration is RAID5, it will stay online with one drive missing, but not two.

So you'll need at least 3 original drives before a recovery is possible.

And we will have to get the sequence correct.

The directions posted won't work unless all of the drives are valid members of the array and we know the sequence order.
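The "one missing drive but not two" limit follows from how single parity works. Below is a minimal sketch using byte-wise XOR on one stripe; the chunk contents and the helper name xor_blocks are made up for the example, and the real controller works on much larger stripes.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a four-disk RAID 5: three data chunks plus one parity chunk.
d1, d2, d3 = b"film", b"reel", b"shot"
parity = xor_blocks([d1, d2, d3])

# One disk missing: the lost chunk is the XOR of everything that survives.
rebuilt_d2 = xor_blocks([d1, d3, parity])
print(rebuilt_d2 == d2)  # True

# Two disks missing: the survivors only give d2 XOR d3, not d2 or d3.
leftover = xor_blocks([d1, parity])
print(leftover == xor_blocks([d2, d3]))  # True
```

With two chunks gone there is a single equation with two unknowns, which is why at least three valid members of a four-disk RAID 5 are needed.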

Alexander Snelling posted this 2 weeks ago

Yes I replaced one drive twice. Once a year or so ago and once last week. Rebuilt successfully both times. The most recent time (last week), the rebuild completed successfully (or so I thought) and then I moved the drives to my other enclosure (I have a Pegasus 1 and 2). At this point on power up, two drives showed a red light (PU flagged them as dead) and shortly after that another one went down. Unfortunately I didn't record the order of the drives at this point. 

I have a spare 2TB drive that I can erase and use as a spare and rebuild using the other three if we can find the correct order. I'm aware this might be time consuming but the alternative is about a month's worth of hard file wrangling, so this is far preferable. 

The issue I have now is that three drives (which I think are OK) are showing as dead. Promiseutil cannot fix them by clearing stale or PFA (it refuses) so I'm not sure how to revive them in order to try to reconfigure the array - is there a "clear dead" command? I'm convinced they are not dead. 

thank you

alex

Alexander Snelling posted this 2 weeks ago

Also it has occurred to me I can't remember which chassis originally configured the array - the Pegasus 1 or 2 - if I have the drives in the wrong chassis would that impact a reconfigure?

R P posted this 2 weeks ago

Hi Alexander,

"The issue I have now is that three drives (which I think are OK) are showing as dead."

The service report you uploaded shows 3 drives as staleconfig and one drive PFA with none dead.

===============================================================================
PdId Model        Type      Capacity  Location      OpStatus  ConfigStatus
===============================================================================
1    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot1   Stale     StaleConfig
2    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot2   Stale     StaleConfig
3    ST2000DM008- SATA HDD  2TB       Encl1 Slot3   Stale     StaleConfig
4    ST2000DM001- SATA HDD  2TB       Encl1 Slot4   PFA       Unconfigured

Without accurate information about the state of the drives a recovery solution won't be possible.

Alexander Snelling posted this 2 weeks ago

Yes this changed. I didn't do anything apart from swap an old one in and out to check if the older one was showing any different behaviour or status.

 

Just came in this morning and the array numbers had changed from last night (no3 was at the top)

Swapped drives 1 and 4 over and then got the wrong config again. Swapped them back to original placings and now they seem in the right order. (There are two Toshibas from the original stripe and two Seagate replacements.)

 

I'm thinking I could try putting the last failed Toshiba back in so there are three from the original stripe then add a newly formatted drive. Could then try to rebuild from there if I can find the right drive placement (could be a long weekend!) 


Alexander Snelling posted this 2 weeks ago

Problem now is getting the three "dead" drives back online.

Alexander Snelling posted this 2 weeks ago

Tried phydrv -a online -p 2

Now all back online and even saw the array and folder structure momentarily but needed to connect another raid to allow me to get these files off. Now it's not mounting but does appear to allow me to rebuild. Ideally I want to back up my data before rebuild...

 


Alexander Snelling posted this 2 weeks ago

Current status. Volume mounted this morning but I had to power down to connect a drive to offload media. When restarted volume would not mount.

Rebuild does not seem possible now either. I am assuming the array needs to be deleted and recreated but am aware how dangerous this could be. Have started media patrol. Not going to touch anything until I hear back as I sense this is now quite close to a solution.

cliib> phydrv

===============================================================================

PdId Model        Type      Capacity  Location      OpStatus  ConfigStatus     

===============================================================================

1    ST2000DM001- SATA HDD  2TB       Encl1 Slot1   OK        Unconfigured     

2    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot2   Media Pat Array0 No.1      

3    ST2000DM008- SATA HDD  2TB       Encl1 Slot3   OK        Array0 No.2      

4    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot4   OK        Array0 No.3      

 

 

R P posted this 2 weeks ago

Hi Alexander,

If the array is degraded when you boot the Pegasus you will have to accept the array before it will come online.

The CLI command is...

array -a accept -d 0

I would suggest copying the files off and not worry about the rebuild for now.

R P posted this 2 weeks ago

Hi Alexander,

Just came in this morning and the array numbers had changed from last night (no3 was at the top)

This is not possible, the drives cannot move themselves.

Alexander Snelling posted this 2 weeks ago

Hi RP

Just came in this morning and the array numbers had changed from last night (no3 was at the top)

"This is not possible, the drives cannot move themselves."

 

I'm not suggesting the drives moved themselves. I intentionally physically swapped 1 and 4 around in order to get the right order (ie Array 0,1,2,3) as I thought that might be important - I suspect it isn't. Sorry, the time difference is making this doubly difficult but I so appreciate what you are doing - as I said in another post, I think (hope) I am nearly there, but don't want to speak too soon. Will focus on getting the media offloaded first using 

array -a accept -d 0

Alexander Snelling posted this 2 weeks ago

Now getting this:

cliib> phydrv

===============================================================================

PdId Model        Type      Capacity  Location      OpStatus  ConfigStatus     

===============================================================================

1    ST2000DM001- SATA HDD  2TB       Encl1 Slot1   OK        Unconfigured     

2    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot2   OK        Array0 No.1      

3    ST2000DM008- SATA HDD  2TB       Encl1 Slot3   OK        Array0 No.2      

4    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot4   Media Pat Array0 No.3      

 

cliib> array -a accept -d 0

Accepting this array can result in offline logical drives and lost data

The disk array does not have an incomplete condition to accept

 

R P posted this 2 weeks ago

Hi Alexander,

what are the outputs of 'logdrv' and 'array'?

Alexander Snelling posted this 2 weeks ago

cliib> array -a accept -d 0

Accepting this array can result in offline logical drives and lost data

The disk array does not have an incomplete condition to accept

 

cliib> logdrv

===============================================================================

LdId Alias       OpStatus      Capacity  Stripe RAID    CachePolicy     SYNCed

===============================================================================

0                Critical      6TB       1MB    RAID5   RAhead/WBack    Yes   

 

cliib> array

===============================================================================

DaId Alias     OpStatus      CfgCapacity FreeCapacity   MaxContiguousCapacity 

===============================================================================

0              OK            8TB         0Byte          0Byte           

 

 

R P posted this 2 weeks ago

 Hi Alexander,

The array is in an inconsistent state, the logical drive shows 'critical' but the array status is OK, it should be degraded. Because it's not degraded we can't accept it.

But it should also be visible to the Mac as the status is not 'offline'.

I'd suggest opening diskutil and if it sees the volume manually mount it.

 

 

Alexander Snelling posted this 2 weeks ago

"I'd suggest opening diskutil and if it sees the volume manually mount it."

 

Hi RP

Tried this - not sure exactly what command to use but have tried several; none have worked.

Now running Disk Drill to see if I can recover these files.

Alexander Snelling posted this 1 week ago

So Monday morning. Nothing over the weekend has worked. I've run Disk Drill Pro for 30 hours to try to recover files on a lost partition but most of them are corrupted or just won't play. Am now trying to recover the entire disk but am another 20 hours away from that and I don't have high hopes.

Going back to diskutil - I don't know the correct command to use - have tried the following:

Alexanders-iMac:~ alexandersnelling$ mount force -t /dev/disk3s2

usage: mount [-dfruvw] [-o options] [-t external_type] special mount_point

       mount [-adfruvw] [-t external_type]

       mount [-dfruvw] special | mount_point

Alexanders-iMac:~ alexandersnelling$ diskutil repairDisk /dev/disk3

Repairing the partition map might erase disk3s1, proceed? (y/N) diskutil repairDisk /dev/disk3s2

 

Wondering if I can force mount the partition without repairing it and risking erasing?

Otherwise I will have to risk repairing it. Any help much appreciated.

Alexander Snelling posted this 1 week ago

Alexanders-iMac:~ alexandersnelling$  diskutil repairDisk /dev/disk3

Repairing the partition map might erase disk3s1, proceed? (y/N) N

Repair canceled

Alexanders-iMac:~ alexandersnelling$  diskutil repairDisk /dev/disk3s2

A whole disk must be specified

Alexanders-iMac:~ alexandersnelling$  diskutil verifyDisk /dev/disk3

Started partition map verification on disk3

Checking prerequisites

Checking the partition list

Checking the partition map size

Checking for an EFI system partition

Checking the EFI system partition's size

Checking the EFI system partition's file system

Checking the EFI system partition's folder content

Checking all HFS data partition loader spaces

Checking booter partitions

Checking Core Storage Physical Volume partitions

The partition map appears to be OK

Finished partition map verification on disk3

Alexanders-iMac:~ alexandersnelling$ 

R P posted this 1 week ago

Hi Alexander,

Please don't use any filesystem repair tools, we need to get the array issue fixed first.

Due to the inconsistent state, we will need to try something.

First, kill the media patrol.

Please unplug the Pegasus thunderbolt cable and make sure the Pegasus is shut down, then unseat all the drives, just pull them out half an inch or so. We want to give the drives a few seconds to spin down before removing them.

Then plug the Pegasus TB cable back in and let the Pegasus boot.

Then hot-plug PD2. Wait till it comes up and shows as OK in phydrv.

Then hot-plug PD3. Wait till it comes up and shows as OK in phydrv.

Then hot-plug PD4. Wait till it comes up and shows as OK in phydrv.

Then run array, logdrv and phydrv and post their output here.

Alexander Snelling posted this 6 days ago

Hi RP

 

Thanks for the reply. I'm currently running Disk Drill on the entire partition to see if that will find the media - it's got 10 hours to go but I'm not confident as the results are looking very corrupted. Will try your suggestion when this is done. Thanks Alex

Alexander Snelling posted this 6 days ago

There you go:

===============================================================================

PdId Model        Type      Capacity  Location      OpStatus  ConfigStatus     

===============================================================================

2    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot2   OK        Array0 No.1      

3    ST2000DM008- SATA HDD  2TB       Encl1 Slot3   OK        Array0 No.2      

4    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot4   OK        Array0 No.3      

 

Lost Physical Drives

===============================================================================

PdId Model              PhyCapacity Location       OpStatus   ConfigStatus     

===============================================================================

1    ATA                0Byte       Unknown        Missing    Array0 No.0      

 

cliib> array

===============================================================================

DaId Alias     OpStatus      CfgCapacity FreeCapacity   MaxContiguousCapacity 

===============================================================================

0              Incomplete    8TB         0Byte          0Byte           

 

There are incomplete disk array(s), please use:

"array -a accept -d <DaId>" to accept the condition.

 

cliib> logdrv

===============================================================================

LdId Alias       OpStatus      Capacity  Stripe RAID    CachePolicy     SYNCed

===============================================================================

0                Offline       6TB       1MB    RAID5   RAhead/WBack    No    

 

cliib> phydrv

===============================================================================

PdId Model        Type      Capacity  Location      OpStatus  ConfigStatus     

===============================================================================

2    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot2   OK        Array0 No.1      

3    ST2000DM008- SATA HDD  2TB       Encl1 Slot3   OK        Array0 No.2      

4    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot4   OK        Array0 No.3      

 

Lost Physical Drives

===============================================================================

PdId Model              PhyCapacity Location       OpStatus   ConfigStatus     

===============================================================================

1    ATA                0Byte       Unknown        Missing    Array0 No.0      

 

cliib> 

Alexander Snelling posted this 6 days ago

By the way - Disk Drill results appear to have found a lot of the missing files but unfortunately as I thought they look pretty unusable. So I am approaching the stage where extreme measures might be the only option.

Alexander Snelling posted this 6 days ago

OK now got this - all looks good but it will not mount:

 

cliib> phydrv

===============================================================================

PdId Model        Type      Capacity  Location      OpStatus  ConfigStatus     

===============================================================================

1    ST2000DM001- SATA HDD  2TB       Encl1 Slot1   OK        Array0 No.0      

2    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot2   OK        Array0 No.1      

3    ST2000DM008- SATA HDD  2TB       Encl1 Slot3   OK        Array0 No.2      

4    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot4   OK        Array0 No.3      

 

cliib> array

===============================================================================

DaId Alias     OpStatus      CfgCapacity FreeCapacity   MaxContiguousCapacity 

===============================================================================

0              OK            8TB         0Byte          0Byte           

 

cliib> logdrv

===============================================================================

LdId Alias       OpStatus      Capacity  Stripe RAID    CachePolicy     SYNCed

===============================================================================

0                OK            6TB       1MB    RAID5   RAhead/WBack    Yes   

 

cliib> 

 

 

Have tried mount and force mount in diskutil to no avail.

Have not tried to repair as I suspect that might be the point of no return.

 

Did a 30 hour scan of the drive using DiskDrill Pro and it found a load of files with raw (original) file names (no good to me as I need to connect them to a video project) and that included 150TB of legacy files (on a 6TB drive). None of them would recover or if they did were corrupted so I don't think DiskDrill will help.

 

I have looked at the drive using EaseUS Drive Recovery and that appears to see the correctly named files but that is a paid service and I'm not sure this is going to work either so have stopped there for now.

 

R P posted this 5 days ago

Hi Alexander,

What did you do?

PD1 has stale data and should not be part of the array. You had it in the correct configuration, the proper procedure would have been to accept the incomplete array then rebuild PD1. Your data should have been online at this point.

The instructions were to leave PD1 out of the array.

I also mentioned earlier to not run any filesystem repair software until we had the array fixed.

I can't help you if you don't follow instructions.

Right now I would remove PD1 and possibly your data may be available. If any writes were done to the LD the parity data is now inconsistent and the data may be unrecoverable.
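The risk with a stale member can be shown on a single stripe. This is a sketch of the general RAID 5 principle only, not of the Pegasus firmware; the chunk values and the helper names xor_blocks and stripe_is_consistent are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_chunks, parity):
    """A stripe is consistent when the parity equals the XOR of its data."""
    return xor_blocks(data_chunks) == parity


# Stripe as written while all disks were present.
d0_old, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0_old, d1, d2])

# Disk 0 drops out and the array keeps running degraded. A later write to
# disk 0's chunk can only be recorded by updating the parity.
d0_new = b"ZZZZ"
parity = xor_blocks([d0_new, d1, d2])

# Degraded read: the current data is recovered from the survivors.
print(xor_blocks([d1, d2, parity]) == d0_new)  # True

# The old disk 0 comes back as if it were in sync: it still holds b"AAAA".
print(stripe_is_consistent([d0_old, d1, d2], parity))  # False
```

Once a stale chunk is treated as valid, reads return old data and any parity recalculated from it no longer matches what the other disks hold, which is the inconsistency described above.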

 

Alexander Snelling posted this 5 days ago

One of the problems here is the time difference.
If you go back three posts you'll see I followed exactly what you said, down to the last detail. I then ran phydrv, array and logdrv on the three drive array and posted those results here. They are here again copied from above:

 

There you go:

===============================================================================

PdId Model        Type      Capacity  Location      OpStatus  ConfigStatus     

===============================================================================

2    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot2   OK        Array0 No.1      

3    ST2000DM008- SATA HDD  2TB       Encl1 Slot3   OK        Array0 No.2      

4    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot4   OK        Array0 No.3      

 

Lost Physical Drives

===============================================================================

PdId Model              PhyCapacity Location       OpStatus   ConfigStatus     

===============================================================================

1    ATA                0Byte       Unknown        Missing    Array0 No.0      

 

cliib> array

===============================================================================

DaId Alias     OpStatus      CfgCapacity FreeCapacity   MaxContiguousCapacity 

===============================================================================

0              Incomplete    8TB         0Byte          0Byte           

 

There are incomplete disk array(s), please use:

"array -a accept -d <DaId>" to accept the condition.

 

cliib> logdrv

===============================================================================

LdId Alias       OpStatus      Capacity  Stripe RAID    CachePolicy     SYNCed

===============================================================================

0                Offline       6TB       1MB    RAID5   RAhead/WBack    No    

 

cliib> phydrv

===============================================================================

PdId Model        Type      Capacity  Location      OpStatus  ConfigStatus     

===============================================================================

2    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot2   OK        Array0 No.1      

3    ST2000DM008- SATA HDD  2TB       Encl1 Slot3   OK        Array0 No.2      

4    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot4   OK        Array0 No.3      

 

Lost Physical Drives

===============================================================================

PdId Model              PhyCapacity Location       OpStatus   ConfigStatus     

===============================================================================

1    ATA                0Byte       Unknown        Missing    Array0 No.0      


It was not possible to view, mount, forcemount or anything else on that array. It was simply showing as disk3s2.

I then replaced Drive 1 and did what you just described - I rebuilt the array from the 3 good drives onto drive 1.

That is now giving the most recent result which is again here:

1    ST2000DM001- SATA HDD  2TB       Encl1 Slot1   OK        Array0 No.0      

2    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot2   OK        Array0 No.1      

3    ST2000DM008- SATA HDD  2TB       Encl1 Slot3   OK        Array0 No.2      

4    TOSHIBA DT01 SATA HDD  2TB       Encl1 Slot4   OK        Array0 No.3      

 

cliib> array

===============================================================================

DaId Alias     OpStatus      CfgCapacity FreeCapacity   MaxContiguousCapacity 

===============================================================================

0              OK            8TB         0Byte          0Byte           

 

cliib> logdrv

===============================================================================

LdId Alias       OpStatus      Capacity  Stripe RAID    CachePolicy     SYNCed

===============================================================================

0                OK            6TB       1MB    RAID5   RAhead/WBack    Yes   

 

cliib> 

 

I am at a loss. The only thing I have not done is repair the array using promiseutil as that is warning me that data may be lost. However I suspect that may be the only course of action (as the array clearly needs fixing whether it is 3 or 4 drives) but I was waiting for your response and advice. The data is still there.

 

 

  

R P posted this 5 days ago

Hi Alexander,

It was not possible to view, mount, forcemount or anything else on that array. It was simply showing as disk3s2.

This was where we would have accepted the incomplete array.

cliib> array

===============================================================================

DaId Alias     OpStatus      CfgCapacity FreeCapacity   MaxContiguousCapacity 

===============================================================================

0              Incomplete    8TB         0Byte          0Byte           

 

There are incomplete disk array(s), please use:

"array -a accept -d <DaId>" to accept the condition.

 

The LD would then be exposed to the computer's OS and hopefully at this point the data would be visible.

I then replaced Drive 1 and did what you just described - I rebuilt the array from the 3 good drives onto drive 1.

This step would normally be done only after we could see the data.

But the array is back online and now the array and LD are consistent. This is the point where, if necessary, one would use recovery software. Disk Utility is a good place to start; if that does not work Disk Warrior is recommended.

Alexander Snelling posted this 5 days ago

Hi RP

 

Thanks for your patience. I've tried DU both front end and in CLI. What is odd is the array is OK and seemingly mounted but I cannot see it. DU cannot mount it, whether it is mount or mountDisk or even readOnly. I don't really know the right commands or what else to try.

 

Have just tried the following:

Alexanders-iMac:~ alexandersnelling$ diskutil repairVolume disk3s2

Started file system repair on disk3s2

Repairing file system

Volume is already unmounted

Performing fsck_hfs -fy -x /dev/rdisk3s2

File system check exit code is 8

Restoring the original state found as unmounted

Error: -69845: File system verify or repair failed

Underlying error: 8

Alexanders-iMac:~ alexandersnelling$ phydrv

-bash: phydrv: command not found

Alexanders-iMac:~ alexandersnelling$ diskutil repairDisk disk3

Repairing the partition map might erase disk3s1, proceed? (y/N) y

Started partition map repair on disk3

Checking prerequisites

Checking the partition list

Adjusting partition map to fit whole disk as required

Checking for an EFI system partition

Checking the EFI system partition's size

Checking the EFI system partition's file system

Checking the EFI system partition's folder content

Checking all HFS data partition loader spaces

Checking booter partitions

Reviewing boot support loaders

Checking Core Storage Physical Volume partitions

The partition map appears to be OK

Finished partition map repair on disk3

Alexanders-iMac:~ alexandersnelling$ 

Alexander Snelling posted this 5 days ago

This seems to be relevant

 

Alexanders-iMac:~ alexandersnelling$ diskutil repairVolume disk3s2

Started file system repair on disk3s2

Repairing file system

Volume is already unmounted

Performing fsck_hfs -fy -x /dev/rdisk3s2

File system check exit code is 8

Restoring the original state found as unmounted

Error: -69845: File system verify or repair failed

Underlying error: 8

Alexanders-iMac:~ alexandersnelling$ 

 

Alexander Snelling posted this 4 days ago

"This step would normally be done only after we could see the data."

I could see the data. Now I can't.

I've been using Promise for nearly ten years now and have lost count of the number of people I have recommended it to but have just lost ALL confidence in the safety of what I thought was a failsafe system.

Array just disappeared. Logical Drive gone. Data is still there but no way of getting it back. Not here anyway...

R P posted this 3 days ago

Hi Alexander,

The thing is, you keep doing things. Every time you post, things are different from the post before. I have no idea what you are doing or why. In a recovery situation, things are done slowly and carefully, and the situation is verified between steps. That has not been possible here. You are of course free to act on your own as you have, but if you are unfamiliar with RAID and the CLI, this is not a suggested course of action.

Promise arrays are very robust; they don't just disappear.

That being the case, there is one last possibility: recreate the array and LUN. Assuming that the drives have not been shuffled again and that the disks are all unconfigured, the CLI commands to do this would be...

cliib> array -a add -p 1,2,3,4 -l "raid=5,forcesynchronized=yes"

cliib> phydrv -a offline -p 1

This will create an LD that won't sync. This is very important: if things are not correct, a sync will overwrite what is there.

As the status of PD1 is unclear, it seems safest to offline it after creating the array. This will put the LUN in critical condition, but that only means that there is no redundancy; it will still be online. A rebuild should wait until after things are corrected.
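
As a rough illustration of why the sync matters, here is a sketch of generic RAID 5 parity arithmetic. It is not Promise's firmware logic: the block contents and variable names are made up, and a real array rotates parity across the drives and works on much larger stripes.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# One hypothetical stripe on a 4-drive RAID 5: three data blocks plus parity.
d1, d2, d3 = b"FILM", b"REEL", b"CUTS"
parity = xor_blocks([d1, d2, d3])

# Degraded (critical) mode: the drive holding d1 is offline. Its block is
# recomputed from the three surviving members, so the LUN stays readable.
rebuilt_d1 = xor_blocks([d2, d3, parity])

# A sync that runs while one member holds stale or misplaced content
# recomputes parity from the wrong inputs and writes it over the old parity.
# After that, the good d1 can no longer be reconstructed from the others.
stale_d1 = b"\x00\x00\x00\x00"
resynced_parity = xor_blocks([stale_d1, d2, d3])
lost_d1 = xor_blocks([d2, d3, resynced_parity])

print(rebuilt_d1 == d1)  # True: degraded read recovers the original block
print(lost_d1 == d1)     # False: parity was overwritten from bad inputs
```

This is the reasoning behind creating the LD with the sync suppressed and leaving the rebuild for later: nothing is written to the members until the layout has been confirmed.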

The Mac should now be able to see the LUN. If it sees the HFS volume it will mount it; if not, the volume needs to be repaired.

If diskutil repairVolume fails, it may be necessary to run fsck from safe mode; there are many web pages showing how to do this. If that does not work, Disk Warrior is recommended.
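
For context, the "exit code is 8" in the diskutil output earlier in the thread comes from fsck_hfs itself. As far as the fsck_hfs man page describes it, status 8 means a corrupt filesystem was found or repairs did not succeed. A tiny lookup sketch follows, listing only the two statuses relevant here; the dictionary and function names are hypothetical, and other statuses exist.

```python
# Exit statuses as described in the fsck_hfs man page. Only the two that
# matter for this thread are listed; consult the man page for the rest.
FSCK_HFS_EXIT = {
    0: "no errors found, or the volume was repaired successfully",
    8: "a corrupt filesystem was found, or repairs did not succeed",
}


def describe_fsck_exit(code):
    """Return a short description for an fsck_hfs exit status."""
    return FSCK_HFS_EXIT.get(code, "see `man fsck_hfs` for this status")


print(describe_fsck_exit(8))
```

So "Underlying error: 8" says that the check found damage it could not fix, not that the disk is unreadable.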

Alexander Snelling posted this 3 days ago

Hi RP

I truly thank you for your time and patience. As I said, the time difference has played havoc with things and I am also impatient, perhaps my worst failing. I assume you are on the West Coast and I am in London, so we never really got a thing going. I did do a few things on my own, yes, and also with help from some guys at Promise in Europe, but none of these were destructive. I understand enough to know that; in fact, I think I guessed what your next step would be at least once.

I have tried Disk Warrior and it thinks it has repaired the array but many of my files are missing, in fact some of the most critical so it is not looking good. I am not sure if your above suggestion would pull anything back that Disk Warrior cannot see or not, or if this would be a retrograde step. To be honest, I am ready to restripe all the drives and be done with it so nothing can really do too much harm that hasn't already been done so I will give it a go and call it a day if that fails.

 

Again, you are clearly both patient and at the top of your game, and I thank you for both. I'll let you know what happens...

 

R P posted this 3 days ago

Hi Alexander,

I was under the impression that the array and LUN were lost when I read this...

Array just disappeared. Logical Drive gone. Data is still there but no way of getting it back. Not here anyway...

You don't want to recreate the array unless there is no choice, as this is a risky move.

At any rate, it looks like things are progressing and you have a game plan going forward. Two things to consider:

1. Those 1TB drives probably date from 2012 or so, they are getting old. Perhaps replacing these drives with something newer would be a good idea? Perhaps some new 2TB drives would be a better bet?

2. RAID is not a backup solution; you should also have backups of important files. A simple solution is to buy a USB drive and copy your files to that. Costco has bus-powered 2TB and 5TB drives and a wall-powered 8TB drive, so there should be something in the range you need to fit your backup needs.

 

Alexander Snelling posted this 3 days ago

Thanks for the advice.  The very last thing I need right now is a lecture on backing up. Everything I had was backed up at least twice apart from the files I have lost. Sometimes this happens. Sometimes you simply do not have the room to back up a 6TB drive. 

All drives are 2TB or actually 4TB in one of my arrays - actually replaced twice - the drives in the main array we have been discussing were a few years old at most. The only 1TB drives are SSDs on a drive that I use not for back-up just for fast playback and access.

As I said I am scrupulous about backups but when you're making a documentary in the middle of a pandemic sometimes it's simply not that easy. I am afraid that Promise hardware for whatever reason has let me down and on diving deeper I have actually lost footage that is irretrievable. Whatever the reason my confidence is shattered and I'm not sure I'll be going back to these drives at all for anything critical.

There was no warning from my Pegasus that things were about to die - to have three drives go down at once is unthinkable; it's great that the data is striped across four drives with redundancy but if the actual redundancy is not protected, then that is a huge failure - one that even a 6 drive array wouldn't have been able to counter. Something happened - could be Mac OS 11 related who knows but it killed my array in one go. All I care about right now is my film and some of that appears to be gone. 
