Mac Pro periodically hangs with Pegasus2 R4 RAID

  • Last Post 14 May 2020
Michael Lupo posted this 21 April 2020

Hello,

I am writing to express some frustration with a recent problem involving the RAID attached to our Mac Pro (Trash Can).

Periodically, the RAID just hangs. You cannot open a Finder window; when you try, you get the spinning beach ball. The only way to recover when that happens is to unplug the RAID chassis and then reboot the Mac. This is extra unfortunate because it's running in a lab about an hour from my home. With the Covid-19 situation, we are forced to work from home, but on these bad RAID days I have to drive to the office to physically reset it.

This problem started immediately after updating the Mac to Catalina and updating the RAID chassis' firmware. Before that, the RAID array had been running flawlessly on this machine for about 3 years, so I suspect the firmware/Catalina update. I don't see any glaring disk errors (like a physical disk failing) in the logs.

Has anyone else experienced this?

Here is a link to my subsystem information

R P posted this 27 April 2020

Hi Michael,

There are some disk errors showing in the service report.

NVRAM|  531 Mar 26, 2020 10:14:32   Minor       PD  3 Command times out on physical disk                           Y3MSPALKS
NVRAM|  532 Mar 26, 2020 10:14:32    Info       PD  3 Physical Disk has been reset                                 Y3MSPALKS
NVRAM|  533 Mar 26, 2020 22:18:55   Minor       PD  1 Command times out on physical disk                           Y3MSPBHKS
NVRAM|  534 Mar 26, 2020 22:18:55    Info       PD  1 Physical Disk has been reset                                 Y3MSPBHKS
NVRAM|  535 Mar 28, 2020 07:33:59   Minor       PD  3 Command times out on physical disk                           Y3MSPALKS
NVRAM|  536 Mar 28, 2020 07:33:59    Info       PD  3 Physical Disk has been reset                                 Y3MSPALKS
NVRAM|  537 Mar 28, 2020 14:13:34   Minor       PD  1 Command times out on physical disk                           Y3MSPBHKS
NVRAM|  538 Mar 28, 2020 14:13:34    Info       PD  1 Physical Disk has been reset                                 Y3MSPBHKS
NVRAM|  539 Apr 06, 2020 08:29:12 Warning     Ctrl  1 Last shutdown is abnormal
NVRAM|  540 Apr 06, 2020 08:29:20    Info     Ctrl  1 The system is started
NVRAM|  541 Apr 06, 2020 15:14:36   Minor       PD  2 Command times out on physical disk                           Y3MS7B2KS
NVRAM|  542 Apr 06, 2020 15:14:36    Info       PD  2 Physical Disk has been reset                                 Y3MS7B2KS
NVRAM|  543 Apr 08, 2020 22:17:57   Minor       PD  1 Command times out on physical disk                           Y3MSPBHKS
NVRAM|  544 Apr 08, 2020 22:17:58    Info       PD  1 Physical Disk has been reset                                 Y3MSPBHKS
NVRAM|  545 Apr 10, 2020 06:08:52   Minor       PD  2 Command times out on physical disk                           Y3MS7B2KS
NVRAM|  546 Apr 10, 2020 06:08:52    Info       PD  2 Physical Disk has been reset                                 Y3MS7B2KS
NVRAM|  547 Apr 10, 2020 16:53:39   Minor       PD  3 Command times out on physical disk                           Y3MSPALKS
NVRAM|  548 Apr 10, 2020 16:53:39    Info       PD  3 Physical Disk has been reset                                 Y3MSPALKS
NVRAM|  549 Apr 20, 2020 08:37:38 Warning     Ctrl  1 Last shutdown is abnormal
NVRAM|  550 Apr 20, 2020 08:37:46    Info     Ctrl  1 The system is started
NVRAM|  551 Apr 20, 2020 22:14:39   Minor       PD  3 Command times out on physical disk                           Y3MSPALKS
NVRAM|  552 Apr 20, 2020 22:14:39    Info       PD  3 Physical Disk has been reset                                 Y3MSPALKS

It looks like command timeouts are seen on PD1, PD2 and PD3. PD4 has not reported any errors.

And you can see the abnormal shutdowns after multiple disk timeouts.

This looks like a disk problem.
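For anyone who wants to check their own service report the same way, here is a minimal sketch that tallies "Command times out" events per physical disk and lists abnormal shutdowns. It assumes the event lines look like the excerpt above; the regular expression, the `summarize` helper, and the `SAMPLE` subset are illustrative only and are not part of any Promise tool.

```python
import re
from collections import Counter

# Pattern for event lines as they appear in the excerpt above.
# Assumption: the full service report uses the same column layout.
EVENT_RE = re.compile(
    r"^NVRAM\|\s+(?P<seq>\d+)\s+"
    r"(?P<when>\w{3} \d{2}, \d{4} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<severity>\w+)\s+"
    r"(?P<device>PD|Ctrl)\s+(?P<id>\d+)\s+"
    r"(?P<message>.+?)(?:\s{2,}(?P<serial>\S+))?\s*$"
)

def summarize(lines):
    """Return (timeouts per disk, timestamps of abnormal shutdowns)."""
    timeouts = Counter()
    abnormal = []
    for line in lines:
        m = EVENT_RE.match(line)
        if m is None:
            continue  # skip anything that is not an event line
        if m["device"] == "PD" and "Command times out" in m["message"]:
            timeouts[f"PD{m['id']}"] += 1
        elif m["device"] == "Ctrl" and "Last shutdown is abnormal" in m["message"]:
            abnormal.append(m["when"])
    return timeouts, abnormal

# A few lines copied from the excerpt above, used only as sample input.
SAMPLE = """\
NVRAM|  531 Mar 26, 2020 10:14:32   Minor       PD  3 Command times out on physical disk                           Y3MSPALKS
NVRAM|  533 Mar 26, 2020 22:18:55   Minor       PD  1 Command times out on physical disk                           Y3MSPBHKS
NVRAM|  539 Apr 06, 2020 08:29:12 Warning     Ctrl  1 Last shutdown is abnormal
NVRAM|  541 Apr 06, 2020 15:14:36   Minor       PD  2 Command times out on physical disk                           Y3MS7B2KS
NVRAM|  551 Apr 20, 2020 22:14:39   Minor       PD  3 Command times out on physical disk                           Y3MSPALKS
"""

timeouts, abnormal = summarize(SAMPLE.splitlines())
for disk, count in sorted(timeouts.items()):
    print(disk, count)
print("abnormal shutdowns:", abnormal)
```

Counting by hand over the full excerpt above gives 4 timeouts on PD3, 3 on PD1 and 2 on PD2. A tally like this only shows where the timeouts are reported; it does not by itself tell you whether the disks or the controller are at fault.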

Francesco Rizzo posted this 14 May 2020

FYI, I have been having the same set of problems since going to Catalina on my Mac Pro trashcan, with Promise Utility v 4.04.0000.40 (C6) and P2 R8 firmware 5.04.0000.64. The OS seems to hang on filesystem access to the array and never comes back until I power-cycle the array. The TB connection is active, and downstream array devices are still mounted and functional. The issue definitely seems to be on the array controller side of things rather than the TB interconnects or disk devices. I am seeing PD resets across all 8 drives periodically. Some of those drives were replaced with new ones in the last 16 months, and the PD command timeouts and disk resets continued shortly afterward. It seems like a chassis/controller issue rather than an actual drive issue. The removed drives passed vendor diagnostics and separate r/w stress testing just fine.

Also starting to see "PSU 3.3v power is out of the threshold range" errors...  Been running on clean power behind UPS 24x7 for many years.

Michael Lupo posted this 14 May 2020

Update from my original post: Since @RP replied and pointed out disk errors, I immediately ordered new drives and systematically replaced each and every drive in my RAID chassis, one at a time. I thought, ok, this most likely fixes the problem...but NOPE. The chassis did that hanging thing again about 10 days later. I had to drive into the office, despite our building being closed by the government, and physically reset the RAID/Mac.

Promise, this is a problem that you need to resolve.

 
