Clicking sound > Device Status: Great > Drive failure

  • Last Post 16 September 2023
S A posted this 11 September 2023

My Pegasus3 R4 is making a clicking sound. There's clearly something wrong with one of the drives.

 

How can I determine which drive to swap? Promise Utility Pro says "Device Status: Great", so it's no help with diagnostics.

Here's a quick video:
https://www.dropbox.com/scl/fi/h96o7pyo94v4guou9f5ob/PROMISE-RAID-PROBLEM-IMG_0067-16LUFS.mp4?rlkey=za9hfpdzkiz1orqjipihc39x0&dl=0

 

Note: there's some blinking in the top drive's light in the video. Last time the clicking sound happened, the blinking was on the 2nd drive so the blinking does not seem to indicate which drive is starting to fail, but something else (reading/writing perhaps?).

 

S A posted this 12 September 2023

Also: is it even possible to swap individual drives if the Promise Utility Pro does not recognize any flaws?

R P posted this 12 September 2023

Hi S A,

I would check the SMART status of the drives. The command from the CLI is...

smart

Most likely one drive is showing many errors.

If a drive is clicking you can be sure it will fail eventually, probably soon, so this question will answer itself.

You can rearrange the drives if you think that will help debug, but be sure to power the Pegasus off before unplugging a drive.

S A posted this 13 September 2023

Thanks!


Sure enough, the drive failed last night. Going to try and swap the drive now.

 

About rearranging the drives: is it really possible to rearrange the drive slots in a RAID 5 setup for debugging purposes? That's interesting. 

 

And: is it possible to swap a drive pre-emptively, before it fails?

S A posted this 13 September 2023

On to the next question:

 

I replaced the faulty TOSHIBA DT01ACA3 3TB with a new SEAGATE BARRACUDA ST3000DM001 3TB. The new drive is not recognized by Utility Pro.

 

How do I proceed from here?


R P posted this 13 September 2023

Hi S A,

Can you open the CLI, run these commands, and paste the results in a reply?

phydrv

phydrv -v

To open the CLI, open a terminal, enter promiseutil and hit the enter key.

S A posted this 14 September 2023

Hi!

 

For some reason the new drive showed up last night! Might have something to do with the computer going to sleep and then waking up?

 

Not fully working, though.

• The new drive shows up in the array as a darker shade of blue than the original drives.
• The enclosure is beeping 2 beeps every few seconds.
• There's an orange "Warning" status next to the device name.
• There's a "Disk Array 0 is Degraded" warning visible in the Critical Events.




Here's the event listing.

 

I also tried running promiseutil in Terminal, but it did not run:

S A posted this 14 September 2023

I did a quick search of the manual and it says this about the beeps: "When the disk array is rebuilding and the alert sound is enabled, the Pegasus unit emits two quick beeps every five seconds. The beeps stop when the rebuild is done." So it's a good kind of beeping, then?

 

If this is the case, it'd be great if the Utility dashboard would somehow recognize the rebuild with a visible prompt like "Rebuilding Disk Array 0" or similar. The "Critical" and "Degraded" statuses are pretty scary. :-D
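
For context on why a "Degraded" array keeps serving data and what a rebuild has to do: in RAID 5 (the level mentioned earlier in this thread), each stripe carries a parity block that is the XOR of its data blocks, so any single missing block can be recomputed from the survivors. The Python sketch below is a toy model only. The byte strings stand in for drive blocks, parity rotation across drives is left out, and nothing here describes the Pegasus firmware itself.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Toy stripe on a 4-drive array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# Simulate losing the drive in slot 2 (index 1), as happened in this thread.
lost_index = 1
survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]

# Degraded reads and the rebuild both recompute the missing block this way.
rebuilt = xor_blocks(survivors)
assert rebuilt == stripe[lost_index]
print(rebuilt)
```

Until the rebuild finishes there is no redundancy left, which is presumably why the utility labels the state "Critical" even though the data is still readable.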

R P posted this 14 September 2023

Hi S A,

From the zsh error, you must be running an M1 Mac. Please delete the Promise Utility currently installed, install this version, and then try promiseutil again.

S A posted this 15 September 2023

 

cliib> phydrv
===============================================================================
PdId Model        Type      Capacity  Location      OpStatus  ConfigStatus
===============================================================================
1    TOSHIBA DT01 SATA HDD  3TB       Encl1 Slot1   OK        Array0 No.0
2    ST3000DM001- SATA HDD  3TB       Encl1 Slot2   OK        PassThru
3    TOSHIBA DT01 SATA HDD  3TB       Encl1 Slot3   OK        Array0 No.2
4    TOSHIBA DT01 SATA HDD  3TB       Encl1 Slot4   OK        Array0 No.3

 

 

 

S A posted this 15 September 2023

cliib> phydrv -v

-------------------------------------------------------------------------------
PdId: 1
OperationalStatus: OK
Alias:
PhysicalCapacity: 3TB                  ConfigurableCapacity: 3TB
UsedCapacity: 3TB                      LogicalBlockSize: 512Bytes
ConfigStatus: Array0 No.0              Location: Encl1 Slot1
ModelNo: TOSHIBA DT01ACA3
SerialNo: 68JUM8NAS                    FirmwareVersion: MX6OABB0
DriveInterface: SATA 6Gb/s             Protocol: ATA/ATAPI-8
WriteCacheSupport: Yes                 WriteCache: Enabled
RLACacheSupport: Yes                   RLACache: Enabled
SMARTFeatureSetSupport: Yes
SMARTSelfTestSetSupport: Yes           SMARTErrorLoggingSupport: Yes
CmdQueuingSupport: NCQ                 CmdQueuing: Enabled
CmdQueueDepth: 32                      MediumErrorThreshold: 64
Errors: 0                              NonRWErrors: 0
ReadErrors: 0                          WriteErrors: 0
PowerSavingStatus: Full Power          TemperaturePollingInterval: 3 minutes
DriveTemperature: 38C/100F             ReferenceDriveTemperature: N/A
Flags: N/A                             LastUnconfiguredFragement: N/A
PhysicalSectorSize: 4KB

 

-------------------------------------------------------------------------------
PdId: 2
OperationalStatus: OK
Alias:
PhysicalCapacity: 3TB                  ConfigurableCapacity: 3TB
UsedCapacity: 0Byte                    LogicalBlockSize: 512Bytes
ConfigStatus: PassThru                 Location: Encl1 Slot2
ModelNo: ST3000DM001-1ER1
SerialNo: ZA5004S0                     FirmwareVersion: CC25
DriveInterface: SATA 6Gb/s             Protocol: ATA/ATAPI-9
WriteCacheSupport: Yes                 WriteCache: Enabled
RLACacheSupport: Yes                   RLACache: Enabled
SMARTFeatureSetSupport: Yes
SMARTSelfTestSetSupport: Yes           SMARTErrorLoggingSupport: Yes
CmdQueuingSupport: NCQ                 CmdQueuing: Enabled
CmdQueueDepth: 32                      MediumErrorThreshold: 64
Errors: 0                              NonRWErrors: 0
ReadErrors: 0                          WriteErrors: 0
PowerSavingStatus: Full Power          TemperaturePollingInterval: 3 minutes
DriveTemperature: 34C/93F              ReferenceDriveTemperature: N/A
Flags: N/A                             LastUnconfiguredFragement: N/A
PhysicalSectorSize: 4KB

 

-------------------------------------------------------------------------------
PdId: 3
OperationalStatus: OK
Alias:
PhysicalCapacity: 3TB                  ConfigurableCapacity: 3TB
UsedCapacity: 3TB                      LogicalBlockSize: 512Bytes
ConfigStatus: Array0 No.2              Location: Encl1 Slot3
ModelNo: TOSHIBA DT01ACA3
SerialNo: 68JUM8JAS                    FirmwareVersion: MX6OABB0
DriveInterface: SATA 6Gb/s             Protocol: ATA/ATAPI-8
WriteCacheSupport: Yes                 WriteCache: Enabled
RLACacheSupport: Yes                   RLACache: Enabled
SMARTFeatureSetSupport: Yes
SMARTSelfTestSetSupport: Yes           SMARTErrorLoggingSupport: Yes
CmdQueuingSupport: NCQ                 CmdQueuing: Enabled
CmdQueueDepth: 32                      MediumErrorThreshold: 64
Errors: 0                              NonRWErrors: 0
ReadErrors: 0                          WriteErrors: 0
PowerSavingStatus: Full Power          TemperaturePollingInterval: 3 minutes
DriveTemperature: 37C/98F              ReferenceDriveTemperature: N/A
Flags: N/A                             LastUnconfiguredFragement: N/A
PhysicalSectorSize: 4KB

 

-------------------------------------------------------------------------------
PdId: 4
OperationalStatus: OK
Alias:
PhysicalCapacity: 3TB                  ConfigurableCapacity: 3TB
UsedCapacity: 3TB                      LogicalBlockSize: 512Bytes
ConfigStatus: Array0 No.3              Location: Encl1 Slot4
ModelNo: TOSHIBA DT01ACA3
SerialNo: 68JUM8YAS                    FirmwareVersion: MX6OABB0
DriveInterface: SATA 6Gb/s             Protocol: ATA/ATAPI-8
WriteCacheSupport: Yes                 WriteCache: Enabled
RLACacheSupport: Yes                   RLACache: Enabled
SMARTFeatureSetSupport: Yes
SMARTSelfTestSetSupport: Yes           SMARTErrorLoggingSupport: Yes
CmdQueuingSupport: NCQ                 CmdQueuing: Enabled
CmdQueueDepth: 32                      MediumErrorThreshold: 64
Errors: 0                              NonRWErrors: 0
ReadErrors: 0                          WriteErrors: 0
PowerSavingStatus: Full Power          TemperaturePollingInterval: 3 minutes
DriveTemperature: 37C/98F              ReferenceDriveTemperature: N/A
Flags: N/A                             LastUnconfiguredFragement: N/A
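
A listing like this is easier to scan once it is reduced to one row per drive. The following Python sketch assumes the layout seen in this thread: a dashed line between drives and one or two "Key: value" fields per line, separated by at least two spaces. The function name parse_phydrv_verbose and the shortened SAMPLE text are made up for illustration; the script only reads pasted text and does not talk to promiseutil.

```python
import re

# Shortened excerpt in the same layout as the output in this thread.
SAMPLE = """\
-------------------------------------------------------------------------------
PdId: 1
OperationalStatus: OK
ConfigStatus: Array0 No.0              Location: Encl1 Slot1
DriveTemperature: 38C/100F             ReferenceDriveTemperature: N/A
-------------------------------------------------------------------------------
PdId: 2
OperationalStatus: OK
ConfigStatus: PassThru                 Location: Encl1 Slot2
DriveTemperature: 34C/93F              ReferenceDriveTemperature: N/A
"""

def parse_phydrv_verbose(text):
    """Return one dict per drive, keyed by the field names in the output."""
    drives = []
    for line in text.splitlines():
        line = line.strip()
        if not line or set(line) == {"-"}:
            continue  # skip blank lines and the dashed separators
        # A line holds one or two "Key: value" fields, separated by 2+ spaces.
        for field in re.split(r"\s{2,}", line):
            key, _, value = field.partition(":")
            if key == "PdId":
                drives.append({})  # a PdId field starts a new drive record
            if drives:
                drives[-1][key.strip()] = value.strip()
    return drives

for d in parse_phydrv_verbose(SAMPLE):
    print(d["PdId"], d["Location"], d["OperationalStatus"], d["ConfigStatus"])
```

With the full output pasted in place of SAMPLE, the ConfigStatus column is the one to watch: a drive that is not listed as a member of Array0 is not taking part in the array.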

S A posted this 15 September 2023

And here's the first one (smart):

 

cliib> smart
-------------------------------------------------------------------------------
PdId: 1
Model Number: TOSHIBA DT01ACA3
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
SCT Support Level:                   1
Device State:                        SMART Off-line Data Collection executing in background (4)
Current Temperature:                    38 Celsius
Power Cycle Min/Max Temperature:     35/38 Celsius
Lifetime    Min/Max Temperature:     19/42 Celsius
Under/Over Temperature Limit Count:   0/0
Self-test execution status:      (   0) The previous self-test routine
                                        completed without error or no self-test
                                        has ever been run.
Error logging capability:        (0x01) Error logging supported.
Short self-test routine
recommended polling time:  (   1) minutes.
Extended self-test routine
recommended polling time:  ( 255) minutes.
SCT capabilities:        (0x003d) SCT Status supported.
                                  SCT Feature Control supported.
                                  SCT Data Table supported.

SMART Self-test log structure revision number: 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Error Log Version: 1
No Errors Logged

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME
    FLAG    VALUE WORST THRESH TYPE      UPDATED    WHEN_FAILED  RAW_VALUE
==============================================================================
  1 Raw_Read_Error_Rate
    0x000b  100   100   016    Pre-fail  Always     -            0
  2 Throughput_Performance
    0x0005  140   140   054    Pre-fail  Offline    -            67
  3 Spin_Up_Time
    0x0007  136   136   024    Pre-fail  Always     -            430 (Average 413)
  4 Start_Stop_Count
    0x0012  100   100   000    Old_age   Always     -            3894
  5 Reallocated_Sector_Ct
    0x0033  100   100   005    Pre-fail  Always     -            0
  7 Seek_Error_Rate
    0x000b  100   100   067    Pre-fail  Always     -            0
  8 Seek_Time_Performance
    0x0005  124   124   020    Pre-fail  Offline    -            33
  9 Power_On_Hours
    0x0012  097   097   000    Old_age   Always     -            27719
 10 Spin_Retry_Count
    0x0013  100   100   060    Pre-fail  Always     -            0
 12 Power_Cycle_Count
    0x0032  100   100   000    Old_age   Always     -            3893
192 Power-Off_Retract_Count
    0x0032  097   097   000    Old_age   Always     -            3896
193 Load_Cycle_Count
    0x0012  097   097   000    Old_age   Always     -            3896
194 Temperature_Celsius
    0x0002  157   157   000    Old_age   Always     -            38 (Lifetime Min/Max 19/42)
196 Reallocated_Event_Count
    0x0032  100   100   000    Old_age   Always     -            0
197 Current_Pending_Sector
    0x0022  100   100   000    Old_age   Always     -            0
198 Offline_Uncorrectable
    0x0008  100   100   000    Old_age   Offline    -            0
199 UDMA_CRC_Error_Count
    0x000a  200   200   000    Old_age   Always     -            0

 

-------------------------------------------------------------------------------
PdId: 2
Model Number: ST3000DM001-1ER1
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version:                  3
SCT Version (vendor specific):       522 (0x020a)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    34 Celsius
Power Cycle Min/Max Temperature:     32/34 Celsius
Lifetime    Min/Max Temperature:     22/34 Celsius
Under/Over Temperature Limit Count:   0/0
Self-test execution status:      (   0) The previous self-test routine
                                        completed without error or no self-test
                                        has ever been run.
Error logging capability:        (0x01) Error logging supported.
Short self-test routine
recommended polling time:  (   1) minutes.
Extended self-test routine
recommended polling time:  ( 255) minutes.
SCT capabilities:        (0x1085) SCT Status supported.

SMART Self-test log structure revision number: 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Error Log Version: 1
No Errors Logged

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME
    FLAG    VALUE WORST THRESH TYPE      UPDATED    WHEN_FAILED  RAW_VALUE
==============================================================================
  1 Raw_Read_Error_Rate
    0x000f  100   100   006    Pre-fail  Always     -            0
  3 Spin_Up_Time
    0x0003  094   094   000    Pre-fail  Always     -            0
  4 Start_Stop_Count
    0x0032  100   100   020    Old_age   Always     -            10
  5 Reallocated_Sector_Ct
    0x0033  100   100   010    Pre-fail  Always     -            0
  7 Seek_Error_Rate
    0x000f  100   253   030    Pre-fail  Always     -            0
  9 Power_On_Hours
    0x0032  100   100   000    Old_age   Always     -            49
 10 Spin_Retry_Count
    0x0013  100   100   097    Pre-fail  Always     -            0
 12 Power_Cycle_Count
    0x0032  100   100   020    Old_age   Always     -            10
183 Runtime_Bad_Count(total)
    0x0032  100   100   000    Old_age   Always     -            0
184 End_to_End_Error_Detection_Count
    0x0032  100   100   099    Old_age   Always     -            0
187 Uncorrectable_Error_Count
    0x0032  100   100   000    Old_age   Always     -            0
188 Unknown_Attribute
    0x0032  100   100   000    Old_age   Always     -            0
189 High_Fly_Writes
    0x003a  100   100   000    Old_age   Always     -            0
190 Airflow_Temperature_Cel
    0x0022  066   065   045    Old_age   Always     -            34 (Lifetime Min/Max 32/34)
191 G-Sense_Error_Rate
    0x0032  100   100   000    Old_age   Always     -            0
192 Power-Off_Retract_Count
    0x0032  100   100   000    Old_age   Always     -            10
193 Load_Cycle_Count
    0x0032  100   100   000    Old_age   Always     -            10
194 Temperature_Celsius
    0x0022  034   040   000    Old_age   Always     -            34 (0 23 0 0)
197 Current_Pending_Sector
    0x0012  100   100   000    Old_age   Always     -            0
198 Offline_Uncorrectable
    0x0010  100   100   000    Old_age   Offline    -            0
199 UDMA_CRC_Error_Count
    0x003e  200   253   000    Old_age   Always     -            0
240 Head_Flying_Hours
    0x0000  100   253   000    Old_age   Offline    -            46
241 Total_LBAs_Written
    0x0000  100   253   000    Old_age   Offline    -            0
242 Total_LBAs_Read
    0x0000  100   253   000    Old_age   Offline    -            111

 

-------------------------------------------------------------------------------
PdId: 3
Model Number: TOSHIBA DT01ACA3
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
SCT Support Level:                   1
Device State:                        SMART Off-line Data Collection executing in background (4)
Current Temperature:                    37 Celsius
Power Cycle Min/Max Temperature:     35/37 Celsius
Lifetime    Min/Max Temperature:     19/42 Celsius
Under/Over Temperature Limit Count:   0/0
Self-test execution status:      (   0) The previous self-test routine
                                        completed without error or no self-test
                                        has ever been run.
Error logging capability:        (0x01) Error logging supported.
Short self-test routine
recommended polling time:  (   1) minutes.
Extended self-test routine
recommended polling time:  ( 255) minutes.
SCT capabilities:        (0x003d) SCT Status supported.
                                  SCT Feature Control supported.
                                  SCT Data Table supported.

SMART Self-test log structure revision number: 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Error Log Version: 1
No Errors Logged

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME
    FLAG    VALUE WORST THRESH TYPE      UPDATED    WHEN_FAILED  RAW_VALUE
==============================================================================
  1 Raw_Read_Error_Rate
    0x000b  100   100   016    Pre-fail  Always     -            0
  2 Throughput_Performance
    0x0005  140   140   054    Pre-fail  Offline    -            69
  3 Spin_Up_Time
    0x0007  130   130   024    Pre-fail  Always     -            440 (Average 440)
  4 Start_Stop_Count
    0x0012  100   100   000    Old_age   Always     -            3896
  5 Reallocated_Sector_Ct
    0x0033  100   100   005    Pre-fail  Always     -            0
  7 Seek_Error_Rate
    0x000b  100   100   067    Pre-fail  Always     -            0
  8 Seek_Time_Performance
    0x0005  124   124   020    Pre-fail  Offline    -            33
  9 Power_On_Hours
    0x0012  097   097   000    Old_age   Always     -            27721
 10 Spin_Retry_Count
    0x0013  100   100   060    Pre-fail  Always     -            0
 12 Power_Cycle_Count
    0x0032  100   100   000    Old_age   Always     -            3895
192 Power-Off_Retract_Count
    0x0032  097   097   000    Old_age   Always     -            3901
193 Load_Cycle_Count
    0x0012  097   097   000    Old_age   Always     -            3901
194 Temperature_Celsius
    0x0002  162   162   000    Old_age   Always     -            37 (Lifetime Min/Max 19/42)
196 Reallocated_Event_Count
    0x0032  100   100   000    Old_age   Always     -            0
197 Current_Pending_Sector
    0x0022  100   100   000    Old_age   Always     -            0
198 Offline_Uncorrectable
    0x0008  100   100   000    Old_age   Offline    -            0
199 UDMA_CRC_Error_Count
    0x000a  200   200   000    Old_age   Always     -            0

 

-------------------------------------------------------------------------------
PdId: 4
Model Number: TOSHIBA DT01ACA3
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
SCT Support Level:                   1
Device State:                        SMART Off-line Data Collection executing in background (4)
Current Temperature:                    38 Celsius
Power Cycle Min/Max Temperature:     35/38 Celsius
Lifetime    Min/Max Temperature:     19/43 Celsius
Under/Over Temperature Limit Count:   0/0
Self-test execution status:      (   0) The previous self-test routine
                                        completed without error or no self-test
                                        has ever been run.
Error logging capability:        (0x01) Error logging supported.
Short self-test routine
recommended polling time:  (   1) minutes.
Extended self-test routine
recommended polling time:  ( 255) minutes.
SCT capabilities:        (0x003d) SCT Status supported.
                                  SCT Feature Control supported.
                                  SCT Data Table supported.

SMART Self-test log structure revision number: 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Error Log Version: 1
No Errors Logged

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME
    FLAG    VALUE WORST THRESH TYPE      UPDATED    WHEN_FAILED  RAW_VALUE
==============================================================================
  1 Raw_Read_Error_Rate
    0x000b  100   100   016    Pre-fail  Always     -            0
  2 Throughput_Performance
    0x0005  139   139   054    Pre-fail  Offline    -            70
  3 Spin_Up_Time
    0x0007  133   133   024    Pre-fail  Always     -            430 (Average 430)
  4 Start_Stop_Count
    0x0012  100   100   000    Old_age   Always     -            3896
  5 Reallocated_Sector_Ct
    0x0033  100   100   005    Pre-fail  Always     -            0
  7 Seek_Error_Rate
    0x000b  100   100   067    Pre-fail  Always     -            0
  8 Seek_Time_Performance
    0x0005  124   124   020    Pre-fail  Offline    -            33
  9 Power_On_Hours
    0x0012  097   097   000    Old_age   Always     -            27718
 10 Spin_Retry_Count
    0x0013  100   100   060    Pre-fail  Always     -            0
 12 Power_Cycle_Count
    0x0032  100   100   000    Old_age   Always     -            3895
192 Power-Off_Retract_Count
    0x0032  097   097   000    Old_age   Always     -            3899
193 Load_Cycle_Count
    0x0012  097   097   000    Old_age   Always     -            3899
194 Temperature_Celsius
    0x0002  162   162   000    Old_age   Always     -            37 (Lifetime Min/Max 19/43)
196 Reallocated_Event_Count
    0x0032  100   100   000    Old_age   Always     -            0
197 Current_Pending_Sector
    0x0022  100   100   000    Old_age   Always     -            0
198 Offline_Uncorrectable
    0x0008  100   100   000    Old_age   Offline    -            0
199 UDMA_CRC_Error_Count
    0x000a  200   200   000    Old_age   Always     -            0
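
Reading four of these tables by eye is tedious, so here is a minimal Python sketch that scans pasted smart output for the usual warning signs. It assumes the two-line layout shown above (attribute name on one line, values on the next) and tolerates blank lines in between. The names parse_smart, warnings and WATCHED_IDS are made up for this example, and the choice of attributes 5, 197 and 198 is a common rule of thumb, not something the Promise CLI or this thread prescribes. Clean counters are also no guarantee: a drive can click and fail mechanically without logging any of these.

```python
import re

# Shortened excerpt in the same two-line layout as the output above.
SAMPLE = """\
PdId: 1
  5 Reallocated_Sector_Ct
    0x0033  100   100   005    Pre-fail  Always     -            0
  9 Power_On_Hours
    0x0012  097   097   000    Old_age   Always     -            27719
197 Current_Pending_Sector
    0x0022  100   100   000    Old_age   Always     -            0
"""

NAME_RE = re.compile(r"^(\d+)\s+(\S+)$")
VALUE_RE = re.compile(
    r"^0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

# Assumption: raw counts above zero for these IDs deserve a closer look.
WATCHED_IDS = {5, 197, 198}

def parse_smart(text):
    """Yield (pdid, attr_id, name, value, thresh, raw) for each attribute."""
    pdid, pending = None, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines may sit between the name and value lines
        if line.startswith("PdId:"):
            pdid, pending = line.split(":", 1)[1].strip(), None
            continue
        name = NAME_RE.match(line)
        if name:
            pending = (int(name.group(1)), name.group(2))
            continue
        values = VALUE_RE.match(line)
        if values and pending:
            value, _worst, thresh, raw = (int(g) for g in values.groups())
            yield pdid, pending[0], pending[1], value, thresh, raw
        pending = None

def warnings(text):
    """Return human-readable warnings for the parsed attributes."""
    out = []
    for pdid, attr_id, name, value, thresh, raw in parse_smart(text):
        if attr_id in WATCHED_IDS and raw > 0:
            out.append(f"PdId {pdid}: {name} raw value is {raw}")
        if thresh > 0 and value <= thresh:
            out.append(f"PdId {pdid}: {name} at or below threshold")
    return out

print(warnings(SAMPLE) or "no warnings")
```

Pasting the complete output in place of SAMPLE checks all drives in one pass.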

 

 

R P posted this 15 September 2023

Hi S A,

The SMART data was meant to show which drive was failing; since that drive has already failed, we no longer need it. On the other hand, all the other drives are looking good and not showing any signs of impending doom.

This is the problem.

cliib> phydrv
===============================================================================
PdId Model        Type      Capacity  Location      OpStatus  ConfigStatus     
===============================================================================
1    TOSHIBA DT01 SATA HDD  3TB       Encl1 Slot1   OK        Array0 No.0      
2    ST3000DM001- SATA HDD  3TB       Encl1 Slot2   OK        PassThru         
3    TOSHIBA DT01 SATA HDD  3TB       Encl1 Slot3   OK        Array0 No.2      
4    TOSHIBA DT01 SATA HDD  3TB       Encl1 Slot4   OK        Array0 No.3

PD2 (second drive from the top) is in passthru mode. We need to change that, or we can't add it to the array.

The CLI command to remove the passthru mode is...

phydrv -a mod -s "config=unconfig" -p 2

The simplest way to start a rebuild is to make PD2 a spare.

spare -a add -p 2
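
For anyone repeating this on another slot, the two steps above can be sketched in Python as follows. The script reads a pasted phydrv table, finds drives whose ConfigStatus is PassThru, and prints the two commands quoted above with the PdId filled in. It only prints text and never runs anything; the helper names passthru_drives and rebuild_commands are invented for this example, and the commands should still be checked and typed by hand at the cliib> prompt.

```python
import re

# The phydrv table from this thread.
SAMPLE = """\
===============================================================================
PdId Model        Type      Capacity  Location      OpStatus  ConfigStatus
===============================================================================
1    TOSHIBA DT01 SATA HDD  3TB       Encl1 Slot1   OK        Array0 No.0
2    ST3000DM001- SATA HDD  3TB       Encl1 Slot2   OK        PassThru
3    TOSHIBA DT01 SATA HDD  3TB       Encl1 Slot3   OK        Array0 No.2
4    TOSHIBA DT01 SATA HDD  3TB       Encl1 Slot4   OK        Array0 No.3
"""

def passthru_drives(table_text):
    """Return the PdIds of drives whose ConfigStatus column reads PassThru."""
    ids = []
    for line in table_text.splitlines():
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 6 and fields[0].isdigit() and fields[-1] == "PassThru":
            ids.append(int(fields[0]))
    return ids

def rebuild_commands(pd_id):
    """The two commands quoted in this post, with the PdId substituted."""
    return [
        f'phydrv -a mod -s "config=unconfig" -p {pd_id}',
        f"spare -a add -p {pd_id}",
    ]

for pd_id in passthru_drives(SAMPLE):
    for command in rebuild_commands(pd_id):
        print(command)
```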

 

S A posted this 16 September 2023

Thanks, PD2 is now rebuilding:

PdId: 2

OperationalStatus: Rebuilding

 

 

S A posted this 16 September 2023

 

 

Looks like everything is in order now, right?

 

 

If I wanted to replace the rest of the old drives before they fail, can I just pop one out and run the same CLI commands (replacing 2 with the appropriate slot number)?

  

Thanks a lot for your help!

 

 
