LUN not responding on ESXi

  • Last Post 22 September 2020
Alireza Yazdanpanah posted this 21 September 2020

Hi everybody,

 

I have a Vess R2000fi with 4 JBODs, connected to the ESXi hosts through two SAN switches.

 

I have several problems with this storage:

 

1- When I add a LUN to the hosts, after a few hours the host state changes to "not responding" in VCSA, and it stays that way until I detach the LUN from the host by LUN masking, SAN switch zoning, or physically disconnecting the port.

 

 

2- Before the host reaches that state (problem 1 above), errors like this appear in the host log:

 

2020-09-18T19:06:32.653Z cpu18:4860497)HBX: 3033: 'Promise_LUN_213_R10': HB at offset 4063232 - Waiting for timed out HB:

 

2020-09-18T19:06:32.653Z cpu18:4860497)  [HB state abcdef02 offset 4063232 gen 11 stampUS 14903595170423 uuid 5e819c7c-10ab9a46-505a-9c8e992cb068 jrnl <FB 4> drv 24.82 lockImpl 3 ip W.X.Y.Z

3- In the storage console I see this log:



Port 2 Ctrl 1    0x17000700   Info   Sep 21, 2020 13:43:53      Host interface link has logged out

Port 2 Ctrl 1    0x16000700  Info    Sep 21, 2020 13:43:53      Host interface link has logged in
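For anyone comparing the two logs above over a longer period, here is a minimal Python sketch that pulls the heartbeat timeouts out of the ESXi log and counts link logouts per controller port in the storage event log. The regular expressions are inferred only from the sample lines in this post, so other firmware or ESXi versions may format these lines differently; the function names are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the ESXi sample line in this post (an assumption,
# not an official log format specification).
HB_TIMEOUT = re.compile(
    r"^(?P<ts>\S+) cpu\d+:\d+\)HBX: \d+: '(?P<datastore>[^']+)': "
    r"HB at offset (?P<offset>\d+) - Waiting for timed out HB"
)

# Pattern inferred from the storage console sample lines in this post.
LINK_EVENT = re.compile(
    r"^Port (?P<port>\d+) Ctrl (?P<ctrl>\d+)\s+0x[0-9a-fA-F]+\s+\w+\s+"
    r"(?P<when>\w{3} \d{1,2}, \d{4} \d{2}:\d{2}:\d{2})\s+"
    r"Host interface link has logged (?P<state>in|out)$"
)


def heartbeat_timeouts(lines):
    """Return (timestamp, datastore, offset) for each heartbeat timeout line."""
    found = []
    for line in lines:
        m = HB_TIMEOUT.match(line.strip())
        if m:
            found.append((m["ts"], m["datastore"], int(m["offset"])))
    return found


def link_logouts(lines):
    """Count 'logged out' events per (controller, port)."""
    counts = Counter()
    for line in lines:
        m = LINK_EVENT.match(line.strip())
        if m and m["state"] == "out":
            counts[(int(m["ctrl"]), int(m["port"]))] += 1
    return counts
```

A port with a steadily growing logout count, or a datastore that keeps showing up in the heartbeat timeouts, is a reasonable place to start looking.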

 

Can someone help me?

R P posted this 22 September 2020

Hi Alireza,

It's not possible to debug a complex issue knowing just the topology.

But I can offer a few suggestions. If you have disabled LUN affinity (in the controller settings), please re-enable it. You will probably have to reboot both the Vess and the ESX hosts afterwards, because half of the active paths will go to standby and a rescan won't help. I would also not suggest round-robin as the multipathing policy; it is best to use fixed and balance the paths manually.
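To illustrate what balancing the paths manually can look like, here is a small Python sketch that plans one preferred path per LUN. It assumes that, with LUN affinity enabled, the preferred path should lead to the controller that owns the LUN, and it rotates through that controller's paths so no single path carries every LUN. The LUN names, path labels, and the function itself are hypothetical; the sketch only produces a plan and does not talk to ESXi or the Vess.

```python
from itertools import cycle


def assign_preferred_paths(lun_owner, paths_by_controller):
    """Pick one preferred path per LUN, rotating over the owner's paths.

    lun_owner: dict mapping LUN name -> owning controller id (hypothetical input).
    paths_by_controller: dict mapping controller id -> list of path labels.
    """
    rotations = {
        ctrl: cycle(paths)
        for ctrl, paths in paths_by_controller.items()
        if paths
    }
    plan = {}
    for lun in sorted(lun_owner):
        owner = lun_owner[lun]
        if owner not in rotations:
            raise ValueError(f"no path to controller {owner} for {lun}")
        plan[lun] = next(rotations[owner])
    return plan


# Hypothetical example: four LUNs split over two controllers, two paths each.
plan = assign_preferred_paths(
    {"LUN_211": 1, "LUN_212": 2, "LUN_213": 1, "LUN_214": 2},
    {1: ["hba1-ctrl1", "hba2-ctrl1"], 2: ["hba1-ctrl2", "hba2-ctrl2"]},
)
```

The resulting plan would then be applied by hand as the preferred path of each LUN under the fixed policy.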

If the SFPs are getting old, they may be a source of more errors than data passed. Check the FC stats and make sure that the FC paths are error free or only have a small number of errors. SFP write lasers have a limited life; this is why SFPs are plug-in replaceable.
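One way to tell old, harmless errors from an SFP that is failing now is to take two snapshots of the FC error counters some time apart and look only at what grew. A minimal Python sketch, assuming the counters have already been copied by hand into dictionaries; the port and counter names below are placeholders, not the output of any particular switch or controller command.

```python
def ports_with_new_errors(before, after):
    """Return, per port, the counters that increased between two snapshots."""
    flagged = {}
    for port, counters in after.items():
        earlier = before.get(port, {})
        grew = {
            name: value - earlier.get(name, 0)
            for name, value in counters.items()
            if value > earlier.get(name, 0)
        }
        if grew:
            flagged[port] = grew
    return flagged


# Hypothetical snapshots taken a few hours apart.
before = {"port2": {"crc_errors": 10, "link_failures": 1}, "port3": {"crc_errors": 0}}
after = {"port2": {"crc_errors": 250, "link_failures": 1}, "port3": {"crc_errors": 0}}
suspects = ports_with_new_errors(before, after)
```

Ports that show up here while the counters elsewhere stay flat are the first candidates for an SFP or cable swap.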

Another thing to check is whether some of your disks are going bad. If they are going into constant read retries to recover a bad block, everything slows down, because a RAID is no faster than its slowest component or link. Also check BBM.

And if you have not updated to the latest firmware, please do. The firmware update process will also update the JBOD firmware.
