NS4600 unresponsive after checking file system

  • Last Post 04 March 2020
Jim Bob posted this 03 January 2020

I fired up my SmartStor the other day and everything was working fine. It's a RAID5 array with 4 drives.

I decided to run a Check File System just for fun. It ran for a while, updating its progress to around 50%, and then the whole thing stopped responding. Nothing is coming out of the NIC at all (I have run Wireshark captures). Holding down the power button on the back for 20+ seconds does nothing. I pulled the power cable out of the back, and after several rounds of pulling the power cable and reseating all the drives (the NAS powers up and is accessible fine when no drives are connected), I've ascertained that Disk 3 appears to be the cause of the blocked network stack. I know it's probably not the direct cause, as that makes little sense, but all I can tell you is that when Disk 3 is seated (in any slot), the NAS's IP address does not respond to ping and the web UI is inaccessible.

Now, with Disk 3 unseated, what the web UI reports for the RAID status is very interesting. Initially, Disk 4 was showing as "Free", which was very troubling as it should have been part of the RAID5 array. After several cycles of seating Disk 3, powering up, leaving it for a while, hard powering down (as neither the power button nor the web UI is working), unseating Disk 3 and powering up again, Disk 4 is now showing as part of the array again and the array is showing as "Offline", rebuilding, in progress. However, I have left it for several hours and the progress bar still shows 0%.

Fed up with the lack of progress, I seated Disk 3 again, powered up and left it for a while. I then repeated the removal of Disk 3 and the power cycle, and now when I check the array via the web UI with Disk 3 removed, the rebuild progress shows 2%! So I can conclude that something appears to be happening while Disk 3 is seated and the web UI/network stack is buggered, and that it may be progressing the rebuild?

I will report back with progress, but I'm also very keen to hear any insight or advice from Promise or from forum users with similar experiences. I'm really hoping not to lose all of this data!!

Jim Bob posted this 03 January 2020

I just left the unit for another couple of hours (3?), powered down by yanking the plug, removed Disk 3, powered up and accessed the web UI, and I can see the rebuild progress is now up to 9%. This confirms that something is actually going on in the background while Disk 3 is seated and the web UI is inaccessible and not pingable. Interesting that the NAS is apparently totally offline from a network perspective while it's doing whatever it is doing.

Jim Bob posted this 05 January 2020

So after leaving it for about a day and a half, I tried the tried-and-tested method of pulling out Disk 3, pulling the power cable and powering up, and now this is not working. Same results as before the power down: no actual network activity showing on a packet capture, even though the little network LED appears to indicate there should be network activity.

As I have a feeling the problem lies between Disks 3 and 4, I repeated the process but pulled Disk 4 this time, and now it actually boots up. This is the opposite of the other day, when it would only boot if Disk 3 was pulled. Now the unit only boots up fully and is network-accessible if Disk 4 is pulled!

When I get into the GUI, it shows Disks 1 and 2 in the array and Disk 3 as "Free", which is not good! The data is still not available.

Jim Bob posted this 07 January 2020

So this is very annoying: it will seemingly boot with any combination of 2 working disks from the RAID array, but never 3. It will also boot with 3 disks as long as one is marked as a spare (Disk 3). So there is seemingly no way I can get it to boot with enough RAID member disks to recover my data.
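For anyone following along, the reason 2 disks are not enough: a 4-disk RAID5 stores one parity block per stripe, so it can rebuild any one missing member but not two. Below is a minimal Python sketch of that XOR parity idea. It is a simplified illustration only (a single stripe, made-up block contents, no parity rotation), and the `xor_blocks` and `rebuild` helpers are hypothetical, not how the NS4600 firmware actually lays out or rebuilds data.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical single stripe on a 4-disk array: three data blocks plus one parity block.
data = [b"disk1-data", b"disk2-data", b"disk3-data"]
stripe = data + [xor_blocks(data)]


def rebuild(stripe, missing):
    """Rebuild a stripe given the indices of missing members.

    With only one parity block per stripe, a single missing member equals the
    XOR of the survivors; with two or more missing there is not enough
    information left, so we refuse.
    """
    if len(missing) > 1:
        raise ValueError("RAID5 can only tolerate one missing member")
    if not missing:
        return list(stripe)
    survivors = [blk for i, blk in enumerate(stripe) if i not in missing]
    restored = list(stripe)
    restored[missing[0]] = xor_blocks(survivors)
    return restored


print(rebuild(stripe, [2])[2])  # b'disk3-data'
```

Losing one member (data or parity) is recoverable from the other three; asking `rebuild` to cope with two missing members raises an error, which mirrors why the array needs 3 of its 4 disks present to serve data.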

Jim Bob posted this 03 March 2020

So, I have not given up on this yet. I managed to obtain a USB-to-serial converter with pin-outs to connect to the console port on the motherboard. It took some faffing (the RX and TX headers are the wrong way around, either on my converter or on the motherboard), but I have console output.

https://www.amazon.com/ZYAMY-CP2102-Module-Serial-Downloader/dp/B07784SHF7

I've also tried telnet on TCP port 2380, which is responding.
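If you want to check from another machine whether that port is listening before bothering with a telnet client, a plain TCP connect is enough. A minimal Python sketch; the `port_is_open` helper is hypothetical, and you would supply the NAS's IP address yourself:

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection handles address resolution and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

For example, `port_is_open("192.168.1.50", 2380)` (the address is a placeholder) returns True if the telnet service answers. Unlike ping, this tests the specific service, so it can tell "network stack completely dead" apart from "only the web UI is down".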

When I try to log in over either the console or telnet using the default admin/admin credentials (which work on the web UI), it appears to pass authentication but then drops me back out again, as the file system is still buggered.

Console:

NS4600 (Version 02.01.0000.22) - Promise Technology, INC.
storage login: admin
Password:
warning: cannot change to home directory
admin isn't allo
NS4600 (Version 02.01.0000.22) - Promise Technology, INC.
storage login:

Telnet IP 2380:

NS4600 (Version 02.01.0000.22) - Promise Technology, INC.
storage login: admin
Password:
warning: cannot change to home directory
Connection to host lost.

I've also been reading up on a number of 4300 and 4600 hacking techniques for getting access to root and/or engmode shells, but they all seem to rely on access to the actual user configuration menu or the Plugins menu. I can't get to either, as they all just display an error graphic: "The file system contains errors. Please check file system status".

R P posted this 03 March 2020

Hi Jim,

I'm pretty sure that the serial port login is for developer debugging. I would assume that the login itself is 'root'; what the password is, I have no idea.

I've seen the hacking pages you mention and they are very interesting, but I have never tested them and have no idea how accurate they are.

But it's normal to need root access to run a filesystem check. Any other login will let you poke around, but you won't have the permissions to fix the filesystem.

Jim Bob posted this 04 March 2020

I've moved on significantly since yesterday. I finally managed to get a shell over the console, and from there, a lot of Linux jiggery-pokery later, I've mounted my files and am now retrieving them via a hacked SMB share.

There doesn't appear to be anything majorly wrong with the data (famous last words!), but the NAS seems to want to restart to recover the RAID array, and whenever I restart with all 4 disks and it tries to recover, it gets stuck in a crash loop:

/usr/sbin/lvchange -a y /dev/vg002/lv001
/etc/raid.conf is NOT vaild !!
vg001 needs to be restored
Restore vg from original backup configuration file
vg001 is mapping to /dev/sda
  Couldn't find device with uuid 'ULOnNW-pXQF-K40y-ddcD-gD4f-TXT3-ti5o3i'.
vgcfgrestore[1216]: segfault at 30 ip b767f760 sp bf9ccf90 error 4 in libc-2.3.6.so[b7614000+127000]
XFS: bad magic number
XFS: SB validate failed
mount: wrong fs type, bad option, bad superblock on /dev/vg001/lv001,
       missing codepage or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

XFS mounting filesystem dm-0
XFS quotacheck dm-0: Please wait.
init invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Pid: 1, comm: init Tainted: G        W  2.6.32.14 #5
Call Trace:
