Java Jive wrote:
Seagate Iron Wolf 8TB ST8000VN004
[snip]
Can anyone suggest how to fix this slow spin-up problem?

I see several complaints about that drive being slow to spin-up, the manual says 23s (typ) to 30s (max), so I suspect there's nothing you can do to the drives themselves.

In your grub.cfg maybe experiment with boot_delay=xxx values in ms to delay all the kernel messages?
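For what it's worth, boot_delay is a real kernel parameter: it takes a value in milliseconds per printk and needs CONFIG_BOOT_PRINTK_DELAY compiled in. A sketch of trying it on an ordinary GRUB-based system (not this NAS, which boots via U-Boot; the values here are illustrative only):

# /etc/default/grub - append boot_delay to the kernel arguments
GRUB_CMDLINE_LINUX_DEFAULT="quiet boot_delay=100"
# then regenerate grub.cfg
update-grub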
Having successfully upgraded my two primary QNAP 251+ NASs, I've handed down two of the HDs to the Zyxel NSA221s NASs that were my original backup solution. The disks are Seagate Iron Wolf 8TB ST8000VN004, originally from Amazon ...
https://www.amazon.co.uk/dp/B07SZVVBBK
... but they are giving problems in their new home, or, I should say, housing.
The problem is the same for each: they spin up too slowly to be found as the NSA221 boots, so after a cold boot I have to reboot, and only then are they found. Once that is done, they seem to be perfectly satisfactory, and, despite the NAS specs saying that they only work up to a max of 4TB per disk, I'm actually getting the full (nominal) 8TB added to the capacity of the other Toshiba HDs, which have always been free of such problems.
I've had similar problems in the past with other Seagate HDs in these NASs, which at the time I got around by using a reboot flag in a rather convoluted manner. That won't work now anyway, because I have since configured the NASs to use the combined disk space of the two disks as one large virtual HD volume, which means that I have nowhere to write a reboot flag unless both HDs are running and found at boot, in which case I wouldn't need to write it anyway. I might be able to get round this by using RAM, but it would need some investigation as to what would survive a reboot.
This reboot requirement will be easy to forget, and should be avoidable by making the system wait longer for the disks to spin up. Setting aside the problem of altering firmware (see next para), how would one normally accomplish this in a normal Linux installation? A boot parameter? An /etc setting?
I have some scope for making changes, as in the past I've recompiled the GNU GPL firmware and both NASs are running the result. Also, the command run by U-Boot is stored in an environment variable, which means that I could add a boot parameter to it fairly easily.
The Zyxel SDK to build a GPL firmware dates from the Ubuntu 7 era (!), but I was still running it satisfactorily somewhere around Ubuntu 16 or 18. Also, the NASs each have a serial header which I can use to interrupt their boot and change things, though I wouldn't want to be doing that on a permanent basis, only for temporary fixes to see if they work. There are also OpenWRT versions of the firmware, but having already obtained a pretty good result from my own firmware, I haven't gone down that road, as it would be too time-consuming for such old hardware.
Can anyone suggest how to fix this slow spin-up problem?
boot_delay appears to slow down kernel printouts, which is likely to be a
bit dependent on how many there are.
Instead I'd use rootdelay=30 to delay
the kernel start by 30 seconds, and then it'll boot at full speed.
Theo wrote:
boot_delay appears to slow down kernel printouts, which is likely to be a bit dependent on how many there are.
hence "experiment"
Instead I'd use rootdelay=30 to delay
the kernel start by 30 seconds, and then it'll boot at full speed.
I read that as waiting xx seconds before mounting the root fs, but if it can't see any disks, will it wait, or bail-out?
In uk.comp.os.linux Andy Burns <usenet@andyburns.uk> wrote:
Theo wrote:
Instead I'd use rootdelay=30 to delay
the kernel start by 30 seconds, and then it'll boot at full speed.
I read that as waiting xx seconds before mounting the root fs, but if it
can't see any disks, will it wait, or bail-out?
It'll pause the boot for 30 seconds at the point the root fs is mounted.
The rootfs is not on HDD, it'll be in flash, so the boot will then proceed once the 30s is up - it won't then fail for lack of a rootfs. If the discs still aren't up at the end of 30s (or 120s or whatever number you write there) then they will be missing, just as they are at the moment. But if they are not reliable enough to start given a large enough timeout then that points to a problem with the discs, rather than just a regular but slow spinup.
(if you did have your rootfs on the HDD you could use 'rootwait' to pause until the rootfs volume was ready. But that only applies for the rootfs and not other volumes)
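On a box like this, the kernel command line normally comes from the U-Boot environment rather than grub.cfg, so a sketch of adding rootdelay there, assuming the conventional 'bootargs' variable name (which this Zyxel firmware may or may not use):

# inspect the current kernel arguments
/zyxel/sbin/fw_printenv bootargs
# re-set them with rootdelay appended; '<existing args>' is a placeholder
/zyxel/sbin/fw_setenv bootargs '<existing args> rootdelay=30'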
From the sound of it, I was suspicious that the drives are only powered up when the kernel loads the driver, and that the driver almost immediately expects them to be present, which would mean that this ploy wouldn't work. However, I was game to try it, as I didn't think it could do any harm, but, as I feared, it didn't work. The delay comes long after this point, and too late to affect things.
Java Jive wrote:
From the sound of it, I was suspicious that the drives are only powered up when the kernel loads the driver, and that the driver almost immediately expects them to be present, which would mean that this ploy wouldn't work. However, I was game to try it, as I didn't think it could do any harm, but, as I feared, it didn't work. The delay comes long after this point, and too late to affect things.
A wild thought (which I can't test as I no longer run OpenWRT, so no u-boot device): add the normal kernel as a crash kernel, let the first kernel boot and spin the drives up too late ... then find a way to crash it, so the crash kernel starts up after the drives are spinning?
I'm not sure that's an intrinsic feature of u-boot - the idea of booting into something different second time around is a feature of OpenWRT, I think.

Crash kernels aren't an OpenWRT thing, just a Linux thing, I'm sure.
~ # /zyxel/sbin/fw_setenv bootdelay 120
The difference is that this is prior to the kernel booting so the SATA
driver does not fire up. However, if spinup only happens when the driver begins talking to the drive then this won't help.
I suppose another option if that happens is to try to talk to the SATA drive in uboot, which might commence spinup. Docs: https://github.com/u-boot/u-boot/blob/master/doc/README.sata
maybe the 'sata info' command is enough to wake the HDD, and even if it
fails you can then 'sleep 30' or something while the drive spins up, and
then boot Linux.
(I'm assuming the Zyxel firmware will let you edit the u-boot command
script, not just the u-boot environment variables)
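A sketch of what that could look like, assuming the Zyxel U-Boot build includes the sata and sleep commands at all (many old builds don't), and using 'bootcmd_orig' as a hypothetical name for the saved original boot sequence:

# from the running NAS; '<existing value>' is a placeholder for the
# current bootcmd, noted with fw_printenv first
/zyxel/sbin/fw_printenv bootcmd
/zyxel/sbin/fw_setenv bootcmd_orig '<existing value>'
/zyxel/sbin/fw_setenv bootcmd 'sata init; sleep 30; run bootcmd_orig'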
On 2025-05-06 15:07, Theo wrote:
~ # /zyxel/sbin/fw_setenv bootdelay 120
The difference is that this is prior to the kernel booting so the SATA driver does not fire up. However, if spinup only happens when the driver begins talking to the drive then this won't help.
Yes, I tried 20 seconds, so now the message ...
Hit any key to stop autoboot:
... displays for 20 seconds instead of the 3 seconds previously, but
this did not help because the HDs didn't spin up upon power on, only
when the driver loaded.
I suppose another option if that happens is to try to talk to the SATA drive
in uboot, which might commence spinup. Docs: https://github.com/u-boot/u-boot/blob/master/doc/README.sata
maybe the 'sata info' command is enough to wake the HDD, and even if it fails you can then 'sleep 30' or something while the drive spins up, and then boot Linux.
(I'm assuming the Zyxel firmware will let you edit the u-boot command script, not just the u-boot environment variables)
I may look into this, though I think the best solution would be to fix the short delay in the SATA driver module, and I'm trying to find a suitable place in the source files to do that. Meanwhile, a simpler possibility would be to write the autoreboot flag into the U-Boot environment, because that doesn't need the HDs to be found to provide storage, and would survive a reboot. I may try this as a temporary fix unless and until I can investigate a better solution.
Are you sure this delay isn't just the drive set to spin down when not used? Perhaps they boot in the spun-down state. When you try to access a drive that's spun down, the system will often hang waiting for it to spin up.
Since the kernel wants to read the partition table stored on the disc I'm
not surprised if it hangs if the drive isn't spinning, and maybe times out.
The simplest way to adjust it is with a Windows tool like SeaTools - there's now a version for Linux and a bootable USB version too.
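If the NAS build happens to include hdparm (an assumption), the same check and adjustment can be sketched from Linux:

hdparm -C /dev/sda     # report power state: active/idle vs standby
hdparm -S 0 /dev/sda   # disable the drive's standby (spin-down) timer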
On 2025-05-07 15:50, Theo wrote:
Are you sure this delay isn't just the drive set to spin down when not used? Perhaps they boot in the spun-down state. When you try to access a drive that's spun down, the system will often hang waiting for it to spin up.
Since the kernel wants to read the partition table stored on the disc I'm not surprised if it hangs if the drive isn't spinning, and maybe times out.
The simplest way to adjust it is with a Windows tool like SeaTools - there's now a version for Linux and a bootable USB version too.
Some boxes set that OFF timeout themselves, not in the disks, so that it
is impossible to modify. At ten minutes of no activity, they power down
the disks.
If anyone needs a reminder, the original problem is appended below, this new thread/subthread is about my attempts to fix it.
The firmware for these Zyxel NSA221 NAS boxes is split into three binary files and a number of associated checksums and scripts, described in comments in an unpacking script as follows ...
# DATA_0000: header version
# DATA_0001: firmware version
# DATA_0002: firmware revision
# DATA_0101: model number 1
# DATA_0102: model number 2
# DATA_0200: core checksum
# DATA_0201: ZLD checksum
# DATA_0202: ROM checksum
# DATA_0203: InitRD checksum
# DATA_1000: kernel file, uImage
# DATA_1002: InitRD image, initrd.img.gz
# DATA_1004: System disk image, sysdisk.img.gz
# DATA_a000: executable, for some jobs before firmware upgrade
# DATA_a002: executable, for some jobs after firmware upgrade
Note that the last two are legacy scripts which with recent builds do not actually do anything.
To create a firmware file, these are packaged up into a single binary file, which is then unpacked as above when the firmware is applied. The packing and unpacking are done by shell scripts which call Zyxel cut-down versions of a program called CONV723.EXE, which themselves are called ram2bin and bin2ram.
I have the software development kit for the NASs, and several years ago built an image which works on both NASs, for which I still have the above component files, but, for reasons now lost in time, I cannot seem to replicate that build from any of the existing or backed-up build directories. In fact, rather strangely, none of them now produces anything that will boot; even unpacking the entire SDK afresh from scratch into a new directory and building an image from that doesn't produce a bootable image!
So I've been trying a different approach, that of unpacking the working build, modifying the initrd, and repacking it, but this too crashes with a kernel panic. Even if I simply unpack the initrd, and re-pack it UNCHANGED, EXACTLY AS IT WAS BEFORE, even that gives a kernel panic.
The packing is done by a script called 'makeras_gpl.sh', the most relevant section from which reads as follows:
# Updates ROM_CHECKSUM in {METADATA}, generate romfile_checksum, zyconf.tgz and zyconf.rom
./make_zyconf.sh
# Updates CORE_CHECKSUM in ${METADATA}, generate core_checksum
./make_kernel.sh
# Update ZLD_CHECKSUM in ${METADATA}, generate sysdisk.img.gz and zld_checksum
./make_sysdisk.sh
# Update INITRD_CHECKSUM in ${METADATA}, generate initrd.img.gz and initrd_checksum
./make_initrd.sh
# pack firmware with BETA version
./fw_pack -r ${METADATA} -o tlv.bin
./ram2bin -i tlv.bin -o ras.bin -e "${MODELNAME}" -t 4
mv ras.bin ${fBETA}
chmod 644 ${fBETA}
echo " ==> Beta version file ${fBETA} is created. --> ${vBETA}"
What I would have readers note is that initrd is the last subcomponent to be built, so it's difficult to see how rebuilding it separately can alter anything else, for example by having a different checksum, because everything else has already been built. The relevant section of 'make_initrd.sh' is as follows:
echo -e " \033[1;31m>> Enter Critcal Section! DO NOT CTRL+C <<\033[0m"
mv fs.initrd initrd
tar -zcf initrd.tar.gz initrd/
mv initrd fs.initrd
# Create ext2 image
mkdir initrd
dd if=/dev/zero of=initrd.img bs=1k count=8192
/sbin/mkfs.ext2 -F -v -m0 initrd.img
sudo mount -o loop initrd.img initrd/
sudo tar -zvxf initrd.tar.gz initrd
sudo umount initrd/
echo -e " \033[1;32m<< Exit Critcal Section! >>\033[0m"
sudo gzip -9 < initrd.img > initrd.img.gz
sudo rm -rf initrd
sudo rm -f initrd.tar.gz
sudo rm -f initrd.img
INITRDCHECKSUM=`./ram2bin -i initrd.img.gz -e "${MODELNAME}" -t 4 -q -f`
sed -i -e "s/^INITRD_CHECKSUM.*/INITRD_CHECKSUM\tvalue\t`echo ${INITRDCHECKSUM}`/g" ${METADATA}
I've gone through these steps individually a number of times in case I'd made mistakes, but even with unchanged initrd files, I've never got past the kernel panic, the relevant part of the dmesg log from which reads as follows:
physmap platform flash device: 00400000 at 41000000
physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank
Amd/Fujitsu Extended Query Table at 0x0040
physmap-flash.0: Swapping erase regions for broken CFI table.
number of CFI chips: 1
cfi_cmdset_0002: Disabling erase-suspend-program due to code brokenness.
7 cmdlinepart partitions found on MTD device physmap-flash.0
Creating 7 MTD partitions on "physmap-flash.0":
0x00000000-0x00020000 : "uboot"
mtd: Giving out device 0 to uboot
0x00020000-0x001e0000 : "kernel"
mtd: Giving out device 1 to kernel
0x001e0000-0x00380000 : "initrd"
mtd: Giving out device 2 to initrd
0x00380000-0x003f0000 : "etc"
mtd: Giving out device 3 to etc
0x003f0000-0x003fc000 : "empty"
mtd: Giving out device 4 to empty
0x003fc000-0x003fe000 : "env1"
mtd: Giving out device 5 to env1
0x003fe000-0x00400000 : "env2"
mtd: Giving out device 6 to env2
10 Dec 2004 USB 2.0 'Enhanced' Host Controller (EHCI) Driver@e7000000 Device ID register 42fa05
oxnas-ehci oxnas-ehci.0: OXNAS EHCI Host Controller
oxnas-ehci oxnas-ehci.0: new USB bus registered, assigned bus number 1
oxnas-ehci oxnas-ehci.0: irq 7, io mem 0x00000000
oxnas-ehci oxnas-ehci.0: USB 0.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 3 ports detected
USB Universal Host Controller Interface driver v3.0
sl811: driver sl811-hcd, 19 May 2005
usb 1-1: new high speed USB device using oxnas-ehci and address 2
In hub_port_init, and number is 0, retry 0, port 1 .....
usb 1-1: configuration #1 chosen from 1 choice
hub 1-1:1.0: USB hub found
hub 1-1:1.0: 4 ports detected
usb 1-1.2: new high speed USB device using oxnas-ehci and address 3
In hub_port_init, and number is 1, retry 0, port 2 .....
usb 1-1.2: configuration #1 chosen from 1 choice
usbcore: registered new interface driver usblp
Initializing USB Mass Storage driver...
scsi2 : SCSI emulation for USB Mass Storage devices
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
mice: PS/2 mouse device common for all mice
i2c /dev entries driver
pcf8563 0-0051: chip found, driver version 0.4.2
pcf8563 0-0051: rtc core: registered pcf8563 as rtc0
OXNAS bit-bash I2C driver initialisation OK
md: linear personality registered for level -1
md: raid0 personality registered for level 0
md: raid1 personality registered for level 1
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
drivers/rtc/hctosys.c: unable to open rtc device (rtc)
md: Autodetecting RAID arrays.
md: Scanned 0 and added 0 devices.
md: autorun ...
md: ... autorun DONE.
RAMDISK: Compressed image found at block 0
# Above is normal
# Below is crash
EXT3-fs: Magic mismatch, very weird !
List of all partitions:
0800 3907018584 sda driver: sd
0801 498688 sda1
0802 3906518016 sda2
1f00 128 mtdblock0 (driver?)
1f01 1792 mtdblock1 (driver?)
1f02 1664 mtdblock2 (driver?)
1f03 448 mtdblock3 (driver?)
1f04 48 mtdblock4 (driver?)
1f05 8 mtdblock5 (driver?)
1f06 8 mtdblock6 (driver?)
No filesystem could mount root, tried: ext3 ext2 vfat fuseblk
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(1,0)
# Below *would* have been a normal continuation for a successful boot
VFS: Mounted root (ext2 filesystem).
Freeing init memory: 116K
MTD_open
MTD_ioctl
MTD_read
MTD_close
MTD_open
MTD_ioctl
MTD_read
MTD_close
Mounting file systems...
MTD_open
MTD_ioctl
MTD_read
MTD_close
MTD_open
MTD_ioctl
MTD_read
MTD_close
egiga0: PHY is Realtek RTL8211BGR
Resetting GMAC
GMAC reset complete
ifconfig: bad address 'add'
Starting udhcpc ...
INITRD: Trying to mount NAND flash as Root FS.egiga0: PHY is Realtek RTL8211BGR
egiga0: link down
..egiga0: link up, 1000Mbps, full-duplex, not using pause, lpa 0xC1E1
.scsi 2:0:0:0: Direct-Access ZyXEL USB DISK 2.0 PMAP PQ: 0 ANSI: 0 CCS
Any ideas?
On 2025-05-05 10:24, Java Jive wrote:
[snip]
On Sat, 5/31/2025 11:30 AM, Java Jive wrote:
[snip]
What does "tune2fs" say about the parametrics of the filesystem ?
https://media.geeksforgeeks.org/wp-content/uploads/20230929130854/Image-3.png
Have you previously removed the drive and mounted it on a technician machine ?
Maybe some damage was done to it, while it was out of the NAS and being probed.
There's got to be some reason that not even the magic number is correct.
*******
NOR flash can have bad bits in it, but that does not happen all that often. The most likely place for a failure, is segments which are flashed during each boot, and even with the high cycle count NOR flash supports, that sometimes leads to grief.
The flash load can be segmented, and each chunk has a checksum. It normally isn't possible to capture an image of an entire flash chip, and just compare it to an entire image held in hand. The validity may only be able to be determined
by knowing the start and end address of a chunk and verifying it. Automation in the tools would be the preferred way to determine the flash itself
wasn't causing a corruption. Normally, with flash devices, the loader
will halt, if a portion of what it is loading is defective.
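One rough way to rule the flash in or out from the running NAS, as a sketch: it assumes mtd2 is the initrd partition (as the boot log above shows) and that the image was written from offset 0.

SIZE=$(stat -c%s initrd.img.gz)    # length of the image that was flashed
dd if=/dev/mtd2 2>/dev/null | head -c "${SIZE}" | md5sum
md5sum initrd.img.gz               # the two sums should match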
*******
Processors do not normally go defective. Sometimes bad batches escape
the factory. And a NAS box is highly unlikely to have been overclocked
for most of its life.
As for your firmware kit, I would have frozen the working environment
at Ubuntu 7. In the hopes I would always have an old machine to run it on. Dragging a build environment along, say an unsupported one, on a dynamic
OS situation, that's kinda asking for trouble.
In uk.comp.os.linux Java Jive <java@evij.com.invalid> wrote:
RAMDISK: Compressed image found at block 0
# Above is normal
# Below is crash
EXT3-fs: Magic mismatch, very weird !
List of all partitions:
0800 3907018584 sda driver: sd
0801 498688 sda1
0802 3906518016 sda2
1f00 128 mtdblock0 (driver?)
1f01 1792 mtdblock1 (driver?)
1f02 1664 mtdblock2 (driver?)
1f03 448 mtdblock3 (driver?)
1f04 48 mtdblock4 (driver?)
1f05 8 mtdblock5 (driver?)
1f06 8 mtdblock6 (driver?)
No filesystem could mount root, tried: ext3 ext2 vfat fuseblk
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(1,0)
# Below *would* have been a normal continuation for a successful boot
VFS: Mounted root (ext2 filesystem).
So you appear to be making an ext2 FS and gzipping it. Do you get the 'RAMDISK: Compressed image found' in the crash scenario?
Searching on "Magic mismatch, very weird" comes up with some threads. One is hardware failure, the other is about using a non-1k blocksize with (old) mke2fs
and a 2007-era ramdisk implementation that doesn't support other than 1k: https://sourceforge.net/p/e2fsprogs/bugs/175/#b0df
Perhaps you could try -b1024 on the mkfs.ext2 command? Or experiment with other blocksizes?
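If the blocksize is the cause, the change in make_initrd.sh is a single line; a sketch (-b 1024 forces the 1k blocks the old ramdisk code expects):

# was: /sbin/mkfs.ext2 -F -v -m0 initrd.img
/sbin/mkfs.ext2 -b 1024 -F -v -m0 initrd.img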
Thanks, may try that later this afternoon, but what baffles me is that I think I've completely followed the procedure in the original scripts, so
why is the result so different?
BTW, my attempt to rebuild from scratch is failing also, when building Busybox:
CC loginutils/passwd.o
loginutils/passwd.c: In function ‘passwd_main’:
loginutils/passwd.c:93:16: error: storage size of ‘rlimit_fsize’ isn’t known
struct rlimit rlimit_fsize;
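That error usually means struct rlimit is not in scope, i.e. newer glibc headers no longer pull in <sys/resource.h> indirectly. A possible one-line fix, as an assumption about this particular Busybox tree rather than a verified patch:

# add the missing header at the top of the affected file
sed -i '1i #include <sys/resource.h>' loginutils/passwd.c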
Searching on "Magic mismatch, very weird" comes up with some threads.
One is
hardware failure, the other is about using a non-1k blocksize with
(old) mke2fs
and a 2007-era ramdisk implementation that doesn't support other than 1k:
https://sourceforge.net/p/e2fsprogs/bugs/175/#b0df
Perhaps you could try -b1024 on the mkfs.ext2 command? Or experiment
with
other blocksizes?
Thanks, may try that later this afternoon,
On 2025-06-02 13:44, Java Jive wrote:
Searching on "Magic mismatch, very weird" comes up with some threads.
One is
hardware failure, the other is about using a non-1k blocksize with
(old) mke2fs
and a 2007-era ramdisk implementation that doesn't support other than
1k:
https://sourceforge.net/p/e2fsprogs/bugs/175/#b0df
Perhaps you could try -b1024 on the mkfs.ext2 command? Or experiment
with
other blocksizes?
Thanks, may try that later this afternoon,
And it worked! Adding the -b1024 parameter makes my manual copy of the original procedure work. Thanks for that.
[...]
So I tried moving that bit of code to rcS, but I still can't get it to reboot. Again all the messages are correctly displayed, but no reboot actually occurs.
I now have this fully working. If it's of any interest, here's the code from rcS. If, on first boot, fewer than 2 HDs are found, it sets a flag in the U-Boot environment, which survives a reboot, and then reboots. On the second boot, it wipes the reboot flag and carries on the boot regardless of how many HDs are found. In my case, the reboot allows the second HD to be detected during the second boot, so the XFS storage area spread across both HDs becomes available.
${SETENV} ${REBOOTFLG} true
${ECHO} "Rebooting to try to pick up slow-spin-up drives ..."
# The following command is valid according to the help parameter, but fails
# ${UMOUNT} -a
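For the curious, the overall shape of that rcS logic might look like this. This is only a sketch reconstructed from the description above: the ${REBOOTFLG} variable name and the disk-counting test are assumptions, not the actual Zyxel rcS code.

#!/bin/sh
SETENV=/zyxel/sbin/fw_setenv
PRINTENV=/zyxel/sbin/fw_printenv
REBOOTFLG=reboot_flag
ECHO=echo

# count the SATA disks the kernel found
NDISKS=$(ls /sys/block 2>/dev/null | grep -c '^sd')

if [ "${NDISKS}" -lt 2 ] && \
   [ "$(${PRINTENV} -n ${REBOOTFLG} 2>/dev/null)" != "true" ]; then
    ${SETENV} ${REBOOTFLG} true
    ${ECHO} "Rebooting to try to pick up slow-spin-up drives ..."
    reboot -f
else
    # second boot, or all disks present: clear the flag and carry on
    ${SETENV} ${REBOOTFLG}
fi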
Java Jive wrote:
I now have this fully working.
Now, how long until the drives fail :-P
On Fri, 06 Jun 2025 12:25:35 +0100, Java Jive wrote:
[snip]
I now have this fully working. If it's of any interest, here's the code from rcS. If, on first boot, fewer than 2 HDs are found, it sets a flag in the U-Boot environment, which survives a reboot, and then reboots. On the second boot, it wipes the reboot flag and carries on the boot regardless of how many HDs are found. In my case, the reboot allows the second HD to be detected during the second boot, so the XFS storage area spread across both HDs becomes available.
[snip]
${SETENV} ${REBOOTFLG} true
${ECHO} "Rebooting to try to pick up slow-spin-up drives ..."
# The following command is valid according to the help parameter, but fails
# ${UMOUNT} -a
Yah, assuming ${UMOUNT} resolves to something like /bin/umount, then
${UMOUNT} -a
probably would fail here. Primarily while trying to umount the filesystem that holds your script's cwd, and (because the umount failure left that filesystem still mounted) the root filesystem.
Remember, umount can't unmount an active mountpoint (one with mountpoints, open files or directories on it), and
a) your script's cwd is most likely located in one of the filesystems
mentioned in /etc/mtab (and, of course, open, because your active
process lives in that cwd),
b) / is probably in your /etc/mtab, and can't be umounted until all
the filesystems that reside on it are umounted, and
c) your use of the -a option effectively asks umount to unmount /all/
filesystems listed in /etc/mtab ("except the proc filesystem")
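Given all that, a gentler pre-reboot step than ${UMOUNT} -a would be to flush and remount read-only rather than unmount; a sketch, assuming busybox's mount supports remount (most builds do):

sync                      # flush pending writes to disk
mount -o remount,ro /     # root stays mounted but is now clean
reboot -f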