tinyapps.org / docs / Handling read errors while verifying a zeroed hard drive

Update: Replacing hdparm --security-erase with hdparm --security-erase-enhanced allowed hexdump to read the entire drive on the next run, despite smartctl reporting no increase in reallocated sectors.

0. Disk info

# fdisk -l /dev/sda
Disk /dev/sda: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
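
The sector count above is simply the byte size divided by the logical sector size, and the highest valid sector number is one less than that count. A minimal sketch of the arithmetic, using the figures from the fdisk output (sector_count is an illustrative helper name, not part of any tool):

```shell
# sector_count BYTES SECTOR_SIZE
# Hypothetical helper: print how many whole sectors fit in BYTES.
sector_count() {
    echo $(( $1 / $2 ))
}

# Figures taken from the fdisk output above.
sector_count 2000398934016 512                   # prints 3907029168
echo $(( $(sector_count 2000398934016 512) - 1 ))  # prints 3907029167, the last sector number
```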

1. Issue ATA Secure Erase command

# hdparm --user-master u --security-erase p /dev/sda
security_password: "p"

 Issuing SECURITY_ERASE command, password="p", user=user

2. Check with hexdump - read error reported:

# hexdump /dev/sda
0000000 0000 0000 0000 0000 0000 0000 0000 0000
hexdump: /dev/sda: Input/output error

3. Try reading errant sector

hexdump reports offsets in hexadecimal bytes; in this run it stopped at offset 4f5b1000. There are 512 bytes per sector on this drive (see step 0), so convert 4f5b1000 to decimal and divide by 512 to pinpoint the errant sector:

# echo $((0x4f5b1000))
1331367936
# echo $((1331367936/512))
2600328
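
The two conversions can be folded into one step. A minimal sketch, assuming the offset is given as hex digits without the 0x prefix (offset_to_sector is a hypothetical helper name):

```shell
# offset_to_sector HEX_OFFSET SECTOR_SIZE
# Hypothetical helper: convert a hexadecimal byte offset (as printed by
# hexdump) into the number of the sector that contains it.
offset_to_sector() {
    # Integer division truncates, so an offset in the middle of a sector
    # still maps to that sector.
    echo $(( 0x$1 / $2 ))
}

offset_to_sector 4f5b1000 512    # prints 2600328
```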

3.1 Try hexdump once more on the reported sector - failed:

# hexdump -s 1331367936 -n 1 /dev/sda
hexdump: /dev/sda: Input/output error

3.2 dd - failed:

# dd if=/dev/sda bs=512 skip=2600328 count=1 | hexdump
dd: error reading '/dev/sda': Input/output error

3.3 hdparm - succeeded:

# hdparm --read-sector 2600328 /dev/sda

reading sector 2600328: succeeded
0000 0000 0000 0000 0000 0000 0000 0000

The hdparm man page states in part,

hdparm will issue a low-level read (completely bypassing the usual block layer read/write mechanisms) for the specified sector. This can be used to definitively check whether a given sector is bad (media error) or not (doing so through the usual mechanisms can sometimes give false positives).

However, long-standing bugs may cause inaccurate results.

4. Repair the sector

# hdparm --repair-sector 2600328 --yes-i-know-what-i-am-doing /dev/sda

re-writing sector 2600328: succeeded

Despite this apparent success, hexdump and dd still failed to read sector 2600328, and skipping past it only made them abort on other read errors further on.

5. Have hdparm read every sector?

# for ((i=0;i<=3907029167;i++)) ; do hdparm --read-sector $i /dev/sda 2>&1 | grep FAILED ; done
reading sector 6305325: FAILED: Input/output error

That is: 1) iterate through all 3907029168 sectors, 2) read each sector with hdparm, 3) combine stderr and stdout (to keep the failed sector number and its error message together), and 4) look for lines containing "FAILED". Optionally, save output to fail.log while maintaining terminal output by adding |& tee -a fail.log after the grep command.
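
If only the sector numbers are wanted (for example, to feed them to a later repair pass), the FAILED lines can be reduced with sed. This is a minimal sketch that assumes the message has exactly the form shown in the output above; failed_sectors is a hypothetical helper name.

```shell
# failed_sectors
# Hypothetical filter: read hdparm --read-sector output on stdin and print
# only the sector numbers from lines reporting FAILED.
failed_sectors() {
    sed -n 's/^reading sector \([0-9][0-9]*\): FAILED.*/\1/p'
}

# Example using the line from the output above:
printf 'reading sector 6305325: FAILED: Input/output error\n' | failed_sectors    # prints 6305325
```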

While this returns all unreadable sectors, it does not verify that all readable sectors contain zeros. To check that as well, drop grep and pipe the loop's output through awk:

# for (( i=0; i<=3907029167; i++ )) ; do hdparm --read-sector $i /dev/sda 2>&1 ; done | awk '!/0000 0000 0000 0000 0000 0000 0000 0000/ && !/succeeded/ && NF && !/\/dev\/sda:/ || $4~/FAILED/'

This suppresses lines containing 32 zeros, lines containing "succeeded", empty lines, and lines containing "/dev/sda:", so that only non-zero data remains, and additionally prints any line with "FAILED" in the fourth column.
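
The filter can be tried without touching a drive by feeding it text. In the sketch below, the sample input is made up for illustration (it only imitates the line shapes described above, and is not real hdparm output), and nonzero_or_failed is a hypothetical wrapper name around the same awk program.

```shell
# nonzero_or_failed
# Hypothetical wrapper around the awk filter from step 5.
nonzero_or_failed() {
    awk '!/0000 0000 0000 0000 0000 0000 0000 0000/ && !/succeeded/ && NF && !/\/dev\/sda:/ || $4~/FAILED/'
}

# Made-up sample: one all-zero sector, one sector with non-zero data,
# and one failed read. Only the last two lines should be printed.
nonzero_or_failed <<'EOF'

/dev/sda:
reading sector 1: succeeded
0000 0000 0000 0000 0000 0000 0000 0000

/dev/sda:
reading sector 2: succeeded
0000 0000 dead beef 0000 0000 0000 0000
reading sector 3: FAILED: Input/output error
EOF
```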

Alas, after several days the process was still around two orders of magnitude away from completion.
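
A rough estimate shows why. The sketch below assumes a purely hypothetical rate of 100 hdparm invocations per second (the actual rate was not measured here); eta_days is an illustrative helper name.

```shell
# eta_days TOTAL_SECTORS SECTORS_PER_SECOND
# Hypothetical helper: whole days needed to read TOTAL_SECTORS one at a
# time at the given rate (86400 seconds per day, rounded down).
eta_days() {
    echo $(( $1 / $2 / 86400 ))
}

# 100 sectors per second is an assumed figure, not a measurement.
eta_days 3907029168 100    # prints 452
```

At that assumed rate, a few days of running would cover only about one percent of the drive.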

6. Combine badblocks and hdparm

6.1 Run badblocks in read-only mode to generate list of unreadable or non-zero sectors:

# badblocks -b 512 -sv -t 0 -o bb.log /dev/sda
Checking blocks 0 to 3907029167
Checking for bad blocks in read-only mode
Testing with pattern 0x00: done
Pass completed, 615 bad blocks found. (615/0/0 errors)

6.2 Have hdparm write/repair each bad sector:

# while read -r i ; do hdparm --repair-sector "$i" --yes-i-know-what-i-am-doing /dev/sda ; done <bb.log

All 615 sectors reported to have been rewritten successfully.
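
Because --repair-sector overwrites whatever it is pointed at, it may be worth sanity-checking the list before looping over it. A minimal sketch, assuming bb.log holds one sector number per line (badblocks was run with -b 512, matching the drive's sector size); valid_sectors is a hypothetical helper name.

```shell
# valid_sectors MAX_SECTOR
# Hypothetical filter: pass through only lines that consist solely of
# digits and are no larger than MAX_SECTOR.
valid_sectors() {
    awk -v max="$1" '/^[0-9]+$/ && $1 + 0 <= max + 0'
}

# Example with made-up input; only 5 and 3907029167 are printed.
printf '5\nabc\n3907029168\n3907029167\n' | valid_sectors 3907029167
```

Comparing the line count of valid_sectors 3907029167 <bb.log with that of bb.log itself would reveal any unexpected entries.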

6.3 Try reading repaired sectors with hdparm:

# while read -r i ; do hdparm --read-sector "$i" /dev/sda ; done <bb.log

A handful of sectors were still reported as unreadable, but repairing and then re-reading them individually (as many as 5 or 6 times in a few cases) eventually succeeded.
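
The repair-then-read cycle could be automated along these lines. This is only a sketch: repair_until_readable is a hypothetical helper, and it assumes a successful read prints "succeeded" as in step 3.3. It overwrites the given sector, so it should only be run against a drive that is meant to be blank.

```shell
# repair_until_readable SECTOR DEVICE MAX_TRIES
# Hypothetical helper: rewrite SECTOR, then try to read it back, repeating
# up to MAX_TRIES times. Returns 0 once a read reports "succeeded".
repair_until_readable() {
    sector=$1 dev=$2 tries=$3 n=1
    while [ "$n" -le "$tries" ]; do
        # Rewrite the sector (destructive).
        hdparm --repair-sector "$sector" --yes-i-know-what-i-am-doing "$dev" >/dev/null 2>&1
        # Read it back and look for the success message.
        if hdparm --read-sector "$sector" "$dev" 2>&1 | grep -q succeeded; then
            echo "sector $sector readable after $n attempt(s)"
            return 0
        fi
        n=$((n + 1))
    done
    echo "sector $sector still unreadable after $tries attempt(s)" >&2
    return 1
}

# Possible usage (not run here), with 6 tries chosen to match the text above:
# while read -r s ; do repair_until_readable "$s" /dev/sda 6 ; done <bb.log
```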

last update: 2018.06.03