Thumper broke

One of our ZFS storage arrays broke this week. Unfortunately, ZFS, which was supposed to warn us before something like this happens, didn’t, so now we have an unusable array. Luckily, the data on it was not particularly important.

What happened was that two of the disks had physical write failures that showed up in /var/adm/messages but did not show up when we ran zpool status. Running a zpool scrub finally got these errors to show up, but by that time it was too late: both failed disks sit in the same raidz1 group, which can only survive the loss of one disk.

/var/adm/messages

Jun 22 03:10:04 jackdaniels scsi: [ID 107833 kern.notice] Requested Block: 53478160 Error Block: 53478163
Jun 22 03:10:04 jackdaniels scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number: 9QJ3FCAT
Jun 22 03:10:04 jackdaniels scsi: [ID 107833 kern.notice] Sense Key: Media Error
Jun 22 03:10:04 jackdaniels scsi: [ID 107833 kern.notice] ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0x0
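Since the only early warning was in the system log, one possible check is to scan /var/adm/messages for these SCSI media errors and count them per disk. The following is a rough sketch, not something we run in production. It assumes Python 3 is available on the host, that the log lines look like the excerpt above, and that the "Serial Number" line comes before the "Sense Key" line within each report. The function names are made up for illustration.

```python
import re
from collections import Counter

# Matches the syslog prefix seen in the excerpt, e.g.
# "Jun 22 03:10:04 jackdaniels scsi: [ID 107833 kern.notice] Sense Key: Media Error"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s[\d:]{8}\s+\S+\s+scsi:\s+\[ID[^\]]*\]\s+(?P<body>.*)$"
)
SERIAL_RE = re.compile(r"Serial Number:\s*(\S+)")


def count_media_errors(lines):
    """Return a Counter mapping disk serial number -> number of media errors.

    Assumption: within one error report the serial number line precedes
    the sense key line, as it does in the excerpt.
    """
    errors = Counter()
    last_serial = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a scsi kernel message
        body = match.group("body")
        serial = SERIAL_RE.search(body)
        if serial:
            last_serial = serial.group(1)
        elif body.startswith("Sense Key: Media Error"):
            errors[last_serial or "unknown"] += 1
            last_serial = None  # do not carry the serial into the next report
    return errors


def report(path="/var/adm/messages"):
    """Print one line per disk that logged at least one media error."""
    with open(path, errors="replace") as log:
        for serial, count in count_media_errors(log).items():
            print("disk {}: {} media error(s)".format(serial, count))
```

Calling report() from cron and mailing any output would be one way to use it. Mapping a serial number back to a cXtYdZ device name is left out here.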

After scrubbing

-bash-3.00# zpool status
  pool: tank
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: scrub in progress for 44h58m, 24.57% done, 138h3m to go
config:

        NAME          STATE     READ WRITE CKSUM
        tank          UNAVAIL      0     0     0  insufficient replicas
          raidz1-0    ONLINE       0     0     0
            c6t0d0    ONLINE       0     0     0
            c6t1d0    ONLINE       0     0     0
            c6t2d0    ONLINE       0     0     0
            c6t3d0    ONLINE       0     0     0
            c6t4d0    ONLINE       0     0     0
            c6t5d0    ONLINE       0     0     0
            c6t6d0    ONLINE       0     0     0
            c6t7d0    ONLINE       0     0     0
          raidz1-1    UNAVAIL      0     6     0  insufficient replicas
            c5t0d0    ONLINE       0     0     1  1.74M repaired
            c5t1d0    UNAVAIL      1    13     0  cannot open
            c5t2d0    UNAVAIL      2    11     0  cannot open
            c5t3d0    ONLINE       0     0     0  2K repaired
            c5t4d0    ONLINE       0     0     0  2K repaired
            c5t5d0    ONLINE       0     0     0  1.50K repaired
            c5t6d0    ONLINE       0     0     0  1.50K repaired
            c5t7d0    ONLINE       0     0     0  1.50K repaired
          raidz1-2    ONLINE       0     0     0
            c4t0d0    ONLINE       0     0     0
            c4t1d0    ONLINE       0     0     0
            c4t2d0    ONLINE       0     0     0  151K repaired
            c4t3d0    ONLINE       0     0     0
            c4t4d0    ONLINE       0     0     0
            c4t5d0    ONLINE       0     0     0
            c4t6d0    ONLINE       0     0     0
            c4t7d0    ONLINE       0     0     0
          raidz1-3    ONLINE       0     0     0
            c3t0d0    ONLINE       0     0     0
            c3t1d0    ONLINE       0     0     0
            c3t2d0    ONLINE       0     0     0
            c3t3d0    ONLINE       0     0     0  2.72M repaired
            c3t4d0    ONLINE       0     0     0
            c3t5d0    ONLINE       0     0     0
            c3t6d0    ONLINE       0     0     0
            c3t7d0    ONLINE       0     0     0

errors: 304 data errors, use '-v' for a list

We will have to think about putting in more checks so we can catch these problems in the future.
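One such check could parse the output of zpool status and flag every pool, vdev, or disk that is not ONLINE or that has a non-zero READ, WRITE, or CKSUM counter. Below is a minimal sketch in Python 3 (assumed to be available; subprocess.run with capture_output needs 3.7 or later). The function names are hypothetical. Since the errors on this array only appeared in zpool status after a scrub, a check like this would probably only help if it is paired with regularly scheduled scrubs, which is itself an assumption about how we would run it.

```python
import re
import subprocess

# A counter column is either a plain integer or an abbreviated one like "1.2K".
COUNTER_RE = re.compile(r"^\d+(\.\d+)?[KMGT]?$")


def find_unhealthy_devices(status_text):
    """Return (name, state, read, write, cksum) for every unhealthy config row.

    A row is unhealthy if its state is not ONLINE or any counter is non-zero.
    Counters are returned as the strings printed by zpool status.
    """
    unhealthy = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NAME" and "STATE" in fields:
            in_config = True  # header row of the config table
            continue
        if fields[0] == "errors:":
            in_config = False  # end of this pool's config table
        if not in_config or len(fields) < 5:
            continue
        name, state = fields[0], fields[1]
        counters = fields[2:5]
        if not all(COUNTER_RE.match(c) for c in counters):
            continue  # not a device row
        if state != "ONLINE" or any(c != "0" for c in counters):
            unhealthy.append((name, state) + tuple(counters))
    return unhealthy


def check_pools():
    """Run 'zpool status' and return the unhealthy rows (needs zpool in PATH)."""
    result = subprocess.run(
        ["zpool", "status"], capture_output=True, text=True, check=True
    )
    return find_unhealthy_devices(result.stdout)
```

Run against the output above, this would flag tank, raidz1-1, c5t0d0, c5t1d0, and c5t2d0, and skip the disks that only carry a "repaired" note with zero counters.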
