Thumper broke

One of our ZFS storage arrays broke this week. Unfortunately, ZFS, which was supposed to warn us before something like this happens, didn't, so now we have an unusable array. Luckily, the data on it was not particularly important.

What happened was that two of the disks had physical write failures that showed up in /var/adm/messages but did not show up when we ran zpool status. Running a zpool scrub finally got these errors to show up, but by then it was too late.

/var/adm/messages

Jun 22 03:10:04 jackdaniels scsi: [ID 107833 kern.notice] Requested Block: 53478160 Error Block: 53478163
Jun 22 03:10:04 jackdaniels scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number: 9QJ3FCAT
Jun 22 03:10:04 jackdaniels scsi: [ID 107833 kern.notice] Sense Key: Media Error
Jun 22 03:10:04 jackdaniels scsi: [ID 107833 kern.notice] ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0x0
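Since these errors only showed up in the system log, one way to catch them is to scan the log for media errors. The following is a minimal Python sketch, not something we actually ran. It assumes each SCSI report prints its "Serial Number" line before its "Sense Key" line, as in the excerpt above; the function name count_media_errors and the SAMPLE data are made up for illustration.

```python
import re
from collections import Counter

# Patterns based on the /var/adm/messages excerpt above.
SERIAL_RE = re.compile(r"Serial Number:\s*(\S+)")
SENSE_RE = re.compile(r"Sense Key:\s*(.+?)\s*$")


def count_media_errors(lines):
    """Count 'Sense Key: Media Error' reports per disk serial number.

    Assumption: within one report, the serial number line comes before
    the sense key line, so the last serial seen owns the error.
    """
    counts = Counter()
    current_serial = "unknown"
    for line in lines:
        if " scsi: " not in line:
            continue  # not a SCSI driver message
        serial = SERIAL_RE.search(line)
        if serial:
            current_serial = serial.group(1)
            continue
        sense = SENSE_RE.search(line)
        if sense and sense.group(1) == "Media Error":
            counts[current_serial] += 1
    return counts


# Sample input copied from the log excerpt above.
SAMPLE = """\
Jun 22 03:10:04 jackdaniels scsi: [ID 107833 kern.notice] Requested Block: 53478160 Error Block: 53478163
Jun 22 03:10:04 jackdaniels scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number: 9QJ3FCAT
Jun 22 03:10:04 jackdaniels scsi: [ID 107833 kern.notice] Sense Key: Media Error
Jun 22 03:10:04 jackdaniels scsi: [ID 107833 kern.notice] ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0x0
""".splitlines()

if __name__ == "__main__":
    for serial, n in count_media_errors(SAMPLE).items():
        print(f"disk {serial}: {n} media error(s)")
```

In practice the lines would come from reading /var/adm/messages itself, and the counts could be mailed out from cron whenever any disk is above zero.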

After scrubbing

-bash-3.00# zpool status
  pool: tank
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: scrub in progress for 44h58m, 24.57% done, 138h3m to go
config:

        NAME        STATE     READ WRITE CKSUM
        tank        UNAVAIL      0     0     0  insufficient replicas
          raidz1-0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
          raidz1-1  UNAVAIL      0     6     0  insufficient replicas
            c5t0d0  ONLINE       0     0     1  1.74M repaired
            c5t1d0  UNAVAIL      1    13     0  cannot open
            c5t2d0  UNAVAIL      2    11     0  cannot open
            c5t3d0  ONLINE       0     0     0  2K repaired
            c5t4d0  ONLINE       0     0     0  2K repaired
            c5t5d0  ONLINE       0     0     0  1.50K repaired
            c5t6d0  ONLINE       0     0     0  1.50K repaired
            c5t7d0  ONLINE       0     0     0  1.50K repaired
          raidz1-2  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0  151K repaired
            c4t3d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
            c4t7d0  ONLINE       0     0     0
          raidz1-3  ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0  2.72M repaired
            c3t4d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0

errors: 304 data errors, use '-v' for a list

We will have to think about putting in more checks so we can catch these problems in the future.
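One such check could parse the zpool status output and flag any device that is not ONLINE or has non-zero error counters. Below is a rough Python sketch under stated assumptions, not a finished monitoring tool: the names Device, find_unhealthy and read_zpool_status are hypothetical, and the list of device states in the regular expression is my assumption about what can appear in the STATE column. As this incident showed, the counters may stay at zero until a scrub runs, so a check like this only helps alongside regular scrubs and log monitoring.

```python
import re
import subprocess
from typing import List, NamedTuple


class Device(NamedTuple):
    name: str
    state: str
    read: str
    write: str
    cksum: str


# Matches config rows such as "c5t1d0 UNAVAIL 1 13 0 cannot open".
# Counters are kept as strings because large counts may be abbreviated.
ROW_RE = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)


def find_unhealthy(status_text: str) -> List[Device]:
    """Return devices that are not ONLINE or have a non-zero counter."""
    bad = []
    for line in status_text.splitlines():
        match = ROW_RE.match(line)
        if not match:
            continue  # header, blank line, or non-table text
        dev = Device(*match.groups())
        counters = (dev.read, dev.write, dev.cksum)
        if dev.state != "ONLINE" or counters != ("0", "0", "0"):
            bad.append(dev)
    return bad


def read_zpool_status() -> str:
    """Run 'zpool status' and return its output (needs ZFS installed)."""
    result = subprocess.run(
        ["zpool", "status"], capture_output=True, text=True, check=True
    )
    return result.stdout


# A few rows copied from the output above, used instead of a live pool.
SAMPLE = """\
NAME STATE READ WRITE CKSUM
tank UNAVAIL 0 0 0 insufficient replicas
raidz1-1 UNAVAIL 0 6 0 insufficient replicas
c5t0d0 ONLINE 0 0 1 1.74M repaired
c5t1d0 UNAVAIL 1 13 0 cannot open
c5t3d0 ONLINE 0 0 0 2K repaired
"""

if __name__ == "__main__":
    for dev in find_unhealthy(SAMPLE):
        print(f"{dev.name}: {dev.state} "
              f"read={dev.read} write={dev.write} cksum={dev.cksum}")
```

On a real host, passing read_zpool_status() instead of SAMPLE and running the script from cron would give an alert whenever the returned list is non-empty.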

Value Add

Recently, I’ve been trying to blog more, and also to split my posts across two blogs – work and personal. I have been trying to put work-related posts on my NUS blog, and keep this page for my personal posts. However, things did not go so well.

First of all, I tried to access the NUS blog from my mobile phone. More and more people now consume content through their mobile devices, so a blog that has a mobile theme definitely ranks quite high on my list. In short, the NUS blog didn’t have a mobile theme, so it looks shitty on smartphones (to see the difference, try visiting a wordpress.com blog versus the NUS one).

Seeing that, I emailed the sysadmin to see whether he could install the appropriate plugins for a mobile theme. This was not just for selfish gain – I believe that we as IT professionals need to help each other keep ahead, so as to provide the best damn services to our customers (i.e. the school).

The response was disappointing – because our WordPress setup is a customized installation (outsourced to a vendor), the vendor cannot do a mobile theme. Maybe in the next version, they say.

Fine, I can try to do without.

The next issue came when I was trying to share some source code on the blog for NUS’s new OpenID implementation. Again, wordpress.com has come up with an elegant solution that was not available on our blog. Again, I emailed the sysadmin to ask him to install it.

This time, the response was more hopeful. The vendor will try to put it in, hopefully before the start of the new AY. Translated – it SHOULD be ready in two months’ time. Two months, to install a plugin. Maybe.

Given the speed of the internet, sometimes customers can have overwhelming expectations. However, that is not the main point of my gripe.

The main point is, what is the value-add of this NUS blog? One of my colleagues has argued this point strongly before. If we ever want to do something in-house, the solution we implement MUST add value. For example, we used to run a few services – mail, Unix, LDAP, etc. – because no one else could. We have retired things that we no longer think are useful, like dial-up, since everyone now has broadband everywhere. We have also killed NUSForge and SoCForge, which were ridiculous since they offered no real advantage over the public SourceForge.

Back to the topic, how is this NUS blog useful? The only thing I can see is that we get to run it on our own domain. Wow. But to pay more and get fewer features, is it really worth it?

Seriously contemplating abandoning that PoS for this one.