Our Thumper had a disk error about a month ago, and the vendor has spent the past month trying to fix it. At first, we thought it might just need a simple disk replacement. The data wasn't extremely important, so we didn't make much effort to supervise what the vendor was doing. However, things didn't go well and got progressively worse.
After numerous visits, clearing and reimporting the zpool, applying patches, and back-and-forth tech support between the vendor and Oracle, today we got the news that the zpool is totally broken and needs to be reinitialized. Ah hah! We had already suspected as much two weeks ago and wanted to reinitialize it then, but we decided to give the vendor a chance.
So right now we have a brand-new system with no data :D. A few lessons learnt:
- Sometimes it is not a good idea to give your vendor full autonomy. They might not realize how important your data is, or might not care. If things get messed up, they can just say sorry, but your ass will be on fire.
- Even with ENTERPRISE storage, things go wrong.
- Sometimes, it might be better to DIY.
As an aside, a colleague who runs the same system (Sun X4500) had disk corruption at around the same time. The vendor's engineer replacing the disk actually pulled the wrong side of a mirrored array! Luckily, my colleague was there and managed to stop the resilvering in time, or the whole pool might have been lost (like ours).
So now we are planning to DIY our own (cheapo) storage. Yaojun is working on a Backblaze clone that should be quite cool. We need to figure out a use case for huge (but not so fast) storage (think Dropbox, Carbonite, etc.). Does anyone have any applications in mind? Please post in the comments.