nfsceph

We are now running a ceph cluster, which I find awesome. Who doesn’t like distributed, easily scalable storage pools?

However, the ceph storage is pretty useless if the clients can’t mount it. Given that most clients talk NFS, SMB or iSCSI rather than ceph, an intermediate node is needed to export ceph to the clients of the world. Enter nfsceph.

nfsceph is something I’ve written off and on over the past few weeks. It is a set of scripts that lets you create rbds (rados block devices) on ceph, map them, format them and export them to the world. In more concise terms: rbd create, rbd map, mkfs.ext3, exportfs.
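
Under the hood, that sequence boils down to something like the following. This is a rough sketch of the idea rather than the actual scripts (which also handle naming, error checking and state tracking).

rbd create backup --size 10000     # create a 10000 MB rbd image named "backup"
rbd map backup                     # map it to a /dev/rbd<x> block device
mkfs.ext3 /dev/rbd6                # make a filesystem on it (the device number will vary)
mkdir -p /export/backup
mount /dev/rbd6 /export/backup     # mount it, ready to be exported over NFS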

Let’s see how it makes our (my) life easier!

Creating

‘nfsceph create <filesystem> <size>’ creates a filesystem on ceph

[root@nfs1 ~]# nfsceph create backup 10000
Creating rbd... Success.
Mapping rbd...Success.
Making filesystem...Success.
Mounting filesystem...Success.

Listing

‘nfsceph list’ lists our filesystems

[root@nfs1 ~]# nfsceph list
backup 10.48576 GB
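
The 10000 MB image shows up as 10.48576 GB, which is presumably just the same size expressed in decimal gigabytes. A listing like this can be pulled straight out of rbd; a minimal sketch, assuming the image names match the export names:

# sketch: list rbd images and their sizes
for img in $(rbd ls); do
    echo -n "$img "
    rbd info "$img" | grep size
done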

Exporting

‘nfsceph export <filesystem> <ip>’ NFS-exports a filesystem to the specified IP
‘nfsceph export’ shows the exports you have

[root@nfs1 ~]# nfsceph export backup 192.168.1.22
[root@nfs1 ~]# nfsceph export
backup 192.168.1.22

At this point, the filesystem is ready to be mounted on the client. You can specify multiple clients, and also a netblock (e.g. 192.168.1.0/24).
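
On the client side it is then just an ordinary NFS mount; the hostname and mount point below are illustrative.

# on the client (192.168.1.22); "nfs1" and /mnt/backup are illustrative names
mkdir -p /mnt/backup
mount -t nfs nfs1:/export/backup /mnt/backup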

More Information

The ceph rbd is mapped to /dev/rbd<x> and mounted under /export.

[root@nfs1 ~]# mount | grep backup
/dev/rbd6 on /export/backup type ext3 (rw)

The filesystem is exported with the following options for best performance and compatibility.

[root@nfs1 ~]# exportfs -v | grep backup
/export/backup 192.168.1.22(rw,async,wdelay,no_root_squash,no_subtree_check)
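
For reference, the equivalent hand-rolled export would look something like this (a sketch; the scripts presumably wrap something similar):

exportfs -o rw,async,no_root_squash,no_subtree_check 192.168.1.22:/export/backup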

There’s also a set of initscripts that saves the current state to a file and makes the exports persistent across reboots. If you’d like to play with it, the source can be found on github.
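
Conceptually, the boot-time restore just has to re-map, re-mount and re-export everything. A minimal sketch of that idea follows; the state file path and format here are made up, the real ones belong to the scripts.

# conceptual sketch only; /etc/nfsceph.state and its format are hypothetical
while read img client; do
    rbd map "$img"                                                # re-map the image
    dev=$(rbd showmapped | awk -v i="$img" '$3 == i {print $5}')  # find its /dev/rbd<x> device
    mount "$dev" "/export/$img"                                   # re-mount under /export
    exportfs -o rw,async,no_root_squash,no_subtree_check "$client:/export/$img"
done < /etc/nfsceph.state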

With this architecture, we can scale out quite easily by just adding more intermediate nodes to ease the load. Cheap, (practically) unlimited NFS storage. Awesome. 🙂

Building a large storage for SoC

Someone once told me an interesting quote: “data grows to encompass all storage”. Although drives are getting bigger, the things we store get bigger too. For home users, this is probably fine; a 3TB external USB drive sets you back $100 or so. For enterprise storage, however, growing capacity is not so simple. We can’t simply hook up thousands of USB external drives and hope for them to work.

Enterprise storage is crazily expensive, probably 10 to 20 times the price of commodity USB storage. With that in mind, and with future requirements coming in (dropbox anyone?), we have decided to roll our own distributed storage to meet the requirements of the near future.

Our basic idea is simple: run a distributed file system as the backend storage, and layer multiple services on top of it, e.g. NFS, SMB, volume and block storage.

We have decided to go with Ceph, as it provides object, block and filesystem storage. Ceph also integrates nicely with OpenStack, providing the block storage layer for OpenStack volumes. This means that a user on SoC cloud can spin up a VM and attach a separate (bigger) volume (e.g. /dev/vdb) to it. The OS of the VM still lives on the physical machine, while the (bigger) volume sits in the more redundant large storage, insulated from any single machine failure.
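
From the user’s point of view, attaching such a volume looks something like this. The names, sizes and exact CLI commands below are illustrative of the workflow, not taken from the SoC cloud setup.

# illustrative only: create a Ceph-backed volume and attach it to a running VM
cinder create --display-name bigdata 100          # a 100 GB volume
nova volume-attach myvm <volume-id> /dev/vdb      # <volume-id> comes from the cinder output
# then, inside the VM
mkfs.ext4 /dev/vdb
mount /dev/vdb /data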

[Diagram: OpenStack VMs using Ceph as the block storage backend for volumes]