New Job! DevOps Engineer at University of Melbourne

I’ve started a new job! Some of you might know that I left NUS about a month ago and that I was moving to Melbourne. I left kind of abruptly and did not give many details, as everything was pretty much unconfirmed even before I flew off. Now that the dust has finally settled, I felt it was time to write about what’s been happening over the past month.

I’ve started work at the University of Melbourne, as a DevOps Engineer working on the NeCTAR Research Cloud. NeCTAR RC is a big federated OpenStack cloud made up of nodes all around Australia. I joined the Melbourne Node, working with three other colleagues. I’ve always been interested in OpenStack, so I guess it’s a good change for me!

I’m still catching up on everything, so I’m just trying to respond to support tickets whenever I can, and poking around the cloud when I have time. We just upgraded one of our sites from Precise to Trusty, and I’m currently doing some maintenance work and writing tools.

The research cloud is quite big. There are pretty graphs on the status page, and the Melbourne node is one of the biggest, at about 1500 instances on 3500 cores! (P.S. I hope one of the tools I’m working on can push that up by ~5%!)

Most of the ‘development’ seems to come from the lead node guys (sitting a few steps away). They are really quite good at cells and messaging and all that. To my knowledge, pretty much only NeCTAR RC, Rackspace and CERN run nova-cells now. So it’s really cool to be able to see the workings of a big federated cloud from the inside!


nfsceph

We are now running a ceph cluster, which I find awesome. Who doesn’t like distributed, easily scalable storage pools?

However, the ceph storage is pretty useless if clients can’t mount it. Given that most clients talk NFS, SMB or iSCSI, and not ceph, an intermediate node needs to be created to export ceph to the clients of the world. Enter nfsceph.

nfsceph is something I’ve written off and on over the past few weeks. It is a set of scripts that allows you to create RBDs (RADOS block devices) on ceph, map them, format them and export them to the world. In more concise terms: rbd create, rbd map, mkfs.ext3, exportfs.
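
For the curious, the underlying steps that nfsceph wraps look roughly like this. This is only a hand-written sketch: the image name, size and client IP match the examples below, and the real scripts add state tracking and error handling around these commands.

rbd create backup --size 10000      # size in MB, in the default 'rbd' pool
rbd map backup                      # maps to the next free /dev/rbd<x>, say /dev/rbd0
mkfs.ext3 /dev/rbd0
mkdir -p /export/backup
mount /dev/rbd0 /export/backup
exportfs -o rw,async,no_root_squash,no_subtree_check 192.168.1.22:/export/backup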

Let’s see how it makes our (my) life easier!

Creating

‘nfsceph create <name> <size in MB>’ creates a filesystem on ceph

[root@nfs1 ~]# nfsceph create backup 10000
Creating rbd... Success.
Mapping rbd...Success.
Making filesystem...Success.
Mounting filesystem...Success.

Listing

‘nfsceph list’ lists our filesystems

[root@nfs1 ~]# nfsceph list
backup 10.48576 GB

Exporting

‘nfsceph export <filesystem> <ip>’ NFS-exports the filesystem to the specified IP
‘nfsceph export’ lists the current exports

[root@nfs1 ~]# nfsceph export backup 192.168.1.22
[root@nfs1 ~]# nfsceph export
backup 192.168.1.22

At this point, the filesystem is ready to be mounted on the client. You can specify multiple clients, and also a netblock (e.g. 192.168.1.0/24).
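
On the client side, mounting it is then just an ordinary NFS mount (assuming the intermediate node is reachable as nfs1, as in the prompts above):

mkdir -p /mnt/backup
mount -t nfs nfs1:/export/backup /mnt/backup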

More Information

The ceph RBD is mapped to /dev/rbd<x> and mounted under /export/<name>:

[root@nfs1 ~]# mount | grep backup
/dev/rbd6 on /export/backup type ext3 (rw)

The filesystem is exported with the following options for best performance and compatibility.

[root@nfs1 ~]# exportfs -v | grep backup
/export/backup 192.168.1.22(rw,async,wdelay,no_root_squash,no_subtree_check)

There’s also a set of initscripts that save the current state to a file and make the exports persistent across reboots. If you’d like to play with it, the source can be found on github.

With this architecture, we can scale out quite easily by just adding more intermediate nodes to ease the load. Cheap, (practically) unlimited NFS storage. Awesome. 🙂

OpenStack Active Directory / LDAP authentication

OpenStack (Grizzly) allows keystone to authenticate to different backends. The default backend is an SQL database, storing both user information (username/password) and tenant information (which user belongs to which group). Although you can switch this to an LDAP-based backend, it would mean having to take care of tenant information in LDAP too (which means tedious things like creating a new LDAP DC, which no self-respecting LDAP admin will let you do arbitrarily). But what if you just want OpenStack to authenticate to an LDAP server, like Active Directory in an enterprise setting?

Luckily, keystone allows you to extend authentication easily. What the following patch does is allow you to configure up to 3 LDAP servers, which keystone will attempt to bind to using the provided username/password when a user logs in. It can also fall back to the user information in SQL if it fails to bind to any of the LDAP servers, by setting fallback = True.

First of all, you need to create your own Identity backend with a _check_password() function. Please check out ldapauth.py on my github. Put this file into keystone/identity/backends.

Next, you will need to update config.py to read the new configuration options from your keystone.conf.

The full patch is on my github: https://github.com/waipeng/keystone/commit/8c18917558bebbded0f9c588f08a84b0ea33d9ae

After this, you can update keystone.conf to specify the LDAP servers that you want to authenticate with. Example:

[ldapauth]
server1_host = ldap://ldap1.example.com
server2_host = ldap://ldap2.example.com
server3_host = ldap://ldap3.example.com
server1_domain = DOMAIN1
server2_domain = DOMAIN2
server3_domain = DOMAIN3
fallback = True
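
You also need keystone to load the new backend instead of the default SQL one. Assuming the class in ldapauth.py is called Identity (check the actual patch for the exact module path and class name), the [identity] section of keystone.conf would look something like this:

[identity]
# module path and class name are illustrative; see ldapauth.py in the patch
driver = keystone.identity.backends.ldapauth.Identity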

Building a large storage for SoC

Someone once told me an interesting quote: “data grows to encompass all storage”. Although drives are getting bigger, the things we store get bigger too. For home users, this is probably fine; a 3TB external USB drive sets you back $100 or so. For enterprise storage, however, growth is not so simple. We can’t simply hook up thousands of USB external drives and hope for them to work.

Enterprise storage is crazily expensive, probably 10 to 20 times more expensive than commodity USB storage. With that in mind, and with future requirements coming in (dropbox, anyone?), we have decided to roll our own distributed storage to meet the computing requirements of the near future.

Our basic idea is simple: run a distributed file system that provides the backend storage, and layer multiple services on top of it to provide different interfaces, e.g. NFS, SMB, and volume/block storage.

We have decided to go with Ceph, as it can provide object, block and filesystem storage. Ceph also integrates nicely with OpenStack, providing the block storage layer for OpenStack volumes. This means that a user on the SoC cloud can spin up a VM and attach a separate (bigger) volume (e.g. /dev/vdb) to it. The OS of the VM still remains on the physical machine, while the (bigger) volume lives in the more redundant large storage, insulated from any single machine failure.
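
As an illustration (the volume name, size, instance name and volume id here are made up), creating and attaching such a volume from the command line looks something like this:

cinder create --display-name data01 100         # 100 GB volume backed by ceph
nova volume-attach myvm <volume-id> /dev/vdb    # shows up inside the guest as /dev/vdb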


CentOS OpenStack + cinder + ceph

If you are looking to run OpenStack with ceph as the backing storage for cinder, you will need the following.

  1. yum -y install openstack-cinder
  2. Follow the instructions at http://ceph.com/docs/master/rbd/rbd-openstack/
  3. Create the firewall rules to allow compute nodes to connect to cinder-volumes (see the sketch after this list)
  4. You might run into the following error while attaching
    internal error unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk1' could not be initialized
    This is because qemu in CentOS does not have rbd built in by default.
  5. To solve this problem, download qemu-kvm and qemu-img from http://ceph.com/packages/ceph-extras/rpm/.
  6. Install the packages, e.g.
    rpm --oldpackage -Uvh qemu-kvm-0.12.1.2-2.355.el6.2.x86_64.rpm qemu-img-0.12.1.2-2.355.el6.2.x86_64.rpm
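
For step 3, the exact firewall rules depend on your network layout. With an rbd backend the compute nodes ultimately need to reach the ceph monitors and OSDs, so an iptables sketch (assuming the default ceph ports and 192.168.0.0/16 standing in for your compute network) would be along these lines:

iptables -A INPUT -s 192.168.0.0/16 -p tcp --dport 6789 -j ACCEPT       # ceph monitors
iptables -A INPUT -s 192.168.0.0/16 -p tcp --dport 6800:7300 -j ACCEPT  # ceph OSDs
service iptables save

As for step 6, running ‘qemu-img --help | grep rbd’ afterwards should show rbd among the supported formats if the replacement packages went in correctly.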

Building CentOS images for OpenStack

Here’s an easy way to roll your own images for OpenStack. The build machine runs CentOS 6.

  1. Install EPEL repository if you haven’t already
  2. Install oz
    yum -y install oz
  3. Create a kickstart file. Download example.
  4. Create a tdl file. Download example.
  5. Run oz-install
    oz-install -p -u -d1 -a centos6.ks centos6.tdl
  6. Convert the image to qcow2
    qemu-img convert /var/lib/libvirt/images/centos6_x86_64.dsk -O qcow2 centos6.qcow2
  7. Import the newly created image into glance
    glance image-create --name centos6 --disk-format=qcow2 --container-format=ovf < centos6.qcow2
  8. Boot it up to see whether it works!
    nova boot --flavor 1 --image centos6 --key_name sshkey centos6
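
To check how the boot went (standard nova client commands; the instance name matches the boot command above):

nova list                   # wait for the instance to reach ACTIVE
nova console-log centos6    # follow the console output as it boots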

VLAN bug in CentOS 5.8

For the impatient – bug id 5572.

A colleague called the other day: one of our machines had rebooted, and quite a number of network interfaces were not working. This particular machine has quite a number of VLANs connected to it, and most of them were down. The primary interface (eth0) still worked fine; only the eth0.xxx interfaces were affected.

Tcpdump showed that ARP requests were reaching the VLAN interfaces and replies were being sent. However, the replies were not getting through to the other machines. After quite some troubleshooting, we decided to downgrade the kernel. That fixed the problem!

Seems like a bug has crept into the tg3 kernel module in the 2.6.18-308 kernel. Hope this helps someone.
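
If you suspect you are hitting the same thing, a rough way to check what you are running and fall back to an older kernel looks like this (the kernel version below is just an example; use whichever older version your repository still carries):

uname -r                            # currently running kernel
modinfo tg3 | grep ^version         # tg3 module version
yum list kernel --showduplicates    # see which kernel versions are available
yum install kernel-2.6.18-274.el5   # example older version
# then make the older kernel the default in /boot/grub/grub.conf and reboot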