nfsceph

We are now running a ceph cluster, which I find awesome. Who doesn’t like distributed, easily scalable storage pools?

However, the ceph storage is pretty useless if the clients can’t mount it. Given that most clients speak NFS, SMB, or iSCSI rather than ceph, an intermediate node is needed to export ceph to the clients of the world. Enter nfsceph.

nfsceph is something I’ve written off and on over the past few weeks. It is a set of scripts that lets you create RBDs (rados block devices) on ceph, map them, format them, and export them to the world. In more concrete terms: rbd create, rbd map, mkfs.ext3, exportfs.
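
If you’re curious what that looks like by hand, here is a minimal sketch of the steps behind ‘nfsceph create backup 10000’ (my reconstruction, not the actual script; the device name is whatever the kernel assigns):

# rough equivalent of 'nfsceph create backup 10000'
rbd create backup --size 10000      # 10000 MB rados block device
rbd map backup                      # appears as /dev/rbd<x>
mkfs.ext3 /dev/rbd0                 # assuming it mapped to /dev/rbd0
mkdir -p /export/backup
mount /dev/rbd0 /export/backup      # mount under the export tree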

Let’s see how it makes our (my) life easier!

Creating

‘nfsceph create’ creates a filesystem on ceph

[root@nfs1 ~]# nfsceph create backup 10000
Creating rbd... Success.
Mapping rbd...Success.
Making filesystem...Success.
Mounting filesystem...Success.

Listing

‘nfsceph list’ lists our filesystems

[root@nfs1 ~]# nfsceph list
backup 10.48576 GB

Exporting

‘nfsceph export <filesystem> <ip>’ nfs exports a filesystem to the ip specified
‘nfsceph export’ shows the exports you have

[root@nfs1 ~]# nfsceph export backup 192.168.1.22
[root@nfs1 ~]# nfsceph export
backup 192.168.1.22

At this point, the filesystem is ready to be mounted on the client. You can specify multiple clients, and also a netblock (192.168.1.0/24).
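
For instance, exporting the same filesystem to a whole netblock might look like this (a hypothetical session; the exact listing format is my guess):

[root@nfs1 ~]# nfsceph export backup 192.168.1.0/24
[root@nfs1 ~]# nfsceph export
backup 192.168.1.22 192.168.1.0/24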

More Information

The ceph rbd is mounted on /dev/rbd<x>

[root@nfs1 ~]# mount | grep backup
/dev/rbd6 on /export/backup type ext3 (rw)

The filesystem is exported with the following options for best performance and compatibility.

[root@nfs1 ~]# exportfs -v | grep backup
/export/backup 192.168.1.22(rw,async,wdelay,no_root_squash,no_subtree_check)
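
Under the hood, an export with those options boils down to a single exportfs invocation, something like this (a sketch, not necessarily the actual script; wdelay is the default, so it need not be specified):

exportfs -o rw,async,no_root_squash,no_subtree_check 192.168.1.22:/export/backup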

There’s also a set of initscripts that saves the current state to a file and makes the exports persistent across reboots. If you’d like to play with it, the source can be found on GitHub.

With this architecture, we can scale out quite easily by just adding more intermediate nodes to spread the load. Cheap, (practically) unlimited NFS storage. Awesome. 🙂

Watching Netflix in Singapore

[image: Roku running Netflix]

Recipe for Netflix in Singapore

Ingredients

  • MyRepublic Fibre Broadband with Teleport [1]
  • Roku 3 [2]
  • Netflix account [3]
  • Credit card

Steps

  • Sign up for MyRepublic Fibre Broadband Service
  • Sign up for Netflix free trial through their website [3]
  • Purchase Roku 3 through Amazon (free shipping to Singapore)
  • Twiddle thumbs till Roku 3 arrives
  • IMPORTANT: Set up a Roku account with a US country and ZIP code. Use a credit card.
  • Plug in the Roku 3 (you might need a 220V to 110V step-down transformer, but users have reported success without)
  • Run through setup.
  • Start Netflix
  • Watch Netflix

Optional

  • Cancel StarHub 🙂

[1] My Republic Teleport is free till 31 Dec, $5 a month afterwards (I really hope they don’t charge!)
[2] I’ve heard Apple TV works, and WD TV Live too. Let me know if your device works for you
[3] Free trial for 1 month, so you don’t lose anything if it doesn’t work. You need to pay after the free trial.

[edit: added information that you need to create a US Roku account BEFORE activating the Roku]

How MyRepublic Teleport works

I’ve just signed up with MyRepublic for their Pure HD service, mostly because of their Teleport service. Briefly, Teleport allows you to watch US-only services like Netflix and Hulu+ from Singapore.

In addition, I also purchased a WD TV Live to watch Netflix on my big-screen TV. However, when I set it up, I realized that Netflix does not work on the WD TV Live! 😦

After feeling sorry for myself, I decided to figure out how Teleport works, and maybe try to fix the issue with Netflix and WD TV Live.

First of all, I had heard that many WD TV Live users managed to get Netflix working using Unblock-Us. I went ahead and configured Unblock-Us, and sure enough, it works! This further convinced me that the issue is not with the WD TV Live or Netflix, but with Teleport.

I set up my laptop to NAT all traffic in and out of the WD TV Live, so that I could listen in on all the traffic.

In short, MyRepublic Teleport uses their DNS to redirect you to an Amazon instance in the US for specific domains – mostly the authentication / setup part of streaming services like Netflix. The bulk of the streaming content afterwards comes from CDNs, which I believe do not need to go through the US link. Let’s take a look.

The WD TV Live starts off by connecting to nccp-nrdp-31.cloud.netflix.net. If you look it up using the MyRepublic DNS servers, you can see that it resolves to an Amazon EC2 instance in us-west-1.

$ dig @103.11.48.190 nccp-nrdp-31.cloud.netflix.net.
<snip>
;; ANSWER SECTION:
nccp-nrdp-31.cloud.netflix.net. 0 IN A 54.215.3.116

$ dig -x 54.215.3.116
<snip>
116.3.215.54.in-addr.arpa. 300  IN      PTR   ec2-54-215-3-116.us-west-1.compute.amazonaws.com.

After that, it connects to 2 other domains, uiboot.netflix.com and api-global.netflix.com. This is where the problem lies – MyRepublic still resolves these two to the same EC2 instance.

uiboot.netflix.com. 0 IN A 54.215.3.116
api-global.netflix.com. 0 IN A 54.215.3.116

As far as I can tell, both the nccp-nrdp-31.cloud.netflix.net and uiboot.netflix.com connections are HTTPS, which means they can’t share the same IP. To test my theory, I set up a DNS server that returns the Unblock-Us answers for uiboot.netflix.com and api-global.netflix.com. It works!
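
For the curious, per-domain forwarding like that takes only a few lines of dnsmasq config (a sketch; the Unblock-Us resolver IP below is a placeholder, not their real address):

# /etc/dnsmasq.conf – forward just the two broken domains elsewhere
server=/uiboot.netflix.com/203.0.113.1
server=/api-global.netflix.com/203.0.113.1
# everything else still goes through the MyRepublic resolver
server=103.11.48.190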

I guess the fix for MyRepublic is simple – they just have to stand up another two instances to handle the traffic for the two affected domains, and everything should work!

I’ve forwarded them the information, hopefully it’ll help them.

Building a large storage for SoC

Someone once told me an interesting quote – “data grows to encompass all storage”. Although drives are getting bigger, the things we store get bigger too. For home users, this is probably fine – a 3TB external USB drive sets you back $100 or so. For enterprise storage, however, growing capacity is not so simple. We can’t just hook up thousands of USB external drives and hope they work.

Enterprise storage is crazily expensive, probably 10 to 20 times more expensive than commodity USB storage. With that in mind, and with future requirements coming in (Dropbox, anyone?), we have decided to roll our own distributed storage to meet the computing requirements of the near future.

Our basic idea is simple: run a distributed file system that provides the backend storage, and layer multiple services on top of it to expose it in different ways, e.g. NFS, SMB, volume and block storage.

We have decided to go with Ceph, as it can provide object, block, and filesystem storage. Ceph also integrates nicely with OpenStack, providing the block storage layer for OpenStack volumes. This means that a user on the SoC cloud can spin up a VM and attach a separate (bigger) volume (e.g. /dev/vdb) to it. The OS of the VM still lives on the physical machine, while the (bigger) volume sits in the more redundant large storage, insulated from any single machine failure.
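
From the user’s point of view, that workflow is roughly two CLI calls (a hypothetical session; the names and size are made up, and the volume ID comes from the create step):

$ cinder create --display-name bigdata 100        # 100 GB volume, backed by ceph RBD
$ nova volume-attach myvm <volume-id> /dev/vdb    # appears inside the VM as /dev/vdb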

[diagram: openstack_ceph]

Linux Malware

Lots of users have been getting malware on their Linux computers lately. Most of the time, the infection vector is a weak password. That aside, let’s look at a typical piece of malware.

Below is a file listing of the particular malware, which resides in /var/tmp:

var/tmp/ /.m/
var/tmp/ /.m/LinkEvents
var/tmp/ /.m/1.user
var/tmp/ /.m/Makefile
var/tmp/ /.m/.m.tar.gz
var/tmp/ /.m/2.user
var/tmp/ /.m/m.set
var/tmp/ /.m/m.help
var/tmp/ /.m/genuser
var/tmp/ /.m/src/
var/tmp/ /.m/src/com-ons.c
var/tmp/ /.m/src/combot.c
var/tmp/ /.m/src/channel.c
var/tmp/ /.m/src/config.h
var/tmp/ /.m/src/defines.h
var/tmp/ /.m/src/function.c
var/tmp/ /.m/src/link.o
var/tmp/ /.m/src/combot.o
var/tmp/ /.m/src/dcc.c
var/tmp/ /.m/src/Makefile
var/tmp/ /.m/src/xmech.c
var/tmp/ /.m/src/link.c
var/tmp/ /.m/src/xmech.o
var/tmp/ /.m/src/dcc.o
var/tmp/ /.m/src/main.c
var/tmp/ /.m/src/cfgfile.o
var/tmp/ /.m/src/h.h
var/tmp/ /.m/src/cfgfile.c
var/tmp/ /.m/src/userlist.o
var/tmp/ /.m/src/parse.o
var/tmp/ /.m/src/userlist.c
var/tmp/ /.m/src/structs.h
var/tmp/ /.m/src/mcmd.h
var/tmp/ /.m/src/socket.o
var/tmp/ /.m/src/vars.o
var/tmp/ /.m/src/parse.c
var/tmp/ /.m/src/gencmd.c
var/tmp/ /.m/src/global.h
var/tmp/ /.m/src/debug.o
var/tmp/ /.m/src/Makefile.in
var/tmp/ /.m/src/text.h
var/tmp/ /.m/src/com-ons.o
var/tmp/ /.m/src/main.o
var/tmp/ /.m/src/trivia.c
var/tmp/ /.m/src/gencmd
var/tmp/ /.m/src/usage.h
var/tmp/ /.m/src/socket.c
var/tmp/ /.m/src/trivia.o
var/tmp/ /.m/src/debug.c
var/tmp/ /.m/src/vars.c
var/tmp/ /.m/src/function.o
var/tmp/ /.m/src/commands.c
var/tmp/ /.m/src/commands.o
var/tmp/ /.m/src/config.h.in
var/tmp/ /.m/src/channel.o
var/tmp/ /.m/checkmech
var/tmp/ /.m/bash
var/tmp/ /.m/configure
var/tmp/ /.m/3.user
var/tmp/ /.m/go
var/tmp/ /.m/r/
var/tmp/ /.m/r/raway.e
var/tmp/ /.m/r/rversions.e
var/tmp/ /.m/r/rkicks.e
var/tmp/ /.m/r/rsay.e
var/tmp/ /.m/r/rsignoff.e
var/tmp/ /.m/r/rpickup.e
var/tmp/ /.m/r/rinsult.e
var/tmp/ /.m/r/rtsay.e
var/tmp/ /.m/r/rnicks.e
var/tmp/ /.m/mkindex

As you can see, they have cleverly hidden it by using a directory name made up of 2 spaces.
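
A directory like that is easy to miss with a casual ls, but easy to hunt down (a generic sweep, not specific to this bot):

find /tmp /var/tmp -type d -name '* *'   # list directories with whitespace in the name

Some interesting files: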

$ cat 1.user
handle Santo
mask *!*@91.210.81.78
prot 4
aop
channel *
access 100

handle Ciao
mask *!*@Ciao.users.undernet.org
prot 4
aop
channel *
access 100

$ head src/cfgfile.c
/*
EnergyMech, IRC bot software
Parts Copyright (c) 1997-2001 proton, 2002-2003 emech-dev

The malware looks to be an IRC bot, which is quite typical for Linux. Anyway, at this point I lost interest. If you want a closer look at this thing, feel free to email me. 🙂

Mobile data caps can be senseless

So it has finally happened: Singtel launched their new mobile price plans with data bundles. In short, they have decreased mobile data on the lowest tier from 12GB to 2GB, but try to make up for it by giving more free SMS.

This can’t be a good idea, for many reasons. First of all, what you have given, you can’t take away. Deny a child a toy and he might throw a bit of a tantrum; take away a toy he already has, and you are going to have a really tough time.

Secondly, data usage is only going to go up. When M1 first launched a plan with 12GB, I was shocked – there was no way anybody could consume that much! Your battery would not last the day if you tried to download 400MB every day. Nowadays, however, it is quite possible to go past 1 or 2 GB. Within half a month of getting my new SGS III, I had already consumed 500MB, and that is just Facebook, Twitter, Google Reader, etc. I don’t even Instagram religiously like some.

Singtel has countered these objections by saying that 10% of their users use 64% of their bandwidth, hence the need to limit them. I feel that this is crap.

The Pareto principle (commonly known as the 80-20 rule) shows up in most areas: 80% of wealth is owned by 20% of people, and 80% of software bugs are written by 20% of developers :D. So it is perfectly normal for 80% of bandwidth to be used by 20% of users. If you are not able to support such a distribution, don’t sell such a plan in the first place!

Secondly, and most importantly, look at how networks actually work.

A data network link is like a road. At any time there can be few cars or many cars, and during some periods (peak hours) the many cars cause congestion. Those are the times when the network feels shitty. If I were a betting man, I’d wager that most of the times you find the 3G network crap, you are commuting to or from work, or waiting for lunch or dinner – exactly when everybody else is trying to use the network! Again, Pareto comes to mock us – 80% of data is transmitted during 20% of the time. Try at 3am; I bet you will get fantastic speed!

Capping data is like limiting how many kilometres a car can drive per month. Is that a good way to reduce congestion during peak hours? People who still need to drive at those times will continue to do so; they might instead cut back on off-hours usage, like driving out for supper or for pleasure. But off hours are precisely when it is OK to use the road! When the highway is empty, I don’t give a damn how much you want to drive – Pasir Ris to Jurong and back again ten times, it doesn’t matter.

Hence, it doesn’t matter how much data one uses in a month. What matters is that heavy downloaders don’t do it when everyone else needs the network.

How do you implement that? A naive way would be to have two caps, one for peak hours and one for off-peak. Telcos already do this for voice, charging more for peak-hour call minutes (of course, the other reason they have peak minutes is willingness-to-pay, which is another topic altogether). Another way is to rate-limit based on how congested the network is: if congestion is detected, start rate-limiting the heavy 20% of downloaders. This is also why I am against blocking P2P – it is okay to do P2P when nobody else needs the network. Once the network is built, the incremental cost of sending each packet is so minuscule that it makes no sense not to fully utilise the pipe.
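
At toy scale, the enforcement half of that idea is nothing exotic – Linux tc can already do it (a sketch only; carrier gear works differently, and the interface, IP, and rates here are made up):

# put one heavy user in a slow class while everyone else shares the full pipe
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 100mbit
tc class add dev eth0 parent 1: classid 1:20 htb rate 1mbit ceil 5mbit
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip dst 198.51.100.7/32 flowid 1:20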

Of course, there exist some technical challenges. More money has to be put into equipment, and there are no standard protocols for token-based network congestion control yet. I am aware that network operators are between a rock and a hard place, but really, slashing the bandwidth cap is not the way to go.

How SSL Works

Recently, there was a bit of a discussion in the office on how SSL works. I think this stems from SSL (OpenSSL) being one of the most sparsely documented libraries in the open source world. Hopefully this will help someone, and it also serves to remind me the next time I need to fix things.

Basics

The following presumes some public-key crypto knowledge. To set up the secure channel, the steps are as follows:

  1. The client connects to the SSL server.
  2. The SSL server sends the client its cert.
  3. The client randomly generates a key, encrypts it with the public key in the server’s cert, and sends it to the server. Since it is encrypted, only the server and the client know this key.
  4. The server decrypts the client’s key, and the remainder of the data is encrypted with that key.

In this scenario, there is one loophole – how do you know the server sending you the cert is valid? A bad guy on the internet could intercept the data stream and present his own cert, mounting a man-in-the-middle attack.

To solve this problem, SSL uses signed certs. The cert that the server holds is signed by another cert (typically belonging to a Certificate Authority, CA). This CA cert can in turn be signed by yet another cert, and so on. So how do we verify the top-level certs (the ones that sign everybody else)? Those certs come pre-installed in the client’s browser/OS; since the client trusts its browser and OS, the chain of trust extends all the way down to the server cert.
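
You can exercise this chain mechanically with openssl verify. Given a server cert and the CA cert that signed it (the file names here are purely for illustration):

$ openssl verify -CAfile ca.pem server.pem
server.pem: OK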

Verifying Certificate

You can verify a certificate using openssl on Linux.
$ openssl s_client -connect www.comp.nus.edu.sg:443
CONNECTED(00000003)
depth=0 serialNumber = fqi84NUg7JCvWph5RiPhVWj76ujT39uq, C = SG, O = *.comp.nus.edu.sg, OU = GT21833570, OU = See http://www.rapidssl.com/resources/cps (c)10, OU = Domain Control Validated - RapidSSL(R), CN = *.comp.nus.edu.sg
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 serialNumber = fqi84NUg7JCvWph5RiPhVWj76ujT39uq, C = SG, O = *.comp.nus.edu.sg, OU = GT21833570, OU = See http://www.rapidssl.com/resources/cps (c)10, OU = Domain Control Validated - RapidSSL(R), CN = *.comp.nus.edu.sg
verify error:num=27:certificate not trusted
verify return:1
depth=0 serialNumber = fqi84NUg7JCvWph5RiPhVWj76ujT39uq, C = SG, O = *.comp.nus.edu.sg, OU = GT21833570, OU = See http://www.rapidssl.com/resources/cps (c)10, OU = Domain Control Validated - RapidSSL(R), CN = *.comp.nus.edu.sg
verify error:num=21:unable to verify the first certificate
verify return:1
---
Certificate chain
0 s:/serialNumber=fqi84NUg7JCvWph5RiPhVWj76ujT39uq/C=SG/O=*.comp.nus.edu.sg/OU=GT21833570/OU=See http://www.rapidssl.com/resources/cps (c)10/OU=Domain Control Validated - RapidSSL(R)/CN=*.comp.nus.edu.sg
i:/C=US/O=Equifax/OU=Equifax Secure Certificate Authority
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIEATCCA2qgAwIBAgIDFTFqMA0GCSqGSIb3DQEBBQUAME4xCzAJBgNVBAYTAlVT
MRAwDgYDVQQKEwdFcXVpZmF4MS0wKwYDVQQLEyRFcXVpZmF4IFNlY3VyZSBDZXJ0
aWZpY2F0ZSBBdXRob3JpdHkwHhcNMTAxMTE0MDcyMzU0WhcNMTIxMTE2MDM0MjQ3
WjCB6TEpMCcGA1UEBRMgZnFpODROVWc3SkN2V3BoNVJpUGhWV2o3NnVqVDM5dXEx
CzAJBgNVBAYTAlNHMRowGAYDVQQKDBEqLmNvbXAubnVzLmVkdS5zZzETMBEGA1UE
CxMKR1QyMTgzMzU3MDExMC8GA1UECxMoU2VlIHd3dy5yYXBpZHNzbC5jb20vcmVz
b3VyY2VzL2NwcyAoYykxMDEvMC0GA1UECxMmRG9tYWluIENvbnRyb2wgVmFsaWRh
dGVkIC0gUmFwaWRTU0woUikxGjAYBgNVBAMMESouY29tcC5udXMuZWR1LnNnMIIB
IjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAuYOujSH4is6LLvp2oZfXXuxs
h7uNvfq6XDaD5gVOhKSYaso8nsCU245fuzP96qAZOecD+olvj2JAN8nJfYSw2cYb
r2RJIrOvZz8HWNYKWza+CNm/XahX+XBfjKagc5XLv4eTm9h3ll0QjxZjQrLi+Gl7
YeaiW8BffRAUUp6R5kdiBpENI44MhNL+yLC5u+hbzPv3DjVRLtbmqQ3uIqXodvqM
nCWsbb4rh8tVGsTOyG75ftybmq2QXgTPm+F7AH92WunpOWGJpo4cVa1qMxxC1Z4+
bu9bmO222S3ZnuyMErhuyPlSTnAzycG6eu3iS1L+Ou/JZBvjSWNXvDHcX6MqKwID
AQABo4HMMIHJMB8GA1UdIwQYMBaAFEjmaPkr0rKV10fYIyAQTzOYkJ/UMA4GA1Ud
DwEB/wQEAwIE8DAdBgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwHAYDVR0R
BBUwE4IRKi5jb21wLm51cy5lZHUuc2cwOgYDVR0fBDMwMTAvoC2gK4YpaHR0cDov
L2NybC5nZW90cnVzdC5jb20vY3Jscy9zZWN1cmVjYS5jcmwwHQYDVR0OBBYEFKug
5ZrqygIj3fWISLGecoOEmhnaMA0GCSqGSIb3DQEBBQUAA4GBAKUlIBbQk94PFgIJ
44kZR9P5eeM7XZnmGC5BJzDKwpnfVABCoQ3SMURUNxchDY63xqSaVltGbLIuTJCW
6DkBDBuYFQm1JgtYwUSrifNzUi4KtTXS1XpdePJ1g2JlreY9nwAUqLOfLHQ/oMSg
7siIkD3TmkD4PRq8NByqra8Qns2I
-----END CERTIFICATE-----
subject=/serialNumber=fqi84NUg7JCvWph5RiPhVWj76ujT39uq/C=SG/O=*.comp.nus.edu.sg/OU=GT21833570/OU=See http://www.rapidssl.com/resources/cps (c)10/OU=Domain Control Validated - RapidSSL(R)/CN=*.comp.nus.edu.sg
issuer=/C=US/O=Equifax/OU=Equifax Secure Certificate Authority
---
No client certificate CA names sent
---
SSL handshake has read 1744 bytes and written 353 bytes
---
New, TLSv1/SSLv3, Cipher is DHE-RSA-AES256-SHA
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
Protocol : SSLv3
Cipher : DHE-RSA-AES256-SHA
Session-ID: 046976837AFB333337300D2AE0CFA9BFB92ACB262857F98632E1F4327A5D5A73
Session-ID-ctx:
Master-Key: C5038788C147F16260760064E379BE5B948CB3E65D33EBC0B99110D92DE4FCB7F6E86BD26FF3FB75589FA915EE578A12
Key-Arg : None
PSK identity: None
PSK identity hint: None
Start Time: 1332384818
Timeout : 7200 (sec)
Verify return code: 21 (unable to verify the first certificate)

You can see from the output (in the Certificate chain section) that the server returned one cert. From the last line, we were not able to verify the cert. This is because we didn’t give openssl the directory of top-level certs to verify against. On Ubuntu, the certs are at /etc/ssl/certs/.

$ openssl s_client -CApath /etc/ssl/certs/ -connect www.comp.nus.edu.sg:443
<snip>
Verify return code: 0 (ok)

Single Root

In our example above, we can see that the server cert is signed directly by a root CA (“Equifax Secure Certificate Authority”). This is what we call a “single root” cert. In the last few years, single root certs have become less common, and most certs that you buy are chained certs (the server cert is signed by an intermediate cert, which is in turn signed by the root cert). This is the part that confuses many sysadmins: instead of installing just a server cert, a sysadmin now has to install both the server cert and all the intermediate certs, to ensure that the chain of trust can be verified. An example of this is

$ openssl s_client -connect mysoc.nus.edu.sg:443
<snip>
Certificate chain
0 s:/C=SG/ST=Singapore/L=Kent Ridge/O=National University of Singapore - School of Computing/OU=Webserver Team/CN=mysoc.nus.edu.sg
i:/C=US/O=Thawte, Inc./CN=Thawte SSL CA
1 s:/C=US/O=thawte, Inc./OU=Certification Services Division/OU=(c) 2006 thawte, Inc. - For authorized use only/CN=thawte Primary Root CA
i:/C=ZA/ST=Western Cape/L=Cape Town/O=Thawte Consulting cc/OU=Certification Services Division/CN=Thawte Premium Server CA/emailAddress=premium-server@thawte.com
2 s:/C=US/O=Thawte, Inc./CN=Thawte SSL CA
i:/C=US/O=thawte, Inc./OU=Certification Services Division/OU=(c) 2006 thawte, Inc. - For authorized use only/CN=thawte Primary Root CA

You can see that the server now returns 3 certs: (0) is the server cert, which is signed by (2), which is in turn signed by (1).

If only the server cert were installed, you would see just one certificate here, and the chain of trust would fail!
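
A quick way to count what your server is actually handing out is the -showcerts flag, which dumps every cert in the chain as PEM:

$ openssl s_client -showcerts -connect mysoc.nus.edu.sg:443 < /dev/null | grep -c 'BEGIN CERTIFICATE'
3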

In summary, always check your certs after installing. You can also check them easily using web tools, e.g. http://www.sslshopper.com/ssl-checker.html