Load Balancing with HAProxy

I’ve already written a bit about load balancing, but that was about Amazon Elastic Load Balancing. It’s a super easy way to do load balancing, and I believe it can now be managed through the EC2 management console as well. Then again, you have to use a CNAME to point to the load balancer, which is a restriction because most of the cool guys have their site at a bare domain such as http://mysite.com. That is why I did an HAProxy installation too.

So I have this http://vkaiser.com site, a social site with video upload capabilities and connections to Twitter and Facebook, but mostly it’s just my hobby and a test site for running Drupal. It’s a basic LAMP installation on an EBS-based image, with the thumbnails and videos in an S3 bucket. I’ve been wanting to add a load balancer, multiple web servers and a separate database server for a long time, and now that the t1.micro instances have become available, I can afford to add them.

I started with the load balancer. There are multiple good tutorials out there on how to do this, such as this one. That tutorial even has instructions on how to set up high-availability load balancing with heartbeat. I did not do that, as I could not figure out how to assign the virtual IP which the load balancers should share. One other thing which did not seem to work was the web farm listening IP. For some reason it did not work with the Elastic IP I had given to the HAProxy instance. I had to use a wildcard to get connections to the web servers working through HAProxy. It might have something to do with virtual hosts, but I have not tested that.
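
One possible explanation for the listening IP problem, offered as an assumption and not something I have tested on this setup: on EC2 the Elastic IP is mapped to the instance through NAT, and the instance’s own network interface only carries the private address. A process can only listen on an address the host actually owns, so a wildcard bind works where the Elastic IP does not. The small Python sketch below shows how to check that on any host. The function name can_bind is made up for this example, and 203.0.113.10 is a documentation-only placeholder standing in for the Elastic IP.

```python
import socket


def can_bind(address: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to `address` on this host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # Port 0 asks the operating system to pick any free port.
            sock.bind((address, port))
        except OSError:
            # Typically "cannot assign requested address" when the host
            # does not own the address.
            return False
        return True


if __name__ == "__main__":
    # "0.0.0.0" is the wildcard; 203.0.113.10 is a placeholder (assumption)
    # for an Elastic IP that is not configured on a local interface.
    for address in ("0.0.0.0", "203.0.113.10"):
        state = "bindable" if can_bind(address) else "not bindable"
        print(address, state)
```

If the Elastic IP shows up as not bindable on the load balancer instance, listening on the wildcard (or on the private address) is the way to go.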

It might be good to mention that the connections behind HAProxy go through the private address space, as this does not consume the public bandwidth. It might be interesting to see how the system works across multiple availability zones, given there is a way around the virtual IP problem. One thing which might work is to have a hot standby HAProxy which would check the running HAProxy for availability and then start doing tricks with the AWS API if the other zone became unavailable.
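
To make the hot standby idea a bit more concrete, here is a rough Python sketch of the checking part. It is only an illustration under my own assumptions, not something I run: the names is_alive, StandbyMonitor and watch are invented for this example, and the actual "tricks with the AWS API" (for instance re-associating the Elastic IP with the standby instance) are left to a callback that you would have to write yourself.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def is_alive(url: str, timeout: float = 2.0) -> bool:
    """Return True if the active load balancer answers an HTTP request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The proxy answered, even if with an error status, so it is up.
        return True
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts and broken responses count as down.
        return False


class StandbyMonitor:
    """Trigger a takeover after `threshold` consecutive failed checks."""

    def __init__(self, threshold: int, on_failover: Callable[[], None]) -> None:
        self.threshold = threshold
        self.on_failover = on_failover  # hypothetical hook for the AWS API call
        self.failures = 0
        self.failed_over = False

    def record(self, healthy: bool) -> bool:
        """Record one check result; return True if this check caused a takeover."""
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold and not self.failed_over:
            self.failed_over = True
            self.on_failover()
            return True
        return False


def watch(url: str, monitor: StandbyMonitor, interval: float = 5.0) -> None:
    """Poll the active load balancer until the monitor triggers a takeover."""
    while not monitor.record(is_alive(url)):
        time.sleep(interval)
```

On the standby you would call something like watch("http://10.0.0.10/", StandbyMonitor(3, take_over)), where the address and take_over are placeholders. Requiring several failures in a row keeps a single dropped request from causing a takeover.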

Then the file uploads. As it is a video site where people can upload videos, I need some way to get the same uploaded files onto all of the web servers. A scalable way would have been to install two more servers as highly available file servers, but at this stage I did not do that. I only set up rsync with public key authentication between the servers. A good tutorial on how to do the public key stuff can be found here.
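
For illustration, this is roughly what one rsync push over SSH with a key file could look like when driven from Python. It is a sketch, not my actual script: the function names, the paths, the user and the address in the usage note are all made up. The rsync options used are -a (archive mode), -z (compression) and -e (the remote shell command to use).

```python
import shlex
import subprocess
from typing import List


def rsync_command(source_dir: str, remote: str, dest_dir: str, key_file: str) -> List[str]:
    """Build an rsync command that pushes source_dir to remote:dest_dir over SSH."""
    return [
        "rsync",
        "-az",  # archive mode, compress data during transfer
        "-e", "ssh -i " + shlex.quote(key_file),  # authenticate with the key file
        source_dir.rstrip("/") + "/",  # trailing slash: copy the directory contents
        f"{remote}:{dest_dir}",
    ]


def push_uploads(source_dir: str, remote: str, dest_dir: str, key_file: str) -> None:
    """Run the rsync push and raise CalledProcessError if rsync fails."""
    subprocess.run(rsync_command(source_dir, remote, dest_dir, key_file), check=True)
```

A call such as push_uploads("/var/www/files", "sync@10.0.0.11", "/var/www/files/", "/home/sync/.ssh/id_rsa") from cron would then copy new uploads to the other server without a password prompt.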

I actually have three web servers, one of which is also the database server, because I have not added the WordPress installation (this blog) to the web farm, at least not yet. Thus, the vkaiser web farm has three nodes where the db server is kind of the root. All theme updates are done there and synced forward to the other two nodes. File uploads are synced from the other nodes to the root and from the root to the other two nodes. The slave nodes don’t sync directly with each other because there is no real need, as everything hops through the db server. If the db server is down, the site is gone anyway.
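
The sync order described above can be written down as a tiny Python sketch. The function sync_plan and the node names are only for illustration: uploads are first collected from the other nodes to the root, then the root distributes everything back out, and no pair ever connects two non-root nodes directly.

```python
from typing import List, Tuple


def sync_plan(root: str, others: List[str]) -> List[Tuple[str, str]]:
    """Return (source, destination) pairs for one full sync round."""
    # Step 1: every other node pushes its new uploads to the root.
    collect = [(node, root) for node in others]
    # Step 2: the root pushes the merged files (and theme updates) back out.
    distribute = [(root, node) for node in others]
    return collect + distribute


if __name__ == "__main__":
    for source, destination in sync_plan("db", ["web1", "web2"]):
        print(f"{source} -> {destination}")
```

A file uploaded to one non-root node reaches the other one after the collect step and the distribute step of the same round, which is why the direct link between them is not needed.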

Oh yeah, one more thing is the video conversion to Flash. I have ffmpeg on the web servers, which is bad bad bad, but now that there are three nodes the situation should be slightly better.

Next up would be master-slave replication for the database, investigating whether there is a way to do that HAProxy virtual IP or Elastic IP reassignment, moving to a file share instead of rsync, and getting Puppet to take care of the configuration management. A separate cluster for ffmpeg would be so cool as well. A lot more to do!
