30
Jan 12

Recovering a non-responsive AWS instance

I could not SSH into one of my AWS instances last evening, and it wasn’t serving any pages either. The AWS Management Console said it was up, though. Rebooting did not help. A second reboot did not help either. A shutdown and start did not help. I was running out of tricks here!

For some reason, the instance had been running at 100% CPU utilization for days.

(I had better do some monitoring in future!)
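A minimal sketch of what that monitoring could look like, assuming a CloudWatch alarm on the instance's CPUUtilization metric. The function name, the instance ID and the SNS topic ARN are made up for illustration; the dictionary keys are the parameters of CloudWatch's put_metric_alarm call.

```python
def cpu_alarm_params(instance_id, topic_arn, threshold=90.0, minutes=30):
    """Build keyword arguments for CloudWatch's put_metric_alarm call.

    The alarm fires when average CPU stays above `threshold` percent
    for `minutes` minutes. All default values here are assumptions.
    """
    period = 300  # seconds per datapoint (5-minute basic monitoring)
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period,
        "EvaluationPeriods": max(1, (minutes * 60) // period),
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # hypothetical SNS topic to notify
    }


# With boto3 installed and credentials configured, this could be sent as:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**cpu_alarm_params(
#       "i-0123456789abcdef0",
#       "arn:aws:sns:us-east-1:123456789012:ops-alerts"))
```

With something like this in place, days of 100% CPU would have produced a notification after half an hour instead of going unnoticed.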

Even though the CPU usage had dropped after the restart, the instance would not accept any connections. The only options I could think of were to ask on the AWS forum, or to get the running volume onto a new instance, since the instance was an EBS-based one. I decided to go with the new volume, hoping the database would not mind too much. The steps I needed were:

  1. Snapshot the running volume
  2. Create a new volume out of the snapshot on the same availability zone
  3. Start a new instance with “Launch More Like This”
  4. Shut down the new instance
  5. Detach the volume on the new instance
  6. Attach the volume which was created from the snapshot to the new instance (you need the correct attachment information, like /dev/sda1)
  7. Start the new instance
  8. Disassociate the Elastic IP from the old instance
  9. Associate the correct Elastic IP on the new instance
  10. Test and wish for the best
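
The steps above can also be scripted. The following is only a sketch of how that might look with a boto3 EC2 client, not what I actually ran (I clicked through the console). It assumes the replacement instance has already been launched (step 3), that you know the ID of the stock volume it came with, and that the Elastic IP is identified by an allocation ID. The function and parameter names are made up. Steps 8 and 9 are collapsed into a single associate_address call with AllowReassociation, which moves the address from the old instance in one go.

```python
def recover_to_new_instance(ec2, old_volume_id, new_instance_id,
                            stock_volume_id, zone, allocation_id,
                            device="/dev/sda1"):
    """Copy the old root volume onto an already-launched replacement instance.

    `ec2` is expected to behave like boto3.client("ec2").
    Returns the ID of the volume created from the snapshot.
    """
    # 1. Snapshot the volume of the broken instance.
    snapshot_id = ec2.create_snapshot(
        VolumeId=old_volume_id, Description="recovery snapshot")["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # 2. Create a new volume from the snapshot in the same availability zone.
    new_volume_id = ec2.create_volume(
        SnapshotId=snapshot_id, AvailabilityZone=zone)["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume_id])

    # 4. Stop the replacement instance so its root volume can be swapped.
    ec2.stop_instances(InstanceIds=[new_instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[new_instance_id])

    # 5. Detach the stock volume the replacement was launched with.
    ec2.detach_volume(VolumeId=stock_volume_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[stock_volume_id])

    # 6. Attach the recovered volume under the same device name.
    ec2.attach_volume(VolumeId=new_volume_id,
                      InstanceId=new_instance_id, Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[new_volume_id])

    # 7. Start the replacement instance.
    ec2.start_instances(InstanceIds=[new_instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[new_instance_id])

    # 8-9. Move the Elastic IP over to the replacement instance.
    ec2.associate_address(AllocationId=allocation_id,
                          InstanceId=new_instance_id,
                          AllowReassociation=True)
    return new_volume_id


# Usage, assuming boto3 is installed and credentials are configured
# (all IDs below are placeholders):
#   import boto3
#   recover_to_new_instance(boto3.client("ec2"), "vol-old", "i-new",
#                           "vol-stock", "us-east-1a", "eipalloc-123")
```

Step 10 stays manual: test and wish for the best.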

This actually worked and did not even take too much time. Really cool, actually, when I imagine having to do this with a physical server instead…


08
Dec 11

AWS reboots, oh the drama

I, as well as many others, received an email from Amazon today about the need to reboot one of my instances. Actually, Twitter was already aware of this and was a bit upset about it. For me, this was the second time since 2009 that Amazon has asked me to reboot one of my instances. Once the hardware was degraded, and now this. I would say it’s quite a decent score, since I have averaged something like five instances running all the time.

I am not upset. On the contrary, I am happy that AWS is keeping the infrastructure up to date, whatever the reason for the reboot. Besides, systems should be designed so that rebooting an instance does not take the service down, unless you are willing to accept the downtime (like I am).

The way AWS informed its customers felt OK. At first it was of course just rumours, but then I received an email stating the need, which gave an acceptable time to react. When I logged in to the AWS Dashboard, I saw this kind of a message:

Scheduled Events

The message had a link to further information about the event, which in turn led to even more information.

There was an option to do the reboot right away if I wanted, so I did it. After the reboot I looked at the instance in the dashboard, but the notification icon was still there. I would have thought it would disappear. Then I had a look at the details of the event, and it actually had [Completed] written in front of the event.

Which now probably means it’s OK and I am done with this.
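
The same check could be done from a script rather than the dashboard. This is a small sketch, assuming the response shape of the EC2 DescribeInstanceStatus call (InstanceStatuses, each with an Events list) and relying on the [Completed] prefix in the event description. The function name is made up.

```python
def pending_events(status_response):
    """Return (instance_id, code, description) for events not yet completed.

    `status_response` is assumed to look like the dictionary returned by
    boto3.client("ec2").describe_instance_status().
    """
    pending = []
    for status in status_response.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            description = event.get("Description", "")
            # Finished events are still listed for a while, marked by a prefix.
            if description.startswith("[Completed]"):
                continue
            pending.append((status["InstanceId"], event["Code"], description))
    return pending


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   response = boto3.client("ec2").describe_instance_status()
#   for instance_id, code, description in pending_events(response):
#       print(instance_id, code, description)
```

An empty result would then mean there is nothing left to react to, even if the notification icon is still showing.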


19
Nov 11

My new best AWS feature, CloudFormation

I just realized AWS has a feature called CloudFormation, which allows users to script their technology stack in convenient and easily understood JSON-formatted text files. These can then be used to deploy the stack over and over again, always the same way. Fantastic! This eases the burden of managing a bunch of customized AMIs, or of finding other ways to introduce custom features to the AMIs. I wonder how I did not notice this feature before. It even has a tab in the AWS Management Console. There are also some sample templates which, for example, install Drupal or a basic Ruby Hello World example.

As a test, I ran the Drupal installation template, and I have to say this was by far the easiest Drupal installation I have ever done. From start to finish in 5 minutes, most of which was just waiting for the deploy to finish. Absolutely great! One minor thing to remember is that key pairs are not available in all Regions. At least in US East (Virginia) my keys were not available, which caused the stack deployment to fail with no explanation other than that the key was not found… Of course, I first suspected a typo in the key name. The other thing is that the user must know the instance type name, such as t1.micro, while a drop-down menu would be great.

There is also a possibility to modify an existing stack, which is actually a relatively new feature. This makes it even more usable. It would be interesting to see if I could do a stack for a simple Aegir installation, as lately that’s the platform I have been installing the most, and doing the manual installation has become kind of boring. CloudFormation would help a lot with that!
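
To give an idea of what such a template looks like, here is a minimal sketch that builds one in Python and prints it as JSON. It is not the Drupal sample template. The resource name WebServer, the AMI ID and the list of instance types are placeholders I made up. The KeyName parameter is the one that has to match a key pair existing in the Region you deploy to, and AllowedValues restricts the instance type to a known list.

```python
import json


def build_template(allowed_types=("t1.micro", "m1.small")):
    """Build a minimal single-instance CloudFormation template as a dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Minimal single-instance stack (illustrative sketch)",
        "Parameters": {
            "KeyName": {
                "Type": "String",
                "Description": "Name of a key pair that exists in the target Region",
            },
            "InstanceType": {
                "Type": "String",
                "Default": allowed_types[0],
                # Anything outside this list is rejected when the stack is created.
                "AllowedValues": list(allowed_types),
            },
        },
        "Resources": {
            "WebServer": {  # hypothetical logical name
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": "ami-00000000",  # placeholder, not a real AMI
                    "InstanceType": {"Ref": "InstanceType"},
                    "KeyName": {"Ref": "KeyName"},
                },
            }
        },
    }


print(json.dumps(build_template(), indent=2))
```

The printed JSON is what would be saved to a file and given to CloudFormation when creating the stack.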


20
Jul 11

Migrating a site from another Aegir installation

In case you ever need to migrate a site from one Aegir installation to another, you can follow the instructions here. If you are like me, with little time between a day job, looking after a kid and sleeping, you tend to skip to the section with the code you need to copy-paste-edit. Well, if you get something like this when executing your pasted code:

PHP Fatal error: Allowed memory size of 201326592 bytes exhausted (tried to allocate 20 bytes) in /var/aegir/.drush/provision/provision.context.inc on line 31

Fatal error: Allowed memory size of 201326592 bytes exhausted (tried to allocate 20 bytes) in /var/aegir/.drush/provision/provision.context.inc on line 31
Drush command terminated abnormally due to an unrecoverable error. [error]
Error: Allowed memory size of 201326592 bytes exhausted (tried to
allocate 20 bytes) in /var/aegir/.drush/provision/provision.context.inc,
line 31

then you had the platform name wrong, because you thought it would be the same as the one you gave it when you created it. It’s not. Do this:

drush sa | grep platform

This shows the real name of the platform. Rerun your command with that name and it should work.


16
May 11

Amazon Web Services used in Sony PSN attack

Today’s breaking news has been Bloomberg’s story about the Sony PSN attack having been conducted using Amazon Web Services. I read the story and feel confused: how on earth can the source of the servers be of any relevance if the attackers were using a public cloud provider? Come on, Amazon can’t, and really should not, follow what its customers do with their servers. This whole thing Bloomberg is writing about is like saying the bank was robbed with a Smith & Wesson and it was Smith & Wesson’s fault.

Of course, there will be a subpoena for all the information about the account used, and I guess they had to use a stolen credit card as well, which is interesting. Also, the statement in Bloomberg’s article about anyone being able to anonymously go and get an AWS account is not totally true. Maybe it can be managed somehow with a stolen credit card, but it’s not an anonymous service as such. And how are you going to prevent that “flaw” in the system, the possibility of using stolen cards and false identities? Scan your ID and send that as well, or visit AWS in person? Huh?

At the end of the article, there is a thought-provoking paragraph on “Rethinking the Cloud”, because a cloud can also be used for malicious purposes. Yep. I’ll think about this for a while…

Thinking…

Thinking…

…and it should not matter for the most part. Say the whole of AWS were used only for attacks, the service level degraded and my IPs got blacklisted; then I would probably switch to some other provider. But right now, I am not the least bit worried. I have my application and the service level I need in a good and healthy balance.