Deploying a Django web app with Ansible
I've just started on a new project; it's a short one, around 7 weeks, and we had 3 days for iteration 0 to get a few things set up. One of those things was a reliable way to deploy the existing web app to a new instance on AWS. The codebase had some existing Chef scripts that would install all of the dependencies (Postgres, Nginx, and things like that). While this looked great from the outset, we quickly found that these scripts weren't working as expected, and we spent so much time debugging them that we decided to switch to Ansible, as it's known for its simplicity.
We started by looking at the server config. There were quite a few things that needed installing on a blank Ubuntu box. We started with the simple things, but quickly decided that configuring things like Postgres from Ansible was not feasible with our knowledge and time frame. So we decided to configure the server manually and leave automating that as something to come back to; the priority was a working test environment that we could reliably deploy to whenever we needed.
Side note: we actually found it was faster to provision an EC2 instance in AWS than it was to provision locally using Vagrant (but maybe that was a combination of slow internet and the virtual machine not having enough memory). When I was first writing a playbook and running it through Vagrant it was so slow that I was sure it had failed, as Ansible wouldn't give me any output until it completed.
With our manually configured server working, we started looking at how we could deploy the Django web application. The things we identified as required during deployment were:
1. Check out a specific version from git (I'm not familiar with Python, so this looked like a simpler option than trying to package the application in any way)
2. Install pip requirements (these might change during development, so this needs to be done on each deploy)
3. Sync the DB (as the database changes we need Django to sync it)
4. Migrate the DB (run change scripts against the database)
5. Start the server (obviously :P; we ignored Nginx as it added more complexity than we need for now on a test server)
To begin with we stuck to the bare essentials for Git, which was checking out the HEAD version; we came back to specific versions when putting it all together.
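As a rough sketch, the checkout ended up being a single task using Ansible's git module; the repository URL and destination path below are placeholders rather than our real values:

```yaml
# Sketch of the checkout task; the repo URL and dest path are placeholders.
- name: check out the application from git
  git:
    repo: git@github.com:example/our-django-app.git
    dest: /home/ubuntu/app
    version: HEAD   # swapped for a specific commit later on
```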
Getting the Python tasks running took a little while. Essentially there were three things that needed to be run:
`pip install -r requirements.txt`
(the pip Ansible module takes a virtual environment and will create it if it does not already exist)
`python manage.py syncdb --noinput`
`python manage.py migrate`
We wanted to separate the logical tasks, so we used three separate Ansible tasks, although it would have been possible to use a single task for the last two commands.
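The pip step is the one the Ansible module handles for us; something along the lines of the sketch below (the paths are placeholders) installs the requirements and creates the virtual environment if it isn't there yet:

```yaml
# Sketch of the pip task; requirements file and virtualenv paths are placeholders.
- name: install pip requirements into the virtualenv
  pip:
    requirements: /home/ubuntu/app/requirements.txt
    virtualenv: /home/ubuntu/venv   # created by the module if it doesn't exist yet
```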
We looked at a few different Ansible modules to achieve the last two tasks: command, shell, and script. We had some problems trying to use command and shell; the problem seemed to be to do with `source` (before each command we needed to run `source blah/bin/activate` to activate the virtual environment). We didn't spend too much time trying to figure out why they failed because the script module solved the problem; two short scripts later we had the database in the required state.
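Each of those scripts is tiny, roughly along these lines (paths are placeholders; the syncdb one is the same but runs `syncdb --noinput`), and the task just points the script module at the file, e.g. `script: scripts/migrate.sh`:

```bash
#!/bin/bash
# Sketch of one of the two scripts (this one runs the migrations); paths are placeholders.
set -e
source /home/ubuntu/venv/bin/activate
cd /home/ubuntu/app
python manage.py migrate
```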
Running the server became quite a challenge. There were two solutions that both should have been quite simple: the first was just running the server with nohup, the second was using Upstart on Ubuntu. Upstart was pretty new to me; I'd used it once before to start a Selenium Grid hub. We decided to go with the better-known option, which was nohup. It was simple in the sense that I should have been able to again use Ansible's script module to activate the virtual environment and then start the server using nohup. We started by SSHing into the EC2 instance and verified that we could indeed run the application using nohup. Awesome. When Ansible got added into the mix, though, it failed for an unknown reason. Ansible didn't report any errors; the server just wasn't running. So we modified the script to start writing stdout and stderr logs to the user's home directory. Same again. We tried manually after SSHing into the server and it would work fine.

Again being impatient, I decided to look into the other method for starting the server: Upstart. Upstart meant that we would have to create a .conf file and copy it to /etc/init, get Upstart to reload its configuration by running `initctl reload-configuration`, then start the new Upstart service by running `sudo start SERVICE_NAME` (or `sudo restart SERVICE_NAME` if the service is already running). Again we used Ansible's copy module to copy the file, and the script module to reload the config and start the service. This proved much more troublesome than I anticipated. I added echo statements everywhere to debug it. Sometimes it would make it some of the way. Sometimes Ansible would hang while trying to start the service. Sometimes I couldn't even kill the service manually on the instance. I eventually narrowed down my problem and removed a single line from the .conf file: `expect fork`. For some reason, if your script doesn't actually fork, that line just causes a whole bunch of problems.
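For anyone hitting the same thing, the working .conf ended up looking roughly like this (names and paths are placeholders, and I'm using the dev server here purely for illustration). The crucial part is that there is no `expect fork` line, since the start script stays in the foreground and doesn't fork:

```
# /etc/init/django-app.conf -- sketch of the Upstart job; names and paths are placeholders
description "Django test server"

start on runlevel [2345]
stop on runlevel [016]
respawn

# note: no 'expect fork' here -- the script below stays in the foreground,
# so telling Upstart to expect a fork leaves it tracking the wrong pid

script
    cd /home/ubuntu/app
    . /home/ubuntu/venv/bin/activate
    exec python manage.py runserver 0.0.0.0:8000
end script
```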
So now we had a method for deployment that would always deploy the latest version of the application to our test environment. There were two more things that we wanted to do from here. First, we wanted to be able to deploy specific versions so we have finer-grained control over what is running in our test environment. Second, we wanted to deploy at the end of our CI build (using Snap CI). These two tasks fit nicely together: Snap CI gives us an environment variable to let us know which commit triggered the current build (and which revision all of our tests are running against). The SNAP_COMMIT environment variable contains the SHA1 hash for the commit, which is exactly what we needed in the playbook. This was then easily fed into the Ansible playbook using `{{ lookup('env','SNAP_COMMIT') }}`. The second part was also remarkably simple to achieve: Snap CI needed a copy of the private key file used to SSH into the EC2 instance, which is trivial to do as it provides a secure way to store exactly that.
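In practice that just means the checkout task's version comes from the environment; a sketch (the repo and path are still placeholders) looks like this:

```yaml
# Sketch of the checkout driven by Snap CI; repo and dest are placeholders.
- name: check out the commit that triggered the build
  git:
    repo: git@github.com:example/our-django-app.git
    dest: /home/ubuntu/app
    version: "{{ lookup('env','SNAP_COMMIT') }}"   # SHA1 hash exported by Snap CI
```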
With that we had a relatively simple way to deploy our web application to EC2. This was my first foray into using Ansible and, to a lesser extent, AWS (I'd played with it but never in anger). I'm extremely happy with the simplicity of Ansible and the flexibility we have in our deployment method. I want to do this again with a Clojure application; I think the JVM will simplify configuring the instance in the first place, and I can improve on this even more. All of the code is available on GitHub. It's a bit of a mess at the moment, but it will get attention over the next few weeks.
There are a lot of future improvements that can and will happen with this, and Ansible gives us the flexibility to achieve them. The obvious improvement is to add another playbook to configure a server from scratch so that we can spin instances up even more easily, and to add Nginx into the mix. I want to make some changes to our inventory and use the Ansible EC2 module; I don't want to keep track of the IP address (or multiple addresses) of EC2 instances. I should be able to tag the instances that run the application and then get Ansible to deploy to all of them. I think it would be useful very soon to have the database running on a different server so that we can scale; it also means that tearing down a Django server won't result in data being lost, and it allows us to move to blue-green deployment.
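The rough idea for the inventory change (I haven't built this yet, so treat the tag name as an assumption) is that the EC2 dynamic inventory script groups instances by their tags, so a play could target a tag group instead of hard-coded IPs:

```yaml
# Sketch only: with the ec2.py dynamic inventory, instances tagged role=django_app
# end up in a group called tag_role_django_app (the tag itself is hypothetical).
- hosts: tag_role_django_app
  tasks:
    - name: check we can reach every tagged instance
      ping:
```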