Developing a git workflow

Our development process was the same for a long time. We wrote features, ran our test suite, deployed to dev, clicked some buttons and then deployed to production. That worked for us until we wanted to integrate our QA team into the development cycle. This is where having one deployable branch (think of a branch as a bundle of code) no longer worked for us.

Three environments

We have three server clusters for three different purposes. There is the Production cluster, the QA cluster and the Dev cluster. Production will (ideally) always be running and stable. QA is where our new code goes to get checked over by the QA team. The Dev cluster is what we use for everything else.

In order to develop a process we came up with a few requirements:

  1. Developers get to push our code often.
  2. Developers can integrate the QA team into our workflow.
  3. Everyone always knows exactly what code is on Production.
  4. Everyone always knows the differences between QA and Production.
  5. Likewise, everyone knows the differences between QA and master.

Based on those requirements, we came up with this git workflow:

  • Developers work on feature branches based off of master.
  • When the developer has finished testing/writing their code, they merge that code into the master branch.
  • Jenkins runs the test suite on master after every change.
  • During the day we can merge master into the QA branch.
  • The QA cluster always reflects the code in the QA branch.
  • QA gives sign-off to deploy the difference between QA and production to production.
  • The production cluster always reflects the code in the production branch.
  • Each production push generates a new git tag.
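The merge-and-tag steps above could be scripted roughly like this. It is only a sketch: the branch names `qa` and `production`, the `origin` remote and the timestamped tag format are assumptions for illustration, not details of our actual setup. The functions only build the git commands; running them is left to the caller.

```python
from datetime import datetime, timezone


def promote_commands(source, target, remote="origin"):
    """Return the git commands that merge `source` into `target` and publish it."""
    return [
        ["git", "checkout", target],
        ["git", "merge", "--no-ff", source],
        ["git", "push", remote, target],
    ]


def tag_commands(when, remote="origin"):
    """Return the git commands that tag the current commit as a production release."""
    tag = when.strftime("production-%Y%m%d-%H%M%S")  # hypothetical tag format
    return [
        ["git", "tag", "-a", tag, "-m", "Production deploy " + when.isoformat()],
        ["git", "push", remote, tag],
    ]


def release_plan(when):
    """master -> qa happens during the day; qa -> production happens after sign-off."""
    return {
        "to_qa": promote_commands("master", "qa"),
        "to_production": promote_commands("qa", "production") + tag_commands(when),
    }


if __name__ == "__main__":
    # Print the plan instead of executing it, so nothing is changed by accident.
    for step, commands in release_plan(datetime.now(timezone.utc)).items():
        print(step)
        for argv in commands:
            print("  " + " ".join(argv))
```

Keeping the plan as data makes it easy to review before anything touches a real branch.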

This deploy strategy fulfills all the requirements we had.

  1. Developers can push to master as soon as we finish a feature (and we can push to our feature branch whenever we want).
  2. QA now has a place to test our code.
  3. Since we git tag production, we always know exactly what is on the production server.
  4. Using GitHub’s compare view, we always know the difference between production and QA.
  5. GitHub’s compare view also gives us the difference between QA and master.
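For reference, GitHub's compare view lives at a URL of the form `/compare/base...head`. Here is a small sketch that builds the two links from points 4 and 5; the owner and repository names are placeholders, and the lowercase branch names `qa` and `production` are assumptions.

```python
def compare_url(owner, repo, base, head):
    """Build the URL of GitHub's compare view showing `head` relative to `base`."""
    return "https://github.com/{0}/{1}/compare/{2}...{3}".format(owner, repo, base, head)


# "example-org" and "example-app" are placeholders, not a real repository.
print(compare_url("example-org", "example-app", "production", "qa"))  # what QA has that production lacks
print(compare_url("example-org", "example-app", "qa", "master"))      # what master has that QA lacks
```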

Two gotchas

The two “gotchas” that have come out of this workflow are stable-time on the QA cluster and a weekly freeze on the master branch.

Having time when the code on the QA server is not changing is good for the QA team. Stable-time gives them confidence that the code is not changing under their feet.

The code freeze came about when we introduced a bug into QA and could not push to production until the bug was fixed. But in order to get the fix onto QA we had to merge master into QA again. Unfortunately, while the QA team was discovering the bug, we had already pushed new code to the master branch. That meant the master branch contained both the fix for the bug and new code, which added to the list of features QA needed to check. In order to fix this problem, we introduced a freeze on master just before the production release.

It’s not a perfect deployment strategy and people do make mistakes. In order to reduce mistakes, we added several safeguards to ensure the correct branch goes to the correct cluster. Think simple sanity checks: “are you deploying the QA branch to the QA cluster?”, etc.
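A sanity check of that kind could look something like the sketch below. The cluster and branch names, the `DeployError` exception and the rule that Dev accepts any branch are all assumptions made for illustration; our real checks are not shown here.

```python
# Hypothetical mapping from a cluster to the only branch it may receive.
ALLOWED_BRANCH = {
    "production": "production",
    "qa": "qa",
}


class DeployError(Exception):
    """Raised when a branch is about to go to the wrong cluster."""


def check_deploy(branch, cluster):
    """Refuse to deploy `branch` to `cluster` unless the pairing is allowed."""
    expected = ALLOWED_BRANCH.get(cluster)
    if expected is None:
        # Assumption: clusters without an entry (e.g. Dev) accept any branch.
        return
    if branch != expected:
        raise DeployError(
            "refusing to deploy %r to %s; expected branch %r" % (branch, cluster, expected)
        )
```

Calling `check_deploy("master", "production")` raises `DeployError`, while `check_deploy("qa", "qa")` passes silently.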

I think this deployment strategy has worked well in practice. The most exciting part is that scripting this whole process would not be too hard. We could even present QA with a button and say “just push this button when you’re ready for the code on QA to go live”. I’m looking forward to being a part of building the tools that get us ever closer to a one-step deploy.

J, the Programming Language

I’ve been an imperative programmer ever since I wrote my first “Hello, world!” program in 2007. I dabbled a bit in Scheme in one of my undergraduate courses and recently revisited Scheme following The Little Schemer and The Seasoned Schemer. But all of my jobs have been in an imperative, object-oriented style of programming.

I recently saw a video of a sudoku solver written in APL. I was immediately fascinated with APL due to its conciseness and the non-ASCII characters that seemed to magically find the values of a sudoku board.

After doing a bit of reading and watching another video of an APL program, I realized the magic was not coming from the strange syntax of APL, but from the style of programming. APL is an array-oriented, functional programming language. The shift from object-oriented to array-oriented is what was causing everything to seem like magic.

Array programming feels much more math-y than other styles of programming. It’s based on arrays, vectors, scalars, matrices and other concepts I can vaguely recall from my Linear Algebra course. There is a focus on applying functions to entire sets of data instead of specific instances of a data object.

Determined to demystify the wonders of APL, I dug around for an interpreter and a way to make those fun symbols. It is much harder to get a working installation of APL on a MacBook than I would have thought. I ended up with a buggy, open source version of APL running on Wine. After reading some forum posts, Stack Overflow and Wikipedia, I came to the conclusion that J was a better fit and offered a lower barrier to entry to array-oriented programming.

Instead of the unique symbols of APL, all of the notation in J is ASCII based. APL was also fairly expensive for a developer license and the open source implementation only seemed to work on Windows. With J, I was up and running with a console and a vim syntax plugin within 10 minutes on my MacBook.

The next challenge was actually doing something in J. I settled on Project Euler and am making some head bashing attempts at problem 1.

The Problem: Find the sum of all the multiples of 3 or 5 below 1000.
This is what I’ve come up with so far (note: this is probably not the J way to do things):

+/ ((0= (3 | 1+i.999)) # 1+i.999)

1+i.999 yields an array from 1 -> 999 (the problem asks for multiples below 1000, so 1000 itself is left out)

3 | 1..999 yields an array of modulo 3 (1, 2, 0, 1, 2, 0 ...)

0= (1, 2, 0..) yields an array of boolean values where the input array is equal to 0 (0, 0, 1, 0, 0, 1 ...)

(0, 0, 1 ..) # (1, 2, 3, 4 ...)  picks each value of the right array where the left array is 1 (3, 6, 9, 12, ...)

+/ is a fold with the + operator, which sums the selected values.

Now I just have to figure out how to sum an array that mod3 or mod5 == 0…
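To check my thinking, here are the same mask-and-select steps sketched in Python, including the "mod 3 or mod 5" condition. This is not J and not necessarily how a J programmer would phrase it; the variable names are just illustrative.

```python
# The integers 1..999, i.e. everything below 1000.
numbers = list(range(1, 1000))

# Boolean mask: True where the number is a multiple of 3 or of 5.
mask = [(n % 3 == 0) or (n % 5 == 0) for n in numbers]

# Keep only the values whose mask entry is True (the role # plays in J).
selected = [n for n, keep in zip(numbers, mask) if keep]

# Fold with + (the role +/ plays in J).
total = sum(selected)
print(total)  # 233168
```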

Here are some resources that I’ve been finding helpful to learn J:

Avoiding “Too many keys specified” in MySQL

Yesterday I tried to deploy some code to a relatively new server. Unfortunately I was greeted with this:

django.db.utils.DatabaseError: (1069, 'Too many keys specified; max 64 keys allowed')

What does that mean? First, we have to figure out what a key is. Stack Overflow is helpful for this sort of thing: http://stackoverflow.com/questions/924265/what-does-the-key-keyword-mean. So a key is an index. Great. We definitely have a lot of Foreign Keys and each table has a Primary Key, but we certainly don’t have 64 Foreign Keys + Primary Keys.

Now that we know that a key is really an index, how do we see what indices exist? Luckily the stack trace that produced the error above at least had a table it seemed to be having trouble with (table: penguin). Running this command shows us the indices:

mysql> show index in penguin;
+---------+----------+-------------+
| Table   | Key_name | Column_name |
+---------+----------+-------------+
| penguin | PRIMARY  | id          |
| penguin | username | username    |
| penguin | email    | email       |
| penguin | email_4  | email       |
| penguin | email_2  | email       |
| penguin | email_3  | email       |
| penguin | email_5  | email       |
| penguin | email_6  | email       |
| penguin | email_7  | email       |
...
| penguin | email_62 | email       |
+---------+----------+-------------+
64 rows in set (0.00 sec)

Ah ha! There are the 64 keys, but why in the world are there 62 email indices? What could be adding an index to the email column during a deploy? Well, the traceback said that it failed during a run of Django’s syncdb, which creates new database tables to align with a model. A little code snooping (thank you git grep) yielded a piece of our code that hooked into the Django post_syncdb signal. This was the offending code:

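from django.db import connection, transaction
from django.db.models.signals import post_syncdb


def update_email_field(sender, **kwargs):
    # Runs after every syncdb. MODIFY ... UNIQUE adds a new unique index
    # on email each time, which is what eventually hit the 64-key limit.
    cursor = connection.cursor()
    cursor.execute("ALTER TABLE penguin MODIFY email VARCHAR(255) NOT NULL UNIQUE")
    transaction.commit_unless_managed()


post_syncdb.connect(update_email_field)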

I learned that adding UNIQUE to an ALTER TABLE ... MODIFY command actually adds a new index every time the command is run. Because the signal handler ran on every syncdb, we eventually hit the MySQL maximum of 64 keys, could not add any more, and the deploy failed because of it.

Removing UNIQUE from the ALTER TABLE statement and ensuring the unique index is created only once, when the penguin table is created, seemed the simplest solution to this problem.
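Another option would be to make the index creation idempotent by checking for an existing unique index first. The sketch below is an assumption about how that might look, not what we deployed: the function name and the `<column>_unique` index name are made up, and `cursor` is any DB-API cursor connected to MySQL (for example the one returned by Django's `connection.cursor()`).

```python
def ensure_unique_index(cursor, table, column):
    """Add a UNIQUE index on table.column only if no unique index covers it yet.

    Returns True if an index was created, False if one already existed.
    """
    cursor.execute(
        "SELECT COUNT(*) FROM information_schema.statistics "
        "WHERE table_schema = DATABASE() AND table_name = %s "
        "AND column_name = %s AND non_unique = 0",
        [table, column],
    )
    (existing,) = cursor.fetchone()
    if existing:
        return False
    # Identifiers cannot be passed as query parameters, so only call this
    # with trusted, hard-coded table and column names.
    cursor.execute(
        "ALTER TABLE `{0}` ADD UNIQUE INDEX `{1}_unique` (`{1}`)".format(table, column)
    )
    return True
```

Run from a signal handler, `ensure_unique_index(cursor, "penguin", "email")` would add the index on the first run and do nothing afterwards.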

Testing Chef Cookbooks with Vagrant

When first entering the Chef (an open-source, automated infrastructure framework put out by Opscode) world there is a lot to learn. The cooking metaphor introduces new concepts and the vocabulary can be a stumbling block at first. Unfortunately, there are no shortcuts here. I find the Architecture Introduction wiki page to be helpful in building a mental image of how Chef works. Once the Chef architecture feels comfortable, it might seem like most of the work will be creating a good cookbook. But how do you test a cookbook without having to worry about setting up a chef-server, chef-client and workstation?

Enter Vagrant. Vagrant is a tool that allows developers to create and configure virtual environments and share the configuration via a file aptly named Vagrantfile. Vagrant will be your Chef Solo.

There are only a few Vagrant commands needed in order to test a cookbook.

  • vagrant init: Initializes (creates) the Vagrantfile
  • vagrant up: Starts the virtualized environment and runs the configuration code (Chef cookbooks or Puppet scripts)
  • vagrant reload: Restarts the virtualized environment and re-runs the configuration code after an edit to Vagrantfile
  • vagrant ssh: Gives ssh access to the virtualized environment
  • vagrant destroy: Completely destroys the virtualized environment

The Vagrantfile is where all the Chef configuration will happen. There is a section inside the Vagrantfile that can be uncommented to use Chef (or Puppet) as the provisioning tool.

Before continuing, make sure you have a cookbook or two to test. Cookbooks can be found all over GitHub, but the opscode-cookbooks are the standard cookbooks to get started with.

Inside the Vagrantfile, a few configuration steps need to happen. First, Vagrant has to know where the cookbooks are. Second, Vagrant needs to know which cookbooks to run. Once these two pieces of information are in place, run vagrant up and you will see the virtual environment’s output as it runs the cookbooks, along with any errors and stack traces that might happen.

Overriding attributes is a common task when installing cookbooks. Attributes are used to specify a software version and other customizations that take place during software installation. Here is an example Vagrantfile that shows the use of overriding attributes to install Ruby 1.9.3 (using the chef-rvm cookbook) and make it the default Ruby on the virtual environment’s system.

Vagrant::Config.run do |config|
  config.vm.box = "lucid64"

  config.vm.provision :chef_solo do |chef|
    chef.cookbooks_path = "cookbooks"
    chef.add_recipe "rvm::vagrant"
    chef.add_recipe "rvm::system"

    # You may also specify custom JSON attributes.
    # Override attributes here. Each cookbook will specify which attributes to override.
    chef.json = {
      'rvm' => {
        'rubies' => ['1.9.3'],
        'default_ruby' => '1.9.3',
        'vagrant' => {
          'system_chef_solo' => '/opt/vagrant_ruby/bin/chef-solo'
        }
      }
    }
  end
end

The testing cycle for me goes something like this:

  1. Make changes to Vagrantfile
  2. vagrant reload
  3. vagrant ssh
  4. See if stuff is working how I want it to
  5. Repeat 1-4 until satisfied

Here is the SSH part. It shows that the default ruby interpreter is indeed running Ruby 1.9.3.

$ vagrant ssh
Linux lucid64 2.6.32-38-server #83-Ubuntu SMP Wed Jan 4 11:26:59 UTC 2012 x86_64 GNU/Linux
Ubuntu 10.04.4 LTS
Welcome to the Ubuntu Server!
 * Documentation: http://www.ubuntu.com/server/doc
Welcome to your Vagrant-built virtual machine.
Last login: Thu Jun 7 01:57:06 2012 from 10.0.2.2
vagrant@lucid64:~$ ruby -v
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]
vagrant@lucid64:~$
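Steps 2 to 4 of that cycle could be scripted too. The sketch below is hypothetical and not something I actually run: it assumes `vagrant ssh -c <command>` is available in your Vagrant version, and the function names and the `runner` parameter are made up for illustration.

```python
import subprocess


def cycle_commands(check="ruby -v"):
    """Commands for one test cycle: re-provision, then run a check inside the VM."""
    return [
        ["vagrant", "reload"],
        ["vagrant", "ssh", "-c", check],
    ]


def run_cycle(check="ruby -v", runner=subprocess.check_output):
    """Run the cycle and return the output of the last command as text.

    `runner` defaults to subprocess.check_output; pass a stub to try this
    out without a virtual machine.
    """
    output = b""
    for argv in cycle_commands(check):
        output = runner(argv)
    return output.decode("utf-8", "replace")


def ruby_is(version, output):
    """True if `ruby -v` output reports the given version, e.g. '1.9.3'."""
    return output.startswith("ruby " + version)
```

With a running box, `ruby_is("1.9.3", run_cycle())` would replace the manual look at `ruby -v`.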

So that’s pretty much how I work on cookbooks I’m writing. I find it to be a nice, simple way to get started writing cookbooks.