I’ve never liked the waterfall-based model for IT project delivery. Even 20 years ago it was obvious that a linear approach just didn’t work well when you didn’t know exactly what you wanted to build or how to go about building it. It didn’t work for the large-scale data warehouse projects I was working on at the time, nor for the software development projects I was involved in later. These projects almost always overran, and the pressure on the teams was horrendous when release dates loomed.
Time-boxing
Way back in 2001, I started using a technique called time-boxing for the projects that I was running. The basic idea is that instead of trying to plan out the entire project up front, you use an iterative approach. You start by picking a period (your time-box), say three months, and agreeing on what you can reasonably accomplish in that time frame.
One of the key features of a time-box, as the name implies, is that the end date or release date is inviolate. If you realise that you are going to overrun, you start dropping items from the scope of the release rather than changing the release date. In a perfect world this meant that we always released on time, the development team didn’t have to work overtime, and we got the majority of the high-priority items into each release. Everybody’s happy…right?
Problems with time-boxing
There were, however, a number of problems with this approach. First, 95% of the time our estimates for work items were spot-on (in fact we usually completed those items in less time, since we estimated conservatively). The problem was the other 5%: with these items we didn’t just get the estimate wrong, we got it wildly wrong (we are talking 5–10x longer)! We put various processes in place to try to mitigate this, including midway checkpoints: once 50% of the estimated time had been spent, we re-estimated the work item. If there was a big disparity, we decided at that point whether or not to defer the item to the next iteration.
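The checkpoint rule itself is simple enough to sketch. The snippet below is a minimal illustration of the kind of halfway-mark decision we made; the threshold value and the names are purely illustrative assumptions on my part, not a tool we actually ran.

```python
def checkpoint_decision(original_estimate_h: float, spent_h: float,
                        remaining_estimate_h: float,
                        disparity_threshold: float = 1.5) -> str:
    """Re-evaluate a work item once roughly 50% of its estimate has been spent.

    original_estimate_h  -- hours agreed at the start of the time-box
    spent_h              -- hours actually spent so far
    remaining_estimate_h -- fresh estimate of the work still left
    disparity_threshold  -- how far the projected total may drift before we
                            defer (1.5 = 50% over; an illustrative value)
    """
    projected_total = spent_h + remaining_estimate_h
    if projected_total > original_estimate_h * disparity_threshold:
        return "defer to next iteration"
    return "keep in this iteration"


# Example: estimated at 16 hours, 8 hours already spent, but 30 hours of
# work now appear to remain.
print(checkpoint_decision(16, 8, 30))  # -> defer to next iteration
```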
The other problem we found was that because we started working on the highest-priority items first, technical work such as architectural changes and paying down technical debt tended to be left until last and then dropped from the scope of the iteration. To address this, we implemented what we called a tick-tock development cycle (a nod to Intel’s model of alternating manufacturing improvements and processor improvements). We would do a user-facing iteration and follow it up with a “technical” iteration.
Finally, even though we could move items to the next iteration, we still found that there was a crunch towards the end of the release to get everything in by the “dev complete” cut-off date. We also saw that when we got to our final QA and integration testing phase (typically the last week), we often battled to get the QA done, or we identified serious problems that resulted in a lot of rework or, worse, in having to shift an item we were sure was ready to go to the next release!
Speeding up
Although the time-boxing approach described above worked fairly well, we felt that we could do better. Intuitively, it seemed that reducing the size of the time-boxes would address some of the challenges we were experiencing. This was around the time that Agile and Scrum started becoming all the rage, and we saw a lot of what we had been doing, and of the pains we were experiencing, addressed in these methodologies.
We therefore changed our time-boxes to two weeks. This was a big mind shift! Two weeks is not enough to get a major piece of functionality into production, so we had to start implementing feature switches and breaking our work down into ever smaller pieces. We also had to invest heavily in automating our deployment process.
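To make the idea of a feature switch concrete, here is a minimal sketch in Python. Reading the switches from environment variables, and the checkout example itself, are assumptions for illustration; switches can equally live in a config file, a database table or a dedicated feature-flag service.

```python
import os


def feature_enabled(name: str) -> bool:
    """Return True if the named feature switch is turned on.

    This sketch reads switches from environment variables such as
    FEATURE_NEW_CHECKOUT=1; a real system might keep them in a config
    file, a database table or a feature-flag service instead.
    """
    return os.environ.get(f"FEATURE_{name.upper()}", "0") == "1"


def legacy_checkout(cart: list) -> str:
    return f"legacy checkout of {len(cart)} items"


def new_checkout(cart: list) -> str:
    return f"new checkout of {len(cart)} items"


def checkout(cart: list) -> str:
    # Half-finished work can be merged and deployed safely because the new
    # code path stays dark until the switch is flipped on.
    if feature_enabled("new_checkout"):
        return new_checkout(cart)
    return legacy_checkout(cart)


if __name__ == "__main__":
    # Uses the legacy path unless FEATURE_NEW_CHECKOUT=1 is set.
    print(checkout(["book", "pen"]))
```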
As our processes matured it became clear that this was indeed the right approach, and we decided to move to a weekly release cycle. It still resulted in “crunches” on a Friday, however, and it was immediately obvious what we needed to do next: continuous deployment.
Continuous deployment
By this time we had become pretty good at doing deployments. As our trust in the deployment process grew, it became obvious that the next step would be to deploy continuously. Up to this point we had always deployed during off-peak hours, but we were now confident enough in our automated processes to start deploying during peak hours. At the moment we are deploying several times a day and we don’t even have to give it a second thought.
Once again it felt like we had been liberated! No more crunch times on a Friday. Changes went into production as soon as they were ready.
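For the curious, the shape of each automated deployment run is roughly the following. The script names, health-check URL and rollback step are placeholders invented for this sketch; in practice the sequence is driven by CI/CD tooling rather than a hand-rolled script.

```python
import subprocess
import sys
import urllib.request


def run(cmd: str) -> None:
    """Run a shell command and stop the pipeline if it fails."""
    print(f"+ {cmd}")
    subprocess.run(cmd, shell=True, check=True)


def healthy(url: str = "https://example.com/health", timeout: int = 5) -> bool:
    """Return True if the application answers its health check."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False


def deploy() -> None:
    run("./run_tests.sh")          # placeholder: unit + integration tests
    run("./deploy.sh production")  # placeholder: zero-downtime rolling deploy
    if not healthy():
        run("./rollback.sh production")  # placeholder: rapid roll-back
        sys.exit("deploy failed its health check and was rolled back")
    print("deployed")


if __name__ == "__main__":
    deploy()
```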
But what about sprints?
At around this time, we also started adopting Scrum terminology and practices in our development process. We introduced daily meetings to touch base with each other and identify who needed help. We introduced weekly retrospectives to review what we had accomplished during the past week and to plan the next week’s sprint. Although we were not following traditional Scrum, in the sense that we no longer deployed only at the end of each sprint, we still found it useful to implement some of its principles.
The benefits
If you have not yet implemented continuous deployment and started applying modern Agile practices, do so immediately! You will never look back. Some of the key benefits that we obtained include:
- No more stress! When you are deploying several times a day it becomes routine.
- Increased code quality – since each change is so small, it is easy to review, test, QA and, if need be, roll back. We have found, however, that it is exceedingly rare for us to roll back a change. The number of bugs introduced as a result of a deployment has decreased significantly.
- No more branching. We now only develop on the main branch. Any change that is committed either goes into production by the end of the day or is rolled back. This enormously simplifies the build process.
- We can reprioritise on a weekly basis, and because major changes are behind a feature switch we can even park them temporarily to address more critical work.
- Problems can be identified quickly and action taken to either rethink the approach or make changes to the proposed solution.
The pain
As the saying goes, “there ain’t no such thing as a free lunch” or TANSTAAFL. In order to be successful in implementing some of the practices described above, you need to have some building blocks in place:
- Deployment needs to be highly automated. You need to be able to deploy to your QA/test, staging and production environments at the push of a button. Furthermore, that deployment needs to happen “live”, with zero down-time.
- Rapid roll-back. If something goes wrong, you need to be able to roll back the change rapidly.
- Breaking up epics into small chunks of work. Generally we do not want our work items to take longer than about 8 hours. Breaking down a mammoth epic that is expected to take several hundred hours of work into pieces that small can be challenging. I’ll write more about how we do this in a future post.
- Separating database and application tier changes. Due to the zero down-time requirement mentioned above, any change that requires a database schema change needs to be decoupled from the application tier change. This was one of the most difficult things for us to get right, but after a while it too becomes second nature. It just means that every schema change needs to be rolled out in multiple steps, e.g. making the application tier forward compatible, making the schema change, then removing the forward compatibility (see the sketch after this list).
- Implementing feature switches. Some changes require a lot of development work to implement. You therefore need to get comfortable with feature switches and put the necessary tooling in place to keep such changes behind one. That way you can do incremental releases, usability testing and documentation updates before doing a general release.
- One change in QA at a time. Since any change that is currently in QA/test will go into production as soon as it has passed, you cannot have more than one change in QA at a time, and you do not want it to sit there for too long. This requires a certain amount of discipline, but is not too difficult to accomplish as long as you keep your development team small and ensure that there is good communication between team members.
- Seeing the wood for the trees. Since you are generally working with very small, incremental changes, you need to have a mechanism to see the bigger picture, i.e. what you are working towards. We use various mechanisms for this, including periodic workshops to review progress on “epics” and weekly release management meetings to coordinate with other departments, e.g. marketing, sales, support, etc.
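As promised above, here is a sketch of a schema change rolled out in decoupled steps. The example (making a nullable customers.email column mandatory), the table and column names, and the PostgreSQL-style SQL are all hypothetical; what matters is that each step is deployed and verified on its own, with zero down-time throughout.

```python
# Sketch of a zero down-time schema change rolled out in separately deployed
# steps: making a nullable customers.email column mandatory. The table,
# column and placeholder value are hypothetical; only the ordering matters.

PLACEHOLDER_EMAIL = "unknown@example.com"


# --- Deploy 1: make the application tier forward compatible ----------------
# The app already behaves as if the NOT NULL constraint exists: it never
# writes a NULL and it tolerates old rows that still contain one.
def new_customer(name: str, email: str | None) -> dict:
    return {"name": name, "email": email or PLACEHOLDER_EMAIL}


def read_email(row: dict) -> str:
    return row.get("email") or PLACEHOLDER_EMAIL


# --- Deploy 2: make the schema change while the app keeps running ----------
# PostgreSQL-style SQL: backfill first, then tighten the constraint.
MIGRATION = """
UPDATE customers SET email = 'unknown@example.com' WHERE email IS NULL;
ALTER TABLE customers ALTER COLUMN email SET NOT NULL;
"""

# --- Deploy 3: remove the forward compatibility ----------------------------
# The database now guarantees a value, so the read-side fallback in
# read_email() can be deleted and it becomes a plain column access.
```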
Conclusion
Overall, our adoption of DevOps, Agile, Scrum and continuous deployment has been an overwhelmingly positive experience. I believe the main contributing factors were that we are a small team, that the principles align well with what we had already been doing, and that we are not dogmatic about any of these philosophies.