Last week I attended Codemania (full disclosure, we are a sponsor), so I thought I’d do a writeup of some of the great talks I saw. There were a lot of fantastic speakers this year, and I really got a lot from the day. They’re moving to a two-day format next year, which I’m really looking forward to – twice as much to learn and think about!
Codemania talks – Reproducibility – Gary Bernhardt @garybernhardt
Gary’s talk was all about the benefits of reproducible tooling. He described the way that tools like Git, React, and Bundler use immutability to do their work reliably. In Git, the ID of each commit is a hash of the commit’s contents, which include the ID of the parent commit and a tree that in turn records the hash of each file in the commit, so each commit is “named” by its contents. This makes it possible to verify that the data stored in Git is correct, and means that identical content always gets the same ID on any machine. This gives us a high degree of confidence that our data has been stored and retrieved correctly, which is pretty important for a distributed source control system with no single source of truth.
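To make the “named by its contents” idea concrete, here is a minimal Python sketch. `git_blob_id` follows the way Git derives the ID of a file’s contents in its SHA-1 object format. `toy_commit_id` is a hypothetical, deliberately simplified stand-in for a commit (real Git commits point at a tree object and also carry author and timestamp metadata), so treat it as an illustration of the idea rather than Git’s actual commit format.

```python
import hashlib
from typing import Mapping


def git_blob_id(content: bytes) -> str:
    """ID Git gives a file's contents: SHA-1 over a 'blob <size>\\0' header plus the bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def toy_commit_id(parent_id: str, files: Mapping[str, bytes], message: str) -> str:
    """Simplified commit ID (an assumption for illustration, NOT Git's real format).

    The ID depends only on the parent ID, each file's blob ID and the message.
    """
    lines = [f"parent {parent_id}"]
    for path in sorted(files):  # sort so that dict ordering cannot change the hash
        lines.append(f"{git_blob_id(files[path])} {path}")
    lines.append("")
    lines.append(message)
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


root = "0" * 40  # placeholder parent ID for the example
first = toy_commit_id(root, {"README": b"hello\n"}, "initial commit")
same = toy_commit_id(root, {"README": b"hello\n"}, "initial commit")
changed = toy_commit_id(root, {"README": b"hello!\n"}, "initial commit")
```

Here `first` and `same` are equal because their inputs are identical, while `changed` differs because a single byte in one file changed. Since the parent ID is part of the hashed content, altering any earlier commit would also change the ID of every commit built on top of it.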
Bundler was another example of a reproducible tool. It allows you to create a lock file which specifies the exact versions of each gem and its dependencies to use. This gives you a 100% reproducible artifact to use for builds / deployments. Since some dependencies will be specified in terms of acceptable ranges, you would want to lock these down to specifics once you’re ready to test or deploy your app.
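The effect of a lock file can be sketched with a toy model in Python. This is not Bundler’s resolver or its lock file format: the package name `example_gem`, the version numbers and the functions `resolve` and `install` are all hypothetical, and the model is flat (it ignores the dependencies of dependencies, which a real lock file also records).

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, ...]
Range = Tuple[str, str]  # (inclusive minimum, exclusive maximum)


def parse(version: str) -> Version:
    """Turn "2.1.0" into (2, 1, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(constraints: Dict[str, Range], available: Dict[str, List[str]]) -> Dict[str, str]:
    """Pick the newest available version inside each acceptable range."""
    locked = {}
    for name, (minimum, maximum) in constraints.items():
        candidates = [
            v for v in available[name]
            if parse(minimum) <= parse(v) < parse(maximum)
        ]
        # max() raises ValueError if nothing satisfies the range
        locked[name] = max(candidates, key=parse)
    return locked


def install(
    constraints: Dict[str, Range],
    available: Dict[str, List[str]],
    lock: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Use the lock verbatim when there is one; otherwise resolve afresh."""
    if lock is not None:
        return dict(lock)
    return resolve(constraints, available)


constraints = {"example_gem": ("2.0", "3.0")}
available = {"example_gem": ["2.0.1", "2.1.0"]}

lock = resolve(constraints, available)     # {"example_gem": "2.1.0"}
available["example_gem"].append("2.2.0")   # a newer release appears later

unlocked = install(constraints, available)        # {"example_gem": "2.2.0"}
locked = install(constraints, available, lock)    # {"example_gem": "2.1.0"}
```

Without the lock, the same range quietly resolves to a version that was never tested. With it, the install keeps using exactly what was resolved when the lock was written.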
Gary contrasted Bundler with the Node Package Manager (NPM). NPM has a similar command (shrinkwrap), but it is not reproducible. NPM shrinkwrap does not specify version numbers for transitive dependencies (the things that your dependencies depend on), so it is possible for those things to change. This means that later deployments may end up with a different version of some dependent package further down your dependency tree, a version which hasn’t been tested at all! Not a great outcome.
Gary’s conclusion in his Codemania talk was that we should look for reproducible tooling when choosing our platforms, and that we should try to build reproducibility into any tools we build ourselves.
DevOps: Learning to go fast without tripping over the cables – Peter Goodman @petegoo
Peter’s Codemania talk was about how they do DevOps at PushPay, and how that allows them to deploy code quickly and safely (important when you’re in the payments processing business!). His slides are here: https://speakerdeck.com/petegoo/devops-learning-how-to-go-fast-without-tripping-over-the-cables
Peter talked a lot about the tooling that they use, but the real insight was the culture they have at PushPay. Everything is piped into Slack – errors, deployments, post-mortems, ops firefighting, etc. They use tools like Raygun to give problems visibility, and to make sure that the right people know about issues and everyone has the information they need to solve them. In this way, they don’t allow silos to form.
A big takeaway for me was their post-mortem process. Their philosophy is based on the work of Sidney Dekker, who has written a number of books on safety. His view can be summed up as this: “You can’t understand why an accident occurred until you discover why the actions taken that led up to it made sense at the time”. At PushPay, they examine all errors that had a customer impact, and do so in a way that avoids placing the blame on one thing. It may feel better to pin the blame on one action, or one person, but unless you believe your employees are acting maliciously, they likely thought they were doing the correct thing at the time. Freed from the blame game, the post-mortem can be directed to more productive discussions: what happened, what led to the failure, and what can be done to avoid it in the future.
How we make software: a new theory of teams – Sarah Mei @sarahmei
Sarah’s Codemania talk was all about our team model – how we talk and think about software teams. We’re all here to produce great software, but somehow 68% of software projects still fail. We don’t really know why – we’ve developed a lot of different metrics, processes, and systems for minimising this failure but it keeps happening. Developers have typically responded by focusing on the code side of the equation – more unit tests, more refactoring, better code reviews, etc. Sarah thinks we need to widen our focus to more than just the code, and says we have control over more than we realise.
There are only two hard things in Computer Science: cache invalidation and naming things.
— Phil Karlton
This is one of those pithy quotes that gets thrown around a lot. We can think about these problems as people problems. Cache invalidation is hard because the business rules around when it is acceptable to return cached data vs paying the cost of recalculation may not be clear. Naming things is hard because knowing what a thing is and what it is for requires domain knowledge, and it needs to be done well so you can communicate your intentions clearly to the next programmer to touch your code. Both of these problems are hard because understanding the business and understanding people are hard.
Sarah says software development is about people, and only incidentally about computers.
So, bringing it back to software teams, what are some of the hard problems in building a high-functioning team?
- Hiring – getting the right people into your team (and knowing who they are) is hard.
- Turnover – losing someone means losing knowledge and work capacity.
- Growth – growing pains in an expanding team are a real problem.
- Productivity – number of units/things produced (will get back to this one).
Our current model of software development is built around Industrial-era thinking – “building a product”, “ship it”, etc. These words describe working on something that is physically delivered to a customer and considered done at that point – but there are very few software products or services now that can ever be considered done, given the auto-update systems of most popular platforms. Models are supposed to explain and predict behaviour, but we’re still using one that treats software like products rolling off a factory assembly line.
Essentially, all models are wrong, but some are useful.
— George E. P. Box
There are still useful things we can gain from the current model of software development (obviously, otherwise no projects would succeed), but it’s time we developed a new model that more accurately represents what we now know. One popular model has been the Workshop model, where people are treated as artisans and craftsmen working on their own parts of a system. This helps with hiring (hire for team dynamics, which are well defined) and turnover (the apprentice system helps keep a constant movement of information in the team). It doesn’t address the growth or productivity problems. We aren’t dealing with physical products, and you can’t just add more people to produce more of the same thing.
Sarah’s suggested model is of software development as the Stage – a group of talented individuals, working together creatively to achieve a goal with a fixed deadline. The process of putting on a stage show has parallels to software development – developing the script is coming up with the idea, the actors are your team. Actors will do read-throughs where they determine how they are going to play their role, given how everyone else is playing theirs. The performance itself is like release day – everyone brings their work together to show the world (and subsequent performances build on the previous ones).
This model addresses the 4 points in the following ways:
- Hiring – actors need to trust their co-workers enough to be able to try out ideas, even if they might be bad. Hiring in this model means finding people you can trust to perform their roles, and letting them work out the details with their peers.
- Turnover – because this team model is built on collaboration, you should find that everyone knows the roles of their co-workers, and has the flexibility to renegotiate shared ideas and rules when someone is replaced.
- Growth – a director cannot make twice as much show with twice as many actors. They could produce two different shows at the same time though. Rather than adding people to go faster on existing work, create new teams to build complementary products.
- Productivity – doesn’t apply. The endgame is not a product, but an experience. You can’t quantify individual impact.
These days we are building experiences rather than pure products, so it might be time to look at changing the model we use to describe our work.