Working at ezCater is an unbelievable opportunity to witness scaling up close. Requests, transactions, revenue: it’s hard to find something that isn’t going up and to the right. That’s great, but each area of growth requires figuring out how to scale it.
The Hardest Thing to Scale
The particular metric that was keeping me up at night was the “number of engineers trying to work together in harmony”. Ten, twenty, forty engineers are simply different scales and they require different systems to ensure things move smoothly. But how would we even describe a “happily scaling” organization from a developer productivity standpoint?
Tom Petr addresses this in his slides on Maximizing Developer Productivity in a Microservices Environment. This presentation shows the KPIs & shape of the curves I want for our organization.
As you can see, Petr illustrates fewer people and fewer machines running ops for more developers who are more productive. Yes, please. So what does that mean for you, the end developer? Will this all be a massive PITA? If we do it right, it should be just the opposite.
How Long Does it Take to Put a New App in Production?
Our particular pain? Last October, launching a new service, with a new DB, LB, etc., had about a week’s turnaround. It was all automated and didn’t take much effort, but it required a bit of Chef, some Terraform, a PR and… a human in DevOps. It’s amazing how much that little bit of friction adds up. When you know you’ll have to bother someone to run one little thing, it restricts your thinking and cramps your design. For example, if you want to release an independent service but realize you could tack it onto an existing service and release sooner, wouldn’t you? So what would a better world look like?
Our new goal was pretty easy to define:
Going from rails new to scalable monitored app in production should take me < 1 hour.
A few pieces had to come together to make this happen:
- Containers to package each app
- Something to run them
- A developer describing how they wanted their containers run
I’ll leave our decisions about, and implementation of, Docker & Kubernetes to another blog post. Here I’m going to focus on the last piece: as a developer, how do I easily declare how I want my app run? An example here is worth many words, so let’s start there. Here’s where we ended up.
Hopefully the intention of this is pretty clear, since being clear is this file’s singular goal. We specify the image we want, and then there are a few “types” of process we can run. The type of a process determines whether we add load balancers, cron jobs, ReplicaSets, etc. The slack setting tells Jenkins where to post messages about deploys.
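As a rough illustration, here is what a parsed service.yaml might look like once loaded into Python, plus a small check that it declares what the translation step needs. The key names (`image`, `slack`, `processes`, `type`, `command`, `port`, `schedule`) and the `web`/`worker`/`cron` types are assumptions for this sketch, not ezCater’s actual schema. It needs Python 3.9+.

```python
from typing import Any

# Hypothetical process types; the real service.yaml vocabulary may differ.
KNOWN_TYPES = {"web", "worker", "cron"}


def validate_service(spec: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a parsed service.yaml (empty if none)."""
    errors: list[str] = []
    if not spec.get("image"):
        errors.append("image is required")
    processes = spec.get("processes") or {}
    if not processes:
        errors.append("at least one process is required")
    for name, proc in processes.items():
        kind = proc.get("type")
        if kind not in KNOWN_TYPES:
            errors.append(f"{name}: unknown type {kind!r}")
        if not proc.get("command"):
            errors.append(f"{name}: command is required")
        if kind == "cron" and not proc.get("schedule"):
            errors.append(f"{name}: cron processes need a schedule")
    return errors


# What a parsed service.yaml might look like (all names and values are illustrative).
example = {
    "image": "registry.example.com/menu-service",
    "slack": "#deploys-menu",
    "processes": {
        "web": {"type": "web", "command": "bundle exec puma", "port": 3000},
        "jobs": {"type": "worker", "command": "bundle exec sidekiq"},
        "nightly": {"type": "cron", "command": "bin/rake reports:nightly", "schedule": "0 3 * * *"},
    },
}

print(validate_service(example))  # []
```

Catching a missing schedule or a typo in a type at this stage gives the developer a clear error in the Jenkins log instead of a confusing failure from the cluster later.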
Of course, this service.yaml is not valid Kubernetes YAML, so we need some “glue” to make things happen. For that we use an internal gem, hephaistos, which has one simple task: read these service.yaml files and output Kubernetes YAML.
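hephaistos itself is an internal Ruby gem and its code isn’t shown here, so the following is only a Python sketch of the same kind of translation, under assumed conventions: `web` and `worker` processes become Deployments (which manage the ReplicaSets), `web` processes also get a `LoadBalancer` Service, and `cron` processes become `batch/v1` CronJobs. The spec keys, replica default, and port mapping are all hypothetical.

```python
import json
import shlex
from typing import Any

# Hypothetical process types; the real mapping used by hephaistos may differ.
SUPPORTED_TYPES = {"web", "worker", "cron"}


def _pod_spec(name: str, image: str, proc: dict[str, Any]) -> dict[str, Any]:
    container: dict[str, Any] = {
        "name": name,
        "image": image,
        "command": shlex.split(proc["command"]),
    }
    if "port" in proc:
        container["ports"] = [{"containerPort": proc["port"]}]
    return {"containers": [container]}


def translate(app: str, spec: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a parsed service.yaml into a list of Kubernetes manifests (as dicts)."""
    manifests: list[dict[str, Any]] = []
    for proc_name, proc in spec["processes"].items():
        kind = proc["type"]
        if kind not in SUPPORTED_TYPES:
            raise ValueError(f"{proc_name}: unsupported process type {kind!r}")
        name = f"{app}-{proc_name}"
        labels = {"app": app, "process": proc_name}
        meta = {"name": name, "labels": labels}
        pod = _pod_spec(name, spec["image"], proc)

        if kind in ("web", "worker"):
            # A Deployment manages the ReplicaSet that keeps N pods running.
            manifests.append({
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "metadata": meta,
                "spec": {
                    "replicas": proc.get("replicas", 2),
                    "selector": {"matchLabels": labels},
                    "template": {"metadata": {"labels": labels}, "spec": pod},
                },
            })
        if kind == "web":
            # Web processes also get a load balancer; they are assumed to declare a port.
            manifests.append({
                "apiVersion": "v1",
                "kind": "Service",
                "metadata": meta,
                "spec": {
                    "type": "LoadBalancer",
                    "selector": labels,
                    "ports": [{"port": 80, "targetPort": proc["port"]}],
                },
            })
        elif kind == "cron":
            # Job pods may not use the default restartPolicy (Always).
            job_pod = {**pod, "restartPolicy": "OnFailure"}
            manifests.append({
                "apiVersion": "batch/v1",
                "kind": "CronJob",
                "metadata": meta,
                "spec": {
                    "schedule": proc["schedule"],
                    "jobTemplate": {
                        "spec": {"template": {"metadata": {"labels": labels}, "spec": job_pod}}
                    },
                },
            })
    return manifests


example = {
    "image": "registry.example.com/menu-service:abc123",
    "processes": {
        "web": {"type": "web", "command": "bundle exec puma", "port": 3000},
        "jobs": {"type": "worker", "command": "bundle exec sidekiq"},
        "nightly": {"type": "cron", "command": "bin/rake reports:nightly", "schedule": "0 3 * * *"},
    },
}

manifests = translate("menu-service", example)
print([m["kind"] for m in manifests])  # ['Deployment', 'Service', 'Deployment', 'CronJob']
# kubectl accepts JSON as well as YAML, so a v1 List can be applied directly.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Keeping the translation in one place is what lets the developer-facing file stay small: conventions like label selectors and restart policies live in the glue, not in every app’s repo.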
How does this fit into a full deployment process? A few steps:
- A developer commits
- Jenkins detects the commit and pulls the repo
- Jenkins reads the service.yaml and builds a Docker image
- Jenkins runs hephaistos, which translates the service.yaml to Kubernetes YAML
- Jenkins applies the generated Kubernetes YAML to our Kubernetes cluster
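The steps above can be sketched as the ordered list of commands a CI job might run for one commit. This is not ezCater’s actual Jenkins configuration: the `docker push` step (so the cluster can pull the image), the `hephaistos render` command line and its flags, and all repo, registry, and file names are hypothetical. `git`, `docker build`/`push`, and `kubectl apply -f` are used with their standard arguments. It needs Python 3.9+.

```python
import shlex
import subprocess


def deploy_plan(repo_url: str, sha: str, registry: str, app: str) -> list[list[str]]:
    """Commands a CI job might run for one commit, in order (a sketch only)."""
    image = f"{registry}/{app}:{sha}"
    return [
        # Pull the committed code.
        ["git", "clone", repo_url, app],
        ["git", "-C", app, "checkout", sha],
        # Build the image, tagged with the commit SHA, and push it so the cluster can pull it.
        ["docker", "build", "-t", image, app],
        ["docker", "push", image],
        # Hypothetical CLI for the internal translation gem; its real interface is not public.
        ["hephaistos", "render", f"{app}/service.yaml", "--image", image, "--output", "k8s.yaml"],
        # Apply the generated Kubernetes YAML, not service.yaml itself.
        ["kubectl", "apply", "-f", "k8s.yaml"],
    ]


def run(plan: list[list[str]], dry_run: bool = True) -> None:
    """Echo each command; execute it only when dry_run is False, stopping on the first failure."""
    for cmd in plan:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


plan = deploy_plan("git@github.com:example/menu-service.git", "abc123", "registry.example.com", "menu-service")
run(plan)  # dry run: prints the six commands without executing anything
```

Tagging the image with the commit SHA makes each deploy traceable to a commit and means the applied manifests always reference an immutable image.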
Did it Work?
As a fun acid test, we had a developer try to get an app into production by ONLY reading the wiki (i.e., no DevOps hand-holding). And check it out! Literally a buzzer beater against our one-hour goal (we hit a couple of odd bumps in our VPN setup, but the “hard” bits took only 15 minutes in total).
All new apps at ezCater use our new Kubernetes cluster and define their jobs with a simple service.yaml. We launched in March and there are at least 7 apps in production already, with more coming online almost daily.
I like to call the result the “Cambrian explosion” of services. Now that it’s easy to decouple things, we have embraced decoupling with gusto.
Kubernetes is a fabulous runtime and ecosystem for your containers. But there is a “missing” piece: “how do I actually deploy?” There are a lot of tools in this space, and most are very opinionated. We like the idea that a robust, declarative YAML file lets developers describe their intent without tying our hands on implementation. In practice, this has worked well.
Subscribe for updates to the blog if you are interested in hearing more.