Aditya Agarwal & Ryan Atkins | June 23, 2020
Bootstrapping Engineering Operations
The why, what, and how of building an Eng Ops function
Editor’s note: COVID-19 has had a dramatic impact on the way we live and work. This post was written just before these rapid changes took effect. While we are not addressing the current challenges of distributed work directly, we hope this post provides you with a valuable framework for how to think about effectively operating an engineering organization in these unprecedented times.
From 2013 to 2018, Dropbox 10x’d its engineering team from under 100 to over 1000 engineers. Aditya Agarwal, former CTO, and Ryan Atkins, former Head of Engineering Operations, share the why, what, and how of building an Eng Ops function.
This post will be the first in a series of content related to effectively operating an eng org through scale.
Part I: Growing Pains
If you’ve been a part of an engineering org that has gone from, let’s say 20 to 200 engineers, then this evolution of an eng onboarding program may look familiar:
- In the earliest start-up days, you have a completely ad hoc process: one person, probably one of your first engineers, has the full tech stack stored in her head as “tribal knowledge.” She tells new engineers what to install and what commands to run to spin up their development environment and get access to production.
- Eventually, to save time and reduce errors, a volunteer (or volun-told) senior engineer (either IC or EM) emerges who will document the process and expectations, turning eng onboarding into a structured checklist. “Follow these steps and you’ll be up and running on your own.” But soon the checklist falls out of date and doesn’t keep up with the complexity of your product or infra.
- You realize there’s a problem and a committee forms and creates a pseudo-program: they rotate delivery of the onboarding architecture and coding workflow talks. But over time, this content becomes stale, your various presenters deliver divergent information, and the one-size-fits-all approach doesn’t account for the diverse engineers you hire.
- You hire a dedicated technical program manager who builds out a complete curriculum for new hires, accounting for their experience and expertise, complete with hands-on code labs and formal mentorship. This TPM is your first Eng Ops hire.
Are you in, or in between, one of these phases currently? At each point along the scaling process, eng leaders typically face two interesting questions: A) when is it worth the investment to move from one stage of maturity to the next and B) what is the best way to do it?
To answer A), ask yourself these questions to diagnose the timing:
- How long does it take new engineers to ramp up and how much do they slow down experienced engineers in the process?
- Do new engineers still ask basic questions about architecture and workflow after their first month?
- If you’re in phase 2, is one senior engineer, already spread thin, also shouldering the load of managing eng onboarding?
- If you’re in phase 3, is the operational execution of the committee either sloppy or inefficient? Are your most valuable engineers doing program management or coordination work instead of setting technical strategy?
When you recognize these symptoms and know you’ll have more engineers to onboard soon, it’s time to advance stages. Now it’s on to question B): how best to do it?
The answer is relatively simple: dedicate someone to the task. Someone who knows how engineers work, communicate, and learn. Someone who knows how to create systems and build programs. This person might initially be, but doesn’t need to be, an Eng Manager. Instead consider making this person your first Eng Ops hire. Eng Ops is a new technical function dedicated to improving the effectiveness, efficiency, and happiness of your engineering organization. Unless you’re onboarding 6+ engineers a month, you might be thinking that there is no way you could justify a full time hire for Eng Onboarding. But, if you think more broadly about what this hire and a new Eng Ops function can do for your org, you’ll quickly change your mind.
Part II: Why Eng Ops?
As an engineering team scales into an engineering organization, new complexities emerge that, if only informally addressed, can start to dismantle productivity and culture. While scaling, inter-team collaboration and communication break down, and individual learning slows as new engineers struggle to absorb technical tribal knowledge. Typically, ad hoc, unscalable solutions arise, often cooked up by eager, yet over-subscribed eng leaders. So how do you sustainably solve these problems to maintain your development speed, protect and evolve your culture, and keep your engineers engaged and productive?
The answer for us at Dropbox was to establish a team dedicated to solving these problems. We called this team Engineering Operations, or Eng Ops. It took engineer-centered processes, like communication, collaboration, and learning, and turned them into scalable programs and systems. We’ll be the first to admit that embedding an ops team isn’t that novel of an idea: sales, marketing, and customer support functions have been doing it for a while. Why not eng?
Eng Ops exists to systematically improve the effectiveness, efficiency, and happiness of an engineering organization through scale. Effectiveness means engineers and engineering teams are able to solve the right problems for their users, measured by product quality and customer satisfaction. Efficiency means engineers solve the right problems quickly, with minimal duplicated effort, measured by go-to-market velocity and overall shipping speed. Happiness means engineers have a sense of connectedness to and autonomy in their work, feel like their growth rate is accelerating, and are recognized for their efforts. Happiness is measured by engineering retention rate and overall satisfaction scores.
So that all sounds great, but what does Eng Ops actually do, other than eng onboarding? Eng Ops builds and manages Programs and Systems. Concrete examples of Programs are: ongoing mentorship, hackathons, eng interviewer training, technical leadership development, tech talks, community building (e.g. for Senior Staff engineers or front-end developers), etc. Example Systems include: dynamic org charts, ownership maps, go-to-market coordination plans, communication standards, efficiency dashboards, headcount tracking, etc.
You might be thinking, “Wait a second, how is this different than what a Developer Productivity, Dev Ops, or even an HR team would deliver?” Developer tooling and workflow are critical areas to staff, but this is not the focus of Eng Ops. HR teams solve global, company-wide problems and set policy, which are rarely eng-specific. Eng Ops, directly embedded within engineering, is focused on eng org health and the development of engineers. Eng Ops isn’t just making it easier to write code; it’s making it easier to navigate the software development process at scale and to accelerate engineering careers. Eng Ops team members need to be technical enough to know how software is built, and how engineers learn, communicate, and operate.
Part III: How-to Eng Ops: Deep Dive
Of the numerous programs and systems within scope for Eng Ops, let’s dive deep into another one of them: scaling systems of ownership. A well-executed strategy around product and service ownership makes your engineers more empowered by establishing clear swimlanes and decision-making authority, and more efficient by helping them track down the right people to solve the right problems. A clear system of ownership also helps identify staffing gaps or “bus-factors” and makes triaging incidents far easier.
The critical ownership tool developed by Eng Ops at Dropbox was (uncreatively) called “OwnIt”. This tool was designed to streamline and highlight ownership of infra services and product components, while literally echoing one of Dropbox’s earliest engineering values: “Own it!”.
OwnIt started, as many systems do, as a disconnected spreadsheet, thrown together by a percipient eng manager. He knew that, as the product and infra surface areas expanded, without a source-of-truth for ownership, there would be lots of thrash, duplicative work, and worst of all, wide gaps in accountability. Eng Ops formalized the spreadsheet, building out a web app with a snappy search and browse UI, plus alerting, on top of a Postgres database with ownership info on hundreds of product components and infra services. If you were an engineer on call and you got paged about a broken log-in flow when accessing a shared link on mobile, you could quickly find the team who owned mobile log-in. OwnIt was just one example of a tool that promoted ownership and helped engineers navigate Dropbox to do their jobs more effectively and efficiently.
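To make the idea of an ownership source-of-truth concrete, here is a minimal sketch of the kind of lookup such a tool might support. It is not OwnIt’s actual design: the table layout, the function names (`build_registry`, `find_owners`, `single_owner_components`), and the sample teams and owners are all hypothetical, and an in-memory SQLite database stands in for Postgres so the example runs on its own.

```python
import sqlite3

# Hypothetical schema for illustration only: one row per (component, team, owner).
SCHEMA = """
CREATE TABLE ownership (
    component TEXT NOT NULL,
    team      TEXT NOT NULL,
    owner     TEXT NOT NULL
);
"""

# Made-up sample data; the team and owner names are placeholders.
SAMPLE_ROWS = [
    ("mobile log-in", "Mobile Identity", "alice"),
    ("mobile log-in", "Mobile Identity", "bob"),
    ("shared links", "Sharing", "carol"),
]


def build_registry(rows):
    """Create an in-memory ownership table and load the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO ownership VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def find_owners(conn, query):
    """Return (component, team, owner) rows whose component name contains `query`."""
    pattern = f"%{query.lower()}%"
    cur = conn.execute(
        "SELECT component, team, owner FROM ownership "
        "WHERE lower(component) LIKE ? ORDER BY component, owner",
        (pattern,),
    )
    return cur.fetchall()


def single_owner_components(conn):
    """List components with exactly one owner, a rough signal of bus-factor risk."""
    cur = conn.execute(
        "SELECT component FROM ownership "
        "GROUP BY component HAVING COUNT(DISTINCT owner) = 1 "
        "ORDER BY component"
    )
    return [row[0] for row in cur.fetchall()]
```

An on-call engineer paged about a log-in problem could call `find_owners(registry, "log-in")` to see which team to contact, while `single_owner_components(registry)` surfaces components that depend on a single person.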
Part IV: Going from zero to one
The development of the OwnIt App didn’t happen overnight. It followed a very predictable pattern in how organizational problems are addressed, illustrated in the figure below. Basically, you get what you pay for: the result depends on how you “staff” the problem.
You’ve read this far (thanks!) and hopefully you’re starting to be convinced by the value of Eng Ops. If you’re thinking about where to start, first list out the organizational pain points you feel. Describe what stage they’re in, and which problem, if solved by a program or system, would give you the biggest return.
Next, find the right person to build this program or system: an Eng Ops Technical Program Manager. Look for someone who understands how engineering teams work, and how people communicate and are motivated. Find a leader who loves both operating efficiency and people, wants to build programs and systems, and communicates effectively with engineers.
A successful Eng Ops team will help the engineering org in a number of measurable ways. Engineers should not only feel more productive, but be more productive by having more hours in the week to write, read, review, and ship code. Engineers should be able to find accurate information about the org faster. Engineers should not only feel like their learning is accelerating, but actually learn faster, as demonstrated by increased leadership and scope of impact. Engineers should feel more engaged and be more likely to stay with your organization for longer. A successful Eng Ops team builds the programs and systems that increase effectiveness, efficiency, and happiness.