Scaling Agile: How to measure progress? with Johanna Rothman
Why is velocity such a problematic measure of progress? What are the classic mistakes people make when thinking about Agile Program metrics? Which metrics are the most effective at communicating progress and status?
Johanna Rothman is among the few people who witnessed firsthand the emergence of Agile applied to programs, and pioneered the approach with a variety of customers.
She is the current agileconnection.com technical editor. She is the author of several books, including "Create Your Successful Agile Project: Collaborate, Measure, Estimate, Deliver".
- Johanna’s blog on Managing Product Development
- Will Hayes' article on Agile Metrics
- Johanna's profile on Amazon.com
(lightly redacted for clarity)
Why is measuring progress in Agile programs so different from other sorts of Agile measurements? (1:48 to 3:24)
Well, in some ways it's not that different. We still want to measure progress. We still want to measure what we think we have delivered against what we had hoped to deliver.
But, I think one of the big things is that we cannot just take team metrics and expand them, right? So, we cannot take, for example, velocity from one team and velocity from another and add them together, right? That's like adding... well we're both women I can say this. This is like adding our weights together and dividing by two and getting an average weight for an average woman. Sorry, doesn't work, right?
There's a lot of other stuff about velocity that's kind of crazy: what you can infer from one team does not expand linearly to other teams. So, that's really the issue in programs: even if one team has a reliable velocity and they understand what they can do when they deliver, and they deliver, and they deliver, that's not the same as being able to say, "What does this other team do?" And, we have the advantages of Agile in programs: the fact that we have empirical measurements. We can measure what people have done and make very small predictions about what they can do.
The metrics at a team level: do they have the same aims as at the program level or how do they articulate? (3:24 to 06:02)
So, this is the really interesting thing: a lot of us use burn-up and burn-down [charts] for a team for velocity for example. And, even if you didn't do story points, even if you do a burnup and burndown on features, I mean I happen to think that that's better. But, how do you see the overall progress of a program?
Back in the 80s when I used a Feature Driven Development form of lifecycle it was actually called Staged Delivery. We could see progress every month because everybody delivered into the program every month, but it was uneven.
The Product Backlog Burnup Chart
Maybe I can show one of the metrics that I happen to really like. The top of this slide says “Measure features, not velocity”. The interesting thing about measuring features is that even if teams complete features at a regular pace, they might not complete them across the entire feature set. And, especially in a program, I have found that sometimes you want more in places, sometimes you want less. Also, the rate of arrival for features is not even across the program. So, I happen to really like looking at what the features are.
If you look at every individual team, every individual team might be doing exactly what they wanted to do. But because we have these different... Sometimes the product owners add more features. Sometimes they actually take features away. Okay, that's not very common, but sometimes they do. And, because you have this uneven arrival and departure of features, even if every single team is doing exactly what they thought, it's not what we had thought at the beginning of the program. So, in a project we might have something like this, but it's smaller.
And then, you think about “How many feature sets are we doing? How many teams do we have?” If you have a program of 40 teams you cannot do one of these for every single team and understand what they're doing. You have to increase the totality of everything and really look at everything together.
So, you mentioned here specifically: "Measure features not velocity." But, we see a lot of temptation sometimes to just say "What's the average velocity of our program? Why's that team's velocity not matching that other?" Can you give us a more detailed idea of why consolidating velocity at the program level is such a bad idea? (06:02 to 08:09)
Let me give you an example: when you drive somewhere you guys think in kilometers because you're from Europe. And, I think in miles because I'm from the US. Now, we actually have a translation of miles to kilometers. We understand what the fixed length is of a mile and the fixed length is of a kilometer.
We don't understand that for story points. And features are of uneven sizes. Intuitively, we tend to think that velocity is a standard measure. But that's not how we're using it in Agile projects and programs. Because we're not using it that way, that makes everything that relies on velocity suspect.
Velocity is personal to each team
Any given two teams have a personal velocity. They don't have a global velocity. We cannot translate your team's velocity of either X stories or Y story points because it doesn't really matter to my team's A stories and B story points. There's no common denominator.
Which is why I prefer looking at the Product Backlog Burnup chart instead of looking at velocity: because it shows the different sizes of the different feature sets. [FS means "Feature Set"] And you can see how much of a given feature set team or teams have accomplished. This allows you to look at program progress and not use velocity.
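To make the idea of tracking progress by feature set rather than by velocity concrete, here is a minimal sketch of the data behind such a chart. It only counts features, as suggested in the interview. The class name, function name, and all the numbers are hypothetical and invented for illustration; they do not come from the slide.

```python
from dataclasses import dataclass


@dataclass
class FeatureSetSnapshot:
    """Hypothetical record of one feature set at a point in time."""
    name: str   # e.g. "FS1"
    total: int  # features currently in the set (may grow or shrink)
    done: int   # features completed so far


def burnup_rows(snapshots):
    """Return (name, done, remaining, percent_done) for each feature set."""
    rows = []
    for s in snapshots:
        remaining = s.total - s.done
        percent = 100.0 * s.done / s.total if s.total else 0.0
        rows.append((s.name, s.done, remaining, round(percent, 1)))
    return rows


# Invented example numbers, not taken from any real program.
snapshots = [
    FeatureSetSnapshot("FS1", total=10, done=6),
    FeatureSetSnapshot("FS2", total=8, done=4),
    FeatureSetSnapshot("FS3", total=5, done=0),
]

for name, done, remaining, pct in burnup_rows(snapshots):
    print(f"{name}: {done} done, {remaining} remaining ({pct}%)")
```

Because the totals are stored per snapshot, features added or removed by product owners show up as a change in the size of the feature set rather than being hidden inside a single program-wide number.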
This shows a difference between Agile programs and Waterfall programs because some of the more important metrics of the Agile project are not uniform across the teams. Are there other differences between Waterfall measurements and Agile measurements at the program level? (08:09 to 09:57)
Well, I think so. I mean the biggest one I see is the Gantt chart. I mean I've been managing projects and programs for a gazillion years and I have yet to use a Gantt chart. Because a Gantt chart is a wish list; that's what we hope to accomplish. And, we can put a stick in the sand and say we really want to achieve this accomplishment but until we actually get there, we don't know.
Now, the interesting thing about Agile approaches is that we can say exactly the same thing from a roadmap perspective, and then say: “What does it take for us to get there?” and measure ourselves along the way. I think that that "measuring ourselves along the way" piece is what's really, really different from an Agile program to a more Waterfall program.
And, I would actually suggest that very few successful program managers have really used Waterfall. They've used iterative approaches, they’ve used incremental approaches because that's how you manage the risks. Now, they might have had to do that inside of a Waterfall SDLC [Software Development Life Cycle], but most program managers have actually used some combination of iterative and incremental approaches.
Which shifts in mindset do you think are necessary when approaching Agile metrics? (09:57 to 13:21)
I think the biggest thing is to say what is it that you want. For example, I'm always saying I want to see completed features.
You mentioned synthetic / surrogate measurements. For example, earned value is a surrogate measurement. In software especially I don't know how to actually calculate or estimate earned value because we can change anything whenever we want. I understand earned value from a hard product perspective. For instance: you put together a table. You have legs, you have a top, you have some value. But, you don't really have any value until you have the entire table. In software we have value way before then because our customers and our users can actually use our software because we have features. So, for those of you who are listening who are creating hard goods, I think you can do some form of earned value. But, for software products I'm just not so sure.
So, the mindset I think of is: "How do we use our empirical data? What data do we want to collect? And, how do we use that so that we can actually understand where we are and where we need to be?"
The Product Backlog Burnup Chart
This chart, the Product Backlog Burnup, is the one I really wanted to talk about.
Here's one of the things that I see a lot in programs: there's the organizational lead and cycle time: “When do you actually take an item and put it on a team's backlog? When do people start to work on it? When do people all think it's done?” That's the cycle time for a given item.
In a program, especially, we can see that often we have one team starts to work on one thing and then they need another team to finish it. And so, that's what these arrows are about that they require two teams to finish it.
Measuring the cycle time for given features is a lot more useful than measuring almost anything else because that will tell you: "When do we think we can actually get something out of this team?" I shouldn't say out of this team. The team wants to release as much as I want them to release it. But, what are the organizational issues? And, this is something that program managers really need to look at. How do they figure out what's really going on across the program?
I see you've got two things up there. You've got the cycle time which is really the aspect of how you get the feature out. And then, you've got the lead time. What would be the distinction between the two? How would you use them differently? (13:21 to 15:53)
Cycle time is: "How long does it take for a feature to get out of a given team or teams and really agree with people that it's done?"
The lead time on the other hand is: "How long does it take for us to put something on a backlog, right, where we queue it up for a team to actually start working on it?" Not in a roadmap. A roadmap is a different beast altogether. And then: "When do we release it to customers?"
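As a rough illustration of the two measures, here is a minimal sketch that assumes four timestamps per feature: when it was queued on a backlog, when a team started it, when everyone agreed it was done, and when it was released to customers. The field names and dates are hypothetical, and treating cycle time as "started to done" is one reading of the definitions above, not the only possible one.

```python
from datetime import date


def cycle_time_days(started: date, done: date) -> int:
    """Days from a team starting the work to everyone agreeing it is done."""
    return (done - started).days


def lead_time_days(queued: date, released: date) -> int:
    """Days from queuing the item on a backlog to releasing it to customers."""
    return (released - queued).days


# Invented dates for a single hypothetical feature.
feature = {
    "queued": date(2024, 3, 1),
    "started": date(2024, 3, 4),
    "done": date(2024, 3, 11),
    "released": date(2024, 3, 25),
}

cycle = cycle_time_days(feature["started"], feature["done"])
lead = lead_time_days(feature["queued"], feature["released"])
waiting_for_release = (feature["released"] - feature["done"]).days

print(f"cycle time: {cycle} days")
print(f"lead time: {lead} days")
print(f"done but not yet released: {waiting_for_release} days")
```

Keeping the "done but not yet released" gap as its own number makes it easier to see when most of the lead time is spent outside the teams.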
An example of cycle vs. lead time
So, I have a client - who shall absolutely remain nameless - where it takes them about two weeks to release anything given their deployment capabilities. So, the teams churn out features. The teams work on stuff all the time but it takes them two weeks to actually get anything to a customer.
They brought me in and said: “We really need to speed up our program management.” I thought, "Okay, I can understand that." So, I created a chart like that, and they realized that the time between t4 and t5 was a consistent two weeks. That's when they said: "It almost doesn't matter what the teams do. We needed to see this kind of a chart."
I think one of the issues with program management is to say – and we talked about what the program looks like the last time –: “There's the technical teams, what the feature teams do or if you don't have feature teams then what the technical teams do. And then, there's the greater core team, where you shepherd the business value across the organization.” The core team needs to know that the time between t4 and t5 is two weeks because that's an organizational impediment.
I think that this mindset of looking at: "It's not just software, it's not just the feature teams or the technical teams and it can't be just the business, what the business wants. But, it has to be how do we all work together as an organization to make this work."
These are sort of the metrics that you would find specifically in Agile projects. Is there anything that can be rescued from Waterfall? Any types of metrics that you'd be able to use in an Agile program even though they actually were originated in a Waterfall mindset? (15:53 to 18:20)
I think that one of the things that we might be able to rescue from Waterfall is the idea that there is work that's done that's not yet released. I think that we saw that a lot in Waterfall projects, and I still see this a lot in Agile programs and projects.
I think understanding: "What have we done, right? What's waiting for release?" – in the form of a board as opposed to anything else, because I think that the board is more transparent than numbers for something like this – is important.
Aside from that, I would definitely look at defect metrics: I think it's really important to say: "What kind of defects are we creating? How many are we closing? What's the story with our defects?" Defect escape rates, all that stuff. That's exactly the same from the Waterfall approach.
So, we actually know how to calculate the cost, at least, of a software program. Because software programs almost never have capital expenses. They have operational expenses. So we know the run rate of the teams. And often, our managers want to know: "How much are we spending?" So, we can tell them, right? We know what each team costs. So, we can absolutely tell them what the cost is for a given program.
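The arithmetic behind this is simple enough to sketch. The example below assumes each team has a known weekly run rate and that cost is just run rate multiplied by elapsed time; the team names and amounts are hypothetical, and real programs may have costs that this ignores.

```python
def program_cost(weekly_run_rates, weeks):
    """Total spend: sum of the teams' weekly run rates times elapsed weeks."""
    return sum(weekly_run_rates.values()) * weeks


# Invented run rates, in an arbitrary currency unit per week.
weekly_run_rates = {
    "Team A": 20_000,
    "Team B": 25_000,
    "Team C": 18_000,
}

cost_so_far = program_cost(weekly_run_rates, weeks=12)
print(f"Spent after 12 weeks: {cost_so_far}")
```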
Product performance metrics
Finally, you’ve got product performance metrics, anything you use to build your software product. All of those measurements are exactly the same from a Waterfall perspective as they are from an Agile perspective because we're talking about the product. And, we're talking about how long it takes for us to re-release and all that stuff. So, I think in that sense the product-kinds of measurements are exactly the same from the Waterfall programs as they are for the Agile programs.
In the last interview we spent some time talking about why the concept of a product was important and the role that it played in Agile programs. Can I ask you to tell us a bit more about that? (18:20 to 21:17)
So the interesting thing when you take a product perspective as opposed to a project or program perspective is that you start to think holistically. I've been using the term "Optimizing up". And, you can't see me but I have my hands going up.
Because when you start to think about the product level you think about: "What do I need to release this product to my customers?" And then, "What is a Minimum Viable Product? Possibly a Minimum Viable Experiment? What is the minimum – how little can I do and still have something of value?" And finally: "How often can I re-release for a given product so that I incrementally add value?"
An example of thinking in products
I was working with a company on their particular product which is in the education space. It was a small program, six or seven teams plus a couple people from marketing and sales and training, because they had to run some customer training on a regular basis.
And, they had thought that because the grades component of the product could not change during a semester, they could not change anything in the product during a semester.
But, if you actually figure out how to take a product perspective, you can say: “How can we release this as different little products?”, like layered products on top of a platform. That way, you don't actually have to write the platform as a platform; you can write the platform as you do these other layered products. Think of grades as a layered product, think of homework as a layered product, and these are verticals on top of this platform. Then, you can say: "How often can we release every single piece here and have it make sense for our customers?"
So, when you start to think about a platform and layered products you say: "Oh, okay so we cannot release grades during a semester. But, we can release homework, we can release alumni, we can release all the other components. As long as we are not releasing grades, we could release that.”
And, that came from thinking of it as a product not as a project or a program.
Different people on the program will have different perspectives. How can you reconcile what the management cares about and what the team cares about? (21:36 to 24:50)
This is a really tough one because what managers often care about is meeting the roadmap targets. So, you have a roadmap and managers love quarter by quarter roadmaps. I understand why they love them because they want to be able to predict, just a little bit, and plan out just enough so that they can have conversations with their customers, with potential buyers.
The problem is the quarter by quarter roadmap is a wish list. It looks really nice, but it's a wish list. And, the team really cares about: "What do I need to do now that I can deliver?" So, the management is looking at the very big picture. And, the team's focused on what they have to do now. If they're backlogged – and it does not matter if you're working in iterations or flow – it's the backlog now so that they can produce value on a regular basis.
The key role of demos
The way I reconcile that is to have managers come to demos as often as possible. So, part of that means I really want the teams to be able to release, at least internally, at least once a month.
Now, I suspect that many of you in the audience are actually saying: "Once a month? Oh Johanna, I release every week or every day, multiple times." Great, that's terrific. But what I have seen in a number of my clients who are trying to use Agile and Lean Program Management is that they are not yet releasing that often.
So, my thing is: the way you keep the management and potential customers engaged (or anyone who might fall into the category of stakeholder or sponsor) is by showing them demos all the time, showing them how the product is growing. And, this has to be a product based demo, not "I changed that screen over there." Because, "Well, I changed that screen over there." is very interesting to the team, it's not really that interesting to anybody looking at a demo from the management perspective.
Creating a bridge between the managers and the teams
So, what kind of a story can you tell? What kind of a walking skeleton can you create and grow? And that's how you bridge the gap: managers are always gonna care about that quarter by quarter roadmap. They're always gonna care about the big picture because the big picture helps them plan for: "When will we get revenue? When should we add more people?" Managers have a fiduciary responsibility to the entire organization. So, let's help them by keeping them engaged and helping them see what's really actually happening here.
Metrics used by management are often perceived as becoming targets. Can you prevent that? Is it something that you can avoid? (24:50 to 27:14)
Yes, and no. So, if you insist on measuring story points and reporting on story points they will absolutely be used as a target. There is no question about it. People will say: "Well, that team did 37 story points, surely you can do 45." I mean, that's just a ridiculous thing to say. I was gonna say stupid, but maybe it's not stupid maybe the problem is people don't understand what story points are.
Measure features, not story points
That's why I say measure features, because people tend to understand that features are different sizes and different complexities. And so, even if a team can do on a regular basis three features a week, then at least managers know that they can depend on that team to keep producing roughly three features a week. That's the cycle time. That's where cycle time is really helpful. Managers might not understand cycle time, but they understand intuitively what cycle time buys them.
And, the interesting thing about using targets is that targets are so easy to game. So, if you use story points as a target we know what teams will do: teams will say: "Oh, we'll just double all the points on every story. We can make that target no problem."
What are you measuring?
For this reason, it's really important to say: "What are you measuring? Are you attempting to measure what you want?" If we are measuring features and we want features, we are less likely to game that because people can see we have features. And, if we measure product performance because we want to make sure we have the speed or the performance or the reliability or the security in some way, that's something that we want.
Avoid surrogate measurements
We have to make sure we don't have surrogate measurements. Anything that's not a direct measurement that we want is a surrogate measurement. And so, we really need to say: "How do we avoid surrogate measurements and keep the measurements that we really want?"
So, there's really this aspect of "Measuring what you want to see more of. Measure also what you want to see less of." (27:14 to 27:48)
Well yes, because if you measure defects and the arrival and close rates, you want to increase the close rate and decrease the arrival rate. It's all about what you want to see. And, when you start measuring this stuff over time, you don't just have a point measurement, you have trends. Trends you can do something about. Point measurements are interesting but not all that helpful.
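A minimal sketch of turning arrival and close counts into a trend, assuming you record how many defects arrived and how many were closed in each period. The function name and the numbers are hypothetical, chosen only to show the open-defect count rising and then falling as the close rate overtakes the arrival rate.

```python
def open_defect_trend(arrivals, closes, open_at_start=0):
    """Return the number of open defects at the end of each period."""
    if len(arrivals) != len(closes):
        raise ValueError("arrivals and closes must cover the same periods")
    trend = []
    open_now = open_at_start
    for arrived, closed in zip(arrivals, closes):
        open_now += arrived - closed
        trend.append(open_now)
    return trend


# Invented counts for four consecutive periods.
arrivals = [12, 10, 9, 7]  # defects reported per period
closes = [8, 9, 10, 11]    # defects closed per period

trend = open_defect_trend(arrivals, closes, open_at_start=20)
print(trend)
```

A single value from this list is a point measurement; the shape of the whole list is the trend that can be compared against when a process change was made.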
What would be the horizon that you would measure these things for? Does it actually make sense to measure over long periods in Agile where because of the self-organizing nature of teams, things tend to change quite quickly? (27:48 to 29:49)
In my experience, even in programs people don't do enough retrospectives.
And my experience is that if you measure, if you look at trends over the entire program, people can actually say: "Oh, we made a change over here. And then, this is when we started to see the effect of it."
So, I look at defect escape rates often, because I want to know: are defects escaping into our customer base? I don't really want that. I want to make sure we prevent defects. And, if the teams say: “Well, we started to work differently with, say, either BDD [Behavior Driven Development] or TDD [Test Driven Development] here,” you might not even see the effects for a couple of months.
I do want to take the long term view and make sure that I'm still measuring what I want more of and what I want less of to see if I've lessened it. And then, to say: “Can we trace any of this back to any changes that we made?” So, I really like the long term trends.
Now, can you take trends from one program to another? If the same teams are working on the next release for the product and install another program - it's an ongoing program where we have another release - I think you can and I think then you have to say: “When do we re-baseline? Is two years of data at least a year too much? Possibly, so let's think about that.” But, I think if you really had two years of data it might show you some very interesting ideas about your process and your product.
This brings us to another aspect of metrics which is their visibility. How important is it for the metrics to be visible and available to everybody on the team? What would be your recommendations? (29:49 to 31:13)
So, I really like metrics on a board where people gather. So, I'm big on paper. I really like building team based measurements. If you do stand-ups, you look at them at the stand-up, and if not then you look at them on a fairly regular basis.
For program measurements, I want to have a program manager put the metrics in the hallway on a board so everybody walking by can see it. If you have a geographically distributed program, post them online.
But there is nothing like walking past a board with a Product Backlog Burnup chart. When you put something like this up and you raise that bar every so often and for every milestone or iteration or whatever it is you want, people feel like they've really accomplished something. So why not put something up there like this, show the value of what people are doing and make it as public as possible.
The Product Backlog Burnup chart seems like the sort of chart that would also be really useful to explain what the program is about to people who are not in the program. And, I'm thinking of those attendees who work in setups where the program office or the people controlling the program are not side by side with teams or within the same entity, such as in aerospace and defense or government programs. What would you recommend in terms of communicating this outside of the team? (31:13 to 32:41)
I think that that depends on how much transparency you have and you're comfortable having with your customers and with the people who are buying a program. If you are in defense and you're contracting through an organization, I would expect that you would want to show that organization your progress in some form. And, this might be a really good way to show it.
As opposed to the feature chart which is the first one I showed which I’m not sure is so useful, the Product Backlog Burnup I think can be very useful.
I'm not sure how much transparency is right for any given program. I think that this is all about your relationship with your sponsor and your stakeholders. I always advocate for more transparency but you have to know whether or not that fits for you.
So, to summarize it a little bit: if there were only five metrics that you could use for a program, which ones would you recommend? (32:41 to 34:28)
So, the first one I recommend is, of course, the combination burnup and burndown chart where you get to see everything I want to see. And, I just count the features. I don't try and say: "Oh, this one is really complex and will take three weeks, and this one is a one day thing." I just count.
I really like the Product Backlog Burnup chart because it shows us where are we making progress in the program and where have we not yet started. So, feature set three, we have not yet even started. Although, we're making pretty good progress on feature sets one and two.
I happen to like, at least internally, to show the organizational lead and cycle time.
I also happen to like cumulative flow, but I did not create a slide for that.
I like to know: "What have we done but not yet released?" Because that's a very important piece. What we've done and not yet released: that goes back to the lead time here. That t4 to t5.
And then, I almost always like performance measurements of some sort, but I have seen that many products need a particular performance or security. We used to call these the "ilities" and so the ilities and requirements are what's really, really important here.
And, if you're worried about others – Alexandra said I could show you this slide – So, I'm happy to take emails from anybody at any time and answer any questions you have about metrics.