
Software Development Metrics – Interview with David Nicolette


We’ve interviewed Dave Nicolette, a consultant specializing in improving software development and delivery methods, and author of ‘Software Development Metrics’. We dive into what factors to consider when selecting metrics, examples of useful metrics for Waterfall and Agile development teams, as well as common mistakes in applying metrics. He writes about software development and delivery on his blog.

Content and Timings

  • Introduction (0:00)
  • About David (0:21)
  • Factors When Selecting Metrics (3:22)
  • Metrics for Agile Teams (6:37)
  • Metrics for Hybrid Waterfall-Agile Teams (7:37)
  • Optimizing Kanban Processes (8:43)
  • Rolling-up Metrics for Development Management (10:15)
  • Common Mistakes with Metrics (11:47)
  • Recommended Resources (14:30)

Transcript

Introduction

Derrick:
David is a consultant specializing in improving software development and delivery methods. With more than 30 years’ experience working in IT, his career has spanned both technical and management roles. He regularly speaks at conferences, and is the author of ‘Software Development Metrics’. David, thank you so much for taking time out of your day to join us. Why don’t you say a bit about yourself?

About David

David:
I’ve been involved with software for a while and I still enjoy it, though for the past few years I’ve been working as a team coach, organizational coach, and technical coach. I enjoy that.

Derrick:
Picking up on the book ‘Software Development Metrics’, what made you want to write the book?

David:
It’s an interesting question, because I don’t actually find metrics to be an interesting topic, but I think it is a necessary thing. It’s part of the necessary overhead for delivering. If we can detect emerging delivery risks early, then we can deal with them. If we don’t discover them until late, then we’re just blindsided and projects can fail. It’s important to measure the right things and be sure we’re on track.

Secondly, I think it’s important to measure improvement efforts, because otherwise we know we’re changing things and we know whether they feel good, but we don’t know if they’re real improvements if we can’t quantify that. I noticed several years ago that a lot of managers and team leads and people like that didn’t really know what to measure. I started to take an interest in that, and I started to give some presentations about it, and I was very surprised at the response because quite often it would be standing room only and people wouldn’t want to leave at the end of the session. They had more and more questions. It was as if people really had a thirst for figuring out what to measure and how. I looked at some of the books that were out there and websites that were out there, and they tended to be either theoretical or optimistic.

Derrick:
Metrics for measuring and monitoring software development have been around for decades, but a lot of people still don’t use them effectively. Why do you think that is?

David:
I often see a pattern where, when people adopt a new process or method, an unfamiliar one, they try to use the metrics that are recommended with that process. There are a couple of issues that I see. One is that they may be using the process in name only, or they’re trying to use it but they’re not used to it yet, and the metrics don’t quite work because they’re not quite doing the process right.

The other issue is that people tend to use the measurements that they’re accustomed to. They’ve always measured in a certain way, and now they’re adopting a new process. They keep measuring the same things as before, but now they’re doing the work in a different way. There’s a mismatch between the way the work flows and the way it’s being measured. They have numbers and they rely on the numbers, but the numbers are not telling the truth because they don’t line up with the way the work actually flows.

Factors When Selecting Software Development Metrics

Derrick:
Picking the right metrics is paramount. What are some of the factors that we should consider when selecting metrics?

David:
Look at the way work actually flows in your organization and measure that. In the course of developing this material I came up with a model in which I look at three factors to judge which metrics are appropriate. The first factor is the approach to delivery. The basic idea there is: if you try to identify all the risks in advance, all the costs, you identify all the tasks, lay out a master plan, and you follow that plan, that’s what I’m calling traditional.

What I call adaptive is a little different. You define a business capability, you’ve got your customer needs, and you set a direction for moving toward that, and you steer the work according to feedback from your customer, from your stakeholders. You don’t start with a comprehensive plan, you start with a direction and an idea of how to proceed. Then you solicit feedback frequently so you can make course corrections. That’s the first factor I would look at: traditional versus adaptive, and that I think has the biggest impact on which metrics will work.

The second factor to look at is the process model. I don’t have to tell you that there are a million different processes and nobody does anything in a pure way, but if you boil it down I think there are basically four reference models, or process models, we can consider. One is linear; you can imagine what that is: the canonical steps that run from requirements through support. The next one would be iterative, in which you revisit the requirements multiple times and do something with them. The third one that I identify is time-boxed. It’s really popular nowadays with processes like Scrum and so on. The fourth one is continuous flow. This is becoming popular now with the Kanban method, and it’s also being adopted by Scrum teams quite a lot. There we’re really interested in keeping the work moving smoothly.

Now, a real process is going to be a hybrid of these, but it’s been my observation that any real process will lean more toward one of those models than the others, and that’ll give us some hints about what kind of metrics will fit that situation. The third factor, which probably has the least impact, is whether you’re doing discrete projects or continuous delivery, what some people call a continuous beta. Some people just don’t have projects; they have teams organized around product lines or value streams that they continually support, call it a stream. Between those two there are some differences in what you can measure. So I look at those three factors, and based on that you can come up with a pretty good starter set of things to measure, and then you can adapt as you go from there.
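To make the model concrete, here is a toy sketch of how the three factors might map to a starter set of metrics. The specific mappings below are illustrative assumptions made for this post, not prescriptions from the book:

    # Toy lookup: (delivery approach, dominant process model) -> starter metrics.
    # The mappings are illustrative assumptions, not David's recommendations.
    STARTER_METRICS = {
        ("traditional", "linear"):  ["percentage of scope complete", "earned value"],
        ("adaptive", "iterative"):  ["running tested features", "cycle time"],
        ("adaptive", "time-boxed"): ["velocity", "burn chart", "cycle time"],
        ("adaptive", "continuous"): ["cycle time", "throughput", "process cycle efficiency"],
    }

    def starter_metrics(approach, process_model, discrete_projects=True):
        metrics = list(STARTER_METRICS.get((approach, process_model),
                                           ["cycle time", "throughput"]))
        if not discrete_projects:
            # Without discrete projects, project-scoped measures drop out.
            metrics = [m for m in metrics if "scope" not in m and m != "earned value"]
        return metrics

    print(starter_metrics("adaptive", "time-boxed"))
    # ['velocity', 'burn chart', 'cycle time']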

Metrics for Agile Teams

Derrick:
Let’s take a couple of example scenarios. If we have an agile team working in short sprints who are struggling to ship a new product, what kind of metrics should they consider to identify areas for improvement?

David:
If they’re using Scrum basically correctly, they could probably depend on the canonical metrics that go with it, like velocity and a burn chart. You might look for hangover, that is, incomplete work at the end of the sprint. You might look for a lot of variation in story size, where you finish one story in one day and the next story takes eight days. When it comes to metrics as such, they could use velocity and so on, as I said, and you can always use lean-based metrics because they’re not really dependent on the process model. What they might consider is looking at cycle times. They could look at the mean cycle time as well as the variation in cycle times and get some hints about where to go for root-cause analysis. Metrics don’t tell you what’s wrong, but they can raise a flag.
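As a rough illustration of that last point, here is a minimal Python sketch, using made-up cycle times in days, that computes the mean and the variation David mentions and raises a flag when the spread is large:

    from statistics import mean, stdev

    # Hypothetical cycle times, in days, for recently completed stories.
    cycle_times = [1, 1, 8, 1, 2, 9, 1, 1]

    avg = mean(cycle_times)      # 3.0 days
    spread = stdev(cycle_times)  # ~3.4 days

    print(f"mean cycle time: {avg:.1f} days, std deviation: {spread:.1f} days")

    # The metric doesn't say what's wrong; a spread this large relative to
    # the mean is just the flag that tells you where to start root-cause
    # analysis (inconsistent story sizing, blocked work, and so on).
    if spread > avg:
        print("high variation in cycle times -- worth a root-cause look")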

Metrics for Hybrid Waterfall-Agile Teams

Derrick:
What about a hybrid Waterfall-Agile team working on a long-term project, wanting to know what it’s possible to deliver by a certain date?

David:
To know what’s possible to deliver, you can use the usual things like a burn chart, burn-up or burn-down as you prefer, to see, according to their demonstrated delivery, their velocity, I’ll call it that, how much scope they can deliver by a given date. Conversely, you could see by what date, approximately, they could deliver a given amount of scope. It depends on what’s flexible. In this kind of project, usually neither is flexible, but at least you can get an early warning of delivery risk. If it looks like the trend line is way out of bounds with the plan, well, now you’ve got a problem.

One thing that might surprise some people is the idea that agile methods can be used with traditional development. We need to decouple the word “agile” from “adaptive,” because quite often agile is used in a traditional context.
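A minimal sketch of the burn-chart projection David describes above, assuming a hypothetical velocity history measured in story points per sprint:

    import math
    from statistics import mean

    # Hypothetical demonstrated delivery and plan.
    velocity_history = [18, 22, 19, 21]  # points per sprint, last four sprints
    total_scope = 400                    # points in the plan
    delivered_so_far = 240
    sprints_left_in_plan = 6

    avg_velocity = mean(velocity_history)  # 20.0

    # How much scope can be delivered by the planned end date?
    by_end_date = delivered_so_far + avg_velocity * sprints_left_in_plan  # 360

    # Conversely, how many sprints would the full scope take?
    sprints_needed = math.ceil((total_scope - delivered_so_far) / avg_velocity)  # 8

    print(f"forecast by planned end: {by_end_date:.0f} of {total_scope} points")
    if sprints_needed > sprints_left_in_plan:
        print("trend line out of bounds with the plan: early warning of delivery risk")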

Optimizing Kanban Processes

Derrick:
What are some metrics relevant to those working in a bug queue? Say they’re wanting to optimize their working practices to stay on top of incoming bugs.

David:
For that I usually like to use lean metrics, mainly cycle time, because you want to be somewhat predictable in your service time, so that when a bug report comes in, people have an idea of when they can expect to see it fixed. How do you do that? Well, you can use empirical information from past performance with fixing bugs, and your mean cycle time will tell you approximately how long it takes to fix one.

I like to use the Kanban method for these kinds of teams because it defines classes of service. You’ll find that not every type of bug takes the same amount of time to fix. Based on your history, you can pick out different categories. You can identify the characteristics of particular kinds of bug reports that tend to fall together, and you can track cycle times separately for each of those classes of service. If someone calls in and says, “Well, we’ve got this problem,” you can say, “That looks like it’s in category two. Whatever that means, that typically takes us between four hours and eight hours.” That can give them a warm, fuzzy feeling about when they’re going to see it fixed. I think that using empirical data and tracking cycle time is the simplest, most practical way to manage that kind of workflow.
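A sketch of that idea, using invented bug-fix history, grouping fixes by class of service and reporting a typical range for each:

    from collections import defaultdict
    from statistics import quantiles

    # Hypothetical fix times in hours, tagged by class of service.
    history = [
        ("cosmetic", 1.0), ("cosmetic", 1.5), ("cosmetic", 2.0), ("cosmetic", 3.0),
        ("logic", 4.0), ("logic", 5.0), ("logic", 7.5), ("logic", 8.0),
        ("data loss", 16.0), ("data loss", 20.0), ("data loss", 24.0), ("data loss", 30.0),
    ]

    by_class = defaultdict(list)
    for cls, hours in history:
        by_class[cls].append(hours)

    # Quote the interquartile range per class: "category two typically
    # takes us between four and eight hours."
    for cls, times in by_class.items():
        q1, _, q3 = quantiles(times, n=4)
        print(f"{cls:10s} typically {q1:.0f}-{q3:.0f} hours (n={len(times)})")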

Rolling-up Metrics for Development Management

Derrick:
What about a CTO who wants to monitor how teams are performing, and ensure code is of high quality? How can metrics be rolled-up for those in more senior positions?

David:
To see how the teams are performing, you need measurements that are comparable across teams and comparable across projects. The lean-based metrics are the ones you need to compare across teams and projects, and across development and support, those kinds of things. If you’re tracking throughput and cycle time, and there’s another one I haven’t mentioned that I wanted to, process cycle efficiency: if you track those, they roll up nicely. Some other metrics don’t roll up so well. Some of the agile metrics, particularly velocity, are really different for each team. Percentage of scope complete may not roll up very well either.

On the other question, about code quality, I think that if we can let that be a team responsibility, then they can use metrics that they don’t need to report outward from the team. Usually things that they can get out of static code analysis tools will help them spot potential quality issues, but I wouldn’t share very detailed things like static analysis results and code coverage outside the team, because then team members will feel like they’re going to be judged on those numbers and they’ll start gaming them. Those kinds of metrics are really for the team’s own use.
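Process cycle efficiency, which David mentions above, is a standard lean metric: the fraction of total lead time during which a work item is actively worked on. A minimal sketch, with hypothetical numbers:

    def process_cycle_efficiency(value_added_days, total_lead_days):
        """Fraction of elapsed lead time the item was actively worked on."""
        return value_added_days / total_lead_days

    # A story that took 10 working days end to end, but was actively
    # worked on for only 2 of them (hypothetical numbers).
    pce = process_cycle_efficiency(value_added_days=2, total_lead_days=10)
    print(f"process cycle efficiency: {pce:.0%}")  # 20% -- the rest was wait time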

Common Mistakes with Software Development Metrics

Derrick:
What are some mistakes you often see people make when applying software development metrics?

David:
People seem to make a couple of mistakes over and over. The one I think I mentioned earlier is that people apply metrics that don’t fit the context. Maybe a company wants to ‘go agile’, and so they start tracking agile metrics, whatever is recommended, the metrics that are recommended with SAFe or something like that, but they haven’t really fully adopted the new methods. They’re still in the transition, and so the numbers don’t mean what they’re supposed to mean. For instance, they may track velocity, but they might not have feature teams. They might have component teams, and the work items that those teams complete are not vertical slices of functionality. Whatever they’re tracking as velocity isn’t really velocity, and you start getting surprised by delivery issues.

You can also have the opposite situation, where teams are working in an agile way but management is still clinging to traditional metrics. They will demand that the teams report percentage of scope complete to date, but you’re doing adaptive development, so you don’t have 100 percent of the scope defined. You have a direction. You may be 50% complete this week based on what you know of the scope. Next week you might be 40% complete because you’ve learned something, and then management says, “Well, what’s going on? You’re going backwards,” and they don’t understand what the numbers mean. What I see happen in that case is it drives the teams away from adaptive development, and causes them to try to get more requirements defined upfront.
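The arithmetic behind David’s example is simple. A sketch with hypothetical story counts:

    # Percent complete is relative to *known* scope, and known scope grows.
    stories_done = 50
    known_scope = 100
    print(f"this week: {stories_done / known_scope:.0%} complete")  # 50%

    known_scope += 25  # next week you discover 25 more stories' worth of work
    print(f"next week: {stories_done / known_scope:.0%} complete")  # 40%
    # Nothing went backwards: the denominator grew, which is exactly what
    # adaptive development expects to happen.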

The second mistake I see a lot is that people either overlook or underestimate the effects of measurement on behavior. We can be measuring something for some objective reason, but it causes people to behave differently because they’re afraid their performance review will be bad if they don’t meet their numbers. I think we have to be very conscious of that. We don’t want to drive undesired behaviors because we’re measuring things a certain way. That not only hurts morale, it also makes the measurements useless, because they’re no longer real.

Those are really the two main mistakes: people don’t match the metrics to their actual process, or they neglect the behavioral effects of the metrics.

Recommended Resources

Derrick:
What are some resources you can recommend for those interested in learning more about managing projects and improving processes?

David:
I like an older book called ‘Software by Numbers’. That’s a classic. One of David Anderson’s earlier books is called ‘Agile Management for Software Engineering’. That has a lot of really good information about applying economic thinking to different kinds of process models. He covers things like Feature-Driven Development and Extreme Programming. Another guy whose work I like is Don Reinertsen. He combines really deep expertise in statistics with deep expertise in economics and applies that to software, and can demonstrate mathematically why we don’t want to over-allocate teams: loading everybody up to 100% actually slows things down. What’s counterintuitive to a lot of managers is that if you load your teams to 70% capacity, they’ll actually deliver better throughput, but it’s very hard for a lot of managers to see somebody not busy. It’s really hard for them to get there.
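Reinertsen’s argument rests on queueing theory. A minimal sketch of the usual M/M/1 illustration, in which the expected number of items in the system grows without bound as utilization approaches 100%:

    # For an M/M/1 queue, the mean number of items in the system is
    # rho / (1 - rho), where rho is utilization. It explodes as rho -> 1.
    def items_in_system(rho):
        return rho / (1.0 - rho)

    for rho in (0.5, 0.7, 0.9, 0.95, 0.99):
        print(f"utilization {rho:.0%}: ~{items_in_system(rho):5.1f} items in the system")

    # At 70% utilization queues stay short and work flows quickly; near
    # 100%, nearly everything sits waiting, so cycle times balloon even
    # though everyone looks busy.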

Derrick:
Really appreciate your time today, Dave. Some great stuff here. Thank you.

David:
Well I enjoyed the conversation, thanks.

The 10X Programmer and Other Myths in Software Engineering


We’ve interviewed Laurent Bossavit, a consultant and Director at Institut Agile in Paris, France. We discuss his book ‘The Leprechauns of Software Engineering’, which debunks myths common in software engineering. He explains how folklore turns into fact and what to do about it. More specifically, we hear about the findings of his research into the primary sources of theories like the 10X Programmer, the Exponential Defect Cost Curve, and the Software Crisis.

Content and Timings

  • Introduction (0:00)
  • About Laurent (0:22)
  • The 10X Programmer (1:52)
  • Exponential Defect Cost Curve (5:57)
  • Software Crisis (8:15)
  • Reaction to His Findings (11:05)
  • Why Myths Matter (14:44)

Transcript

Introduction

Derrick:
Laurent Bossavit is a consultant and Director at Institut Agile in Paris, France. An active member of the agile community, he co-authored the first French book on Extreme Programming. He is also the author of “The Leprechauns of Software Engineering”. Laurent, thank you so much for joining us today. Can you share a little bit about yourself?

About Laurent

Laurent:
I am a freelance consultant working in Paris; I have this great privilege. My background is as a developer. I try to learn a little from anything that I do, so after a bit over 20 years of that I’ve amassed a fair amount of insight, I think, I hope.

Derrick:
Your book, “The Leprechauns of Software Engineering”, questions many claims that are entrenched as facts and widely accepted in the software engineering profession. What made you want to write this book?

Laurent:
I didn’t wake up one morning and think to myself, ‘I’m going to write a debunker’s book on software engineering’; it actually was the other way around. I was looking for empirical evidence, anything that could serve as proof, for agile practices. And while I was looking at this, I was also looking at evidence for other things which in some cases were related to agile practices, for instance the economics of defects, and just stuff that I was curious about, like the 10X programmer thing. So basically, because I was really immersed in the literature and I’ve always been kind of curious about things in general, I went looking for old articles, for primary sources, and all of a sudden I found myself writing a book.

The 10X Programmer

Derrick:
So, let’s dig into a few of the examples of engineering folklore that you’ve examined, and tell us what you found. The first one you’ve already mentioned is the 10X programmer. This is the notion that there is a ten-fold difference in productivity and quality of work between different programmers with the same amount of experience. Is this fact or fiction?

Laurent:
It’s actually one that I would love to be true, if I could somehow become, or find myself to be, a 10X programmer. Maybe I would have an argument for selling myself at ten times the price of cheaper programmers. When I looked into what was advanced as evidence for those claims, what I found was not really what I had expected, not what you would think would be the case for something people say is supported by tens of scientific studies and research into software engineering. In fact, what I found when I actually investigated all the citations that people give in support of that claim was that in many cases the research was done on very small groups that were not extremely representative. The research was old; the whole body of evidence was from the seventies, on languages like Fortran and COBOL, and in some cases on non-interactive programming, systems where the program was submitted as input and you got the results of the compile the next day. The original study, the one cited as the first, was actually one of those; it was designed initially not to investigate productivity differences but to investigate the difference between online and offline programming conditions.

So how much of that is still relevant today is debatable. How much we understand about the concept of productivity itself is also debatable. And many of the papers and books that were pointed to were not properly scientific papers. They were opinion pieces, or books like Peopleware, which I have a lot of respect for, but it’s not exactly academic. The other thing was that some of these papers did not actually bring any original evidence in support of the notion that some programmers are 10X better than others. They were actually saying, “it is well known and supported by this and that paper,” and when I looked at the original paper they were referencing, it was in turn, rather than presenting its own evidence, saying things like “everybody has known since the seventies that some programmers are ten times more productive than others.” Very often, after chasing all the references through the old papers, you ended up back at the original paper, so a lot of the evidence was also double counted. My conclusion, and this was the original leprechaun, was that the claim was not actually supported. I’m not actually coming out and saying that it’s false, because what would that mean? Some people have taken me to task for saying that all programmers are the same, and that’s obviously stupid, so I cannot have been saying that. What I’ve been saying is that the data is not actually there, so we do not have any strong proof of the actual claim.

Exponential Defect Cost Curve

Derrick:
There is another folklore item called the “exponential defect cost curve”. This is the claim that if it costs one dollar to fix a bug during the requirements stage, it will take ten times as much to fix in code, one hundred times in testing, and one thousand times in production. Right or wrong?

Laurent:
That one is even more clear-cut. When you look at the data, you try to find what exactly was measured, because those are actual dollars and cents, right? So it should be the case that, at some point, a ledger or some kind of accounting document originated the claim. So I went looking for the books that people pointed me to, and typically found that rather than saying “we did the measurements on this or that project,” the books or the articles said, “this is something everybody knows,” and the references were to this or that article or book. So I kept digging, basically always following the pointers back to what I thought was the primary source. And in many cases I was really astonished to find that at some point along the chain, someone had just made the evidence up. I could not find any solid proof that someone had measured something and come up with those fantastic costs. You sometimes come across figures like fourteen hundred and three dollars on average per bug, but what does that even mean? Is that nineteen-nineties dollars? These claims have been repeated using exactly the same numbers for, I think, at least three decades now. You can find some empirical data in Barry Boehm’s books, and he’s often cited as the originator of the claim, but it’s much less convincing when you look at the original data than when you look at the derived citations.

The Software Crisis

Derrick:
There is a third piece of folklore, called “The Software Crisis”, common in mainstream media reporting of large IT projects. These are studies that highlight high failure rates in software projects, suggesting that all such projects are doomed to fail. Are they wrong?

Laurent:
This is a softer claim, right? There are no hard figures, although some people try. One of the ways one sees the software crisis exemplified is by someone claiming that software bugs cost the U.S. economy so many billions, hundreds of billions, of dollars per year. As for the more subjective aspect of the notion, what’s interesting historically is that the very notion of the software crisis was introduced to justify the creation of a group for researching software engineering. The initial act was the convening of the conference on software engineering, which is when the term was actually coined, and that was back in 1968; one of the tropes, if you will, used to justify interest in the discipline was the existence of the software crisis. But we’ve now been living with this for over forty years and things are not going so badly, right? When you show people a dancing bear, the wonder is not that the bear dances well, but that it dances at all. To me, technology is like that. It makes amazing things possible; it doesn’t always do them very well, but it’s amazing that it does them at all. So anyway, I think the crisis is very much over-exploited, very overblown, but where I really get onto firmer ground is when people try to attach numbers to it. Typically those are things like a study that supposedly found that bugs were costing the U.S. sixty billion dollars per year, and when you actually take a scalpel to the study, when you read it very closely and try to understand what methodology they followed and exactly how they went about their calculations, what you find out is that they basically picked up the phone and interviewed a very small sample of developers and asked them for their opinion, which is not credible at all.

Reaction to His Findings

Derrick:
What is a typical reaction to your findings debunking these long held claims?

Laurent:
Well, somewhat cynically, it varies between “Why does that matter?” and a kind of violent denial. Oddly enough, I haven’t quite figured out what makes people so attached to one viewpoint or the other. There’s a small but substantial faction of people who tell me, “Oh, that’s an eye-opener,” and would like to know more, but some people respond with protectiveness when they see, for instance, the 10X claim being attacked. I’m not quite sure I understand that.

Derrick:
So how do myths like these come about?

Laurent:
In some cases you can actually witness the birth of a leprechaun. It’s kind of exciting. Some of them come about from misunderstandings. I found out in one case, for instance, that an industry speaker gave a talk at a conference and apparently was misunderstood. People repeated what they thought they had heard, and one thing led to another. After some iterations of this telephone game, a few people, including some people I know personally, were claiming in the nineties that the waterfall methodology was causing 75% failure rates in defense projects. It was all a misunderstanding: when I went back and looked at the original sources, the speaker was actually referring to a paper from the seventies about a sample of nine projects, not an industry-wide study. So I think that was an honest mistake; it just snowballed. In some cases people are just making things up; that’s one way to convince people, just make something up. And one problem is that it takes a lot more energy to debunk a claim than it takes to make things up. So if enough people play that little game, some of that stuff is going to sneak past. I think the software profession amplifies the problem by offering fertile ground: we tend to be very fashion-driven, so we enthusiastically jump onto bandwagons, and that makes it easy for some people to invite others to jump, to propagate these claims. So I think we should be more critical. There has been a movement toward evidence-based software engineering, which I think is in some ways misguided, but to my way of thinking it is good news that people are starting to think, maybe we shouldn’t be so gullible.

Why Myths Matter

Derrick:
Even if the claims are in fact wrong, why does it matter?

Laurent:
To me, the key activity in what we do is not typing code; it’s not design; it’s not even listening to customers, although that comes close. The key activity that we perform is learning. We are very much a learning-centered profession, so to speak, because the very act of programming, looked at from a certain perspective, is about capturing knowledge of the world, about businesses, about other things that exist out there in the real world or are already virtual, and encoding that knowledge in executable form. So learning is one of the primary things that we already do. I don’t think we can be good at that if we are bad at learning in the more usual sense. But these claims, which are easy to remember and easy to trot out, act as curiosity stoppers, basically. They prevent us from learning further and trying to get at the reality: what actually goes on in a software development project, what determines whether a software project is a success or a failure. I think we should actually find answers to these questions. It is possible to know more than we do right now, and I am excited every time I learn something that makes more sense to me, that helps me get a better grip on developing software.

Derrick:
Are there any tell-tale signs that we can look out for to help stop ourselves from accepting such myths?

Laurent:
Numbers, statistics, citations, strangely enough. I know that citations are a staple of academic and scientific writing, but when you find someone inside the software engineering profession whipping out citations at the drop of a hat, you should take that as a warning sign. And there is more, but we would have to devote an hour or so to that.

Derrick:
Thank you so much for joining us today, it has been great.

Laurent:
Thanks for having me. Bye.