
Some Thoughts on Software Estimation

"All programmers are optimists." Fred Brooks's adage is as true today as it was when he first penned the words. The truth is that people are naturally very poor estimators. We all have a tendency to believe everything will go more smoothly than it probably will. Things will work out just as we planned, assuming we bothered to plan at all.

I encounter the reality of this often in my work. The first deadline passes, and we assure ourselves that we can catch up to meet the next one. The next deadline passes, and we finally come to terms with the reality that we're not going to make it. Then we push the schedule back a week when, if we really reflected on the situation, we would realize we are probably a couple of months from actually completing the project.

The product gets later and later one day at a time. Lots of yelling and finger-pointing ensues. Management demands better estimates, but rejects any that don't fit with their view of when we "should" be done, largely because there is pressure coming down the chain of command all the way from the top, far removed from the realities of technical issues that are hampering release. Thus, the schedule never really reflects the state of the software. When it comes time to plan the next release, we say we'll do better. What we often really mean is that we'll do better at racing to meet the schedule we've prescribed for ourselves rather than do better at estimating the amount of time it's actually going to take to build the thing.

It's too easy to lose sight of the fact that schedule estimation should be descriptive rather than prescriptive. We can talk endlessly about how long it theoretically or ideally should take us to finish and what the marketing department thinks would be the ideal time to launch, but at the end of the day, we have the requirements and the team and the legacy code that we have, and these circumstances largely determine the amount of time required to build the software. If we recognize and acknowledge these facts, we can do a better job of scheduling.

Many of us, though, operate on a smaller scale, planning and developing one small piece of the software at a time, but even in this context we can contribute to solving the problem. The next time you prepare to give an estimate of the number of hours or days a given task is going to take you, consider not just the time to write the code for the functionality you're tasked with delivering.

How much time will it take to:

  • write tests to validate the code?
  • debug and fix errors you make in writing the code?
  • debug and fix regressions you might introduce in other areas of the code as a result of your changes?
  • overcome tricky issues you didn't anticipate?
  • run tests and verify they all pass?
  • resolve issues identified during code review?


These are easy items to overlook when you are coming up with an estimate. Not all of these items will consume your time for every task, but almost certainly they will all come up from time to time. If you don't take them into account when devising an estimate, you will exceed your estimate more often than not.
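As a back-of-the-envelope illustration, here is a minimal Python sketch of rolling the checklist above into a single task estimate. The item names and hour figures are hypothetical placeholders, not numbers from any real project.

```python
# Hypothetical line items for one task, in hours. The point is simply that
# the "extras" often rival or exceed the time spent writing the feature itself.
task_hours = {
    "write the functionality": 8.0,
    "write tests to validate the code": 4.0,
    "debug and fix errors in the new code": 3.0,
    "debug and fix regressions elsewhere": 2.0,
    "overcome unanticipated tricky issues": 3.0,
    "run tests and verify they all pass": 1.0,
    "resolve code review feedback": 2.0,
}

total = sum(task_hours.values())
print(f"Coding alone: {task_hours['write the functionality']:.1f} hours")
print(f"Whole task:   {total:.1f} hours")
```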

There are several techniques for estimating. A few that have been used in software are WAGs (wild-ass guesses), Boehm's Constructive Cost Model (COCOMO) and the program evaluation and review technique (PERT). I won't go into details, but it is not hard to find more information about them. I prefer PERT, primarily because it results in a range estimate.
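For those who haven't seen it, here is a minimal sketch of the classic three-point PERT calculation, using the usual beta-distribution approximation; the optimistic, most-likely, and pessimistic inputs below are hypothetical.

```python
def pert_estimate(optimistic, most_likely, pessimistic):
    """Return the PERT expected duration and standard deviation.

    The classic weighting gives the most-likely value four times the
    weight of the extremes; the standard deviation spans the extremes.
    """
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev

# Hypothetical inputs, in days.
mean, sigma = pert_estimate(optimistic=3, most_likely=5, pessimistic=12)
print(f"Expected: {mean:.1f} days, roughly {mean - sigma:.1f}-{mean + sigma:.1f} days")
```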

That brings me to the next aspect of estimation. Beyond accounting for every part of a task when producing an estimate, there are a couple of other characteristics of an estimate to consider. Should it be a single value or a range? I generally advocate for a range because it better reflects the uncertainty of the estimate. A single-valued estimate is highly likely to be wrong. In some sense, people will often translate it into a range anyway. However, the range others read into your single-valued estimate may not be the range you intended, so it often makes sense just to state the range explicitly.

Sometimes the consumers of your estimate want the greater sense of certainty they believe comes with a single number. In these cases, a good question to ask yourself is how biased the estimate should be. If someone gave you an estimate for delivering something you wanted, and that estimate turned out to be incorrect, would you rather have the item delivered in less time or more time than was estimated? In less time, of course; that is, most of us would prefer the estimate be longer than the actual time required to deliver. This implies people favor receiving pessimistic estimates even though they have a propensity to give optimistic ones.

Now, clearly there has to be a limit on how pessimistic an estimate can be. I don't think "the end of time" is likely to be received well as an estimated time of completion. I like to go back to the Pragmatic Programmers' advice of gently exceeding expectations when attempting to determine how pessimistically to estimate.

There are two competing factors. One is providing an estimate that you will be able to meet or beat, i.e. deliver at or before the estimated date, most of the time, and thus exceed expectations. The other is avoiding an estimate that is either perceived as, or proves to be, a gross overestimate, which will reduce your credibility. I often settle on aiming for the 95th percentile, meaning I am 95% confident I will deliver at or before my estimate. People often look at me a little cross-eyed when I give a 95%-confidence estimate, but they tend to appreciate later that the time to complete the task rarely exceeds the estimate given. Going beyond 95%, however, tends to require egregious overestimation and offers vastly diminishing returns.
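As a rough sketch of how one might pad an estimate to hit something like the 95th percentile, the snippet below adds about 1.645 standard deviations to a PERT-style mean, assuming a normal approximation of the distribution; both the approach and the numbers are illustrative, not a prescription.

```python
def confident_estimate(expected, std_dev, z=1.645):
    """Pad a mean estimate by z standard deviations.

    Under a normal approximation, z = 1.645 corresponds to roughly 95%
    one-sided confidence of delivering at or before the quoted date.
    """
    return expected + z * std_dev

# Hypothetical PERT outputs from the earlier sketch, in days.
quote = confident_estimate(expected=5.8, std_dev=1.5)
print(f"Quote about {quote:.1f} days to be ~95% confident of delivering on time")
```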

None of the above is intended to be too prescriptive. Estimation is a topic that the software industry has struggled with since its inception and will probably continue to struggle with for many years to come. The main point here is to attempt to consider all of the relevant factors in estimating the time required to complete a software task and deliver estimates that can be met or exceeded more often than not. I hope these ideas assist you in your estimation. Best of luck meeting your next deadline.
