Traditional estimates attempt to answer the question, "how long will it take to develop X?" I could ask you a similar question: "How long does it take to get to the nearest train station?" The answer, measured in time, depends on two things: the distance and the speed. Depending on whether I plan to go by car, by foot, by bicycle [...], the answer can vary dramatically. So it is with software development. A developer's productivity can vary dramatically, both as a function of innate ability and of whether the task at hand plays to his strong points, so the time to produce a piece of software can vary just as dramatically. But the complexity of the problem doesn't depend on the person solving it, just as the distance to the train station doesn't depend on how I get there.

However, I think this ignores a fundamental aspect of human nature (at least of modern-day humans): our need for instant gratification. We don't really care how far away something is (except for concerns about fuel). When we ask someone how far they live from some particular point, they usually give an answer in hours and minutes, not miles. They view it as a question of how long they have to wait before they get what they want. Similarly, product people, and especially the customer, don't particularly care how complex a problem is to solve. They just want it solved, and they want to know how long they have to wait before they get it.
It seems to me that eventually someone will have to make the translation to hours (even if only implicitly). Users care about release dates -- "when will the features I want be available?" Even if you schedule sprints with story points, someone has to figure out how long those stories will take to complete in order to set a reasonable release date. When you make this conversion, you will have to account for variances in velocity (the speed at which people work). Even with story points, this correlation will be necessary at first, to establish how many story points are possible in a sprint.
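To make that translation concrete, here is a minimal sketch (my own illustration; the backlog size, velocity, and sprint length are made-up numbers) of how story points plus a measured velocity yield a release date:

```python
import math

# Made-up numbers for illustration.
backlog_points = 120        # total story points left in the product backlog
velocity = 20               # points the team actually completes per sprint
sprint_length_weeks = 2

# Round up: a partially finished sprint still occupies the whole sprint.
sprints_needed = math.ceil(backlog_points / velocity)
weeks_to_release = sprints_needed * sprint_length_weeks
print(sprints_needed, "sprints, about", weeks_to_release, "weeks")
# -> 6 sprints, about 12 weeks
```

Note that velocity, not the individual estimates, is what carries the hours: once you know how many points a team burns per sprint, the translation to calendar time is trivial arithmetic.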
Still, some advantages to this approach come to mind. When you estimate story points, you don't have to estimate both the complexity and your ability to deal with that complexity; you only have to worry about the complexity. Hours for big items are often pulled out of the air and aren't very accurate until those items are broken down into tasks; story points provide a good way of looking at the big picture without fleshing out all the details. One additional cool thing about story points is that they have built-in variances, as Scrum Breakfast points out:
The thing to realize about estimates is that they are very imprecise: +/- 50%. One technique for dealing with a cost ceiling is to define an estimate such that the actual effort needed will be <= the estimate in 90% of the cases, regardless of whether they are measured in days, hours, or points. So Story Points are usually estimated on the Cohn Scale (named for Mike Cohn, who popularized the concept): 0, 1, 2, 3, 5, 8, 13, 20, 40, 100. Why is there no 4? Well, a 3 is a 3 +/- 50%, so a 3 actually extends from 2 to 5, a 5 from 3 to 8, etc. The difference between 3 and 4 is nowhere near as significant as the difference between 1 and 2, so we don't give estimates that convey a false sense of precision. Summed together, many imprecise estimates give a total that is a remarkably accurate estimate of the work to be performed (the errors tend to cancel each other out rather than accumulate).

Mule also had an interesting approach to creating a product backlog, using a sort of bucket sorting rather than choosing actual numbers for story points. This is what Chris Sterling calls affinity estimating.
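The +/- 50% overlap on the Cohn scale, and the claim that errors cancel in the sum, are easy to sanity-check with a toy simulation (my own sketch, not from Scrum Breakfast; the uniform error distribution and the story count are assumptions):

```python
import random

cohn = [1, 2, 3, 5, 8, 13, 20, 40, 100]

# Each estimate e at +/- 50% literally spans [0.5*e, 1.5*e]; a 3 spans
# 1.5-4.5, which rounds out to the neighboring scale values 2 and 5,
# so a 4 would add no real information.
for e in cohn:
    print(f"{e:>3} covers {0.5 * e:>5.1f} to {1.5 * e:>6.1f}")

# Toy model: 50 stories whose actual effort is off by up to +/- 50%
# (uniformly). The relative error of the *total* comes out far smaller
# than the 50% error of any single estimate.
random.seed(1)
estimates = [random.choice(cohn) for _ in range(50)]
actuals = [e * random.uniform(0.5, 1.5) for e in estimates]
rel_error = abs(sum(actuals) - sum(estimates)) / sum(estimates)
print(f"relative error of the total: {rel_error:.1%}")
```

Independent errors on the individual stories mostly wash out in the sum, which is why a backlog of rough estimates can still yield a usable total.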
Mike Cohn says that sprint backlogs and product backlogs should use different units to prevent confusion: if you use hours for both, nothing shows that the hours on a sprint backlog have been thought about a lot more carefully than the hours on the product backlog. Speaking of Mike Cohn, he has an interesting notion of the role of story points. He suggests they are a good long-term indicator, but that in the short term you should focus on the product backlog and prioritize stories, then break them into tasks and estimate those in hours. Some have suggested using task points in a similar way to story points, but for individual tasks. This might provide some room for variance if you really suck at estimating. Additionally, you only have to update the estimates when complexity changes, not when velocity changes. This also might be a more lean approach, since it doesn't waste time on an artifact that isn't needed. However, there are some downsides to this approach. One is that there is no easy way to track the progress of a task (e.g. in Jira) unless a conversion to hours is made first. It also presents a challenge to HR, which may wish to account for hours for financial reporting purposes.
I think it makes a bit of sense to use hours to estimate at the task level, since you should be able to give more detail at that point (as opposed to bigger items like stories, where any hours ascribed would basically be pulled out of thin air). This is what I'm interested in: how can I make my task estimates more accurate, to make sure I can deliver what I think I can deliver? To me, the issue with both of these methods of task estimation is that they don't address the real reasons tasks get off schedule in the first place. Actually, there are two reasons, and both deal with an unknown. The first is that other tasks compete for the same resource (e.g. your time) that weren't initially accounted for: either they came up after the estimation or they were a result of oversight. The second is that you have misjudged the complexity of a task. You look at a problem, it seems pretty simple, you ascribe a few hours/points/whatever to it, but once you start digging in you realize it's going to take longer than you thought. Then you are left with the question of whether or not to re-estimate. I think that while the original estimate should be retained for posterity's sake, it's useful for the product people to have a new estimate for planning purposes... perhaps pushing the work back to another sprint.
The only certainty is the certainty of uncertainty, but that's probably why you're doing agile in the first place. This is especially true when new technology is involved. While there are some who advocate task points for breaking down tasks (pieces of a story), most advocate using hours at this level. And this is what I do not understand. Why are we going to all this bother to make estimates about minutiae? Get a commitment from developers, keep a backlog, and get to work! I can't believe some agile teams are willing to spend several hours (some say 8 hours) on sprint planning. That doesn't sound agile to me at all. A practice that seems to work pretty well for some of my colleagues here at OCLC is to use units that are a bit fuzzier than an hour. They tend to estimate in half days, which is a bit less precise but allows for greater accuracy, because it is easier to ballpark. But unless you are continuously updating estimates (as the Scrum Primer suggests) so that volunteers can move between tasks (perhaps in a paired situation), I don't see much point in having exact hours.
Honestly, I'd rather do away with task estimates altogether. Maybe it's because I don't like submitting something that I know I can't do a good job on (at least not yet). But it also seems to me that you are spending time creating an artifact that doesn't necessarily help you finish the sprint. There are some out there who have suggested this, including Jeff Sutherland here. Jurgen De Smet has an interesting blog post on it here. I feel that as long as I'm reasonably certain I can get tasks X, Y, and Z done in sprint N, I don't really need to make up numbers for how long they'll take. Maybe I'd feel differently if I were on multiple teams and close to being overworked. But developing this sense means developing a skill that isn't really useful for anything else. I can't even get good at estimating and apply it to other areas to improve productivity; this 'gut' sense is a rather domain-specific intuition. What's the point? Am I completely missing something?
P.S. This didn't really fit with anything else, but I wanted to pass it along. It is about building a common definition of done: