All Lemma teams follow a collaborative engineering estimation process, the details of which are configured at the beginning of a project (during what we call “Sprint 0”) and documented so it’s clear and explicit for everyone.
Because we want teams to have flexibility to configure this process for the needs of the project, our focus in this article will be to describe the characteristics we find most important in an effective and efficient estimation process, as opposed to prescribing a one-size-fits-all process.
Engineering estimations can be part of several stages of the SDLC, even from the moment an idea is considered, to form conjectures about feasibility and prioritization. Asking questions like “could we implement X? Would that be hard? How would we do it?” is important and one of the fundamental ways in which Engineering teams contribute to Product strategy.
We must not let dogmatic approaches to engineering estimations or Agile methodologies in general get in the way of having a fluid and prolific dialog between Product and Engineering.
If a team member’s understanding of the required effort or complexity of a task changes during the implementation, they communicate this to the team immediately and document it. However, the original estimation is preserved and not directly overwritten. Afterwards, the original estimation can be compared with the actual implementation in order to improve the team’s estimation accuracy and process.
In practice, engineering estimations need to be done at different phases of the software development life cycle, potentially even during ideation. This is fine as long as everyone on the team is aware of the level of confidence behind each iteration of the estimations, and the team keeps updating early estimations as it learns more.
For example, if someone asks “could we allow users to also pay using PayPal?”, the team needs to be able to provide high-level estimations to support and facilitate the strategy and ideation conversations that follow.
The team shouldn’t be afraid of these early estimations, but it needs to ensure alignment on what each estimate represents. A simple approach is to use “t-shirt sizes”, which limits the analysis to saying whether something is a “Small”, “Medium”, “Large” or “Extra Large” task.
Another approach is to rely on relative sizing and compare the work with something comparable the team has already done. This could be a descriptive statement like “it would be more or less like when we implemented payments via Amazon Pay”.
Backlog Grooming Estimations
The team estimates backlog items to facilitate prioritization.
Estimates made during Backlog Grooming (unless explicitly specified otherwise) are considered final and ready for Sprint Planning.
These estimations incorporate knowledge from prior early estimations, but the team is free to override any prior estimates based on new findings or on collaboration during the Backlog Grooming process itself.
These estimations are accompanied, as needed, by documentation explaining the considerations made. For example:
- What are the potential implementation risks or blockers that the team has uncovered?
- What are potential shortcuts or things that could be leveraged to simplify or accelerate implementation?
- What required context or skill should one have in order for this estimation to remain accurate?
Estimates only add value if they are used for planning. There is no point in estimating items deep down in the backlog if the situation will have changed considerably by the time they get planned.
Tech spikes are a tool for when the thing the team wants to estimate has crucial unknown factors. For example, the team may not know whether the thing they want to estimate is even possible, or what complexities may lie in its integration points. When this happens, the team cannot estimate the user story at that moment, but it can create a new tech spike and estimate that instead. Estimating a tech spike means assigning it a time box: a fixed amount of time to answer the questions that will let the team go back and estimate the story it can’t estimate now. These questions may be things like:
- “Can we leverage the existing search engine to implement the new filters or would we need to create a new one?”
- “Does the reporting API support this new type of report we’re asked to do?”
- “We don’t have events or web hooks for this thing, what would it take for the team in charge of that to help us out?”
- “This CMS looks like it should cover our needs, but will it be easy to develop using this headless platform? What if we try making a quick prototype to test it out and validate some assumptions?”
The amount of time allowed for the tech spike is based on the perceived difficulty of answering these questions and the perceived value gained from answering them.
When the tech spike begins, it is handled like any other user story the team is working on. The team works on answering those questions and creates any necessary prototypes. The final output is documentation attached to the original user story that explains and justifies its engineering estimation.
In cases where these questions cannot be answered within the assigned time, the team reconvenes to triage the situation and evaluate whether the value of those answers justifies an additional tech spike (and if so, how much time it should be assigned) or whether they have learned enough to have a partial answer that is good enough for the time being.
Estimation Changes During Sprints
As the team implements a feature during a sprint, their understanding of its required effort and complexity may evolve.
If this happens, team members immediately notify the rest of the team and share their updated understanding and the corresponding estimate for that task. This information is also documented in the User Story, along with any resulting conclusions about how the team is going to adapt and reprioritize its plans.
The original estimation is not overwritten. Instead, the new estimation and explanation of the change / obstacle / shortcut is documented as a comment in the issue. An example of what this comment may look like would be:
“@development-team, @product-team: We cannot use the Pusher library for push notifications because we’ve learned of a security policy from DevSecOps that prevents us from sending event-activity data through a third-party service. This means we need to evaluate alternative services that can be self-hosted, or implement our own WebSocket server. This effectively increases the original estimation of this feature from a 3 to an 8. The team discussed this on Slack and decided to pause this work for now and plan it for a later sprint.”
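To make the “never overwrite the original” rule concrete, here is a minimal Python sketch of an estimation history. `EstimateRecord`, `Revision`, `revise` and `drift` are hypothetical names, not part of any issue tracker’s API; in practice this history lives in the issue’s comments. Freezing the dataclass makes any attempt to reassign `original_points` raise an error, while re-estimates are appended to a list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    """A re-estimate recorded as a comment; it never replaces the original."""
    points: int
    reason: str


@dataclass(frozen=True)  # frozen: reassigning original_points raises an error
class EstimateRecord:
    story: str
    original_points: int  # set during Backlog Grooming, kept for retrospectives
    revisions: list[Revision] = field(default_factory=list)

    def revise(self, points: int, reason: str) -> None:
        # Append the new understanding instead of editing original_points.
        self.revisions.append(Revision(points, reason))

    @property
    def current_points(self) -> int:
        # The latest revision is used for planning; the original is untouched.
        return self.revisions[-1].points if self.revisions else self.original_points

    def drift(self) -> float:
        """Ratio of the latest estimate to the original (1.0 means no change).

        Assumes original_points is non-zero, as story points normally are.
        """
        return self.current_points / self.original_points
```

For the Pusher example, `EstimateRecord("Push notifications", 3)` followed by `revise(8, ...)` gives `current_points == 8` while `original_points` stays at 3, so a retrospective can still compare the two.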
Backlog grooming and estimating User Stories and tasks can take up to several hours and sometimes require multiple steps to complete. Therefore, estimating is planned for within the sprint, ensuring that there’s enough time and opportunities to do it.
The entire engineering squad is involved in this process. Person A does not estimate something that Person B will implement later: estimations can be wildly inaccurate when they don’t consider the context, skill set and experience of the person who will carry out the task.
Moreover, even when people estimate their own assigned work, that estimate still benefits from peer review and collaboration. This should happen naturally and frequently throughout the process, and it helps the team mitigate blind spots and unconscious biases.
Lemma teams usually use a slightly modified variant of standard Planning Poker. The entire team (including the PM and stakeholders) gets together on a Google Meet call and reviews a set of issues together.
Before actually making any estimations, the team brainstorms about each issue and analyzes possible approaches for implementing it, potential obstacles, problems or dependencies.
If the user story or issue is too big, at this point the team will try to break it down into separate issues, possibly grouped together under an epic or a milestone.
If these smaller pieces are still too undefined to be properly analyzed and estimated, the team will then create a research or spike issue and estimate that instead. These special research issues will be time boxed. Their outcome will be to unlock the estimation process for the original issue, or create additional research issues if necessary.
After reviewing an issue, the team discusses what its estimation should be. Each team member votes privately, without telling the others what they think is appropriate, and then the team reviews the votes.
If everyone agrees on an estimation, then that’s the value used. If there are different votes, each team member explains their reasoning and the team discusses the feedback.
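The reveal step can be sketched as a small Python function, assuming votes are collected privately as a mapping from team member to points. `reveal_votes` is a hypothetical helper: it returns the agreed value when the vote is unanimous, and otherwise groups members by vote so each person can explain their reasoning.

```python
from typing import Dict, List, Optional, Tuple


def reveal_votes(votes: Dict[str, int]) -> Tuple[Optional[int], Dict[int, List[str]]]:
    """Reveal privately cast votes all at once.

    Returns (agreed_value, groups). agreed_value is None when the votes
    differ; groups maps each distinct vote to the members who cast it.
    """
    if not votes:
        raise ValueError("no votes were cast")
    groups: Dict[int, List[str]] = {}
    for member, points in votes.items():
        groups.setdefault(points, []).append(member)
    for names in groups.values():
        names.sort()  # stable, readable order for the discussion
    agreed = next(iter(groups)) if len(groups) == 1 else None
    return agreed, dict(sorted(groups.items()))
```

A split vote such as `{"Ana": 2, "Ben": 5, "Cruz": 3, "Dee": 3}` returns `None` plus the grouping `{2: ["Ana"], 3: ["Cruz", "Dee"], 5: ["Ben"]}`, which gives the discussion a natural order.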
Margin of Error
Estimations are designed to have a built-in mechanism for accounting for a margin of error. An estimation is not measured in exact time, e.g.: “this will take 3.5 hours”. It’s extremely unlikely that a task would take exactly that amount of time. The team should have a system with built-in flexibility (e.g.: measuring in ranges of time or “story points”) so their estimates have a higher chance of being accurate.
Example: Story Points
Story Points measure the required level of effort, complexity and uncertainty associated with implementing an issue.
When tasks are estimated in days or even hours, a certain level of precision is imposed on the estimations. Why estimate in hours and not minutes? Or why stop there and not estimate in seconds? No matter the unit, if exact values like “1.5 hours” are used, no margin of error is specified, and if the team then spends 1.5001 hours working on the issue, the estimate was already wrong.
Estimation points don’t suffer from this problem, as in terms of time, they represent ranges, rather than specific values.
Estimating an issue at 2 points usually means that it can take between half a day and a day and a half. So if a task estimated at 2 points takes 5 hours, 6 hours or 6.5 hours, the estimation was correct in every case.
The intuition, loosely borrowed from Information Theory, is that the accuracy of estimations grows only logarithmically as the team gains knowledge about an issue. Conversely, as estimations increase in size, the uncertainty about the issue grows exponentially.
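A minimal Python sketch of checking actual effort against a point range, assuming an 8-hour working day. Only the 2-point range (half a day to a day and a half) comes from the text; the 1- and 3-point ranges, `POINT_RANGES_DAYS` and `within_estimate` are hypothetical. The ranges deliberately overlap, which is part of what gives the scale its margin of error.

```python
HOURS_PER_DAY = 8  # assumption: an 8-hour working day

# Story points mapped to (min_days, max_days). Only the 2-point range is
# taken from the text; the 1- and 3-point ranges are hypothetical.
POINT_RANGES_DAYS = {
    1: (0.0, 0.5),  # hypothetical
    2: (0.5, 1.5),  # half a day to a day and a half
    3: (1.0, 2.5),  # hypothetical
}


def within_estimate(points: int, actual_hours: float) -> bool:
    """Return True if the actual effort falls inside the range for `points`."""
    low_days, high_days = POINT_RANGES_DAYS[points]
    return low_days * HOURS_PER_DAY <= actual_hours <= high_days * HOURS_PER_DAY
```

Under these assumptions, `within_estimate(2, h)` is `True` for 5, 6 and 6.5 hours (anything from 4 to 12 hours), whereas an exact 1.5-hour estimate is already missed at 1.5001 hours.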
This is why we use an exponential scale for our estimations.
Lemma usually chooses the Fibonacci progression over alternatives like 2^n because its early values (2, 3, 5) are prime numbers, whereas every power of two above 1 can be split evenly in half. This is particularly appropriate for reinforcing the concept that User Stories should be atomic and indivisible.
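The scale itself is easy to generate. The Python sketch below builds a Fibonacci-style point scale and, as one possible convention that the process above does not prescribe, rounds a raw gut-feel number up to the next value on the scale. `fibonacci_scale` and `snap_to_scale` are hypothetical names.

```python
def fibonacci_scale(limit: int) -> list[int]:
    """Story-point values 1, 2, 3, 5, 8, ... up to and including `limit`."""
    scale, a, b = [], 1, 2
    while a <= limit:
        scale.append(a)
        a, b = b, a + b  # each value is the sum of the previous two
    return scale


def snap_to_scale(raw: float, scale: list[int]) -> int:
    """Round a raw estimate up to the next value on the scale."""
    for value in scale:
        if raw <= value:
            return value
    # Too big for the scale: a signal that the story should be broken down.
    raise ValueError(f"{raw} exceeds the largest value on the scale; split the story")
```

`fibonacci_scale(21)` yields `[1, 2, 3, 5, 8, 13, 21]`, compared with `[1, 2, 4, 8, 16]` for 2^n. Each Fibonacci value is roughly 1.6 times the previous one, so the gaps between allowed values widen as estimates, and their uncertainty, grow.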
The estimation process should be explicit about the scope of the estimations. Is the team estimating only the implementation phase or also any required research, planning, design, QA, documentation, release activities, etc? A way to achieve this is by ensuring that User Stories have a clear “Definition of Done” prior to being estimated.
The estimation process should typically have a separate approach for handling support escalations, managing technical debt and fixing bugs. These tasks often cannot be estimated individually: bugs are, by definition, unknown and unexpected problems in the system, and in many cases it is impossible to estimate them until someone actually works on fixing them.
Similarly, it can be difficult to estimate refactoring work on complex systems when paying down technical debt. A simple solution is to define a time-boxed activity for bug-fixing or tech debt whenever possible. The team then can’t predict how many bugs it will fix in a given sprint, but in exchange it gains predictability over the maximum amount of time that will be spent, which ensures the rest of the team’s work isn’t put at risk.
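The time-box idea can be simulated in a few lines of Python. In this hypothetical sketch (`Bug` and `run_bug_timebox` are illustrative names), each bug’s true effort is only revealed as the team works on it, bugs are taken in priority order, and work stops when the box is spent. Whatever happens, the hours used never exceed the box.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    title: str
    effort_hours: float  # unknown in advance; used here only to simulate the work


def run_bug_timebox(bugs: list[Bug], box_hours: float) -> tuple[list[str], float]:
    """Work through bugs in priority order until the time box is spent.

    Returns the titles of fixed bugs and the hours used, which never exceed
    box_hours. A bug still unfinished when the box runs out stays open.
    """
    fixed: list[str] = []
    used = 0.0
    for bug in bugs:
        remaining = box_hours - used
        if remaining <= 0:
            break  # the box is spent; remaining bugs wait for a later sprint
        spent = min(bug.effort_hours, remaining)
        used += spent
        if spent == bug.effort_hours:
            fixed.append(bug.title)
    return fixed, used
```

With an 8-hour box and bugs taking 3, 6 and 0.5 hours, only the first is fixed: the second consumes the remaining 5 hours and stays open, and the total stays at exactly 8 hours.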