How do you estimate a ROI for clearing technical debt? - technical-debt

I'm currently working with a fairly old product that's been saddled with a lot of technical debt from poor programmers and poor development practices in the past. We are starting to get better and the creation of technical debt has slowed considerably.
I've identified the areas of the application that are in bad shape and I can estimate the cost of fixing those areas, but I'm having a hard time estimating the return on investment (ROI).
The code will be easier to maintain and will be easier to extend in the future but how can I go about putting a dollar figure on these?
A good place to start seems to be going back through our bug tracking system and estimating costs based on the bugs and features relating to these "bad" areas, but that seems time-consuming and may not be the best predictor of value.
Has anyone performed such an analysis in the past and have any advice for me?

Managers care about making money, first and foremost through growth (e.g. new features that attract new customers) and only second through optimizing the process lifecycle.
Your proposal falls into the second category, so it will inevitably be prioritized below goal #1 (and thus get pushed down the list even if it could save money... because saving money implies spending money first, most of the time at least ;-)).
Now, putting a $ figure on the "bad technical debt" could be given a more positive spin (assuming the following applies in your case): "if we invest in reworking component X, we could introduce feature Y faster and thus win Z more customers".
In other words, evaluate the cost of technical debt against cost of lost business opportunities.
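A deliberately crude illustration of that comparison, with every number invented:

# Toy comparison: cost of reworking component X vs. the business the rework unlocks.
# Every figure below is invented for illustration only.
rework_cost = 15 * 600                    # 15 developer-days at a $600/day loaded rate
months_feature_y_is_delayed = 3           # how much later feature Y ships on the current code
new_customers_per_month = 40              # customers feature Y is expected to attract
revenue_per_customer_per_month = 150

lost_revenue = (months_feature_y_is_delayed
                * new_customers_per_month
                * revenue_per_customer_per_month)

print(f"Rework: ${rework_cost:,}  vs.  revenue lost to the delay: ${lost_revenue:,}")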

Sonar has a great plugin (the Technical Debt plugin) that analyzes your source code to produce exactly this kind of metric. It is a Maven tool, so you may not be able to plug it straight into your build, but it should still provide some good numbers.
Here is a snippet of their algorithm:
Debt (in man-days) =
    cost_to_fix_duplications
  + cost_to_fix_violations
  + cost_to_comment_public_API
  + cost_to_fix_uncovered_complexity
  + cost_to_bring_complexity_below_threshold

where:
  Duplications = cost_to_fix_one_block * duplicated_blocks
  Violations   = cost_to_fix_one_violation * mandatory_violations
  Comments     = cost_to_comment_one_API * public_undocumented_api
  Coverage     = cost_to_cover_one_of_complexity * uncovered_complexity_by_tests
                 (80% coverage is the objective)
  Complexity   = cost_to_split_a_method * (function_complexity_distribution >= 8)
               + cost_to_split_a_class  * (class_complexity_distribution >= 60)
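If you want a rough dollar figure from the same arithmetic without running Sonar, it is easy to reproduce from your own static-analysis counts. All the unit costs and metric values below are made-up placeholders, not Sonar defaults:

# Rough re-implementation of the Sonar-style debt formula, with invented unit costs.
unit_cost_days = {                 # cost in man-days per item (assumed values)
    "duplicated_block": 0.25,
    "mandatory_violation": 0.1,
    "undocumented_public_api": 0.05,
    "uncovered_complexity": 0.1,
    "method_over_complexity_8": 0.5,
    "class_over_complexity_60": 2.0,
}

metrics = {                        # counts pulled from your analysis report (example values)
    "duplicated_block": 120,
    "mandatory_violation": 450,
    "undocumented_public_api": 300,
    "uncovered_complexity": 800,   # complexity not covered by tests (80% coverage target)
    "method_over_complexity_8": 60,
    "class_over_complexity_60": 10,
}

debt_man_days = sum(unit_cost_days[k] * metrics[k] for k in metrics)
day_rate = 600                     # loaded cost of one developer-day in dollars (assumed)
print(f"Debt: {debt_man_days:.0f} man-days, roughly ${debt_man_days * day_rate:,.0f}")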

I think you're on the right track.
I've not had to calculate this but I've had a few discussions with a friend who manages a large software development organisation with a lot of legacy code.
One of the things we've discussed is generating some rough effort metrics from analysing VCS commits and using them to divide up a rough estimate of programmer hours. This was inspired by Joel Spolsky's Evidence-based Scheduling.
Such data mining would also let you identify clusters of when code is being maintained and compare that against bug completion in the tracking system (unless you are already blessed with tight integration between the two and accurate records).
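A first cut at that mining can be as crude as counting how often commits touch each top-level directory; the assumption that one file touch is a comparable unit of effort is mine, not anything git gives you:

# Crude commit mining: which areas of the code absorb the most maintenance activity?
# Run from the root of a git checkout; an "area" here is just a top-level directory.
import subprocess
from collections import Counter

log = subprocess.run(
    ["git", "log", "--name-only", "--pretty=format:"],
    capture_output=True, text=True, check=True,
).stdout

touches = Counter()
for line in log.splitlines():
    line = line.strip()
    if line:
        touches[line.split("/")[0]] += 1   # bucket each touched file by its top-level directory

for area, count in touches.most_common(10):
    print(f"{area:30s} {count:6d} file touches")

Join those buckets against bug-fix issue IDs in the commit messages and you get a rough picture of where the maintenance hours are actually going.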
Proper ROI needs to calculate the full Return, so some things to consider are:
- decreased cost of maintenance (obviously)
- opportunity cost to the business of downtime or missed new features that couldn't be added in time for a release
- ability to generate new product lines due to refactorings
Remember, once you have a rule for deriving data, you can have arguments about exactly how to calculate things, but at least you have some figures to seed discussion!

I can only speak to how to do this empirically in an iterative and incremental process.
You need to gather metrics to estimate your demonstrated best cost/story-point. Presumably this represents your system just after the initial architectural churn, when most of the design trial-and-error has been done but entropy has had the least time to cause decay. Find the point in the project history where velocity/team-size was highest, and use that as your cost/point baseline (zero debt).
Over time, as technical debt accumulates, the velocity/team-size begins to decrease. The percentage decrease of this number with respect to your baseline can be translated into "interest" being paid on each new story point. (This is really interest paid on technical and knowledge debt)
Disciplined refactoring and annealing cause the interest on technical debt to stabilize at some value higher than your baseline. Think of this as the steady-state interest the product owner pays on the technical debt in the system. (The same concept applies to knowledge debt.)
Some systems reach the point where the cost + interest on each new story point exceeds the value of the feature point being developed. This is when the system is bankrupt, and it's time to rewrite the system from scratch.
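A back-of-the-envelope version of that interest calculation, using invented velocity figures:

# Toy "interest on technical debt" calculation from velocity, with invented numbers.
team_size = 5
baseline_velocity = 40          # points per sprint at the project's historical best (assumed)
current_velocity = 28           # points per sprint now (assumed)

baseline_cost_per_point = team_size / baseline_velocity   # person-sprints per story point
current_cost_per_point = team_size / current_velocity

interest = (current_cost_per_point - baseline_cost_per_point) / baseline_cost_per_point
print(f"Interest paid on each new story point: {interest:.0%}")

# If cost + interest per point ever exceeds the value of the point being delivered,
# the system is "bankrupt" in the sense described above.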
I think it's possible to use regression analysis to tease apart technical debt and knowledge debt (but I haven't tried it). For example, if you assume that technical debt correlates closely with some code metrics, e.g. code duplication, you could determine the degree the interest being paid is increasing because of technical debt versus knowledge debt.

+1 for jldupont's focus on lost business opportunities.
I suggest thinking about those opportunities as perceived by management. What do they think affects revenue growth -- new features, time to market, product quality? Relating debt paydown to those drivers will help management understand the gains.
Focusing on management perceptions will also help you avoid false precision. ROI is an estimate, and it is no better than the assumptions behind it. Management will be suspicious of purely quantitative arguments because they know there is something qualitative buried in there somewhere. For example, over the short term the real cost of your debt paydown is the other work the programmers aren't doing, rather than the cash cost of those programmers, because I doubt you're going to hire and train new staff just for this. Are the improvements in future development time or quality more important than the features those programmers would otherwise be adding?
Also, make sure you understand the horizon for which the product is managed. If management isn't thinking about two years from now, they won't care about benefits that won't appear for 18 months.
Finally, reflect on the fact that management perceptions have allowed this product to get to this state in the first place. What has changed that would make the company more attentive to technical debt? If the difference is you -- you're a better manager than your predecessors -- bear in mind that your management team isn't used to thinking about this stuff. You have to find their appetite for it, and focus on those items that will deliver results they care about. If you do that, you'll gain credibility, which you can use to get them thinking about further changes. But appreciation of the gains might be a while in growing.

Being mostly a lone or small-team developer, this is outside my field, but to me a great way to find out where time is wasted is very, very detailed timekeeping, for example with a handy task-bar tool like this one, which can even filter out when you go to the loo and can export everything to XML.
It may be cumbersome at first, and a challenge to introduce to a team, but if your team can log every fifteen minutes they spend due to a bug, mistake or misconception in the software, you accumulate a basis of impressive, real-life data on what technical debt is actually costing in wages every month.
The tool I linked to is my favourite because it is dead simple (it doesn't even require a database) and provides access to every project/item through a task bar icon. Additional information on the work carried out can be entered there too, and timekeeping is literally activated in seconds. (I am not affiliated with the vendor.)
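Once those logs exist, turning them into a monthly wage figure is trivial. The CSV layout, the "debt" tag and the hourly rate below are all assumptions about how you might export the data:

# Sum up time entries the team tagged as debt-related and price them in wages.
# Assumes a hypothetical CSV export with columns: date, minutes, tag, note.
import csv
from collections import defaultdict

HOURLY_WAGE = 45.0                      # loaded hourly cost per developer (assumed)
monthly_cost = defaultdict(float)

with open("timelog.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["tag"] == "debt":        # entries flagged as caused by bugs or bad code
            month = row["date"][:7]     # "YYYY-MM"
            monthly_cost[month] += int(row["minutes"]) / 60 * HOURLY_WAGE

for month, cost in sorted(monthly_cost.items()):
    print(f"{month}: ${cost:,.0f} lost to technical-debt friction")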

It might be easier to estimate the amount it has cost you in the past. Once you've done that, you should be able to come up with an estimate for the future with ranges and logic even your bosses can understand.
That being said, I don't have a lot of experience with this kind of thing, simply because I've never yet seen a manager willing to go this far in fixing up code. It has always just been something we fix up when we have to modify bad code, so refactoring is effectively a hidden cost on all modifications and bug fixes.

Related

Sprint estimation handling after QA increase

Long story short: we had a team of 3 devs and 1 QA working at a stable rhythm of 50 story points per two-week sprint. We discussed with the PO increasing the team by one dev and one QA. The QA has been added to the team, but the developer will not be added after all, for various reasons.
Now, of course, the PO is asking whether we can increase the number of story points, given that the team has grown by one QA. This is an odd situation: most of the tasks we estimate require development work, and since development capacity is unchanged, we cannot estimate more. But from his side, he probably also cannot 'accept' that the team has grown while the estimates stay the same. So what are the common solutions to this?
The way I see it, a QA can handle some non-dev tickets such as documentation, research, etc. So one way to satisfy both parties would be, for the next sprints, to take on one or two additional tickets (besides the 30 estimated ones) on the condition that those 1-2 tickets only require QA work. Any other suggestions?
I think I would maybe pull in at most one more small story for the first sprint they participate in. It might be better to not pull any in, and just let them pair with someone for a sprint or two.
Whether you pull in a new story or not, be sure to talk about this at the next retro. Did the new person speed the team up or slow the team down? Remember that the team velocity is a team velocity, so it's up to the team to decide if they are ready to pull in more stories in the next sprint.
I recommend you ramp up slowly after that first sprint. If you find room at the end of the sprint, that's a signal that you might be able to pull in an additional story at the next sprint. You won't have to guess if you can pull in more stories, the statistics from the sprint will tell you.

Using OptaPlanner to create school time tables with some tricky constraints

I'm going to use OptaPlanner to lay out time tables for a school.
We're laying out the time tables for a full semester and every week could, if necessary, be slightly different.
There are some tricky constraints to take into account:
1. Weekly schedules
The lectures in one subject should be spread out somewhat evenly over the semester.
We can't for example put 20 math lectures the first week and "be done" with math for this semester.
In fact, it's nice to have some weekly predictability:
"Science year 2 has biology on Tuesday mornings"
This constraint must not be carved in stone, however. Some weeks have to include work experience sessions, PE excursions, etc., in which case they must deviate from the other weeks.
Problem
If I create a constraint that, say, gives -1soft for not scheduling a subject at the same time as in the previous week, then OptaPlanner will waste a lot of time before it "accidentally" finds a good placement for a lecture, and even if it manages to converge so that each subject is scheduled at the same time every week, it will never manage to move an entire series of lectures by moving them one by one. (That local optimum will never be escaped.)
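To make the penalty concrete, here is the scoring idea in plain Python rather than in OptaPlanner's scoring API (the schedule representation is just for illustration):

# Plain sketch of the "same slot as last week" soft penalty described above.
# schedule maps (week, subject) -> timeslot; all identifiers are made up.
def weekly_predictability_penalty(schedule, weeks, subjects):
    penalty = 0
    for subject in subjects:
        for week in weeks[1:]:
            this_week = schedule.get((week, subject))
            last_week = schedule.get((week - 1, subject))
            if this_week is not None and last_week is not None and this_week != last_week:
                penalty -= 1        # -1soft per lecture that moved relative to the previous week
    return penalty

# Moving one lecture out of an aligned series makes this score worse, which is exactly
# why a whole series of lectures never escapes the local optimum one move at a time.
example = {(1, "biology"): "Tue-09", (2, "biology"): "Tue-09", (3, "biology"): "Wed-10"}
print(weekly_predictability_penalty(example, weeks=[1, 2, 3], subjects=["biology"]))   # -1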
2. Cross student group subjects
There's a large correlation between student groups and courses; for example, all students in Science year 2 mostly take the same courses: Chemistry for Science year 2, Biology for Science year 2, ...
The exception is language courses.
Each student can choose to study French, German or Spanish. So Spanish for year 2 is studied by a cross-section of Science year 2 students, Social Studies year 2 students, etc.
From experience with previous (manual) scheduling, the optimal solution is almost guaranteed to schedule all language classes in the same time slots. (If French is scheduled at 9 on Thursdays, then German and Spanish can be scheduled "for free" at 9 on Thursdays.)
Problem
There are many time slots in one semester, and the chances that OptaPlanner will discover a solution where all language lectures are scheduled at the same time by randomly moving individual lectures is small.
Also, similarly to problem 1: if OptaPlanner does manage to schedule French, German and Spanish at the same time, these "blocks" will never be moved elsewhere, since they are individual lectures and the chance that all of them will "randomly" move to the same new slot is tiny, even with a large Tabu history length and so on.
My thoughts so far
As for problem 1 ("Weekly predictability") I'm thinking of doing the following:
In the construction phase for the full-semester schedule I create a reduced version of the problem that schedules (a reduced set of) the lectures into a single "template week". Let's call it a "single-week pre-scheduling". This template week is then repeated in the construction of the initial solution for the full semester, which is the "real" planning entity.
The local search steps will then only focus on inserting PE excursions etc, and adjusting the schedule for the affected weeks.
As for problem 2, I'm thinking that the solution to problem 1 might solve this as well. In a one-week schedule, it seems reasonable to assume that OptaPlanner will realize that language classes should be scheduled at the same time.
Regarding the local optimum established by the single-week pre-scheduling ("Biology is scheduled on Tuesday mornings"), I imagine I could create a custom move operation that "bundles" these lectures into a single move. I have no idea how simple that is. I would really like to keep the code as simple as possible.
Questions
Are my thoughts reasonable? Is there a more clever way to approach these problems? If I have to create custom moves anyways, perhaps I don't need to construct a template-week?
Is there a way to assign hints or weights to moves? If so, I could perhaps generate moves with slightly larger weight that adjusts scheduling to adhere to predictable weeks and language scheduled in the same time slots.
A question well asked!
With regard to your first problem, I suggest you take a look at OptaWeb Employee Rostering and its concept of rotations. A rotation is "how things generally are", and Planner then has the freedom to diverge from the rotation at a penalty. Once you understand the concept of the rotation from the UI, take a look at the planning entity Shift and how the rotation is implemented with the employee and rotationEmployee variables. Note that only employee is an actual @PlanningVariable; rotationEmployee is fixed.
That means you have to define your rotations manually, effectively doing part of the solver's work yourself. However, since this is presumably only done once a semester, maybe the solution is to have a simpler solver generate a reasonable general rotation first, and then have a second solver take that and figure out the specific adjustments needed?
With regard to your second problem, rotations could help there too, but I'm also thinking of some move filtering and custom moves to help OptaPlanner either move all language classes together, or none. Writing efficient custom moves is not easy, and filtering stock moves is cumbersome, so I would only do it once the potential of the other options is exhausted. If you end up going down that route, look for MoveIteratorFactory.
My answer is a little vague, since we haven't gone into the specifics of the domain model, but for the purpose of designing the overall solution it hopefully gives enough clues.

Get Annual Financial Data for a Stock for many years in R

Suppose I want to regress in R Gross Profit on Total Revenue. I need data for this, and the more, the better.
There is a library on CRAN that I find very useful: quantmod, which does what I need.
library(quantmod)
getFinancials(Symbol = "AMD", src = "google")   # auto-assigns the result to AMD.f
# to get the row names of the matrix: rownames(AMD.f$IS$A)
Total.Revenue <- AMD.f$IS$A["Revenue", ]
Gross.Profit  <- AMD.f$IS$A["Gross Profit", ]
# finally:
reg1 <- lm(Gross.Profit ~ Total.Revenue)
The biggest issue that I have is that this library gets me data only for 4 years (4 observations, and who runs a regression with only 4 observations???). Is there any other way (maybe other libraries) that would get data for MORE than 4 years?
I agree that this is not an R programming question, but I'm going to make a few comments anyway before this question is (likely) closed.
It boils down to this: getting reliable fundamental data across sectors and markets is difficult enough even if you have money to spend. If you are looking at the US then there are a number of options, but all the major (read 'relatively reliable') providers require thousands of dollars per month - FactSet, Bloomberg, Datastream and so on. For what it's worth, for working with fundamental data I prefer and use FactSet.
Generally speaking, because the Excel tools offered by each provider are more mature, I have found it easier to populate spreadsheets with the data and then read the data into R. Then again, I typically deal with the fundamentals of a few dozen companies at most, because once you move out of the domain of your "known" companies the time it takes to check anomalies increases exponentially.
There are numerous potential "gotchas". The most obvious is that definitions vary from sector to sector. "Sales" for an industrial company is very different from "sales" for a bank, for example. Another problem is changes in definitions. Pretty much every year some accounting regulation or other changes and breaks your data series. Last year minorities were reported here, but this year this item is moved to another position in the P&L and so on.
Another problem is companies themselves changing. How does one deal with mergers, acquisitions and spin-offs, for example? This sort of thing can make measuring organic sales growth next to impossible. Yet another point to bear in mind is that if you're dealing with operating or net profit, you have to consider exceptionals and whether to adjust for them.
Dealing with companies outside the US adds a whole bunch of further problems. Of course, the major data providers try to standardise globally (FactSet Fundamentals for example). This just adds another layer of abstraction and typically it is hard to check to see how the data has been manipulated.
In short, getting the data is onerous and I know of no reliable free sources. Unless you're dealing with the simplest items for a very homogenous group of companies, this is a can of worms even if you do have the data.

Given a collection of consumers competing for a limited resource, allocate that resource to maximize its applicability

Sorry the question title isn't very clear, this is a challenging question to ask without providing a more concrete example. Consider the following scenario:
I have a number of friends whose birthdays are coming up on dates (d1..dn), and I've come up with gifts I'd like to purchase for them, with costs (c1..cn). Unfortunately, I only have a fixed amount of money (m) that I can save per day towards purchasing these gifts. The question I'd like to ask is:
What is the ideal distribution of savings per gift (mi, where the sum of mi over 1..n equals m) that minimizes the aggregate deviance between my friends' birthdays and the dates on which I'll have saved enough money to purchase each gift?
What I'm looking for is either a solution to this problem, or a mapping to a solved problem that I can utilize to deterministically answer this question. Thanks for pondering it, and let me know if I can provide any additional clarification!
I think you've stated a form of the knapsack problem with some additional complications; the knapsack problem is NP-complete (p. 247, Garey and Johnson). The basic knapsack problem is this: you have a number of objects, each with a volume and a value, and you want to fill a knapsack of fixed volume with objects so as to maximize the total value without exceeding the knapsack's capacity.
Given that you have stages (days) and resources (money) and the resources change by day while you decide what purchases to make, would lead me to a dynamic programming solution technique rather than a straight optimization model.
Could you clarify in comments "minimizing the deviance"? I'm not sure I understand that part.
BTW, mathoverflow.com is probably not helpful for this. If you look at algorithm questions, 50 on stackoverflow and 50 on mathoverflow, you'll find the questions (and answers) on stackoverflow have a lot more in common with the problem you are considering. There is a new site called OR Exchange, but there's not a lot of traffic there yet.
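If it helps to pin down the objective, here is a brute-force sketch of one reading of the problem, assuming "deviance" means the number of days a gift becomes affordable after the birthday and that the daily amount is split into fixed shares. Everything here, including the example figures and the 50-cent granularity, is an assumption:

# Brute-force sketch for two gifts: split the daily savings m into fixed shares and
# minimize the total number of days each gift becomes affordable after its birthday.
m = 10.0                             # dollars saved per day in total (assumed)
gifts = [(30, 120.0), (45, 200.0)]   # (birthday as a day number, gift cost) - example data

def total_deviance(split):
    dev = 0.0
    for (birthday, cost), daily in zip(gifts, split):
        if daily <= 0:
            return float("inf")      # a gift that gets no share is never affordable
        day_affordable = cost / daily
        dev += max(0.0, day_affordable - birthday)
    return dev

step = 0.5                           # try splits in 50-cent increments
best = None
for i in range(1, int(m / step)):
    split = (i * step, m - i * step) # two gifts, so the second share is determined
    d = total_deviance(split)
    if best is None or d < best[0]:
        best = (d, split)

print(f"Best split {best[1]} is late by a total of {best[0]:.1f} days")

For more gifts the search space explodes, which is where a dynamic programming formulation over days and remaining money, as suggested above, becomes attractive.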

asp.net web site developer pricing

I have been approached to build some websites for a few small businesses. They want a basic, out-of-the-box, database-driven website with some standard stuff (users, authentication, a few dynamic pages, etc.). I am going to use ASP.NET MVC for this.
They have asked me how much I charge for this. My problem is that I have no frame of reference here. Should I charge a flat fee for the project or an hourly rate? Where do I start in determining correct pricing for a website project?
Charge an hourly fee that is about 3x the hourly rate you would command in a full-time job. The 3x multiplier basically evens things out for the benefits, etc. that you won't get as a 1099 contractor.
Whatever you do, no matter how "standard" it sounds, do not charge a flat fee. Under that arrangement they have no incentive to curb feature creep. Even if you agree on a really tight spec up front, it is a recipe for disaster because it forces you to renegotiate every time they want something more. Under an hourly arrangement, feature creep works to your advantage.
Also, don't discount the hourly rate if you are a novice. Just don't bill unproductive hours. It is much easier to ease into billing more hours later than to renegotiate the price per hour.
Charge per hour.
-- edit
So attempt to 'quote' it by estimating the number of hours. Make sure your estimate is conservative.
A nice approach is to consider, in your head, a 'min', a 'max' and a 'standard' time for each piece of work, then use those to estimate the real time it will take you.
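One way to collapse those three numbers into a single figure is the usual three-point weighting; treating the 'standard' case as the most likely one, and the 4x weighting itself, are my assumptions rather than anything prescribed above:

# Three-point estimate: weight the most-likely case more heavily than the extremes.
def three_point_estimate(minimum, standard, maximum):
    return (minimum + 4 * standard + maximum) / 6

hours = three_point_estimate(minimum=20, standard=35, maximum=60)   # example figures
print(f"Quote roughly {hours:.0f} hours, and stay conservative on top of that")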
If you know that they know what they want and won't change the specs on you, go for a lump sum. That way you can work quickly.
If they are prone to change their minds and don't know what they want, go for an hourly fee. That way you won't be stuck working on their project for months without additional pay when they can't decide on exactly what they want.
I admit that I don't know much about this issue. However, I would still like to warn against the whole charge-per-hour mentality. While this approach protects the developer, it doesn't sit well with the business owner:
Charge-per-hour, to the business owner, is a liability, whereas a fixed price is just a cost. That's one.
The second thing is: if you are charging per hour, how are you going to justify your "research time"? Are you going to charge for that as well? Business owners don't like to pay for research time. Or you can stick to your old tricks, do something that has been reinvented N times, and charge for the time you spend, but that would seem unethical to some.
I have billed both by the hour and by the project. It's been my experience that customers are happier with project based billing instead of hourly billing.
With that in mind, I always pad the project cost by an amount I feel will cover the times when the client changes their mind. Further, I keep the project plan pretty simple. For example, I don't write 4 pages on how the login screen will work; instead, it's a single bullet point: "Login Page". This allows both them and me a little flexibility.
Because I keep things simple AND I allow time for flexibility AND the clients know how much it's going to cost up front, my clients are happier and I can keep better track of my income. Also, I keep in pretty close contact with them. As long as you can keep the relationship good, you'll have a long-term client.
Of course, it takes a bit of self discipline in combination with experience to know how long things take to build. Along these lines I never experiment on a client's dime. When I write the proposal, I already know what I'm going to use to get the job done and I've used those tools before. Because of this I can say with confidence that a login page will take a certain amount of time to put together.
Next, don't bite off more than you can chew. If it's a big project, break it up into smaller deliverables with their own pay schedule. That way the client (or you) can decide to walk away at any point. For example, if you think the project will take 3 months, break it up into 3 pieces. Incidentally, this helps with cash flow.
Finally, don't discount your time when getting started. That scares people.
I have a flat rate I charge for sites and I outline exactly what they will get; anything beyond that outline gets charged at an hourly rate. The hard part is that if you're getting into a project where you're not sure how long it will take, you might want to break down the various pieces and then add at least 10 hours to that estimate. You don't want to sell yourself short, but you also don't want to overcharge the customer. Be clear up front that once the site is delivered, all changes are billed per hour or under a maintenance fee structure.
Good luck.
