Rescue Project: How to Help and Approach
Some IT projects fail, and mistakes happen. A KPMG project management survey showed that more than two-thirds of organizations suffered at least one project failure in the previous year. So today, let's focus on less severe cases, which can still be saved!
When Do You Need to Look For Help?
Having completed many software projects and helped various businesses bring their ideas to life with code, we've learned that problems are usually caused by several coexisting factors. For example, we often took over a project from a previous software development team. Some of these were classic project rescue cases, where we stepped in to clean up and make things work.
How do you usually know that your software project is in trouble? You may be encountering some of the following signs:
- There are stability or performance issues.
- There is no clear documentation.
- You have problems communicating with the project team members.
- People are leaving the project or the software company.
- The people remaining on the project lack the required skills.
- Project management is poor, or the project is an overall mess.
- There are issues with delivering new tasks and business value on time, leading to missed deadlines.
How do companies respond to such situations? Sometimes they add new people to the team. In other cases, they look for external help in the form of consultancy. Finally, in edge cases, they bring in a new team or switch software vendors.
How to Help With the Project?
Analyze the Situation
The first step when taking over a project or joining as a consultant is understanding the situation. Talk to different stakeholders to understand the system's role and how it works. Knowledge transfer from the previous team is always valuable, but not always possible. If you can ask questions of people who are leaving the team or have already left, take full advantage of that opportunity. Analyze the documentation, wikis, READMEs, and, of course, the code. Ask different parties about the system's biggest current issues to get the full picture.
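When reading the code of an unfamiliar system, one common way to decide where to look first (a technique the original text does not prescribe) is to find the files that change most often. The sketch below assumes the repository is cloned locally and relies on `git log --name-only --pretty=format:`, which prints the files touched by each commit; the function names `parse_git_log` and `change_hotspots` are hypothetical.

```python
import subprocess
from collections import Counter


def parse_git_log(output: str) -> Counter:
    """Count how many commits touched each file.

    Expects the output of `git log --name-only --pretty=format:`,
    where each commit contributes a block of paths separated by blank lines.
    """
    counts: Counter = Counter()
    for line in output.splitlines():
        path = line.strip()
        if path:  # skip the blank separator lines between commits
            counts[path] += 1
    return counts


def change_hotspots(repo_dir: str, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` most frequently changed files in a local git repository."""
    output = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_git_log(output).most_common(top)
```

Running `change_hotspots(".")` inside a clone lists candidate hotspots; frequently changed files that also show up in the stakeholders' list of current issues are usually worth reading first.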
This is the moment when several unpleasant things may come to light, e.g., the reasons why the previous development team decided to leave the project. There may be many: a legacy stack, poor project quality, issues with project management or the team lead, or a lack of the necessary skills. Knowing them helps you understand how the project ended up in its current state. It is rare to find that a departing vendor left behind a perfect project.
Short-Term Actions
The first and most important thing is to learn how to build and deploy the system. This is essential in case any failure occurs. You should also get access to the logs and metrics so that you can analyze issues as they appear.
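Once you have access to the logs, even a crude summary can show whether errors are rare or constant, and when they cluster. The sketch below assumes a hypothetical plain-text format of `<date> <time> <LEVEL> <message>`; real systems will differ, so treat the regular expression as a placeholder to adapt.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical log format: "2024-01-15 08:30:00 ERROR Connection refused"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2} (?P<level>[A-Z]+) "
)


def errors_per_hour(lines: Iterable[str], level: str = "ERROR") -> Counter:
    """Count log lines of the given level, grouped by (date, hour).

    Lines that do not match the assumed format are silently skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match["level"] == level:
            counts[(match["date"], match["hour"])] += 1
    return counts
```

Because iterating over an open file yields its lines, a log file can be passed directly, e.g. `errors_per_hour(open(path)).most_common(5)` to see the worst hours.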
The second step is to look for low-hanging fruit: discuss whether there are any big issues with the system that can be resolved with a small time investment. These won't be the best, cleanest, or most efficient solutions, but they may relieve the customer's biggest pain points. Examples:
- In one project we took over, the system processing incoming data would suddenly "hang" a few days after deployment. There were no tests, no errors were visible in the logs, and it was difficult to quickly identify and fix the root cause. The simplest workaround was to schedule daily restarts before the start of the customers' business day. This allowed the business to keep operating and gave us time to fix the problem properly.
- Another project had huge performance issues that made the system unusable. A short debugging session showed that a bug in a single library was causing threads to hang. A simple dependency update made the system faster. It was still not enough in the long term, but in the short term it made a huge difference.
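The daily-restart workaround from the first example is usually configured in a scheduler such as cron or an orchestrator rather than written by hand. To make the timing logic explicit, the sketch below computes when the next restart should happen so that it always lands before the business day starts; the 05:00 UTC restart time and the `data-processor.service` unit name are hypothetical.

```python
import subprocess
import time
from datetime import datetime, time as clock_time, timedelta, timezone

# Hypothetical values: restart at 05:00 UTC, before customers start work.
RESTART_AT = clock_time(hour=5, minute=0)
SERVICE = "data-processor.service"  # hypothetical systemd unit name


def next_restart(now: datetime, restart_at: clock_time = RESTART_AT) -> datetime:
    """Return the next restart moment strictly after `now` (same tzinfo as `now`)."""
    candidate = now.replace(
        hour=restart_at.hour, minute=restart_at.minute, second=0, microsecond=0
    )
    if candidate <= now:  # today's slot has passed, so use tomorrow's
        candidate += timedelta(days=1)
    return candidate


def restart_forever() -> None:
    """Sleep until the next restart window, restart the service, repeat."""
    while True:
        now = datetime.now(timezone.utc)
        time.sleep((next_restart(now) - now).total_seconds())
        # Assumes systemd manages the service and this process may restart it.
        subprocess.run(["systemctl", "restart", SERVICE], check=True)
```

In practice, a single crontab entry with the schedule `0 5 * * *` running the same `systemctl restart` command is simpler and more robust than a long-running loop; the loop only serves to show the timing rule.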
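The second example does not say which tools the debugging session used. As one illustration of how hanging threads can be located, the Python sketch below returns the current stack of every live thread (on the JVM the equivalent is a thread dump, for example with `jstack`). It relies on the CPython-specific `sys._current_frames()`; the function name `dump_thread_stacks` is hypothetical.

```python
import sys
import threading
import traceback


def dump_thread_stacks() -> str:
    """Return a text dump of the current stack of every live thread."""
    frames = sys._current_frames()  # CPython-specific: thread id -> topmost frame
    sections = []
    for thread in threading.enumerate():
        frame = frames.get(thread.ident)
        if frame is None:  # thread finished between the two calls
            continue
        stack = "".join(traceback.format_stack(frame))
        sections.append(f"--- {thread.name} (daemon={thread.daemon}) ---\n{stack}")
    return "\n".join(sections)
```

Taking two or three dumps a few seconds apart is the useful part: threads that sit in the same blocking call in every dump, especially inside a third-party library, are the prime suspects.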
The third step is to do small cleanups in how the work is organized. Check for stale, unresolved branches or pull requests. Review how issues are organized; if Scrum or Kanban was used, it may be necessary to go through the backlog and remove items that are outdated and obscure the project's status. Verify that the latest documentation (if any exists) is easy to find; there may be several versions in different places. Write down what you discovered in the previous steps and during knowledge transfer. The goal is to make work easier and more accessible for new team members who may join later.
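Stale branches can be listed directly from git. The sketch below assumes remote branches have been fetched into a local clone and uses `git for-each-ref` with a Unix-timestamp committer date; the 90-day threshold and the function names are hypothetical choices.

```python
import subprocess

STALE_AFTER_DAYS = 90  # hypothetical threshold


def parse_refs(output: str) -> list[tuple[str, int]]:
    """Parse lines of '<branch> <unix timestamp>' into (branch, timestamp) pairs."""
    refs = []
    for line in output.splitlines():
        if not line.strip():
            continue
        name, timestamp = line.rsplit(" ", 1)
        refs.append((name, int(timestamp)))
    return refs


def stale_branches(
    refs: list[tuple[str, int]], now: float, days: int = STALE_AFTER_DAYS
) -> list[str]:
    """Return branches whose last commit is older than `days` days, oldest first."""
    cutoff = now - days * 24 * 60 * 60
    old = [(ts, name) for name, ts in refs if ts < cutoff]
    return [name for ts, name in sorted(old)]


def list_remote_refs(repo_dir: str = ".") -> list[tuple[str, int]]:
    """Read remote branches and their last commit dates from a local clone."""
    output = subprocess.run(
        ["git", "for-each-ref",
         "--format=%(refname:short) %(committerdate:unix)", "refs/remotes"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return parse_refs(output)
```

For example, `stale_branches(list_remote_refs(), time.time())` gives a list to go through with the customer before deleting or merging anything.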
Long-Term Plans
A long-term roadmap requires an in-depth discussion with the customer. First, you need to understand what drives the business: is it more important to invest in stability and scalability to prepare for new customers, or are there missing features that would be revolutionary for the product? Unfortunately, some trade-offs will have to be made.
Step-By-Step Improvements
It may be possible to take over the project and improve it step by step, developing new features and improving the old codebase in parallel. This is the optimal scenario, though not the easiest one. You tackle problems one by one: write new tests, docs, and code, deploy, and move on to the next one. This approach may include splitting the codebase into separate microservices and changing the overall architecture and the methods of inter-service communication.
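One widely used way to replace a system piece by piece is the strangler fig pattern: a thin routing layer sends requests for already migrated functionality to the new service and everything else to the legacy system. The original text does not name a specific pattern; the sketch below is a minimal, framework-free illustration in which the backend URLs and path prefixes are hypothetical.

```python
from dataclasses import dataclass, field

LEGACY_BACKEND = "http://legacy.internal"  # hypothetical
NEW_BACKEND = "http://reports.internal"  # hypothetical


def _under(path: str, prefix: str) -> bool:
    """True if `path` equals `prefix` or lies below it ("/a" covers "/a/b", not "/ab")."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


@dataclass
class StranglerRouter:
    """Route request paths to a new backend once their prefix has been migrated."""

    legacy: str
    migrated: dict[str, str] = field(default_factory=dict)

    def migrate(self, prefix: str, backend: str) -> None:
        """Mark every path under `prefix` as served by `backend`."""
        self.migrated[prefix] = backend

    def backend_for(self, path: str) -> str:
        # The longest matching prefix wins, so exceptions can be carved out.
        matches = [p for p in self.migrated if _under(path, p)]
        if not matches:
            return self.legacy
        return self.migrated[max(matches, key=len)]
```

In production this role is usually played by a reverse proxy or API gateway rather than application code; the point is that each migrated area becomes a one-line routing change that can be rolled back just as easily.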
Example:
In one of the projects mentioned earlier, we agreed to continue operating the system after the initial quick fixes, which allowed us to develop new features. At the same time, however, we started reworking the computation-heavy part of the system, which was a bottleneck. In the end, it became a separate microservice.
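Extracting a bottleneck into its own service is easier when callers depend on an interface rather than on a concrete implementation, so the in-process code and the new service can be swapped behind a flag (an approach often called branch by abstraction; the original text does not describe how the extraction was done). In the sketch below, `HeavyComputation`, both implementations, the placeholder algorithm, and the `transport` callable standing in for an HTTP client are all hypothetical.

```python
from typing import Callable, Protocol


class HeavyComputation(Protocol):
    """The interface callers depend on, regardless of where the work runs."""

    def compute(self, values: list[float]) -> float: ...


class InProcessComputation:
    """The original logic, still running inside the existing application."""

    def compute(self, values: list[float]) -> float:
        return sum(v * v for v in values)  # placeholder for the real algorithm


class RemoteComputation:
    """Delegates to the extracted microservice through an injected transport."""

    def __init__(self, transport: Callable[[dict], dict]) -> None:
        # Hypothetical: e.g. a function that POSTs JSON and returns the decoded reply.
        self._transport = transport

    def compute(self, values: list[float]) -> float:
        reply = self._transport({"values": values})
        return float(reply["result"])


def make_computation(
    use_service: bool, transport: Callable[[dict], dict]
) -> HeavyComputation:
    """Pick the implementation with a flag, so traffic can be switched gradually."""
    return RemoteComputation(transport) if use_service else InProcessComputation()
```

Because both implementations satisfy the same interface, the flag can be flipped per environment or per customer, and the old code path stays available as a fallback until the service has proven itself.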
Abandon and Rewrite From Scratch
There are cases where projects are in a disastrous state: no tests, no solid docs, poor code quality, legacy technologies, an overall mess, and mistakes made at the architectural level. Making any change in such an environment is very risky, and adding tests is also challenging when the code was written without any thought for testability. This situation is an extreme edge case and causes a lot of issues. You may decide to leave the old system in maintenance mode, fixing only the most important bugs, and start writing a new system from scratch. Unfortunately, building a new system takes time, potentially years, depending on product complexity. This is risky from a business perspective, because new features will reach customers later. It is also risky from a development perspective, because you have to perform a huge migration from the old system to the new one and verify that the new one behaves the same from a feature point of view.
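Verifying that the new system behaves the same can be partly automated by running both implementations on the same inputs and reporting every difference, a technique sometimes called differential or parity testing. The sketch below assumes both systems can be invoked as functions on recorded inputs; `legacy_discount` and `new_discount` are hypothetical stand-ins with a deliberate boundary difference.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class Mismatch:
    case: Any
    legacy: Any
    new: Any


def compare_implementations(
    legacy: Callable[[Any], Any],
    new: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> list[Mismatch]:
    """Run both implementations on every input and collect the differences."""
    mismatches = []
    for case in inputs:
        old_result, new_result = legacy(case), new(case)
        if old_result != new_result:
            mismatches.append(Mismatch(case, old_result, new_result))
    return mismatches


# Hypothetical stand-ins for the old and the rewritten behavior.
def legacy_discount(total: int) -> int:
    return total // 10 if total >= 100 else 0


def new_discount(total: int) -> int:
    return total // 10 if total > 100 else 0  # subtle boundary change
```

Here, `compare_implementations(legacy_discount, new_discount, [50, 99, 100, 101, 250])` reports a single mismatch at 100. Inputs are best sampled from real production data, since that is where the undocumented behavior of the old system tends to show up.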
Example:
We encountered a production system in which a major part was covered by only three tests. It had issues at the code level (it was clear that much of it had been written by people still learning the programming language), the structural level (problems running the existing tests locally and writing additional ones), and the architectural level (high infrastructure costs combined with performance issues). Together with the customer, we decided to abandon the old code and rewrite the system using different technologies that were a better fit for the problem, would be easier for the customer to maintain later, and would reduce the monthly running costs.
Summary
Rescuing projects is not trivial. It may take time to introduce the required improvements and enable further business feature development. It's good to start such an initiative with a skilled team that knows the technologies involved very well, so they can spot potential problems faster. It's rarely a lost cause; sometimes it just takes time and patience to see a positive outcome.
Published at DZone with permission of Michał Matłoka. See the original article here.