Project Comprehension: Understanding Java Projects Efficiently
If you're a new dev on a Java project team, even if you're experienced, it takes time to become familiar with the code. This method should improve project comprehension.
Let's start with a bit of theory.
A modern Java application is a complex system that frequently operates as a node in a larger enterprise network. By the time a new developer joins the team, the project will likely have been in development for a couple of years and contain code contributions from dozens of developers, most of whom left the project long ago. Documentation is not always up-to-date and accurate, and only a few team members may have a comprehensive picture of the project (whom you'll have to catch for short Q&As in between meetings, code reviews, and emergency deployments).
For many developers, this means spending the first several months mostly on becoming familiar with the project, i.e., on project comprehension. During this time (which, depending on experience, may take up to six months) the developer is not expected to deliver a lot of value. Even after this initial period, the developer's comprehension of the project is far from complete.
If the described problem sounds familiar, it should. Project comprehension (or, more generally, program comprehension) is one of the most overlooked areas in software development. It's rarely a topic for water cooler conversations. However, it's a big enough deal to constitute a separate branch of scientific research dating back to the 1970s (e.g. Using a behavioral theory of program comprehension). There are even dedicated international conferences on the subject (e.g. the IEEE International Conference on Program Comprehension), with its facets spanning theoretical math, computer science, psychology and brain physiology (yes, they even used MRI to study our brains: Measuring Neural Efficiency of Program Comprehension).
Why is project comprehension so difficult?
| | Writing new code | Reading existing code |
| --- | --- | --- |
| Activity | Writing code from your mind | Reading and navigating someone else's code |
| Intentions | Intentions are clear | Intentions are not clear and have to be deduced |
| Abstraction level | Working on one abstraction level at a time | Jumping between various abstraction levels |
For product owners and management, the biggest concern should be the associated costs. Time is money. Every workday a developer spends figuring out how the thing works and why it was designed that way is an expense.
| Research | Comprehension effort |
| --- | --- |
| IBM (Corbi, 1989) | Over 50% of time |
| Bell Labs (Davison, 1992) | New project members: 60-80% of time, drops to 20% as one gains experience |
| National Research Council in Canada (Singer, 2006) | Over 25% of time either searching for or looking at code |
| Microsoft (Hallam, 2006) | Equal amount of time as design, test |
| Microsoft (LaToza, 2007) | Over 70% of time |
| Microsoft (Cherubini, 2007) | ~95%: significant part of job; >65%: at least once a day; >25%: multiple times a day |
If you imagine what these percentages mean in terms of software development budgets, it's easy to see that project comprehension is a very real problem in the industry.
From Theory to Practice
Moving from theory to practice, below I offer a structured list of questions (a template) that may help you build a mental model of an unfamiliar Java project.
In general, projects should be studied using a top-down approach, starting with the business aspect.
- Business aspect
  - General software category
    - What general category does this project belong to (e.g. end-user application, middleware, framework, persistence storage, development tool, security mechanism)?
  - Relation to other software
    - Is this project based on (or a basis for) existing software?
    - Does this project extend (or is extended by) existing software?
    - Does this project embed (or is embedded by) existing software?
  - Similarities and competition
    - Is there similar software?
    - If there is similar software, how is this software different?
    - Does similar software compete with this project (similarity does not always mean competition)?
    - If there is competition, what are this project's competitive advantages and disadvantages?
  - Main purpose
    - What is the main purpose of this project (described in 1-3 key points)?
  - Ecosystem
    - Is the project self-sufficient (operates in isolation) or networked (provides, consumes or processes data as a node in a network)?
    - If this project is networked:
      - What other software is in the network?
      - What is the purpose of each (described in 1-3 key points)?
      - What role does this project play in the network?
      - Which software communicates with this project directly and how?
  - Project glossary
    - What is the project's glossary (core terminology, aliases and abbreviations specific to the project, its business domain and network)?
- Technological aspect
  - Component level
    - Component structure
      - Is the project a single component (application) or a group of components (from the virtual machine's (application server's) perspective)?
      - If it's a group, how is it structured and why?
    - Architecture
      - Which commonly used term best describes the intended architecture of every component?
    - Persistence
      - Which components use persistence storage, what type and why?
    - Blocks of functionality
      - What major blocks of functionality does each component contain (e.g. retrieval, validation, filtering, processing, persisting, graph building, document generation, authentication)?
Deducing such information from documentation or the codebase can be very time-consuming and inaccurate, so it's a perfect subject for an introductory Q&A session, which has to be done as early as possible. Ideally, these questions should be answered by at least two team members with the most experience in the project. This ensures better accuracy of such information and also gives project old-timers the rare opportunity to identify and resolve differences in their understanding of the project's basic premises.
After building valuable context with information gained during the introductory Q&A session, it's time to dive into the purely technical side of the project. However, don't start analyzing source code or debugging just yet. Try approaching the matter from a higher level first. The questions above stopped at the component level. You can continue studying the project on your own, starting from the module level:
- Module level
  - Module structure
    - Is each component a single build module (e.g. a Maven or Gradle module) or a group of modules?
    - If it's a group, how is it structured and why?
    - Which modules, after a full build, represent runnable (deployable) artifacts (EAR files, WAR files, JAR files with a main method) and which end up as internal libraries for the first category?
  - Dependencies
    - What are the module's declared dependencies (e.g. other modules, external libraries), including version number and scope?
  - Build plugins
    - What plugins and scripts are run during a module build and why?
- Package level
  - What is the package structure of each module?
  - Which packages are shared by multiple modules and which exist only in one module?
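For a Maven build, several of the module-level questions above can be answered straight from each pom.xml. The following is a minimal sketch, not a replacement for Maven's own tooling: the class name `PomSummary` and its fields are hypothetical, it assumes Java 11+, and it only reads what is declared directly in the file (values inherited from a parent pom or supplied by properties are not resolved).

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: pulls the module-level facts out of one Maven pom.xml. */
class PomSummary {
    final String artifactId;
    final String packaging;          // "pom", "jar", "war", "ear", ...
    final List<String> modules;      // child modules listed by an aggregator pom
    final List<String> dependencies; // groupId:artifactId:version:scope

    private PomSummary(String artifactId, String packaging,
                       List<String> modules, List<String> dependencies) {
        this.artifactId = artifactId;
        this.packaging = packaging;
        this.modules = modules;
        this.dependencies = dependencies;
    }

    static PomSummary parse(String pomXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Element project = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(pomXml)))
                    .getDocumentElement();

            List<String> modules = new ArrayList<>();
            for (Element module : children(child(project, "modules"), "module")) {
                modules.add(module.getTextContent().trim());
            }
            List<String> dependencies = new ArrayList<>();
            for (Element dep : children(child(project, "dependencies"), "dependency")) {
                dependencies.add(text(dep, "groupId", "?") + ":" + text(dep, "artifactId", "?")
                        + ":" + text(dep, "version", "(managed)") // version may come from a parent pom
                        + ":" + text(dep, "scope", "compile"));   // Maven's default scope
            }
            // Maven treats a missing <packaging> element as "jar".
            return new PomSummary(text(project, "artifactId", "?"),
                    text(project, "packaging", "jar"), modules, dependencies);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read pom.xml content", e);
        }
    }

    /** Direct child elements with the given tag name (nested matches are ignored). */
    private static List<Element> children(Element parent, String name) {
        List<Element> result = new ArrayList<>();
        if (parent == null) return result;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) result.add((Element) n);
        }
        return result;
    }

    private static Element child(Element parent, String name) {
        List<Element> found = children(parent, name);
        return found.isEmpty() ? null : found.get(0);
    }

    private static String text(Element parent, String name, String fallback) {
        Element e = child(parent, name);
        return e == null ? fallback : e.getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        PomSummary s = parse(Files.readString(Path.of(args[0])));
        System.out.println(s.artifactId + " [" + s.packaging + "]");
        System.out.println("modules: " + s.modules);
        System.out.println("dependencies: " + s.dependencies);
    }
}
```

Running it against every pom.xml in the repository gives a quick first picture: modules with `pom` packaging are aggregators, `war` and `ear` modules are deployable, and `jar` modules need a closer look to tell runnable artifacts from internal libraries.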
At this point, you've reached the level of individual source files. Even here you can start by collecting valuable general information.
- Source code files
  - Which classes contain core (structural) annotations provided by EE, Spring and similar frameworks (e.g. @Stateless, @Service, @RequestMapping)?
  - Which classes use core classes provided by module dependencies (e.g. Hibernate's EntityManager)?
  - Which classes are named using conventional marker words (e.g. "repository", "service", "DTO", "generator", "converter")?
  - What kinds of enums are declared and where?
  - Are there classes with "main" methods (rare, especially in web applications, but very important if found)?
- Non-source code files
  - Which files are named using conventional marker words ("properties", "config")?
  - Which files have marker extensions (e.g. *.cfg, *.xml, *.csv, *.prop, *.json)?
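Your IDE's search can answer most of the source-file questions above, but a small script gives you a report you can keep next to your notes. Here is a rough sketch, assuming Java 11+ and UTF-8 sources. The class name `SourceMarkers` is hypothetical and the annotation list is just the three examples mentioned above. Because it relies on regular expressions rather than a real parser, it will also match text inside comments and string literals, and it misses unusual modifier orders such as `static public void main`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: finds structural annotations and main methods in Java sources. */
class SourceMarkers {
    // Only the annotations named in the article; extend the list for your own stack.
    private static final Pattern ANNOTATION =
            Pattern.compile("@(Stateless|Service|RequestMapping)\\b");
    // Matches "public static void main(String[] ..." and the varargs form.
    private static final Pattern MAIN_METHOD = Pattern.compile(
            "public\\s+static\\s+void\\s+main\\s*\\(\\s*(final\\s+)?String\\s*(\\[\\s*\\]|\\.\\.\\.)");

    /** Returns the distinct markers found in one source file, in order of first appearance. */
    static List<String> find(String javaSource) {
        List<String> found = new ArrayList<>();
        Matcher m = ANNOTATION.matcher(javaSource);
        while (m.find()) {
            String annotation = "@" + m.group(1);
            if (!found.contains(annotation)) found.add(annotation);
        }
        if (MAIN_METHOD.matcher(javaSource).find()) found.add("main");
        return found;
    }

    /** Scans every .java file under root; keys are paths relative to root. */
    static Map<String, List<String>> scan(Path root) throws IOException {
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(root)) {
            sources = walk.filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java"))
                    .collect(Collectors.toList());
        }
        Map<String, List<String>> report = new TreeMap<>();
        for (Path source : sources) {
            List<String> markers = find(Files.readString(source));
            if (!markers.isEmpty()) report.put(root.relativize(source).toString(), markers);
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        scan(Path.of(args[0])).forEach((file, markers) -> System.out.println(file + " " + markers));
    }
}
```

The same pattern extends to the other questions in the list: add `EntityManager` to look for classes that use core dependency classes, or filter file names for marker words like "Repository" and "Converter".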
Whether you are studying the project at the component level, module level, or class level, it's important that you understand the logic behind the names of the units. Sometimes, this can be an eye-opener.
It's really useful to study the organization of project files and folders. This organization might differ from what you see in your IDE. By default, some IDEs display the project with logical grouping and nesting of files and folders according to the IDE's notions of "project", "module", "dependency", "library" etc. If your IDE does that, make sure you understand the logic that it's applying.
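If you are unsure how much your IDE is rearranging things, one option is to print the layout exactly as it exists on disk and compare. Below is a small sketch, assuming Java 11+. The class name `DiskLayout` and the list of skipped directories are my own assumptions, so adjust them to your build tool.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: renders the on-disk folder layout down to a given depth. */
class DiskLayout {
    // Build output and tool metadata that would only add noise (an assumption, adjust as needed).
    private static final Set<String> SKIPPED =
            Set.of(".git", ".idea", "target", "build", "node_modules");

    static String render(Path root, int maxDepth) {
        StringBuilder out = new StringBuilder();
        append(root, 0, maxDepth, out);
        return out.toString();
    }

    private static void append(Path dir, int depth, int maxDepth, StringBuilder out) {
        if (depth >= maxDepth) return;
        List<Path> entries;
        try (Stream<Path> listing = Files.list(dir)) {
            // Sort by name so the output is stable across platforms.
            entries = listing
                    .sorted(Comparator.comparing((Path p) -> p.getFileName().toString()))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (Path entry : entries) {
            String name = entry.getFileName().toString();
            if (SKIPPED.contains(name)) continue;
            boolean isDirectory = Files.isDirectory(entry);
            out.append("  ".repeat(depth)).append(name).append(isDirectory ? "/" : "").append('\n');
            if (isDirectory) append(entry, depth + 1, maxDepth, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(render(Path.of(args[0]), 3));
    }
}
```

A depth of two or three is usually enough to see where modules, sources and configuration files really live.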
Finally, if your project uses persistence storage, study the general organization of the data in it, if reasonably possible (e.g. it's not hundreds or thousands of tables). For starters, just check which relational tables (or the equivalent structures in non-relational storage) exist and how they relate to one another. Many database GUI clients allow you to easily generate diagrams. It gets a bit more complex with NoSQL storage, which often promotes a schemaless design, so the only place you can see the data model is the code itself.
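If no GUI client is at hand, standard JDBC metadata can give you a similar overview of a relational schema. The following is a minimal sketch, assuming a JDBC driver for your database is on the classpath. The class name `SchemaOverview` is hypothetical, and the Graphviz DOT output is just one convenient way to turn the relations into a diagram. Tables without foreign keys do not appear in the output.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists foreign-key relations and renders them as a DOT graph. */
class SchemaOverview {
    /** fkTable.fkColumn references pkTable.pkColumn. */
    static final class ForeignKey {
        final String fkTable, fkColumn, pkTable, pkColumn;

        ForeignKey(String fkTable, String fkColumn, String pkTable, String pkColumn) {
            this.fkTable = fkTable;
            this.fkColumn = fkColumn;
            this.pkTable = pkTable;
            this.pkColumn = pkColumn;
        }
    }

    /** Reads all foreign keys of the tables in the given schema (null means any schema). */
    static List<ForeignKey> readForeignKeys(Connection connection, String schema) throws SQLException {
        DatabaseMetaData meta = connection.getMetaData();
        // Collect table names first so only one metadata result set is open at a time.
        List<String> tableNames = new ArrayList<>();
        try (ResultSet tables = meta.getTables(null, schema, "%", new String[] {"TABLE"})) {
            while (tables.next()) tableNames.add(tables.getString("TABLE_NAME"));
        }
        List<ForeignKey> keys = new ArrayList<>();
        for (String table : tableNames) {
            try (ResultSet imported = meta.getImportedKeys(null, schema, table)) {
                while (imported.next()) {
                    keys.add(new ForeignKey(
                            imported.getString("FKTABLE_NAME"), imported.getString("FKCOLUMN_NAME"),
                            imported.getString("PKTABLE_NAME"), imported.getString("PKCOLUMN_NAME")));
                }
            }
        }
        return keys;
    }

    /** One edge per foreign key, pointing from the referencing table to the referenced one. */
    static String toDot(List<ForeignKey> keys) {
        StringBuilder dot = new StringBuilder("digraph schema {\n");
        for (ForeignKey key : keys) {
            dot.append("  \"").append(key.fkTable).append("\" -> \"").append(key.pkTable)
               .append("\" [label=\"").append(key.fkColumn).append("\"];\n");
        }
        return dot.append("}\n").toString();
    }

    public static void main(String[] args) throws SQLException {
        // args[0] is a JDBC URL for your database; the matching driver must be available.
        try (Connection connection = DriverManager.getConnection(args[0])) {
            System.out.print(toDot(readForeignKeys(connection, null)));
        }
    }
}
```

Paste the output into any Graphviz viewer to get a first rough picture of which tables point at which.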
Now, if you were curious enough (and you should be on a new project), even after all the research described above, you should be able to come up with additional points to discuss during a follow-up Q&A session. If you're going to bother someone with questions, prepare and structure them thoroughly, ask them early and ask them in batches. Throw in your assumptions about the project (they may be wrong). Write down everything. The more details you collect, the better. You can always cross out irrelevant or unimportant items later, but if you don't write them down immediately you risk missing something valuable. Sum up and write down all the answers. Memorizing so much heterogeneous information is a bad idea. Don't rely on the initial feeling of complete clarity as it can be illusory.
Epilogue
Having read this article, you may ask why bother the busiest developers on the team with long Q&A sessions, and not once, but twice?
I have yet to see a more time-efficient (cost-efficient) way of transferring general knowledge about a project (context) than having someone who knows the project hands-on explain things in plain words while you are able to rephrase any question or clarify any response as you go.
The last part is very important, as is the opportunity to ask very basic (or seemingly dumb) questions. This is because people differ in how they understand things that are considered common knowledge, including what is obvious and what isn't. This is precisely why Stack Overflow, forums, webcasts, and the like are so popular in the professional community, even though there is plenty of documentation for any popular programming language, database, framework, library, etc.
We developers are always learning new things and skills. Even if you have 20 years of experience under your belt, you may feel uncomfortable on a project that uses a technology stack you are completely unfamiliar with. In this respect, project comprehension is going to be a recurring theme for all of us.
I hope this article will be helpful in efficiently building mental models of unfamiliar Java projects and proper context for working with the lowest abstraction levels of a project such as classes, methods, control flows, etc.