Learn How to Read Code

As a software developer, what do you spend most of your time doing? Really, besides going to meetings and playing ping-pong, how do you spend the majority of your work day? I would guess that you spend more time reading code than writing it. Maybe you’re trying to come up to speed on a new project, or learn how an open source project works. Perhaps you’re just trying to puzzle out what the heck you wrote last week. Whatever the reason, every developer should learn how to read code, not just write it.

Like many developers, I’ve written a lot of code in my career. I’ve also spent a lot of time reading other people’s code. More often than not, when I have a question about the inner workings of an open source library, I look at the source code, not the documentation. This may have made me less productive in the moment. Maybe I could have just read the docs, written my code, and moved on. But I believe that my years of reading and understanding others’ code made me a better developer. I’ve learned what makes code good, maintainable, and easy to understand. My deeper understanding of how certain libraries and frameworks are put together allows me to be more efficient and more confident that the code I write will do what I think it should. (I still write tests, though!)

There are three main reasons you should get good at reading and understanding code: to learn, to debug, and to understand.

Read to Learn

In schools across the US (perhaps more broadly) great emphasis is placed on teaching kids to read. If kids learn nothing else, learning to read gives them the ability to learn anything else they set their mind to.

Learning to read code is much the same. Before you can write code, you have to know what it looks like and understand how the various constructs behave. If I asked a preschooler to write a sentence, and that child had never even seen letters, let alone words or sentences, it would be an impossible task.

Beyond the mechanics of coding, reading code expands your software “vocabulary”. We are always encouraging children to read as often as possible because it builds their language muscle. Reading allows you to see new words in context. You see new turns of phrase, interesting juxtapositions of words, and elegantly expressed ideas. You then, sometimes unconsciously, incorporate those ideas into your own speech and writing.

Reading other people’s code has much the same effect. You see how problems are broken down into components: libraries, packages, classes and methods. You see the interaction between these classes, how they’re constructed, how they manage their data. You begin to tell the elements of a good design from those of a mediocre one. Design patterns come to life in existing, working software.

Read to Debug

Reading code for curiosity’s sake is fine, but the main reason I start digging into someone else’s code is to troubleshoot something. There have been countless times when the documentation wasn’t clear about what happens in my specific use case. Sometimes I get a stack trace that doesn’t include any code from my application, meaning that it never got far enough to call into my code, and I need to understand why.

When this happens, I’ll set some breakpoints in the library, fire up the debugger, and start reading, following the trail of execution to understand what is going on. Usually the problem is in my code and I just misunderstood a subtlety in how I should interact with the library. Sometimes I find a bug in the library and can submit a defect report and a patch. Either way, I always learn something about the library or framework that wasn’t stated in the documentation (at least, not in a way I understood). Sometimes I curse the developer who wrote the library for not including better logging. Sometimes that developer is a past version of me, when I didn’t know any better. (facepalm)

Read to Understand

How many of you have the opportunity to build software from a clean slate, the so-called “green field” development? Not many do. You may get a small project from time to time, but the majority of our work involves maintaining and improving existing systems. Maybe you just started a new job or you transferred to another team. Perhaps you have been asked to add some enhancements to a system you haven’t worked on before. Whatever the case may be, at some point (probably tomorrow) you will have to read code that you’re unfamiliar with, even if you’re the original author!

Before you can add enhancements or fix problems with existing code, you need to have some understanding of how it works. What are the major components? How do the pieces fit together? What is the logical flow of execution? What changes have been made since you last worked on this code? The answers to these questions are probably not in the corporate wiki. Keeping documentation up to date is notoriously difficult and rarely done well. Your best source of information is usually the code itself.

How to Read Code

If I’ve convinced you of the value of learning to read other people’s code, how does one go about doing that? Code is not like a book; you don’t just start at the beginning and read every class and every line of code. However, there are a few strategies that help:

  • Use the debugger. This is how I’ve learned the most from third-party code. Set a breakpoint somewhere in the library (your IDE downloaded the source for you, right?) along the execution path you’re having trouble with. Follow the trail as you step through the code in your debugger. Notice what classes and methods are involved. How does the code handle exceptions? What components are involved? What are their responsibilities? How does your input make its way through the system?
  • Scan first, then dig. Well-written code has well-named methods and variables. This means you can read a chunk of code and understand it at a high level without knowing the specifics of the underlying method calls. Once you understand the overall flow, you can start digging into the lower levels to see how those work.
  • Search for relevant text. When I’m digging through third-party code because something went wrong, I typically have an error message or a stack trace. Look for that text in the source code and use that as a starting point. Back up to the start of the method to see how it got there. Dig into the methods that were called just prior to the error to see what they were doing. Again, using the debugger can help.
  • Examine cross-references. Your development environment is very good at indexing and keeping track of the linkages between the various classes in your system. Use these tools to better understand the structure of the code. Look at the class hierarchy. What classes and interfaces does the class you’re looking at inherit from? What derives from it? Where are all the usages of this class or method? Are there a lot, meaning this is a central component? Or are there few, perhaps meaning this component is not used frequently?
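
To make the “scan first, then dig” idea concrete, here is a small Python sketch. The function names and the "name, quantity, unit_price" line format are invented for illustration; they don’t come from any real library. The point is that you can read order_total on its own and know roughly what happens before you look at any of the helpers it calls.

```python
def parse_order_line(line):
    """Turn a 'name, quantity, unit_price' line into a dictionary."""
    name, quantity, unit_price = line.split(",")
    return {
        "name": name.strip(),
        "quantity": int(quantity),
        "unit_price": float(unit_price),
    }


def subtotal(items):
    """Add up quantity * unit_price across all items."""
    return sum(item["quantity"] * item["unit_price"] for item in items)


def apply_discount(amount, rate):
    """Reduce amount by a fractional rate, e.g. 0.1 for ten percent."""
    return amount * (1 - rate)


def order_total(lines, discount_rate=0.0):
    # Scan this function first: parse, add up, discount, round.
    # Dig into the helpers above only when you need the details.
    items = [parse_order_line(line) for line in lines]
    return round(apply_discount(subtotal(items), discount_rate), 2)
```

If a call like order_total(["widget, 2, 3.50"], discount_rate=0.1) ever surprised you, the names alone would tell you which helper to dig into first.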

All of these ideas really just give you a starting point. In many ways, reading other people’s code is like putting a jigsaw puzzle together. Some people start with a distinctive portion of the image and work out. Others start with the edge pieces and work in. Neither one is wrong; it’s just a matter of finding a place to start, then working and analyzing from there.
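
One way to find that first puzzle piece is the “search for relevant text” strategy from the list above. Your IDE or grep will do this for you, but as a rough sketch of what the search amounts to, here it is in Python. The function name find_text_in_source and the default file suffixes are my own assumptions for the example, not part of any tool.

```python
from pathlib import Path


def find_text_in_source(root, needle, suffixes=(".py", ".java")):
    """Return (path, line_number, line) for each source line containing needle."""
    matches = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and files that are not source code.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # errors="replace" keeps the scan going if a file is not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                matches.append((str(path), line_number, line.strip()))
    return matches
```

Give it the directory where the library source lives and the text of the error message you saw. Each hit is a candidate starting point: open the file, back up to the start of the method, and read from there.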

Conclusion

Learning to read code is a skill that every developer should cultivate. You’ll learn to write better code, you’ll be better at troubleshooting, and you’ll have a better understanding of the systems you work on. I don’t believe that I would be the developer I am today if I had not put in the time reading and understanding other people’s code over all these years.

The next time you have a problem to solve, or you don’t quite understand how a particular bit of code is working, download the source, fire up your debugger, and start reading. Learn how to read code and you’ll be surprised what you can learn.

Discussion Question: What have you learned from reading other people’s code? What tips do you have for quickly becoming familiar with a new code base?