Stop fighting code duplication

From day one in Computer Science, you are taught to avoid code duplication. You are told why it is better to wrap a piece of code in a function. I’m sure you were also told many times that you should keep your code DRY (Don’t Repeat Yourself).

This is sound advice, and most of the time it is true that code duplication will harm your code. Have you ever had to fix a bug twice? I have, and that’s not fun for anyone.

However, the problem is that you are never taught what code duplication actually is or why you should be fighting it. Here is the thing: code duplication in itself doesn’t matter. What truly matters is knowledge duplication, meaning the same fact or decision being relied on in more than one place. That is what you should be on the lookout for, as it is most likely the programmer’s number one enemy. It will cripple your codebase faster than anything else I know.

Here is what knowledge duplication looks like:
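A minimal sketch in Python of the kind of line in question. The `User` class, its `user_id` attribute and the `can_edit_profile` function are hypothetical names made up for illustration:

```python
class User:
    """Hypothetical user record, invented for this example."""

    def __init__(self, user_id: int, name: str) -> None:
        self.user_id = user_id
        self.name = name


def can_edit_profile(current_user: User, profile_owner_id: int) -> bool:
    # Hypothetical caller code: allow the edit when the IDs match.
    if current_user.user_id == profile_owner_id:
        return True
    return False
```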

I have seen a similar line of code cost eight months’ worth of work (paid with your taxes, thank you very much). Have you spotted the duplication yet?

To perform this if() you need to know the type of ID. That knowledge belongs within the User class only. Extracting that information outside of the User class causes knowledge duplication, even though there is no code duplication. What if you wanted to add a new type of ID? Remember ISBN-10 vs ISBN-13?

What’s the solution? Easy, that’s what we have encapsulation for!
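One way this could look, sketched in Python under the assumption of a hypothetical `User` class. The `has_id` method and the choice to normalise IDs to strings are illustrative assumptions, not a prescribed design:

```python
from typing import Union


class User:
    """Hypothetical user record that keeps its ID format to itself."""

    def __init__(self, user_id: Union[int, str], name: str) -> None:
        # Assumption for this sketch: IDs are normalised to strings internally.
        # Callers never see this detail, so it can change later.
        self._user_id = str(user_id)
        self.name = name

    def has_id(self, candidate: Union[int, str]) -> bool:
        # The only place that knows how two IDs are compared.
        return self._user_id == str(candidate)


def can_edit_profile(current_user: User, profile_owner_id: Union[int, str]) -> bool:
    # The caller asks a question instead of inspecting the raw ID.
    return current_user.has_id(profile_owner_id)
```

With this shape, `can_edit_profile(User(42, "Ada"), "42")` and `can_edit_profile(User("42", "Ada"), 42)` both return `True`, and the calling code stays the same if the ID format changes.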

Now you make no assumption whatsoever about the type of ID. It could support both integers and strings and have some kind of mechanism to convert from one to the other. It could also check the ID in different ways depending on the context. You don’t know, and you don’t care, now that this knowledge is encapsulated in a single place in your codebase.

It is true that, in most cases, the main symptom of knowledge duplication is code duplication. But it is merely that, a symptom. It would be a logical fallacy, a converse error, to assume that all code duplication is caused by knowledge duplication. Code duplication may actually be desirable when it does not stem from knowledge duplication.

Imagine you have a chunk of 7 lines of code that is located in two different places in your codebase. What does that tell you?

Nothing. That’s the catch. It could be duplicated knowledge, in which case you know what to do. But think it through: what else could it be?

It could merely be lines of code that look identical at the moment but are destined to evolve over time to fulfil two very distinct needs. Encapsulating them together would lead you to run into the programmer’s second worst enemy: wrong abstractions. Yeah, programming is cruel that way; you can dodge a bullet by diving into a shark-infested pool.
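As a hedged illustration in Python, here are two hypothetical validators that happen to share a body today. The function names and the rules themselves are invented for this sketch:

```python
def is_valid_username(value: str) -> bool:
    # Rule owned by the accounts side; it might later allow
    # underscores or reserve certain names.
    return 3 <= len(value) <= 20 and value.isalnum()


def is_valid_coupon_code(value: str) -> bool:
    # Rule owned by the marketing side; it might later require
    # a checksum or a fixed prefix.
    return 3 <= len(value) <= 20 and value.isalnum()
```

The bodies are identical, but each one encodes a different decision owned by a different part of the business. Merging them into a single helper would tie two unrelated rules together.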

The solution to this is trickier, however. If you are sure these lines of code are there for two vastly different reasons, don’t try to abstract them! Wrong abstractions quickly lead to problems such as throw NotImplemented("you can't do this on this implementation!") or switch cases using instanceof.
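A small Python sketch of both symptoms, assuming a hypothetical `Report` hierarchy invented for illustration. Python's `NotImplementedError` and `isinstance` stand in for the `throw NotImplemented(...)` and `instanceof` mentioned above:

```python
class Report:
    """Hypothetical base class forced onto two unrelated kinds of report."""

    def render(self) -> str:
        raise NotImplementedError

    def send_by_email(self, address: str) -> str:
        raise NotImplementedError


class InvoiceReport(Report):
    def render(self) -> str:
        return "invoice"

    def send_by_email(self, address: str) -> str:
        return f"invoice sent to {address}"


class DashboardReport(Report):
    def render(self) -> str:
        return "dashboard"

    def send_by_email(self, address: str) -> str:
        # Symptom 1: the abstraction promises something this type cannot do.
        raise NotImplementedError("you can't do this on this implementation!")


def deliver(report: Report, address: str) -> str:
    # Symptom 2: callers branch on the concrete type to avoid the trap above.
    if isinstance(report, DashboardReport):
        return report.render()
    return report.send_by_email(address)
```

Both symptoms come from the same cause: the two classes were grouped because they looked alike, not because callers can treat them interchangeably.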

If you are unsure whether this is knowledge duplication, then your gut and experience are pretty much all you have left. Many arguments from the SOLID principles can help you here, but they are not definitive answers. Re-reading about the Liskov Substitution Principle would not be a waste of time.

If you are sure these chunks of code have different purposes, leave them be. Or maybe abstract the part they have in common (if there is one), but don’t overdo it.

Next time you see code duplication, think twice before angrily smashing your expensive Das Keyboard. But if you see knowledge duplication, you are hereby licensed to kill it on sight!

Interestingly, a similar argument could be made about coupling! This could be the subject of an upcoming article: Stop trying to decouple your code.