"She was mostly immensely relieved to think that virtually
everything that anybody had ever told her was wrong"
Douglas Adams, "So Long, and Thanks for All the Fish"
I'm in the process of teaching myself Ruby on Rails at the moment. There's
no great reason for this, other than the fact that I kept hearing people talk
about it and curiosity got the better of me. That's not immediately relevant
though. What is relevant is that in parallel, I'm learning JavaScript, and one
of the cool new things I learned was this: white space, commenting, and
descriptive variable names are bad. Think about it. All your JavaScript,
including your comments, white space and big variable names, has to move from
the server to the user's browser, consuming bandwidth (think time and money)
along the way. Wow. Ponder the implications of that for a moment. Some of
that indisputably good software advice you were given, such as commenting your
code and choosing descriptive names, is just plain wrong.
In some contexts.
That's bad news for people who just accept what they're told, turn their
brains off, and treat guidelines as unbreakable rules. Actually, it's probably
bad news for those who follow behind, dealing with the results. But
anyway...
The reason I'm writing anything here is that one of the big "rules" that's
mentioned all the time in Ruby on Rails is "DRY" - Don't Repeat Yourself. Don't
duplicate code or information, because that's always bad. Right? Actually,
no. It's wrong.
In some contexts.
Which is all very fortuitous for me, because I get to rehash a blog post I
wrote internally in Sept 2005 ("Colouring outside the lines", for any Verilabers
who want to check how much reuse I managed to achieve here). One of the many
"rules" I looked at was "You should never duplicate code", because this one bugs
the hell out of me. In testbench design, there are sometimes very good reasons for
duplicating code, yet I've seen engineers mindlessly removing all duplication
from a working testbench. By unthinkingly applying rules they didn't really
understand, they wasted time swapping probable advantages for improbable
advantages, and risked injecting bugs into working code. Like we don't have
enough to do already in verification!
So, why is duplicating code bad? Well, let's be clear. It's not bad. It's
only bad in some contexts, and to understand which ones, it's worth
understanding why not duplicating code is good.
You might think that an advantage of not duplicating code is that it's faster
to just write the code once, but that's not always true. Making specific code
generic takes time and effort, so what commonly happens is that you notice you
are repeating yourself and then do a refactoring session to replace the
duplicated code with a shared version. This means that you have already spent
time writing the code multiple times, and on top of that, you then have to write
a version that can be shared, remove the original code, and then fix any
issues. It's not going to be faster than just duplicating, that's for sure.
"Always program as if the person who will be maintaining your
program is a violent psychopath that knows where you live"
Martin Golding
The advantage really comes during maintenance, when you have to change the
code. Rather than change it in 100 places, you only have to change it in 1
place. That's a great thing to have. But it's only a great thing to have if
the cost of removing the duplication is smaller than the cost of updating the
code in P places, where P is the number of copies. When P = 100, it's a
no-brainer. When P = 2, it's more difficult to call. Now it depends on how
often you'll have to change the code. If you have to change it N times, and if
N is large, then removing duplication is probably good. So basically, if N*P
is large, then removing duplication is probably a good thing.
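The N*P argument can be written down as a rough cost comparison. What follows is a minimal Python sketch that rests on stated assumptions. Every edit costs the same effort per place. Removing the duplication is a one-off cost R. The function name `dedup_pays_off` and all the numbers are hypothetical and are there for illustration only.

```python
def dedup_pays_off(refactor_cost: float, cost_per_edit: float,
                   n_changes: int, n_places: int) -> bool:
    """Return True if removing duplication is expected to save effort.

    Hypothetical model (arbitrary effort units):
      duplicated: each of N changes is made in all P copies -> N * P * c
      shared:     one-off refactor R, then each change made once -> R + N * c
    """
    duplicated_cost = n_changes * n_places * cost_per_edit
    shared_cost = refactor_cost + n_changes * cost_per_edit
    return duplicated_cost > shared_cost


if __name__ == "__main__":
    # Assumed refactor cost of 20 units and 1 unit per edit.
    print(dedup_pays_off(20, 1, n_changes=5, n_places=100))  # True: 500 vs 25
    print(dedup_pays_off(20, 1, n_changes=5, n_places=2))    # False: 10 vs 25
    print(dedup_pays_off(20, 1, n_changes=30, n_places=2))   # True: 60 vs 50
```

The third call shows the P = 2 case tipping over once N gets large enough. That is the "N*P is large" intuition at work.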
Probably. It's time to consider context now. We write testbenches, and a
lot of the time, these don't need to be maintained. We verify the RTL, the RTL
ships, and we move on to new designs. Testbench maintenance only really occurs
when we need multiple releases (respins or phased FPGA releases) of the design,
or if we want one testbench to work with multiple derivative designs. For many
testbenches, N is only large if the design is unstable, so we're constantly
modifying the testbench to keep up. That brings something else to consider
though. We remove duplicated code because the code is doing the same thing in P
places. However, what if that becomes false after you've removed the
duplication? What if you were doing FOO in two places, but now, because of a
last-minute, badly thought-out design change, you have to do FOO in one of those
places and BAR in the other? In that case, you'd have been far better off just
keeping the duplicated code, because now you have one block of code that needs
two different behaviours. Ouch.
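Here is a small, hypothetical Python illustration of the FOO/BAR problem. Once two call sites share one helper, a late divergence forces that helper to grow a special-case flag. The duplicated version only needs one copy edited. The names `shared_step`, `site_a` and `site_b` are invented for this sketch and stand in for two places in a testbench.

```python
from typing import List


# After refactoring, both testbench sites call one shared helper.
# A late design change means site B now needs BAR instead of FOO,
# so the helper has grown a flag and two behaviours.
def shared_step(log: List[str], use_bar: bool = False) -> None:
    if use_bar:
        log.append("BAR")   # special case added for site B only
    else:
        log.append("FOO")   # original shared behaviour


def site_a(log: List[str]) -> None:
    shared_step(log)                # still wants FOO


def site_b(log: List[str]) -> None:
    shared_step(log, use_bar=True)  # now needs BAR


# The duplicated alternative: each site owns its copy, so the late
# change is a one-line edit to site B's copy and nothing else moves.
def site_a_dup(log: List[str]) -> None:
    log.append("FOO")


def site_b_dup(log: List[str]) -> None:
    log.append("BAR")               # was "FOO" before the design change


if __name__ == "__main__":
    shared_log: List[str] = []
    site_a(shared_log)
    site_b(shared_log)
    dup_log: List[str] = []
    site_a_dup(dup_log)
    site_b_dup(dup_log)
    print(shared_log, dup_log)  # ['FOO', 'BAR'] ['FOO', 'BAR']
```

Both versions behave identically. The difference is where the divergence lives. With one flag the shared helper is still harmless, but each further divergence adds another parameter to it.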
So if N*P is high and D (the amount or potential amount of divergence) is
low, then removing duplication is good. Otherwise, you might be better
off just allowing code to be duplicated (while keeping a close eye on what N, P
and D do during the project).
Time for a real example. I have one DUT that can be targeted at an ASIC or
an FPGA, and in either case, it can be an RTL or a gate-level version. How many
testbenches should I have? Someone blindly applying the DRY rule might say
one: instantiate the DUT once, in just one testbench, and use
`defines (or similar) to deal with any differences that come up. It would just
be pure evil to have "DUT dma(.clock(clk) ..." appear in different
places.
Someone who thinks about it a bit deeper might say...
P = 4 (e.g. we connect the clock and reset in four places).
N might be around 10. We have four FPGA releases planned, and we'll
probably get six gate-level releases.
D will be pretty large because of signal name changes. That is, the clock
connection might remain constant across all releases, but the port map is going
to change like crazy to deal with FPGA targeting and gate-level
renaming.
...and go with four testbenches. Sure, we're probably going to have to
tinker slightly every time we make a new FPGA release or generate a new gate-level
design (port changes), but the growing differences between the four design
types will mean that a single testbench will become a massive headache of
special-case handling, dealing with differences between the nominally identical
versions of the design. Any common code that needs to be changed will only
need to be changed in four places, and as it's not expected to change much anyway,
it's not a major headache. Someone going through this process might decide that
the flexibility offered by maintaining separate testbenches is more useful than
the benefits offered by removing duplication.
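Plugging the example's numbers into a rough cost model shows how D can flip the decision. This is a hypothetical Python sketch and not an established estimation method. Only P = 4 and N = 10 come from the example above. The effort units, the refactor cost of 8, the per-divergence special-case cost of 5 and the value D = 0.8 for "pretty large" are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical inputs for the N, P, D trade-off (arbitrary effort units)."""
    n_changes: int                    # N: how many times the common code changes
    n_places: int                     # P: how many copies exist if we duplicate
    divergence: float                 # D: assumed fraction of changes that differ per variant
    cost_per_edit: float = 1.0
    refactor_cost: float = 8.0        # assumed one-off cost of merging into one testbench
    special_case_cost: float = 5.0    # assumed cost of each `define / special case


def duplicated_cost(s: Scenario) -> float:
    # Every change is applied to every copy; copies may diverge for free.
    return s.n_changes * s.n_places * s.cost_per_edit


def shared_cost(s: Scenario) -> float:
    # One refactor, one edit per change, plus special-case handling
    # for the share of changes that only apply to some variants.
    return (s.refactor_cost
            + s.n_changes * s.cost_per_edit
            + s.n_changes * s.divergence * s.special_case_cost)


def prefer_shared(s: Scenario) -> bool:
    return shared_cost(s) < duplicated_cost(s)


if __name__ == "__main__":
    # ASIC/FPGA x RTL/gate example: P = 4, N = 4 FPGA + 6 gate releases = 10.
    dut = Scenario(n_changes=10, n_places=4, divergence=0.8)
    print(duplicated_cost(dut), shared_cost(dut), prefer_shared(dut))
    # 40.0 58.0 False -> keep four separate testbenches
```

With a low divergence such as D = 0.1, the same model gives a shared cost of 23 against 40, and a single testbench wins. This matches the rule of thumb that a high N*P and a low D favour removing duplication.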
"Part of the problem with brittle design is due to
overgeneralization. Good programmers tend to like to factor out the common
aspects of their code, incorporating widely-used functionality into a single
subroutine or class. [...] These kinds of mechanisms tend to break when a
platypus is encountered"
Talin
And that's really the tradeoff we're making here. Being DRY means reducing
your flexibility to deal with divergences in the functionality, but it means
that maintenance will be easier if it doesn't diverge. You have to think about
that before declaring that duplication is good or evil. Things are never that
black or white. My experience has been that flexibility has always been more
useful to me than ease of maintenance when doing testbench design. Flexibility
means I can deal with a change on the day of code freeze. That's more important
to me than saving a couple of hours during a more leisurely and unlikely
maintenance phase. So any time I see duplicated code and feel my fingers start
to itch to "fix" it, I take a moment to think about the context. It might save
some headaches later to just leave it as it is...