Managing IT:
The Return of the LOC Monster
Lines-Of-Code: An ambiguous metric with the "look and feel" of numerical precision
Overview
LOC, meaning Line(s) Of Code, is sometimes referred to as SLOC (Source Lines Of Code), in spite of the obvious confusion-causing redundancy added by that practice.
LOC (or SLOC) is literally a count of the number of lines of code comprising a particular "set" of software. The set might be the software produced by an individual or group in a certain time period, or the number of LOC in a given software development project (completed or estimated). It may also be used as a basis for other measures, such as productivity or defects, as in the number of defects per 100 lines of code (sometimes this is flipped over and expressed as LOC per defect). It is also sometimes written as LOCs (extra s on the end) when it denotes more than one line of code. Finally, it can be prefixed with a decimal-based modifier, as in KLOC (thousands of lines of code), etc.
What's Wrong With LOC/SLOC?
LOC (or SLOC) is ambiguous by design. For
example, when producing a count of LOC
(or SLOC) within some given code-set:
Currently LOC (and its new-and-improved sounding synonym: SLOC) has become nothing more than a means of generating spin and hype. It generally will be used by those who want to show that they've employed a scholarly and credible sounding numerical analysis to "prove" the superiority of their group and development methods or processes. For example, one might say that
"Applying our
TAO-Extreme-XP-Pattern-Pair-Agile-Y2K-Guru development methodologies allows our
programmers to produce 1000 LOC per day, while the industry average is only
250 LOC per day".
Nowadays, of course, they would say "SLOC" instead of "LOC", since so many regular management folk have become aware of the fact that "LOC" is a meaningless marketing buzz-measurement. LOC (SLOC) can also be found pressed into service to generate spin based on software defects (bugs), as in:
"TAO-Extreme-XP-Pattern-Pair-Agile-Y2K-Guru
development methodologies allow our programmers to produce code with only
3.2 defects per 100 LOC, while the industry standard is 3.7 defects per
100 LOC."
This has the added ambiguity embodied in the concept of a "defect". In these cases a "defect" is often defined as something that can be found using a fully automated tool that scans the target code and counts the defects. What's wrong with that picture? Well, out in the real world (presumably) programmers will have access to such a tool and be required to apply it to all their code before submission. Sooo... in this case (i.e., the real world), the defect count produced by such a tool for any programmer employing any development approach should ALWAYS be zero.
Marketing Synergy?
Perhaps you've noticed how anything you do to enhance your advertised LOC
count will automatically complement your advertised defect count. This is
because blank lines and comments have zero defects.
If you, for example, have your coders write a tool that scans all your code and inserts one blank line for every three non-blank lines, you will increase your advertised LOC count by up to 33%. At the same time, you will reduce your advertised count of defects per LOC by up to 25%. I think this is what self-promoting types like to call a win-win opportunity. :-) Maybe not. ... right here in River City.
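The arithmetic behind the blank-line trick can be checked with a few lines of C. This is only a minimal sketch: the function names are made up for illustration, and it assumes the code starts with no blank lines at all, which is the best case for the padding tool.

```c
/* Hypothetical helper: line count after inserting one blank line
   for every three non-blank lines (assumes no blank lines to start). */
long padded_loc(long nonblank)
{
    return nonblank + nonblank / 3;
}

/* Hypothetical helper: defects per 100 lines of code. */
double defects_per_100(long defects, long loc)
{
    return loc > 0 ? 100.0 * (double)defects / (double)loc : 0.0;
}
```

With an invented figure of 120 defects in 3000 lines, padded_loc(3000) gives 4000 lines, and the advertised rate drops from 4.0 to 3.0 defects per 100 LOC. Not one character of actual code changed.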
What NOT To Do
Throwing up your hands and submissively embracing a metric that will give you
wrong data is NOT a good response to having
no data. But that seems to be the motivation behind
a lot of what's being offered as LOC alternatives (with SLOC being
the most glaring example).
To demonstrate, consider one of the less obvious alternatives called "eLOC" (Effective LOC). The eLOC metric counts all lines that are not comments, blanks, or standalone brackets, braces, or parentheses. Sounds like a really well thought-out attempt to address the problems with LOC counts, right? Here is a small function written in C that we can apply eLOC counts to:
void salutation_line(char *ttl, char *lastname)
{
    printf("Dear %s %s:",ttl,lastname);
}
Of course, this gets even worse if you just use plain old LOC (not eLOC). In that case, the sky is really the limit as to how many lines this code could take. But assuming you agreed not to count blank lines you could still squeeze 19 lines of valid C code out of it (it is four lines in its above configuration).
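To see how much an eLOC-style count depends on layout, here is a rough sketch of a counter in C. It is an assumption-laden simplification and not any official eLOC tool: count_eloc and line_counts are hypothetical names, only whole-line // comments are recognized (block comments are ignored), and a line is skipped when it holds nothing but brackets, braces, or parentheses.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if the line [s, s+len) counts under the simplified rule:
   not blank, not a whole-line // comment, not only brackets/braces/parens. */
static int line_counts(const char *s, size_t len)
{
    size_t i = 0;
    int saw_other = 0;

    while (i < len && isspace((unsigned char)s[i]))
        i++;
    if (i == len)
        return 0;                           /* blank line */
    if (len - i >= 2 && s[i] == '/' && s[i + 1] == '/')
        return 0;                           /* whole-line comment */
    for (; i < len; i++) {
        if (isspace((unsigned char)s[i]))
            continue;
        if (strchr("{}()[]", s[i]) == NULL)
            saw_other = 1;                  /* something besides brackets */
    }
    return saw_other;
}

/* Counts "effective" lines in a NUL-terminated source text. */
int count_eloc(const char *src)
{
    int count = 0;

    while (*src != '\0') {
        const char *nl = strchr(src, '\n');
        size_t len = nl ? (size_t)(nl - src) : strlen(src);

        count += line_counts(src, len);
        src += len;
        if (*src == '\n')
            src++;
    }
    return count;
}
```

Under these assumptions, salutation_line in its four-line layout counts as 2, and the same function joined onto a single line counts as 1. Putting each parameter and argument on its own line pushes the count higher still, all without changing what the code does.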
Useful Code-Quantity Metrics
If you can't measure something, you can't manage it. You need a
metric, but it must be reliable. The only thing worse than no metric is
a metric that will give you wrong information because it is too easy to bias.
Logical LOC - While still not perfect, L-LOC or Logical LOC is a little better than LOC/SLOC/eLOC. It insists that only statements that end in semicolons be counted. It also sometimes makes an exception for the semicolons embedded in the for() statement in C/C++ code. The rule is usually that the semicolons in a for() header either aren't counted at all or count as a single semicolon. Programmers who embed a lot of code in the control expressions of for() loops will have much lower L-LOC counts than those who like to spread it out into a block. I personally have reduced a five-page for() block statement into a single for() statement with no block, for example. That said, it is unusual these days to see this code style. Also, it can be restricted with coding conventions without doing too much harm to code readability. :-)
Function Points - FPA, or Function Point Analysis, counts the Function Points of a given set of software. It has the obvious drawback that it has to be done by humans (hopefully not the same humans who use LOC/SLOC counts with straight faces while marketing their wares). Since it can't be automated, it tends to be more expensive to use as well. Because of this it must be applied more judiciously in most situations, which isn't necessarily a bad thing. Also, because Function Point counts can't be automated, they don't have absolute repeatability. That said, they are a well-defined discipline when performed by people of integrity. The analogy that you could use compares FPA to the guidelines used to grade olive oil into classes such as regular, virgin, extra-virgin, etc. Of course, if you consider Function Point Analysis analogous to the process of grading olive oil, then LOC/SLOC counts might be comparable to patent medicine salesmen advertising a fictional numeric measure of "healthfulness" for each of their offerings.
Real Soon Now - In the future I may document my own "Code Count" (CC). It is loosely based on a more stringent form of L-LOC, and it can be fully automated. It has absolutely nothing to do with "lines" though. It is currently for C and C++ code. I will eventually modify it to deal with code in other widely used languages (sigh, when time permits).
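The L-LOC rule described above is easy to automate. The following is a minimal sketch in C, not the author's "Code Count" and not any standard tool. It assumes the variant where semicolons in a for() header are not counted at all, and implements that by ignoring any semicolon inside parentheses. The name count_lloc is hypothetical. Comments and string or character literals are skipped so that semicolons inside them don't inflate the count.

```c
/* Count statement-terminating semicolons in C source text.
   Assumption: semicolons inside parentheses (for() headers) are ignored. */
int count_lloc(const char *src)
{
    int count = 0;
    int depth = 0;                  /* current parenthesis nesting depth */
    const char *p = src;

    while (*p != '\0') {
        if (p[0] == '/' && p[1] == '/') {           /* line comment */
            while (*p != '\0' && *p != '\n')
                p++;
        } else if (p[0] == '/' && p[1] == '*') {    /* block comment */
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p != '\0')
                p += 2;
        } else if (*p == '"' || *p == '\'') {       /* string or char literal */
            char quote = *p++;
            while (*p != '\0' && *p != quote) {
                if (*p == '\\' && p[1] != '\0')
                    p++;                            /* skip the escaped char */
                p++;
            }
            if (*p != '\0')
                p++;
        } else {
            if (*p == '(')
                depth++;
            else if (*p == ')' && depth > 0)
                depth--;
            else if (*p == ';' && depth == 0)
                count++;
            p++;
        }
    }
    return count;
}
```

Under these assumptions the salutation_line function shown earlier counts as 1 no matter how its lines are broken up, which is the property that plain LOC and eLOC lack.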
Beware of Backfiring
One other practice that you should be aware of is backfiring. It is an
attempt to document the number of LOC/SLOC per function point. This is
so that one can make a simple count of LOC, and convert them to an equivalent
number of function points.
Sounds great, doesn't it? You can now do automated LOC counts of your code and do a simple table lookup to replace them with a more credible function-point count. You can do it without having to manually count function points. And best of all: behold! Your LOC-counted code will magically possess all the real credibility of real function point counts. Unfortunately, it doesn't work. The LOC equivalents that backfiring produces can have all the same patent-medicine-salesman-styled ambiguities that any other LOC counts can contain (see the discussion above). Even if the function point guy who initially produced the FP-to-LOC equivalent tables did his homework and specified L-LOC as his basis, it won't work. The problem? The people who later use the tables to produce their own function point equivalents are perfectly free to base their backfire lookups on their own style of LOC counts. The LOC counts they use will almost certainly be selected to cast their marketing positions in the most advantageous light.
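A backfire conversion is nothing more than a division, which is why it inherits whatever bias is in the LOC count fed to it. The sketch below is only an illustration: backfire_fp is a hypothetical name, and the ratio passed in stands for a table entry, not a published value.

```c
/* Backfired function-point estimate: a LOC count divided by a
   LOC-per-function-point ratio taken from a lookup table.
   The ratio is whatever the caller supplies; none is assumed here. */
double backfire_fp(long loc, double loc_per_fp)
{
    if (loc_per_fp <= 0.0)
        return 0.0;             /* guard against a bad table entry */
    return (double)loc / loc_per_fp;
}
```

With an invented ratio of 100 LOC per function point, 3000 lines backfire to 30 function points. Pad the same code out to 4000 lines with blank lines and it backfires to 40, although nothing about its functionality has changed.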
The Meaning of a Measure
I have been surprised by the complete lack of understanding sometimes
exhibited (by people who purport to know better) about the many subtle ways a
code counting metric could be useful. That is of course, if the metric were
actually credible to begin with.
If you actually write computer programs as your calling
you're likely to understand some of the ways a good measure of "amount
or quantity of coded program" would be useful to you.
I'm not sure it makes any sense to go over these, since they make a very good indicator right now. Suffice it to say, you can be a programmer and not understand all the subtle ways such a metric could help you. But if you think about it, even for a minute, you will certainly be able to come up with something better than what's being forwarded by those defending LOC/SLOC counts in the trade rags today.
Permissions
This article is © Copyright, Creativyst, Inc. 2007 ALL RIGHTS
RESERVED.
Links to this article are always welcome. However, you may not copy, modify, or distribute this work or any part of it without first obtaining express written permission from Creativyst, Inc. Production and distribution of derivative products, such as displaying this content along with directly related content in a common browser view are expressly forbidden! Those wishing to obtain permission to distribute copies of this article or derivatives in any form should contact Creativyst. Permissions printed over any code, DTD, or schema files are supported as our permission statement for those constructs.