Managing IT:

The Return of the LOC Monster

Lines-Of-Code: An ambiguous metric with the "look and feel" of numerical precision







Overview

LOC, meaning Line(s) Of Code, is sometimes referred to as SLOC (Source Lines Of Code), in spite of the obvious, confusion-causing redundancy that practice adds.

LOC (or SLOC) is, literally, a count of the lines of code that make up a particular "set" of software.

The set might be the software produced by an individual or group in a certain time period, or the number of LOC in a given software development project (completed or estimated). It may also serve as a basis for other measures, such as productivity or defects, as in the number of defects per 100 lines of code (sometimes this is flipped over and expressed as LOC per defect).

It is also sometimes expressed as LOCs (extra s on the end) when it is used to denote more than one line of code. Finally, it can be prefixed with a decimal-based modifier, as in KLOC (thousands of lines of code), etc.






What's Wrong With LOC/SLOC?

LOC (or SLOC) is ambiguous by design. For example, when producing a count of LOC (or SLOC) within some given code-set:

  • Should we count comments in the code?
    Some do, some say no.

  • How about blank lines, should they be included in our LOC (or SLOC) count?
    Again, some do, some don't. But the ambiguity and confusion surrounding blank lines doesn't stop there. Some will count blank lines as long as they don't exceed 25% of the total lines of source code for which we are producing a LOC count.

  • What if there are more than 25% blank lines?
    Some may choose to count them, others will stop after the count of blank lines reaches 25%.

  • If they choose to count only up to the first 25%, how do you know what portion of the count is blank lines?
    You don't.

  • What if we insist that only statements be counted?
    Are block statements one statement or many? What about a line containing just a curly brace? What about compound expressions, and multiple expressions embedded in loop and branch constructs? Or function calls to our own routines kept in libraries?

Currently LOC (and its new-and-improved-sounding synonym, SLOC) has become nothing more than a means of generating spin and hype. It is generally used by those who want to show that they've employed a scholarly and credible-sounding numerical analysis to "prove" the superiority of their group and development methods or processes. For example, one might say:

"Applying our TAO-Extreme-XP-Pattern-Pair-Agile-Y2K-Guru development methodologies allows our programmers to produce 1000 LOC per day, while the industry average is only 250 LOC per day".

Nowadays, of course, they would say "SLOC" instead of "LOC", since so many regular management folk have become aware of the fact that "LOC" is a meaningless marketing buzz-measurement.

LOC (SLOC) can also be found pressed into service to generate spin based on software defects (bugs), as in:

"TAO-Extreme-XP-Pattern-Pair-Agile-Y2K-Guru development methodologies allow our programmers to produce code with only 3.2 defects per 100 LOC, while the industry standard is 3.7 defects per 100 LOC."

This adds yet another ambiguity: the concept of a "defect". In these cases a "defect" is often defined as something that can be found by a fully automated tool that scans the target code and counts the defects.

What's wrong with that picture? Well, out in the real world, programmers will (presumably) have access to such a tool and be required to apply it to all their code before submitting it. Sooo... in this case (i.e., the real world), the defect count produced by such a tool for any programmer using any development approach should ALWAYS be zero.





Marketing Synergy?

Perhaps you've noticed how anything you do to enhance your advertised LOC count will automatically complement your advertised defect count. This is because blank lines and comments have zero defects.

If, for example, you have your coders write a tool that scans all your code and inserts one blank line for every three non-blank lines, you will increase your advertised LOC count by a third, with the new blank lines making up 25% of the total (right at the limit some counters allow). At the same time, you will reduce your advertised count of defects per LOC by 25%. I think this is what self-promoting types like to call a win-win opportunity. :-) Maybe not. ... right here in River City.





What NOT To Do

Throwing up your hands and submissively embracing a metric that will give you wrong data is NOT a good response to having no data. But that seems to be the motivation behind a lot of what's being offered as LOC alternatives (with SLOC being the most glaring example).

To demonstrate, consider one of the less obvious alternatives called "eLOC" (Effective LOC). The eLOC metric counts all lines that are not comments, blanks, or standalone brackets, braces, or parentheses. Sounds like a really well-thought-out attempt to address the problems with LOC counts, right?

Here is a small function written in C that we can apply eLOC counts to:

    void salutation_line(char *ttl, char *lastname)
    {
        printf("Dear %s %s:",ttl,lastname);
    }

Depending on how you format it, this simple little function can have anywhere from 1 to 10 eLOCs (2 in its above configuration). Not exactly a meaningful measure.

Of course, this gets even worse if you just use plain old LOC (not eLOC). In that case, the sky really is the limit as to how many lines this code could take. But even assuming you agreed not to count blank lines, you could still squeeze 19 lines of valid C code out of it (it is four lines in its above configuration).





Useful Code-Quantity Metrics

If you can't measure something, you can't manage it. You need a metric, but it must be reliable. The only thing worse than no metric is a metric that will give you wrong information because it is too easy to bias.

Logical LOC - While still not perfect, L-LOC (Logical LOC) is a little better than LOC/SLOC/eLOC. It counts only statements that end in semicolons. It also sometimes makes an exception for the semicolons embedded in the for() statement in C/C++ code: the usual rule is that the semicolons in a for() header aren't counted at all, or together count as a single semicolon.

Programmers who embed a lot of code in the control statements of for() loops will have much lower L-LOC counts than those who spread it out into a block, but coders rarely do this. For example, I have personally reduced a five-page for() block statement to a single for() statement with no block, but that code style is unusual these days. It can also be restricted by coding conventions without doing much harm to code readability. :-)

Function Points - FPA, or Function Point Analysis, counts the Function Points of a given set of software. Its obvious drawback is that it has to be done by humans (hopefully not the same humans who use LOC/SLOC counts with straight faces while marketing their wares). Since it can't be automated, it also tends to be more expensive to use. Because of this it must be applied more judiciously in most situations, which isn't necessarily a bad thing.

Also, because Function Point counts can't be automated, they aren't perfectly repeatable. That said, FPA is a well-defined discipline when performed by people of integrity. A fitting analogy compares FPA to the guidelines used to grade olive oil into classes such as regular, virgin, extra-virgin, etc.

Of course, if you consider Function Point Analysis analogous to the process of grading olive oil, then LOC/SLOC counts might be comparable to patent medicine salesmen advertising a fictional numeric measure of "healthfulness" for each of their offerings.

Real Soon Now - In the future I may document my own "Code Count" (CC). It is loosely based on a more stringent form of L-LOC, and it can be fully automated. It has absolutely nothing to do with "lines", though. It currently handles only C and C++ code; I will eventually extend it to other widely used languages (sigh, when time permits).





Beware of Backfiring

One other practice you should be aware of is backfiring. It is an attempt to document the number of LOC/SLOC per function point, so that one can make a simple count of LOC and convert it to an equivalent number of function points.

Sounds great, doesn't it? You can now do automated LOC counts of your code and do a simple table lookup to replace them with a more credible function point count, without having to count function points manually. And best of all: behold! Your LOC-counted code will magically possess all the real credibility of real function point counts.

Unfortunately, it doesn't work. The LOC equivalents that backfiring produces can have all the same patent-medicine-salesman-style ambiguities that any other LOC count can contain (see the discussion above).

Even if the function point guy who initially produced the FP-to-LOC equivalent tables did his homework and specified L-LOC as his basis, it won't work. The problem? The people who later use the tables to produce their own function point equivalents are perfectly free to base their backfire lookups on their own style of LOC counts. The LOC counts they use will almost certainly be selected to cast their marketing positions in the most advantageous light.





The Meaning of a Measure

I have been surprised by the complete lack of understanding sometimes exhibited (by people who purport to know better) of the many subtle ways a code-counting metric could be useful. That is, of course, if the metric were actually credible to begin with. If you actually write computer programs as your calling, you're likely to understand some of the ways a good measure of "amount or quantity of coded program" would be useful to you.

I'm not sure it makes any sense to go over these, since they make a very good indicator right now. Suffice it to say, you can be a programmer and not understand all the subtle ways such a metric could help you. But if you think about it, even for a minute, you will certainly be able to come up with something better than what's being put forward by those defending LOC/SLOC counts in the trade rags today.





Permissions

This article is © Copyright, Creativyst, Inc. 2007 ALL RIGHTS RESERVED.

Links to this article are always welcome.

However, you may not copy, modify, or distribute this work or any part of it without first obtaining express written permission from Creativyst, Inc. Production and distribution of derivative products, such as displaying this content along with directly related content in a common browser view are expressly forbidden!

Those wishing to obtain permission to distribute copies of this article or derivatives in any form should contact Creativyst.

Permissions printed over any code, DTD, or schema files are supported as our permission statement for those constructs.











 
© Copyright 2007 - 2008, Creativyst, Inc.
ALL RIGHTS RESERVED

Written by: Dominic John Repici


v1.0b