

Creativyst Docs 
Understanding the SoundEx Algorithm 


 
How To:

Understanding Classic SoundEx Algorithms

Search Names & Phrases Based on Phonetic Similarity (w/source code)




 
Use our SoundEx Generator Form below.
 





Overview

Terms that are often misspelled can be a problem for database designers. Names, for example, are variable length, can have strange spellings, and they are not unique. American names have a diversity of ethnic origins, which give us names pronounced the same way but spelled differently and vice versa.

Words, too, can be misspelled or have multiple spellings, especially across different cultures or national sources.

To help solve this problem, we need phonetic algorithms which can find similar sounding terms and names. Just such a family of algorithms exists and has come to be called SoundEx algorithms.

A Soundex search algorithm takes a written word, such as a person's name, as input, and produces a character string that identifies a set of words that are (roughly) phonetically alike. It is very handy for searching large databases when the user has incomplete data.

The method used by Soundex is based on the six phonetic classifications of human speech sounds (bilabial, labiodental, dental, alveolar, velar, and glottal), which are themselves based on where you put your lips and tongue to make the sounds.

The algorithm itself is fairly straightforward to code and requires no backtracking or multiple passes over the input word. In fact, it is so straightforward that I will start (after a history section) by presenting it as an outline. Further on, I will give C, JavaScript, Perl, and VB code that implements the two standard algorithms used in the American Census as well as an enhanced version, which is described in this article.




History

The original Soundex algorithm was not named Soundex. Index systems similar to what we now call "Soundex" were originally developed and used for indexing American census records in the late nineteenth and early twentieth centuries.

An algorithm that is now considered the predecessor to Soundex began in a patent application filed by Robert C. Russell in 1917. It was simply titled "INDEX". The name "Soundex" came along later, and was first registered as a trademark in 1926. The trademarked term was originally issued to Kardex Systems, Inc.

Later, from the late 1960s through the early 1980s, Soundex was used by the American Telephone & Telegraph Company (AT&T) in its 411 telephone information service. In the years before speech recognition systems came into use, information inquiries were handled by human operators. Soundex proved to be the best way to access phone number information in response to person and place names that were spoken over telephones.

The original patent titled "INDEX" was issued in 1918 as patent number 1,261,167 (filed Oct. 25, 1917). Other patents by Russell involved index systems and indexing gadgets. Some were variations and improvements of his 1917 index system:

  • 1,261,167 (INDEX, filed 1917, issued 1918**)
  • 1,357,653 (CARD-INDEX, filed 1919)
  • 1,478,765 (CARD-INDEX, filed 1922)
  • 1,478,672 (CARD-INDEX, issued 1923)
  • 1,548,752 (CARD-INDEX, filed 1923)
  • 1,548,753 (CARD-INDEX, filed 1923)
  • 1,601,925 (CARD-INDEX, filed 1924)

A considerable number of otherwise reliable sources on the Web (as well as the usual unreliable ones) name one Margaret O'Dell as a co-inventor, but I have not been able to find any actual evidence of this. If you have information showing Margaret O'Dell to be a co-inventor of any of these patents, please contact me so that she can be given credit here for her contributions.

Also the patent filed in 1917 (1,261,167) is not a perfect match for present-day Soundex algorithms but is considered the first Soundex because it is very similar to the algorithms we use today. Russell did produce at least one earlier index system that assigned numbers to "key-letters" which he had filed a patent application for a couple of years earlier. None of these patents, as far as I can tell, list any inventor named O'Dell.

**Finally, Russell's patent number 1,261,167 was reissued in 1923 as # Re.15,582.






The Algorithm as an Outline

  1. Capitalize all letters in the word and drop all punctuation marks. Pad the word with rightmost blanks as needed during each procedure step.
  2. Retain the first letter of the word.
  3. Change all occurrences of the following letters to '0' (zero):
      'A', 'E', 'I', 'O', 'U', 'H', 'W', 'Y'.
  4. Change letters from the following sets into the digit given:
    • 1 = 'B', 'F', 'P', 'V'
    • 2 = 'C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z'
    • 3 = 'D','T'
    • 4 = 'L'
    • 5 = 'M','N'
    • 6 = 'R'
  5. Collapse each run of identical digits that occur beside each other in the string that resulted after step (4) into a single digit.
  6. Remove all zeros (placed there in step 3) from the string that results from step (5).
  7. Pad the string that resulted from step (6) with trailing zeros and return only the first four positions, which will be of the form <uppercase letter> <digit> <digit> <digit>.
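
The outline above can be sketched in a few lines of JavaScript. This is only a minimal illustration and not the article's SoundEx function: the name basicSoundex is hypothetical, punctuation is simply dropped as in step 1, and the sketch assumes the first letter's digit takes part in the adjacent-digit test of step 5 before the letter itself is put back.

```javascript
// Minimal sketch of the seven outline steps (hypothetical helper name).
function basicSoundex(word) {
  // Step 1: capitalize and drop everything that is not A-Z.
  const letters = String(word).toUpperCase().replace(/[^A-Z]/g, "");
  if (letters === "") return "";

  // Steps 3 and 4: vowel-like letters become '0', the rest become group digits.
  const codes = letters
    .replace(/[AEIOUHWY]/g, "0")
    .replace(/[BFPV]/g, "1")
    .replace(/[CGJKQSXZ]/g, "2")
    .replace(/[DT]/g, "3")
    .replace(/L/g, "4")
    .replace(/[MN]/g, "5")
    .replace(/R/g, "6");

  // Step 5: collapse runs of the same digit into one digit.
  // Assumption: the first letter's digit is included in this test.
  const collapsed = codes.replace(/(.)\1+/g, "$1");

  // Step 6: drop the first letter's code, then remove the zeros.
  const tail = collapsed.slice(1).replace(/0/g, "");

  // Steps 2 and 7: first letter + digits, zero padded, four positions.
  return (letters[0] + tail + "000").slice(0, 4);
}

console.log(basicSoundex("Herman"));   // H655
console.log(basicSoundex("Ashcroft")); // A226
console.log(basicSoundex("Lee"));      // L000
```

Keeping the zeros in place until after step 5 is what lets a vowel separate two letters from the same group.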


Discussion

The algorithm presented above is slightly different from the originally patented algorithm.

The original Soundex algorithm of 1918 starts to fail when the number of words in the database gets to be large. For example, the diversity of names in a large database with many foreign spellings starts to put more and more phonetically unlike names into the same code.

The slightly improved algorithm presented here differs from the original in a variety of ways. For example, it differs in step (3), where the vowel sounds are replaced with zeros ('0'), and in step (6), where those zeros are removed. The original algorithm eliminated the vowels completely in step (3), before duplicate adjacent digits were removed.

This improved version will produce SoundEx codes with duplicate digits in them; for example, "HERMAN" will code to "H06505" which will then reduce to "H655" in the last step. In the original version, "HERMAN" would code to "H655" which will then reduce to "H65" and finally to "H650" in the last step.
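
The difference described above can be tried side by side. The sketch below is an illustration only: soundexVariant and its dropVowelsFirst flag are hypothetical names, and the "original" branch models nothing more than the one difference this section describes (vowel-like letters removed before the adjacent-digit test), not the full text of the 1918 patent.

```javascript
// Replace letters with their SoundEx digits; vowel-like letters become '0'.
function toDigits(letters) {
  return letters
    .replace(/[AEIOUHWY]/g, "0")
    .replace(/[BFPV]/g, "1")
    .replace(/[CGJKQSXZ]/g, "2")
    .replace(/[DT]/g, "3")
    .replace(/L/g, "4")
    .replace(/[MN]/g, "5")
    .replace(/R/g, "6");
}

// dropVowelsFirst = true  : vowels vanish BEFORE duplicates are collapsed
// dropVowelsFirst = false : vowels stay as '0' separators until afterwards
function soundexVariant(word, dropVowelsFirst) {
  const letters = String(word).toUpperCase().replace(/[^A-Z]/g, "");
  if (letters === "") return "";

  let codes = toDigits(letters);
  if (dropVowelsFirst) {
    // Keep the first position, strip the zeros from the rest right away.
    codes = codes[0] + codes.slice(1).replace(/0/g, "");
  }

  const collapsed = codes.replace(/(.)\1+/g, "$1"); // adjacent duplicates
  const tail = collapsed.slice(1).replace(/0/g, "");
  return (letters[0] + tail + "000").slice(0, 4);
}

console.log(soundexVariant("HERMAN", false)); // H655 (improved version)
console.log(soundexVariant("HERMAN", true));  // H650 (original behaviour)
```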

Other SoundEx algorithms exist as well, such as LEAA codes used in crime prevention databases, and Cutter Tables used by libraries to encode author names. Each has its advantages and disadvantages. For example, Cutter Tables have the advantage that the codes produced maintain the alphabetical order of the original material.




Enhancements

It's not hard to think of improvements that will make this already powerful algorithm even more robust. An example (at least to American pronunciation sensibilities) might include replacing many multi-letter sequences that produce unrelated sounds before performing the steps of the basic algorithm. For example, before starting the above procedure, replace:
  • DG with G
  • GH with H ...WHEN... not at start of word
  • GN with N (not 'ng')
  • KN with N
  • PH with F
  • MP with M ...WHEN... it is followed by S, Z, or T
  • PS with S ...WHEN... it starts a word
  • PF with F ...WHEN... it starts a word
  • MB with M
  • TCH with CH
  • A or I with E ...WHEN... it starts a word and is followed by A, E, I, or O
    (not implemented in the source code)
The conversion enhancement for PF would not normally be needed because both letters are in the same group (group 1). However, since this conversion improvement is only for the start of the word, it must be included, since the first letter is preserved in this and classic SoundEx.
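
As a rough illustration, the replacements listed above can be applied as a pre-pass before the basic algorithm runs. The function name applyEnhancements is hypothetical; the patterns follow the list, with a leading GH kept as G so that "GH with H" only applies when GH is not at the start of the word. The A/I rule is left out, as it is in the article's source code.

```javascript
// Sketch of the multi-letter pre-pass (hypothetical helper name).
// Expects any string; works on the upper-cased A-Z letters only.
function applyEnhancements(word) {
  let s = String(word).toUpperCase().replace(/[^A-Z]/g, "");

  s = s.replace(/^GH/, "G");          // leading GH keeps its G sound
  s = s.replace(/DG/g, "G");          // DG  -> G
  s = s.replace(/GH/g, "H");          // GH  -> H (no longer at word start)
  s = s.replace(/GN/g, "N");          // GN  -> N
  s = s.replace(/KN/g, "N");          // KN  -> N
  s = s.replace(/PH/g, "F");          // PH  -> F
  s = s.replace(/MP([STZ])/g, "M$1"); // MP  -> M when followed by S, T, or Z
  s = s.replace(/^PS/, "S");          // leading PS -> S
  s = s.replace(/^PF/, "F");          // leading PF -> F
  s = s.replace(/MB/g, "M");          // MB  -> M
  s = s.replace(/TCH/g, "CH");        // TCH -> CH
  return s;
}

console.log(applyEnhancements("Knight")); // NIHT
console.log(applyEnhancements("Phone"));  // FONE
console.log(applyEnhancements("Ghost"));  // GOST
```

Because the first letter is preserved in the final code, rewrites at the start of the word (KN, PS, PF, PH) have the most visible effect on the result.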

Challenge: Can you think of more ways to improve this basic SoundEx algorithm?






SoundEx and the Census

The U.S. census has been making use of SoundEx codes to index surnames since the late 1800s. Those doing census lookups must use the same method to encode surnames as the census takers did when they generated the database. That means, for starters, our clever set of enhancements can't be used.

With one exception, the 'normal' method of encoding the census soundex codes is identical to that described in the Algorithm as Outline section above. The exception is a special rule for the letters H and W.

In the method described above, if two letters from the same group are on either side of a vowel, an 'H', or a 'W', they are considered two separate letters and they are not removed by the 'remove adjacent digits' step (5).

But in 'normal' census SoundEx, the 'H' and 'W' were completely removed from the conversion when surrounding letters were in the same group, so letters from the same group were combined into a single digit when only 'H' or 'W' were between them. For example:

normal census method:

  • Ashcroft = A261
    The 'S' and 'C' are both from group 2, and the 'H' is not considered a vowel-like separator, so the two are considered adjacent 2s which are combined into a single 2.
  • Asicroft = A226
    Here, since vowels are considered separators, the two letters from group 2 are not adjacent and so do not get turned into a single 2.

OK, that is the normal method that was used by the census to produce SoundEx codes. Now comes a little twist. Prior to the 1920 census, these codes were calculated by hand, and many census takers ignored the special rule for 'H' and 'W'. That means they wrote their census codes exactly as described in the Algorithm as Outline section above. In other words:

special census method:

  • Ashcroft = A226
    No special rule was applied for the pseudo-vowels H or W, so 'shc' was interpreted as two separated (not adjacent) letters; hence there are two '2' digits in a row.

So the census for 1880 through 1910 included normal census SoundEx codes as well as these special codes randomly intermixed. For this reason you must search these years using SoundEx indexes made in both ways.
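
The two census methods differ only in how 'H' and 'W' are treated, which a short sketch can make concrete. The names censusSoundex and censusSearchCodes are hypothetical, and the code is a simplified illustration rather than the article's full function: method 1 squeezes 'H' and 'W' out (after the first letter) before the adjacent-digit test, while method 2 leaves them in as vowel-like separators.

```javascript
// Replace letters with their SoundEx digits; vowel-like letters become '0'.
function toDigits(letters) {
  return letters
    .replace(/[AEIOUHWY]/g, "0")
    .replace(/[BFPV]/g, "1")
    .replace(/[CGJKQSXZ]/g, "2")
    .replace(/[DT]/g, "3")
    .replace(/L/g, "4")
    .replace(/[MN]/g, "5")
    .replace(/R/g, "6");
}

// method 1 = normal census rule, method 2 = special (pre-1920) variant.
function censusSoundex(name, method) {
  const letters = String(name).toUpperCase().replace(/[^A-Z]/g, "");
  if (letters === "") return "";

  let body = letters;
  if (method === 1) {
    // H and W do not separate letters of the same group: drop them first.
    body = letters[0] + letters.slice(1).replace(/[HW]/g, "");
  }

  const collapsed = toDigits(body).replace(/(.)\1+/g, "$1");
  const tail = collapsed.slice(1).replace(/0/g, "");
  return (letters[0] + tail + "000").slice(0, 4);
}

// For 1880 through 1910 lookups: every distinct code worth searching.
function censusSearchCodes(name) {
  return Array.from(new Set([censusSoundex(name, 1), censusSoundex(name, 2)]));
}

console.log(censusSoundex("Ashcroft", 1));  // A261
console.log(censusSoundex("Ashcroft", 2));  // A226
console.log(censusSearchCodes("Asicroft")); // [ 'A226' ]
```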

For our SoundEx function code, we will use a parameter called CensusOption. We will call the normal method of calculating the census SoundEx method 1. We will call the incorrect special method sometimes used in pre-1920 censuses method 2. If we don't need census compatible codes, we can instruct our function to generate codes using our enhancements by setting CensusOption to 0 (or just not including it at all in languages that permit). Here it is in a table.

CensusOption   SoundEx Code Returned
0              Not census codes: enhanced SoundEx as documented here
1              Normal census codes: used in all censuses including 1920
2              Special census codes: used intermittently in the 1880,* 1900, and 1910 censuses

*1890 census records were destroyed in a fire.

Also of importance for those searching through census records: Often census takers would leave off the prefix in names that had prefixes when producing SoundEx codes (nobody's really sure why). In this case the SoundEx for the name VanDrake (for example) might have the soundex calculated only for 'Drake'.
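
One way to allow for dropped prefixes when searching is to generate each spelling that should be encoded. The sketch below is an illustration under stated assumptions: the function name prefixVariants is hypothetical, and the prefix list is made up for the example (only "Van" appears in this article); it is not taken from census documentation.

```javascript
// ASSUMPTION: an illustrative prefix list, not an official one.
const ASSUMED_PREFIXES = ["VAN", "VON", "DE"];

// Return the surname as written plus the surname with a leading
// prefix removed, so both can be encoded and searched.
function prefixVariants(surname) {
  const upper = String(surname).toUpperCase().replace(/[^A-Z]/g, "");
  const variants = [upper];
  for (const prefix of ASSUMED_PREFIXES) {
    if (upper.startsWith(prefix) && upper.length > prefix.length) {
      variants.push(upper.slice(prefix.length));
    }
  }
  return variants;
}

console.log(prefixVariants("VanDrake")); // [ 'VANDRAKE', 'DRAKE' ]
console.log(prefixVariants("Drake"));    // [ 'DRAKE' ]
```

Each variant would then be passed to whichever census SoundEx method applies to the year being searched.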

One more thing - Out on the web, some are saying the incorrect "special" census codes documented here are actually the correctly done census codes, while those that followed the special rule for 'H' and 'W' were the ones done in error. They are wrong. Fortunately though, you don't have to decide which side to believe (but, still, have I ever lied to you? :-) ).

Simple logic should provide enough evidence to allow you to draw your own conclusion. Consider...

  • In order to accept the argument that: 'codes that lack a special rule for 'H' and 'W' are the "correct" codes', you would have to believe that those census takers who "mistakenly" followed the special rule for 'H' and 'W' all spontaneously made up the exact same non-existent rule for 'H' and 'W'.

    In other words, with such an argument, you're not asked to believe they accidentally ignored a special rule for 'H' and 'W'. Instead, you're asked to believe they all independently created the exact same errant rule out of whole-cloth, and then followed it. That's many different individuals in different states...

If logic makes your eyes glaze over, don't worry. A very convincing appeal to authority exists: No less than the U.S. National Archives (those who maintain census data) agrees with the position documented here. :-) See the Related Reading section below for a link.






SoundEx Limitations

SoundEx acts as a bridge between the fuzzy and inexact process of human vocal interaction, and the concise true/false processes at the foundation of computer communication. As such, SoundEx is an inherently unreliable interface.

For this reason, SoundEx is only usable in applications that can tolerate high false positives (when words that don't match the sound of the inquiry are returned) and high false negatives (when words that match the sound of the inquiry are NOT returned).
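
Both kinds of error are easy to reproduce. The sketch below uses a compact, hypothetical basicSoundex helper that follows the outline given earlier (it is not the article's full function) to show one false positive and one false negative.

```javascript
// Compact sketch of the basic outline (hypothetical helper name).
function basicSoundex(word) {
  const letters = String(word).toUpperCase().replace(/[^A-Z]/g, "");
  if (letters === "") return "";
  const codes = letters
    .replace(/[AEIOUHWY]/g, "0")
    .replace(/[BFPV]/g, "1")
    .replace(/[CGJKQSXZ]/g, "2")
    .replace(/[DT]/g, "3")
    .replace(/L/g, "4")
    .replace(/[MN]/g, "5")
    .replace(/R/g, "6");
  const tail = codes.replace(/(.)\1+/g, "$1").slice(1).replace(/0/g, "");
  return (letters[0] + tail + "000").slice(0, 4);
}

// False positive: different-sounding names that share one code.
console.log(basicSoundex("Smith"), basicSoundex("Sand"));   // S530 S530

// False negative: same-sounding names that get different codes,
// because the first letter is always preserved.
console.log(basicSoundex("Knight"), basicSoundex("Night")); // K523 N230
```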

This limitation is true even of the best SoundEx improvement techniques available. As long as you accept and honor this limitation, SoundEx and its derivatives can be a very useful tool in helping to improve the quality and usefulness of databases.

In many instances, unreliable interfaces are used as a foundation, upon which a reliable layer may be built. Interfaces that build a reliable layer, based on context, over a SoundEx foundation may also be possible.




Permissions

This article is © Copyright, Creativyst, Inc. 2002 - 2013 ALL RIGHTS RESERVED.

Permissions printed over code, DTD, and schema files are supported as our permission statement for these constructs. Specifically, the C, JavaScript, Perl, and VB code, -and ONLY the code-, printed in this paper may be copied and modified freely under Berkeley style open licensing as described in the comments of each code example.

Links to this paper are always welcome.

However, you may not copy, modify, or distribute this article without first obtaining express written permission from Creativyst, Inc. Those wishing to obtain permission to distribute this paper or derivatives in any form should contact Creativyst.

 
 

 


Source Code

Below you will find a set of SoundEx functions that produce identical results based on identical inputs. They are currently provided in the following languages:

C, JavaScript, Perl, VB

These functions are provided under Berkeley styled licensing (included above each function). If you'd like to contribute this function in your own favorite language under your own similar licensing, please let me know.

The function they perform will return SoundEx codes as produced in the census, or enhanced by techniques documented in this article. The JavaScript version of the function is identical to the one used in the example SoundEx converter form provided below.

Note on internationalization:
These functions are only for words that use characters from the first seven bits of UTF-8 (traditionally called 7-bit ASCII).

Beware of SoundEx functions on the web that claim to be international. Some will claim, and truly believe, that their code is international simply because they've used the standard (internationalized) library functions to handle strings. This assumption is often simply a mistake made by a real programmer who is an El Niño (stretching a little for the pun there :-) sorry).




SoundEx in C  

 /*
  * v 1.0e  NEEDS TESTING
  * -----------------------
  *
  * The following SoundEx function is:
  * 
  *    (C) Copyright 2002 - 2013, Creativyst, Inc.
  *               ALL RIGHTS RESERVED
  * 
  * For more information go to:
  *           
  * or email:
  *           Support@Creativyst.com
  * 
  * Redistribution and use in source and binary 
  * forms, with or without modification, are 
  * permitted provided that the following conditions 
  * are met: 
  * 
  *   1. Redistributions of source code must 
  *      retain the above copyright notice, this 
  *      list of conditions and the following 
  *      disclaimer. 
  * 
  *   2. Redistributions in binary form must 
  *      reproduce the above copyright notice, 
  *      this list of conditions and the 
  *      following disclaimer in the 
  *      documentation and/or other materials 
  *      provided with the distribution. 
  * 
  *   3. All advertising materials mentioning 
  *      features or use of this software must 
  *      display the following acknowledgement: 
  *      This product includes software developed 
  *      by Creativyst, Inc. 
  * 
  *   4. The name of Creativyst, Inc. may not be 
  *      used to endorse or promote products 
  *      derived from this software without 
  *      specific prior written permission. 
  * 
  * THIS SOFTWARE IS PROVIDED BY CREATIVYST CORPORATION
  * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
  * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
  * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
  * THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, 
  * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, 
  * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS 
  * OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
  * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY
  * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
  * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
  *
  *
  * ------------------
  * ------------------
  * FUNCTION NOTES:
  *  1. To avoid all possibility of overwrites make
  *     sure *SoundEx points to a buffer with at least
  *     11 bytes of storage.
  *
  *  2. This function is for 7/8-bit ASCII characters.
  *     Modifications are required for UTF16/32, or for
  *     anything other than the first 7-bits of utf-8.
  *
  *  3. For those embedded guys who will understand this:
  *     This is a true library-grade (i.e. re-usable) function,
  *     meaning it has no dependencies on outside functions
  *     and requires no non-standard libraries be linked in
  *     order for it to work. In this case, since it doesn't
  *     even require the standard C library, it is what C99
  *     (I think) calls a: strictly conforming freestanding
  *     function.
  *
 */
  int SoundEx(char *SoundEx,
              char *WordString,
              int   LengthOption,
              int   CensusOption)
  {
      int  InSz = 31;
      char WordStr[32];     /* one bigger than InSz */
      int  SoundExLen, WSLen, i;
      char FirstLetter, *p, *p2;

      SoundExLen = WSLen = 0;
      SoundEx[0] = 0;

      if(CensusOption) {
        LengthOption = 4;
      }

      if(LengthOption) {
          SoundExLen = LengthOption;
      }
      if(SoundExLen > 10) {
          SoundExLen = 10;
      }
      if(SoundExLen < 4) {
          SoundExLen = 4;
      }

      if(!WordString) {
          return(0);
      }

      /* Copy WordString to WordStr
       * without using funcs from other
       * libraries.
      */
      for(p = WordString,p2 = WordStr,i = 0;(*p);p++,p2++,i++) {
        if(i >= InSz) break;
        (*p2) = (*p);
      }
      (*p2) = 0;



      /* Convert WordStr to
       * upper-case, without using funcs
       * from other libraries
      */
      for(p = WordStr;(*p);p++) {
        if( (*p) >= 'a' && (*p) <= 'z' ) {
            (*p) -= 0x20;
        }
      }


      /* convert all non-alpha
       * chars to spaces
      */
      for(p = WordStr;(*p);p++) {
        if( (*p) < 'A' || (*p) > 'Z' ) {
            (*p) = ' ';
        }
      }

      /* Remove leading spaces
      */
      for(i = 0, p = p2 = WordStr;(*p);p++) {
          if(!i) {
              if( (*p) != ' ' ) {
                  (*p2) = (*p);
                  p2++;
                  i++;
              }
          }
          else {
              (*p2) = (*p);
              p2++;
          }
      }
      (*p2) = 0;

      /* Get length of WordStr
      */
      for(i = 0,p = WordStr;(*p);p++) i++;


      /* Remove trailing spaces
       * (WordStr[i] is the terminator, so
       * test the character just before it)
      */
      for(;i;i--) {
          if(WordStr[i-1] == ' ') {
            WordStr[i-1] = 0;
          }
          else {
              break;
          }
      }

      /* Get length of WordStr
      */
      for(WSLen = 0,p = WordStr;(*p);p++) WSLen++;

      if(!WSLen) {
        return(0);
      }



      /* Perform our own multi-letter
       * improvements
       *
       * underscore placeholders (_) will be
       * removed below.
      */
      if(!CensusOption) {
          if(WordStr[0] == 'P' && WordStr[1] == 'S') {
              WordStr[0] = '_';
          }

          if(WordStr[0] == 'P' && WordStr[1] == 'F') {
              WordStr[0] = '_';
          }

          /* v1.0e gh is G-sound at start of word
          */
          if(WordStr[0] == 'G' && WordStr[1] == 'H') {
              WordStr[1] = '_';
          }

          for(i = 0;i < WSLen;i++) {
              if(WordStr[i] == 'D' && WordStr[i+1] == 'G') {
                  WordStr[i] = '_';
                  i++;
                  continue;
              }
              if( WordStr[i] == 'G' && WordStr[i+1] == 'H') {
                  WordStr[i] = '_';
                  i++;
                  continue;
              }
              if(WordStr[i] == 'K' && WordStr[i+1] == 'N') {
                  WordStr[i] = '_';
                  i++;
                  continue;
              }
              if(WordStr[i] == 'G' && WordStr[i+1] == 'N') {
                  WordStr[i] = '_';
                  i++;
                  continue;
              }
              if(WordStr[i] == 'M' && WordStr[i+1] == 'B') {
                  WordStr[i+1] = '_';
                  i++;
                  continue;
              }

              if(WordStr[i] == 'P' && WordStr[i+1] == 'H') {
                  WordStr[i] = 'F';
                  WordStr[i+1] = '_';
                  i++;
                  continue;
              }
              if(WordStr[i]  ==  'T'  &&
                 WordStr[i+1] == 'C' &&
                 WordStr[i+2] == 'H'
                ) {

                  WordStr[i] = '_';
                  i++; i++;
                  continue;
              }
              if(WordStr[i] == 'M' && WordStr[i+1] == 'P'
                 && (WordStr[i+2] == 'S' ||
                     WordStr[i+2] == 'T' ||
                     WordStr[i+2] == 'Z')
                ) {
                  WordStr[i+1] = '_';
                  i++;
              }
          }
      } /* end if(!CensusOption) */



      /* squeeze out underscore characters
       * added as a byproduct of above process
       * (only needed in c styled replace)
      */
      for(p = p2 = WordStr;(*p);p++) {
        (*p2) = (*p);
        if( (*p2) != '_' ) {
            p2++;
        }
      }
      (*p2) = 0;





      /* This must be done AFTER our
       * multi-letter replacements
       * since they could change
       * the first letter
      */
      FirstLetter = WordStr[0];


      /* In case we're in CensusOption
       * 1 and the word starts with
       * an 'H' or 'W'
       *  (v1.0c djr: add test for H or W)
      */
      if(FirstLetter == 'H' || FirstLetter == 'W') {
          WordStr[0] = '-';
      }



      /* In properly done census
       * SoundEx, the H and W will
       * be squeezed out before
       * performing the test
       * for adjacent digits
       * (this differs from how
       * 'real' vowels are handled)
      */
      if(CensusOption == 1) {
          for(p = &(WordStr[1]);(*p);p++) {
             if((*p) == 'H' || (*p) == 'W') {
                 (*p) = '.';
             }
          }
      }





      /* Perform classic SoundEx
       * replacements.
      */
      for(p = WordStr;(*p);p++) {
          if( (*p) == 'A'   ||
              (*p) == 'E'   ||
              (*p) == 'I'   ||
              (*p) == 'O'   ||
              (*p) == 'U'   ||
              (*p) == 'Y'   ||
              (*p) == 'H'   ||
              (*p) == 'W'
            ){
              (*p) = '0';   /* zero */
          }
          if( (*p) == 'B'   ||
              (*p) == 'P'   ||
              (*p) == 'F'   ||
              (*p) == 'V'
            ){
              (*p) = '1';
          }
          if( (*p) == 'C'   ||
              (*p) == 'S'   ||
              (*p) == 'G'   ||
              (*p) == 'J'   ||
              (*p) == 'K'   ||
              (*p) == 'Q'   ||
              (*p) == 'X'   ||
              (*p) == 'Z'
            ){
              (*p) = '2';
          }
          if( (*p) == 'D'   ||
              (*p) == 'T'
            ){
              (*p) = '3';
          }
          if( (*p) == 'L' ) {
              (*p) = '4';
          }

          if( (*p) == 'M'   ||
              (*p) == 'N'
            ){
              (*p) = '5';
          }
          if( (*p) == 'R' ) {
              (*p) = '6';
          }
      }
      /* soundex replacement loop done  */




      /* In properly done census
       * SoundEx, the H and W will
       * be squeezed out before
       * performing the test
       * for adjacent digits
       * (this differs from how
       * 'real' vowels are handled)
      */
      if(CensusOption == 1) {
          /* squeeze out dots
          */
          for(p = p2 = &WordStr[1];(*p);p++) {
            (*p2) = (*p);
            if( (*p2) != '.' ) {
                p2++;
            }
          }
          (*p2) = 0;
      }



      /* squeeze out extra equal adjacent digits
       * (as of v1.0c djr this includes the
       * first letter's code)
      */
      for(p = p2 = &(WordStr[0]);(*p);p++) {
        (*p2) = (*p);
        if( (*p2) != p[1] ) {
            p2++;
        }
      }
      (*p2) = 0;



      /* squeeze out spaces and zeros
       * Leave the first letter code
       * to be replaced below.
       * (In case it made a zero)
      */
      for(p = p2 = &WordStr[1];(*p);p++) {
        (*p2) = (*p);
        if( (*p2) != ' ' && (*p2) != '0' ) {
            p2++;
        }
      }
      (*p2) = 0;



      /* Get length of WordStr
      */
      for(WSLen = 0,p = WordStr;(*p);p++) WSLen++;


      /* Right pad with zero characters
      */
      for(i = WSLen;i < SoundExLen;i++ ) {
          WordStr[i] = '0';
      }

      /* Size to taste
      */
      WordStr[SoundExLen] = 0;


      /* Replace first digit with
       * first letter.
      */
      WordStr[0] = FirstLetter;


      /* Copy WordStr to SoundEx
      */
      for(p2 = SoundEx,p = WordStr;(*p);p++,p2++) {
          (*p2) = (*p);
      }
      (*p2) = 0;
      
      return(SoundExLen);
  }


 



SoundEx in JavaScript  


 /*
  * v 1.0e  NEEDS TESTING
  * -----------------------
  *
  * The following SoundEx function is:
  * 
  *    (C) Copyright 2002 - 2013, Creativyst, Inc.
  *               ALL RIGHTS RESERVED
  * 
  * For more information go to:
  *           
  * or email:
  *           Support@Creativyst.com
  * 
  * Redistribution and use in source and binary 
  * forms, with or without modification, are 
  * permitted provided that the following conditions 
  * are met: 
  * 
  *   1. Redistributions of source code must 
  *      retain the above copyright notice, this 
  *      list of conditions and the following 
  *      disclaimer. 
  * 
  *   2. Redistributions in binary form must 
  *      reproduce the above copyright notice, 
  *      this list of conditions and the 
  *      following disclaimer in the 
  *      documentation and/or other materials 
  *      provided with the distribution. 
  * 
  *   3. All advertising materials mentioning 
  *      features or use of this software must 
  *      display the following acknowledgement: 
  *      This product includes software developed 
  *      by Creativyst, Inc. 
  * 
  *   4. The name of Creativyst, Inc. may not be 
  *      used to endorse or promote products 
  *      derived from this software without 
  *      specific prior written permission. 
  * 
  * THIS SOFTWARE IS PROVIDED BY CREATIVYST CORPORATION
  * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
  * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
  * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
  * THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, 
  * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, 
  * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS 
  * OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
  * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY
  * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
  * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
  *
 */
  function SoundEx(WordString, LengthOption, CensusOption)
  {
      var TmpStr;
      var WordStr = "";
      var CurChar;
      var LastChar;
      var SoundExLen = 10;
      var WSLen;
      var FirstLetter;

      if(CensusOption) {
          LengthOption = 4;
      }

      if(LengthOption != undefined) {
          SoundExLen = LengthOption;
      }
      if(SoundExLen > 10) {
          SoundExLen = 10;
      }
      if(SoundExLen < 4) {
          SoundExLen = 4;
      }

      if(!WordString) {
          return("");
      }

      WordString = WordString.toUpperCase();

      /* Clean and tidy
      */
      WordStr = WordString;


      WordStr = WordStr.replace(/[^A-Z]/gi, " "); // rpl non-chars w space
      WordStr = WordStr.replace(/^\s*/g, "");     // remove leading space
      WordStr = WordStr.replace(/\s*$/g, "");     // remove trailing space



      /* Some of our own improvements
      */
      if(!CensusOption) {

          /* v1.0e: GH at beginning of word has G-sound (e.g., ghost)
          */
          WordStr = WordStr.replace(/^GH/g, "G");   // Chng leadng GH to G

          WordStr = WordStr.replace(/DG/g, "G");     // Change DG to G
          WordStr = WordStr.replace(/GH/g, "H");     // Change GH to H
          WordStr = WordStr.replace(/GN/g, "N");     // Change GN to N
          WordStr = WordStr.replace(/KN/g, "N");     // Change KN to N
          WordStr = WordStr.replace(/PH/g, "F");     // Change PH to F
          WordStr =
              WordStr.replace(/MP([STZ])/g, "M$1"); // MP if fllwd by ST|Z
          WordStr = WordStr.replace(/^PS/g, "S");   // Chng leadng PS to S
          WordStr = WordStr.replace(/^PF/g, "F");   // Chng leadng PF to F
          WordStr = WordStr.replace(/MB/g, "M");    // Chng MB to M
          WordStr = WordStr.replace(/TCH/g, "CH");  // Chng TCH to CH
      }





      /* The above improvements may
       * have changed this first letter
      */
      FirstLetter = WordStr.substr(0,1);


      /* in case 1st letter is
       * an H or W and we're in
       * CensusOption = 1
      */
      if(FirstLetter == "H" || FirstLetter == "W") {
          TmpStr = WordStr.substr(1);
          WordStr = "-";
          WordStr += TmpStr;
      }


      /* In properly done census
       * SoundEx the H and W will
       * be squeezed out before
       * performing the test
       * for adjacent digits
       * (this differs from how
       * 'real' vowels are handled)
      */
      if(CensusOption == 1) {
          WordStr = WordStr.replace(/[HW]/g, ".");
      }


      /* Begin Classic SoundEx
      */
      WordStr =  WordStr.replace(/[AEIOUYHW]/g, "0");
      WordStr = WordStr.replace(/[BPFV]/g, "1");
      WordStr = WordStr.replace(/[CSGJKQXZ]/g, "2");
      WordStr = WordStr.replace(/[DT]/g, "3");
      WordStr = WordStr.replace(/[L]/g, "4");
      WordStr = WordStr.replace(/[MN]/g, "5");
      WordStr = WordStr.replace(/[R]/g, "6");



      /* Properly done census:
       * squeeze H and W out
       * before doing adjacent
       * digit removal.
      */
      if(CensusOption == 1) {
          WordStr = WordStr.replace(/\./g, "");
      }



      /* Remove extra equal adjacent digits
      */
      WSLen = WordStr.length;
      LastChar = "";
      TmpStr = "";
      // removed v10c djr:  TmpStr = "-";  /* rplcng skipped first char */

      for(i = 0; i < WSLen; i++) {
          CurChar = WordStr.charAt(i);
          if(CurChar == LastChar) {
              TmpStr += " ";
          }
          else {
              TmpStr += CurChar;
              LastChar = CurChar;
          }
      }
      WordStr = TmpStr;


      WordStr = WordStr.substr(1);          /* Drop first letter code   */
      WordStr = WordStr.replace(/\s/g, ""); /* remove spaces            */
      WordStr = WordStr.replace(/0/g, "");  /* remove zeros             */
      WordStr += "0000000000";              /* pad with zeros on right  */

      WordStr = FirstLetter + WordStr;      /* Add first letter of word */

      WordStr = WordStr.substr(0,SoundExLen); /* size to taste     */

      return(WordStr);
  }
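
The "improvements" step in the listing above can also be read as an ordered table of rewrite rules. The following is only a sketch of that idea, not the function used on this page: the names ENHANCEMENT_RULES and applyEnhancements are made up for illustration, and the example words in the comments are my own. The rule order copies the JavaScript listing, and it matters (leading GH must be handled before the general GH rule).

```javascript
// Sketch of the non-census "enhancement" pass: rewrite silent or
// combined letter groups before the classic digit mapping runs.
// The table layout and both names are assumptions for illustration.
const ENHANCEMENT_RULES = [
  [/^GH/, "G"],          // leading GH has a G-sound (ghost)
  [/DG/g, "G"],          // DG -> G (edge)
  [/GH/g, "H"],          // any remaining GH -> H (knight)
  [/GN/g, "N"],          // GN -> N (gnome)
  [/KN/g, "N"],          // KN -> N (knee)
  [/PH/g, "F"],          // PH -> F (phone)
  [/MP([STZ])/g, "M$1"], // drop P in MPS, MPT, MPZ (Thompson)
  [/^PS/, "S"],          // leading PS -> S (psychology)
  [/^PF/, "F"],          // leading PF -> F (Pfister)
  [/MB/g, "M"],          // MB -> M (lamb)
  [/TCH/g, "CH"],        // TCH -> CH (match)
];

function applyEnhancements(word) {
  // Uppercase, blank out non-letters, and trim, as the full function does.
  let s = word.toUpperCase().replace(/[^A-Z]/g, " ").trim();
  for (const [pattern, replacement] of ENHANCEMENT_RULES) {
    s = s.replace(pattern, replacement);
  }
  return s;
}

console.log(applyEnhancements("Knight"));     // NIHT
console.log(applyEnhancements("psychology")); // SYCHOLOGY
console.log(applyEnhancements("ghost"));      // GOST
```

Because the first letter of the result is what ends up at the front of the code, "Knight" is filed under N rather than K in the enhanced mode.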


 



[top]
SoundEx in Perl  

  # v 1.0e  NEEDS TESTING
  # -----------------------
  # The following SoundEx function is:
  # 
  #    (C) Copyright 2002 - 2013, Creativyst, Inc.
  #               ALL RIGHTS RESERVED
  # 
  # For more information go to:
  #           
  # or email:
  #           Support@Creativyst.com
  # 
  # Redistribution and use in source and binary 
  # forms, with or without modification, are 
  # permitted provided that the following conditions 
  # are met: 
  # 
  #   1. Redistributions of source code must 
  #      retain the above copyright notice, this 
  #      list of conditions and the following 
  #      disclaimer. 
  # 
  #   2. Redistributions in binary form must 
  #      reproduce the above copyright notice, 
  #      this list of conditions and the 
  #      following disclaimer in the 
  #      documentation and/or other materials 
  #      provided with the distribution. 
  # 
  #   3. All advertising materials mentioning 
  #      features or use of this software must 
  #      display the following acknowledgement: 
  #      This product includes software developed 
  #      by Creativyst, Inc. 
  # 
  #   4. The name of Creativyst, Inc. may not be 
  #      used to endorse or promote products 
  #      derived from this software without 
  #      specific prior written permission. 
  # 
  # THIS SOFTWARE IS PROVIDED BY CREATIVYST CORPORATION
  # ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
  # INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
  # PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
  # THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, 
  # INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 
  # DAMAGES (INCLUDING, BUT NOT LIMITED TO, 
  # PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS 
  # OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
  # WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY
  # WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
  # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
  #
  #
  sub SoundEx
  {
      my($WordString, $LengthOption, $CensusOption) = @_;
      my($WordStr, $CurChar, $LastChar, $SoundExLen);
      my($WSLen, $FirstLetter, $TmpStr, $i);

      if($CensusOption) {
          $LengthOption = 4;
      }

      if($LengthOption) {
          $SoundExLen = $LengthOption;
      }
      if($SoundExLen > 10) {
          $SoundExLen = 10;
      }
      if($SoundExLen < 4) {
          $SoundExLen = 4;
      }

      if(!$WordString) {
          return("");
      }

      $WordString = uc($WordString);
      # Clean and tidy
      #
      $WordStr = $WordString;
      $WordStr =~ s/[^A-Z]/ /sig;   # replace non-chars with space
      $WordStr =~ s/^\s*//sg;       # remove leading space
      $WordStr =~ s/\s*$//sg;       # remove trailing space


      # Some of our own improvements
      #
      if(!$CensusOption) {
          # v1.0e: GH at start of word is G-sound (e.g., ghost)
          #
          $WordStr =~ s/^GH/G/sg;         # Change leading GH to G

          $WordStr =~ s/DG/G/sg;          # Change DG to G
          $WordStr =~ s/GH/H/sg;          # Change GH to H
          $WordStr =~ s/KN/N/sg;          # Change KN to N
          $WordStr =~ s/GN/N/sg;          # Change GN to N
          $WordStr =~ s/MB/M/sg;          # Change MB to M
          $WordStr =~ s/PH/F/sg;          # Change PH to F
          $WordStr =~ s/TCH/CH/sg;        # Change TCH to CH
          $WordStr =~ s/MP([STZ])/M$1/sg; # Drop P in MP if followed by S|T|Z
          $WordStr =~ s/^PS/S/sg;         # Change leading PS to S
          $WordStr =~ s/^PF/F/sg;         # Change leading PF to F
      }


      # Done here because the
      # above improvements could
      # change this first letter
      #
      $FirstLetter = substr($WordStr,0,1);


      # in case 1st letter is
      # an H or W and we're in
      # CensusOption = 1
      #  (add test for 'H'/'W' v1.0c djr)
      #
      if($FirstLetter eq "H" || $FirstLetter eq "W") {
          $TmpStr = substr($WordStr,1);
          $WordStr = "-$TmpStr";
      }



      # In properly done census
      # SoundEx: the H and W will
      # be squeezed out before
      # performing the test for
      # adjacent digits
      # (this differs from how
      # the 'real' vowels are
      # handled)
      #
      if($CensusOption == 1) {
          $WordStr =~ s/[HW]/\./sg;
      }


      # Begin Classic SoundEx
      #
      $WordStr =~ s/[AEIOUYHW]/0/sg;
      $WordStr =~ s/[BPFV]/1/sg;
      $WordStr =~ s/[CSGJKQXZ]/2/sg;
      $WordStr =~ s/[DT]/3/sg;
      $WordStr =~ s/L/4/sg;
      $WordStr =~ s/[MN]/5/sg;
      $WordStr =~ s/R/6/sg;


      # Properly done census:
      # squeeze H and W out
      # before doing adjacent
      # digit removal.
      #
      if($CensusOption == 1) {
          $WordStr =~ s/\.//sg;
      }



      # Remove extra equal adjacent digits
      #
      $WSLen = length($WordStr);
      $LastChar = "";
      # v1.0c rmv: $TmpStr = "-";    # rplc skipped 1st char
      $TmpStr = "";
      for($i = 0; $i < $WSLen;$i++) { # v1.0c now org-0
          $CurChar = substr($WordStr,$i,1);
          if($CurChar eq $LastChar) {
              $TmpStr .= " ";
          }
          else {
              $TmpStr .= $CurChar;
              $LastChar = $CurChar;
          }
      }
      $WordStr = $TmpStr;


      $WordStr = substr($WordStr,1);      # Drop first ltr code
      $WordStr =~ s/\s//sg;               # remove spaces
      $WordStr =~ s/0//sg;                # remove zeros
      $WordStr .= "0000000000";           # pad w/0s on rght

      $WordStr = "$FirstLetter$WordStr";  # Add 1st ltr of wrd

      $WordStr = substr($WordStr,0,$SoundExLen);  # size to taste

      return($WordStr);
  }

 



[top]
SoundEx in VB  

'
' v 1.0d  NEEDS TESTING
' -----------------------
'
' The following SoundEx function is:
'
'    (C) Copyright 2002 - 2013, Creativyst, Inc.
'               ALL RIGHTS RESERVED
'
' For more information go to:
'           
' or email:
'           Support@Creativyst.com
'
' Redistribution and use in source and binary
' forms, with or without modification, are
' permitted provided that the following conditions
' are met:
'
'   1. Redistributions of source code must
'      retain the above copyright notice, this
'      list of conditions and the following
'      disclaimer.
'
'   2. Redistributions in binary form must
'      reproduce the above copyright notice,
'      this list of conditions and the
'      following disclaimer in the
'      documentation and/or other materials
'      provided with the distribution.
'
'   3. All advertising materials mentioning
'      features or use of this software must
'      display the following acknowledgement:
'      This product includes software developed
'      by Creativyst, Inc.
'
'   4. The name of Creativyst, Inc. may not be
'      used to endorse or promote products
'      derived from this software without
'      specific prior written permission.
'
' THIS SOFTWARE IS PROVIDED BY CREATIVYST CORPORATION
' ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
' INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
' WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
' PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
' THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
' INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
' DAMAGES (INCLUDING, BUT NOT LIMITED TO,
' PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
' OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
' HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
' WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
' (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY
' WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
' ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
'
'
'
Function SoundEx( _
                 ByVal WordString As String, _
                 ByVal LengthOption As Integer, _
                 ByVal CensusOption As Integer _
                 ) As String

Dim WordStr As String
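' Each variable gets its own type; in classic VB a shared
' "As String" would apply only to the last name in the list.
Dim b As String, b2 As String, b3 As String, FirstLetter As String
Dim SoundExLen As Integer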
Dim i As Integer


' Sanity
'
If (CensusOption > 0) Then
    LengthOption = 4
End If
If (LengthOption > 0) Then
    SoundExLen = LengthOption
End If
If (SoundExLen > 10) Then
    SoundExLen = 10
End If
If (SoundExLen < 4) Then
    SoundExLen = 4
End If
If (Len(WordString) < 1) Then
    Exit Function
End If
    
    
' Copy to WordStr
' and UpperCase
'
WordStr = UCase(WordString)


' Convert all non-alpha
' chars to spaces. (thanks John)
'
For i = 1 To Len(WordStr)
    b = Mid(WordStr, i, 1)
    If (Not (b Like "[A-Z]")) Then
        WordStr = Replace(WordStr, b, " ")
    End If
Next i


' Remove leading and
' trailing spaces
'
WordStr = Trim(WordStr)

' sanity
'
If (Len(WordStr) < 1) Then
    Exit Function
End If


' Perform our own multi-letter
' improvements
'
' double letters will be effectively
' removed in a later step.
'
If (CensusOption < 1) Then
    b = Mid(WordStr, 1, 1)
    b2 = Mid(WordStr, 2, 1)
    If (b = "P" And b2 = "S") Then
        WordStr = Replace(WordStr, "PS", "S", 1, 1)
    End If
    If (b = "P" And b2 = "F") Then
        WordStr = Replace(WordStr, "PF", "F", 1, 1)
    End If

    ' v1.0e: GH at start of word has G-sound (e.g., ghost)
    '
    If (b = "G" And b2 = "H") Then
        WordStr = Replace(WordStr, "GH", "G", 1, 1)
    End If

    WordStr = Replace(WordStr, "DG", "_G")
    WordStr = Replace(WordStr, "GH", "_H")
    WordStr = Replace(WordStr, "KN", "_N")
    WordStr = Replace(WordStr, "GN", "_N")
    WordStr = Replace(WordStr, "MB", "M_")
    WordStr = Replace(WordStr, "PH", "F_")
    WordStr = Replace(WordStr, "TCH", "_CH")
    WordStr = Replace(WordStr, "MPS", "M_S")
    WordStr = Replace(WordStr, "MPT", "M_T")
    WordStr = Replace(WordStr, "MPZ", "M_Z")
    
End If
' end if(Not CensusOption)
'

' Squeeze out the extra _ letters
' from above (not strictly needed
' in VB but used in C code)
'
WordStr = Replace(WordStr, "_", "")



' This must be done AFTER our
' multi-letter replacements
' since they could change
' the first letter
'
FirstLetter = Mid(WordStr, 1, 1)

' in case first letter is
' an h, a w ...
' we'll change it to something
' that doesn't match anything
'
If (FirstLetter = "H" Or FirstLetter = "W") Then
    b = Mid(WordStr, 2)
    WordStr = "-" + b
End If


' In properly done census
' SoundEx, the H and W will
' be squeezed out before
' performing the test
' for adjacent digits
' (this differs from how
' 'real' vowels are handled)
'
If (CensusOption = 1) Then
    WordStr = Replace(WordStr, "H", ".")
    WordStr = Replace(WordStr, "W", ".")
End If
    
    
' Perform classic SoundEx
' replacements
' Here, we use ';' instead of zero '0'
' because of MS strangeness with leading
' zeros in some applications.
'
WordStr = Replace(WordStr, "A", ";")
WordStr = Replace(WordStr, "E", ";")
WordStr = Replace(WordStr, "I", ";")
WordStr = Replace(WordStr, "O", ";")
WordStr = Replace(WordStr, "U", ";")
WordStr = Replace(WordStr, "Y", ";")
WordStr = Replace(WordStr, "H", ";")
WordStr = Replace(WordStr, "W", ";")

WordStr = Replace(WordStr, "B", "1")
WordStr = Replace(WordStr, "P", "1")
WordStr = Replace(WordStr, "F", "1")
WordStr = Replace(WordStr, "V", "1")

WordStr = Replace(WordStr, "C", "2")
WordStr = Replace(WordStr, "S", "2")
WordStr = Replace(WordStr, "G", "2")
WordStr = Replace(WordStr, "J", "2")
WordStr = Replace(WordStr, "K", "2")
WordStr = Replace(WordStr, "Q", "2")
WordStr = Replace(WordStr, "X", "2")
WordStr = Replace(WordStr, "Z", "2")

WordStr = Replace(WordStr, "D", "3")
WordStr = Replace(WordStr, "T", "3")

WordStr = Replace(WordStr, "L", "4")

WordStr = Replace(WordStr, "M", "5")
WordStr = Replace(WordStr, "N", "5")

WordStr = Replace(WordStr, "R", "6")
'
' End Classic SoundEx replacements
'



' Properly done census:
' squeeze H and W out
' before doing adjacent
' digit removal.
'
If (CensusOption = 1) Then
    WordStr = Replace(WordStr, ".", "")
End If
    


' squeeze out extra equal adjacent digits
' (don't include first letter)
'
b = ""
b2 = ""
' remove from v1.0c djr: b3 = Mid(WordStr, 1, 1)
b3 = ""
For i = 1 To Len(WordStr) ' i=1 (not 2) in v1.0c
    b = Mid(WordStr, i, 1)
    b2 = Mid(WordStr, (i + 1), 1)
    If (Not (b = b2)) Then
        b3 = b3 + b
    End If
Next i

WordStr = b3
If (Len(WordStr) < 1) Then
    Exit Function
End If



' squeeze out spaces and zeros (;)
' Leave the first letter code
' to be replaced below.
' (In case it made a zero)
'
WordStr = Replace(WordStr, " ", "")
b = Mid(WordStr, 1, 1)
WordStr = Replace(WordStr, ";", "")
If (b = ";") Then ' only if it got removed above
    WordStr = b + WordStr
End If



' Right pad with zero characters
'
b = String(SoundExLen, "0")
WordStr = WordStr + b


' Replace first digit with
' first letter
'
WordStr = Mid(WordStr, 2)
WordStr = FirstLetter + WordStr

' Size to taste
'
WordStr = Mid(WordStr, 1, SoundExLen)


' Copy WordStr to SoundEx
'
SoundEx = WordStr

End Function


 



[top]
Related Reading

The following links provide resources for those who'd like to learn more about the SoundEx algorithm, its limitations, enhancements, and uses.



[top]
SoundEx Converter Form

The following form uses the JavaScript SoundEx function presented above to produce a SoundEx code for any word you enter into the field labeled "Input Word:". It returns the SoundEx codes for three different algorithms in the fields below. You may also enter a size for the output code within limits (4 to 10 characters).

 
Creativyst® SoundExR
Input Word:  
Size:  

 
0: Enhanced SoundEx
As documented in this article about SoundEx.

1: Normal Census
Properly calculated SoundEx codes found in all census years.

2: Special Census
Improperly calculated SoundEx codes found in SOME of the censuses performed in 1880, 1900, and 1910.

 

To test differences in the enhanced SoundEx option, use words and names like knightridder, psychology, and Pflanders. The word Knight will demonstrate two different enhancements as you type it. To see the differences between the two census options, try Ashcroft.
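
To make the Ashcroft difference concrete, here is a small sketch that isolates the one rule separating the two census options. It is a condensed illustration and not the function behind the form: the name censusSoundEx and the squeezeHW flag are hypothetical, with true standing in for CensusOption 1 (normal) and false for CensusOption 2 (special), and the code length is fixed at 4.

```javascript
// Minimal census-style SoundEx, written only to contrast the two
// census variants. `squeezeHW` is a hypothetical flag, not a
// parameter of the article's function.
function censusSoundEx(name, squeezeHW) {
  const CODES = {
    B: "1", P: "1", F: "1", V: "1",
    C: "2", S: "2", G: "2", J: "2", K: "2", Q: "2", X: "2", Z: "2",
    D: "3", T: "3",
    L: "4",
    M: "5", N: "5",
    R: "6",
  };
  const letters = name.toUpperCase().replace(/[^A-Z]/g, "");
  if (letters === "") return "";

  const first = letters.charAt(0);
  // Code of the first letter; "0" stands for vowels, Y, H, and W.
  let lastCode = CODES[first] || "0";
  let digits = "";

  for (const ch of letters.slice(1)) {
    if (squeezeHW && (ch === "H" || ch === "W")) {
      continue; // normal census: H and W do not separate equal codes
    }
    // Vowels (and H/W in the special variant) act as separators.
    const code = CODES[ch] || "0";
    if (code !== "0" && code !== lastCode) {
      digits += code;
    }
    lastCode = code;
  }
  return (first + digits + "000").slice(0, 4);
}

console.log(censusSoundEx("Ashcroft", true));  // A261
console.log(censusSoundEx("Ashcroft", false)); // A226
```

In Ashcroft the S and C share code 2 and are separated only by an H. Squeezing the H out first makes them adjacent, so they collapse to a single 2; treating the H like a vowel keeps them apart and both are coded.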




[top]
Contribute

If you've written a SoundEx function in another language and would like to share it as part of this article, please contact me. If used, your code will be attributed to you with a link to your site.
















 
© Copyright 2002 - 2013 Creativyst, Inc.
ALL RIGHTS RESERVED

Written by Dominic John Repici

Note: Use of this information as the basis for your own article without citing it is dishonest and constitutes plagiarism. If you base any part of your own article on information from this article PLEASE CITE IT!


Thanks for your help & contributions:
      John Nye (Tool guy) for finding bugs in Perl code
      Dmitry S. Denisov for bug-finding (thanks!)
      Mark Westenberger (mjw -at: igs.net) for finding bugs in C code



 tests:

    v1.0e:
      Gh bug (non-census extended code): If Gh begins word it should
      not be replaced with 'H' only.

        I first noticed this ghastly little ghoul while visiting a ghetto
        in Ghana. I was sitting down to a meal, munching on some Gherkins,
        when I was approached by a Ghazi. He pulled out his sword and I was
        terrified (for you see, Jesus is my Lord and Savior). Imagine my
        relief when he handed the sword to his Ghillie. They had just
        caught some fish and wanted to share them and my gherkins for
        lunch.
        
        Once his ghillie had cleaned the fish and returned from the ghat
        we all sat down to a meal. The Ghazi told me he has been questioning
        his faith. I replied that we all go through times of questioning our
        faith and that such times only help to clarify that faith, like a ghee.
        In so doing our faith becomes stronger, I continued. Such times of
        questioning serve —not only— to help prove our own faith through
        clarity, but they also improve the institutions who claim to be the
        keepers of our faith. For you see, while they exercise authority over
        us, they also need us in order to have relevance. This dynamic is
        comparable to a subtle, and ongoing (if tacit) ghareo.

        I'm not sure if it was because I helped him find his faith, or because
        he simply disagreed, but at that moment he nodded to his ghillie, who
        swatted at me with the sword. Perhaps they just thought I was eating
        more than my share of the fish...
        
        AFTER-WORD: I got away with just a scratch (a director's "cut" is NOT
        available). Now you know why I'm a programmer and not a ghostwriter.
        dammit jim.

    v1.0d:
      Include test-strings with non-alpha characters.

    v1.0c:
      Schmit      S530 (not S253)
      Schneider   S536 (not S253)
      Lloyd       L300 (not L430)
      Pfister     P236

    v1.0b:
      Holden      (first letter is 'H')
      Write       (first letter is 'W' followed by #6 letter)
      White       (oooh, H/W together in 1st letter)
      Htacky      H320
      Atacky      A320
      ashcroft    (to test normal H squeezes)
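
The entries above that list an expected code can be turned into a small table-driven check. This is a sketch under assumptions: findMismatches and CENSUS_CASES are names invented here, the (word, length, censusOption) argument order is taken from the Perl and VB listings, and running the cases with censusOption = 1 is my reading of the notes (Pfister only yields P236 when the enhancements are switched off).

```javascript
// Expected codes copied from the v1.0c and v1.0b notes above
// (only the entries that state a code).
const CENSUS_CASES = [
  ["Schmit", "S530"],
  ["Schneider", "S536"],
  ["Lloyd", "L300"],
  ["Pfister", "P236"],
  ["Htacky", "H320"],
  ["Atacky", "A320"],
];

// `encode` is any function called as encode(word, length, censusOption).
// Returns one record per case whose result differs from the expected code.
function findMismatches(encode, cases) {
  const failures = [];
  for (const [word, expected] of cases) {
    const actual = encode(word, 4, 1); // length 4, normal census (assumed)
    if (actual !== expected) {
      failures.push({ word, expected, actual });
    }
  }
  return failures;
}

// Usage, assuming the page's JavaScript SoundEx function is in scope:
//   console.log(findMismatches(SoundEx, CENSUS_CASES));
// An empty array means every listed case matched.
```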