
Alphabetization




Alphabetization is the process of ordering headings in an index.

In the past, the terms filing and sorting were generally used to denote the process. However, those terms are widely used in other fields in which they have connotations that may be inappropriate in the field of indexing. For example: in other fields, the distinction between upper and lower case letters is often important in sorting; in indexing, it never is. Therefore, to prevent potential misunderstandings, indexers generally now restrict themselves to the term alphabetization when speaking or writing about the process of ordering headings.

Also in the past, alphabetization was guided not only by the characters, numbers and symbols in the heading, but also by complicated and arcane rules based on such factors as the heading's pronunciation or meaning. Those rules were an obstacle to computerized alphabetization. Because the benefits of computerization could not be passed up, rules and standards for alphabetization had been revised by 1980. As a result, pronunciation and meaning are ignored today in alphabetization. Ignoring them reduces the apparent inconsistencies that once confused users and indexers alike.

The results of alphabetization depend on how the words in a heading are treated. For purposes of alphabetization, a word is a text string delimited by spaces or commas. A heading consisting of two or more words is called a compound heading. The words in a compound heading are separated by a space and perhaps other punctuation as well.

In word-by-word alphabetization, each word in a compound heading is alphabetized separately and in succession. If the first words in the headings are equivalent, the second words are compared. If the second are equivalent, the third are compared, and so on until the headings are distinguished from one another. In this process, the space character is sorted, and it is assigned a value lower than that assigned to any letter. The primacy of the space character explains why word-by-word alphabetization is often called the nothing before something system. In international standards, dashes, hyphens and slashes are assigned the same value as the space character, while other punctuation marks, such as commas, apostrophes and single or double quotation marks, are ignored. In the Chicago Manual of Style, an important authority in the American publishing community, hyphens, slashes and apostrophes are ignored because they are considered to be characters that continue a word. As a result, when the international standards are followed, displeasure counts as one word but dis-pleasure counts as two. When the Chicago Manual of Style is followed, they both count as one word.
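The difference in word boundaries can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an implementation of either standard: the function names are hypothetical, headings are assumed to contain only unaccented letters, digits and punctuation, and the Chicago variant simply drops commas as well, which the description above does not specify.

```python
import re

def words_international(heading: str) -> list[str]:
    """Split a heading into words; dashes, hyphens and slashes act as spaces."""
    text = heading.lower()
    # Hyphen, en dash, em dash and slash get the same value as the space.
    text = re.sub(r"[-\u2013\u2014/]", " ", text)
    # Other punctuation (commas, apostrophes, quotation marks) is ignored.
    text = re.sub(r"[^a-z0-9 ]", "", text)
    return text.split()

def words_chicago(heading: str) -> list[str]:
    """Split a heading into words; hyphens, slashes and apostrophes continue a word."""
    text = heading.lower()
    # ASSUMPTION: all punctuation is dropped, so only spaces separate words.
    text = re.sub(r"[^a-z0-9 ]", "", text)
    return text.split()

print(words_international("dis-pleasure"))  # two words
print(words_chicago("dis-pleasure"))        # one word

# Lists compare element by element, and a shorter list that is a prefix of a
# longer one sorts first, which gives "nothing before something".
print(sorted(["soulful", "soul food", "soul"], key=words_international))
```

Using the list of words as the sort key means the comparison moves on to the second word only when the first words are equal, which mirrors the succession described above.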

Different standards also exist for letter-by-letter alphabetization, which is sometimes called the all-through system because, in strict letter-by-letter alphabetization, compound headings are treated as if they were one word run together and all characters other than letters and numerals are ignored. That is a simple rule, but indexes in American books are often based on the more complicated rules appearing in the Chicago Manual of Style. In the Chicago variant, the first comma or parenthesis causes alphabetization to begin again, and serial commas and all other punctuation are ignored. These rules serve primarily to keep surnames together, causing people to be distinguished by their forenames. The two letter-by-letter systems may produce different results given the same headings, as seen in this example:

Strict letter-by-letter          Chicago-style letter-by-letter
Jones, Adam                      Jones, Adam
Jonesboro                        Jones, Otis A. (1896-1963)
Jones Mountain                   Jones, Otis A. (1924-1989)
Jones, Nathan, and Fry           Jones, Otis Augustus
Jones, Otis A. (1896-1963)       Jonesboro
Jones, Otis A. (1924-1989)       Jones Mountain
Jones, Otis Augustus             Jones, Nathan, and Fry
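The two orderings in the example can be reproduced with a pair of sort keys. The sketch below is illustrative rather than a faithful encoding of the Chicago rules: the function names are hypothetical, the alphabet is assumed to be unaccented, the key restarts at every comma or opening parenthesis rather than only the first, and a heading is guessed to contain serial commas whenever it includes ", and ". That guess is a heuristic chosen for this example only.

```python
import re

def _letters_and_digits(text: str) -> str:
    """Lower-case the text and drop everything except letters and numerals."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def strict_key(heading: str) -> str:
    """Strict letter-by-letter: the heading is read as one run-together word."""
    return _letters_and_digits(heading)

def chicago_key(heading: str) -> tuple[str, ...]:
    """Chicago-style letter-by-letter: restart at a comma or parenthesis."""
    # ASSUMPTION (heuristic): ", and " marks serial commas, which are ignored,
    # so the whole heading is treated as a single segment.
    if re.search(r",\s+and\s", heading):
        return (_letters_and_digits(heading),)
    segments = re.split(r"[,(]", heading)
    return tuple(k for k in map(_letters_and_digits, segments) if k)

HEADINGS = [
    "Jones Mountain",
    "Jones, Otis Augustus",
    "Jonesboro",
    "Jones, Otis A. (1924-1989)",
    "Jones, Adam",
    "Jones, Nathan, and Fry",
    "Jones, Otis A. (1896-1963)",
]

print("Strict:")
for heading in sorted(HEADINGS, key=strict_key):
    print("   ", heading)

print("Chicago-style:")
for heading in sorted(HEADINGS, key=chicago_key):
    print("   ", heading)
```

Because the Chicago-style key is a tuple of segments, every heading whose first segment is the surname jones sorts ahead of Jonesboro, which is how the surnames stay together.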

Similarly, word-by-word and letter-by-letter alphabetization may produce different results given the same headings, as seen in this example:

Letter-by-letter     Word-by-word
soul                 soul
soulard crab         soul brother
soul brother         soul food
souletin             soul kiss
soul food            soul mate
soulful              soul music
soul kiss            soul sister
soullessness         soulard crab
soul mate            souletin
soul music           soulful
soul sister          soullessness
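For headings as simple as these, the two systems differ only in whether the space takes part in the comparison. A minimal Python sketch, with hypothetical function names and assuming headings made up of lower-case words separated by single spaces:

```python
def letter_by_letter_key(heading: str) -> str:
    """Ignore spaces and punctuation: compare the run-together letters."""
    return "".join(ch for ch in heading.lower() if ch.isalnum())

def word_by_word_key(heading: str) -> list[str]:
    """Compare one word at a time; a shorter list of words sorts first."""
    return heading.lower().split()

HEADINGS = [
    "soul mate", "soulful", "soul", "souletin", "soul kiss", "soulard crab",
    "soul sister", "soullessness", "soul food", "soul brother", "soul music",
]

letter_order = sorted(HEADINGS, key=letter_by_letter_key)
word_order = sorted(HEADINGS, key=word_by_word_key)

# Print the two orderings side by side.
print(f"{'Letter-by-letter':<20} Word-by-word")
for left, right in zip(letter_order, word_order):
    print(f"{left:<20} {right}")
```

The word-by-word column keeps every heading that begins with the word soul together, while the letter-by-letter column interleaves them with soulard crab, souletin, soulful and soullessness.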

Word-by-word and letter-by-letter alphabetization are equally capable of ordering entries. Users tend to prefer headings arranged using word-by-word alphabetization because it keeps headings beginning with the same word or phrase together. Nevertheless, in the Chicago Manual of Style and in some other style guidelines, letter-by-letter alphabetization is the preferred system, based on the assumption that people are already familiar with it because it is the system used in most dictionaries, encyclopedias and telephone directories.

Guidelines for alphabetization:

  • House style trumps all other guidelines, including those that follow. If you have doubts about what to do, ask your editor.
  • Beware of sorting facilities provided by word processors. Many of them sort letters, numbers and symbols based on their corresponding ASCII codes. Using that method,

    "quotation marks, single or double"
    Precede upper case letters
    Which precede
    any lower case letter
    in the alphabet

  • Beware of any recommendation to alphabetize character strings, especially numbers, as if they were spelled out. The practice is a bad one because different pronunciations may be widely used. For example: would you post 1907 as if it were pronounced nineteen hundred seven, nineteen oh seven, nineteen seven, or one thousand nine hundred and seven? Would it make a difference if it represented a quantity rather than a year? Would you double-post it if different segments of the audience pronounce it differently? Would you post 123 in a book about Lawrence Welk as in

    udders, ...
    1 2 123, ...
    Uighurs, ...

    because he and his fans pronounce it as uh one uh two uh one two three?
  • Make sure you know whether headings should be ordered using word-by-word or letter-by-letter alphabetization. Sometimes the system to be used depends on the publisher; sometimes it depends on the book being published. If you are using indexing software, you may be able to postpone the decision until just before you submit the index because indexing software allows you to toggle from one system to another. If you are using the sort facilities provided by word processors, you may need several days to manually convert your index from one system to the other.
  • Unite proper names and subjects in one alphabet. The argument for an integrated index is based on the fact that most people who cannot find what they are looking for in an index never realize that there is another one to search, even if its presence is announced in an introductory note. The two indisputable exceptions to the rule favoring integrated indexes are: (1) separate author and subject indexes for magazines and other periodicals, and (2) separate indexes for first lines in a poetry anthology. In all other cases, multiple indexes should be justified and their contents should be specified in advance. A common misunderstanding between an indexer and an editor concerns questionable exceptions that are variously called a name index or an author index. Generally, name indexes include the proper names of persons mentioned in the body of the text but do not include entries for names of authors when their names appear in in-text citations or in footnotes. Also generally, author indexes include only the names of authors cited in the text, including all (or some maximum number) of those listed in each citation. Sometimes, however, editors want the names of all persons mentioned or cited in the body of the text or elsewhere in the book to be included in the separate index. Sometimes they want only the first and middle initials; sometimes they want the entire first and middle names. Sometimes they allow subheadings; sometimes they don't. Sometimes they allow only locators for information specifically about the person, such as a locator in an entry for Freud that points to information about his birthplace; sometimes they also allow locators for information about subjects related to the person, such as a locator in an entry for Freud that points to information about Freudian theory. Sometimes they want names of people to appear only in the separate index; sometimes they allow them to also appear in the subject index. When they allow names in both indexes, sometimes they allow cross-references such as See also in subject index or See also in names index; sometimes they don't. The point is, to avoid problems, talk to them about these issues in advance.
  • Consider only the sequence of letters, numerals and other symbols in the heading. Ignore pronunciation and meaning. Ignore locators and cross-references.
  • Treat spaces, dashes, hyphens, and slashes as equivalent and place them first.
  • Place ampersands after spaces and their equivalents.
  • Place Arabic and Roman numerals after ampersands and in numerical order.
  • Write Roman numerals in upper case letters and arrange them with other numbers in accordance with their numerical value. For example:

    Alfonso I, ... 
    Alfonso II, ... 
    Alfonso VI, ... 
    Alfonso IX, ... 

      
    Ist Olympiad, ...
    IInd Olympiad, ...
    VIth Olympiad, ...
    IXth Olympiad, ...

  • Place Roman alphabetic letters after numbers and in alphabetical order.
  • Use lower case letters except for proper nouns and for abbreviations, acronyms and initialisms commonly written in upper case.
  • Treat capitals and corresponding lower case letters as equivalent.
  • Treat letters with diacritics (e.g., é, è, ê, ë) as equivalent to the unmodified letter (e.g., e).
  • Ignore apostrophes and treat words in which they occur as single words.
  • Ignore punctuation marks not listed above.
  • Use qualifiers, not capitalization, to distinguish among homographs with uppercase letters inside a lowercase sequence. For example:

    MacPherson, C. B. (poet)
    Macpherson, C. B. (political scientist)
    Qualifiers are needed because uppercase and lowercase letters have equal value during alphabetization.
  • Ask your editor how to handle initial definite or indefinite articles (the, a, an). The simplest rules, which are to ignore all of them regardless of the type of heading or, conversely, to always acknowledge them, are not widely followed. Instead, initial articles in place names are generally retained and they affect the placement of the heading. For example: The Dalles and The Hague are generally placed under T. For initial articles in other names, there are four options: (1) do not include them in the heading, (2) place them at the end of the heading, (3) retain them in the heading but ignore them in placement and during alphabetization, and (4) retain them in the heading and acknowledge them in placement and during alphabetization. For example: The Canterbury Tales may be listed in the same position under C as Canterbury Tales (option 1), Canterbury Tales, The (option 2) or The Canterbury Tales (option 3), or it may be listed under T as The Canterbury Tales (option 4). Some standards insist that initial articles should always be retained and should always affect the placement of the heading; some don't. Asking editors what they want will avoid problems due to misunderstandings.
  • Ask your editor about the placement and sorting of symbols used as headings. Although symbols are often placed before other types of entries and sorted in ASCII order in indexes in technical books and computer manuals, there are no real standards governing the placement or ordering of symbols. When individual symbols need to be used as stand-alone headings, as they often need to be in technical manuals, my preference is to qualify them using their name and to order them in relation to their names. For example:

    & (ampersand), ...
    ¢ (cent sign), ...
    e (e, the irrational number), ...
    i (i, the imaginary unit), ...
    ¬ (not sign), ...
    £ (pound sign), ...
    ¥ (yen sign), ...

    As the example indicates, this practice accommodates the rare cases in which individual letters in the alphabet are used as symbols. More importantly, it minimizes the apparently arbitrary order that results when symbols are listed without their name also being indicated, either when they are placed together at the beginning of an index or when they are placed under the initial letter in their name. Also, it can be used with cross-referencing or double-posting when there are alternative names for a symbol.
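Several of the character-level guidelines above can be combined into a single sort key. The Python sketch below is one possible reading of them, not a standard implementation, and the name collation_key is hypothetical. It ranks spaces and their equivalents first, then ampersands, then Arabic numerals by numerical value, then letters, and it ignores case, diacritics, apostrophes and other punctuation. Roman numerals, initial articles and stand-alone symbols are deliberately left out because they depend on context or on the editor's decision.

```python
import re
import unicodedata

# Space, hyphen, slash, en dash and em dash are treated as equivalent.
SPACE_LIKE = set(" -/\u2013\u2014")

def collation_key(heading: str) -> list[tuple]:
    """Build a sort key as a list of (rank, value) pairs, one per sorted unit."""
    # Strip diacritics: decompose, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", heading)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    text = text.lower().strip()  # capitals and lower case are equivalent

    key: list[tuple] = []
    for token in re.findall(r"[0-9]+|.", text):
        if token[0] in "0123456789":
            key.append((2, int(token)))      # numerals, in numerical order
        elif token in SPACE_LIKE:
            if not key or key[-1] != (0, 0):
                key.append((0, 0))           # spaces and equivalents come first
        elif token == "&":
            key.append((1, 0))               # ampersands follow spaces
        elif token.isalpha():
            key.append((3, token))           # letters follow numerals
        # Apostrophes and all other punctuation are ignored.
    return key

headings = ["Routes", "Route 10", "Route-finding", "Route 9", "Route & Branch"]
print(sorted(headings))                     # code-point order puts Route 10 before Route 9
print(sorted(headings, key=collation_key))  # numerals in numerical order

print(sorted(["Zebra", "élan", "apple"]))                     # capitals first, accents last
print(sorted(["Zebra", "élan", "apple"], key=collation_key))  # apple, élan, Zebra
```

The first line of each pair shows what a sort based purely on character codes does with the same headings, which is the behaviour the guidelines warn about in word processors.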

 
See also   Non-alphabetical ordering  






Copyright © 2005 Martin Tulic. All rights reserved.