Improve page number processing and functionality

Page number processing

When editing an output style, there are six different ways of formatting page numbers to choose from, all of which rely on EndNote processing the string input in the Cited Pages field. (These examples are all based on using EndNote with the Cite While You Write plugin for MS Word.)

I don’t know exactly how this processing works, but what happens as of EndNote X9 unfortunately seems to be rather brute force–based and inefficient.

If either of the two abbreviated styles is chosen, only simple sequences of continuous page numbers work reliably; e.g., “pp. 220–226” becomes “pp. 220–6” and “pp. 220–26”, respectively (as they should).

If the Cited Pages string contains multiple individual numbers which are not part of a sequence of continuous page numbers, however, these get abbreviated as well; e.g., “p. 224, 263 and 304 note 308” ends up becoming “p. 224, 63 and 304 note 8” and “p. 224, 63 and 304 note 08”, respectively.

This is obviously Very Bad, since it leads to completely misleading citations with incorrect page numbers. The only workaround that I know of is to use the Suffix field instead of the Cited Pages field, which is a hack: it completely voids the whole point of using the built-in abbreviation functions and requires the user to input the page numbers exactly as they are.

Obviously, page numbers can be complex, and people can input them any which way they like; but it seems to me that these things should hold true for abbreviating them a good 99 per cent of the time:

  • A number by default should not be eligible for abbreviation
  • A number should be eligible for abbreviation only if it is immediately preceded, in reverse order (i.e., from right to left), by
    • a sequence delimiter (hyphen, en dash, etc. [see below], optionally flanked by space characters)
    • a number which can indicate an earlier number within the same hundred (= a number which satisfies the criteria already used in EndNote to determine whether a number is eligible for abbreviation)

If a number is preceded by anything that does not end in a sequence delimiter optionally flanked by space characters, it should be considered ineligible for abbreviation. In the example given above, in the string “224, 263 and 304 note 308”, the four numbers are preceded, respectively, by [nothing], [comma space], [space and space] and [space note space]; or, trimming the spaces, by [nothing], [comma], [and] and [note]. Since none of those are possible sequence delimiters, none of the four numbers should be eligible for abbreviation.
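
To make the proposed rule concrete, here is a minimal Python sketch. It is not how EndNote works internally (that is unknown to me); the names `DELIMS`, `shorten_end` and `abbreviate_pages` are made up for illustration, and the eligibility test (same number of digits, end larger than start, both in the same hundred) is only my assumption about what EndNote's existing criteria amount to.

```python
import re

# Assumed set of sequence delimiters: hyphen-minus, hyphen, non-breaking
# hyphen, figure dash, en dash, em dash, minus sign.
DELIMS = "-\u2010\u2011\u2012\u2013\u2014\u2212"

# A number, a delimiter optionally flanked by spaces, and another number.
RANGE = re.compile(rf"(?<!\d)(\d+)(\s*[{DELIMS}]\s*)(\d+)(?!\d)")


def shorten_end(start: str, end: str, min_digits: int) -> str:
    """Return the abbreviated end number, or `end` unchanged if ineligible."""
    # Assumed eligibility: same length, ascending, within the same hundred.
    if len(start) != len(end) or int(start) >= int(end):
        return end
    if int(start) // 100 != int(end) // 100:
        return end
    shared = 0
    while start[shared] == end[shared]:  # count shared leading digits
        shared += 1
    keep = max(len(end) - shared, min_digits)
    return end[-keep:] if keep < len(end) else end


def abbreviate_pages(cited: str, min_digits: int = 1) -> str:
    """Abbreviate only numbers that directly follow 'number + delimiter'."""
    def repl(m: re.Match) -> str:
        start, delim, end = m.groups()
        return start + delim + shorten_end(start, end, min_digits)

    # Numbers not matched by RANGE are never touched: ineligible by default.
    return RANGE.sub(repl, cited)


print(abbreviate_pages("pp. 220\u2013226"))               # pp. 220–6
print(abbreviate_pages("pp. 220\u2013226", 2))            # pp. 220–26
print(abbreviate_pages("p. 224, 263 and 304 note 308"))   # unchanged
```

The point of the design is that abbreviation is opt-in: a number is only ever shortened when the regular expression has found a start number and a delimiter directly in front of it, so words and commas such as “and”, “note” or “,” can never trigger it.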

This would probably still leave some edge cases unaccounted for, but it would at least vastly improve the usefulness of the page number functions.

Sequence delimiters

In addition to the processing, it would be nice if EndNote were able to determine how sequence delimiters appear as well, since different output styles can dictate different ways of formatting sequences.

The two main issues here would be

  • Delimiter: the shape of the delimiter when formatted – most commonly an en dash or a hyphen, but some styles may prefer something else
  • Add spaces: any characters to flank the delimiter – most commonly nothing or perhaps a regular space, but hair spaces or similar could also be required by some styles

When processing page numbers, EndNote should preferably consider at least hyphens, en dashes, numeric dashes, minus dashes, etc. to be valid sequence delimiters, but whatever is input in the Delimiter field should also be included. When formatting, the value set in the Delimiter field should always be used, regardless of what is input.

For example, if the Cited Pages field contains the string “225-287”, the Delimiter field contains an em dash, and the Add spaces field contains a hair space, then the formatted string should be “225 — 87” (two two five, hair space, em dash, hair space, eight seven), assuming the Page Number Format option is set to one of the abbreviated options.
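
The same example can be sketched in Python. Again, this is only an illustration under assumptions: `format_ranges`, its parameters `delimiter` and `flank` (standing in for the suggested Delimiter and Add spaces fields), and the list of recognised input delimiters are all hypothetical, and the abbreviation test is my guess at EndNote's existing criteria.

```python
import re

# Assumed input delimiters: hyphen-minus, hyphen, non-breaking hyphen,
# figure dash, en dash, em dash, minus sign.
KNOWN_DELIMS = "-\u2010\u2011\u2012\u2013\u2014\u2212"


def format_ranges(cited: str, delimiter: str = "\u2013", flank: str = "",
                  min_digits: int = 1) -> str:
    """Rewrite every page range using the style's delimiter and flanking."""
    # Accept the known delimiters plus whatever the style's delimiter is.
    candidates = dict.fromkeys(list(KNOWN_DELIMS) + [delimiter])
    alternatives = "|".join(re.escape(d) for d in candidates if d)
    pattern = re.compile(
        rf"(?<!\d)(\d+)\s*(?:{alternatives})\s*(\d+)(?!\d)"
    )

    def repl(m: re.Match) -> str:
        start, end = m.groups()
        # Assumed eligibility: same length, ascending, same hundred.
        if (len(start) == len(end) and int(start) < int(end)
                and int(start) // 100 == int(end) // 100):
            shared = next(i for i, (a, b) in enumerate(zip(start, end))
                          if a != b)
            keep = max(len(end) - shared, min_digits)
            end = end[-keep:]
        # Output always uses the configured delimiter, whatever was typed.
        return f"{start}{flank}{delimiter}{flank}{end}"

    return pattern.sub(repl, cited)


# Em dash (U+2014) flanked by hair spaces (U+200A), as in the example above.
result = format_ranges("225-287", delimiter="\u2014", flank="\u200a")
print(repr(result))
```

Whatever the user typed between the two numbers (a hyphen, an en dash, with or without spaces) is discarded, and the output is rebuilt from the style's own settings, which is what keeps the formatted bibliography consistent.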