revised version of naco_normalize
This implements the latest version of the NACO
normalization specification found at
http://www.loc.gov/catdir/pcc/naco/SCA_PccNormalization_Final_revised.pdf
This version of the algorithm is more general -- for example,
all combining characters are removed -- so there should be
fewer fiddly edge cases to worry about for most European
languages.
Rebuilding the metabib.*_field_entry tables (e.g., by using
reingest-1.6-2.0.pl) is recommended if there are any bibs that contain
any non-ASCII characters.
Normalized text is now left in the NFKD form, so while this should
be transparent to the search system after reindexing, it does mean
that (for example) Korean text in metabib.*_field_entry may not
be in the same Unicode normalization form as that found in
biblio.record_entry.
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
git-svn-id: svn://svn.open-ils.org/ILS/trunk@18864
dcc99617-32d9-48b4-a31d-
7c20da2025e4