old-git.evergreen-ils.org Git - evergreen/bjwebb.git/commit

author	gmc <gmc@dcc99617-32d9-48b4-a31d-7c20da2025e4>
	Mon, 29 Nov 2010 21:44:34 +0000 (21:44 +0000)
committer	gmc <gmc@dcc99617-32d9-48b4-a31d-7c20da2025e4>
	Mon, 29 Nov 2010 21:44:34 +0000 (21:44 +0000)
commit	0d883cd743c1fb7f76f172796f5bb8e31be01c2c
tree	10789a44cbf992f0c5c8dfea992412810758828f
parent	72220fc0e5cb7f9a5d19fbbac6dc53a882d1ae4f

revised version of naco_normalize

This implements the latest version of the NACO
normalization specification found at

http://www.loc.gov/catdir/pcc/naco/SCA_PccNormalization_Final_revised.pdf

This version of the algorithm is more general -- for example,
all combining characters are removed -- so there should be
fewer fiddly edge cases to worry about for most European
languages.
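As a rough illustration of the "all combining characters are removed" step only, the sketch below decomposes text to NFKD and drops every code point in a Unicode mark category. It is written in Python for brevity and is not the naco_normalize function from this commit; the helper name strip_combining is hypothetical, and the rest of the NACO rules (punctuation handling, case folding, and so on) are deliberately left out.

```python
import unicodedata


def strip_combining(text: str) -> str:
    """Hypothetical helper: decompose to NFKD, then drop combining marks.

    Covers only the combining-character step, not the full NACO rules.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    # General categories Mn, Mc and Me are the Unicode "mark" categories.
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


if __name__ == "__main__":
    for sample in ("Dvo\u0159\u00e1k", "\u00c5ngstr\u00f6m"):
        print(ascii(sample), "->", strip_combining(sample))
```

Because the marks are removed by category rather than from a hand-maintained list of accented letters, new diacritics need no special casing, which is the sense in which the revised approach has fewer edge cases.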

Rebuilding the metabib.*_field_entry tables (e.g., by using
reingest-1.6-2.0.pl) is recommended if any bibs contain
non-ASCII characters.

Normalized text is now left in the NFKD form, so while this should
be transparent to the search system after reindexing, it does mean
that (for example) Korean text in metabib.*_field_entry may not
be in the same Unicode normalization form as that found in
biblio.record_entry.
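The Korean case can be seen with a short Python sketch. It assumes, purely for illustration, that the text in biblio.record_entry is stored in precomposed (NFC) form; the commit does not say which form the stored records use. The helper name same_text is hypothetical.

```python
import unicodedata

# Assumed stored form: two precomposed Hangul syllables (U+D55C, U+AE00).
stored = "\ud55c\uae00"

# NFKD splits each syllable into its conjoining jamo.
indexed = unicodedata.normalize("NFKD", stored)


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFKD."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


if __name__ == "__main__":
    print(len(stored), len(indexed))    # code point counts differ
    print(stored == indexed)            # raw comparison fails
    print(same_text(stored, indexed))   # comparison after normalization succeeds
```

Any code that compares indexed values against the original record text directly would need to normalize both sides to one form first.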

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
git-svn-id: svn://svn.open-ils.org/ILS/trunk@18864 dcc99617-32d9-48b4-a31d-7c20da2025e4

Open-ILS/src/sql/Pg/002.schema.config.sql
Open-ILS/src/sql/Pg/020.schema.functions.sql
Open-ILS/src/sql/Pg/upgrade/0467.schema.updated_naco_normalize.sql	[new file with mode: 0644]