Fix Unicode mangling in clean_marc function
author Dan Scott <dscott@laurentian.ca>
Sun, 4 Mar 2012 07:41:11 +0000 (02:41 -0500)
committer Dan Scott <dscott@laurentian.ca>
Sun, 4 Mar 2012 07:41:11 +0000 (02:41 -0500)
commit d258b7847591f9344765909d1e737d59cb5686cf
tree b24d8c912753735379c931733e2d6ace5d0fb964
parent cc227d5040e64d7254149881353d5398ad84b662

Calling s/\p{Cc}//go; before entityize() was resulting in xFFFD
entities being returned for all of the upper case diacritic characters,
which in turn caused the new unit test to fail (yay unit tests). I added
a corresponding unit test for entityize() to ensure that the problem
wasn't coming from that function. Switching the order of the \p{Cc}
regex and the entityize() call resolved the corruption in the unit test.
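
A minimal sketch of why the order can matter, under one assumption the
commit does not state: that the record reaches the regex as undecoded
UTF-8 bytes. In that case \p{Cc} also matches bytes 0x80-0x9F (the C1
control range), which are the trailing bytes of U+00C0-U+00DF, the
upper case Latin-1 diacritics. entityize_sketch() and the two wrapper
subs are hypothetical stand-ins, not the code in Normalize.pm.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical, simplified stand-in for entityize(): decode UTF-8 bytes,
# then replace every non-ASCII character with a hex numeric entity.
sub entityize_sketch {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);
    $chars =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord($1))/ge;
    return $chars;
}

# Old order: strip control characters from the raw bytes, then entityize.
sub strip_then_entityize {
    my ($bytes) = @_;
    $bytes =~ s/\p{Cc}//g;    # also removes bytes 0x80-0x9F
    return entityize_sketch($bytes);
}

# New order: entityize first, then strip control characters from the
# resulting ASCII-only string.
sub entityize_then_strip {
    my ($bytes) = @_;
    my $xml = entityize_sketch($bytes);
    $xml =~ s/\p{Cc}//g;
    return $xml;
}

my $upper = "\xC3\x89";    # UTF-8 bytes for U+00C9, upper case E acute

print strip_then_entityize($upper), "\n";    # &#xFFFD;
print entityize_then_strip($upper), "\n";    # &#xC9;
```

Under the same assumption, a lower case character such as U+00E9 (bytes
0xC3 0xA9) survives either order, because its trailing byte falls
outside the C1 range.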

This suggests that Vandelay may be introducing significant corruption to
imported records and that backporting of this commit to the inline
Vandelay variants from previous releases may be warranted.

Signed-off-by: Dan Scott <dscott@laurentian.ca>
Open-ILS/src/perlmods/lib/OpenILS/Utils/Normalize.pm
Open-ILS/src/perlmods/t/01-OpenILS-Application.t
Open-ILS/src/perlmods/t/14-OpenILS-Utils.t