Avoid data loss by setting MARC::Charset->assume_unicode(1)

MARC::File::XML uses MARC::Charset to perform character conversions, but it
does not tell MARC::Charset that it is handling Unicode data. Without that
hint, MARC::Charset can return an error that results in data loss. The lost
data is typically a subfield containing one or more characters for which
MARC::Charset has no equivalent mapping outside of Unicode.

This problem can be reproduced in authority_control_fields.pl with a subfield
such as "von Hans-Christian Müơller". Without assume_unicode(1), that subfield
was converted to a null string. If the record was then written back to the
database because an authority match was found in a different field, the only
recourse was to restore the record from auditor.biblio_record_entry_history.
The same problem can occur in any other script or function that uses
MARC::File::XML with BinaryEncoding => UTF8 to modify the data handed to it.

Signed-off-by: Dan Scott <dscott@laurentian.ca>
git-svn-id: svn://svn.open-ils.org/ILS/trunk@20385 dcc99617-32d9-48b4-a31d-7c20da2025e4