iconv doesn't work as expected

Posted: Thu Mar 23, 2017 5:40 pm
by cah
After migrating to CentOS 7.3.1611, I had to recompile quite a few applications.
Apache is one of them. For some reason, Apache no longer honors the default character set directive (AddDefaultCharset big5). Everything is served as UTF-8.

Therefore, I have to find a way to convert my existing BIG-5 encoded HTML files to UTF-8.

I tried "iconv" and it looked promising on test files. I then tried to convert all BIG-5 encoded files to UTF-8 and found that iconv could not convert them completely. Even worse, after converting to UTF-8, I could not convert the files back to BIG-5. For this job, iconv is certainly not a good conversion tool.
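For anyone who wants to check the round trip before trusting a conversion, here is a minimal sketch in Python. It is not what my Perl scripts do. The function names (big5_to_utf8, convert_file) are made up for illustration, and the choice of Python's "big5" codec is an assumption; files that use vendor extensions might need a different codec such as "cp950" or "big5hkscs".

```python
from pathlib import Path


def big5_to_utf8(data: bytes, codec: str = "big5") -> bytes:
    """Decode Big5 bytes strictly, re-encode as UTF-8, and verify the round trip.

    `codec` is an assumption; pass "cp950" or "big5hkscs" if the files need it.
    """
    # Strict decoding raises UnicodeDecodeError on bytes the codec cannot map,
    # instead of silently dropping or replacing them.
    text = data.decode(codec)
    utf8 = text.encode("utf-8")
    # Round-trip check: converting back must reproduce the original bytes.
    if utf8.decode("utf-8").encode(codec) != data:
        raise ValueError("round trip back to Big5 does not match the original")
    return utf8


def convert_file(src: Path, dst: Path, codec: str = "big5") -> None:
    """Convert one file; nothing is written to `dst` if any check fails."""
    dst.write_bytes(big5_to_utf8(src.read_bytes(), codec))
```

A file that raises an error here is one that would not survive a conversion back to BIG-5, so it can be set aside and examined by hand instead of being overwritten.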

I wrote 2 small Perl scripts (big5_2_utf8.pl & utf8_2_big5.pl) in January 2017 for this very purpose. Details can be found in the BBS post "Encoding conversion between BIG-5 and UTF-8".

I tried the scripts on the BIG-5 encoded files in the mirror environments and the results look good (much better than the iconv result for sure).

Just in case I need to roll back to the original BIG-5 encoded files, I created a list of files for both the mirror and prod environments (html_list & mirror_list) and tar files for them (html_list.tar.gz & mirror_list.tar.gz). They are located in /home/www/mirror/cah/archive.
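A backup like this could be scripted roughly as follows. This is only a sketch under assumptions: it supposes the list file holds one path per line in UTF-8, and the function name archive_listed_files is hypothetical. It is not how the archives above were actually made.

```python
import tarfile
from pathlib import Path


def archive_listed_files(list_file: Path, archive: Path) -> int:
    """Pack every file named in `list_file` into a gzip-compressed tar archive.

    Assumes one path per line; blank lines are skipped.
    Returns the number of paths added.
    """
    lines = list_file.read_text(encoding="utf-8").splitlines()
    paths = [line.strip() for line in lines if line.strip()]
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            # tarfile stores absolute paths without the leading "/".
            tar.add(path)
    return len(paths)
```

For example, archive_listed_files(Path("html_list"), Path("html_list.tar.gz")) would pack the files named in html_list, provided the list really is in that one-path-per-line format.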