
Traditional Chinese (Big5) and Simplified Chinese (GB2312) converter

Posted: Fri Jul 26, 2013 7:23 pm
by cah
I remembered that openwebmail ships with scripts that can perform the conversion.

I checked and found the following directory:

Code:

ls -l /export/home/www/cgi-bin/openwebmail/misc/tools/big5_gb2312
total 6
-rw-r--r--   1 www      www          644 Aug 16  2004 b2g.pl
-rw-r--r--   1 www      www          634 Aug 16  2004 g2b.pl
Here is g2b.pl:

Code:

#!/usr/bin/perl
#
# script to convert chinese gb2312 to big5
#
my (%config, $content);
$config{'ow_mapsdir'}="/usr/local/www/cgi-bin/openwebmail/etc/maps";
$config{'dbm_ext'}=".db";
$config{'dbmopen_ext'}="";

while (<>) {
   $content.=$_;
}
print g2b($content);

# generic routines ##################################################################
sub g2b {
   my $str = $_[0];

   if ( -f "$config{'ow_mapsdir'}/g2b$config{'dbm_ext'}") {
      my %G2B;
      dbmopen(%G2B, "$config{'ow_mapsdir'}/g2b$config{'dbmopen_ext'}", undef);
      $str =~ s/([\xA1-\xF9][\xA1-\xFE])/$G2B{$1}/eg;
      dbmclose(%G2B);
   }
   return $str;
}
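To see what the substitution in g2b() does without needing the DBM file, here is a minimal sketch that swaps in a small in-memory hash for the map. The helper name convert_pairs and the two table entries are my own assumptions for illustration (I believe they are the GB2312 and Big5 codes for "中文", but check them against the real map). Unlike g2b(), this version leaves a byte pair unchanged when the table has no entry for it, whereas the original replaces it with an empty string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the g2b DBM map: GB2312 byte pair => Big5 byte pair.
# Only two sample entries; the real table lives in etc/maps.
my %map = (
   "\xD6\xD0" => "\xA4\xA4",
   "\xCE\xC4" => "\xA4\xE5",
);

# convert_pairs is an illustrative name, not part of openwebmail.
sub convert_pairs {
   my ($str, $table) = @_;
   # Same byte-pair pattern as g2b(); unmapped pairs are kept as they are.
   $str =~ s/([\xA1-\xF9][\xA1-\xFE])/exists $table->{$1} ? $table->{$1} : $1/eg;
   return $str;
}

# ASCII text passes through untouched; only matching byte pairs are looked up.
my $out = convert_pairs("abc \xD6\xD0\xCE\xC4 \xA1\xA1", \%map);
print unpack("H*", $out), "\n";   # hex dump of the converted bytes
```

Printing the result as hex keeps the terminal encoding out of the picture while experimenting.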
Looking at g2b.pl, I found two issues on this server:
  1. $config{'ow_mapsdir'} should be /export/home/www/cgi-bin/openwebmail/etc/maps rather than /usr/local/www/cgi-bin/openwebmail/etc/maps
  2. $config{'dbm_ext'} should be ".pag" rather than ".db"
After some tweaks, I was able to make it work.

Code:

#!/usr/bin/perl
#
# script to convert chinese gb2312 to big5
#
my (%config, $content);
$config{'ow_mapsdir'}="/export/home/www/cgi-bin/openwebmail/etc/maps";
$config{'dbm_ext'}=".pag";
$config{'dbmopen_ext'}="";

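my @files = glob("<PATH>/*.html");
my $i = 0;
foreach my $file (@files) {
  $i++;
  # output files are numbered 001.html, 002.html, ...
  my $filename = sprintf("%03d.html", $i);
  open(GB, "<", $file) or die "cannot read $file: $!";
  open(BIG5, ">", "<PATH>/$filename") or die "cannot write $filename: $!";
  # read the whole GB2312 file first, then convert and write it once
  $content="";
  while (<GB>) {
    $content.=$_;
  }
  print BIG5 g2b($content);
  close BIG5;
  close GB;
}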

# generic routines ##################################################################
sub g2b {
   my $str = $_[0];

   if ( -f "$config{'ow_mapsdir'}/g2b$config{'dbm_ext'}") {
      my %G2B;
      dbmopen(%G2B, "$config{'ow_mapsdir'}/g2b$config{'dbmopen_ext'}", undef);
      $str =~ s/([\xA1-\xF9][\xA1-\xFE])/$G2B{$1}/eg;
      dbmclose(%G2B);
   }
   return $str;
}
After running it, all the GB2312-encoded files were converted to Big5!
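Since the map file's extension differs between systems (".db" in the stock script, ".pag" here), a small probe can save editing $config{'dbm_ext'} by hand. This is only a sketch: find_dbm_ext is a hypothetical helper of mine, not something from openwebmail, and the list of extensions to try is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first extension for which "$base$ext"
# exists as a regular file, or undef if none of the candidates is found.
sub find_dbm_ext {
   my ($base, @exts) = @_;
   for my $ext (@exts) {
      return $ext if -f "$base$ext";
   }
   return;
}

# Maps directory taken from the post above; adjust for your installation.
my $mapsdir = "/export/home/www/cgi-bin/openwebmail/etc/maps";
my $ext = find_dbm_ext("$mapsdir/g2b", ".db", ".pag");

if (defined $ext) {
   print "found $mapsdir/g2b$ext\n";
} else {
   print "no g2b map file found under $mapsdir\n";
}
```

The returned value could then be assigned to $config{'dbm_ext'} before calling g2b().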