Detect the character encoding of a file #

The aforementioned Perl module Unicode::Japanese includes ujguess, which attempts to detect the character encoding of a given file. The Unix program file is often suggested on forums and the like for this purpose, but it only returns the file type, not the encoding. Here's an illustration of the difference, using a Shift JIS-encoded file:
$ file foo
foo: UTF-8 Unicode text, with no line terminators

$ ujguess foo
sjis
and an EUC-encoded one:
$ file bar
bar: ISO-8859 text, with CRLF line terminators

$ ujguess bar
euc

/nix | Jan 03, 2010


Subscribe or visit the archives.