TinyApps.Org
Small is beautiful


 HOME

  0. Internet
  1. Text
  2. Graphics
  3. System
  4. File
  5. Misc
  6. Palm
  7. OS X

 BLOG

 DOCS

 FAQ

 RSS (?)





Detect the character encoding of a file #
The aforementioned Perl module Unicode::Japanese includes ujguess, which attempts to detect the character encoding of a given file. The Unix program file is often suggested on forums and the like for this purpose, but it only returns the file type, not the encoding. Here's an illustration of the difference, using a Shift JIS-encoded file:
$ file foo
foo: UTF-8 Unicode text, with no line terminators

$ ujguess foo
sjis
and an EUC-encoded one:
$ file bar
bar: ISO-8859 text, with CRLF line terminators

$ ujguess bar
euc

/nix | Jan 03, 2010



Categories
/blosxom
/eink
/mac
/misc
/nix
/palm
/windows

Blosxom Archive
2012: 2 1
2011: 12 11 10 9 8 7 6 5 4 3 2 1
2010: 12 11 10 9 8 7 6 5 4 3 2 1
2009: 12 11 10 9 8 7 6 5 4 3 2 1
2008: 12 11 10 9 8 7 6 5 4 3 2 1
2007: 12 11 10 9 8 7 6 5 4 3 2 1
2006: 12 11 10 9 8 7 6 5 4 3 2 1
2005: 12 11 10

Blogger Archive
2005: 10 9 8 7 6 5 4 3 2 1
2004: 12 11 10 9 8 7 6 5 4 3 2 1
2003: 12 11 10 9 8 7 6

Ezine Archive
2004: 4 3 2 1
2003: 12 9 8 7 6 5 4 2 1
2002: 12 10 9 8 7 6 5 3 2 1
2001: 12 11 10