grep not returning text known to be in file

Symptom

cat displays text, but grep can't find it:

% cat foo.txt
The world is overcome--aye! even here!
By such as fix their faith on Unity.
% grep fix foo.txt
%

Cause

The file is UTF-16, not UTF-8/ASCII.

For example, in UTF-16LE, fix is encoded as:

66 00 69 00 78 00

not as the contiguous ASCII bytes, which are also the UTF-8 bytes for fix:

66 69 78

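grep fix searches for the three bytes 66 69 78 in sequence, and that sequence never occurs in the file, so nothing matches. cat output can still look normal because terminals usually do not render NUL bytes.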
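The mismatch is easy to reproduce. The sketch below is only an illustration: the scratch directory and the name sample.txt are placeholders, and od stands in for xxd in case xxd is not installed. It writes one line of the poem as BOM-less UTF-16LE, runs grep on it, and dumps the six bytes that encode fix.

```shell
# Scratch directory and file name are placeholders for this sketch.
tmp=$(mktemp -d)
sample="$tmp/sample.txt"

# Write one line of the poem as UTF-16LE without a BOM.
printf 'By such as fix their faith on Unity.\n' |
    iconv -f UTF-8 -t UTF-16LE > "$sample"

# grep looks for the contiguous bytes 66 69 78, which never occur here.
if grep -q fix "$sample"; then
    grep_result="match"
else
    grep_result="no match"
fi
echo "grep fix: $grep_result"

# "By such as " is 11 characters, so "fix" starts at byte offset 22.
# Each letter is followed by a 00 byte.
fix_bytes=$(od -An -tx1 -j 22 -N 6 "$sample" | tr -s ' ' | sed 's/^ *//; s/ *$//')
echo "bytes: $fix_bytes"
```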

Detect encoding

% file foo.txt
foo.txt: Little-endian UTF-16 Unicode text

If the file has no byte-order mark (BOM), file may fail to identify it:

% file foo.txt
foo.txt: data

When file reports data for something you expect to be text, inspect the first few bytes:

% xxd -g 1 -l 8 foo.txt

For ASCII-range text, UTF-16LE puts each character byte first, followed by a NUL byte:

54 00 68 00 65 00 20 00

UTF-16BE has the reverse pattern:

00 54 00 68 00 65 00 20
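The two patterns above can also be checked mechanically. The function below is a rough sketch, not a general encoding detector: guess_utf16_order is a hypothetical name, it only looks at the first 64 bytes, and it only gives a useful answer for mostly ASCII-range text. It uses od rather than xxd so that it runs where xxd is missing.

```shell
# guess_utf16_order FILE (hypothetical helper)
# Reports which half of each 2-byte unit holds the NUL bytes.
guess_utf16_order() {
    od -An -v -tx1 -N 64 "$1" |
    tr -s ' ' '\n' |                    # one hex byte per line
    awk 'NF {
             if ($1 == "00") { if (n % 2) odd++; else even++ }
             n++
         }
         END {
             if (odd > 0 && even == 0)      print "UTF-16LE"   # NUL second
             else if (even > 0 && odd == 0) print "UTF-16BE"   # NUL first
             else                           print "unknown"
         }'
}

# Placeholder sample files in both byte orders.
tmp=$(mktemp -d)
printf 'The world\n' | iconv -f UTF-8 -t UTF-16LE > "$tmp/le.txt"
printf 'The world\n' | iconv -f UTF-8 -t UTF-16BE > "$tmp/be.txt"

guess_utf16_order "$tmp/le.txt"    # prints UTF-16LE
guess_utf16_order "$tmp/be.txt"    # prints UTF-16BE
```

An answer of unknown means the bytes do not fit either pattern, for example plain ASCII or text that is mostly outside the ASCII range.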

Solution

Convert to UTF-8 before grepping:

% iconv -f UTF-16LE -t UTF-8 foo.txt | grep fix
By such as fix their faith on Unity.

Use UTF-16BE instead if the byte pattern is big-endian.
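If the file will be searched more than once, it may be simpler to convert it a single time and work on the UTF-8 copy. The sketch below assumes a BOM-less UTF-16LE input; the scratch directory and the name foo.utf8.txt are placeholders, and the stand-in foo.txt is built from the two lines shown under Symptom.

```shell
tmp=$(mktemp -d)

# Stand-in for foo.txt: the two lines from the Symptom section, BOM-less UTF-16LE.
printf 'The world is overcome--aye! even here!\nBy such as fix their faith on Unity.\n' |
    iconv -f UTF-8 -t UTF-16LE > "$tmp/foo.txt"

# Convert once; foo.utf8.txt is a hypothetical name for the converted copy.
iconv -f UTF-16LE -t UTF-8 "$tmp/foo.txt" > "$tmp/foo.utf8.txt"

# Ordinary tools now work on the copy, including line numbers.
hits=$(grep -n fix "$tmp/foo.utf8.txt")
echo "$hits"    # 2:By such as fix their faith on Unity.
```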

Why UTF-16LE and not plain UTF-16?

With a BOM, plain UTF-16 is portable: iconv reads the BOM and picks the byte order automatically.

Without a BOM, the byte order that iconv assumes for plain UTF-16 is implementation-dependent. Common implementations differ: iconv on GNU/Linux (glibc) typically assumes little-endian, while iconv on macOS assumes big-endian. With the wrong assumption, a file that converts correctly on one platform can come out as garbage, or fail outright, on another:

iconv -f UTF-16 -t UTF-8 foo.txt | grep fix     # not portable for BOM-less input
iconv -f UTF-16LE -t UTF-8 foo.txt | grep fix   # explicit byte order

For BOM-less UTF-16, use UTF-16LE or UTF-16BE, not plain UTF-16.
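Whether a BOM is present can be checked from the first two bytes. The function below is a minimal sketch: utf16_bom is a hypothetical name and the sample files are placeholders. It only distinguishes the two UTF-16 BOMs; note that a UTF-32LE BOM also begins with ff fe, which this check does not tell apart.

```shell
# utf16_bom FILE (hypothetical helper)
# Reports the UTF-16 BOM, if any, found in the first two bytes.
utf16_bom() {
    case $(od -An -tx1 -N 2 "$1" | tr -d ' \n') in
        fffe) echo "UTF-16LE BOM" ;;    # plain UTF-16 is safe to pass to iconv
        feff) echo "UTF-16BE BOM" ;;    # plain UTF-16 is safe to pass to iconv
        *)    echo "no UTF-16 BOM" ;;   # name the byte order explicitly
    esac
}

# Placeholder samples: the letter T with and without a little-endian BOM.
tmp=$(mktemp -d)
printf '\377\376T\000' > "$tmp/bom-le.txt"
printf 'T\000'         > "$tmp/no-bom.txt"

utf16_bom "$tmp/bom-le.txt"    # prints UTF-16LE BOM
utf16_bom "$tmp/no-bom.txt"    # prints no UTF-16 BOM
```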

Handling conversion errors

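If iconv stops with an illegal input sequence error, -c tells it to drop the input it cannot convert and carry on. The dropped data is lost silently, so check the byte order first; use -c only when the file really does contain a few damaged characters: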

% iconv -c -f UTF-16LE -t UTF-8 foo.txt | grep fix
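Because -c hides the problem, it can help to try a strict conversion first and fall back to -c only when that fails. The function below is a sketch under assumptions: convert_or_report is a hypothetical name, the sample files are placeholders, and the labels clean and lossy are just illustrative. The exit status of the -c retry is deliberately ignored, since some iconv builds still exit non-zero after dropping input.

```shell
# convert_or_report ENCODING INPUT OUTPUT (hypothetical helper)
# Tries a strict conversion; falls back to -c and says so.
convert_or_report() {
    if iconv -f "$1" -t UTF-8 "$2" > "$3" 2>/dev/null; then
        echo "clean"
    else
        # Retry, dropping whatever cannot be converted.
        iconv -c -f "$1" -t UTF-8 "$2" > "$3" 2>/dev/null || true
        echo "lossy"
    fi
}

tmp=$(mktemp -d)
# good.txt: "fix" and a newline in UTF-16LE.
printf 'f\000i\000x\000\n\000' > "$tmp/good.txt"
# bad.txt: the same, plus a lone high surrogate (bytes 00 d8), which is invalid on its own.
printf 'f\000i\000x\000\000\330a\000\n\000' > "$tmp/bad.txt"

convert_or_report UTF-16LE "$tmp/good.txt" "$tmp/good.utf8.txt"
convert_or_report UTF-16LE "$tmp/bad.txt"  "$tmp/bad.utf8.txt"
```

A lossy result is a prompt to look at the file again before trusting any grep output from it.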

Sources

❧ 2026-06-24