grep not returning text known to be in file

Symptom

cat displays text, but grep can't find it:

% cat foo.txt
The world is overcome--aye! even here!
By such as fix their faith on Unity.
% grep fix foo.txt
%
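A minimal way to reproduce the symptom, assuming iconv is installed: write the same two lines as BOM-less UTF-16LE and search again. This sketch overwrites foo.txt in the current directory.

```sh
#!/bin/sh
# Write the two lines as BOM-less UTF-16LE (overwrites foo.txt).
printf '%s\n' 'The world is overcome--aye! even here!' \
  'By such as fix their faith on Unity.' |
  iconv -f UTF-8 -t UTF-16LE > foo.txt

cat foo.txt                   # usually looks like plain text in a terminal
grep -c fix foo.txt || true   # prints 0; grep exits 1 when nothing matches
```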

Cause

The file is encoded in UTF-16, not UTF-8 or ASCII. The file command may identify it as such:

file foo.txt
foo.txt: Little-endian UTF-16 Unicode text

unless the byte-order mark (BOM) is missing, in which case file may report just data, suggesting a hex dump is in order:

xxd -g 1 -l 16 foo.txt
00000000: 54 00 68 00 65 00 20 00 77 00 6f 00 72 00 6c 00  T.h.e. .w.o.r.l.

The pattern of an ASCII-range byte followed by a NUL (00) is UTF-16LE; UTF-16BE puts the NUL first (00 54 00 68 ...). So fix is stored as 66 00 69 00 78 00, while grep fix searches for the ASCII/UTF-8 bytes 66 69 78, which never occur consecutively in the file. cat output looks normal because terminals typically don't display the NUL bytes.
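The byte claims above can be checked without a sample file by encoding fix directly. This sketch uses od (a POSIX tool) instead of xxd, which is not installed everywhere; cat -v prints the hidden NUL bytes as ^@.

```sh
#!/bin/sh
# The bytes grep searches for: ASCII/UTF-8 "fix".
printf 'fix' | od -An -tx1                                # 66 69 78
# The same word as stored in UTF-16LE and UTF-16BE.
printf 'fix' | iconv -f UTF-8 -t UTF-16LE | od -An -tx1   # 66 00 69 00 78 00
printf 'fix' | iconv -f UTF-8 -t UTF-16BE | od -An -tx1   # 00 66 00 69 00 78
# cat -v shows the NUL bytes a terminal would normally hide.
printf 'fix' | iconv -f UTF-8 -t UTF-16LE | cat -v; echo  # f^@i^@x^@
```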

Solution

Convert to UTF-8 before grepping:

iconv -f UTF-16LE -t UTF-8 foo.txt | grep fix
By such as fix their faith on Unity.

Use UTF-16BE instead if the byte pattern is big-endian.
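Choosing between UTF-16LE and UTF-16BE by hand can be automated. The sketch below defines grep16, a hypothetical helper (not a standard command) that reads the first two bytes of a file and guesses the encoding. It is a heuristic that assumes the file starts with either a BOM or an ASCII-range character.

```sh
#!/bin/sh
# grep16 PATTERN FILE: hypothetical helper, not a standard command.
# Guesses the byte order of a UTF-16 FILE, converts it to UTF-8 and greps it.
grep16() {
  # First two bytes as hex digits, e.g. fffe, 5400 or 0054.
  head2=$(od -An -tx1 -N 2 "$2" | tr -d ' \n')
  case $head2 in
    fffe|feff) enc=UTF-16   ;;  # BOM present: let iconv read and drop it
    ??00)      enc=UTF-16LE ;;  # character byte, then NUL
    00??)      enc=UTF-16BE ;;  # NUL, then character byte
    *) echo "grep16: cannot guess byte order of $2" >&2; return 2 ;;
  esac
  iconv -f "$enc" -t UTF-8 "$2" | grep -- "$1"
}
```

For example, grep16 fix foo.txt should print the matching line whichever byte order foo.txt uses. When a BOM is present the sketch passes plain UTF-16 on purpose, so iconv consumes the BOM rather than copying it into the output as an extra character.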

Why UTF-16LE and not plain UTF-16?

With a BOM, plain UTF-16 works on both platforms discussed here: iconv reads the BOM, picks the byte order from it, and drops it from the output.
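One way to see the BOM doing its job: iconv -t UTF-16 writes a BOM, and which byte order follows it is up to the implementation. Reading the result back with plain UTF-16 should work either way. The file name bom.txt is arbitrary.

```sh
#!/bin/sh
# Encode with plain UTF-16: iconv writes a BOM, then its preferred byte order.
printf 'fix\n' | iconv -f UTF-8 -t UTF-16 > bom.txt
od -An -tx1 -N 2 bom.txt                      # ff fe (LE) or fe ff (BE)
# Decoding with plain UTF-16 honors the BOM and leaves it out of the output.
iconv -f UTF-16 -t UTF-8 bom.txt | grep fix   # fix
```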

Without a BOM, iconv's behavior is implementation-dependent: glibc's iconv (the usual one on GNU/Linux) assumes little-endian, while macOS's iconv assumes big-endian. The same file can therefore convert correctly on one platform and come out as the wrong characters on the other, so grep still finds nothing:

iconv -f UTF-16 -t UTF-8 foo.txt | grep fix     # not portable for BOM-less input
iconv -f UTF-16LE -t UTF-8 foo.txt | grep fix   # explicit byte order

For BOM-less UTF-16, use UTF-16LE or UTF-16BE, not plain UTF-16.

Handling conversion errors

If iconv stops with an "illegal input sequence" error, -c omits the characters it cannot convert and carries on, at the cost of silently dropping them:

iconv -c -f UTF-16LE -t UTF-8 foo.txt | grep fix
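To see what -c changes, this sketch builds a small, deliberately broken UTF-16LE file: an isolated low surrogate (U+DC00, bytes 00 DC in little-endian order) is not valid UTF-16, and fix follows it so there is something left to find. The file name bad.txt is arbitrary, and the exact error message differs between iconv implementations.

```sh
#!/bin/sh
# bad.txt: isolated low surrogate U+DC00 (bytes 00 DC; octal \000\334),
# which is invalid UTF-16, followed by "fix\n" in UTF-16LE.
printf '\000\334f\000i\000x\000\n\000' > bad.txt

# Without -c, iconv stops at the first character with an error; grep sees nothing.
iconv -f UTF-16LE -t UTF-8 bad.txt | grep fix || echo 'no match'

# With -c, the invalid sequence is omitted and the rest converts.
iconv -c -f UTF-16LE -t UTF-8 bad.txt | grep fix
```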

Sources

Cannot grep UTF-16 Unicode files
grep UTF-16
Ken Thomases' comment on Difference between Mac and Linux iconv UTF16 to UTF8
Endianness

Update

Grep not finding text that I know is in file

❧ 2026-06-24