grep not returning text known to be in file
Symptom
cat displays text, but grep can't find it:
% cat foo.txt
The world is overcome--aye! even here!
By such as fix their faith on Unity.
% grep fix foo.txt
%
Cause
The file is UTF-16, not UTF-8/ASCII.
For example, in UTF-16LE, fix is encoded as:
66 00 69 00 78 00
not as the three contiguous bytes that ASCII and UTF-8 both use for fix:
66 69 78
So grep fix, which looks for the contiguous bytes 66 69 78, finds no match. The cat output can still look normal because terminals usually do not render NUL bytes.
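The byte layout can be checked from the shell without touching foo.txt. This is a minimal sketch, assuming iconv, od, and xargs are installed; the variable names are only illustrative.

```shell
# Encode the ASCII string "fix" as UTF-16LE and dump the bytes as hex.
# xargs with no command collapses od's whitespace into single spaces.
utf16le_bytes=$(printf 'fix' | iconv -f UTF-8 -t UTF-16LE | od -An -tx1 | xargs)

# Dump the same string unconverted (ASCII, which is also valid UTF-8).
ascii_bytes=$(printf 'fix' | od -An -tx1 | xargs)

echo "UTF-16LE: $utf16le_bytes"
echo "ASCII:    $ascii_bytes"
```

The first line should show a 00 byte after every character byte, which is what keeps grep from seeing 66 69 78 as a contiguous run.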
Detect encoding
% file foo.txt
foo.txt: Little-endian UTF-16 Unicode text
If the file has no byte-order mark (BOM), file may fail to identify it:
% file foo.txt
foo.txt: data
For a file expected to be text, data is a clue to inspect the bytes:
% xxd -g 1 -l 8 foo.txt
UTF-16LE has alternating character/NUL bytes:
54 00 68 00 65 00 20 00
UTF-16BE has the reverse pattern:
00 54 00 68 00 65 00 20
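The byte-pattern check above can be sketched as a small shell function. guess_utf16 is a hypothetical helper, not a standard tool, and its labels are made up for this example. It is a heuristic that assumes the file starts with ASCII-range characters (or a BOM), and it does not try to tell UTF-16 apart from UTF-32.

```shell
# guess_utf16 FILE: print a best-guess label based on the first four bytes.
# Hypothetical helper; only reliable when the text starts with ASCII-range
# characters or a byte-order mark.
guess_utf16() {
    # First four bytes as lowercase hex with no separators, e.g. "54006800".
    first4=$(od -An -tx1 -N 4 "$1" | tr -d ' \n')
    case "$first4" in
        fffe*)    echo "UTF-16LE (BOM)" ;;
        feff*)    echo "UTF-16BE (BOM)" ;;
        ??00??00) echo "UTF-16LE (no BOM, guessed)" ;;  # character byte, then NUL
        00??00??) echo "UTF-16BE (no BOM, guessed)" ;;  # NUL, then character byte
        *)        echo "unknown" ;;
    esac
}
```

Running guess_utf16 foo.txt prints one of the labels; treat a "guessed" result as a hint to confirm with xxd rather than as proof.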
Solution
Convert to UTF-8 before grepping:
% iconv -f UTF-16LE -t UTF-8 foo.txt | grep fix
By such as fix their faith on Unity.
Use UTF-16BE instead if the byte pattern is big-endian.
Why UTF-16LE and not plain UTF-16?
With a BOM, plain UTF-16 works everywhere: iconv reads the BOM and picks the byte order automatically.
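Without a BOM, the byte order that iconv assumes for plain UTF-16 is implementation-dependent. The common implementations differ: glibc's iconv on GNU/Linux assumes little-endian on typical (little-endian) hardware, while macOS iconv assumes big-endian. The same command can therefore convert a file correctly on one platform and yield garbage, and no grep match, on the other: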
iconv -f UTF-16 -t UTF-8 foo.txt | grep fix # not portable for BOM-less input
iconv -f UTF-16LE -t UTF-8 foo.txt | grep fix # explicit byte order
For BOM-less UTF-16, use UTF-16LE or UTF-16BE, not plain UTF-16.
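The rule above can be wrapped in a function that uses plain UTF-16 only when a BOM is present and otherwise passes an explicit byte order. This is a sketch under stated assumptions: grep_utf16 and its FALLBACK argument are hypothetical names, and defaulting to UTF-16LE is a choice made for this example, not something iconv does.

```shell
# grep_utf16 PATTERN FILE [FALLBACK]: hypothetical helper, not a standard tool.
# FALLBACK is the encoding to assume when FILE has no BOM (default: UTF-16LE).
grep_utf16() {
    pattern=$1
    file=$2
    fallback=${3:-UTF-16LE}
    # Read the first two bytes as hex to look for a byte-order mark.
    bom=$(od -An -tx1 -N 2 "$file" | tr -d ' \n')
    case "$bom" in
        fffe|feff) from=UTF-16 ;;     # BOM present: let iconv pick the byte order
        *)         from=$fallback ;;  # no BOM: state the byte order explicitly
    esac
    iconv -f "$from" -t UTF-8 "$file" | grep -e "$pattern"
}
```

For a big-endian file without a BOM, the call would be grep_utf16 fix foo.txt UTF-16BE.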
Handling conversion errors
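If iconv stops with illegal input sequence, first confirm that the source encoding and byte order are right. If they are, -c makes iconv drop the characters it cannot convert, so the output may be incomplete: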
% iconv -c -f UTF-16LE -t UTF-8 foo.txt | grep fix
Sources
- Cannot grep UTF-16 Unicode files
- grep UTF-16
- Ken Thomases' comment on Difference between Mac and Linux iconv UTF16 to UTF8
- Endianness
❧ 2026-06-24