#PSCXTip How to determine the byte order mark of a text file
Text files created by Windows PowerShell with the redirection operator (>) or Out-File are little-endian Unicode (UTF-16LE) by default. You can see this by inspecting the first few bytes of a text file for a BOM, i.e. a byte order mark. BOMs are not required, but PowerShell usually creates one when it creates a text file. Typical BOMs you’ll encounter with Windows and PowerShell are:
UTF-8    : 0xEF 0xBB 0xBF
UTF-16LE : 0xFF 0xFE
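If you would rather not memorize these values, you can cross-check them against what .NET reports as each encoding's preamble. The following is only a small sketch: Get-PreambleHex is a hypothetical helper name introduced here for illustration, while Encoding.GetPreamble() is the standard .NET method that returns an encoding's BOM bytes.

```powershell
# Hypothetical helper (not part of PowerShell or PSCX): formats the BOM bytes
# that .NET associates with an encoding as space-separated hex values.
function Get-PreambleHex([System.Text.Encoding]$Encoding) {
    ($Encoding.GetPreamble() | ForEach-Object { '0x{0:X2}' -f $_ }) -join ' '
}

Get-PreambleHex ([System.Text.Encoding]::UTF8)     # 0xEF 0xBB 0xBF
Get-PreambleHex ([System.Text.Encoding]::Unicode)  # 0xFF 0xFE (UTF-16LE)
```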
You can’t use code like [System.IO.File]::ReadAllText() to view a BOM because it returns only the decoded text, not the bytes of the BOM itself. Get-Content works the same way unless you use the -Encoding Byte parameter. Given a file created in PowerShell:
PS> Get-Date > date.txt
You can see the encoding using Get-Content like so:
PS> Get-Content .\date.txt -Encoding Byte -TotalCount 3
255
254
13
However, unless you’re quick with your decimal-to-hex conversions, this output isn’t ideal. The PowerShell Community Extensions (PSCX) come with a command called Format-Hex that formats its input or a specified file as hex. This utility is much like the od command from UNIX. The output from the Format-Hex command for the same file as above would be:
PS> Format-Hex .\date.txt -Count 16
Address: 0 1 2 3 4 5 6 7 8 9 A B C D E F ASCII
-------- ----------------------------------------------- ----------------
00000000 FF FE 0D 00 0A 00 53 00 75 00 6E 00 64 00 61 00 ......S.u.n.d.a.
Here we can see the first two bytes are 0xFF 0xFE, which is the BOM for UTF-16LE, or little-endian Unicode. If we instead save date.txt as UTF-8:
PS> Get-Date | Out-File date.txt -Encoding Utf8
PS> Format-Hex .\date.txt -Count 16
Address: 0 1 2 3 4 5 6 7 8 9 A B C D E F ASCII
-------- ----------------------------------------------- ----------------
00000000 EF BB BF 0D 0A 53 75 6E 64 61 79 2C 20 44 65 63 .....Sunday, Dec
Here we can see the UTF-8 BOM 0xEF 0xBB 0xBF. This tip is most useful when you’re using PowerShell to process a file created by another program and you need to leave the file in the same encoding it started out with.
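To automate that check, you could wrap the BOM comparison in a small function. This is a minimal sketch rather than a complete detector: Get-FileBom is a hypothetical name, it recognizes only the two BOMs discussed in this tip, and it reads the bytes through .NET's System.IO.File instead of Get-Content or Format-Hex so that it does not depend on PSCX being installed.

```powershell
# Hypothetical helper: reports which of the two BOMs from this tip, if any,
# starts the given file. Other BOMs (e.g. UTF-16BE, or UTF-32LE, which also
# begins with FF FE) are not distinguished by this sketch.
function Get-FileBom {
    param([Parameter(Mandatory = $true)][string]$Path)

    # .NET does not follow PowerShell's current location, so resolve the path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        $buffer = New-Object byte[] 3
        $read = $stream.Read($buffer, 0, 3)
    }
    finally {
        $stream.Dispose()
    }

    if ($read -ge 3 -and $buffer[0] -eq 0xEF -and $buffer[1] -eq 0xBB -and $buffer[2] -eq 0xBF) {
        return 'UTF-8'
    }
    if ($read -ge 2 -and $buffer[0] -eq 0xFF -and $buffer[1] -eq 0xFE) {
        return 'UTF-16LE'
    }
    return 'Unknown'
}
```

Calling Get-FileBom .\date.txt before you modify the file lets you pick the matching value for Out-File -Encoding (Utf8 for UTF-8, Unicode for UTF-16LE) when you write it back. A result of 'Unknown' means only that neither of these two BOMs was found, not that the file has a particular encoding.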
Note: There are many more useful PowerShell Community Extensions (PSCX) commands. If you are interested in this great community project led by PowerShell MVPs Keith Hill and Oisin Grehan, give PSCX a try at http://pscx.codeplex.com.