See the question and my original answer on StackOverflow

You need two encodings to read an XML file (I will not cover the BOM, which is just another hint that simplifies things):

1) The first encoding is used to read the XML declaration. It is really a byte-level decoding: you only need to read US-ASCII characters at this point, so all you need is a way to turn a run of bytes into a run of ASCII characters.

Note that this works because encoding names can only contain US-ASCII characters (see IANA Character Sets). For example, at this stage you don't need to differentiate between UTF-8 and US-ASCII, because they encode ASCII characters the same way.

So the number of encodings to test here is limited, because you only care about the byte -> ASCII character conversion (1 byte -> 1 char, 2 bytes -> 1 char, 4 bytes -> 1 char, etc.), not the whole Unicode set. The encoding you use here may not be the one used for the rest of the file.
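
As a rough illustration of how few cases there are, the sketch below maps the first four bytes of a file that starts with `<?xm` to a codec able to read the declaration. The function name `guess_declaration_codec` and the Python codec names are my own choices, and BOM handling is deliberately left out; the byte patterns are the ones listed in the XML specification's appendix on encoding autodetection.

```python
from typing import Optional

# First four bytes of "<?xm" for each way of mapping bytes to ASCII chars.
# Codec names are Python's; any codec of the same family would work here.
_DECLARATION_CODECS = {
    b"<?xm": "ascii",               # 1 byte -> 1 char (UTF-8, ISO-8859-x, ...)
    b"<\x00?\x00": "utf-16-le",     # 2 bytes -> 1 char, little-endian
    b"\x00<\x00?": "utf-16-be",     # 2 bytes -> 1 char, big-endian
    b"<\x00\x00\x00": "utf-32-le",  # 4 bytes -> 1 char, little-endian
    b"\x00\x00\x00<": "utf-32-be",  # 4 bytes -> 1 char, big-endian
    b"\x4c\x6f\xa7\x94": "cp037",   # EBCDIC family
}


def guess_declaration_codec(data: bytes) -> Optional[str]:
    """Return a codec good enough to read the XML declaration, or None."""
    return _DECLARATION_CODECS.get(data[:4])
```

The returned codec is only meant for reading the declaration; as said above, it is not necessarily the encoding of the rest of the file.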

At that point, for example, you cannot tell a file encoded in Windows-1252 from one encoded in ISO-8859-1, because both encode the ASCII characters of the declaration as the same bytes. To tell them apart, you need to read the encoding name.
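
A small check makes this concrete. It is only a sketch: the `PROLOG` constant and `prolog_bytes` helper are names I made up for the example, and `cp1252` / `iso-8859-1` are the Python codec names for the two encodings.

```python
PROLOG = '<?xml version="1.0"?>'


def prolog_bytes(codec: str) -> bytes:
    """Encode the ASCII-only prolog with the given codec."""
    return PROLOG.encode(codec)


# The prolog is byte-for-byte identical in both encodings...
assert prolog_bytes("cp1252") == prolog_bytes("iso-8859-1")

# ...but a non-ASCII byte later in the file decodes differently.
euro_cp1252 = b"\x80".decode("cp1252")      # EURO SIGN, U+20AC
ctrl_latin1 = b"\x80".decode("iso-8859-1")  # C1 control character, U+0080
assert euro_cp1252 != ctrl_latin1
```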

2) The second encoding is used to read the rest of the file. It is the one named in the XML declaration you have just read.
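
Putting the two steps together could look like the sketch below. Everything in it is illustrative: `decode_xml` is a made-up name, the `family` argument stands for whatever 1-, 2- or 4-byte-per-character codec step 1 settled on, the 1024-byte limit on the declaration is an arbitrary assumption, and BOMs are ignored. When no encoding name is declared, it falls back to UTF-8 for single-byte input, which is the default the XML specification prescribes.

```python
import codecs
import re

# EncName as defined by the XML grammar: a letter, then letters/digits/._-
_ENCODING_RE = re.compile(
    r"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def decode_xml(data: bytes, family: str = "ascii") -> str:
    """Decode an XML document in two steps (no BOM handling)."""
    # Step 1: read the declaration with the byte -> ASCII codec.
    # Non-ASCII bytes are replaced; we only need the ASCII encoding name.
    # Assumption: the declaration fits in the first 1024 bytes.
    head = data[:1024].decode(family, errors="replace")
    match = _ENCODING_RE.match(head)

    # Step 2: decode the whole file with the declared encoding.
    if match:
        name = codecs.lookup(match.group(1)).name  # LookupError if unknown
    else:
        name = "utf-8" if family == "ascii" else family
    # A bare "UTF-16"/"UTF-32" does not say the byte order; step 1 does.
    if family.startswith(name + "-"):
        name = family
    return data.decode(name)
```

For instance, `decode_xml(data)` on a file declaring `encoding="windows-1252"` reads the declaration as ASCII and then decodes the whole file with Windows-1252.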