Fallback charset enables fingerprinting of bundle localization
Torbutton has the spoof_english pref that changes the value of the Accept-Language header to en-us,en;q=0.5; this cloaks what particular localized bundle you may be using. But localized bundles still differ in their default (fallback) charset. By figuring out what characters a byte sequence decodes as, it's possible to find out what charset is in use.
The attack goes like this. The web server sends an HTML page with no declared charset, neither in the HTTP header (Content-Type) nor in the HTML (
It looks like our current bundles may come with any of 6 different default charsets:
- utf-8: ar fa
- iso-8859-1: de es-ES fr it nl pt-PT vi
- iso-8859-2: pl
- windows-1251: ru
- euc-kr: ko
I found these by grepping the langpacks' unpacked *.xpi files for "intl.charset.default".
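The search can be approximated without unpacking anything, since an .xpi file is a ZIP archive. The following is only a sketch: the function name find_fallback_charset_lines is made up for illustration, and it does not look inside nested archives (for example a .jar stored inside the .xpi), so it may miss the pref in langpacks that are packaged that way.

```python
import zipfile

# The pref name searched for, as bytes so that files in any encoding can be scanned.
PREF = b"intl.charset.default"


def find_fallback_charset_lines(xpi_path):
    """Return (member name, line) pairs for every line in the .xpi that mentions PREF.

    Hypothetical helper: reads the archive directly instead of grepping
    unpacked files, and does not descend into nested archives.
    """
    hits = []
    with zipfile.ZipFile(xpi_path) as xpi:
        for name in xpi.namelist():
            for line in xpi.read(name).splitlines():
                if PREF in line:
                    text = line.decode("utf-8", errors="replace").strip()
                    hits.append((name, text))
    return hits
```

Calling find_fallback_charset_lines on each langpack and printing the returned pairs gives roughly the same information as the grep.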
As an example of how byte sequences can be variously decoded, here are decodings of "\xc3\xa3":
- utf-8: ã
- iso-8859-1: Ã£
- iso-8859-2: ĂŁ
- windows-1251: ГЈ
- euc-kr: 찾
- gbk: 茫
That is, an HTML page can contain the sequence "\xc3\xa3" and it will render as different characters depending on the charset in effect.
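The decodings above can be reproduced, and inverted, with a few lines of Python. This is a minimal sketch that assumes Python's codecs behave like the browser's decoders for these two bytes; the names PROBE, CANDIDATE_CHARSETS, decode_probe and infer_charset are invented for illustration.

```python
# The probe bytes from the example above.
PROBE = b"\xc3\xa3"

# The charsets from the decoding list above, spelled as Python codec names.
CANDIDATE_CHARSETS = [
    "utf-8",
    "iso-8859-1",
    "iso-8859-2",
    "windows-1251",
    "euc-kr",
    "gbk",
]


def decode_probe(probe=PROBE, charsets=CANDIDATE_CHARSETS):
    """Map each candidate charset to the text the probe bytes decode to."""
    return {charset: probe.decode(charset) for charset in charsets}


def infer_charset(rendered, probe=PROBE, charsets=CANDIDATE_CHARSETS):
    """Return the candidate charsets under which the probe decodes to `rendered`.

    This is the inference step of the attack: observing the rendered
    characters narrows down the fallback charset in use.
    """
    return [charset for charset in charsets if probe.decode(charset) == rendered]
```

Because the six decodings are all distinct, infer_charset returns a single candidate for each of the renderings listed above; for example, infer_charset("ГЈ") gives ["windows-1251"].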
A possible solution is just to force intl.charset.default to UTF-8 in all localizations. Here are some Mozilla bugs I found that are relevant to setting this pref to UTF-8: 910165 406498 536506 910169.
Also see https://developer.mozilla.org/en-US/docs/Localizations_and_character_encodings#Specifying_the_fallback_encoding, which indicates that Firefox's behavior with respect to the fallback charset will change:
"As of Firefox 28, this section is obsolete, since the preference intl.charset.default no longer exists. The mapping from locales onto fallback encodings is now built into Gecko itself."
In the best case, this could be interpreted to mean that the spoof_english setting will become sufficient, and the fallback will become as it would be for en-US. Or it might just mean that the preference is moved to somewhere inside Gecko. It seems the relevant bug is 910192: Get rid of intl.charset.default as a localizable pref and deduce the fallback....