Hindi and Tamil text fails to render, as seen on Wikipedia. Other common languages may be unsupported as well, but I haven't looked further into it.
You can view this in Tor Browser to see which ones aren't supported. All of them work in Firefox.
13 of them don't work on Windows 10, including Bangla and Punjabi.
I've also tested Bengali (e.g., on BBC), and it's a problem as well.
We could try to fix all the problematic scripts at this point.
Pier Angelo Vendrame changed title from "Tor Browser on Windows does not support Hindi or Tamil." to "Tor Browser on Windows does not support some kind of scripts."
Pier Angelo Vendrame changed title from "Tor Browser on Windows does not support some kind of scripts." to "Tor Browser on Windows is lacking fonts to render some kind of scripts."
So the font used in Windows for Ethiopic is Nyala, and it is whitelisted by Tor Browser. But this font was dropped from the default Windows install starting in Windows 10 [1] and became a supplemental font.
So this is the same situation of inequality across various Windows versions, e.g. Bengali (Vrinda was dropped in Win 10 as well)
In https://arkenfox.github.io/TZP/tests/fontscripts.html, I still see broken: bamum, bassa vah, balinese, buhid, hanunoo, sundanese, tagalog, tagbanwa, tamil supplement, cyrillic extended-c, cjk compatibility ideographs supplement, mandaic, samaritan.
Windows 7 has been EOL for more than 2 years now (since 14 January 2020).
I say: RIP Windows 7, let's just stop supporting it, and add the new fonts to the allowed list.
Either now, or in 12.0. But I'd prefer to do it now.
AFAIK we don't collect data on Windows versions, so I don't know how many users we could possibly make angry.
I'll discuss later with the rest of the team.
We don't have data, but we'll probably stop supporting Windows 7 when Mozilla does (if they start using APIs not supported by Windows 7, otherwise it will be a natural thing).
So, these are the fonts we currently enable (11.5a13, unless we make a last-minute change):
Font list
I will fill in the remaining marks in this table when I get some time or, even better, we could add the bundled version instead.
| Font | Windows 7 | Windows 8 | Windows 8.1 | Windows 10 | Windows 11 | Additional notes (e.g., supported scripts) |
|---|---|---|---|---|---|---|
| Arial | 5.22 | 6.89 | 6.89 | 7.00 | 7.00 | |
| Batang | 5.00 | 5.00 | 5.00 | [?] | [?] | |
| 바탕 | [?] | [?] | [?] | [?] | [?] | = Batang with (Korean?) script |
| Cambria Math | 5.97 | 6.84 | 6.84 | 6.99 | 6.99 | Why exactly are we enabling Math fonts? For things like MathML or for things like MathJax? |
| Courier New | 5.13 | 6.87 | 6.87 | 6.92 | 6.92 | |
| Euphemia | 5.00 | 5.05 | 5.05 | [?] | [?] | |
| Gautami | 5.90 | 5.94 | 6.00 | [?] | [?] | |
| Georgia | 5.51 | 5.51 | 5.51 | 5.59 | 5.59 | |
| Gulim | 5.01 | 5.01 | 5.00 (?) | [?] | [?] | |
| 굴림 | [?] | [?] | [?] | [?] | [?] | = Gulim with (Korean?) script |
| GulimChe | 5.00 | 5.00 | 5.00 | [?] | [?] | Why do we enable it, but don't enable other Che fonts? |
| 굴림체 | [?] | [?] | [?] | [?] | [?] | = GulimChe with (Korean?) script |
| Iskoola Pota | 5.90 | 5.94 | 6.00 | [?] | [?] | |
| Kalinga | 5.90 | 5.95 | 6.00 | [?] | [?] | |
| Kartika | 5.90 | 5.95 | 6.00 | [?] | [?] | |
| Latha | 5.90 | 5.94 | 6.00 | [?] | [?] | |
| Lucida Console | 5.00 | 5.00 | 5.00 | 5.01 | 5.01 | |
| Malgun Gothic | 6.11 | 6.22 | 6.50 | 6.68 | 6.68 | |
| Mangal | 5.91 | 5.94 | 6.00 | [?] | [?] | |
| Meiryo | 6.05 | 6.30 | 6.30 | [?] | [?] | |
| Meiryo UI | 6.05 | 6.30 | 6.30 | [?] | [?] | |
| Microsoft Himalaya | 5.00 | 5.06 | 5.10 | 5.23 | 5.23 | |
| Microsoft JhengHei | 6.02 | 6.10 | 6.11 | 6.14 | 6.14 | Windows 8.1 introduced the light variant |
| Microsoft JhengHei UI | [?] | 6.10 | 6.11 | 6.14 | 6.14 | |
| Microsoft YaHei | 6.02 | 6.10 | 6.14 | 6.25 | 6.23 | Windows 8.1 introduced the light variant, and the bold has another version |
| 微软雅黑 | [?] | [?] | [?] | [?] | [?] | = Microsoft YaHei in Chinese |
| Microsoft YaHei UI | [?] | 6.10 | 6.14 | 6.25 | 6.25 | |
| MingLiU | 7.00 | 7.01 | 7.01 | 7.02 | 7.02 | |
| PMingLiU | 7.00 | 7.01 | 7.01 | 7.02 | 7.02 | |
| 細明體 | [?] | [?] | [?] | [?] | [?] | = Xì míng tǐ, maybe MingLiU in Chinese? |
| MS Gothic | 5.05 | 5.31 | 5.30 (?) | 5.32 | 5.32 | |
| MS ゴシック | [?] | [?] | [?] | [?] | [?] | = MS Gothic in Japanese |
| MS PGothic | 5.05 | 5.31 | 5.30 (?) | 5.32 | 5.32 | |
| MS Pゴシック | [?] | [?] | [?] | [?] | [?] | = MS PGothic in Japanese |
| MS Mincho | 5.05 | 5.30 | 5.30 | [?] | [?] | |
| MS 明朝 | [?] | [?] | [?] | [?] | [?] | = MS míng cháo in Chinese |
| MS PMincho | 5.05 | 5.30 | 5.30 | [?] | [?] | |
| MS P明朝 | [?] | [?] | [?] | [?] | [?] | = MS P míng cháo in Chinese |
| MV Boli | 5.01 | 5.01 | 6.00 | 6.84 | 6.84 | |
| Noto Sans Buginese | [?] | [?] | [?] | [?] | [?] | Bundled font |
| Noto Sans Khmer | [?] | [?] | [?] | [?] | [?] | Bundled font |
| Noto Sans Lao | [?] | [?] | [?] | [?] | [?] | Bundled font |
| Noto Sans Myanmar | [?] | [?] | [?] | [?] | [?] | Bundled font |
| Noto Sans Yi | [?] | [?] | [?] | [?] | [?] | Bundled font |
| Nyala | 5.00 | 5.01 | 5.01 | [?] | [?] | |
| 新細明體 | [?] | [?] | [?] | [?] | [?] | = Xīn xì míng tǐ, maybe PMingLiU in Chinese? |
| Plantagenet Cherokee | 5.00 | 5.07 | 5.07 | [?] | [?] | |
| Raavi | 5.90 | 5.94 | 6.00 | [?] | [?] | |
| Segoe UI | 5.13 | 5.37 | 5.37 | 5.62 | 5.62 | Many variants on weight (with possibly different versions), Segoe UI Emoji is new on Windows 8.1 |
| Shruti | 5.90 | 5.94 | 6.00 | [?] | [?] | |
| SimSun | 5.03 | 5.13 | 5.13 | 5.16 | 5.18 | |
| 宋体 | [?] | [?] | [?] | [?] | [?] | = Sòngtǐ in Chinese, SimSun according to the logic so far |
| Sylfaen | 5.02 | 5.04 | 5.04 | 5.06 | 5.06 | |
| Tahoma | 5.22 | 6.05 | 6.05 | 7.00 | 7.00 | |
| Times New Roman | 5.22 | 6.89 | 6.89 | 7.01 | 7.01 | |
| Tunga | 5.90 | 5.94 | 6.00 | [?] | [?] | |
| Twemoji Mozilla | [?] | [?] | [?] | [?] | [?] | Bundled font |
| Verdana | 5.31 | 5.31 | 5.31 | 5.33 | | (Both on Windows 7 and 8 italic and bold are version 5.30) |
| Vrinda | 5.90 | 6.80 | 6.81 | [?] | [?] | |
| Yu Gothic UI | [?] | [?] | [?] | 1.90 | 1.93 | |
we wouldn't need five Latin script fonts
I disagree. Sites make assumptions.
So, Arial, Verdana, and Tahoma are already 3 sans-serif Latin fonts, but sites expect Windows users to have them, and maybe don't use a fallback font.
(In some cases, bad sites even specify one of them without specifying sans-serif, for the joy of users of other OS).
"we wouldn't need five Latin script fonts" - yeah, I meant useless crap, not system fonts expected on all versions such as Tahoma (which maps to MS Shell Dlg etc) and it's no effort to keep those. I meant crap like Impact, Ink Free?, Broadway, Candara - I'm guessing here, but what I meant was we don't need all of them)
Win7: 20% of Firefox users are windows 7, that would be a good guess to start from, but it doesn't really matter. If Firefox officially supports the OS, we should too
Win7: this is not about win7 (or fingerprinting, except that we need to be careful to not add supplemental fonts): we will not be able to counter metric differences such as unicode support: This is about usability. If we fix Win10, this also fixes Win7 by default (and just so happens to move us towards a unified fonts solution). An advanced script will run rings around this
Win7: looks like you or the team answered your own question
The point of the quick diff between 7 and 10 was to pull out the major differences and confirm that those fonts were dropped starting in Win10 (and I nailed it!). If we want to support those scripts (and maybe some of them were only supported in the past because Windows had a default font), then it's that simple: add the Noto font (they all seem to be ones we already use on e.g. Linux), change the whitelist, modify the font.name prefs. Win7/8.1 then fall into line anyway.
If we really need to, we could fracture Windows users into two different font prefs, e.g. by adding Windows 10+ only fonts to the whitelist to support more scripts in Win10+. Win7 vs 10 is already divergent.
So
add fonts that all versions have (tahoma, verdana etc) - optionally add win10 only fonts
add noto fonts for scripts we wish to support - we don't have to support all of them
"I still see broken" - some/all? of those have always been broken. some are really obscure
the list of fonts here each show info on the code points supported, which is handy
in testing which fonts broke a script I just removed fonts from the whitelist until it broke in the TZP script test (it updates asynchronously, so I just had it open in a separate window), reset, rinse, repeat - so not a 100% method. My bad.
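The removal procedure described here can be sketched as a leave-one-out loop. This is only a minimal sketch: `supports_script` is a hypothetical stand-in for the manual check against the TZP script test, and the coverage data is illustrative (only the Nyala/Ethiopic and Vrinda/Bengali pairs come from this thread).

```python
from typing import Callable, Iterable


def fonts_required_for(
    script: str,
    allowlist: Iterable[str],
    supports_script: Callable[[str, list[str]], bool],
) -> list[str]:
    """Return the fonts whose removal, on its own, breaks `script`."""
    fonts = list(allowlist)
    required = []
    for candidate in fonts:
        # Remove exactly one font, test, then "reset" by rebuilding the list.
        remaining = [f for f in fonts if f != candidate]
        if not supports_script(script, remaining):
            required.append(candidate)
    return required


# Illustrative coverage data (assumption, not measured).
COVERAGE = {"Nyala": {"ethiopic"}, "Vrinda": {"bengali"}, "Arial": {"latin"}}


def fake_supports(script: str, fonts: list[str]) -> bool:
    # Hypothetical check: a script renders if any remaining font covers it.
    return any(script in COVERAGE.get(f, set()) for f in fonts)


print(fonts_required_for("ethiopic", COVERAGE, fake_supports))
```

As noted, this is not a 100% method: when two allowed fonts cover the same script, removing either one alone changes nothing, so neither is reported.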
"we wouldn't need five Latin script fonts" - yeah, I meant useless crap, not system fonts expected on all versions such as Tahoma (which maps to MS Shell Dlg etc) and it's no effort to keep those. I meant crap like Impact, Ink Free?, Broadway, Candara - I'm guessing here, but what I meant was we don't need all of them)
Okay, I can agree on that.
This is about usability.
That's why font-related issues need their own tags.
If we fix Win10, this also fixes Win7 by default
Only if we use fonts that were available also on Windows 7...
It was empirical: I've tested every script with a non-customized Firefox, and saw what it used.
Maybe documentation is simply wrong, it happens sometimes.
All the NotoSans*.ttf files we use on Linux add up to between 25MB and 30MB.
Compressed with xz (with the default level, not the maximum) they are about 14MB.
However, I think fonts aren't updated frequently.
So, the additional download would be only on the first download, and then the incremental archives wouldn't have this problem.
I'm not a fan of dividing the Windows crowd even with these purposes.
Also, we may increase the archive/installer sizes by creating multi-language packages.
Documentation can indeed be out of whack. Do you mean liveReload to enable a single TB install per platform, think of all the server storage space saved. Soz, offtopic.
I think we can agree that if we have to bundle something for win10/11, then we just make win-other match. If an additional 12mb compressed over current is acceptable, then it's pretty straightforward - we just need to check. PS: that font table is looking great, thanks. Good point on the incrementals - we should check all notos are up to date at some stage
Edit
Only if we use fonts that were available also on Windows 7
I think you misunderstood. If we bundle fonts for one, we apply them to all (e.g. we would bundle Noto Sans Bengali and drop vrinda from the whitelist since it only applies to win7/8) - that's what I meant about the others falling into line
Thanks! The 8.1 column should be checked twice, because at a certain point I may have used Windows 8 fonts instead. I got confused when I saw that one font was apparently downgraded.
I think you misunderstood. If we bundle fonts for one, we apply them to all (e.g. we would bundle Noto Sans Bengali and drop vrinda from the whitelist since it only applies to win7/8) - that's what I meant about the others falling into line
Oh, yes, I thought you were still speaking of system fonts.
Of course, with bundled fonts we'll solve everything.
Now we only need to understand which ones are really going to be needed.
IDK if it's important to list the Unicode version, because there's nothing we can do about that with system fonts. We really only need to know what fonts are on what version, unless I'm missing something (e.g. we just want it documented). BTW, I have an upcoming PoC that tests a char or two from each script per Unicode version (from say about v5.0 on) - if it's speedy enough, and I can reduce it to a bare minimum, it goes into TZP. That might have helped you here, but it's not ready
MS docs.
I think it's the font version, though, not the Unicode version.
Creating another table with that information would be a good idea.
I love VMs, but I don't really have a VM per Windows version.
But if you wanted to know the Unicode version, either using the VMs, or grabbing the files from each Windows version and feeding them to some script to automatically compile a table, would be a better idea.
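A script for compiling such a table could start from something like the sketch below, which reads the font revision out of a single font file using only the standard library. It is a sketch under assumptions: it reads `head.fontRevision`, which may not be the same number shown in the table above (that could come from the name table's version string), it does not handle `.ttc` collections, and `fake_font` is a helper invented here so the example runs without real font files.

```python
import struct


def font_revision(data: bytes) -> float:
    """Read head.fontRevision (16.16 fixed point) from a single-font sfnt file."""
    if data[:4] == b"ttcf":
        raise ValueError("font collections (.ttc) are not handled in this sketch")
    num_tables = struct.unpack_from(">H", data, 4)[0]
    for index in range(num_tables):
        # Each table record: 4-byte tag, checksum, offset, length.
        tag, _checksum, offset, _length = struct.unpack_from(
            ">4sIII", data, 12 + 16 * index
        )
        if tag == b"head":
            # head starts with uint16 major, uint16 minor, then fontRevision.
            raw = struct.unpack_from(">i", data, offset + 4)[0]
            return round(raw / 65536, 2)
    raise ValueError("no head table found")


def fake_font(revision: float) -> bytes:
    """Build a minimal, invalid-but-parseable blob for demonstration only."""
    header = struct.pack(">IHHHH", 0x00010000, 1, 16, 0, 0)
    record = struct.pack(">4sIII", b"head", 0, 28, 8)
    head = struct.pack(">HHi", 1, 0, int(revision * 65536))
    return header + record + head


print(font_revision(fake_font(5.5)))
```

Running `font_revision(path.read_bytes())` over the font directory copied from each Windows version would give one column of the table per version.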
When you are on Linux, they make you download the ISO, rather than the media creation tool.
Spoofing the user agent allows you to do the same on Windows (but IIRC, eventually they added a small link to download the ISO, instead of the tool).
The important thing, anyway, is getting a safe source for its hash.
These ISOs need product keys you are legally allowed to use, etc etc etc.
Okay, I've finished going through the list of fonts we currently enable.
Some fonts haven't actually been completely deleted by MS, but moved to supplemental.
Which means that, combined with some technique to detect the Windows version, they could be used to reveal the user's installed locales, which is even worse!
A good tradeoff is dropping them and using bundled fonts.
Fonts removed/moved to supplemental in Windows 10:
Batang/바탕
Euphemia
Gautami
Gulim/굴림 and GulimChe/굴림체
Iskoola Pota
Kalinga
Kartika
Latha
Mangal
Meiryo and Meiryo UI
MS Mincho/MS 明朝 and MS PMincho/MS P明朝
Nyala
Plantagenet Cherokee
Raavi
Shruti
Tunga
Vrinda
Fonts not available on some Windows versions:
Microsoft JhengHei UI and Microsoft YaHei UI (not available on Windows 7)
Yu Gothic UI (available from Windows 10).
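The same split can be computed mechanically once the availability data is in a machine-readable form. A minimal sketch, using only a handful of fonts from the lists above; the dictionary layout and the function name are assumptions made for illustration.

```python
WINDOWS_VERSIONS = ["7", "8", "8.1", "10", "11"]

# Font -> Windows versions that ship it by default (subset of the lists above).
AVAILABILITY = {
    "Arial": {"7", "8", "8.1", "10", "11"},
    "Nyala": {"7", "8", "8.1"},
    "Vrinda": {"7", "8", "8.1"},
    "Microsoft YaHei UI": {"8", "8.1", "10", "11"},
    "Yu Gothic UI": {"10", "11"},
}


def split_allowlist(availability, versions):
    """Separate fonts present on every version from those present on only some."""
    wanted = set(versions)
    everywhere = sorted(f for f, have in availability.items() if wanted <= have)
    partial = sorted(f for f, have in availability.items() if not wanted <= have)
    return everywhere, partial


everywhere, partial = split_allowlist(AVAILABILITY, WINDOWS_VERSIONS)
print("keep:", everywhere)
print("replace with bundled fonts:", partial)
```

Fonts in the second group are the candidates for removal from the allowed list in favor of bundled fonts.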
Now, I could cross-reference this information with Thorin's comment, but I think I'll just do some tests on my Windows 10 VM (also with some Noto fonts), and then add here a candidate list for font.system.whitelist and for fonts to be bundled in TBB.
For a complete test, we should test them on each Windows version.
I should still be able to check on a Windows 11 system, and I could try with the VMs I linked above on 7 and 8.1 (but not 8 - I may still have some ISO in some disk, though).
Okay, so, first of all, I've cleaned my font list to remove the old fonts:
Arial, Cambria Math, Courier New, Georgia, Lucida Console, MS Gothic, MS ゴシック, MS PGothic, MS Pゴシック, MV Boli, Malgun Gothic, Mangal, Meiryo, Meiryo UI, Microsoft Himalaya, Microsoft JhengHei, Microsoft YaHei, 微软雅黑, MingLiU, 細明體, Noto Sans Buginese, Noto Sans Khmer, Noto Sans Lao, Noto Sans Myanmar, Noto Sans Yi, PMingLiU, 新細明體, Segoe UI, SimSun, 宋体, Sylfaen, Tahoma, Times New Roman, Twemoji Mozilla, Verdana
getting closer: those orange georgian/cyrillic/mongolian/nko (who uses nko, ahh, west africa) were never supported in the past. We're never going to get a perfect match. Edit: If we did add those, not saying to do that - then the diffs are down to a few code points difference (this is the minimal code point test, so ignores later unicode changes, of which there would be legion)
linux results: unsupported 35. When you add the Noto fonts to windows, that should drop from 48 to 35'ish (i.e all 13 scripts from my initial list of no longer shipped by windows by default). This consistency is starting to feel good for usability
Wouldn't CJK fonts be rather large? There's also an issue (I assume still open) to do with Chinese fonts that got marked confidential because the reporter went super paranoid (and it annoys me because there's nothing confidential or PII in it)
meet ... if you guys want a font label, do it. AFAICare Fonts will always be a FP issue because we need to be careful, but also doubles for accessibility
There are both Noto Serif CJK something (a.k.a. Language-specific OTFs; each one is about 25MB alone, or 125MB together, which means doubling Tor Browser's size) and Noto Serif Something (a.k.a. Region-specific Subset OTFs; each one is about 5MB, much better).
All the others I listed are 5MB together, and 2.5MB in a zip (but we use lzma, so possibly less than 2MB in an archive that is already ~100MB).
Adding them would be trivial, and possibly improve accessibility and/or the experience.
2mb compressed ... I assume they add value, e.g. Hebrew was listed as supported in win7 (me) win10 (your list), AFAICT, so what does Noto Serif Hebrew bring to the table? Are we talking about across all platforms? If so then that seems to be out of scope a little, and we'll never achieve parity (unless we bundled every font)... but that said, I think that's a cheap 2mb (and it does fix some win7 such as georgian etc). So yeah, some wins, and some consistency across platforms (for support too)
there's a new shiny Font label ... if you want I can compile a list of issues (open and closed) to replace Fingerprinting, or add. And yeah, there's a bunch of old font related issues that have been hanging around since Jesus
ask some information to someone that knows languages more
I'm still intrigued as to the font.name prefs importance, because web content should map to fallback (not that we want to cause reflow). I read a ticket recently where the UI (tab title) used a different font to the one listed in preferences, so the naming is important
Arial, Cambria Math, Courier New, Georgia, Lucida Console, MS Gothic, MS ゴシック, MS PGothic, MS Pゴシック, MV Boli, Malgun Gothic, Mangal, Meiryo, Meiryo UI, Microsoft Himalaya, Microsoft JhengHei, Microsoft YaHei, 微软雅黑, MingLiU, 細明體, Noto Sans Bengali, Noto Sans Buginese, Noto Sans Canadian Aboriginal, Noto Sans Cherokee, Noto Sans Devanagari, Noto Sans Ethiopic, Noto Sans Ethiopic, Noto Sans Gujarati, Noto Sans Gurmukhi, Noto Sans Kannada, Noto Sans Khmer, Noto Sans Lao, Noto Sans Malayalam, Noto Sans Myanmar, Noto Sans Oriya, Noto Sans Sinhala, Noto Sans Tamil, Noto Sans Telugu, Noto Sans Yi, PMingLiU, 新細明體, Segoe UI, SimSun, 宋体, Sylfaen, Tahoma, Times New Roman, Twemoji Mozilla, Verdana
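A comma-separated list this long is easy to get slightly wrong by hand; the candidate list above, for instance, names Noto Sans Ethiopic twice. A small check like the following could catch that. It is a sketch: the splitting rule (commas, surrounding whitespace ignored) is an assumption about how such a list is read, not a statement about Firefox's parser, and the sample is a short excerpt of the list.

```python
from collections import Counter


def parse_font_list(value: str) -> list[str]:
    """Split a comma-separated font list and drop surrounding whitespace."""
    return [name.strip() for name in value.split(",") if name.strip()]


def duplicates(names: list[str]) -> list[str]:
    """Return the names that appear more than once, sorted."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Excerpt of the candidate list above.
sample = (
    "Noto Sans Devanagari, Noto Sans Ethiopic, Noto Sans Ethiopic, "
    "Noto Sans Gujarati"
)
print(duplicates(parse_font_list(sample)))
```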
Result on Windows 10
unsupported [34]
adlam
balinese
bamum
bassa vah
braille patterns
buhid
cjk compatibility ideographs supplement
cjk unified ideographs extension-b [19/20]
control pictures
cyrillic extended-c
deseret
glagolitic
gothic
hanunoo
javanese
mandaic
new tai lue
ogham [19/20]
old italic
old turkic
optical character recognition
osage
osmanya
phags-pa
runic
samaritan
sundanese
syriac
tagalog
tagbanwa
tai le
tamil supplement
tifinagh
vai
mixed [3]
combining diacritical marks supplement [9/20]
mongolian [10/20]
nko [10/20]
partial [1]
combining diacritical marks for symbols [1/18]
I must say I'm satisfied with this new result.
It's slightly better than Linux except for cyrillic extended-c, but both were expected.
And the new fonts are 1.73MB uncompressed, less than 1MB in a zip file!
Including them is a trivial choice, and we should do the same process once #41004 (closed) gets fixed upstream.
I wanted to create a testbuild as well, so I was preparing the tor-browser-build part.
However, I'm getting lost in the details of why we have some unhinted fonts and some hinted ones.
From what I understood, I would say hinted fonts are better, and should always be preferred if available.
So, I was looking for the reasons on GitLab. Sadly, the commit that set unhinted fonts doesn't have an associated ticket number.
between the two, the tofu hash stayed identical (we're only testing the first 20 chars using only early unicode versions) - edit i.e between the win7 current setup and the win7 new test with 19 bundled
Anyway, win7 barely moves the needle (this is with 19 bundled fonts including Twemoji), and obviously win10 has improved. If we ignore georgian + some cyrillic (that's 5), and the other two are mongolian and nko (which win10 becomes partial). After that the diffs are like one or two code points
soz, I zoomed out to get it all in one screen grab, it's not super sharp
I think all we need to do is bundle a noto serif each for georgian and cyrillic-a/b ?
just to be clear, I meant in terms of the minimal test and just looking at basic script support (the tofu hash of 810 tofus didn't even change!). Obviously more up-to-date fonts will produce better unicode version support. So in effect, we've updated win10 now to match (and be slightly better) than win7, and win7 gets up-to-date fonts, and both use the same fonts
add georgian and cyrillic and we're there IMO
Noto Serif, or Noto Sans?
your earlier list is all Serif, so I assumed that's what we would want
I don't have tests for serif vs sans. What is the difference here? If an element specifies sans-serif vs serif, does the display not fall back to render chars properly and ignore the generic font-family? IANAE on CSS
Uhm. My experience with Latin characters is that the browser falls back to the default font, which usually is some serif (e.g., Times New Roman).
If that font doesn't contain the requested characters, I expect the browser to try some other "similar" font (i.e., try another serif if it was a serif, or another sans if it was a sans), and eventually fall back to whatever is available.
I'm not sure about what the standard dictates about this topic (nor if the standard leaves it to the implementation).
I expect non-latin scripts to have sans and serif fonts as well, so I hope Firefox/the font libraries can do the right thing, and try to do what sites want.
for #40849 (closed) , on my ToDo list, was to replicate the script index, and when you click a script, it would load that script as monospace, serif, sans-serif. I'll see if I can get onto that
I expect non-latin scripts to have sans and serif fonts as well,
As for providing sans and serif: one thing I notice, on win7 at least, is that the font*name prefs for each script can vary, e.g. he has (Firefox)
serif as Narkisim, David
sans serif as Arial
monospace as Fixed Miriam Transparent, Miriam Fixed, Rod, Consolas, Courier New
Others can be the same or near to it e.g Settings > Language & Appearance > Fonts > Advanced
order: Serif, Sans-serif, monospace
Bengali is Vrinda, Vrinda, default
Gujarati is Shruti, Shruti, Shruti
But overall, it looks like most scripts have multiple fonts, which backs up your quoted snippet. Some scripts I wouldn't expect to have any (but IANAE) e.g. CJK. I mean if they're already super flowery/scripty/embellished or ideographs.
My gut instinct is that char fallback will work regardless of generic font-family, and I'm wary of download sizes. That zip you sent is 796k which is not a huge addition, double it plus some for serif and a few missing we plan to add, and I think we're good. Just match em all to the font.name prefs
Balinese
That would be a nice addition. We should check after all of this that our script support is consistent across platforms
So, is it used for something other than the font preferences?
IDK. I'm not a wizard, harry potter! Umm, so see hebrew above, on say serif .. if it can't find Narkisim it then tries David and if it can't find that it falls back to the x-western (or whatever the pref is) serif, and failing that IDK?
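The fallback order guessed at here can be written down as a toy model, which at least makes the guess testable. This is explicitly not Firefox's algorithm: the chain (language-group list, then x-western, then any installed font), the function name, and the coverage data are all assumptions for illustration; only the Hebrew serif names come from the comment above.

```python
from typing import Optional

# (generic, language group) -> ordered font names, as in the prefs discussed above.
NAME_LISTS = {
    ("serif", "he"): ["Narkisim", "David"],
    ("serif", "x-western"): ["Times New Roman"],
}

# Illustrative coverage data (assumption).
COVERAGE = {
    "Narkisim": {"hebrew"},
    "David": {"hebrew"},
    "Times New Roman": {"latin"},
    "Arial": {"latin", "hebrew"},
}


def pick_font(script, generic, lang_group, name_lists, installed, coverage) -> Optional[str]:
    """Return the first installed font in the guessed chain that covers `script`."""
    chain = list(name_lists.get((generic, lang_group), []))
    chain += name_lists.get((generic, "x-western"), [])
    chain += sorted(installed)  # last resort: anything installed
    for font in chain:
        if font in installed and script in coverage.get(font, set()):
            return font
    return None  # nothing covers the script: tofu


print(pick_font("hebrew", "serif", "he", NAME_LISTS, {"David", "Times New Roman"}, COVERAGE))
```

In this model the generic family only decides where the search starts; a glyph is still found as long as any allowed font covers the script.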
Would be nice if we could get some insight from henri or jonathon
Uff, I don't know if I'm being unlucky with GitHub servers, or my connection doesn't like downloading big files (it usually can), or what: tor-browser-build#40529 (closed).
TL;DR: we're using an old version of Noto fonts. Switching to a new version is... complicated.
We had a patch to reduce the download size (the repo is 6GB, or even more now), but GitHub produces corrupted zip for newer versions (switching to .tar.gz doesn't help).
I'll see if we can find a tradeoff, so we're blocked on this.
The fonts directory grew from 3.5MB to 7MB, but I think it's okay for the result we obtained:
Result on Windows 10
unsupported [32]
adlam
bamum
bassa vah
braille patterns
buhid
cjk compatibility ideographs supplement
cjk unified ideographs extension-b [19/20]
control pictures
deseret
glagolitic
gothic
hanunoo
javanese
mandaic
new tai lue
ogham [19/20]
old italic
old turkic
optical character recognition
osage
osmanya
phags-pa
runic
samaritan
sundanese
syriac
tagalog
tagbanwa
tai le
tamil supplement
tifinagh
vai
mixed [2]
mongolian [10/20]
nko [10/20]
partial [1]
combining diacritical marks for symbols [1/18]
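Comparing two runs like the Windows 10 results in this thread by eye is error-prone; a set difference does it directly. A minimal sketch: the names are copied from the two "unsupported" lists with the [n/20] counters dropped, and `parse` is a helper introduced here.

```python
BEFORE = """adlam, balinese, bamum, bassa vah, braille patterns, buhid,
cjk compatibility ideographs supplement, cjk unified ideographs extension-b,
control pictures, cyrillic extended-c, deseret, glagolitic, gothic, hanunoo,
javanese, mandaic, new tai lue, ogham, old italic, old turkic,
optical character recognition, osage, osmanya, phags-pa, runic, samaritan,
sundanese, syriac, tagalog, tagbanwa, tai le, tamil supplement, tifinagh, vai"""

AFTER = """adlam, bamum, bassa vah, braille patterns, buhid,
cjk compatibility ideographs supplement, cjk unified ideographs extension-b,
control pictures, deseret, glagolitic, gothic, hanunoo,
javanese, mandaic, new tai lue, ogham, old italic, old turkic,
optical character recognition, osage, osmanya, phags-pa, runic, samaritan,
sundanese, syriac, tagalog, tagbanwa, tai le, tamil supplement, tifinagh, vai"""


def parse(block: str) -> set[str]:
    """Turn a comma-separated block of script names into a set."""
    return {name.strip() for name in block.split(",") if name.strip()}


before, after = parse(BEFORE), parse(AFTER)
print("newly supported:", sorted(before - after))
print("regressions:", sorted(after - before))
```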
I'll also test with other OS versions later.
I think that I should also try to get a few hashes to use as a PoC to see if intra-OS fingerprinting is trivial or if we can resist it without too much trouble with these changes.
get a few hashes to use as a PoC to see if intra-OS fingerprinting is trivial
do you mean between e.g. windows versions? the answer is yes, it is trivial: not that a script has to "guess" the version, the entropy is in the metrics
do you mean between e.g. windows vs mac .. forget about it, so trivial it got added to Trivial Pursuit back in 1979 ... lost cause
do you mean between e.g. windows versions? the answer is yes, it is trivial: not that a script has to "guess" the version, the entropy is in the metrics
This one
do you mean between e.g. windows vs mac .. forget about it, so trivial it got added to Trivial Pursuit back in 1979 ... lost cause
I honestly can't tell the difference except for spacing/kerning. Also wrapped in divs for you. Hopefully this will show up Noto Sans vs Noto Serif, and we can experiment with (or not, just do it) the font.name prefs
Question: assuming mac bundled fonts is resolved soon, are we planning on updating bundled fonts for linux/windows/mac in ESR91 (which has about four or five months left max), or just ESR102+?
OK, I just know that sysrqb loves flag days but mac will already change when we fix bundled not being used anyway (plus the Menlo fix). Will await the official builds and update TB whitelist/bundled lists
but mac will already change when we fix bundled not being used anyway (plus the Menlo fix).
Uhm. I wouldn't mind Mozilla fixing the bundled fonts, because they know better how things work. But I think that it won't be for 11.5.
We might try to backport it, in case (but I can't promise anything).
The alternative is that I try to set up some debugging environment on a Mac. I have compiled a testbuild with debug symbols, so I should be able to debug, but I don't know how to.
I expect installing lldb through brew (and maybe also VS Code because I like fancy things) should be enough.
https://en.wikipedia.org/wiki/Flag_day_(computing) - which I take in our cases to mean not that it is difficult/large, but that it alters fingerprints, and ESR releases are a prime time to do these things (because not everyone updates in a timely fashion, or even updates at all)
pref("font.system.whitelist","Arial, Cambria Math, Courier New, Georgia, Lucida Console, MS Gothic, MS ゴシック, MS PGothic, MS Pゴシック, MV Boli, Malgun Gothic, Mangal, Microsoft Himalaya, Microsoft JhengHei, Microsoft YaHei, 微软雅黑, MingLiU, 細明體, PMingLiU, 新細明體, Segoe UI, SimSun, 宋体, Sylfaen, Tahoma, Times New Roman, Verdana, Twemoji Mozilla, Noto Sans, Noto Sans Balinese, Noto Sans Bengali, Noto Sans Buginese, Noto Sans Canadian Aboriginal, Noto Sans Cherokee, Noto Sans Devanagari, Noto Sans Ethiopic, Noto Sans Georgian, Noto Sans Gujarati, Noto Sans Gurmukhi, Noto Sans Kannada, Noto Sans Khmer, Noto Sans Lao, Noto Sans Malayalam, Noto Sans Myanmar, Noto Sans Oriya, Noto Sans Sinhala, Noto Sans Tamil, Noto Sans Telugu, Noto Sans Yi, Noto Serif, Noto Serif Balinese, Noto Serif Bengali, Noto Serif Devanagari, Noto Serif Ethiopic, Noto Serif Georgian, Noto Serif Gujarati, Noto Serif Gurmukhi, Noto Serif Kannada, Noto Serif Khmer, Noto Serif Lao, Noto Serif Malayalam, Noto Serif Myanmar, Noto Serif Sinhala, Noto Serif Tamil, Noto Serif Telugu, Noto Serif Tibetan");
I've changed the order of the list: instead of going alphabetically, I kept system fonts first, and bundled fonts later (first Twemoji, then all the other bundled fonts).
I have learned a couple of things about the font preferences:
font.name.* should not be used, and font.name-list.* should be used instead (// All prefs of default font should be "auto"., from modules/libpref/init/all.js);
only a subset of languages is valid for these preferences (see gfx/thebes/gfxFontPrefLangList.h, for example); the others should be specified in the catch-all group (x-unicode): Firefox is smart enough to use the correct font. Of course, if only one font covers a certain range of codepoints, specifying it is useless.
So, I've fixed Linux first (because, of course, I have preferences, not because my Linux dev build is quicker to update and test).
Here are the promising results:
Linux fonts, after the update of the preferences
| Language | Monospace | Sans | Serif | Has Noto Sans | Has Noto Serif | gfxFontPrefLangList.h |
|---|---|---|---|---|---|---|
| Arabic | Noto Naskh Arabic | Noto Naskh Arabic | Noto Naskh Arabic | | | ar |
| Armenian | Noto Sans Armenian | Noto Sans Armenian | Noto Serif Armenian | | | x-armn |
| Balinese | Noto Sans Balinese | Noto Sans Balinese | Noto Serif Balinese | | | |
| Bengali | Noto Sans Bengali | Noto Sans Bengali | Noto Serif Bengali | | | x-beng |
| Buginese | Noto Sans Buginese | Noto Sans Buginese | Noto Sans Buginese | | | |
| Canadian Aboriginal | Noto Sans Canadian Aboriginal | Noto Sans Canadian Aboriginal | Noto Sans Canadian Aboriginal | | | x-cans |
| Cherokee | Noto Sans Cherokee | Noto Sans Cherokee | Noto Sans Cherokee | | | |
| ChineseCN | Noto Sans SC | Noto Sans SC | Noto Sans SC | [?] | [?] | zh-CN |
| ChineseHK | [?] | [?] | [?] | [?] | [?] | zh-HK |
| ChineseTW | Noto Sans SC | Noto Sans SC | Noto Sans SC | [?] | [?] | zh-TW |
| Cyrillic | Cousine | Arimo | Tinos | | | x-cyrillic |
| Devanagari | Noto Sans Devanagari | Noto Sans Devanagari | Noto Sans Devanagari | | | x-devanagari |
| Ethiopic | Noto Sans Ethiopic | Noto Sans Ethiopic | Noto Sans Ethiopic | | | x-ethi |
| Georgian | Noto Sans Georgian | Noto Sans Georgian | Noto Serif Georgian | | | x-geor |
| Greek | Cousine | Arimo | Tinos | | | el |
| Gujarati | Noto Sans Gujarati | Noto Sans Gujarati | Noto Serif Gujarati | | | x-gujr |
| Gurmukhi | Noto Sans Gurmukhi | Noto Sans Gurmukhi | Noto Serif Gurmukhi | | | x-guru |
| Hebrew | Cousine | Arimo | Tinos | | | he |
| Japanese | Noto Sans SC | Noto Sans SC | Noto Sans SC | [?] | [?] | ja |
| Kannada | Noto Sans Kannada | Noto Sans Kannada | Noto Serif Kannada | | | x-knda |
| Khmer | Noto Sans Khmer | Noto Sans Khmer | Noto Serif Khmer | | | x-khmr |
| Korean | Noto Sans KR | Noto Sans KR | Noto Sans KR | [?] | [?] | ko |
| Lao | Noto Sans Lao | Noto Sans Lao | Noto Serif Lao | | | |
| Malayalam | Noto Sans Malayalam | Noto Sans Malayalam | Noto Serif Malayalam | | | x-mlym |
| Mathematics | (a lot) | (a lot) | (a lot) | | | x-math |
| Mongolian | Noto Sans Mongolian | Noto Sans Mongolian | Noto Sans Mongolian | | | |
| Myanmar | Noto Sans Myanmar | Noto Sans Myanmar | Noto Serif Myanmar | | | |
| Oriya | Noto Sans Oriya | Noto Sans Oriya | Noto Sans Oriya | | | x-orya |
| Sinhala | Noto Sans Sinhala | Noto Sans Sinhala | Noto Serif Sinhala | | | x-sinh |
| Tamil | Noto Sans Tamil | Noto Sans Tamil | Noto Serif Tamil | | | x-tamil |
| Telugu | Noto Sans Telugu | Noto Sans Telugu | Noto Serif Telugu | | | x-telu |
| Thaana | Noto Sans Thaana | Noto Sans Thaana | Noto Sans Thaana | | | |
| Thai | Noto Sans Thai | Noto Sans Thai | Noto Serif Thai | | | th |
| Tibetan | Noto Serif Tibetan | Noto Serif Tibetan | Noto Serif Tibetan | | | x-tibt |
| Western | Cousine | Arimo | Tinos | | | x-western |
| Yi | Noto Sans Yi | Noto Sans Yi | Noto Sans Yi | | | |
| Others (catch all group) | | | | | | x-unicode |
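Rows that have an entry in gfxFontPrefLangList.h map onto font.name-list.* preferences, so the pref lines can be generated rather than typed by hand. A minimal sketch, using two rows from the Linux results; the data layout and the `pref_lines` helper are assumptions for illustration, and the output is not the actual preference file.

```python
# Language group -> font per generic family (two rows from the Linux results).
FONTS = {
    "x-beng": {
        "monospace": "Noto Sans Bengali",
        "sans-serif": "Noto Sans Bengali",
        "serif": "Noto Serif Bengali",
    },
    "x-geor": {
        "monospace": "Noto Sans Georgian",
        "sans-serif": "Noto Sans Georgian",
        "serif": "Noto Serif Georgian",
    },
}

GENERICS = ("serif", "sans-serif", "monospace")


def pref_lines(fonts: dict) -> list[str]:
    """Render one font.name-list.<generic>.<lang group> line per entry."""
    lines = []
    for lang_group in sorted(fonts):
        for generic in GENERICS:
            name = fonts[lang_group][generic]
            lines.append(f'pref("font.name-list.{generic}.{lang_group}", "{name}");')
    return lines


for line in pref_lines(FONTS):
    print(line)
```

Scripts without a language group of their own (Balinese, Cherokee, Yi, and so on) would go into the x-unicode lists instead.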
The next phase is copying Mozilla's defaults for Windows, removing the fonts we don't allow, and then adding our Noto fonts.
we do not ship Mongolian fonts, but we should (also Windows 10 has only mixed [10/20] support)
apart from Mongolian, some scripts are still unsupported on Windows 7:
kangxi radicals
modi
nko (mixed support on Windows 10, unsupported on Linux): there is a Noto Sans font for it!
we also have some other differences, from Windows 10:
Arabic has mixed support on Windows 7 (5/20), so we could add its Noto font
cjk radicals supplement has mixed support, but is possibly supported on Windows 10
previous Windows 10 test was lacking on combining diacritical marks supplement [9/20], Windows 7 supports them, thanks to Noto Sans Regular (I have enabled it)
Windows 7 and 8 (build 9200) are very similar. The main differences are the Mongolian and N'Ko partial support (read: full of tofus anyway) on Windows 8.
Another difference is that Arabic has full support on Windows 8 (Courier New even contains Arabic, so it's a real monospaced result!).
Then, I've discovered that the developer console also lists Arial, Courier and Times New Roman because they're used for spaces between symbols.
Cambria Math has the best-looking set R of those 3...
But from what I recall LaTeX documents have a different output for it.
So, I wonder whether the screenshots are very accurate, but it could be that we'd actually need many more fonts to handle Maths correctly.
While debugging on Mac I've found that Apple/Mozilla have also taken the Noto Sans approach (see gfxPlatformMac::GetCommonFallbackFonts in gfx/thebes/gfxPlatformMac.cpp: it also contains a list of Noto fonts Firefox expects to find on macOS; we could at least match it on all platforms).
I said that this issue was for Windows... but I also spoke about Linux. So, now that I've fixed the problems on Mac (at least on my local build), I can also see what the Mac situation is.
Uhm.
Yes, we could keep this issue for finishing the allowed list on Windows.
I have posted the macOS results mainly to have more information about current differences between platforms.
The allowed lists for Windows and Linux I have in my bug_30589 branch bring all the Windows versions and Linux to a similar level of script support.
We have to decide whether we want to extend it.
The easy solution is including all the Noto Sans fonts (in their regular weight). IIRC, they would be about +4MB compressed for each installer, but Richard was reminding me that 4MB * 5 platforms * 36 languages = +720MB for each release.
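The size estimate above can be written out as a quick calculation. This is only a sketch using the figures quoted in this comment; the variable names are made up for illustration, and the multi-language line assumes one installer per platform, which is not confirmed here.

```javascript
// Back-of-the-envelope estimate using the figures quoted above.
const extraFontsMB = 4; // approximate compressed size of the extra Noto Sans fonts
const platforms = 5;
const languages = 36; // one installer per language and per platform

// Extra size per release with per-language installers.
const extraPerReleaseMB = extraFontsMB * platforms * languages;

// Assumption: with multi-language installers there is one installer per platform.
const multiLangInstallerMB = extraFontsMB * platforms;

console.log(`per-language installers: +${extraPerReleaseMB} MB per release`);
console.log(`multi-language installers: +${multiLangInstallerMB} MB per release`);
```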
So, a better solution is pruning the list in some clever way (and possibly enabling all of them when we switch to multi-language installers).
This pruned list is what is ultimately missing here.
An idea is to include all the languages Tor Browser is translated to, plus all the languages of Tor sites (even though they use web fonts), plus any script requested in issues (e.g., Khmer and dingbats) or in the tor-l10n mailing list (currently none, but I like the idea).
Another idea could be to include all currently used languages, and ignore Unicode blocks for historical scripts (e.g., cuneiform).
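The two ideas above could be combined roughly as follows. This is a minimal sketch: every script name in the sets is a placeholder picked for illustration, not the real list of translations, Tor sites or requests, and `prunedScripts` is a hypothetical helper.

```javascript
// Placeholder inputs: these are NOT the real lists, only examples.
const scriptsOfTranslations = new Set(["Latin", "Cyrillic", "Arabic", "Han"]);
const scriptsOfTorSites = new Set(["Latin", "Cyrillic"]);
const scriptsRequested = new Set(["Khmer", "Dingbats", "Cuneiform"]);
const historicalScripts = new Set(["Cuneiform"]);

// Union of all the sources, minus the scripts we decide to ignore.
function prunedScripts(sources, excluded) {
  const wanted = new Set();
  for (const source of sources) {
    for (const script of source) {
      if (!excluded.has(script)) wanted.add(script);
    }
  }
  return [...wanted].sort();
}

const wantedScripts = prunedScripts(
  [scriptsOfTranslations, scriptsOfTorSites, scriptsRequested],
  historicalScripts
);
console.log(wantedScripts.join(", "));
```

The resulting list would then decide which Noto fonts to bundle and to add to the allowed lists.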
To sum up, I'd slightly correct your list in this way:
1. mac monospace: done
2. mac bundled fonts: almost done, I'll send the patch to Mozilla, before merging it to our repos
3. find the list of scripts we want to support: we're here, 2 isn't blocking it
4. update the fonts project of Tor Browser Build: we've started it, but 3 is blocking it
5. update allowed lists: blocked by 3 and 4
6. tests
Twemoji Mozilla isn't rendered on macOS.
For #20842: I think it touches more points, like seeing which fonts have a better rendering for CJK (for which I cannot help), so I would not close it.
Anyway, this week we have the hackweek at Tor, so any development will likely be delayed to next week.
I'll try to send the macOS patch to Mozilla for review this week anyway.
The unsupported fonts are either some extensions, or historical fonts, so they're expected not to be supported (I had to cut size down somehow).
Script view basically has almost all fonts visible now.
And the serif/sans-serif difference is visible.
However, I have sent yet another change that keeps Noto fonts before system fonts, to avoid having a mix in case system fonts support only a few glyphs (except for Arabic, for which system fonts have good coverage from what I understood).
Linux has very similar results! But the UI changed font as well
Woo! The serif is a nice touch. Am a bit iffy about the size increases, but that's all up to you guys. In a perfect world we would ship all the fonts same on all platforms and lock them to those.
side-note: I'm surprised at a few of those partials, 1 kannada, 1 hebrew, 1 cyrillic, 3 currency .. is this a mapping issue or because google needs more money to update their fonts ?
But the UI changed font as well
Yikes. Only for en? Experimenting years ago with whitelist, when you whitelist nothing except say one fake font, the UI changed in windows, but font detection would pick up a heap of fonts still, because in effect you can't stop them - e.g. helvetica, tahoma etc. I forget which one was the UI.
// from my old font list
let fntAlways = {
  windows: ["Courier","Helvetica","MS Sans Serif","MS Serif","Roman","Small Fonts","Times","宋体","微软雅黑","新細明體","細明體","굴림","굴림체","바탕","MS ゴシック","MS 明朝","MS Pゴシック","MS P明朝",],
}
I would think the same applies in Linux, e.g. the font used for widgets - on Ubuntu it is usually Ubuntu. Still, UI should be exempt from whitelisting, but that probably requires some engineering. IDK.
The purpose of Script view is two-fold. First, to detect support. But secondly, I will use it to reduce the chars per script to reduce measurement entropy (but not unicode version entropy), for subsequent tests (basically david fifield's glyph PoC on steroids) and this reduction also helps perf.
Anyway, that leads me to something you said earlier, about my test not showing some scripts. And I could remove some, like I did once before e.g. medefaidrin. It's absolutely no problem for me to add more scripts, so speak up, or no more food for you!
Woo! The serif is a nice touch. Am a bit iffy about the size increases, but that's all up to you guys. In a perfect world we would ship all the fonts same on all platforms and lock them to those.
My idea is for this to be a temporary thing, and to include everything when we switch to multi-lingual installers (as I wrote on #17400 (closed)).
side-note: I'm surprised at a few of those partials, 1 kannada, 1 hebrew, 1 cyrillic, 3 currency .. is this a mapping issue or because google needs more money to update their fonts ?
Yes, I was surprised as well... If the script is correct, I'd say time, rather than money. (Well, they're the same, in a certain sense).
But I haven't really looked at your source code.
I would think the same applies in Linux, e.g. the font used for widgets - on Ubuntu it is usually Ubuntu. Still, UI should be exempt from whitelisting, but that probably requires some engineering. IDK.
Well, we force TBB to use our fonts directory only, in which we don't include Ubuntu.
When I had a look at the monospace issue, I've noticed that at a certain point Firefox chose the first font it believed to be okay enough.
Maybe messing with the preferences some more would work, at least as a workaround, but I haven't tried yet.
That's an issue I didn't want to deal with for sure.
I've called it "All Noto Fonts", but the first column contains the enum values used by Firefox.
It's absolutely no problem for me to add more scripts, so speak up, or no more food for you!
I really don't have a preference on scripts to test.
My suggestion would be to automatically extract the data you use for the tests from the official Unicode data (last time I played with it, it was a series of .txt files, but maybe you can also find it already preprocessed to JSON, or XML, or whatever).
In that way you could test everything with a low effort (if writing some script to apply my idea is low effort for you).
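As a rough illustration of that suggestion, the sketch below parses lines in the format of Scripts.txt from the Unicode Character Database (`range ; Script # comment`). The embedded sample is just a few illustrative lines rather than the real file, and `parseScripts` and `sampleCodePoints` are hypothetical helper names, not part of any existing test.

```javascript
// A few illustrative lines in the Scripts.txt format (not the full file).
const sample = `
# Comment lines start with '#'
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR
0B85..0B8A    ; Tamil # Lo   [6] TAMIL LETTER A..TAMIL LETTER UU
07C0..07C9    ; Nko # Nd  [10] NKO DIGIT ZERO..NKO DIGIT NINE
`;

// Returns a Map: script name -> array of [firstCodePoint, lastCodePoint].
function parseScripts(text) {
  const ranges = new Map();
  for (const rawLine of text.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // drop trailing comments
    if (line === "") continue;
    const [rangePart, script] = line.split(";").map((s) => s.trim());
    const [first, last = first] = rangePart.split(".."); // single code point or range
    if (!ranges.has(script)) ranges.set(script, []);
    ranges.get(script).push([parseInt(first, 16), parseInt(last, 16)]);
  }
  return ranges;
}

// Takes at most `limit` characters from the ranges of one script.
function sampleCodePoints(ranges, limit) {
  const out = [];
  for (const [first, last] of ranges) {
    for (let cp = first; cp <= last && out.length < limit; cp++) {
      out.push(String.fromCodePoint(cp));
    }
  }
  return out;
}

const scripts = parseScripts(sample);
console.log([...scripts.keys()]);
console.log(sampleCodePoints(scripts.get("Latin"), 3).join(""));
```

Capping the number of characters per script with `limit` is one possible way to keep the cost of testing everything under control.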
On windows it's MS Shell Dlg \\32 and that's not whitelisted or in the windows font directory (but it does map to Tahoma which is in both)
In that way you could test everything with a low effort
Ugh. Perf. But yes, I could use that to get more scripts ideas, I just thought you might like some added, although I think I've got the ones people use
But I haven't really looked at your source code
script just outputs each code point at a whoppingly huge size in an offscreen div+span (prerendered for async fallback) and then compares the size of each to known tofu (known unassigned, but I should update that to those marked never use) (calculated each run due to zoom). There are about 5 false positives in windows 7 chromium, none in windows 7 FF. I even use ZWNJ (chromium requires it). You can easily check by running the ALL MAX test and just scan the colored items. I can't help size collisions
basically the same as david's glyph PoC but on steroids
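The comparison against known tofu described above could look roughly like this. It is a sketch and not the actual test code: `detectSupport` is a hypothetical name, and the `measure` callback is injected so the logic can run outside a browser (in a page it could wrap `getBoundingClientRect()` on an offscreen span). The fake measurer and the choice of U+0378 as an unassigned code point are assumptions for the example.

```javascript
// measure(text) must return { width, height } for the rendered text.
function detectSupport(codePoints, tofuCodePoints, measure) {
  const key = ({ width, height }) => `${width}x${height}`;
  // Sizes of known tofu, computed on every run because they depend on zoom.
  const tofuSizes = new Set(
    tofuCodePoints.map((cp) => key(measure(String.fromCodePoint(cp))))
  );
  // A code point counts as supported when its size differs from every tofu size.
  // A real glyph with the same size as a tofu is a collision and gets missed.
  return codePoints.filter(
    (cp) => !tofuSizes.has(key(measure(String.fromCodePoint(cp))))
  );
}

// Fake measurer standing in for the DOM: pretends only U+0041..U+005A have glyphs.
function fakeMeasure(text) {
  const cp = text.codePointAt(0);
  return cp >= 0x41 && cp <= 0x5a
    ? { width: 600 + cp, height: 900 }
    : { width: 500, height: 900 };
}

const supported = detectSupport([0x41, 0x0b85, 0x5a], [0x0378], fakeMeasure);
console.log(supported.map((cp) => "U+" + cp.toString(16).toUpperCase().padStart(4, "0")));
```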
Yup, Cantarell was one I knew about. You'll find it differs across a lot of distros. I used to list them by distro and claim you were openSuse or whatever, but in the end it was easier to get mac/win/droid else must be linux: that and I can't test all distros, and "guessing" to show off isn't required for entropy
Cantarell is GNOME's font, and many distros default to GNOME. One notable exception is Ubuntu, which customizes GNOME and replaces the font with... Ubuntu.