Is there some reason we need, for example, Malgun Gothic Semilight when we already have Malgun Gothic, etc.? Can we replace JhengHei and YaHei with bundled Notos? Do we really need anything listed that is only available in Windows 8+?
No, and they aren't part of the Windows font.system.whitelist anyway.
Hmmm, I was going off my code, which I recently updated directly from the whitelist ... maybe I fubared it - I will recheck
This is a good question.
I was only asking because of the win8+ fonts, but if none of them are whitelisted (or leaking - I still need to test that; I was just going off the font lists per Windows version), then it doesn't matter
Long term, as long as each platform doesn't have diffs between users, we're good. Windows still throws diffs due to optional fonts such as Arial Narrow (it doesn't come with Windows, but it comes with at least MS Office, and whitelisting Arial allows this font as well), but we're pretty good, and Windows is a large bucket
Oh snap, I must have needed some coffee, you are right, all those win8+ fonts are not whitelisted (and will not leak, but I'll test on my win10 shortly)
EDIT: "and will not leak, but I'll test on my win10 shortly" - I think they leak
If it can be detected, it means that font whitelisting isn't enough, and we need a way to stop additional variants. Same for all the 8.1 additions.
tested on windows 7
// variants: not whitelisted but are allowed due to e.g. "Arial"
"Arial Black", "Arial Narrow", "Segoe UI Light", "Segoe UI Semibold",
Segoe UI Light and Semibold are not an issue; they're in all Windows since 7. I've yet to check on windows 10, but based on the above I highly suspect all the following variants are also allowed in web content despite not being whitelisted
All you have to do is load windows 10 up, make sure you have those fonts, load https://arkenfox.github.io/TZP/tests/fontcheck.html, select the windows preset, run the sucker, and see if they were found or not
We should confirm this @pierov - reopening so we don't lose track (we can rename the title). It may be that eventually, when we get to a single install for all languages, we bundle and limit all fonts on windows, same as on Linux
Microsoft JhengHei/YaHei UI (8) are not whitelisted, so Microsoft JhengHei/YaHei UI Light (8.1) don't leak - phew, because that (with a UI) is a different font family to that without a UI. You never know, fonts are weird
so the only ones not leaking below are those cases I just mentioned - you do have them in your system fonts, right? just checkin'!
// variant leaks
// 7
"Arial Black", "Arial Narrow", "Segoe UI Light", "Segoe UI Semibold", // but everyone should have them
// 8
"Segoe UI Semilight",
// 8.1
"Microsoft JhengHei Light", "Microsoft YaHei Light", "Segoe UI Black",
// 10
"Malgun Gothic Semilight",

// ignore these: the win8 fonts are not whitelisted so the 8.1 fonts do not leak
// 8
"Microsoft JhengHei UI", "Microsoft YaHei UI",
// 8.1
"Microsoft JhengHei UI Light", "Microsoft YaHei UI Light",
I don't think it's worthwhile testing 8/8.1 to see if they leak.
Also, I don't have Office
no need to test for that, we already know some Arials leak - if that was all you wanted to know. It also doesn't necessarily leak on win10 (not sure if it's the Narrow or the Black)
This issue on windows should go away with ESR128, which is win10+ only. But users can still add/remove system fonts. Ideally we should bundle all fonts like Linux, but I get it - CJK would add, IDK, 4 or 5 MB compressed?
^ not a font variant issue - feel free to address it in a new ticket. I suspect that if you upgrade from 7/8/8.1 to win10+, you would keep those fonts, so it's not just a win version equivalency
ugghh: sorry for the noise, let me re-post this in the right thread
My understanding is that only Songti SC Black is used (and only when the font-weight is changed).
No, light is also used, but it doesn't seem very light.
Also, I thought that black was enough to make my point.
Songti has light, regular, bold and black. PingFang has thin, ultralight, light, regular, medium and semibold.
That's why I was suggesting to check all weights: 100, 200, 300, 400, 500, 600, 700, 800, 900 and 950.
I'd say you can test only the fonts you know a user already has, or you could test a set of fonts that you know might reveal something with this test.
The rationale is that you can maybe tell an OS version (especially on Windows, see above), or detect programs that add variants (Office?), or spot users who downloaded additional variants that weren't included with the OS ("Others can be downloaded using Font Book [...] Fonts that can be downloaded appear dimmed in Font Book.").
Checking fonts is perf heavy. Every time you query measurements, it eats time - there is no way around this.
What is eating time, exactly?
The text rendering? The measurements? Changing the DOM with adding and removing nodes?
starts here - very boring, long story, but look for my three or so posts with bold titles about testing, starting with the linked one
So TZP does 11 measurement tests; it used to be 7. Testing one measurement vs testing 11 is approx the same (and yes, I commented out the obsolete methods in getDimensions)
Looping generic style then font, vs font then generic, was also tested (I forget which is faster, if either)
Also tested non-detected fonts vs detected. Negligible diffs.
I also tested just setting fonts and not measuring anything - it's like no time at all.
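For what it's worth, here's a minimal sketch of how one could verify that (assuming a `fntList` array like TZP's; the harness itself is hypothetical). Reading `offsetWidth` forces a synchronous layout, which is the expensive part - merely setting styles is deferred and near free:

```js
// hypothetical timing harness: style-only vs style+measure
const span = document.createElement("span");
span.textContent = "mmmMMMlliI";
document.body.appendChild(span);

const t0 = performance.now();
for (const font of fntList) {
  span.style.fontFamily = `"${font}", monospace`; // style only: near zero cost
}
const t1 = performance.now();
for (const font of fntList) {
  span.style.fontFamily = `"${font}", monospace`;
  void span.offsetWidth; // forces layout: this is what eats the time
}
const t2 = performance.now();
console.log(`set only: ${(t1 - t0).toFixed(1)} ms, set+measure: ${(t2 - t1).toFixed(1)} ms`);
```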
I'd say you can test only the fonts you know a user already has
Indeed
To be clear, the tests are not to detect every single font, but to maximize entropy. And they are focused on using whitelists and font.vis (RFP's level 1). I can 100% reliably detect TB and RFP and OS with clientside JS - and I change the font list tested accordingly. So for example, I do not mind or even care if I don't detect Tahoma on windows, because that is the font used in MS Shell Dlg \32 and is an expected system font, so its absence in the font test results adds entropy (the mapped font may differ on various windows), and we already collect all that entropy (sizes) in [base sizes] fonts (click the details - TZP main test).

I also don't care if I skip some fonts entirely because of size collisions: e.g. on Firefox RFP windows 8+, if you have Nirmala UI you would also have Gadugi, so I can drop one of them. It's not worth testing for rare outliers (and ultimately the answer is mostly to bundle all fonts and ignore the OS completely)
That's why I was suggesting to check all weights: 100, 200, 300, 400, 500, 600, 700, 800, 900 and 950.
How do you propose to test that, ignoring the fact that eight more weights is 8x the perf (or that we could be selective: e.g. a second run of non-detected fonts)? I can't logic this, sorry. If I compare any font at a different weight (say, not normal) to the base measurement (which is normal), it will (depending on the amount) have a different size = false positives. And if I match the font weight in the base, then it won't be different. Not that I have tested any of this.
What does detecting expected fonts that fail to show up at normal gain? Almost everyone would be the same (I think we have enough other fonts for size differences, e.g. rendering, subpixels).
So you can't bypass my const style? I double dare you
Still, your script isn't finding it, but I tried manually setting font-weight: 800, and the inspector shows Songti SC Black. So it's probably fingerprintable.
OK, I took that to mean my test found it because you "corrupted" the test by changing Songti SC globally or something
You shouldn't test them all, but you could consider also checking different font-weights when it makes sense, only for the respective values.
E.g., Segoe UI Semilight or Segoe UI Semibold for Windows 8.1.
But I have not tested how Windows responds to them when you set them as font-family, rather than font-weight.
From what I've understood, macOS would ignore them when set as families, but leak them with the weight (please ignore the fact that they are Windows fonts - I don't know macOS enough to provide a similar example with macOS fonts). On macOS you can test that with Songti: 10.12 doesn't include Songti Black, so you can fingerprint an OS minor version this way. Or, if you can already fingerprint the minor version, you can fingerprint a user that added that font, which is even worse: you care about entropy, but an attacker might care about information and outliers. (Songti TC still doesn't have Black, only Songti SC has it, so no, you cannot use it.)
Right. We could do a second run on certain fonts, a minimal list of possible variants. But I still don't get how you don't get false positives
Or, if you can already fingerprint the minor version, you can fingerprint a user that added that font, which is even worse: you care about entropy, but an attacker might care about information and outliers.
I know. But that can be a secondary PoC page, because we already know the answers (bundle the same fonts + font versions for all users per OS and only use those, period - including widgets and system-ui fonts etc) and we are just improving the defenses.
And indeed, I was going to say that detecting a change in fonts in minor OS versions is not guaranteed, but likely, to indicate the minor OS version - because a user can still add (or delete) it (which, as you say, is worse)
Side note: ultimately I want this test to be hosted at Tor Project, and fingerprints collected (no PII, no IPs etc) into a backend database (denormalized), and we only accept certain results: e.g. any extension nonsense (excluding NoScript's signature - see proxylies) and we don't bother. We only collect gecko data, and the FP records whether it is FF or TB, the version, the OS, if RFP is enabled, etc. And we just store one FP per hash - maybe a counter, but I fail to see the point of that. That way we can drill down into TB v102 audioContext keys, or FF v102+ RFP fonts, and see how many buckets there are. The test needs to be performant. There's a lot more yet for me to add.
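As a rough sketch of the "one FP per hash" idea (everything here is hypothetical - no backend exists yet, and the real thing would be a proper database, not a Map):

```js
// hypothetical in-memory shape of the dedup store: hash -> { fp, count }
const store = new Map();

function recordFP(hash, fp) {
  const row = store.get(hash);
  if (row) {
    row.count += 1; // same bucket seen again (the "maybe a counter" part)
  } else {
    store.set(hash, { fp, count: 1 }); // one fingerprint per unique hash
  }
}

// drilling down is then just a filter over store values,
// e.g. all entries where fp.browser === "TB" && fp.version === 102
```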
And even longer term, we could get an idea of entropy via a campaign: visit one time per user in TB, passing a one-time test-id
Because until we know what still needs action and is not equivalency (buckets), and how bad it is (entropy), we're really just guessing in many cases
From what I've understood, macOS would ignore them when set as families, but leak them with the weight
Here I use a special case for "not a family", as my baseFontsFull needs to handle it (in this case, on windows normal FF, I also measure -moz-button and -moz-desktop as two distinct sizes from a widget/system test)
I'm not sure where all this is going and TBH I'm not sure it's feasible. I will open an issue at TZP so I don't lose track, and no need for noise here. I think I might buy a mac
works on windows - I tested it before I posted. The TZP test found 130 (edit: on my windows, on Firefox); the fontface test via console found 125. I also tested on about:home with a small test and it found fonts.
load TZP ... after all is done, you can set the fntList (or list it, or push to it) and then run get_fontface() from the console. The default will be what TZP has decided, so on TB mac it is currently just the whitelisted system fonts (haven't added Songti, PingFang yet)
So you could:
console.log(fntList) // to check
fntList.push("PingFang HK Light", "PingFang HK Medium", "PingFang HK Semibold", "PingFang HK Ultralight", "PingFang SC Light", "PingFang SC Medium", "PingFang SC Semibold", "PingFang SC Ultralight", "PingFang TC Light", "PingFang TC Medium", "PingFang TC Semibold", "PingFang TC Ultralight", "Songti SC Black", "Songti SC Light", "Songti TC Light")
get_fontface()
Aha .. so FF leaks (sorry, that was FF without a whitelist) but TB doesn't ... interesting ... I haven't seen any issues about fontface, so whatever TB did, could we uplift it under RFP?
Edit: except we don't even know if fontface "leaks"; we need to add a whitelist in FF and test against that
Ignoring Linux patches due to bundling all fonts ... so how come FontFace doesn't work in TB but does in Firefox? IDK if we should block it (edit: should block it, that is) - it's a common API - but then again, faces man ... so many of them (and between OS releases). But IDK if this leaks anything the current font test doesn't already, since our whitelist is already almost airtight
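For reference, this is roughly the kind of probe in question (an assumption of what get_fontface() boils down to - TZP's actual code may differ): construct a FontFace with a local() source, and load() only resolves if that local face exists.

```js
// hypothetical local-font probe via the CSS Font Loading API (FontFace)
async function probeLocalFont(name) {
  const face = new FontFace("probe", `local("${name}")`);
  try {
    await face.load(); // resolves if the local face exists
    return true;
  } catch (e) {
    return false; // rejects if it's missing (or the browser blocks the lookup)
  }
}

probeLocalFont("Songti SC Black").then(found => console.log("Songti SC Black:", found));
```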
Also nice handwriting
Lulz .. I usually scribble a mess and sometimes can't even read it myself and have to do a Sherlock to figure it out (true fact: Sherlock Holmes does not use deduction, he uses abduction) - that one in the pic was done slow and big and heavy so it stood out and you could read it
I was lacking coffee the other day - ignore all my nonsense about false positives
You can change it via console: it only accepts a number (or a number as a string) - see here. I might add a dropdown for users, but console is fine for now
e.g.
fntWeight = 900
fntWeight = "900"
The first (fontdebug) is handy for obvious reasons: the smaller visual also applies the font-weight, to be consistent and for the font inspector checks. The second (fontlists), when running a full test, also includes fonts we didn't detect or that aren't on our whitelist/kBaseFonts (all those black, light, medium, ultralight, semi-whatevers), and thus we can check what fonts we can detect (by comparing all the results of the 10 weights) as possible extra entropy
On my windows 7 (no protection) I have 155 fonts detected. At 100 I have 155 (same hash). At 900 I have 154. I obviously don't have all possible 250+ windows fonts (or all additional 230+ "faces") - I need to set up my new windows laptop to be certain, but I think windows doesn't detect anything extra with font-weight, TBH
I might see if I can record "full" results per fntWeight and then output hashes/diffs in a console compute_diffs() or something - how do you do a possible 10-way diff? - rhetorical
Anyway, without font protections, and with all fonts available, we can use these to determine what fonts show up outside of normal, and then determine if they pose any possible entropy (based on kBaseFonts and whitelist)
This does actually affect windows - the font changes: e.g. Arial at 900 will use Arial Black
So I think the answer here is that it is only useful to test select fonts at select weights where we can't detect them at weight normal.
This then brings up another conundrum. For example, if you have deleted FontA 900 but have FontA 700, what is the fallback? If 900 falls back to 700 then we would get a false positive (but we still get a measurement and added entropy) - and this becomes not about correctly getting the font name, but about getting entropy from sizes and the font name tested. i.e. it doesn't have to be 100% correct with names. My guess is the names are only used at certain weights - e.g. Black is always 800 or 900 (or whatever). Tedious testing to follow, one day
and this becomes not about correctly getting the font name, but about getting entropy from sizes and the font name tested.
Yes, I agree.
If 900 falls back to 700 then we would get a false positive (but we still get a measurement and added entropy)
What do you mean? I think the fallback should not be a synthetic black, but rather the rendering at 700, which should not be distinguishable from the regular bold, and we can take it as a "weight not installed".
So we have monospace 900 (monospace is not the font being tested) being compared to Font 900, but that was deleted, so it falls back to Font 700 - and the sizes differ, so we call Font 900 detected, collect the size as a bucket, and shove Font 900 into it.
but ... "this becomes not about correctly getting the font name, but about getting entropy from sizes and the font name tested" - so we're good ... also "Yes, it's the specification of OTF" .. so that's not going to happen anyway (if I understand my code methods and the specs correctly)
It helps to type things out sometimes, for my brain to catch up
Edit
(but we still get a measurement and added entropy)
And I think you might be wrong about this (at least you seem to be wrong for Linux, to me): if font-weight 900 falls back to font-weight 700, we should see that sizes don't differ.
^ that's the question. Suppose sizes do differ - this means a font exists and is being used rather than the base fallback - and sizes should differ compared to that base font (e.g. monospace). I say should because some fonts have size collisions, and it entirely depends on what base font we're using (on mac I use -apple-system for a single pass; on linux I use all three: monospace, serif, sans-serif). Using all three increases the chance of no size collision, so the font (or a font-weight substitute) is detected. e.g. Courier is the same size as Courier New: if your default font (monospace) is one of those, you won't detect it. Same with Roman, Times, and Times New Roman. Bit moot, since those examples are expected system fonts. Almost all fonts have a unique size, at least at normal weight (and especially against MS Shell and -apple-system, which is why I use them), except some family members, especially in CJK. Noto fonts are notorious (yeah, I'm calling them out!) for having lots of size collisions. It's almost as if the original is used as a base/template.
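To make the mechanics concrete, here is a minimal sketch of this kind of base-font comparison (the helper names are mine, and TZP's getDimensions does more - e.g. the npixel/ntransform passes). Note the weight is applied to both the base and the candidate, which is what avoids the weight-mismatch false positives discussed earlier:

```js
const BASES = ["monospace", "serif", "sans-serif"];

// render a test string and return its box size as a comparable string
function measure(fontFamily, weight = "normal") {
  const span = document.createElement("span");
  span.style.cssText = "position:absolute;left:-9999px;font-size:256px;";
  span.style.fontWeight = String(weight);
  span.style.fontFamily = fontFamily;
  span.textContent = "mmmMMMlliI0O&W@%";
  document.body.appendChild(span);
  const size = span.offsetWidth + " x " + span.offsetHeight;
  span.remove();
  return size;
}

// detected if the candidate differs from at least one generic base
function isDetected(font, weight = "normal") {
  return BASES.some(base =>
    measure(`"${font}", ${base}`, weight) !== measure(base, weight)
  );
}

console.log(isDetected("Liberation Sans"));      // weight normal
console.log(isDetected("Liberation Sans", 900)); // weight 900
```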
Unless I'm missing something (and I will easily confirm when I eventually get around to actually testing), that's exactly what I said - the font will differ compared to base (base is the fallback, e.g. monospace default, or serif default, etc)
Use fontdebug
type in liberation sans [1]
console type fntWeight = 700
run
does it not detect it against at least one of the base fonts (monospace, serif, sans-serif)?
then
type in liberation sans black (or liberation sans?)
console type fntWeight = 900
run
does it still not detect it against at least one of the base fonts (monospace, serif, sans-serif)?
So this is what I call a "false positive" on the 900 test, because (assuming there is such a thing as Liberation Sans Black) we will record it as detected and collect the size against that font name. It will really be the size of Liberation Sans, but we record it as Liberation Sans Black (because that's the font being tested). That size probably differs from an actual Liberation Sans Black - and there's the entropy, caught. And we also caught Liberation Sans at normal weight in the main test, and the size there matching the size here also tells us that this is a false positive.
[1] we assume liberation sans is not the default font for all three generic font-families
Let's give this a rest for a few days and let me get my bearings with modifying tests and getting around to my new mac and new windows machine. I know what I'm measuring and comparing to, and can already tell that any fallback within the same family will be caught (barring size collisions, which are super rare); otherwise it falls back to the base - and by using three bases, we always catch it. Capisce?
No, non capisco, but I'm sending you data anyway. And coffee.
Everything green.
700
DETAIL: eb86409c
monospace  offset: base 3072 x 307 got 4267 x 307 | npixel: base 3072 x 307 got 4266.95 x 307 | ntransform: base 3072 x 307 got 4266.96 x 307
sans-serif offset: base 4267 x 367 got 4267 x 307 | npixel: base 4267 x 367 got 4266.95 x 307 | ntransform: base 4267 x 367 got 4266.96 x 307
serif      offset: base 5085 x 307 got 4267 x 307 | npixel: base 5085.1 x 307 got 4266.95 x 307 | ntransform: base 5085.1 x 307 got 4266.96 x 307
system-ui  offset: base 4316 x 316 got 4267 x 307 | npixel: base 4316.15 x 315.5 got 4266.95 x 307 | ntransform: base 4316.14 x 315.5 got 4266.96 x 307
Cantarell  offset: base 4316 x 316 got 4267 x 307 | npixel: base 4316.15 x 315.5 got 4266.95 x 307 | ntransform: base 4316.14 x 315.5 got 4266.96 x 307
900
DETAIL: 3bfebbdc
monospace  offset: base 3072 x 307 got 4267 x 307 | npixel: base 3072 x 307 got 4266.95 x 307 | ntransform: base 3072 x 307 got 4266.96 x 307
sans-serif offset: base 4267 x 367 got 4267 x 307 | npixel: base 4267 x 367 got 4266.95 x 307 | ntransform: base 4267 x 367 got 4266.96 x 307
serif      offset: base 5085 x 307 got 4267 x 307 | npixel: base 5085.1 x 307 got 4266.95 x 307 | ntransform: base 5085.1 x 307 got 4266.96 x 307
system-ui  offset: base 4442 x 316 got 4267 x 307 | npixel: base 4442.15 x 315.5 got 4266.95 x 307 | ntransform: base 4442.14 x 315.5 got 4266.96 x 307
Cantarell  offset: base 4442 x 316 got 4267 x 307 | npixel: base 4442.15 x 315.5 got 4266.95 x 307 | ntransform: base 4442.14 x 315.5 got 4266.96 x 307
Nowadays my serif is "Bitstream Vera Serif" (why, Debian?), and my sans-serif is "TeX Gyre Heros" (I think I did this), both rendered as bold, and they haven't changed between 700 and 900.
My monospace is Inconsolata (I love it, but I should make Firefox use a less fingerprintable font... fortunately I also have Tor Browser), and I have its variable version!! So, it should change size.
so what is this, then, from 900? base is 3072 x 307, you returned 4267 x 307
if it's green, there was a size difference. You said everything was green. You do realize I am talking about the base size (base is the five fonts listed: monospace, serif, sans-serif, system-ui, Cantarell)
monospace offset: base 3072 x 307 got 4267 x 307 | npixel: base 3072 x 307 got 4266.95 x 307 | ntransform: base 3072 x 307 got 4266.96 x 307
We're not testing Cantarell - that's the base fallback (whether it has a variant doesn't matter). We only care if the font being tested differs from any of the fallbacks
Sorry, this is Firefox. This isn't going to happen on Tor Browser.
I didn't understand that you were interested in base; I thought you were interested in seeing what happened to Liberation Sans at 900 (that is, it doesn't change from 700)
Sorry, this is Firefox. This isn't going to happen on Tor Browser.
Yes. I need to test Firefox (without any protections, and with all fonts allowed on, say, windows 10/11) and see what can and can't be detected at weight normal. I will probably end up just looping thru fonts in font-weight buckets
{
  100: ["fontA"],
  400: ["fontB", "fontC"],
  900: ["fontK"]
}
loop font weight, loop font, loop fallback (see the sketch below)
no font is listed twice
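A sketch of that loop order, using the buckets above (hypothetical font names, and assuming an isDetected(font, weight) helper like the earlier sketch):

```js
// hypothetical per-weight buckets: each font appears exactly once
const fontsByWeight = {
  100: ["fontA"],
  400: ["fontB", "fontC"],
  900: ["fontK"],
};

const detected = {};
for (const [weight, fonts] of Object.entries(fontsByWeight)) {
  for (const font of fonts) {
    // isDetected() loops the fallbacks (monospace, serif, sans-serif) internally
    if (isDetected(font, weight)) {
      detected[font] = weight;
    }
  }
}
console.log(detected);
```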
For RFP font vis = 1, and TB whitelisting, there will be fewer fonts to add outside of 400 (the whitelist is smaller than font.vis, which is smaller than the full OS list) - but yes, it would now mean we pick up variants if they were added/removed by the user, or if they differ between OS versions. This is what we wanted - a full, proper test to catch those 0.0001% of outliers
So in truth I probably do not need to do any testing, just code it. But it means an increase in fonts tested - how much, I'm not sure - there are a lot of variants on mac, for example.
I thought you were interested in seeing what happened to Liberation Sans at 900 (that is, it doesn't change from 700)
Yes, I am. But only to see if we would get "false positives" (as in the font tested is not there but we detect change). I wanted to know what happens in certain situations
but yes, it would now mean we pick up variants if they were added/removed by the user
to be more precise: it would pick up changes at each font weight - the size will either be the real thing (e.g. FntA Black) or a family fallback (FntA) = the entropy. Or the size may be identical between those two (who knows). The name is immaterial, TBH
FYI: just an update, "variants" [1][2] are not "stable" in the current TZP font test
that test uses font-weight normal (we have to control all the style attributes so no individual font differs from the others)
on windows 7 this seemed stable (lack of additional fonts to test with), e.g. Arial Black/Narrow would be detected without fail if present
a windows 10 VM (free iso from MS) locks adding fonts if not activated, so it was bare bones - again, hard to tell
but on windows 11 at least, this behaves differently
on a cold load, e.g. a new session, some fonts are not detected, and some are detected but at a different size (that of their namesake) - e.g. Corbel Light would use Corbel and record Corbel's measurements.
on a rerun or reload we then get a stable result - e.g. Corbel Light would actually be used this time and the measurement would differ from Corbel's
what was and wasn't detected on a first run (per session?) was fairly random
NOTE: it is interesting that other font tests still use some of these "unstable" fonts
So I cleaned up the windows fonts in TZP, but haven't touched the mac ones, which include e.g. black, narrow, bold, oblique, heavy, light, etc - needs testing, and to be fair, we should be assigning the correct font-weight
One (I guess hacky) solution may be to run the test twice or "preload" these fonts, but that's problematic given the number of fonts (more than listed), perf, and/or timing with "fallback" - and I don't think it works (it takes about 3 ms to run through the function without measuring anything). But more importantly, there are other fonts that never show up at normal but can be detected [3] if the style is correct
So the solution at some stage is to loop font-weight, then font (each font-weight having its own list), and of course the base measurement would do the same. I will expose the global var fntWeight in https://arkenfox.github.io/TZP/tests/fontdebug.html at some stage; otherwise you can set it via console using 100 to 900