Thursday, May 18, 2023

Release 3.3.4 - Revenge of the Character Encoding

Previously, in The Case of the Distorted Symbols I talked a bit about some improvements being made to better handle character encoding detection - this is the follow-up. If you're a non-technical reader, just know that some sites should hopefully soon work better at displaying accented characters etc. as they ought to be rather than as symbols. If, however, you're a technical reader, read on for some interesting notes about handling character encoding on the web.

I've had at least one dedicated international user helping report bugs. To that end, I'd like to thank Drago for the helpful feedback in reviews. Recently, Drago reported that a specific site wasn't working, so it gave me an opportunity to debug further and nail down the specific problems.

In the first round, I took a naive approach to detecting character encoding and was able to pass most of the test suites found here: https://www.w3.org/2006/11/mwbp-tests/index.xhtml 

However, I had some interesting problems:

  • My original approach would read in bytes and output them through the TextEncoder as utf-8. This is problematic because the input bytes could actually have been in iso-8859-1.
  • True character set detection is quite difficult: the headers are not enough to definitively determine the character set, so you have to actually sniff the response contents.
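
To make the first problem concrete, here is a small TypeScript sketch. It is not the addon's code; `decodeLatin1` and `looksLikeUtf8` are hypothetical helpers written just for this illustration, and the UTF-8 check is deliberately minimal (it only checks lead and continuation bytes). It shows that the same word has different bytes in the two encodings, so guessing wrong produces distorted symbols.

```typescript
// ISO-8859-1 maps every byte directly to the code point with the same value.
function decodeLatin1(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// Minimal structural UTF-8 check (assumption: good enough for a demo; it does
// not reject every invalid sequence, e.g. overlong forms or surrogates).
function looksLikeUtf8(bytes: Uint8Array): boolean {
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    let extra: number;
    if (b < 0x80) extra = 0; // plain ASCII
    else if (b >= 0xc2 && b <= 0xdf) extra = 1; // 2-byte sequence
    else if (b >= 0xe0 && b <= 0xef) extra = 2; // 3-byte sequence
    else if (b >= 0xf0 && b <= 0xf4) extra = 3; // 4-byte sequence
    else return false; // stray continuation byte or invalid lead byte
    for (let k = 1; k <= extra; k++) {
      if (i + k >= bytes.length) return false; // sequence is cut off
      if ((bytes[i + k] & 0xc0) !== 0x80) return false; // not a continuation byte
    }
    i += extra + 1;
  }
  return true;
}

// "café" encoded two different ways.
const cafeLatin1 = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);
const cafeUtf8 = new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]);

console.log(decodeLatin1(cafeLatin1)); // café
console.log(looksLikeUtf8(cafeLatin1)); // false: 0xE9 alone is not valid UTF-8
console.log(decodeLatin1(cafeUtf8)); // cafÃ©  (UTF-8 bytes misread as iso-8859-1)
```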
     

By default, the new implementation starts in iso-8859-1 and then "upgrades" to utf-8 if any of a variety of conditions are encountered:

  1. Headers: Content-Type has a charset
  2. Content sniffing: starts with BOM
  3. Content sniffing: XML encoding indicates utf-8
  4. Content sniffing: meta http-equiv Content-Type indicates utf-8

Content sniffing currently uses only the first 512 bytes, and the individual upgrade checks have quite narrow search patterns - for example, a different ordering of the http-equiv attribute could cause the encoding to go undetected.
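
The upgrade rules above can be sketched roughly as follows. This is a simplified TypeScript illustration under my own assumptions, not the addon's actual implementation: `detectCharset`, `asciiPrefix`, and the regular expressions are hypothetical, and I'm assuming the header check only upgrades when the charset names utf-8.

```typescript
type Charset = "iso-8859-1" | "utf-8";

// Only the first 512 bytes of the body are sniffed.
const SNIFF_LIMIT = 512;

// Read the start of the body as ASCII-ish text so patterns can be searched.
function asciiPrefix(body: Uint8Array, limit: number): string {
  const n = Math.min(body.length, limit);
  let s = "";
  for (let i = 0; i < n; i++) {
    s += String.fromCharCode(body[i]);
  }
  return s;
}

function detectCharset(contentType: string | null, body: Uint8Array): Charset {
  // 1. Headers: Content-Type carries a utf-8 charset (assumed interpretation).
  if (contentType !== null && /charset\s*=\s*"?utf-?8/i.test(contentType)) {
    return "utf-8";
  }
  // 2. Content sniffing: body starts with the UTF-8 byte order mark EF BB BF.
  if (body.length >= 3 && body[0] === 0xef && body[1] === 0xbb && body[2] === 0xbf) {
    return "utf-8";
  }
  const head = asciiPrefix(body, SNIFF_LIMIT);
  // 3. Content sniffing: XML declaration names utf-8.
  if (/<\?xml[^>]*encoding\s*=\s*["']utf-?8["']/i.test(head)) {
    return "utf-8";
  }
  // 4. Content sniffing: <meta http-equiv="Content-Type" ... charset=utf-8>.
  //    Deliberately narrow: http-equiv must come before the charset.
  if (/<meta[^>]*http-equiv\s*=\s*["']?content-type["']?[^>]*charset\s*=\s*utf-?8/i.test(head)) {
    return "utf-8";
  }
  // Default: stay in iso-8859-1.
  return "iso-8859-1";
}
```

Note how rule 4 in this sketch only matches when http-equiv appears before the charset, which is exactly the kind of narrow pattern that can miss a page that orders its attributes differently.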

All of the tests in the W3C test suite now pass!

With the improved approach, I'm hopeful that >90% of international pages will now be handled correctly, but we'll see what folks like you encounter. If you run into any bugs, let me know via the feedback link in the addon or on GitHub: https://github.com/wingman-jr-addon/wingman_jr

Thursday, May 4, 2023

The "Hidden Tabs Keeps Appearing" Problem

Summary: Hopefully this is fixed in 3.3.3, but see Note 2 at bottom if you still have problems!

I've been hearing scattered reports for quite some time about how the hidden tabs prompt keeps appearing; enough that I added a dedicated category for it to the exit survey so I could see if it was a major issue or not.

And the answer was a resounding "yes". I also started seeing the occasional review that discussed it, too.

Unfortunately, as is often the case, I could not reproduce the issue, and I did not have a clear way to find out what was causing it. Fortunately I was able to respond to one of the reviewers and they worked with me to capture logs from their system. So if this fix works for you, please say thanks to Umbrella123 over on the GitHub issue! (See https://github.com/wingman-jr-addon/wingman_jr/issues/185)

So what was the problem and why did the "Hidden Tabs" prompt keep appearing?

First, it's important to keep in mind that originally the addon did not need these hidden tabs. But unfortunately some slight changes in how Firefox and the machine learning framework Tensorflow.js interacted caused the old approach to become unbearably slow on one Firefox release a couple of years ago. This forced a massive rewrite. As part of the rewrite, I needed to basically separate the addon into two parts: the original core part, and a satellite "web page" that can access the graphics card more effectively. Having these separate parts has two important implications: 1) the "Hidden Tabs" prompt shows up to hide the satellite "web page" and 2) there is increased complexity to get the two parts to talk to each other.

Second, the increased complexity of the system drove me to rely on a "watchdog" - the addon will send itself a known test image every few seconds and check that the result is as expected and consistent. This makes sure data is flowing through the system OK. This is quite important because generally people expect the addon to work even after scanning tens of thousands of images, just like the browser does. I had also seen cases in the past where Tensorflow.js eventually became unstable, so the watchdog helps guard against that too. If the "watchdog" self-test fails often enough to cross a certain threshold, the addon will assume that something has gone wrong with itself and restart.

You can probably see where this is going now: when you see the "Hidden Tabs" prompt, it is the addon restarting itself because it thinks it is malfunctioning.

For a while I assumed this was probably due to some bad interaction with another addon that also did filtering. But it turns out that the true cause of the problem is that Tensorflow.js on certain Linux systems seems to give inconsistent results from the first prediction to later predictions for the same test image - the values can be close, but still different. So the self-checker would bail after a few self-checks and the "Hidden Tabs" prompt would appear.

I would argue that this seems to be a bug in Tensorflow.js on Linux, but I decided to make the check more robust by doing an "approximately equal" check instead and now it is working. Hooray! Special thanks to Umbrella123 for helping find the issue, Opensourcerer for confirming the fix, and all the users reporting they were having this problem. Please let me know on GitHub or in reviews if you are still seeing issues, but hopefully it's just gone now.
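
For the technically curious, an "approximately equal" self-check could look something like the TypeScript sketch below. This is an illustration only, not the code that shipped: `approxEqual`, `makeWatchdog`, the tolerance, and the failure threshold are all made-up names and values, and resetting the failure count after a passing check is my assumption for the example.

```typescript
// Compare two score vectors element by element, allowing a small difference.
function approxEqual(expected: number[], actual: number[], tolerance: number): boolean {
  if (expected.length !== actual.length) return false;
  for (let i = 0; i < expected.length; i++) {
    // Written so that NaN also counts as a mismatch.
    if (!(Math.abs(expected[i] - actual[i]) <= tolerance)) return false;
  }
  return true;
}

// Returns a check function; it returns true once the caller should restart.
function makeWatchdog(
  baseline: number[], // scores from the first prediction on the test image
  tolerance: number, // hypothetical allowed drift per score
  maxFailures: number // hypothetical number of failed checks before restarting
): (scores: number[]) => boolean {
  let failures = 0;
  return (scores: number[]): boolean => {
    if (approxEqual(baseline, scores, tolerance)) {
      failures = 0; // assumption: a passing check clears earlier failures
      return false;
    }
    failures += 1;
    return failures >= maxFailures;
  };
}

// Usage: tiny drift is tolerated, repeated large drift asks for a restart.
const shouldRestart = makeWatchdog([0.1, 0.9], 0.001, 2);
console.log(shouldRestart([0.1004, 0.8996])); // false
console.log(shouldRestart([0.2, 0.8])); // false (first failure)
console.log(shouldRestart([0.2, 0.8])); // true (second failure in a row)
```

With an exact equality check, the first call in the usage example would already count as a failure, which is how slightly different results for the same test image could make the addon keep restarting.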

Note! Umbrella123 also pointed out that they needed to change two settings to get things working - in about:config, set webgl.out-of-process to true and webgl.force-enabled to true. Please try these if you are having issues.

Note 2! Codedotexe reported that they needed to increase the tolerance for their computer - I'll plan to widen it for the next release as per https://github.com/wingman-jr-addon/wingman_jr/issues/191