Friday, November 27, 2020

Release 1.3.0 - Partial Fix for Firefox 83 Slowness!

This is an emergency release in response to Firefox 83.

TL;DR - Firefox 83 broke things for some users and made browsing unbearably slow. While things get properly fixed, I can make it faster again - though not quite as fast as the plugin was in Firefox 82.

The long version:
This plugin leverages another excellent library, Tensorflow.js, to run the AI models created for this plugin. Tensorflow.js provides several different ways to run the AI models, called backends. They all give the same predictions, but some backends are much faster than others. The fast backend (WebGL) started failing in Firefox 83 for some users, which caused the slow default backend (CPU) to be used instead. For at least two users, this made the browsing experience so slow as to be unusable.
Fortunately, Tensorflow.js recently added support for another relatively fast backend (WASM) that, in my testing, does not fail to load in Firefox 83. I am adding support for that new backend as a fallback. It is not quite as fast as WebGL, but it makes browsing usable once again.

If you are experiencing issues, please disable the plugin and let me know over at GitHub - thanks!

For the technically curious, the Tensorflow.js team has a great writeup on the introduction of fast SIMD in the WASM backend over at their blog.

One final note - this version also fixes an issue that caused downloads to sometimes show up as gibberish rather than prompting for download.

Saturday, November 21, 2020

Firefox 83 Problem!

(Update: This problem has been partially worked around, see the later post on 1.3.0)

(Update 2: I have traced this back to the specific change in Firefox 83 that caused the problem and have filed an issue on Mozilla's bug tracker. Please be aware that given the nature of the commit that caused the issue, it's possible that fixing the problem experienced by Wingman Jr. may cause other things to break - so this may not be as easy to resolve as it seems.)

(Update 3: I have found a technical workaround and have a full fix in progress - here's the post explaining the changes.)

 

Today my browser updated itself to Firefox 83, and it promptly made the addon unusable! The underlying issue is something related to the way the graphics card, Firefox 83, and possibly Tensorflow.js are interacting. Note that this may not affect all users, but if performance suddenly became unusable after Firefox updated itself, this is why.

Workaround: Revert to Firefox 82; otherwise, the performance is poor enough that you may have to disable the plugin until this can be resolved.

Things I thought might help, but did not:

  • Updating graphics driver
  • Reverting to an old version of Tensorflow.js. This also means older versions of the addon are unlikely to work either.

Technical details can be found in the bug I am tracking for this.

Sorry for the inconvenience!

Wednesday, November 4, 2020

Release 1.2.1 - The Case of the Distorted Symbols

International users - this is a bug-fix release for you!
One of you kindly reported seeing special characters such as "ä", "ö", "ü", "ß" and "€" showing up incorrectly as "�". This release should fix most instances of that happening, but please comment at https://github.com/wingman-jr-addon/wingman_jr/issues/70 if you are still seeing problems. Thanks!

For the technically curious (or perhaps those who are having trouble falling asleep at night and need something boring to read), here's what was happening. In order to scan images that have been encoded as Base64 data URIs, I fully scan all documents of Content-Type text/html and do search-and-replace as necessary. However, I receive the document as bytes, so I need to handle the decoding from bytes into text myself. All the examples out there just use UTF-8 for the TextDecoder, but alas, real life is a bit more complex - the source of this issue was incorrectly decoding non-UTF-8 documents as UTF-8. So now I do rudimentary encoding detection based on the "charset" in Content-Type. An interesting followup is that when I turn the text back into bytes, I use TextEncoder, which - at present - only supports UTF-8, so I need to make sure the Content-Type gets set appropriately for that.
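
To make that flow concrete, here is a minimal Python sketch of the same idea - the addon itself is JavaScript and uses TextDecoder/TextEncoder, so the function names and the regex here are illustrative assumptions, not the actual implementation:

import re

def detect_charset(content_type, default='utf-8'):
    # Rudimentary detection: look for "charset=..." in the Content-Type header.
    match = re.search(r'charset\s*=\s*"?([\w.-]+)"?', content_type or '', re.IGNORECASE)
    return match.group(1).lower() if match else default

def rewrite_document(raw_bytes, content_type):
    # Decode using the detected charset rather than assuming UTF-8...
    text = raw_bytes.decode(detect_charset(content_type), errors='replace')
    # ...(scan and replace Base64 data URIs in `text` here)...
    # ...then re-encode as UTF-8 and declare that in the new Content-Type.
    return text.encode('utf-8'), 'text/html; charset=utf-8'

# Example: a Latin-1 document that would show "ä" as garbage if decoded as UTF-8.
body, new_type = rewrite_document('ä ö ü ß'.encode('latin-1'), 'text/html; charset=ISO-8859-1')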

Note that using only Content-Type for character encoding detection is considerably simpler than the mechanism browsers actually use, but it still handles the vast majority of use cases even though it is not quite as accurate. You can see how it fares against a selection of standardized tests by W3C. Character encoding detection is exceedingly sophisticated - if I still haven't bored you with the details, I recommend checking out the spec for those facing truly persistent insomnia.

Saturday, September 26, 2020

Release 1.2.0

It's been a while since the last release. I've got a couple of small things in this latest 1.2.0 release:

  • A helpful user, Stephen, submitted a feature request to add an on/off button to the main menu. It isn't shown by default, but you can now turn it on in the options. I know I'll find this feature valuable as well! It's useful when most of the time you're browsing safe sites but then need to go to e.g. a photo site of some sort to find some content. See the GitHub issue here. Thanks Stephen!
  • As you may have noticed, the image score for blocked images has almost always shown "99" since the release of the zones feature. I finally got back to adding a somewhat better approximation of the image score.
  • The key library for the AI part, Tensorflow.js, has seen an upgrade from version 1.x to 2.x in order to make sure this plugin will continue to be compatible with it.

The AI model was not changed, so no change in filtering performance is expected. However, if you're into machine learning, you may be interested to know that I've now released the model into its own repository, too.

As always, feel free to contact me at the GitHub project site: https://github.com/wingman-jr-addon/wingman_jr

Monday, May 18, 2020

Training from Scratch, pt. 2 - Mechanics

In part 1, I discussed the desire to successfully train MobileNetV2 from scratch, both to act as a baseline for other architectures/variations and to ultimately better capture the specifics of the dataset.

First, it is worthwhile to discuss what it means to "train from scratch" in this context. I am using the phrase as if it were a binary truth, but how well a dataset captures the population - in conjunction with how it trains on a specific network architecture variant - can vary a great deal depending on the network, the image size, the training regimen, and countless other factors. For my specific case, I wanted to achieve similar accuracy (say, within about 1% absolute accuracy) to the original finetune against MobileNetV2, alpha=1.0, image size 224, with the same loss functions. The finetune had achieved in the neighborhood of 73% accuracy for the raw classifier, giving a goal of 72-73% for the new training.

But how can this be achieved?

First, I needed to obtain much better hardware. As noted in The Quest for New Hardware series, I had already obtained an old GPU server with a K80 in it, with roughly an effective 21 GB of GPU RAM. This was a giant step up from the Jetson TX1 dev kit I had previously and allowed for a greatly increased batch size.

Second, I needed to consider the various changes typically made when training from scratch. Surveying several results led me to broadly make the following theoretical and practical changes:
  1. Switch away from Adam to SGD for best generalization, albeit at a likely cost of training speed.
  2. As part of introducing SGD, introduce a schedule to reduce the learning rate (a rough sketch of these first two changes follows this list).
  3. Greatly increase batch size. Increasing batch size has several side effects, including (among others) an increase in regularization, a need to revisit overall learning rate, and of course a nice speed boost.
  4. On the practical side, greatly increasing the batch size also required changing the data pipeline significantly to keep up: this meant switching to a tf.data approach.
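
As promised, here is a sketch of those first two changes on the Keras side - with every hyperparameter being an illustrative guess rather than a value from my actual runs:

import tensorflow as tf

NUM_CLASSES = 4  # placeholder class count, not the actual value
model = tf.keras.applications.MobileNetV2(weights=None, classes=NUM_CLASSES)  # no pretrained weights

def step_decay(epoch, lr):
    # Halve the learning rate every 20 epochs (an assumed schedule).
    return lr * 0.5 if epoch > 0 and epoch % 20 == 0 else lr

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9),
    loss='categorical_crossentropy',
    metrics=['accuracy'])

# train_ds would be the tf.data pipeline discussed below, batched at 192:
# model.fit(train_ds, epochs=100,
#           callbacks=[tf.keras.callbacks.LearningRateScheduler(step_decay)])
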
Interestingly, these changes are all closely inter-related. Unsurprisingly, the culprit for this interdependence is batch size: batch size affects the base learning rate and loading performance. After some trial and error, I settled on a batch size of 192. This worked well not only with the GPU RAM, but also with the system memory (64 GB DDR3), cores (16-core 2.6GHz, hyperthreaded), storage (a WD Red SSD), and of course the variant of MobileNetV2. I'd like to say this was a refined scientific process, but sizing was largely driven by picking sums of powers of 2 and finding what fit without warnings about allocation OOMs.
Data pipelining, however, was a bit more scientific: I watched the overall utilization rate of the GPUs using nvidia-smi. As noted, this did require changing the data loading method to tf.data. Like many others, I had been using the older Keras style of PIL and load_img. tf.data is powerful, but it lacked some of the image augmentations I had (e.g. rotation) while adding other powerful ones (random JPEG compression artifacts). One area where I initially stumbled was understanding the necessary preprocessing. In the past I had more or less used the textbook Keras style:

import numpy as np
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x) # Note preprocess_input comes from the relevant application namespace

While I expected changes in the loading, I thought that the use of preprocess_input would likely be preserved. Not so. The example finetuning MobileNetV2 from Tensorflow now has the following example code (here):

def format_example(image, label):
  image = tf.cast(image, tf.float32)
  image = (image / 127.5) - 1
  image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
  return image, label

It would appear that this first maps onto [0, 255.0], then onto [0, 2.0], then onto [-1, 1]. It seems a bit strange that the specialized preprocess_input would be replaced with this more hand-rolled set of code.
Even more interesting, there is also tf.image.convert_image_dtype available for use as well. My original code had this:

def map_to_image(hash, rating_onehot):
    img_bytes = tf.io.read_file(IMG_ROOT + hash + '.pic')
    img_u8 = tf.io.decode_image(img_bytes, channels=3, expand_animations=False)
    img_float = tf.image.convert_image_dtype(img_u8, tf.float32) # [0,255] -> [0,1]
    img_resized = tf.image.resize_with_pad(img_float, SIZE, SIZE)
    return img_resized, rating_onehot

So, as noted, this maps onto [0, 1]. After reviewing my results I noticed the apparent discrepancy and moved it to [-1, 1]. To my surprise, however, this greatly dropped training accuracy with all other constants staying the same, leading me to believe the correct range here is [0, 1] despite the contrary indicators. I'm still not sure what to make of that; I understand there are different range conventions, but the mismatch has me puzzled.

The image pipeline code also had two frustrating aspects. First, there is no WebP support, so it is not possible to get the equivalent of PIL image loading when dealing with a mixed bag of formats. Second, while the image loading code is quite robust, there are a few images PIL was able to open that tf.image.decode_image was not. Unfortunately, when an image failure occurs in the pipeline, you can't really just catch an exception and keep going - it crashes, and you don't get a helpful error message. This required me to create a separate script to first try opening all the images with TF one-by-one and create a blacklist of failures to use when loading up the dataset for training.
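
That pre-flight script is conceptually simple; here is a minimal sketch under the same assumptions as the pipeline code above (IMG_ROOT and the '.pic' extension), with the blacklist filename being an arbitrary choice:

import os
import tensorflow as tf

IMG_ROOT = 'images/'  # assumed dataset location, matching the pipeline code

# Try decoding every image with TF one-by-one and record the failures.
blacklist = []
for name in os.listdir(IMG_ROOT):
    try:
        img_bytes = tf.io.read_file(IMG_ROOT + name)
        tf.io.decode_image(img_bytes, channels=3, expand_animations=False)
    except tf.errors.InvalidArgumentError:
        blacklist.append(name)

with open('blacklist.txt', 'w') as f:
    f.write('\n'.join(blacklist))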

However, despite its limitations, the performance of tf.data is amazing. I was able to greatly increase overall throughput and keep the GPU relatively well-loaded. My processors were nowhere near maxed out, so there is likely some headroom left there if I get a second GPU. I'm not sure how close SSD reads are to becoming a bottleneck, though.
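
For reference, here is roughly how the pieces might wire together. map_to_image is the function shown earlier, `samples` is assumed to be a list of (hash, one-hot rating) pairs with blacklisted hashes already removed, and the shuffle buffer size is a guess:

import tensorflow as tf

hashes, ratings = zip(*samples)  # `samples`: assumed (hash, one-hot rating) pairs
ds = tf.data.Dataset.from_tensor_slices((list(hashes), list(ratings)))
ds = (ds.shuffle(10000)
        .map(map_to_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)  # parallel decode
        .batch(192)  # the batch size settled on above
        .prefetch(tf.data.experimental.AUTOTUNE))  # keep the GPU fed while training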

So the change to training from scratch required changing several variables at once, with some trial and error. This took some time, and two other changes occurred during this period as well. First, I made an update to the dataset, increasing the Q class of images. Second, the sampling strategy changed a bit to oversample more fairly. Unfortunately, these both muddy the purity of the results, but I am concerned primarily with the general progress.

The net result was a training from scratch of MobileNetV2 achieving 72.7% accuracy.


I was excited by this! Forward progress! While I did not achieve any earth-shattering accuracy improvements, I could now try other architectures, variants, and input sizes with a reasonable baseline to compare against.

Training from Scratch, pt. 1 - Motivation

The model currently deployed for Wingman Jr. is a relatively stock MobileNetV2 finetune with a bit of magic at the end. The dataset has been growing for quite some time, and as I reached about 200K images, I started seriously contemplating training from scratch.

Why train from scratch?

First, domain mismatch. Almost all base models are trained against ImageNet. While ImageNet is a fantastic proxy for photo-based populations, the domain of photos is not the same as the domain of internet images. My dataset has a significant portion of non-photo images: classical art, anime, stylized logos and icons, line drawings, and paintings. The closer I can get to approximating a target population of internet images, the more training from scratch has the potential to improve the model by retaining the parts that matter.

Second, the ability to try new architectures and variations. Couldn't I try new architectures without doing this? Certainly. But many architectures do not have ImageNet pre-trained weights available, so finetuning is not an option. Additionally, having a baseline against a standard network provides a useful backdrop for comparison and helps hint at whether a new network architecture is less capable vs. simply not having enough data.

Next time I'd like to discuss some of the mechanics of achieving a successful train from scratch; part 2 is now available!

Saturday, April 4, 2020

1.1.1.1 for Families Opt-In Support in Wingman Jr. 1.1.0

I was excited about a new service announced by Cloudflare this week - "1.1.1.1 for Families"! I admit, without an understanding of the company and the technology, that headline might not be the most eye-catching. Let me provide a bit of background.

Cloudflare is a technology company that provides many foundational services for the internet. One exceptionally important service they provide is DNS, or the Domain Name System. While we think of internet addresses as text, these text-based addresses are converted under the hood to a numerical form called an IP address, which is used to route traffic. Specifically, the hostname - for example "google.com" - is what gets represented numerically, not the part of the address afterwards that points to a specific page. Basically, every single webpage you visit "resolves" the hostname into this IP address by using a "DNS provider".
One trick that has long been used to block hostnames containing questionable content is simply to use a DNS provider that says "I don't know how to convert yourbadsite.com into an IP address", so all requests for media from that hostname fail. This is a lightweight check, and a relatively coarse form of blacklist. Maintaining such a blacklist is a gargantuan effort, almost always a commercial one.
So what is "1.1.1.1 for Families"? Well, two years ago Cloudflare launched their own DNS provider at "1.1.1.1". Now they have extended it with offerings - free to the public - that can filter out hostnames of known malware and adult content providers.
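
If you'd like to see the service in action, here is a small Python sketch (not how the addon works internally) using Cloudflare's DNS-over-HTTPS JSON API. The family.cloudflare-dns.com endpoint and the convention that filtered hostnames answer with 0.0.0.0 reflect my understanding of the service, so treat them as assumptions:

import json
import urllib.request

def resolve(hostname):
    # Query the family filtering endpoint via DNS-over-HTTPS (JSON flavor).
    url = 'https://family.cloudflare-dns.com/dns-query?name=' + hostname + '&type=A'
    req = urllib.request.Request(url, headers={'accept': 'application/dns-json'})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

answers = resolve('example.com').get('Answer', [])
blocked = any(a.get('data') == '0.0.0.0' for a in answers)
print('blocked' if blocked else 'allowed')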


Wingman Jr. relies on AI to scan images fully client-side, which has two distinct advantages: 1) each image is considered individually rather than being lumped in with a whole site, and 2) no communication with an external service provider is needed. However, as at least one user has helpfully reminded me in an email, video is not blocked. Long term, I would like to support filtering video, but it is a difficult technical challenge to get right - and performant. One thing I can do in the meantime is provide the option to also block images and video using the lighter-weight DNS-based approach. This is now quite feasible thanks to Cloudflare!

So how does it work? Roughly speaking, you go to the plugin's new settings area and enable DNS-based blocking. That's all you have to do. Under the hood, the plugin will start capturing image and video requests before they even occur and check the hostname with Cloudflare's servers. If Cloudflare says to block it, the image or video request will be aborted - you won't even see the usual Wingman icon or the update to the number of blocked images.

Now here's the thing: while there is a definite upside to this - a second layer of blocking, in some cases better efficiency, and basic video blocking - enabling this option does share the domains you are fetching media from with Cloudflare. Additionally, some websites with rather mixed content may end up being categorically blocked. These are tradeoffs - which is why I am making this an opt-in only feature.

However, I'm excited about this new option! I believe it makes sense for many users. I also want to thank the user who took the time to write me an email and got me thinking about this - it's great to hear how people are using this plugin and what they'd like to see next. Look for an update in Firefox soon - I plan to release this with version 1.1.0!