Friday, January 24, 2020

Release 1.0 is here!

Woohoo! After well over a year of releases, I'm excited to announce version 1.0!

Why now? What makes it special enough to flip to version 1.0?

The big new feature here is "zones". Now you can pick different zones for different levels of blocking sensitivity. This can either be managed manually, or automatically based on how many images have recently been blocked (the default). The "neutral" zone is not too different from the previous blocking level, but the "trusted" zone that engages when few images have been blocked recently might just give you a better tradeoff on accidentally blocked images. I've found the experience to be better when using it, and I hope you do too. If you're interested in the backstory on zones, I recommend checking out my series of posts on model sensitivity, starting here: Model Sensitivity - Part One.

While this release is a landmark, it is just the foundation. I'm excited to get your feedback! Speaking of which, thanks so much to everyone who has already shared theirs - it's been helpful!

Model Sensitivity - Part Three

In the first post, I discussed different ideas of what sensitivity could mean. In the second post, I discussed how tradeoffs in sensitivity were used as the basis for the "zones" feature, with its "trusted", "neutral", and "untrusted" zones.

In this final post on the initial introduction of sensitivity, I'd like to discuss one more feature: automatic zone selection.

The idea is simple: if the plugin sees that a number of images have recently been blocked, it might be a good idea to move to a less trusted zone, because lots of blocks likely means we're also letting too many bad ones through; similarly, if not many images have been blocked in a while, we are likely in a more trusted environment and would be better off not accidentally blocking extra good ones.

So for example, suppose you are browsing sites you know and love that don't have any questionable content. However, suppose you then follow a link to a somewhat more questionable site - though not one objectionable enough that it makes sense to leave. In this situation, it would be fabulous if the plugin just started being a bit more picky. With automatic mode, that switch can happen automatically.

For the stats-minded folks out there: internally, the number of predicted positives is multiplied by the precision to get an estimate of the number of true positives. That estimate, expressed as a percentage, controls which zone is selected.
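
To make that concrete, here is a minimal sketch of what automatic zone selection could look like. The names, precision values, and cutoffs are all illustrative - not the plugin's actual numbers - and the choice of denominator is an assumption for illustration:

```javascript
// Hypothetical sketch of automatic zone selection; the precision values
// and cutoffs below are illustrative, not the plugin's actual numbers.
const PRECISION_BY_ZONE = { trusted: 0.95, neutral: 0.85, untrusted: 0.7 };

function selectZone(recentImageCount, recentBlockCount, currentZone) {
  // Estimated true positives = predicted positives * precision.
  const estimatedTruePositives =
    recentBlockCount * PRECISION_BY_ZONE[currentZone];
  // Express the estimate as a percentage of the images recently seen
  // (the denominator here is an assumption for illustration).
  const estimatedBadPercent = recentImageCount > 0
    ? (100 * estimatedTruePositives) / recentImageCount
    : 0;

  // Few bad images lately? Trust more. Many? Trust less.
  if (estimatedBadPercent < 1) return 'trusted';
  if (estimatedBadPercent > 10) return 'untrusted';
  return 'neutral';
}
```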

For the first release I plan to make automatic mode the default, but any of the zones can also be selected manually - doing so turns off automatic mode.

And that's it for the introduction on sensitivity - I'm quite excited about getting this feature out! I'm looking forward to getting feedback on how well it is working for everyone.

Thursday, January 23, 2020

How the Image Scanning and Blocking Works

In this post, I wanted to briefly discuss how the image blocking mechanism works. Notably, this post is not about how the model works - just the actual mechanics of blocking the image. I believe the first part of the post is relevant to all users, while the second part of the post may be of academic interest to other developers - a little something for everybody.

For everybody:

The image blocking works roughly like this. A web page usually downloads the "outline" of everything first, and then downloads the images and other resources after that. The image blocking ties into the step where the images are downloaded: the plugin intercepts each image, scans it, and - if it looks bad - swaps in a placeholder before the page ever sees it. This means that the image as seen by the webpage has been replaced, which has some important implications. If you were to right-click and save the image, you would save what the webpage received: a placeholder image that does not contain the original image. It also means that even if you were to turn the plugin off, the image would stay blocked until you did a full page refresh.

There are two interesting side notes.

First, if you navigate directly to a bad image's URL, the image can't be replaced in the normal way due to technical limitations. Instead, you will get a note that says something like "The image <your web address here> cannot be displayed because it contains errors." This is because internally the plugin has aborted the image download.

Second, some web pages work slightly differently: rather than downloading the "outline" and then the images after that, they may include the images directly as part of the web page using something called a "data URL". Notably, Google Images does this for its first "page" of search results, and I believe it contributes to how quickly they load. To handle this, the plugin scans the web page and replaces these images as well. This works reasonably well, but occasionally the plugin may run into a variation I haven't encountered yet and fail to scan (and thus block) those images. If you suspect this, please comment - or file a bug on GitHub so I can start tracking it right away: https://github.com/wingman-jr-addon/wingman_jr/issues

One last thing: the API that the scanning/filtering relies on is not available on Chrome, and sadly is not likely to become available.


Additional technical details for developers:

This uses the webRequest.filterResponseData() interface internally, which has worked well. As noted, that interface is not available on Chrome, and the partial functionality that does exist there is actively being neutered in the name of performance and security - so a Chrome port as such is not likely to be possible. I am actively tracking potential ways to do this, however; see https://github.com/wingman-jr-addon/wingman_jr/issues/2 and please comment if you know of ways to work with this. I've also seen another project that would be great if I wanted to switch methods, but I'm just not convinced that scanning at the DOM level is going to be as robust.
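
For a flavor of how that interception works, here is a minimal sketch. The classifyImage() helper and PLACEHOLDER_SVG_BYTES are hypothetical stand-ins; the real addon's logic is considerably more involved:

```javascript
// Minimal sketch of intercepting and rewriting image responses.
// classifyImage() and PLACEHOLDER_SVG_BYTES are hypothetical stand-ins.
browser.webRequest.onBeforeRequest.addListener(
  (details) => {
    const filter = browser.webRequest.filterResponseData(details.requestId);
    const chunks = [];

    // Buffer the image rather than streaming it straight through,
    // since the whole image is needed before it can be scanned.
    filter.ondata = (event) => chunks.push(event.data);

    filter.onstop = async () => {
      const imageBytes = await new Blob(chunks).arrayBuffer();
      if (await classifyImage(imageBytes)) {
        // Deemed bad: the page receives a placeholder, never the original.
        filter.write(PLACEHOLDER_SVG_BYTES);
      } else {
        filter.write(imageBytes);
      }
      filter.close();
    };
  },
  { urls: ['<all_urls>'], types: ['image'] },
  ['blocking']
);
```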

Internally, I serve up SVGs as the placeholders. This means I need to change the MIME type, which normally works great - but not for directly visited URLs, interestingly - which is why those produce an error message instead of the placeholder SVG.
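
For illustration, a MIME type swap can be done from an onHeadersReceived listener along these lines. This is a sketch only - in particular, deciding when to rewrite is the tricky part and is omitted here:

```javascript
// Sketch of swapping the Content-Type so a placeholder SVG renders in
// place of the original image; deciding when to rewrite is omitted here.
browser.webRequest.onHeadersReceived.addListener(
  (details) => {
    for (const header of details.responseHeaders) {
      if (header.name.toLowerCase() === 'content-type') {
        header.value = 'image/svg+xml';
      }
    }
    return { responseHeaders: details.responseHeaders };
  },
  { urls: ['<all_urls>'], types: ['image'] },
  ['blocking', 'responseHeaders']
);
```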

Note that review mode wraps the original image into the SVG and adds a translucent overlay.
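
As a rough illustration of that wrapping (a hypothetical helper; the markup details are illustrative):

```javascript
// Illustrative helper: wrap the original image in an SVG with a
// translucent white overlay, roughly the review-mode effect described above.
function buildReviewSvg(originalDataUrl, width, height) {
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">
  <image href="${originalDataUrl}" width="${width}" height="${height}"/>
  <rect width="100%" height="100%" fill="white" fill-opacity="0.75"/>
</svg>`;
}
```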

Along the way, I discovered that data URLs are unfortunately not hooked by the webRequest API. There is a bug filed for it, but I think it is unlikely to be fixed anytime soon. Regrettably, that leaves me only with the hack of scanning every HTML document for base64 data URLs. Fun times with regex!
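
For the curious, the scanning boils down to something like the following - an illustrative pattern, not the exact one the addon uses, and real-world markup has more variations than this covers:

```javascript
// Illustrative pattern for spotting base64 image data URLs in an HTML
// document; real-world markup has more variations than this covers.
const DATA_IMAGE_REGEX =
  /data:image\/(?:png|jpe?g|gif|webp);base64,[A-Za-z0-9+\/=]+/g;

function findEmbeddedImages(html) {
  return html.match(DATA_IMAGE_REGEX) || [];
}
```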

Google's use of embedded data URLs in special ways has caused more than one issue in the past; see [1] and [2]. If you see an issue with Google or another site, please add a bug over at https://github.com/wingman-jr-addon/wingman_jr/issues

Let me know if you have any questions on this API or just go look at my source. Thanks for reading to the end!

Wednesday, January 22, 2020

Model Sensitivity - Part Two

In part one, I talked about different ideas of what sensitivity could mean and the planned approach for the Wingman Jr. model. Today I'd like to talk about the software features around sensitivity that I'm working on for the 1.0 release.

As discussed last time, a binary classifier is simply a program that says yes or no given some input. In our case, the input is a picture and the model says "pass" or "block" as the two possible outcomes. A typical way of grading this is to consider the tradeoffs between how many images are correctly blocked vs. incorrectly blocked at different thresholds. This allows us to pick tradeoff points for different types of model behavior.

For Wingman Jr., I've picked three tradeoff points that correspond to three ways the user might wish to use the plugin, based on what "zone" they are browsing in (a sketch of how these cutoffs could be applied follows the list).
  1. The user trusts the zone they are browsing in - pick a tradeoff that rarely falsely flags an image as bad, but still catches well over half the bad images as a safety net.
  2. The user does not trust the zone they are browsing in at all - pick a tradeoff that catches almost all bad images, but falsely flags a number of good images as well.
  3. The user has a neutral opinion about the visual safety of the zone they are browsing - pick a tradeoff that balances catching most of the bad images but also flags some false positives.
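
To make the tradeoff concrete, here is a sketch of how per-zone cutoffs could be applied to the model's numeric score. The threshold values here are made up for illustration, not the plugin's actual numbers:

```javascript
// Illustrative per-zone cutoffs on the model's 0..1 "not safe" score;
// a lower cutoff blocks more aggressively (more true AND false positives).
const ZONE_THRESHOLDS = { trusted: 0.9, neutral: 0.5, untrusted: 0.2 };

function shouldBlock(notSafeScore, zone) {
  return notSafeScore >= ZONE_THRESHOLDS[zone];
}
```
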
Being able to choose the "trusted" zone is great for situations where the user repeatedly sees the model saying something is bad when it is not. While bad images should get blocked, it is easy to get frustrated if a bunch of images are getting blocked for no good reason.

Similarly, if you know you are going to visit someplace that is a bit less safe, it is a good tradeoff to block almost all the bad images at the cost of some good ones.

I'm excited to roll out this feature, but there is one more piece of it that I'd like to discuss in the next post. The conclusion of the series is now available here: Model Sensitivity - Part Three

Sunday, January 19, 2020

Google Search Results Bugfix

Just wanted to let everyone know that today I found a critical bug that affected page loads of Google search results. In case you haven't noticed, Google has started including icons for websites as part of the normal search experience. I had a bug around loading these that caused some search result pages to hang. Sorry about that - the fix is being pushed out now!
For the technically curious, this is a hotfix for problem #1 listed at the bug report here: https://github.com/wingman-jr-addon/wingman_jr/issues/33

As always, feedback is welcome. Sorry for any failed page loads you might have experienced!

Saturday, January 18, 2020

Model Sensitivity - Part One

I got my first feature request today for the Wingman Jr. plugin - thank you, anonymous user! The user was asking for a way to change the sensitivity of detection. As it turns out, I've recently been looking at some related things on the model prediction side, so the timing is good. I'd like to talk today a bit about how machine learning models view sensitivity vs. how humans view sensitivity. (And for you stats nerds out there - set aside the formal definition of "sensitivity" for just a bit.)

Let's start with how humans typically view sensitivity. Suppose a human is asked to grade how mature a movie is, perhaps using something similar to the MPAA ratings of G, PG, PG-13, and R. We generally expect the human to grade on a gradual scale, where - for example - increasingly violent content might cause the rating to move smoothly from G toward R as the objectionable content increases. With this in place, we can then say things like "I'd only like to watch movies that are rated up to PG-13". So here sensitivity might mean something like "a gradual scale that allows for a cutoff at a certain level of objectionability."

However, for machines this isn't always the case. Machines often take a more probabilistic approach. For example, a quite normal approach is to build a "binary classifier" (a program that just says "yes" or "no" for some input) and look at the percentages of "yes" and "no" answers that come out. So you might be able to say about a certain model: "this catches 90% of bad images, but blocks 10% of good ones by accident." While it's not ideal that any should fail, this makes the tradeoffs easy to reason about from a statistical perspective. So here sensitivity may be roughly defined as "a cutoff point that guarantees a certain statistical balance between the yes and no classes". (Strictly speaking, the statistics definition is solely concerned with the ratio of true positives to the sum of true positives and false negatives, but I would argue most humans have a much squishier view of the meaning.)
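
For reference, that textbook definition in code form - just the standard formula, for the stats-minded:

```javascript
// Textbook "sensitivity" (also called recall): of all the truly bad
// images, what fraction did the model actually catch?
function sensitivity(truePositives, falseNegatives) {
  return truePositives / (truePositives + falseNegatives);
}
```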

Unfortunately, this statistical view - while useful - often leaves the human judge quite dissatisfied, because grading does not fail gracefully. In our hypothetical example, the human might ask something like: "Why did this movie with an intense scene of gore get rated G instead of R? If it had at least been PG-13 it would be somewhat understandable, but this is terrible!" This type of failure - the lack of understanding and "smoothness" in grading successes and failures - is one of the key challenges facing real-world use of AI today.

For Wingman Jr., I'm planning to take an approach that hopefully captures some of the spirit of both notions of sensitivity.

A bit of background first. Currently the model is built from a selection of graded images falling into one of four categories: safe, questionable, racy, or explicit. The current strategy has been a fairly standard image classifier approach with these four classes, but with weighting to penalize confusions between, say, explicit and safe more heavily than between questionable and safe.

So first, with respect to the statistical aspect of sensitivity: I'm working on distilling the prediction over the four classes into simply two - safe and not safe - expressed as a numeric score. This allows for analysis with traditional means like the area under the ROC curve (ROC AUC).
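
One simple way such a distillation could look - an illustrative formula, not necessarily the one the model actually uses:

```javascript
// One illustrative way to collapse class probabilities into a single
// "not safe" score, weighting by severity (not the model's actual formula).
function notSafeScore(pQuestionable, pRacy, pExplicit) {
  return 0.33 * pQuestionable + 0.66 * pRacy + 1.0 * pExplicit;
}
```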

With respect to the common-sense aspect of sensitivity: while I can't make the grading perfectly smooth, I can bias the tradeoff toward the severity of the image. So, at a given cutoff threshold, more "explicit" images will be caught than "questionable" ones, because I've purposely over-represented "explicit" images.

I have some initial results that seem promising, but I have some work left to do before it's ready for prime time - a useful piece of software is more than just a machine learning model.

Stay tuned!

The next installment of this series is available at Model Sensitivity - Part Two

Model Accuracy

As I start to post about the model I'm building for the Wingman Jr. plugin, I'd like to take a bit of an unusual step:

I don't plan to talk about absolute accuracy or test results extensively.*

Why? Accuracy is most helpful when comparing models against a standard benchmark. However, while some ad hoc datasets have appeared for comparing NSFW detection models, the sheer vastness of both the safe and unsafe image classes is staggering - meaning that accuracy scores can, to a degree, be misleading. This may change as new datasets become available, and it may make sense to report on them - hence the asterisk. Additionally, I may discuss relative improvement of the model against my own dataset - that's important for communicating progress.

However, even the internal dataset may change over time in response to flaws found in the data distribution. For example, I have already had to specialize into certain types of sports fairly extensively. Olympic runners - whom most would not identify as NSFW at all - are visually quite similar to swimsuit models. It is important to get these "hard negatives" correct, and if it means that my baseline dataset is not stable - oh well. The result is more useful to you, statistics notwithstanding!

This is not to say that I won't post graphs and charts from time to time - just make sure you take them with a grain of salt as your browsing patterns may be quite different than my collected images.

What I do hope to talk about is the journey: techniques, strategies, features, challenges, and the like. Stay tuned!

Finally - I hope that as I work on improving the model that you can laugh with me as it misclassifies the occasional car, map, or other random picture as a bit naughty. (I just had one user report it was blocking a car photo.) Yahoo's open_nsfw has been known to fail on Charizard, and apparently nsfwjs at one time had it out for Jeff Goldblum - perhaps we can add a few more spectacular failures to the list.

Review Mode

(Note: see update below for resolution.)

A user recently gave me some good feedback: "Confusing settings. Couldn't work out how to hide images again after making them faded white". I believe they are talking about the review mode.

There is currently a setting "Toggle Review Mode" under the right-click menu for images. This turns review mode on or off so that blocked images can be reviewed. When it is turned on, blocked images display the original image in a grayed-out fashion along with the score icon. It is also important to know that an image is scanned and modified at the moment it is downloaded. This means that if an image is currently displayed in review mode, you won't see it change to fully blocked without a page refresh.

Now, obviously some users may not want this feature as it presents a clear method to bypass filtering. On the other hand, many users may want this because they would like to see if the plugin is actually working. Internally, I use it to evaluate the effectiveness of the plugin. So I'm torn about including it or not - which is why it is not clearly listed as a feature right now.

So - I've put together a poll. Let me know what you'd like to see!

Update: I have decided to remove "review mode" as a user-facing feature, due to the addition of a new feature: I am now logging thumbnails of blocked images to the console using Firefox 73's new support for data URIs. I believe this gives the best of both worlds!
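
As a rough sketch of how such thumbnail logging can work - assuming the blocked image is already available as an ImageBitmap; the names and sizes are illustrative:

```javascript
// Downscale a blocked image and log it as a data URI.
// Firefox 73+ can render a preview of data URIs logged to the console.
function logBlockedThumbnail(imageBitmap) {
  const canvas = document.createElement('canvas');
  canvas.width = 64;
  canvas.height = 64;
  canvas.getContext('2d').drawImage(imageBitmap, 0, 0, 64, 64);
  console.log(canvas.toDataURL('image/jpeg', 0.7));
}
```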