Thursday, December 30, 2021

Thumbnail Scanning Size

 Let's talk about thumbnails here for a bit.

Usually when you're dealing with an image classification problem, the task is fairly clear-cut: if you can figure out what the thing is, you can classify it. But for NSFW content, this turns into a bit more of a grey area. Sure, there are some things that - no matter how small - are clearly NSFW. But for others, as an image either scales down in size or becomes more abstract, it starts to cross a line where it goes from R to PG or even G. Where exactly one should draw this line is quite subjective.

I just received a feature request from a helpful user, Sneed:

A setting to optionally block small thumbnails or set an arbitrary minimum/maximum would be nice. The thumbnail for the image on this webpage does not get blocked, when you click it to expand, it IS blocked.

I've definitely run into this too. The current logic is around this area of the code and basically has two conditions for skipping blocking: the image is smaller than 36x36 pixels, or the total image size is less than 1024 bytes. The byte-size check is perhaps less obvious, but the idea is that an offensive image generally needs a certain amount of complexity, so if the byte count alone shows the image is too small, I can skip even decoding the image to find its true dimensions, which helps with performance.
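Here's a minimal sketch of those two skip conditions, using hypothetical names (shouldSkipScan, MIN_IMAGE_BYTES, MIN_IMAGE_DIMENSION) rather than the addon's actual identifiers:

```javascript
// Hypothetical constants mirroring the thresholds mentioned above.
const MIN_IMAGE_BYTES = 1024;    // below this, assume too simple to be NSFW
const MIN_IMAGE_DIMENSION = 36;  // below this, assume too small to matter

function shouldSkipScan(byteLength, width, height) {
    // Cheap check first: a tiny payload can be skipped without decoding at all.
    if (byteLength < MIN_IMAGE_BYTES) {
        return true;
    }
    // Otherwise the image has to be decoded to learn its true dimensions.
    // (Whether the addon compares one or both dimensions is an assumption here.)
    return width < MIN_IMAGE_DIMENSION || height < MIN_IMAGE_DIMENSION;
}
```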

This is an area I haven't turned my attention to in quite a while, so I'm going to give it some thought. I'm tracking it over at this issue on Github, so feel free to join the conversation there.

Sunday, December 26, 2021

3.3.0 Now Available!

 Three main updates this time!

First, GIFs will finally be scanned on a frame-by-frame basis rather than simply scanning the first frame. Note that the replacement image logic still needs a bit of work, so blocked images will often appear as a broken GIF. See the blog post for more details! https://wingman-jr.blogspot.com/2021/12/gift-giving-season-is-here.html

The second feature - default zones - comes from a new contributor, Abdullah! (https://github.com/abdullahezzat1) When the addon starts up, you can now pick which zone is set by default in the settings. This is a great feature and I think it will work especially well with the way some people want to use the plugin. Thanks Abdullah!

Finally, I tweaked the flags on the Tensorflow.js library/model startup (for the WebGL backend). This part of the startup is what causes the several-second delay, so it's an important place to try to optimize. With the new settings, I've seen my startup time go from about 10 seconds to 5 seconds, but every computer is going to be a bit different. Let me know in the feedback link how it's working for you!
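For context, here's a hedged sketch of what tuning the Tensorflow.js WebGL startup can look like. The post doesn't say which flags were changed, so the flags, model path, and input shape below are illustrative assumptions, not the addon's actual settings:

```javascript
import * as tf from '@tensorflow/tfjs';

async function initModel() {
    // Real tf.env() flags, but whether these are the ones the addon tweaks
    // is an assumption; flags are set before the backend initializes.
    tf.env().set('WEBGL_PACK', true);               // pack tensors into textures
    tf.env().set('WEBGL_FORCE_F16_TEXTURES', true); // half-precision textures

    await tf.setBackend('webgl');
    await tf.ready();

    const model = await tf.loadGraphModel('model/model.json'); // hypothetical path

    // Warm up: the first prediction compiles shaders and uploads weights,
    // which is where most of the multi-second startup delay comes from.
    tf.tidy(() => model.predict(tf.zeros([1, 224, 224, 3]))); // input shape assumed
    return model;
}
```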

Saturday, December 18, 2021

GIFt Giving Season Is Here!

 ... that's right! True GIF filtering! (with a side of dad jokes)

Just like socks or ugly pajamas you get for Christmas, it's perhaps something you never thought you'd need. In fact, isn't Wingman Jr. already filtering GIFs..?

Well, sort of. From the browser's standpoint, GIFs occupy a strange space between video and static images. This has some implications. Warning: tech ahead!

First, it is easy (and has been for a long time) to load GIFs with the <img> tag. The <video> tag is great now, but it took until HTML5 for it to gain the support it enjoys today. What's interesting, though, is that folks made a choice not to add GIF support to the new(er) <video> tag - GIFs are left solely to the realm of images.

From a lot of web developers' perspectives, that puts them in a bit of a bind. Why? Well, the <img> tag and the related APIs for manipulating images don't give you any way to control the animation aspects - for example, you can't choose which point in time of the animation to show.

How that impacts Wingman is that we only get the default behavior when loading and drawing the image to prepare it for filtering - and that behavior is to simply draw the first frame. That means the rest of the frames never even get a chance to be filtered, because we can't tell the browser to treat the GIF as a normal video type. To make matters worse, there is research showing that adversarially designed GIFs can defeat image filters by leveraging animation - for example, by putting a black frame up for a tiny amount of time before showing the full NSFW image.
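To make the first-frame limitation concrete, here's a minimal sketch of the usual load-and-draw path; the function name and flow are illustrative, not the addon's actual code:

```javascript
async function getScannablePixels(gifBlob) {
    // Decoding an animated GIF this way only ever yields its first frame.
    const bitmap = await createImageBitmap(gifBlob);
    const canvas = document.createElement('canvas');
    canvas.width = bitmap.width;
    canvas.height = bitmap.height;
    const ctx = canvas.getContext('2d');
    ctx.drawImage(bitmap, 0, 0); // there is no API to ask for frame 2, 3, ...
    return ctx.getImageData(0, 0, canvas.width, canvas.height);
}
```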

Back to GIF support: the lack of good GIF support has fortunately not dampened the spirit of web developers. The typical workaround has been to employ a Javascript library that parses and decodes GIFs itself, rather than leaving it to the browser. Unfortunately, decoding video truly is something better left to the browser. Still, there are libraries out there to handle this - see gifuct-js, jsgif, or libgif-js.

But as great as these libraries are, the use case for Wingman is a bit more constrained. While it's true that the images ultimately just get rendered onto a canvas like these libraries do, the addon does its best to consider performance, and canvas and drawing management has actually been an area of ongoing optimization over the years. Additionally, there's a difference between parsing and decoding - and these libraries choose to do both. Parsing deals with understanding the frames and the general structure of the file; decoding deals with actually decompressing the bulkier image data. And that's a bit problematic: the Javascript has to implement an LZW decompressor to decode each frame into raw pixels. Not so great if you're trying to do that for a page full of images while both your code and the browser maintain a copy of the pixel data. That, along with the fact that I'd rather not depend on more libraries than necessary, has made me hesitant to pull in true GIF support via one of these libraries. Still, it's one of those gaps that has bugged me.

But I woke up from a nap the other day with a burst of creativity, and took a look at the GIF format internals once again with a fresh set of eyes. It came to me that instead of trying to solve both the parsing and decoding problems, I could instead solve just the parsing problem and leave the decoding to the browser.

The way the GIF format works, the structure is basically a header followed by an alternating mix of image frames and control blocks that indicate a time delay. It allows the image frames to stream in and update the existing image being displayed. With the way the details work out, it is possible - and indeed relatively easy - to parse out the GIF's header and image frames, then repackage (remux) each frame into a standalone GIF using the header information.

While the GIF format allows arbitrary patches of the image to be updated over time, the reality is that most GIFs fully replace (or nearly fully replace) the image on every frame. Particularly for images that are more photographic in nature (as many NSFW images are!), a patch-based approach to drawing is more difficult, so it is generally avoided.

This overall situation is ideal for Wingman: since most images for the NSFW-filtering use case are full replacements, standalone GIFs can be created as the data streams in, and then filtered just like normal images. The code to do the actual parsing/repackaging is quite small (~250 lines), and the actual decompression is left to the browser - letting the browser do what it does best.
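Here is a simplified sketch of that parse-and-repackage idea, assuming the input is already a complete Uint8Array; it walks the block structure without touching the compressed pixel data and is only an illustration of the approach, not the addon's actual ~250 lines:

```javascript
function splitGifIntoFrames(bytes /* Uint8Array */) {
    let pos = 6; // skip the "GIF87a"/"GIF89a" signature
    pos += 7;    // logical screen descriptor
    const packed = bytes[10];
    if (packed & 0x80) {                        // global color table present?
        pos += 3 * (1 << ((packed & 0x07) + 1));
    }
    const header = bytes.slice(0, pos);

    const skipSubBlocks = () => {               // data sub-blocks end with a 0 byte
        while (bytes[pos] !== 0) pos += bytes[pos] + 1;
        pos++;
    };

    const frames = [];
    let frameStart = pos;
    while (pos < bytes.length && bytes[pos] !== 0x3B) {  // 0x3B = trailer
        if (bytes[pos] === 0x21) {              // extension, e.g. graphic control (delay)
            pos += 2;                           // introducer + label
            skipSubBlocks();
        } else if (bytes[pos] === 0x2C) {       // image descriptor
            const imgPacked = bytes[pos + 9];
            pos += 10;
            if (imgPacked & 0x80) {             // local color table present?
                pos += 3 * (1 << ((imgPacked & 0x07) + 1));
            }
            pos++;                              // LZW minimum code size byte
            skipSubBlocks();                    // compressed pixel data, left as-is
            // Header + this frame's blocks + trailer = a standalone, decodable GIF.
            frames.push(new Uint8Array([...header, ...bytes.slice(frameStart, pos), 0x3B]));
            frameStart = pos;
        } else {
            break;                              // unexpected block; bail out of the sketch
        }
    }
    return frames;
}
```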

I still have some tweaking to do before it's quite ready, but it's up and running fairly smoothly now. If you're feeling adventurous, go check out the GitHub issue for more details and a pointer to the branch under development!


Thursday, December 9, 2021

Whitelist Feature?

 Hello everyone!

I got a new feature request here recently from a user who goes by "pakxo" - they're curious if a whitelist feature could be added to save on processing for known good sites.

Currently, there's a basic, non-configurable whitelist. It helps handle the edge case where captcha images would sometimes get blocked, which is quite frustrating.

If anybody else would like to see this added, let me know through the feedback link in the addon! And again - thanks pakxo for the idea!

Wednesday, November 10, 2021

3.2.0 Released!

 3.2.0 is out now! The main thing in it is a new version of the number-crunching library, Tensorflow.js. I'd been waiting to update from version 2.7.0 because I wanted to ensure that the 3.x series was stable and fast. Well, now it's stable and fast! Their team has been hard at work (see, for example, this bug report) - thank you developers! Your experience may be different, but it was about 22% faster on my computer using the WebGL backend, so hopefully you see a speed boost as well. Enjoy!

Thursday, August 26, 2021

3.1.1 Out Now!

 3.1.1 is in the process of getting released now! Do note that Mozilla is currently experiencing some issues with their review process, which may or may not end up affecting public availability.

(Update: it is still in review, there is a discussion about it here)

(Update 2: It is still in review (2021/09/13) after ~3 weeks. This is quite odd for Mozilla, and I'm a bit troubled; more troubling still is that the community rep has largely gone silent on Discourse.)

(Update 3: Still in review 2021/09/25, quite troubled at the lack of communication. Do note that you can always pull the latest release directly from Github if you want to try it out before Mozilla updates.)

(Update 4: Apparently Mozilla had an issue around addons using the "proxy" permission and has issued a workaround. I added the workaround and have released this as 3.1.1!)

This version has two main improvements:

  • Crash detection - Every once in a while something would crash or go wrong in the addon. Everything would seem to be working, but prediction results would be completely wrong. Now the addon will self-check once in a while and reload itself if it doesn't get back an expected prediction result.
  • Video scanning improvements - Eventually I'd like to have full MPEG-DASH support. I'm not there yet, but I can now more or less handle peeking at videos that are sequentially read in sections (range requests) - for example, many Reddit videos. Do note that if video scanning is too slow for you, you can turn it off in the addon options area.

I hope you enjoy it! Let me know if you have any feedback using the link in the addon's dropdown.

Saturday, August 7, 2021

Crash Detection

Tonight I was working on a sneaky bug you may have run into, so I wanted to blog about it quickly and let you know I have a basic fix inbound.

First let's start with the symptom. Have you ever had Wingman Jr. loaded and running in the browser for a while, but then it just seems like questionable stuff starts slipping through and the addon might not be working? (You may have seen this on an image- or video-intensive site like Imgur.) You can see the issue over at GitHub.

Well, your hunch is correct! But the way in which it is failing is interesting. Some part of either Firefox or Tensorflow.js crashes. But the end result doesn't crash the whole addon, which would cause a reload. No, instead it simply causes the model prediction results to be incorrect. I've most often seen it send all zeroes back for the model prediction, which turns into "safe".

So to fix this I needed to get a bit creative. The addon will now routinely check itself using a "watchdog" function. Every so often, it takes one of the silent mode images already packed into the plugin and checks that the prediction results for that image come back the same. If they don't, the watchdog will give it another chance or so, and then automatically reload the plugin. To go a bit easier on your computer, it will temporarily stop checking when Firefox thinks that the user is idle.
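Here's a hedged sketch of that watchdog idea. The helpers predictImage, resultMatches, KNOWN_IMAGE_URL, and EXPECTED_RESULT are hypothetical stand-ins for the addon's internals; browser.idle and browser.runtime.reload are real WebExtension APIs:

```javascript
const CHECK_INTERVAL_MS = 60 * 1000;
const MAX_FAILURES = 2;

let failures = 0;
let isIdle = false;

// Skip checks while the user is away to avoid wasted work.
browser.idle.onStateChanged.addListener(state => {
    isIdle = (state !== 'active');
});

setInterval(async () => {
    if (isIdle) {
        return;
    }
    // Score a known, bundled image and compare against the expected result.
    const result = await predictImage(KNOWN_IMAGE_URL);
    if (resultMatches(result, EXPECTED_RESULT)) {
        failures = 0;
        return;
    }
    failures++;
    if (failures >= MAX_FAILURES) {
        browser.runtime.reload(); // prediction pipeline looks broken; restart the addon
    }
}, CHECK_INTERVAL_MS);
```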

Fortunately this solution seems to be fairly reliable so far in testing, but it does have the downside that it may cause a pause during the automatic addon reload. Do note that reloading the addon will cause it to go back to its default automatic mode. If preserving your selected mode across a reload is important to you, please let me know via an issue on Github or in the feedback link in Wingman Jr. itself.

Friday, July 30, 2021

Next Steps

 I've been taking it a bit easier after the release of 3.0.x, thinking about good next steps to work on. But I wanted to write a quick post to keep you in the loop for things I'm thinking about:

  • Load Performance - I'm always keeping an eye out for performance improvements; in particular, the initial load takes quite a while. This is due to Tensorflow.js model load times, as the first time it does an image prediction it has to do a bunch of extra work. But the team has been hard at work on this and I'm evaluating some improvements they've already made.
  • Better Video Support - I've gotten some exit survey feedback from users who wished videos/animations were better supported. Well, so do I. I'd like to see if I can think up some ways to support certain DASH and HLS videos a bit better; I have some ideas but we'll see. I think a good next target might be Reddit support.
  • Model Architecture - AI (or more accurately, machine learning) is a booming field right now, with many strong advancements occurring fairly regularly. Vision Transformers (ViT) as well as certain Unsupervised/Semi-Supervised Learning techniques have been making strides recently, so I'm keeping my eyes open for possibilities there. In particular, I'm intrigued by the hybrid CNN/ViT approach taken by Compact Convolutional Transformers (CCT).

I'd also like to take a moment and thank the Tensorflow.js, Firefox, and mux.js teams for helping make the software that makes this possible!

Sunday, March 7, 2021

3.0.0 Released!

I have finished development and testing of 3.0.0 and am excited to say it is now available over on AMO - check it out! https://addons.mozilla.org/en-US/firefox/addon/wingman-jr-filter

 See the post 3.0 Inbound Soon for a quick list of new features and updates!

Sunday, February 28, 2021

3.0 Inbound Soon!

 I'm excited to tell you about the upcoming 3.0 release! I've finished most of the key development on it and will be moving to a further round of testing/tweaking relatively soon.

So, why move to 3.0 since the move to 2.0 was not long ago? Let's take a look at the feature list.

Video Peek

I've blogged about this before but a simple form of video scanning is coming. It will "peek" at a good chunk of the first part of a "basic" video and block it if bad content is detected over a sustained period. "Basic" videos include most simple embedded and banner ad videos and some services such as TikTok. More advanced "streaming" video services basically need to be handled on a case-by-case basis because behind the scenes they work differently from each other; for now, the only "streaming" site support is YouTube. Here the video is streamed chunk by chunk and each chunk has the first section scanned.
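For a sense of how a "peek" at the front of a basic video stream can work, here's a rough sketch using Firefox's StreamFilter API (browser.webRequest.filterResponseData, which is real); PEEK_BYTES and scanVideoBytes are hypothetical stand-ins, and the addon's actual handling is certainly more involved:

```javascript
browser.webRequest.onBeforeRequest.addListener(details => {
    const filter = browser.webRequest.filterResponseData(details.requestId);
    const chunks = [];
    let peeked = 0;
    let scanStarted = false;

    filter.ondata = event => {
        chunks.push(event.data);                // buffer; nothing reaches the page yet
        peeked += event.data.byteLength;
        if (!scanStarted && peeked >= PEEK_BYTES) {
            scanStarted = true;
            scanVideoBytes(chunks).then(isBlocked => {
                if (isBlocked) {
                    filter.close();             // blocked: the page sees a truncated stream
                } else {
                    for (const buf of chunks) { // release everything buffered so far
                        filter.write(buf);
                    }
                    filter.disconnect();        // remaining data flows through untouched
                }
            });
        }
    };

    filter.onstop = () => {
        if (!scanStarted) {
            // A short video that never reached PEEK_BYTES; this sketch just lets it
            // through, though scanning whatever arrived would also be reasonable.
            for (const buf of chunks) {
                filter.write(buf);
            }
            filter.close();
        }
        // Otherwise the pending scan's callback will close or disconnect the filter.
    };
}, { urls: ['<all_urls>'], types: ['media'] }, ['blocking']);
```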
 
When a video is blocked, the behavior will depend on when the blocking occurred. If it happened right at the beginning, a placeholder Wingman Jr. icon may appear in the video stream. If it happens midstream, it is not easy to insert an image so the video will simply stop playing; there is also a visual indicator that will appear on the Wingman Jr. icon - to be discussed a bit later.

Note this feature is still quite new and may need some further refinement, but I found the current form useful enough that I wanted to share it with you. There will be an option to turn video scanning on and off apart from image scanning. Don't hesitate to send in feedback using the link in the popup options after you've used it - there are a number of constraints on what I'm able to do but I'd like to hear about your experiences to make it as good as possible.

New Model

The model in 2.0.1, SQRXR 62, has been in use for a long time. I've done many experiments to see if I could create a worthy successor, but improvements had long been marginal enough that I did not wish to change the current experience.

However, that has now changed with the advent of SQRXR 112. It builds on a slightly different base model and achieves better results. I'm still working on the final cutoff parameters to use for the release but the bulk of the work is done. If you're a machine learning nerd, you can use the model in your own projects - check it out here.

Silent Mode

The human psyche is such that we are curious creatures - and it is human nature to seek out the "forbidden fruit". Currently the browsing experience accentuates where the blocked images have been; even though they are not visible, it can promote a dark pattern where one is tempted to click on the image slot to see what was there.

I've long had an issue out to improve this with a "silent mode" where blocked images are instead transparently replaced. I gave it a whirl and so far I'm quite liking the results. The actual implementation places a small watermark with "W" and the image score in the center of the replaced image, so it is discernible if you look closely. However, in a wall of images it does not stand out heavily and cognitively I've found it to be significantly less jarring.

Scanning Progress Feedback

Have you ever wondered in the past: did the addon get stuck? Or is this image just taking a long time or failing to load?
 
Now you'll be able to have more clarity on that. A simple progress bar has been embedded in the main browser action icon, so you'll be able to track how many images are queued up to be scanned. Additionally, it provides a video scanning indicator in the form of a tiny "v" in the bottom right; a blocked video will also cause this area to light up with a different color.
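As a rough illustration of how a progress bar can end up inside the browser action icon, here's a sketch; browser.browserAction.setIcon with imageData is a real API, but the drawing details and function name are assumptions, not the addon's actual rendering code:

```javascript
function updateProgressIcon(inFlight, total) {
    const size = 32;
    const canvas = document.createElement('canvas');
    canvas.width = size;
    canvas.height = size;
    const ctx = canvas.getContext('2d');

    // Base icon (the real addon would draw the Wingman Jr. logo here).
    ctx.fillStyle = '#336699';
    ctx.fillRect(0, 0, size, size);

    // Progress bar along the bottom showing how much of the queue is done.
    const done = total > 0 ? (total - inFlight) / total : 1;
    ctx.fillStyle = '#ffffff';
    ctx.fillRect(0, size - 4, Math.round(size * done), 4);

    browser.browserAction.setIcon({
        imageData: ctx.getImageData(0, 0, size, size)
    });
}
```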

Advanced Option - Tensorflow.js Backend Selection

Tensorflow.js is the library I use to perform the AI model predictions. It has more than one "backend" that can be used to perform the calculations. For many users, the WebGL one is the best default choice. However, one of my users surprised me by sharing that the new WASM backend was faster for them. On my computer it is about 10x slower than the WebGL backend, so this was unexpected. This user requested that I implement a new feature to allow the user to choose the backend - that will be available in this upcoming release as well.
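A hedged sketch of what a backend selection option can look like is below. The storage key backendPreference and the wasm path are hypothetical; tf.setBackend, tf.ready, and the wasm backend's setWasmPaths are real Tensorflow.js APIs, though how the addon actually wires this up may differ:

```javascript
import * as tf from '@tensorflow/tfjs';
import { setWasmPaths } from '@tensorflow/tfjs-backend-wasm';

async function applyBackendPreference() {
    const { backendPreference } = await browser.storage.local.get('backendPreference');
    const backend = backendPreference || 'webgl'; // WebGL stays the default

    if (backend === 'wasm') {
        // The .wasm binaries must be bundled with the addon and pointed to here.
        setWasmPaths('tfjs-wasm/');
    }
    await tf.setBackend(backend);
    await tf.ready();
    console.log('Tensorflow.js backend in use:', tf.getBackend());
}
```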

When Will It Be Ready?

As noted, development is mostly wrapped up - at that point it will mostly depend on what gremlins are found during testing. Stay tuned!

Monday, January 18, 2021

Video - A New Challenge

Today I'd like to talk about a requested feature, video scanning. Not having support for video or animated images has been a longstanding gap in functionality, and it's hard not to be frustrated when the still image for a less-than-savory video is blocked ... but the video is still there and plays through.

While conceptually just a parade of individual images, video scanning presents several difficulties that make for a complex challenge. Here are some that I've encountered so far.

Accessing the Video

Some video viewers - like the current implementation at TikTok and some video ads - are relatively simple and just point at a video stream much like images usually point at an image URL.

However, many do not work that way. Many of them use other technologies such as DASH to deliver the video in bits and pieces with different network requests. YouTube takes an approach roughly in this category. These are often seen as separate requests by Wingman Jr. with no one-size-fits-all method to determine that they are part of the same video. Additionally, there are often other complications like the fact that these fragments can be delivered out of order or are not immediately loadable as valid video streams on their own. It would be desirable to not have site-specific logic, but unfortunately some is needed for certain sites.

Balancing Performance and User Experience

While the exact method could be implemented in many ways, it is fair to say that scanning video is a significantly more involved effort than scanning still images because there is much more data to comb through.

Additionally, what is the desired user experience? Stream video until blocking? Scan all the video or somehow only part of it? These involve tradeoffs in patience vs. thoroughness to the end user.

It is also not always possible to inject a replacement video as cleanly, so how the user is alerted that a block has occurred - as opposed to a network connection simply stalling out - becomes important too.

Moving Images Look Different

Even if audio is ignored, a video is actually fundamentally different in appearance than a succession of stills. Consider: many still photos are not at all blurry, and do not represent any action - they are often carefully framed. Video, on the other hand, may or may not be blurry (or properly lit) and represents a number of frames of strange transitional images. Have you ever paused a video and the main character is stuck in a strange pose or with a funny face? Simply selecting slices from a video most certainly yields some of these images.

Context of motion may also be critical. For example, a photo of an athlete may inappropriately focus on a certain body area and from the context it is clear that the image is questionable. On the other hand, video often transitions through these types of shots quite naturally and they are often not questionable at all in the correct context.

Further, it does not help that a number of amateur inappropriate still photos currently happen to be extracted from poor quality home videos... which happen to bear similarities in style to home videos with no inappropriate content.

While subtle, there is a real effect here and the net result is that the current model is poor for video. This may be somewhat correctable through small updates to model training, but to truly make this work excellently is likely going to take a number of iterations.

In Closing

Each of these challenges has had some effort put towards a solution, and some are seeing significant progress. Stay tuned!