
Sunday, February 28, 2021

3.0 Inbound Soon!

 I'm excited to tell you about the upcoming 3.0 release! I've finished most of the key development on it and will be moving to a further round of testing/tweaking relatively soon.

So why move to 3.0 when 2.0 arrived not that long ago? Let's take a look at the feature list.

Video Peek

I've blogged about this before, but a simple form of video scanning is coming. It will "peek" at a good chunk of the first part of a "basic" video and block it if bad content is detected over a sustained period. "Basic" videos include most simple embedded and banner ad videos as well as some services such as TikTok. More advanced "streaming" video services need to be handled on a case-by-case basis because behind the scenes they work quite differently from each other; for now, the only "streaming" site supported is YouTube. There the video is streamed chunk by chunk, and the first section of each chunk is scanned.
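To give a rough idea of what "detected over a sustained period" means, here's a minimal sketch of that kind of decision logic - the function name, thresholds, and frame scores are purely illustrative and not the add-on's actual code:

```typescript
// Illustrative sketch: decide whether a "peeked" stretch of video should be blocked.
// Each entry in frameScores is a model probability (0..1) for one sampled frame.
function shouldBlockChunk(
  frameScores: number[],
  badThreshold = 0.7,   // score above which a single frame counts as "bad" (assumed value)
  sustainedFrames = 5   // how many consecutive bad frames count as "sustained" (assumed value)
): boolean {
  let run = 0;
  for (const score of frameScores) {
    run = score >= badThreshold ? run + 1 : 0;
    if (run >= sustainedFrames) {
      return true; // bad content detected over a sustained stretch of the peeked video
    }
  }
  return false;
}

// A brief spike is ignored, but a sustained run triggers a block.
console.log(shouldBlockChunk([0.1, 0.9, 0.2, 0.1]));              // false
console.log(shouldBlockChunk([0.8, 0.85, 0.9, 0.95, 0.9, 0.88])); // true
```

The real logic also has to cope with timeouts and partial data, but the core idea is the same: a single suspicious frame shouldn't block a video, while a sustained run should.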
 
When a video is blocked, the behavior depends on when the block occurs. If it happens right at the beginning, a placeholder Wingman Jr. icon may appear in the video stream. If it happens midstream, it is not easy to insert an image, so the video will simply stop playing; a visual indicator will also appear on the Wingman Jr. icon - more on that a bit later.

Note this feature is still quite new and may need some further refinement, but I found the current form useful enough that I wanted to share it with you. There will be an option to turn video scanning on and off apart from image scanning. Don't hesitate to send in feedback using the link in the popup options after you've used it - there are a number of constraints on what I'm able to do but I'd like to hear about your experiences to make it as good as possible.

New Model

The model in 2.0.1, SQRXR 62, has been in use for a long time. I've done many experiments to see if I could create a worthy successor, but improvements had long been marginal enough that I did not wish to change the current experience.

However, that has now changed with the advent of SQRXR 112. It builds on a slightly different base model and achieves better results. I'm still working on the final cutoff parameters to use for the release but the bulk of the work is done. If you're a machine learning nerd, you can use the model in your own projects - check it out here.
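If you do want to try a model like this in your own TensorFlow.js project, the general loading pattern looks roughly like the sketch below; the model URL, input resolution, and normalization are placeholder assumptions, so check the model's own documentation for the real details:

```typescript
import * as tf from '@tensorflow/tfjs';

async function classifyImage(img: HTMLImageElement): Promise<Float32Array> {
  // Placeholder URL - substitute the actual published model location.
  const model = await tf.loadGraphModel('https://example.com/sqrxr/model.json');

  const result = tf.tidy(() => {
    const input = tf.browser.fromPixels(img)  // image pixels as a uint8 tensor
      .resizeBilinear([224, 224])             // assumed input resolution
      .toFloat()
      .div(255)                               // assumed [0,1] normalization
      .expandDims(0);                         // add a batch dimension
    return model.predict(input) as tf.Tensor;
  });

  const scores = (await result.data()) as Float32Array;
  result.dispose();
  return scores;
}
```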

Silent Mode

The human psyche is such that we are curious creatures - it is human nature to seek out the "forbidden fruit". Currently the browsing experience accentuates where blocked images have been; even though they are not visible, this can promote a dark pattern where you are tempted to click on the empty image slot to see what was there.

I've long had an issue out to improve this with a "silent mode" where blocked images are instead transparently replaced. I gave it a whirl and so far I'm quite liking the results. The actual implementation places a small watermark with "W" and the image score in the center of the replaced image, so it is discernible if you look closely. However, in a wall of images it does not stand out heavily and cognitively I've found it to be significantly less jarring.
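To illustrate the general idea (this is not the add-on's exact drawing code), a same-sized transparent stand-in with a small centered watermark can be generated with an ordinary canvas:

```typescript
// Illustrative sketch: build a replacement image with a faint "W" watermark
// and the model score in the center, matching the original dimensions.
function makeSilentReplacement(width: number, height: number, score: number): string {
  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext('2d')!;

  // Leave the background fully transparent so the replacement blends into the page.
  ctx.clearRect(0, 0, width, height);

  // Small centered watermark, e.g. "W 0.93".
  ctx.globalAlpha = 0.35;
  ctx.fillStyle = '#888';
  ctx.font = `${Math.max(10, Math.floor(height / 12))}px sans-serif`;
  ctx.textAlign = 'center';
  ctx.textBaseline = 'middle';
  ctx.fillText(`W ${score.toFixed(2)}`, width / 2, height / 2);

  // Data URL that can stand in for the original image source.
  return canvas.toDataURL('image/png');
}
```

The data URL can then be swapped in for the original image, so the page layout stays intact while nothing objectionable is shown.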

Scanning Progress Feedback

Have you ever wondered: did the add-on get stuck? Or is this image just taking a long time or failing to load?
 
Now you'll have more clarity on that. A simple progress bar has been embedded in the main browser action icon, so you'll be able to track how many images are queued up to be scanned. Additionally, it provides a video scanning indicator in the form of a tiny "v" in the bottom right; a blocked video will also cause this area to light up with a different color.
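For the curious, a WebExtension can redraw its toolbar icon from a canvas. The sketch below shows the rough shape of that; the drawing details and function name are illustrative rather than the add-on's actual rendering code:

```typescript
declare const browser: any; // WebExtensions global (typed via webextension-polyfill in a real project)

// Illustrative sketch: paint a queue progress bar and a small video indicator
// onto the browser action icon.
function updateActionIcon(scanned: number, queued: number, videoBlocked: boolean): void {
  const size = 32;
  const canvas = document.createElement('canvas');
  canvas.width = size;
  canvas.height = size;
  const ctx = canvas.getContext('2d')!;

  // Base icon background (a real add-on would draw its logo here).
  ctx.fillStyle = '#2b5797';
  ctx.fillRect(0, 0, size, size);

  // Progress bar along the bottom showing how much of the queue has been scanned.
  const total = scanned + queued;
  const fraction = total > 0 ? scanned / total : 1;
  ctx.fillStyle = '#ffffff';
  ctx.fillRect(0, size - 4, Math.round(size * fraction), 4);

  // Tiny "v" in the bottom right; highlighted when a video has been blocked.
  ctx.fillStyle = videoBlocked ? '#ff4136' : '#cccccc';
  ctx.font = '10px sans-serif';
  ctx.fillText('v', size - 8, size - 6);

  // Push the rendered pixels into the toolbar icon.
  browser.browserAction.setIcon({
    imageData: ctx.getImageData(0, 0, size, size),
  });
}
```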

Advanced Option - TensorFlow.js Backend Selection

TensorFlow.js is the library I use to perform the AI model predictions. It has more than one "backend" that can be used to perform the calculations. For many users, the WebGL backend is the best default choice. However, one of my users surprised me by sharing that the new WASM backend was faster for them. On my computer it is about 10x slower than the WebGL backend, so this was unexpected. That user requested a new feature to let users choose the backend themselves - that will be available in this upcoming release as well.
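The backend switch itself is a small amount of code - roughly the sketch below, where the preference plumbing is illustrative but tf.setBackend, tf.ready, and tf.getBackend are the real TensorFlow.js API:

```typescript
import * as tf from '@tensorflow/tfjs';
// The WASM backend ships separately and must be imported to register itself.
import '@tensorflow/tfjs-backend-wasm';

// Apply the user's backend preference, falling back to WebGL if it fails.
async function applyBackendPreference(preferred: 'webgl' | 'wasm' | 'cpu'): Promise<string> {
  const ok = await tf.setBackend(preferred);
  if (!ok) {
    await tf.setBackend('webgl');
  }
  await tf.ready();
  return tf.getBackend(); // the backend actually in use
}
```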

When Will It Be Ready?

As noted, development is mostly wrapped up - from here the timing will mostly depend on what gremlins are found during testing. Stay tuned!

Monday, January 18, 2021

Video - A New Challenge

Today I'd like to talk about a requested feature, video scanning. Not having support for video or animated images has been a longstanding gap in functionality, and it's hard not to be frustrated when the still image for a less-than-savory video is blocked ... but the video is still there and plays through.

While conceptually just a parade of individual images, video scanning presents several difficulties that make for a complex challenge. Here are some that I've encountered so far.

Accessing the Video

Some video viewers - like the current implementation at TikTok and some video ads - are relatively simple and just point at a video stream much like images usually point at an image URL.

However, many do not work that way. Many of them use other technologies such as DASH to deliver the video in bits and pieces across different network requests. YouTube takes an approach roughly in this category. Wingman Jr. sees these as separate requests, with no one-size-fits-all method to determine that they are part of the same video. Additionally, there are often other complications: fragments can arrive out of order, or may not be immediately loadable as valid video streams on their own. It would be desirable to avoid site-specific logic, but unfortunately some is needed for certain sites.
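For context, Firefox's webRequest API does let an extension tap into a response body as it streams, which is the raw material available for this kind of work. Here's a rough, illustrative sketch of that interception - the URL pattern and logging are placeholders, and all the hard fragment-grouping logic is left out:

```typescript
declare const browser: any; // WebExtensions global (requires webRequest/webRequestBlocking permissions)

// Illustrative sketch: intercept media responses and collect their bytes.
// Deciding which fragments belong to the same video still requires extra,
// often site-specific, logic on top of this.
browser.webRequest.onBeforeRequest.addListener(
  (details: { requestId: string; url: string }) => {
    const filter = browser.webRequest.filterResponseData(details.requestId);
    const chunks: ArrayBuffer[] = [];

    filter.ondata = (event: { data: ArrayBuffer }) => {
      chunks.push(event.data);
      filter.write(event.data); // pass the bytes through unmodified for now
    };

    filter.onstop = () => {
      filter.close();
      // At this point `chunks` holds one fragment of (possibly) a larger video.
      console.log(`Fragment from ${details.url}: ${chunks.length} chunk(s)`);
    };
  },
  { urls: ['<all_urls>'], types: ['media'] },
  ['blocking']
);
```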

Balancing Performance and User Experience

While the exact method could be implemented in many ways, it is fair to say that scanning video is a significantly more involved effort than scanning still images because there is much more data to comb through.

Additionally, what is the desired user experience? Stream the video until a block occurs? Scan all of the video, or somehow only part of it? Each choice trades patience against thoroughness for the end user.

It is also not always possible to inject a replacement video cleanly, so it becomes important to signal that a block has occurred rather than leaving it looking like the network connection simply stalled out.

Moving Images Look Different

Even if audio is ignored, video is fundamentally different in appearance from typical still photos. Consider: many still photos are not at all blurry and do not capture action in progress - they are often carefully framed. Video, on the other hand, may or may not be blurry (or properly lit) and contains many frames of strange transitional images. Have you ever paused a video and found the main character stuck in a strange pose or with a funny face? Simply selecting slices from a video most certainly yields some of these images.

Context of motion may also be critical. For example, a photo of an athlete may inappropriately focus on a certain body area and from the context it is clear that the image is questionable. On the other hand, video often transitions through these types of shots quite naturally and they are often not questionable at all in the correct context.

Further, it does not help that a number of amateur inappropriate still photos are currently extracted from poor-quality home videos ... which bear similarities in style to home videos with no inappropriate content.

While subtle, there is a real effect here, and the net result is that the current model is poor for video. This may be somewhat correctable through small updates to model training, but making this work truly well is likely going to take a number of iterations.

In Closing

Each of these challenges has had some effort put towards a solution, and some are seeing significant progress. Stay tuned!