Monday, January 18, 2021

Video - A New Challenge

Today I'd like to talk about a requested feature: video scanning. Not having support for video or animated images has been a longstanding gap in functionality, and it's hard not to be frustrated when the still image for a less-than-savory video is blocked ... but the video itself is still there and plays through.

While a video is conceptually just a parade of individual images, scanning one presents several difficulties that make for a complex challenge. Here are some that I've encountered so far.

Accessing the Video

Some video players - like TikTok's current implementation and some video ads - are relatively simple and just point at a single video stream, much like images usually point at an image URL.

However, many do not work that way. Many use technologies such as DASH to deliver the video in bits and pieces across different network requests; YouTube takes an approach roughly in this category. Wingman Jr. sees these as separate requests, with no one-size-fits-all method to determine that they belong to the same video. There are other complications as well: the fragments can arrive out of order, and they are often not immediately loadable as valid video streams on their own. It would be desirable to avoid site-specific logic, but unfortunately some is needed for certain sites.
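To make the grouping problem concrete, here is a minimal sketch of one possible approach: derive a "stream key" from each fragment URL by stripping query parameters that typically vary per segment, then buffer fragments under that key. The parameter names below (`range`, `sq`, etc.) are illustrative assumptions, not the actual logic Wingman Jr. uses.

```javascript
// Hypothetical sketch: group fragmented-video requests that likely belong
// to the same underlying stream. Parameter names are illustrative only.
function streamKeyForUrl(urlString) {
  const url = new URL(urlString);
  // Drop query parameters that typically vary per segment, keeping the
  // ones that identify the stream itself.
  const segmentParams = ["range", "sq", "segment", "rn", "rbuf"];
  for (const p of segmentParams) {
    url.searchParams.delete(p);
  }
  url.searchParams.sort();
  return url.origin + url.pathname + "?" + url.searchParams.toString();
}

// Fragments with the same key can be buffered together until enough of
// the stream has arrived to attempt a scan.
const buffers = new Map();
function addFragment(urlString, bytes) {
  const key = streamKeyForUrl(urlString);
  if (!buffers.has(key)) buffers.set(key, []);
  buffers.get(key).push(bytes);
  return key;
}
```

Real sites vary widely, which is exactly why some site-specific logic ends up being unavoidable: a heuristic like this works for some URL schemes and fails for others.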

Balancing Performance and User Experience

While the exact method could be implemented in many ways, it is fair to say that scanning video is a significantly more involved effort than scanning still images because there is much more data to comb through.

Additionally, what is the desired user experience? Stream the video until a block occurs? Scan all of the video, or somehow only part of it? These choices involve tradeoffs between patience and thoroughness for the end user.

It is also not always possible to inject a replacement video as cleanly as a replacement image, so how the user is alerted that a block has occurred - rather than the connection simply appearing to stall out - becomes important too.
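One hedged sketch of the patience-vs.-thoroughness tradeoff: rather than scanning every frame, sample a fixed budget of timestamps spread evenly across the video. The budget and mid-interval spacing here are illustrative values I chose for the example, not the extension's actual settings.

```javascript
// Sketch: pick up to maxSamples evenly spaced timestamps to scan,
// so scan cost stays bounded regardless of video length.
function sampleTimestamps(durationSeconds, maxSamples) {
  if (durationSeconds <= 0 || maxSamples <= 0) return [];
  // Never sample more often than about once per second.
  const count = Math.min(maxSamples, Math.max(1, Math.floor(durationSeconds)));
  const step = durationSeconds / count;
  const times = [];
  for (let i = 0; i < count; i++) {
    // Sample mid-interval to avoid the very first and last frames.
    times.push((i + 0.5) * step);
  }
  return times;
}
```

A larger budget means more thoroughness but more waiting; a smaller one means faster playback but more chances to miss something.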

Moving Images Look Different

Even if audio is ignored, video is fundamentally different in appearance from a collection of typical still photos. Consider: many still photos are not at all blurry and do not capture any action - they are often carefully framed. Video, on the other hand, may or may not be blurry (or properly lit) and contains many frames of strange transitional images. Have you ever paused a video and caught the main character stuck in a strange pose or with a funny face? Simply selecting slices from a video will most certainly yield some of these images.

Context of motion may also be critical. For example, a photo of an athlete may focus inappropriately on a certain body area, and from the context it is clear that the image is questionable. Video, on the other hand, often transitions through these types of shots quite naturally, and in the correct context they are often not questionable at all.
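One hedged way to use temporal context rather than judging frames in isolation: require several suspect frames within a sliding window before blocking, so a single odd transitional frame does not trigger on its own. The threshold, window size, and vote count below are illustrative assumptions, not the model's real parameters.

```javascript
// Sketch: block only when votesToBlock of the last windowSize frame
// scores exceed threshold, tolerating isolated odd frames.
function makeFrameVoter({ threshold = 0.8, windowSize = 5, votesToBlock = 3 } = {}) {
  const recent = [];
  return function shouldBlock(frameScore) {
    recent.push(frameScore >= threshold ? 1 : 0);
    if (recent.length > windowSize) recent.shift();
    const votes = recent.reduce((a, b) => a + b, 0);
    return votes >= votesToBlock;
  };
}
```

This kind of aggregation only papers over the problem, though; it does not fix a model that scores video frames poorly in the first place, which is why retraining comes up below.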

Further, it does not help that a number of amateur inappropriate still photos currently in circulation happen to be extracted from poor-quality home videos ... which bear stylistic similarities to home videos with no inappropriate content at all.

While subtle, there is a real effect here, and the net result is that the current model performs poorly on video. This may be somewhat correctable through small updates to model training, but truly making this work well will likely take a number of iterations.

In Closing

Each of these challenges has had some effort put towards a solution, and some are seeing significant progress. Stay tuned!