Thursday, December 30, 2021

Thumbnail Scanning Size

Let's talk about thumbnails here for a bit.

Usually, when you're dealing with an image classification problem, once you can figure out what the thing is, classifying it is easy. But for NSFW content, this turns into a bit more of a grey area. Sure, there are some things that - no matter how small - are clearly NSFW. But for others, as an image either scales down in size or becomes more abstract, it crosses a line where it goes from R to PG or even G. Where exactly one should draw this line is quite subjective.

I just received a feature request from a helpful user, Sneed:

A setting to optionally block small thumbnails or set an arbitrary minimum/maximum would be nice. The thumbnail for the image on this webpage does not get blocked, when you click it to expand, it IS blocked.

I've definitely run into this too. The current logic is around this area of the code and basically has two conditions to skip blocking: if an image is smaller than 36x36 pixels, or if the total image size is less than 1024 bytes. The size in bytes is perhaps less obvious, but the idea is that an offensive image generally needs a certain amount of complexity. If the byte count alone shows that the image is too small, I can skip even decoding the image to find its true dimensions, which helps with performance.
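
To make the ordering of those two checks concrete, here is a rough sketch in Python rather than the extension's own code. The names `SkipThresholds`, `should_skip_scan`, and `read_dimensions` are made up for illustration, and reading "less than 36x36" as "either side under 36 pixels" is my assumption here. Only the 36 and 1024 values come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class SkipThresholds:
    # Defaults mirror the numbers quoted in the post; the field names are hypothetical.
    min_side_px: int = 36
    min_bytes: int = 1024


def should_skip_scan(
    byte_length: int,
    read_dimensions: Callable[[], Tuple[int, int]],
    thresholds: SkipThresholds = SkipThresholds(),
) -> bool:
    """Return True when the image is considered too small to be worth scanning."""
    # Cheap check first: a tiny payload is skipped without decoding anything.
    if byte_length < thresholds.min_bytes:
        return True
    # Only now pay for the decode to learn the true dimensions.
    width, height = read_dimensions()
    # Assumption: "less than 36x36" means either side is under the minimum.
    return width < thresholds.min_side_px or height < thresholds.min_side_px
```

Passing the decode step in as a callable keeps the byte check free of any decoding cost. A user-facing setting like the one requested could, in principle, just supply a different `SkipThresholds`, for example `SkipThresholds(min_side_px=16, min_bytes=0)` to scan much smaller thumbnails.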

This is an area I haven't turned my attention to in quite a while, so I'm going to give it some thought. I'm tracking it over at this issue on GitHub, so feel free to join the conversation there.
