As I start to post about the model I'm building for the Wingman Jr. plugin, I'd like to take a bit of an unusual step:
I don't plan to talk about absolute accuracy or test results extensively.*
Why? Accuracy is most useful when comparing models against a standard benchmark. However, while some ad hoc data sets have appeared for comparing NSFW detection models, the sheer vastness of both safe and unsafe image classes is staggering - meaning that accuracy scores can, to a degree, be misleading. This may change as new data sets become available, and it may make sense to report on them - hence the asterisk. Additionally, I may discuss relative improvement in the model against my own dataset - that's important for communicating progress.
However, even the internal dataset may change over time in response to flaws found in the data distribution. For example, I have already had to specialize fairly extensively in certain types of sports. Olympic runners - whom most would not identify as NSFW at all - are visually quite similar to swimsuit models. It is important to get these "hard negatives" correct, and if that means my baseline dataset is not stable - oh well. The result is more useful to you, statistics notwithstanding!
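For the curious, here is a minimal sketch of what folding hard negatives back into a dataset can look like. It assumes a Keras-style binary classifier and a folder of candidate safe images; the file names, paths, input size, and threshold are all hypothetical and not taken from the actual plugin - it's just one plausible way to catch the "Olympic runner" mistakes and keep them in the training set.

```python
# Hypothetical hard-negative mining sketch: run the current classifier over
# known-safe images and keep the ones it wrongly flags, so the next training
# run can learn from those mistakes. Paths and threshold are placeholders.
import shutil
from pathlib import Path

import numpy as np
import tensorflow as tf

MODEL_PATH = "nsfw_classifier.h5"                # hypothetical saved model
CANDIDATES_DIR = Path("candidates/safe_sports")  # e.g. runners, gymnasts
HARD_NEG_DIR = Path("train/safe_hard_negatives")
THRESHOLD = 0.5  # scores above this mean the model thinks "NSFW"

model = tf.keras.models.load_model(MODEL_PATH)
HARD_NEG_DIR.mkdir(parents=True, exist_ok=True)

for img_path in CANDIDATES_DIR.glob("*.jpg"):
    # Load and preprocess one image to the model's expected input size.
    img = tf.keras.utils.load_img(img_path, target_size=(224, 224))
    x = tf.keras.utils.img_to_array(img) / 255.0
    score = float(model.predict(np.expand_dims(x, axis=0), verbose=0)[0][0])

    # Any safe image the model flags is a hard negative: copy it into the
    # training folder so it gets extra attention next time around.
    if score > THRESHOLD:
        shutil.copy(img_path, HARD_NEG_DIR / img_path.name)
```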
This is not to say that I won't post graphs and charts from time to time - just make sure to take them with a grain of salt, as your browsing patterns may be quite different from the images I have collected.
What I do hope to talk about is the journey: techniques, strategies, features, challenges, and the like. Stay tuned!
Finally - I hope that as I work on improving the model, you can laugh with me when it misclassifies the occasional car, map, or other random picture as a bit naughty. (I just had one user report it was blocking a car photo.) Yahoo's open_nsfw has been known to fail on Charizard, and apparently nsfwjs at one time had it out for Jeff Goldblum - perhaps we can add a few more spectacular failures to the list.