
Sunday, February 28, 2021

3.0 Inbound Soon!

 I'm excited to tell you about the upcoming 3.0 release! I've finished most of the key development on it and will be moving to a further round of testing/tweaking relatively soon.

So why move to 3.0 when the move to 2.0 wasn't that long ago? Let's take a look at the feature list.

Video Peek

I've blogged about this before, but a simple form of video scanning is coming. It will "peek" at a good chunk of the first part of a "basic" video and block it if bad content is detected over a sustained period. "Basic" videos include most simple embedded and banner ad videos and some services such as TikTok. More advanced "streaming" video services need to be handled on a case-by-case basis because, behind the scenes, they each work differently; for now, the only "streaming" site supported is YouTube. There the video is streamed chunk by chunk, and the first section of each chunk is scanned.
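To give a feel for the approach, here is a minimal sketch of the peeking logic - the function and parameter names are hypothetical illustrations, not the actual Wingman Jr. code:

```javascript
// Minimal sketch of the "peek" idea - hypothetical names, not the real implementation.
// scoreFrame(frame) is assumed to return a "bad content" probability from the model.
async function peekVideo(frames, scoreFrame, badThreshold = 0.7, sustainedCount = 5) {
  let consecutiveBad = 0;
  for (const frame of frames) {          // frames sampled from the first chunk of the video
    const score = await scoreFrame(frame);
    consecutiveBad = score >= badThreshold ? consecutiveBad + 1 : 0;
    if (consecutiveBad >= sustainedCount) {
      return true;                       // bad content over a sustained period: block
    }
  }
  return false;                          // nothing sustained in the peeked portion: allow
}
```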
 
When a video is blocked, the behavior depends on when the blocking occurred. If it happens right at the beginning, a placeholder Wingman Jr. icon may appear in the video stream. If it happens midstream, it is not easy to insert an image, so the video will simply stop playing; there is also a visual indicator that will appear on the Wingman Jr. icon - to be discussed a bit later.

Note this feature is still quite new and may need some further refinement, but I found the current form useful enough that I wanted to share it with you. There will be an option to turn video scanning on and off apart from image scanning. Don't hesitate to send in feedback using the link in the popup options after you've used it - there are a number of constraints on what I'm able to do but I'd like to hear about your experiences to make it as good as possible.

New Model

The model in 2.0.1, SQRXR 62, has been in use for a long time. I've done many experiments to see if I could create a worthy successor, but improvements had long been marginal enough that I did not wish to change the current experience.

However, that has now changed with the advent of SQRXR 112. It builds on a slightly different base model and achieves better results. I'm still working on the final cutoff parameters to use for the release but the bulk of the work is done. If you're a machine learning nerd, you can use the model in your own projects - check it out here.

Silent Mode

The human psyche is such that we are curious creatures - it is human nature to seek out the "forbidden fruit". Currently the browsing experience accentuates where the blocked images have been; even though they are not visible, this can promote a dark pattern where one wants to click on the image slot to see what was there.

I've long had an issue out to improve this with a "silent mode" where blocked images are instead transparently replaced. I gave it a whirl and so far I'm quite liking the results. The actual implementation places a small watermark with "W" and the image score in the center of the replaced image, so it is discernible if you look closely. However, in a wall of images it does not stand out heavily and cognitively I've found it to be significantly less jarring.
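If you're curious how a replacement like that can be drawn, here is a rough sketch using an ordinary canvas - the sizes and colors are illustrative, not the exact Wingman Jr. drawing code:

```javascript
// Sketch of a "silent mode" replacement image: a neutral block with a small
// "W" + score watermark in the center. Illustrative only.
function makeSilentReplacement(width, height, score) {
  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext('2d');
  ctx.fillStyle = '#f0f0f0';                 // quiet, unobtrusive background
  ctx.fillRect(0, 0, width, height);
  ctx.fillStyle = 'rgba(0, 0, 0, 0.25)';     // faint watermark so it doesn't stand out
  ctx.font = '16px sans-serif';
  ctx.textAlign = 'center';
  ctx.textBaseline = 'middle';
  ctx.fillText(`W ${score.toFixed(2)}`, width / 2, height / 2);
  return canvas.toDataURL('image/png');      // swap this in for the blocked image's src
}
```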

Scanning Progress Feedback

Have you ever wondered in the past: did the addon get stuck? Or is this image just taking a long time or failing to load?
 
Now you'll be able to have more clarity on that. A simple progress bar has been embedded in the main browser action icon, so you'll be able to track how many images are queued up to be scanned. Additionally, it provides a video scanning indicator in the form of a tiny "v" in the bottom right; a blocked video will also cause this area to light up with a different color.
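For the technically curious, an indicator like this can be drawn onto the toolbar icon with a small canvas and browser.browserAction.setIcon. The sketch below shows the general shape of it; the proportions and colors are illustrative, not the exact artwork:

```javascript
// Rough sketch of drawing a queue-progress bar onto the browser action icon.
function updateActionIcon(baseIconImage, queuedCount, maxQueue, videoBlocked) {
  const size = 32;
  const canvas = document.createElement('canvas');
  canvas.width = size;
  canvas.height = size;
  const ctx = canvas.getContext('2d');
  ctx.drawImage(baseIconImage, 0, 0, size, size);          // normal Wingman Jr. icon
  const progress = Math.min(queuedCount / maxQueue, 1);
  ctx.fillStyle = '#4caf50';
  ctx.fillRect(0, size - 4, size * progress, 4);           // progress bar along the bottom
  ctx.fillStyle = videoBlocked ? '#e53935' : '#888';       // "v" lights up on a blocked video
  ctx.font = '10px sans-serif';
  ctx.fillText('v', size - 8, size - 6);                   // tiny video indicator, bottom right
  browser.browserAction.setIcon({
    imageData: ctx.getImageData(0, 0, size, size)
  });
}
```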

Advanced Option - Tensorflow.js Backend Selection

Tensorflow.js is the library I use to perform the AI model predictions. It has more than one "backend" that can be used to perform the calculations. For many users, the WebGL backend is the best default choice. However, one of my users surprised me by sharing that the new WASM backend was faster for them. On my computer it is about 10x slower than the WebGL backend, so this was unexpected. They requested a feature to let the user choose the backend - that will be available in this upcoming release as well.
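The backend switch itself comes down to a couple of Tensorflow.js calls; here is a sketch (how the user's choice is stored and passed in here is my own assumption for illustration):

```javascript
import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-wasm';       // registers the 'wasm' backend with tfjs

// Apply the backend the user picked; fall back to WebGL if it can't be activated.
async function applyBackendChoice(choice /* 'webgl' or 'wasm' */) {
  const ok = await tf.setBackend(choice);
  if (!ok) {
    await tf.setBackend('webgl');              // fall back to the usual default
  }
  await tf.ready();
  console.log('Tensorflow.js backend in use:', tf.getBackend());
}
```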

When Will It Be Ready?

As noted, development is mostly wrapped up - from here it mostly depends on what gremlins are found during testing. Stay tuned!

Sunday, December 13, 2020

Firefox 83 Fix - Get 2.0!

(Update: 2.0 is out!)

Fix in Progress

There is a fix in progress for Wingman Jr. It changes quite a bit of code, but it should be out in a bit. I have a fully working solution and am testing it. If you upgrade, you will likely see a note about hidden tabs; this is expected and is due to the nature of the solution.

What Went Wrong

First, it's important to know that to make the AI work, the computer does a lot of math for each image it scans. A typical computer has more than one type of chip that knows how to do math. There's the CPU, which handles general-purpose math. Then - on most modern computers - there's the GPU, which handles large numbers of parallel calculations of the same kind; this works great for graphics and video as well as for AI.

Having a GPU - and indeed a fast GPU - can make a big difference. In some cases it scans images 10x faster than the CPU, or even more. The AI library I use to help run the calculations, Tensorflow.js, is careful to ensure that it goes as fast as possible.

So what happened with Firefox 83?

Prior to Firefox 83, there was a bug in Firefox in certain cases. Basically, there is a special call you can make that says "give me access to the GPU, but if there's a big performance problem because something about loading the GPU isn't quite right, don't give me access to the GPU at all - just let me know and fail". For the most part, this call worked correctly in Firefox 82 and prior. However, it didn't work in all cases: sometimes it would grant access to the GPU even when it was taking a performance hit.
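If you'd like to see what that kind of call looks like, it is essentially a WebGL context request with the failIfMajorPerformanceCaveat flag set. Here is a sketch - this is my reading of the mechanism being described, not a quote from the Tensorflow.js source:

```javascript
// Request a WebGL context that refuses to start if the browser knows
// there is a major performance caveat with GPU access.
const canvas = document.createElement('canvas');
const gl = canvas.getContext('webgl2', {
  failIfMajorPerformanceCaveat: true   // "don't give me the GPU at all if it would be slow"
});
if (gl === null) {
  console.log('GPU context refused - falling back to a CPU-based backend');
}
```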

In Firefox 83, the Mozilla team fixed the glitch. So wouldn't this make things better?

Not quite. Basically, the way Wingman was trying to load the GPU - from the addon's background - wasn't fully supported in some cases. So when Tensorflow.js tried to get access to the GPU this way, it would now correctly fail and say, "I won't give you access to the GPU because there is a bit of a performance hit". This meant that Tensorflow.js would fall back to doing all the calculations on the CPU, even though the performance-hit GPU would still have been much faster.

If you're one of the unfortunate users, like myself, who encountered this, it made the browsing experience basically unusable. The whole browser would seem to lock up on loading pages and things would take forever to load.

Partial Mitigation

This spring, Tensorflow.js also released another method of running AI models: the "WASM backend". It still used the CPU, but it used some advanced tricks leveraging capabilities essentially all modern CPUs have, and it made the CPU case much faster. So much faster, in fact, that in some cases it was as good as the GPU or maybe even a tiny bit better. (See here for Google's blog post on the matter.)

I added this as a fallback method for calculation, and it helped some users. But for some users (like myself), the performance with this method is still unbearably slow.

Options

One option I pursued for fixing this was to have Tensorflow.js use the GPU even if Firefox noted performance issues. This loading option is not exposed by Tensorflow.js, but they were kind enough to consider adding it.

While this might work for some, it might end up being the wrong choice for others. If it were the wrong choice, the system should by all rights fall back to the "WASM backend", but it would not if we forced it to use the GPU. The right thing to do, then, would likely be to expose an option in Wingman to pick which method to use, but that makes for a potentially poor default experience.

As the excellent team at Mozilla looked into my bug report and the true nature of the bug unfolded, it became clearer that 1) a real bug had been fixed and 2) existing performance may already have been suboptimal! Additionally, there was a critical realization: it wasn't that the GPU couldn't be loaded quickly - it's that the addon background setting wasn't programmed quite correctly to allow it to do so. This meant that if you could load the GPU in a different setting, it might work as expected - for example, in a "normal" web page setting. So how could this be accomplished?

New Architecture

In the past, there was more or less one place where code would run: the background of the addon. This approach is simple, light, and generally works great. But now we needed to do processing in a "normal" web page - and an addon can create and load web pages, too.

So the solution was to split the code into two parts: the "background" and a "processor" running on a normal page. The two parts need to talk back and forth in deep conversation in order to work. The "background" says things like "here's a request and the data flowing in for it" and the "processor" says things like "here are the scan results you asked for". The addon ecosystem makes this straightforward to accomplish, but it's a lot of plumbing and a large rip up of the existing code.
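For the curious, that conversation can ride on standard WebExtension port messaging. Here is a sketch; the message types and helper functions (scanImage, resolvePendingRequest) are hypothetical stand-ins, not the actual Wingman Jr. protocol:

```javascript
// processor.js - runs in the (hidden) processor page
const port = browser.runtime.connect({ name: 'processor' });
port.onMessage.addListener(async (msg) => {
  if (msg.type === 'scanRequest') {
    const result = await scanImage(msg.imageData);        // run the model on the image
    port.postMessage({ type: 'scanResult', requestId: msg.requestId, result });
  }
});

// background.js - forwards requests and waits for results
browser.runtime.onConnect.addListener((processorPort) => {
  if (processorPort.name !== 'processor') return;
  processorPort.onMessage.addListener((msg) => {
    if (msg.type === 'scanResult') {
      resolvePendingRequest(msg.requestId, msg.result);   // hypothetical bookkeeping helper
    }
  });
  // later: processorPort.postMessage({ type: 'scanRequest', requestId, imageData });
});
```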

 I've finished the rewrite and am ensuring the changes are stable. While there is some overhead in this approach (due to the two sides being in conversation), there are also some advantages. One of those is that it is much easier to load more than one processor if needed. So far I have not yet been able to see a performance gain out of this, but in the future I may be able to use the GPU and WASM backends together to see a bit of a performance boost.

It is probably apparent now why there might be a warning about hidden tabs. Wingman creates the "processor" tabs as web pages so that they work properly, but they're not helpful for the user to see, so it immediately hides them. That's all the tab hiding Wingman does, but it still requires the "tabHide" permission and will prompt a new message after the upgrade.
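For reference, creating and hiding a processor tab boils down to a couple of standard WebExtension calls; browser.tabs.hide() is what drives the "tabHide" permission. This is a sketch only, and "processor.html" is a placeholder name:

```javascript
// Create an extension-packaged page as a tab, then hide it from the tab strip.
async function openProcessorTab() {
  const tab = await browser.tabs.create({
    url: browser.runtime.getURL('processor.html'),  // placeholder page name
    active: false                                    // don't steal focus from the user
  });
  await browser.tabs.hide(tab.id);                   // requires the "tabHide" permission
  return tab.id;
}
```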

Final Notes and 2.0

This is a big change in the overall architecture - large enough that I plan to change the version to 2.0 to reflect what has happened.

I may try to squeeze in a couple other features or fixes, but stay tuned for a new release soon.



Saturday, March 28, 2020

The Quest for New Hardware, pt. 3 - conclusion

Well, I was able to put to rest the main remaining issue from last time: the fan control needing to trigger off the temperature of the GPU. I spent quite a bit of time on this, but at the end of the day the primary thing seems to have been that I needed both IPMI updated and the NVidia driver loaded for the "Optimal" fan setting to work properly - and voila, the fans now stabilize the GPU temperature at 60C!

Now I simply need to work on tuning the running of the model, but that's a task for another day. While there have been frustrating moments, I've enjoyed learning about servers and IPMI management, and I've enjoyed getting my feet wet over at the Homelab subreddit - the folks there have been welcoming and knowledgeable!

Tuesday, March 24, 2020

The Quest for New Hardware, pt. 2

In the last post, I talked about the different paths one could take to do powerful machine learning on the cheap. I opted to buy an older GPU server, the SuperMicro 2027GR-TRF. I now have all the pieces and have it more or less working, so I thought I'd share some things I ran into along the way, along with some embarrassing stories.

Here was what I ended up ordering:
  1. The SuperMicro 2027GR-TRF. I picked a used box from Garland Computers off Ebay for $470+$75 shipping. The specs on it were:
    • 2x E5-2670 2.6 GHz 8-core CPU's
    • 64GB DDR3 RAM
    • 4 GPU slots (but K80's are dual-slot, so 2x K80 slots)
    • 10x RAID slots
  2. A Western Digital Red 500GB SSD drive from Newegg for $90. If you're not familiar, the "blue" line is consumer-grade, and the "red" line is a bit more enterprise-grade. I may be hitting it pretty hard, and I've heard it plays better with RAID, which may be important in the future.
  3. An open box K80 from MET servers on Ebay for $325+$25 shipping.
  4. An old used server console for about $75+$30 shipping on Ebay.
  5. As it turns out, the K80 did not come with the extra converter to 2 PCI, so I ended up buying a generic cable.
  6. A PS/2 to USB adapter.
  7.  A Kill-a-watt to monitor power usage from my local Menards.
So, all in all right around $1120, not including any server rack solution. Not bad!

Here are some things I learned - if you are a server administrator please enjoy a good laugh!
  1. Power buttons are a bit different on servers. Basically, when you plug in the server, its fans always spin, so I thought it might be on. The main manual never had a picture of the full box showing where the parts were, so it took me longer than I care to admit to realize that the power button was on what I thought was a mounting bracket! I kept seeing references to a control center with status lights in the manual. Finally, I saw a ribbon cable underneath some packaging on the side and realized that one of the "mounting brackets" actually had buttons - it wasn't shipped attached to the main device. (See this angled picture from a related server to get an idea of what I'm talking about.)
  2. I didn't have my SSD seated quite right in the cartridge, so the cartridge would go all the way in but never truly connect. It took a bit to figure out why it wasn't showing up in the BIOS.
  3. I've installed NVidia drivers on more than one occasion, but ran into a new issue. In the past, I've generally been using the box beforehand and have lots of the standard tools already set up. This time I was going from a fresh install. I was installing via the apt package route. However, I failed to notice that the package - while appearing to install successfully - had a warning message about not being able to build the kernel module. I had to go get the kernel headers package manually and then it worked just fine. For some reason this step doesn't seem to make it into many of the installation guides out there.
  4.  The SuperMicro 2027GR-TRF product page clearly states there is a PS/2 port for keyboard/mouse. I wasn't sure if it was one for each or just one, but at any rate I can assure you it does not have one externally - only USB. And neither the manual nor the motherboard manual seem to make mention of one. So, I needed to buy the extra adapter to make it work. Fortunately, this was cheap and I already needed to wait on the power adapter for the K80.
  5. I didn't have detailed instructions for the K80 installation specifically. The K80 is a dual-slot card, so I was a bit unsure how to handle the general placement and/or what other hardware I might need. As noted, you need the extra power adapter. I ended up attaching it to the bottom slot to start with. However, I ran into overheating while doing some minor stress testing. The K80 shows up as two cards in nvidia-smi, and the first card would always end up getting quite hot, encountering thermal issues at about 92C and turning off. So I tried switching the card to the top slot. No difference. The K80 is passively cooled, and several forum posts warn that trying to use it in anything but an official NVidia-certified integrator's server is likely to cause problems - part of the reason I went with a GPU server in the first place once I landed on the K80. Unfortunately, NVidia's official integrator list does NOT go back as far as the K80, so I had to rely on SuperMicro's word that they support it. This symptom is identical to not having a proper system to support the K80 (see also this important explanation here). Fortunately, however, the server itself has good power and cooling capabilities - it only seems to be the closed-loop monitoring of the cooling that was having issues. To that end, I decided to see if I could manually increase the fan speeds, as they seemed to be running at a rather low speed all the time for the GPU. This box's BIOS did not expose the fan controls, but the IPMI management did, so I was able to set it that way. Unfortunately the options were "optimal for all fans" or "turn all fans to 11", so it now sounds rather like a jet engine. I actually went and grabbed hearing protection after a while. But at least it is a cool jet engine. Now, under reasonable (but not full) load, it was stabilizing around 51C for the hotter of the two cards.
So I learned a few things, but it was awesome to see it all up and running. I can now run batch sizes of at least 256 images once I spread across the cards for a 224x224 MobileNetV2, so I have definitely hit my target. However, I haven't yet gotten a good chance to try training from scratch and with current world circumstances around COVID-19, it is unfortunately a bit lower priority.

Friday, March 13, 2020

The Quest for New Hardware, pt. 1


Recently I've been considering what types of hardware setups might be the next natural step for training the model. In the past I've used a Jetson TX1 devkit, which is a rather modest piece of hardware but allows me to do a bit of heavier finetuning. I have this running on its own in a back bedroom, and it's helpful to have a box that is dedicated to that rather than trying to share time with my laptop or a desktop, for example.

However, the dataset I train on has grown considerably, now over 200K images. This is large enough that I believe it may be reasonable to consider training from scratch rather than finetuning. Training from scratch, however, requires much more computing power - and ideally could run in significantly larger batch sizes.

In fact, one of the key things I wanted to be able to do was run batch sizes on par with training "modern" CNN's from scratch using ImageNet. This varies widely, but let's say ~100 images or so per batch as a reasonable target. Generally speaking, this means more GPU RAM.

However, I'm doing my best to keep to a relatively tight budget - ideally something around $1000. This is lower than many new setups typically run, even ones aimed at the budget-conscious consumer - see for example this blog post with this nice, new box settling in at a reasonable $1700. This necessitated some deep introspection about what I was truly trying to achieve, and ultimately I settled on roughly this set of priorities:
  1. GPU RAM
  2. Speed
  3. Future expansion
My thought was that by prioritizing RAM, I could train almost any model - but perhaps at a slower speed.

With these priorities in mind, I considered several possibilities for a new or second-hand setup. With a different set of constraints than many considering a machine learning rig, I was pleasantly surprised by the sheer diversity of possible solutions.
  1. Purchase a Jetson AGX Xavier devkit. The price on these seems to have been reducing, and now the devkit just got upgraded to 32GB of RAM shared between GPU and CPU.
  2. Build a desktop and add higher-quality consumer-level cards like the popular 1080 TI or perhaps even the newer 2080 to it.
  3. Build a specialized rig and purchase many second-hand crypto mining rig cards - sort of a quantity over quality approach.
  4. Look to older server solutions and see what was available.
After quite some time of searching, I settled on option #4. I discovered that I could find the now relatively old NVidia K80 cards in GPU servers for an acceptable price, providing a surprisingly strong GPU RAM/$ ratio at a satisfactory level of speed. I've never really looked into the world of servers before, and it was an enjoyable journey - but a bit of an overwhelming one at first. I joined the subreddit /r/homelab, a friendly place to discuss running servers at home. (I'd like especially to thank merkuron for their help!)

Ultimately I settled on an older GPU server by SuperMicro, the 2027GR-TRF. I currently plan to add one K80 to it, but it has support for up to two if I wish to expand in the future. I have recently been working on acquiring the full solution in various pieces, primarily through Ebay. I have gotten most of the parts but need a few more before I can put it all together, so stay tuned for more updates!

Friday, January 24, 2020

Model Sensitivity - Part Three

In the first post, I discussed different ideas of what sensitivity could mean. In the second post, I discussed how tradeoffs in sensitivity were used as the basis for the "zones" software feature, allowing for a "trusted", "neutral", and "untrusted" zone.

In this final post on the initial introduction of sensitivity, I'd like to discuss one more feature: automatic zone selection.

The idea is simple: if the plugin sees that a number of images have recently been blocked, it is probably a good idea to move to a less trusted zone, because it likely means we're letting too many bad ones through; similarly, if we haven't blocked many images in a while, we are likely in a more trusted zone and would be better off not blocking extra good ones.

So for example, suppose you are browsing sites you know and love that don't have any questionable content. However, suppose you then get linked off to a somewhat more questionable site - but not one objectionable enough that it makes sense to leave. In this situation, it would be fabulous if the plugin just started being a bit more picky. With automatic mode, the switch can happen automatically.

For the stats-minded folks out there, internally the number of predicted positives is multiplied by the precision to get an estimator of the number of true positives. The estimated true positives - as a percentage - controls which zone is selected.
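Here is a rough sketch of that calculation; the precision value, rate cutoffs, and zone names used for the decision are made up for illustration:

```javascript
// Estimate true positives from predicted positives and the zone's known precision,
// then use the estimated rate of bad images to pick a zone. Numbers are illustrative.
function selectZone(predictedPositives, totalScanned, precision) {
  const estimatedTruePositives = predictedPositives * precision;
  const estimatedRate = totalScanned > 0 ? estimatedTruePositives / totalScanned : 0;
  if (estimatedRate > 0.10) return 'untrusted';  // lots of genuinely bad images: be strict
  if (estimatedRate > 0.02) return 'neutral';
  return 'trusted';                              // few bad images lately: relax
}
```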

For the first release I plan to make automatic mode the default, but the different zones can be selected as well - which will turn off automatic mode.

And that's it for the introduction on sensitivity - I'm quite excited about getting this feature out! I'm looking forward to getting feedback on how well it is working for everyone.

Wednesday, January 22, 2020

Model Sensitivity - Part Two

In part one, I talked about different ideas of what sensitivity could mean and the planned approach for the Wingman Jr. model. Today I'd like to talk about the software features around sensitivity that I'm working on for the 1.0 release.

As discussed last time, a binary classifier is simply a program that says yes or no given some input. In our case, the input is a picture and the model says "pass" or "block" as the two possible outcomes. A typical way of grading this is to consider the tradeoffs between how many images are correctly blocked vs. incorrectly blocked at different thresholds. This allows us to pick tradeoff points for different types of model behavior.

For Wingman Jr., I've picked three tradeoff points that correspond to three ways the user might wish to use the plugin, based on what "zone" they are browsing in.
  1. The user trusts the zone they are browsing in - pick a tradeoff that rarely falsely flags an image as bad, but still catches well over half the bad images as a safety net.
  2. The user does not trust the zone they are browsing in at all - pick a tradeoff that catches almost all bad images, but falsely flags a number of images as well.
  3. The user has a neutral opinion about the visual safety of the zone they are browsing - pick a tradeoff that balances catching most of the bad images but also flags some false positives.
Being able to choose the "trusted zone" is great for situations where the user sees a number of times that the model says something is bad when it is not. While bad images should get blocked, it is easy to get frustrated if a bunch of images are getting blocked for no good reason.

Similarly, if you know you are going to visit someplace that is a bit less safe it is a good tradeoff to block almost all the bad ones at the cost of some good ones.
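In code, the zones essentially become different score cutoffs - something like the sketch below, where the threshold numbers are purely illustrative, not the values I actually plan to ship:

```javascript
// Each zone maps to a cutoff on the model's "not safe" score. Illustrative numbers only.
const ZONE_THRESHOLDS = {
  trusted: 0.90,    // block only when the model is very sure (few false positives)
  neutral: 0.70,    // balanced tradeoff
  untrusted: 0.40   // block aggressively, accepting more false positives
};

function shouldBlock(score, zone) {
  return score >= ZONE_THRESHOLDS[zone];
}
```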

I'm excited to roll out this feature but I have one more thing included that I'd like to discuss in the next post.  The conclusion of the series is now available here: Model Sensitivity - Part Three

Saturday, January 18, 2020

Model Sensitivity - Part One

I got my first feature request today for the Wingman Jr. plugin - thank you, anonymous user! A user was asking for a way to change the sensitivity of detection. As it turns out, I've recently been looking at some related things from the model prediction side, so it is good timing. I'd like to talk today a bit about how machine learning models view sensitivity vs. how humans view sensitivity. (And for you stats nerds out there - set aside the definition of "sensitivity" for just a bit.)

Let's start with how humans typically view sensitivity. Suppose a human is asked to grade how mature a movie is, perhaps using something similar to the MPAA ratings of G, PG, PG-13, R. We generally expect that a human uses more of a gradual rating, where - for example - increasingly violent content might cause the rating to go from G to R. We would expect a gradual, smooth transition as the objectionable content increases. With this in place, we can then say things like "I'd only like to watch movies that are up to PG-13". So here sensitivity might mean something like "a gradual scale that allows for a cutoff at a certain level of objectionability."

However, for machines this isn't always the case. Oftentimes they take a more probabilistic approach. For example, a quite normal approach is to build a "binary classifier" (a program that just says "yes" or "no" for some input) and look at the percentages of "yes" and "no" that come out. So you might be able to say about a certain model: "this catches 90% of bad images, but blocks 10% of good ones by accident." While it's not ideal that any should fail, this makes the tradeoffs easy to reason about from a statistical perspective. So here sensitivity may be roughly defined as "a cutoff point that guarantees a certain statistical balance between the yes and no classes". (Strictly speaking, the statistics definition is solely concerned with the ratio of true positives to the sum of true positives and false negatives, but I would argue most humans have a much squishier view of the meaning.)
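For concreteness, here is how those two numbers fall out of a confusion matrix - a small illustrative sketch:

```javascript
// From confusion-matrix counts, compute the tradeoff described above:
// "catches 90% of bad images, blocks 10% of good ones by accident".
function classifierTradeoff({ truePositives, falseNegatives, falsePositives, trueNegatives }) {
  const sensitivity = truePositives / (truePositives + falseNegatives);          // share of bad images caught
  const falsePositiveRate = falsePositives / (falsePositives + trueNegatives);   // good images blocked by accident
  return { sensitivity, falsePositiveRate };
}

// classifierTradeoff({ truePositives: 90, falseNegatives: 10, falsePositives: 10, trueNegatives: 90 })
// -> { sensitivity: 0.9, falsePositiveRate: 0.1 }
```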

Unfortunately, this statistical view - while useful - often leaves the human judge quite dissatisfied. Grading does not fail gracefully. In our hypothetical example, the human might ask something like "Why did this movie with an intense scene of gore get rated G instead of R? If it had at least been PG-13 it would be somewhat understandable, but this is terrible!" This type of failure - the lack of understanding and "smoothness" in grading successes and failures - is one of the key challenges facing real-world use of AI today.

For Wingman Jr., I'm planning to take an approach that hopefully captures some of the spirit of both notions of sensitivity.

A bit of background first. Currently the model is built from a selection of graded images falling into one of four categories: safe, questionable, racy, or explicit. The current strategy has been to do a fairly standard image classifier approach with these four classes, but with weighting to penalize confusions between e.g. explicit and safe more heavily than say between questionable and safe.

So first, with respect to the statistical aspect of sensitivity implementation: I'm working on distilling the prediction of the four classes into simply two: safe and not safe, as a numeric score. This allows for analysis with traditional means like the AUC ROC.

With respect to the common sense aspect of sensitivity: while I can't make the grading perfectly smooth, I can instruct the tradeoff to be sensitive to the severity of the image. So, more "explicit" images will be caught than "questionable" images at a given cutoff threshold because I've purposely over-represented "explicit" images.
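One way to picture the distillation is a severity-weighted combination of the class outputs. The sketch below is an illustration of the idea only - the weights and the exact formula are not the ones I actually use:

```javascript
// Collapse the four class probabilities into one "not safe" score,
// weighting more severe classes more heavily. Weights are illustrative only.
function notSafeScore({ questionable, racy, explicit }) {
  return 0.3 * questionable + 0.7 * racy + 1.0 * explicit;  // severity-weighted sum
}

// A single cutoff on this score then gives the binary safe / not-safe decision,
// and "explicit" images clear the cutoff more easily than "questionable" ones.
```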

I have some initial results that seem promising, but I have some work left to do before it's ready for prime time - a useful piece of software is more than just a machine learning model.

Stay tuned!

The next installment of this series is available at Model Sensitivity - Part Two

Model Accuracy

As I start to post about the model I'm building for the Wingman Jr. plugin, I'd like to take a bit of an unusual step:

I don't plan to talk about absolute accuracy or test results extensively.*

Why? Accuracy is most helpful when comparing models against a standard benchmark. However, while some ad hoc data sets have appeared for comparing NSFW detection models, the sheer vastness of both the safe and unsafe image classes is staggering - meaning that accuracy scores can, to a degree, be misleading. This may change as new data sets become available, and it may make sense to report on them - hence the asterisk. Additionally, I may discuss relative improvement of the model against my own dataset - that's important for communicating progress.

However, even the internal dataset may change over time in response to flaws found in the data distribution. For example, I have already had to specialize in certain types of sports fairly extensively. Olympic runners - whom most would not identify as NSFW at all - are visually quite similar to swimsuit models. It is important to get these "hard negatives" correct, and if that means my baseline dataset is not stable - oh well. The result is more useful to you, statistics notwithstanding!

This is not to say that I won't post graphs and charts from time to time - just make sure you take them with a grain of salt as your browsing patterns may be quite different than my collected images.

What I do hope to talk about is the journey: techniques, strategies, features, challenges, and the like. Stay tuned!

Finally - I hope that as I work on improving the model, you can laugh with me as it misclassifies the occasional car, map, or other random picture as a bit naughty. (I just had one user report it was blocking a car photo.) Yahoo's open_nsfw has been known to fail on Charizard, and apparently nsfwjs at one time had it out for Jeff Goldblum - perhaps we can add a few more spectacular failures to the list.