The model currently deployed for Wingman Jr. is a relatively stock MobileNetV2 finetune with a bit of magic at the end. The dataset has been growing for quite some time, and as it reached about 200K images I started seriously considering training from scratch.
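For context, a minimal sketch (TensorFlow/Keras) of this kind of setup: a MobileNetV2 backbone with a small classification head on top. The head shown here (pooling + dropout + dense) and the class count are illustrative assumptions; the post doesn't detail the "magic at the end".

```python
import tensorflow as tf

def build_model(num_classes: int = 4, weights=None) -> tf.keras.Model:
    # A finetune would start the backbone from ImageNet weights
    # (weights="imagenet"); weights=None keeps this sketch
    # self-contained and offline.
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights=weights)
    backbone.trainable = True  # finetune the whole backbone, not just the head

    # Hypothetical head: the actual deployed head is not specified.
    model = tf.keras.Sequential([
        backbone,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="categorical_crossentropy")
    return model

model = build_model()
```

The key point for the discussion below is that everything before the head starts from ImageNet-derived weights, which is exactly the assumption that training from scratch revisits.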
Why train from scratch?
First, domain mismatch. Almost all base models are trained on ImageNet. While ImageNet is a fantastic proxy for photo-based populations, the domain of photos is not the same as the domain of internet images. The dataset has a significant portion of non-photo images: classical art, anime, stylized logos and icons, line drawings, and paintings. The closer the training data comes to approximating a target population of internet images, the more potential training from scratch has to improve the model, because the network can spend its capacity on the features that actually matter for that domain.
Second, the ability to try new architectures and variations. Couldn't I try new architectures without doing this? Certainly. But many architectures do not have ImageNet pre-trained weights available, so finetuning is not an option. Additionally, a baseline against a standard network provides a useful backdrop for comparison and helps hint at whether a new architecture is genuinely less capable versus simply not having enough data.
Next time I'd like to discuss some of the mechanics of achieving a successful train from scratch; part 2 is now available!