Showing posts with label hardware.

Saturday, January 15, 2022

Welcome, Linux Users!

Recently there was a bump in new users, and the Firefox stats seem to indicate that many of you use Linux. Welcome, friends!

Now, I've got a question for you. I don't use desktop Linux as my daily driver, so I need your help understanding how performance looks so far. This addon relies on some level of hardware acceleration, which hasn't always had the best support on Linux. If you've got a second, I'd love to hear from you in this poll on Google Forms - can you please help me out?

One last note: if the default WebGL backend isn't working well on your machine, it may be worth trying the WASM backend, which you can select in the addon's options.

Hope you enjoy Wingman!


Update: Poll results from 4 users - glad to hear it's working well for you!

Saturday, March 28, 2020

The Quest for New Hardware, pt. 3 - conclusion

Well, I was able to put to rest the main remaining issue from last time: the fan control needed to respond to the temperature of the GPU. I spent quite a bit of time on this, but at the end of the day the key seems to have been having both the IPMI firmware updated and the NVidia driver loaded. With both in place, the "Optimal" fan setting works properly and voila - the fans now stabilize the GPU temperature at 60C!

Now I simply need to work on tuning how the model runs, but that's a task for another day. While there were frustrating moments, I've enjoyed learning about servers and IPMI management, and I've enjoyed getting my feet wet over at the Homelab subreddit - the folks there have been welcoming and knowledgeable!

Tuesday, March 24, 2020

The Quest for New Hardware, pt. 2

In the last post, I talked about the different paths one could take to do powerful machine learning on the cheap. I opted to buy an older GPU server, the SuperMicro 2027GR-TRF. I now have all the pieces and have it more or less working, so I thought I'd share some things I ran into along the way, including a few embarrassing stories.

Here was what I ended up ordering:
  1. The SuperMicro 2027GR-TRF. I picked up a used box from Garland Computers on Ebay for $470+$75 shipping. The specs were:
    • 2x E5-2670 2.6 GHz 8-core CPUs
    • 64GB DDR3 RAM
    • 4 GPU slots (but K80s are dual-slot, so 2x K80 slots)
    • 10x RAID slots
  2. A Western Digital Red 500GB SSD from Newegg for $90. If you're not familiar, WD's "blue" line is consumer-grade, while the "red" line is a bit more enterprise-grade. I may be hitting it pretty hard, and I've heard it plays better with RAID, which may be important in the future.
  3. An open box K80 from MET servers on Ebay for $325+$25 shipping.
  4. An old used server console for about $75+$30 shipping on Ebay.
  5. As it turns out, the K80 did not come with the extra power adapter cable for two PCIe power connectors, so I ended up buying a generic one.
  6. A PS/2 to USB adapter.
  7. A Kill-a-watt from my local Menards to monitor power usage.
So, all in all, right around $1120, not including any server rack solution. Not bad!
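A quick tally of the listed prices shows where the ~$1120 comes from. This is a minimal sketch using only the figures stated above; the power cable, PS/2 adapter and Kill-a-watt had no listed price, so the script simply reports what's left over for them out of the approximate total.

```python
# Prices as listed in the parts list above (USD, item + shipping).
# Items without a stated price (power cable, PS/2 adapter, Kill-a-watt)
# are deliberately left out.
priced_items = {
    "SuperMicro 2027GR-TRF (used)": 470 + 75,
    "WD Red 500GB SSD": 90,
    "K80 (open box)": 325 + 25,
    "Server console (used, approx.)": 75 + 30,
}

subtotal = sum(priced_items.values())
approx_total = 1120  # the post's "right around" figure
unpriced_remainder = approx_total - subtotal

for name, cost in priced_items.items():
    print(f"{name:34s} ${cost}")
print(f"{'Priced subtotal':34s} ${subtotal}")
print(f"{'Left for cable/adapter/meter':34s} ${unpriced_remainder}")
```

The priced items come to $1090, which leaves roughly $30 for the three smaller purchases.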

Here are some things I learned - if you are a server administrator please enjoy a good laugh!
  1. Power buttons are a bit different on servers. When you plug in the server, its fans always run, so I thought it might already be on. The main manual never had a picture of the full box showing where the parts were, but I kept seeing references to a control center with status lights. Finally, I noticed a ribbon cable underneath some packaging on the side and saw that one of the "mounting brackets" actually had buttons; it hadn't been shipped attached to the main chassis. All told, it took me longer than I care to admit to realize that the power button was on what I thought was a mounting bracket! (See this angled picture from a related server to get an idea of what I'm talking about.)
  2. I hadn't seated my SSD quite right in its cartridge, so the cartridge would slide all the way in but never actually connect. It took a bit to figure out why the drive wasn't showing up in the BIOS.
  3. I've installed NVidia drivers on more than one occasion, but this time I ran into a new issue. In the past, I'd generally already been using the box and had lots of the standard tools set up; this time I was starting from a fresh install. I installed via the apt package route, but failed to notice that the package, while appearing to install successfully, printed a warning about not being able to build the kernel module. I had to install the kernel headers package manually, and then it worked just fine. For some reason, this step doesn't seem to make it into many of the installation guides out there.
  4. The SuperMicro 2027GR-TRF product page clearly states there is a PS/2 port for keyboard/mouse. I wasn't sure whether that meant one for each or just one, but at any rate I can assure you it does not have one externally - only USB. Neither the system manual nor the motherboard manual seems to mention one either. So I needed to buy the extra adapter to make it work. Fortunately, this was cheap, and I already had to wait on the power adapter for the K80.
  5. I didn't have detailed instructions for the K80 installation specifically. The K80 is a dual-slot card, so I was a bit unsure how to handle placement and what other hardware I might need. As noted, you need the extra power adapter. I started by installing it in the bottom slot. However, I ran into overheating during some minor stress testing. The K80 shows up as two cards in nvidia-smi, and the first card would always end up getting quite hot, hitting thermal issues at about 92C and shutting off. So I tried moving the card to the top slot. No difference. The K80 is passively cooled, and several forum posts warn that using it in anything but an official NVidia-certified integrator's server is likely to cause problems - part of the reason I went with a GPU server in the first place once I landed on the K80. Unfortunately, NVidia's official integrator list does NOT go back as far as the K80, so I had to rely on SuperMicro's word that they support it, and this symptom is identical to not having a proper system to support the K80 (see also this important explanation here). Fortunately, the server itself has good power and cooling capabilities - it only seemed to be the closed-loop monitoring of the cooling that was having issues. To that end, I decided to see if I could manually increase the fan speeds, as they seemed to be running at a rather low speed all the time for the GPU. This box's BIOS did not expose the fan controls, but the IPMI management did, so I was able to set them that way. Unfortunately, the options were "optimal for all fans" or "turn all fans to 11", so it now sounds rather like a jet engine. I actually went and grabbed hearing protection after a while. But at least it is a cool jet engine: under reasonable (but not full) load, the hotter of the two cards was stabilizing around 51C.
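The driver and overheating issues in items 3 and 5 suggest a quick post-install check. Below is a minimal Python sketch, assuming a Debian/Ubuntu-style layout where the kernel headers package provides `/lib/modules/<release>/build`, and that `nvidia-smi` is on the PATH. It checks that headers for the running kernel exist, that the `nvidia` module is loaded, and reads per-GPU temperatures. The function names and the 85C warning threshold are hypothetical choices for illustration, not part of any tool.

```python
"""Post-install sanity checks for an NVidia GPU box (illustrative sketch).

Assumptions: linux-headers-<release> provides /lib/modules/<release>/build,
and nvidia-smi is installed and on the PATH.
"""
import os
import subprocess
from pathlib import Path
from typing import Dict, List

# Hypothetical warning margin below the ~92C at which the post's K80 shut off.
WARN_C = 85


def headers_installed(release: str, modules_root: str = "/lib/modules") -> bool:
    """True if a kernel build tree (needed to compile the driver module) exists."""
    return (Path(modules_root) / release / "build").is_dir()


def nvidia_module_loaded(proc_modules_text: str) -> bool:
    """True if a module named exactly 'nvidia' appears in /proc/modules text."""
    return any(
        line.split()[0] == "nvidia"
        for line in proc_modules_text.splitlines()
        if line.strip()
    )


def parse_gpu_temps(csv_text: str) -> Dict[int, int]:
    """Parse 'index, temperature' rows from nvidia-smi CSV output."""
    temps: Dict[int, int] = {}
    for line in csv_text.strip().splitlines():
        index, temp = (field.strip() for field in line.split(","))
        temps[int(index)] = int(temp)
    return temps


def hot_gpus(temps: Dict[int, int], limit: int = WARN_C) -> List[int]:
    """Indices of GPUs at or above `limit` degrees C."""
    return sorted(i for i, t in temps.items() if t >= limit)


def read_gpu_temps() -> Dict[int, int]:
    """Query per-GPU temperatures (a K80 reports as two GPUs)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_temps(out)


if __name__ == "__main__":
    release = os.uname().release  # same value as `uname -r`
    print(f"headers for {release}: {headers_installed(release)}")
    try:
        modules = Path("/proc/modules").read_text()
    except OSError:
        modules = ""
    print(f"nvidia module loaded: {nvidia_module_loaded(modules)}")
    try:
        temps = read_gpu_temps()
        print(f"GPU temps (C): {temps}; at/above {WARN_C}C: {hot_gpus(temps)}")
    except (OSError, subprocess.CalledProcessError) as err:
        print(f"could not query nvidia-smi: {err}")
```

On a K80 the temperature query returns two rows, one per GPU, so it's the hotter index that matters when tuning fan settings.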
So I learned a few things, but it was awesome to see it all up and running. Once I spread the work across the cards, I can now run batch sizes of at least 256 images for a 224x224 MobileNetV2, so I have definitely hit my target. However, I haven't yet had a good chance to try training from scratch, and with the current circumstances around COVID-19, it is unfortunately a bit lower priority.
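Since the K80 appears as two GPUs, "spreading across the cards" means splitting each global batch between them. The sketch below shows the even split a data-parallel setup performs; frameworks such as TensorFlow's `tf.distribute.MirroredStrategy` handle this for you, so the `shard_sizes` helper here is a hypothetical illustration rather than how the training script actually does it.

```python
from typing import List


def shard_sizes(batch_size: int, num_devices: int) -> List[int]:
    """Split a global batch as evenly as possible across devices."""
    if num_devices < 1:
        raise ValueError("need at least one device")
    base, extra = divmod(batch_size, num_devices)
    # The first `extra` devices each take one additional example.
    return [base + 1 if i < extra else base for i in range(num_devices)]


if __name__ == "__main__":
    # One K80 shows up as two GPUs, so a 256-image batch becomes 128 per GPU.
    print(shard_sizes(256, 2))
```

Each GPU then only needs to fit its own shard, which is why the combined memory of both halves of the K80 is what sets the achievable batch size.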

Friday, March 13, 2020

The Quest for New Hardware, pt. 1


Recently I've been considering what kind of hardware setup might be the natural next step for training the model. In the past I've used a Jetson TX1 devkit, which is a rather modest piece of hardware but allows me to do somewhat heavier finetuning. I have it running on its own in a back bedroom, and it's helpful to have a box dedicated to that rather than trying to share time with my laptop or a desktop, for example.

However, the dataset I train on has grown considerably, now over 200K images. This is large enough that I believe it may be reasonable to consider training from scratch rather than finetuning. Training from scratch, however, requires much more computing power - and ideally could run in significantly larger batch sizes.

In fact, one of the key things I wanted to be able to do was run batch sizes on par with those used to train "modern" CNNs from scratch on ImageNet. This varies widely, but let's say ~100 images or so per batch as a reasonable target. Generally speaking, this means more GPU RAM.
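A back-of-the-envelope sketch of why batch size drives GPU RAM: assuming 224x224 RGB images stored as float32, the input tensor alone grows linearly with the batch. Activations saved for backpropagation also grow linearly with batch size and usually dwarf the input, so treat these numbers as a lower bound rather than an estimate of total memory use. The helper name is hypothetical.

```python
def input_batch_bytes(batch_size: int, height: int = 224, width: int = 224,
                      channels: int = 3, bytes_per_value: int = 4) -> int:
    """Bytes needed just to hold one float32 image batch (inputs only)."""
    return batch_size * height * width * channels * bytes_per_value


if __name__ == "__main__":
    for bs in (32, 100, 256):
        mib = input_batch_bytes(bs) / 2**20
        print(f"batch {bs:3d}: {mib:6.1f} MiB of input data")
```

Doubling the batch doubles every per-example buffer, so a ~100-image target mostly comes down to how much memory the GPU has.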

However, I'm doing my best to stay on a relatively tight budget - ideally something around $1000. This is lower than many new setups typically cost, even ones aimed at the budget-conscious consumer - see for example this blog post with this nice, new box settling in at a reasonable $1700. This required some deep introspection about what I was truly trying to achieve, and ultimately I settled on roughly this set of priorities:
  1. GPU RAM
  2. Speed
  3. Future expansion
My thought was that by prioritizing RAM, I could train almost any model - but perhaps at a slower speed.

With these priorities in mind, I considered several possibilities for a new or second-hand setup. Since my constraints differed from those of many people building a machine learning rig, I was pleasantly surprised by the sheer diversity of possible solutions.
  1. Purchase a Jetson AGX Xavier devkit. Prices on these seem to have been dropping, and the devkit was just upgraded to 32GB of RAM shared between GPU and CPU.
  2. Build a desktop and add higher-end consumer cards like the popular 1080 Ti or perhaps even the newer 2080.
  3. Build a specialized rig and buy many second-hand cards from crypto mining rigs - sort of a quantity-over-quality approach.
  4. Look to older server solutions and see what was available.
After quite some time searching, I settled on option #4. I discovered that I could find the now relatively old NVidia K80 cards in GPU servers for an acceptable price, providing a surprisingly strong GPU RAM/$ ratio at a satisfactory speed. I'd never really looked into the world of servers before, and it was an enjoyable journey - if a bit overwhelming at first. I joined the subreddit /r/homelab, a friendly place to discuss running servers at home. (I'd especially like to thank merkuron for their help!)
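The GPU RAM/$ comparison is simple arithmetic. A minimal sketch, using the K80's 24 GB (two GPUs with 12 GB each) and the $325 + $25 shipping the card cost in the pt. 2 parts list; no prices for the alternatives are given here, so plug in current listings for any card you're weighing. The helper name is hypothetical.

```python
def gb_per_dollar(gpu_ram_gb: float, price_usd: float) -> float:
    """GPU memory bought per dollar spent (higher is better)."""
    if price_usd <= 0:
        raise ValueError("price must be positive")
    return gpu_ram_gb / price_usd


if __name__ == "__main__":
    # K80: 24 GB total (2 GPUs x 12 GB); price from the pt. 2 parts list.
    k80 = gb_per_dollar(24, 325 + 25)
    print(f"K80: {k80 * 1000:.1f} GB per $1000")
```

This ignores speed entirely, which matches the priority order above: RAM first, speed second.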

Ultimately I settled on an older GPU server by SuperMicro, the 2027GR-TRF. I currently plan to add one K80 to it, but it supports up to two if I wish to expand in the future. I've been acquiring the full solution piece by piece, primarily through Ebay. I have most of the parts but need a few more before I can put it all together, so stay tuned for more updates!