Recent comments in /f/deeplearning

one_eyed_sphinx OP t1_j7tzoiq wrote

>eco

So this is the fine point I want to understand: what I'm trying to optimize with the build is the data transfer time, i.e. how long it takes to load a model from RAM into VRAM. If I have 10 models that each need 16 GB of VRAM to run, they need to share resources, so I want to "memory hot swap" the models on an incoming request (I don't know if there's a proper term for it; the closest I found is "bin packing"). So the data transfer is fairly critical from my point of view, and as I understand it the PCIe speed is the only bottleneck here. Correct me if I'm wrong.
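A minimal sketch of how that RAM-to-VRAM transfer time could be measured, assuming PyTorch and pinned CPU memory (the ResNet here is just a stand-in for whatever model you'd actually swap in):

```python
import time
import torch
import torchvision

# Illustrative stand-in: any large model held in CPU RAM works the same way.
model = torchvision.models.resnet152(weights=None)

# Pinning the CPU memory lets the host-to-device copy run at full PCIe speed.
for p in model.parameters():
    p.data = p.data.pin_memory()

torch.cuda.synchronize()
start = time.perf_counter()
model = model.to("cuda", non_blocking=True)
torch.cuda.synchronize()
print(f"RAM -> VRAM transfer took {time.perf_counter() - start:.3f} s")
```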

1

suflaj t1_j7s57dy wrote

At the moment a 7950X in eco mode combined with a ROG Strix X670E seems to be the best combo.

The GPU running in x8 mode on PCIe gen 4 doesn't really matter; according to benchmarks the performance difference is a few percent. Loading will still take a while at x16 because the effective speed is pretty much the same. It will not get significantly faster with a different mobo; you're limited by the GPU itself, not the interface.
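For the raw copy itself, a rough back-of-envelope (using theoretical peak PCIe 4.0 bandwidth; real pinned-memory copies land somewhat below these numbers):

```python
# Back-of-envelope transfer times for a 16 GB model over PCIe 4.0
# (theoretical peak bandwidth; achieved throughput is lower in practice).
model_size_gb = 16
pcie4_bandwidth_gbps = {"x16": 31.5, "x8": 15.75}  # ~GB/s usable per direction

for lanes, bw in pcie4_bandwidth_gbps.items():
    print(f"PCIe 4.0 {lanes}: ~{model_size_gb / bw:.2f} s to move {model_size_gb} GB")
# x16: ~0.51 s, x8: ~1.02 s -- either way, swapping whole models per request
# adds noticeable latency on top of inference.
```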

3

DMLearn t1_j7pq8wc wrote

The model is trained by getting rewarded for fooling a model that tries to distinguish between real and fake images. So no, it won't be perfect, but it's going to be good enough to trick a model the vast majority of the time, because that is literally part of the training. Not just a small part, either: it is the central tenet of the training and optimization of generative models, generative ADVERSARIAL networks.
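A minimal sketch of that adversarial setup (the generator G and discriminator D below are placeholder modules, not any particular architecture):

```python
import torch
import torch.nn.functional as F

# Placeholder generator/discriminator; any architectures with these shapes work.
G = torch.nn.Sequential(torch.nn.Linear(64, 784), torch.nn.Tanh())
D = torch.nn.Sequential(torch.nn.Linear(784, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real_images):  # real_images: (batch, 784)
    batch = real_images.size(0)
    z = torch.randn(batch, 64)

    # Discriminator: learn to label real images as 1, generated images as 0.
    d_loss = (F.binary_cross_entropy_with_logits(D(real_images), torch.ones(batch, 1))
              + F.binary_cross_entropy_with_logits(D(G(z).detach()), torch.zeros(batch, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: rewarded precisely for making D call its samples real.
    g_loss = F.binary_cross_entropy_with_logits(D(G(z)), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```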

1

levand t1_j7o5zeb wrote

This is inherently a super hard problem, because (to oversimplify) the loss function of any generative NN is to minimize the difference between human-generated and AI-generated images. So the state of the art for detection and generation is always going to be pretty close.

9

johnGettings OP t1_j7lqkj2 wrote

Yes, definitely agree. The project started as one thing, then turned into another, then another. I was only doing the coin grading for fun and wasn't planning on actually implementing it anywhere. So I switched gears and just focused on building a high resolution ResNet, regardless of what would be best for the actual coin grading.

There are probably better solutions, especially for a dataset of this size, and maybe a sliding window is necessary to achieve very high accuracy.
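For reference, a rough sketch of the sliding-window alternative (the window size, stride, and logit averaging are arbitrary choices, not anything from the project):

```python
import torch

def sliding_window_logits(model, image, window=512, stride=256):
    """Run a classifier over crops of a large image and average the logits.

    image: (C, H, W) tensor; window/stride are illustrative values only.
    """
    _, h, w = image.shape
    logits = []
    for top in range(0, max(h - window, 0) + 1, stride):
        for left in range(0, max(w - window, 0) + 1, stride):
            crop = image[:, top:top + window, left:left + window].unsqueeze(0)
            with torch.no_grad():
                logits.append(model(crop))
    return torch.stack(logits).mean(dim=0)
```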

But I think this model can still be useful and preferable for some datasets of large images with fine patterns, or at the very least preferred for simplicity's sake.

1

GufyTheLire t1_j7ljbmj wrote

Do you expect the model to learn subtle details useful for classification from a relatively small training dataset? Wouldn't it be a better approach to train a defect detector, so the model knows what is important in your images, and then classify the detected features? Maybe that's the reason why large classification models are not widely used?
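A rough outline of that two-stage idea (both models, the box format, and the aggregation are hypothetical placeholders):

```python
import torch

def two_stage_grade(detector, classifier, image):
    """Stage 1: find the regions that matter (e.g. defects); stage 2: classify them.

    `detector` is assumed to return an (N, 4) tensor of x1, y1, x2, y2 boxes,
    and `classifier` to grade each cropped region; both are placeholders.
    """
    boxes = detector(image.unsqueeze(0))
    crops = [image[:, y1:y2, x1:x2] for x1, y1, x2, y2 in boxes.int().tolist()]
    # In practice each crop would be resized to the classifier's input size.
    scores = [classifier(c.unsqueeze(0)) for c in crops]
    return torch.stack(scores).mean(dim=0)  # aggregate the per-region grades
```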

2

beautyofdeduction OP t1_j7jqohn wrote

I wish I could send you my GitHub. But the original Attention Is All You Need paper trained on sequences of length 25000 on multiple K80s (stated by the authors), each of which has only 12 GB of VRAM. Yes, they used multiple GPUs, but AFAIK each GPU needs to be able to handle its own batch. Or maybe not? Again, I wish I could show you my code.

1

neuralbeans t1_j7jdiqz wrote

A sequence length of 6250 is massive! It's not just 6250*6250, since you're not multiplying one float per pair of sequence items: you're taking a dot product of the query and key vectors for every pair of sequence items, and this is done for every attention head (in parallel). I think you're seriously underestimating the problem.
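To make the scaling concrete, a rough estimate of just the attention score matrices at that length (float32 and 8 heads are assumptions here, not anything from the thread):

```python
seq_len, heads, bytes_per_float = 6250, 8, 4

# One (seq_len x seq_len) score matrix per head, before the softmax/value product.
scores_gb = seq_len ** 2 * heads * bytes_per_float / 1024 ** 3
print(f"~{scores_gb:.2f} GB per example just for the attention scores")
# ~1.16 GB per example, per layer -- and the activations kept for backprop
# multiply this further.
```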

What transformer is this which accepts a sequence length of 6250?

1