Recent comments in /f/deeplearning
johnGettings OP t1_j7gpg3x wrote
A 224x224 image is sufficient for most classification tasks, but there are cases where the fine details of a large image need to be analyzed. Hi-ResNet is the ResNet50 architecture expanded (following the same rules as the original paper) to accept higher-resolution inputs.
I was working on a coin-grading project and found that accuracy could not surpass 30% because downscaling the images obscured the details needed to grade the coins. One option is to tile the image, run each tile through a classifier, and combine the outputs. Another is to use a classifier with a higher-resolution input, which is actually kind of difficult to find. Maybe I didn't look hard enough, but I figured it would be a good exercise to build this out regardless.
It may come in handy for you later. It's a very simple function with 3 arguments that returns a Hi-ResNet TensorFlow model.
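For a rough idea of the shape, here's a minimal sketch; the function name, arguments, and the stock ResNet50 backbone are illustrative assumptions, not the actual Hi-ResNet code:

```python
# Hypothetical sketch of a 3-argument model factory in the spirit of Hi-ResNet.
import tensorflow as tf

def hi_resnet(input_shape, num_classes, weights=None):
    """A ResNet50-style classifier that accepts a larger input resolution."""
    backbone = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape)
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(backbone.input, outputs)

model = hi_resnet((896, 896, 3), num_classes=70)  # e.g. 70 coin grades
```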
Long_Two_6176 t1_j7gc9rz wrote
Remember also that computations, not just the parameter count, cost GPU memory. Check your intermediate tensor sizes.
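In PyTorch you can inspect them with forward hooks, something like this sketch (the toy model is just for illustration):

```python
# Sketch: print the size of every intermediate (activation) tensor.
import torch

def log_activation_sizes(model):
    def hook(module, inputs, output):
        if torch.is_tensor(output):
            mb = output.numel() * output.element_size() / 1e6
            print(f"{module.__class__.__name__}: {tuple(output.shape)} ~{mb:.1f} MB")
    for m in model.modules():
        m.register_forward_hook(hook)

model = torch.nn.Sequential(torch.nn.Linear(512, 2048), torch.nn.ReLU())
log_activation_sizes(model)
model(torch.randn(32, 512))  # prints each layer's output size
```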
grigorij-dataplicity t1_j7f6n95 wrote
Reply to comment by Vegetable-Skill-9700 in Launching my first-ever open-source project and it might make your ChatGPT answers better by Vegetable-Skill-9700
Ok, waiting for your update!
neuralbeans t1_j7f68rv wrote
Parameters are only a small fraction of the values held in GPU memory. The number of attention activations grows quadratically with sequence length.
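Back-of-envelope, counting only the attention score matrices (all numbers made up for illustration, float32 assumed):

```python
# Each attention head materializes a seq_len x seq_len score matrix,
# so memory for the scores alone grows with seq_len squared.
batch, heads, seq_len = 1, 8, 16384           # hypothetical values
score_bytes = batch * heads * seq_len**2 * 4  # float32 = 4 bytes
print(score_bytes / 1e9, "GB per layer")      # ~8.6 GB for a single layer
```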
BellyDancerUrgot t1_j7f0u7u wrote
Reply to comment by beautyofdeduction in Why does my Transformer blow GPU memory? by beautyofdeduction
Okay yeah, I don't know what I was typing. Yes, 0.176 GB for just the parameters. You still have to account for dense representations of long sequences (once per attention head, so times 8), plus activations and gradients, all multiplied by the number of layers. There was a formula I read somewhere online to approximate the total. Activations, I think, take up way more memory than the model itself.
The memory requirement is roughly in line with most mid-size transformer models, I think.
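Something like this back-of-envelope, using the 22M-parameter figure from this thread (illustrative only; real usage depends on the implementation):

```python
# Rough training-memory estimate for 22M float64 parameters with Adam.
params = 22e6
weights   = params * 8      # float64 weights: 0.176 GB
gradients = params * 8      # one gradient per weight
adam      = params * 8 * 2  # Adam's running mean and variance
print((weights + gradients + adam) / 1e9, "GB before activations")  # ~0.7 GB
# Activations come on top and usually dominate: every layer saves its
# outputs for backprop, including per-head seq_len x seq_len score matrices.
```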
beautyofdeduction OP t1_j7eqr8c wrote
Reply to comment by BellyDancerUrgot in Why does my Transformer blow GPU memory? by beautyofdeduction
8 Bytes * 22M = 0.176 GB?
BellyDancerUrgot t1_j7eq93o wrote
Reply to comment by beautyofdeduction in Why does my Transformer blow GPU memory? by beautyofdeduction
Each float64 is 8 bytes. You said you have 22M parameters.
Also, besides your params and activations you still have gradients, and sequences are mapped for each attention head, so multiply that by 8 as well.
For context, DeepLabv3, which IIRC has about 58M parameters, was trained on 8 V100s.
Edit: I clearly had a brain stroke while writing the first part (I originally said 4 bytes); fixed above.
beautyofdeduction OP t1_j7epm3u wrote
Reply to comment by BellyDancerUrgot in Why does my Transformer blow GPU memory? by beautyofdeduction
Can you elaborate?
BellyDancerUrgot t1_j7ec7oj wrote
~83 GB I think, not 500 MB.
hinsonan t1_j7e1y9z wrote
There is some work being done on training and running inference with large models across multiple machines.
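For example, the Hugging Face stack can already shard a checkpoint's layers across whatever devices you have; a sketch (needs transformers and accelerate installed, and the checkpoint name is only an example):

```python
# Sketch: let accelerate place a model's layers on GPUs/CPU/disk as they fit.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1",  # example checkpoint
    device_map="auto",       # shard layers across available devices
)
```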
sEi_ t1_j7e113t wrote
AFAIK you cannot split the process into 'small parts', since the whole model needs to sit in VRAM while it is being processed. And a consumer computer has a hard time mustering 250+ GB of VRAM.
But with the development speed in this area, maybe the hurdle will be overcome soon™.
Appropriate_Ant_4629 t1_j7clc8s wrote
The LAION project (https://laion.ai/) is probably the closest thing to this.
They have a great track record on similar-scale projects. They've partnered with /r/datahoarders and volunteers to create training sets, including the 5.8-billion image/text-pair dataset they used to train a better version of CLIP.
Their actual model training tends to be done on some of the larger European supercomputers, though. If I recall correctly, their CLIP derivative was trained with donated time on JUWELS. It's too hard to split such jobs into average-laptop-sized tasks.
earthsworld t1_j7cc7l5 wrote
> train some model
Which model? Who's creating it? Who's testing it? And who the fuck is paying for it?
lawless_c t1_j7cbt36 wrote
Communication between nodes would become a big bottleneck.
junetwentyfirst2020 t1_j7c7gpj wrote
You should consider diving into the topic a little deeper. What you're talking about is distributing the computation, which is already being done at one scale or another whenever there's more than one GPU or multiple machines. A well-known outside example is SETI@home, where you could donate your computer's spare compute.
Your question about whether it can beat an existing implementation of GPT is one of the most complicated questions you could ask. It sounds like you're assuming this would have more compute than a dedicated system, but there's more to getting something that performs better than raw compute. Compute is a bottleneck, but only one of many.
ze_baco t1_j7c6yhd wrote
This is federated learning. We sure can do this, but it would require a lot of cooperation...
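At its core it's just averaging locally trained weights. A minimal FedAvg sketch with made-up arrays:

```python
# Minimal federated-averaging (FedAvg) sketch: each client trains locally,
# then a server averages the clients' weight tensors into a global model.
import numpy as np

def fed_avg(client_weights):
    return np.mean(np.stack(client_weights), axis=0)

clients = [np.random.randn(10) for _ in range(5)]  # 5 clients' fake weights
global_weights = fed_avg(clients)
```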
XecutionStyle t1_j78shby wrote
Reply to comment by hiro_ono in please help a bunch of students?(with pre annotated data set) we were assigned to this task with no prior knowledge of ML i don't know where to begin with we tried a couple of method which ultimately failed id be thankful for anyone who would tell me in steps what to do with this data[D] by errorr_unknown
^ Simply this
XecutionStyle t1_j78jsua wrote
Reply to please help a bunch of students?(with pre annotated data set) we were assigned to this task with no prior knowledge of ML i don't know where to begin with we tried a couple of method which ultimately failed id be thankful for anyone who would tell me in steps what to do with this data[D] by errorr_unknown
Error drives learning:
If Error ∝ (Target - Output)
Then you start your network with random weights (so the Output is random and the Error is large). When you pass the Error back through the network, the weights are adjusted in proportion to the Error. Over time, the weights settle where (Target - Output) is as low as possible.
This concept holds in any situation: if you're working with image data, no matter what architecture produces the Output, you still compare it with the Target (or 'label'; the length of the Pagrus fish in your case) and pass the Error back through the network to improve it iteratively.
Try building the simplest neuron: 1 input -> 1 output and use backpropagation to train until convergence.
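Something like this minimal sketch:

```python
# One neuron, one weight: fit y = 3x by gradient descent on squared error.
import numpy as np

x = np.array([1.0, 2.0, 3.0])
target = np.array([3.0, 6.0, 9.0])
w, lr = np.random.randn(), 0.01
for _ in range(500):
    output = w * x
    grad = np.mean(2 * (output - target) * x)  # d(error)/dw
    w -= lr * grad                             # step against the gradient
print(w)  # converges near 3.0
```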
For your assignment you could use a CNN to produce the Output (though a simple feed-forward network would work too; you're outputting just one value for total length, so it's really a regression task). A CNN's weights are shared internally (the window you slide across the image), but they are trained the same way: compute the Output, compare it with the actual length of the Pagrus fish you passed in as input, get the Error, and apply the method above to improve the network for the task.
harry-hippie-de t1_j78b01o wrote
There's a difference in training and inference. Hardware requirements for training are larger.
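In PyTorch terms (a sketch): a training-style forward pass keeps activations for backprop and allocates gradients, while an inference pass under torch.no_grad does neither:

```python
import torch

model = torch.nn.Linear(4096, 4096)
x = torch.randn(64, 4096)

# Training: autograd records the graph, saves activations, and
# backward() allocates a gradient tensor for every parameter.
model(x).sum().backward()

# Inference: no graph, no saved activations, no gradient buffers.
with torch.no_grad():
    y = model(x)
```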
hiro_ono t1_j7830om wrote
Reply to please help a bunch of students?(with pre annotated data set) we were assigned to this task with no prior knowledge of ML i don't know where to begin with we tried a couple of method which ultimately failed id be thankful for anyone who would tell me in steps what to do with this data[D] by errorr_unknown
Use a CNN and change the output to a single unit with no activation. It's a regression problem, so train with mean squared error. Just find a tutorial for CNNs in TensorFlow or PyTorch.
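E.g. in PyTorch, something like this sketch (layer sizes and the fake batch are placeholders):

```python
# Sketch of a CNN regressor: single output unit, trained with MSE loss.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 1),  # single unit, no activation -> regression
)
loss_fn = nn.MSELoss()

images = torch.randn(8, 3, 128, 128)  # fake batch of fish images
lengths = torch.randn(8, 1)           # fake target lengths
loss = loss_fn(model(images), lengths)
loss.backward()
```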
foxracing4500 t1_j77rdn3 wrote
Depends on your budget? How much are you looking to spend?
ia3leonid t1_j7hgcoq wrote
Reply to Why does my Transformer blow GPU memory? by beautyofdeduction
Gradients are also stored and take as much memory as the weights and activations, or more for some optimizers (Adam also tracks statistics for each weight, for example).
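You can see Adam's extra state directly in PyTorch (a sketch):

```python
# Adam keeps two extra tensors (exp_avg, exp_avg_sq) per parameter,
# so its optimizer state alone is ~2x the size of the weights.
import torch

model = torch.nn.Linear(1000, 1000)
opt = torch.optim.Adam(model.parameters())
model(torch.randn(8, 1000)).sum().backward()
opt.step()  # state is allocated lazily on the first step
extra = sum(s["exp_avg"].numel() + s["exp_avg_sq"].numel()
            for s in opt.state.values())
print(extra)  # twice the parameter count (2 * 1,001,000)
```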