Recent comments in /f/deeplearning

johnGettings OP t1_j7gpg3x wrote

A 224x224 image is sufficient for most classification tasks, but there are instances where the fine details of a large image need to be analyzed. Hi-ResNet is the ResNet50 architecture expanded (following the same rules as the original paper) to accept higher-resolution inputs.

I was working on a coin-grading project and found that accuracy could not surpass 30% because downscaling the images obscured the details needed to grade the coin. One option is to tile the image, run each tile through a classifier, and combine the outputs. Another is to use a classifier with a higher-resolution input, which is actually kind of difficult to find. Maybe I didn't look hard enough, but I figured it would be a good exercise to build this out regardless.

It may come in handy for you later. It's a very simple function with three arguments that returns a Hi-ResNet TensorFlow model.
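Not the actual Hi-ResNet code, but a rough sketch of what a three-argument builder function like that can look like in TensorFlow (here stock ResNet50 with a larger input stands in for the expanded architecture):

```python
# Hypothetical sketch, not the actual Hi-ResNet code: a three-argument builder
# that returns a TensorFlow model accepting high-resolution inputs. Stock
# ResNet50 (include_top=False) stands in here for the expanded architecture.
import tensorflow as tf

def hi_resnet(input_shape=(896, 896, 3), num_classes=10, weights=None):
    backbone = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape)
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(backbone.input, outputs)

model = hi_resnet((896, 896, 3), num_classes=70)  # e.g. 70 coin grades
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```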

3

BellyDancerUrgot t1_j7f0u7u wrote

Okay, yeah, I don't know what I was typing. Yes, 0.176 GB for just the parameters. You still have to account for dense representations of long sequences (and that times 8 attention heads), plus activations and gradients, all of it multiplied by the number of layers. There was a formula to approximate the value that I read somewhere online. Activations, I think, take up way more memory than the model itself.

The memory requirement is roughly in line with most mid-size transformer models, I think.
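As a back-of-the-envelope illustration of why activations dominate (the batch size, sequence length, width, and depth below are my own assumptions, not figures from this thread):

```python
# Rough activation-memory estimate for a transformer (assumed sizes): attention
# maps scale with heads * seq_len^2 per layer, hidden activations with seq_len * d_model.
batch, seq_len, d_model, heads, layers = 8, 2048, 512, 8, 12
bytes_per_float = 4  # fp32

attn_maps = batch * heads * seq_len * seq_len * bytes_per_float  # per layer
hidden = batch * seq_len * d_model * bytes_per_float * 4         # a few tensors per layer
total_gb = layers * (attn_maps + hidden) / 1e9
print(f"~{total_gb:.0f} GB of activations vs ~0.18 GB of weights")
```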

3

BellyDancerUrgot t1_j7eq93o wrote

Each Float64 is 8 bytes (Float32 is 4), and you said you have 22M parameters.

Also, besides your params and activations, you still have gradients, and sequences are mapped for each attention head, so multiply that by 8 as well.

For context, I think DeepLabv3, which IIRC has around 58M parameters, was trained on 8 V100s.
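A quick sanity check on those numbers (the 22M parameter count is from this thread; the precision and optimizer assumptions are mine):

```python
params = 22_000_000
print(params * 8 / 1e9)      # float64 weights: ~0.176 GB, the figure quoted above
print(params * 4 / 1e9)      # float32 weights: ~0.088 GB
# Training also holds gradients, and Adam keeps two extra per-parameter buffers:
print(params * 4 * 4 / 1e9)  # fp32 weights + grads + 2 Adam moments: ~0.35 GB
```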

Edit: I clearly had a brain stroke while writing the first part, so ignore it.

1

Appropriate_Ant_4629 t1_j7clc8s wrote

The LAION project ( https://laion.ai/ ) is probably the closest thing to this.

They're looking for volunteers to help work on their fully F/OSS ChatGPT successor now. A video describing the help they need can be found here.

They have a great track record on projects of similar scale. They've partnered with /r/datahoarders and volunteers to create training sets, including the 5.8-billion image/text-pair dataset they used to train a better version of CLIP.

Their actual model training tends to be done on some of the larger European supercomputers, though. If I recall correctly, their CLIP derivative was trained with donated time on JUWELS. It's too hard to split jobs like that into average-laptop-sized tasks.

29

junetwentyfirst2020 t1_j7c7gpj wrote

You should consider diving into the topic a little deeper. What you're talking about is distributing the computation, which is already being done at some scale whenever there is more than one GPU or more than one machine. An outside example is SETI@home, where you can donate your computer's compute.
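For a sense of what distributing the computation looks like in the simple multi-GPU case, here's a minimal TensorFlow sketch (my example, not something from this thread):

```python
# Data parallelism on one machine: MirroredStrategy replicates the model across
# the visible GPUs and averages gradients every step. Volunteer/grid computing
# (SETI@home style) is a very different, much looser form of distribution.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(dataset)  # each batch is split across the replicas automatically
```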

Your question about whether it can beat an existing implementation of GPT is enormously complicated. It sounds like you're assuming this would have more compute than a dedicated system, but getting something that performs better takes more than compute. Compute is a bottleneck, but only one of many.

9

XecutionStyle t1_j78jsua wrote

Error drives learning:

If Error ∝ (Target - Output)

Then you start your network with random weights (so the Output is random and the error is large). When you pass the error back through the network, the weights are adjusted in proportion to the error. Over time, the weights settle where (Target - Output) is as low as possible.

This holds in any situation: if you're working with image data, no matter what architecture produces the Output, you still compare it with the Target (or 'label', the length of the Pagrus in your case) and pass the Error back through the network to improve it iteratively.

Try building the simplest neuron, 1 input -> 1 output, and use backpropagation to train it until convergence.
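Something like this (my own minimal illustration, plain Python with a squared-error loss):

```python
import random

# One neuron: output = w*x + b, trained by backpropagation (gradient descent).
w, b, lr = random.random(), 0.0, 0.01
data = [(x, 3.0 * x + 1.0) for x in range(10)]   # target function: y = 3x + 1

for epoch in range(2000):
    for x, target in data:
        output = w * x + b
        error = output - target                  # gradient of 0.5*(output-target)^2
        w -= lr * error * x                      # update proportional to the error
        b -= lr * error

print(w, b)  # converges toward w ≈ 3, b ≈ 1
```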

For your assignment you could use a CNN to get the Output (though a simple feed-forward network would work too; since you're outputting a single value for total length, it's really a regression task). A CNN's weights are shared internally (the window you shift across the image), but they're trained the same way: you compute the Output, compare it with the actual length of the Pagrus fish you passed in as input, get the Error, and apply the same method above to improve the network for the task.
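And a hedged sketch of the CNN-as-regressor idea (the 128x128 input size and layer widths are placeholders, not anything specified in the thread):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(128, 128, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1),                # one linear unit: predicted total length
])
model.compile(optimizer="adam", loss="mse")  # Error = mean (Target - Output)^2
# model.fit(images, lengths, epochs=50)      # images of Pagrus, measured lengths
```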

1

hiro_ono t1_j7830om wrote

3