Recent comments in /f/deeplearning
Ok_Firefighter_2106 t1_iy6t95h wrote
Reply to Can someone pls help me with this question and explain to me why you chose your answers? Many thanks! by CustardSignificant24
2,3
2: For example, if you use zero values for initialization, then due to the symmetric nature of the NN all neurons become the same, and the multi-layer NN collapses to something no more expressive than a simple linear regression, since it fails to break the symmetry. Therefore, if the problem is non-linear, the NN just can't learn it (see the sketch at the end of this comment).
3: as explained in other answers.
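A minimal sketch of point 2 (my own example, assuming PyTorch): with all-zero initialization only the output bias can move, so the net learns nothing beyond a constant, while the same net with random initialization fits the non-linear target.

```python
# Sketch (assumes PyTorch). With all-zero init, the hidden activations and the
# gradients to the hidden weights are zero, so only the output bias learns and
# the "network" degenerates to a constant predictor.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-2, 2, 256).unsqueeze(1)
y = x ** 2                              # a non-linear target

def train(zero_init: bool) -> float:
    net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
    if zero_init:
        for p in net.parameters():
            nn.init.zeros_(p)           # symmetric (all-zero) initialization
    opt = torch.optim.SGD(net.parameters(), lr=0.1)
    for _ in range(2000):
        opt.zero_grad()
        loss = ((net(x) - y) ** 2).mean()
        loss.backward()
        opt.step()
    return loss.item()

print("zero init loss:  ", train(True))    # plateaus near the variance of y
print("random init loss:", train(False))   # fits the parabola far better
```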
nutpeabutter t1_iy3z9lc wrote
Reply to comment by Own-Archer7158 in Can someone pls help me with this question and explain to me why you chose your answers? Many thanks! by CustardSignificant24
There is indeed a non-zero gradient. However, symmetric initialization introduces a plethora of problems:
- The only way to break the symmetry is through the random biases. In a fully symmetric network, each layer effectively acts as though it were a single weight (a 1-input, 1-output layer), which means it cannot learn complex functions until the symmetry is broken. Learning will thus be highly delayed, since the network first has to break the symmetry before it can learn a useful function. This can explain the plateau at the start (the sketch after this list makes the symmetry explicit).
- Even if the symmetry is broken, near-identical weights at the start will lead to poor performance. It is easy to get trapped in local minima when your outputs are constrained by weights that lack sufficient variance; there is a reason why weights are typically randomly initialized.
- Random weights also allow more "learning pathways" to be established: by pure chance alone, some combinations of weights will be slightly more correct than others. The network can then exploit this to speed up its learning by adjusting its other weights to support these pathways. Symmetric weights offer no such advantage.
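To make the symmetry point concrete, a hedged sketch (my own example, PyTorch assumed): after one backward pass with constant-initialized weights, every hidden unit receives exactly the same gradient, so updates can never tell them apart.

```python
# Sketch (assumes PyTorch): constant initialization gives every hidden unit
# an identical gradient, so all units stay clones of each other after updates.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
for p in net.parameters():
    nn.init.constant_(p, 0.5)          # fully symmetric initialization

x, y = torch.randn(16, 4), torch.randn(16, 1)
loss = ((net(x) - y) ** 2).mean()
loss.backward()

g = net[0].weight.grad                 # first-layer weight gradient, shape (8, 4)
print(torch.allclose(g, g[0].expand_as(g)))   # True: all 8 rows are identical
```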
Own-Archer7158 t1_iy3q6b6 wrote
Reply to comment by canbooo in Can someone pls help me with this question and explain to me why you chose your answers? Many thanks! by CustardSignificant24
You are right, thank you
canbooo t1_iy3pylo wrote
Reply to comment by Own-Archer7158 in Can someone pls help me with this question and explain to me why you chose your answers? Many thanks! by CustardSignificant24
Bad initialization can be a problem if you do it yourself (i.e. bad scaling of the weights) and you are not using batch or other kinds of normalization, since it might make your neurons die. E.g. a tanh neuron with too large an input scale will predict only -1 or 1 for all data, which leaves it dead, i.e. not learning anything due to ~0 gradient over the entire data set.
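A rough numeric illustration (my own example, PyTorch assumed): scaling the input too high saturates tanh, and the local derivative that backprop multiplies by collapses toward zero.

```python
# Sketch (assumes PyTorch): a badly scaled tanh input saturates the neuron,
# so the local derivative (1 - tanh^2) used by backprop nearly vanishes.
import torch

x = torch.randn(1000, 1)
for scale in (1.0, 50.0):
    act = torch.tanh(scale * x)
    local_grad = 1 - act ** 2          # derivative of tanh at the pre-activation
    print(f"scale={scale:5.1f}  mean |activation|={act.abs().mean():.3f}  "
          f"mean local grad={local_grad.mean():.4f}")
# scale=50 pins the outputs at ~±1 and shrinks the mean local gradient by more
# than an order of magnitude: the neuron is effectively dead.
```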
Own-Archer7158 t1_iy3mec9 wrote
Reply to comment by Own-Archer7158 in Can someone pls help me with this question and explain to me why you chose your answers? Many thanks! by CustardSignificant24
Note that the minimal loss is reached when the parameters make the neural network's predictions closest to the real labels.
Before that point, the gradient is generally non-zero (except at a very unlucky local minimum).
To understand the underlying optimization problem better, look at linear regression with least-squares error as the loss: in one dimension it is a quadratic function to minimize, so there is no local minimum besides the global one (see the sketch below).
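A tiny numeric check of that last point (my own example, using NumPy): the 1-D least-squares loss is a parabola in w, so gradient descent from any starting point reaches the same global minimum as the closed-form solution.

```python
# Sketch: 1-D least squares has a single (global) minimum, so plain gradient
# descent cannot get stuck, whatever the starting point.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)

w_closed = (x * y).sum() / (x * x).sum()     # unique minimizer of mean((w*x - y)^2)

w = -10.0                                    # deliberately bad starting point
for _ in range(500):
    w -= 0.1 * 2 * (x * (w * x - y)).mean()  # gradient of the mean squared error

print(w_closed, w)                           # both ~3.0: same global minimum
```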
Own-Archer7158 t1_iy3m6j0 wrote
Reply to comment by nutpeabutter in Can someone pls help me with this question and explain to me why you chose your answers? Many thanks! by CustardSignificant24
If all weights are the same (assume 0 for simplicity), then the output of the function/neural network is far from the objective/label.
The gradient is therefore non-zero.
The parameters are then updated: theta = theta - learning_rate*grad_theta(loss).
And when the parameters are updated, the loss changes.
Usually, the parameters are randomly chosen (a one-step sketch follows below).
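A one-step sketch of that update rule (my own example, PyTorch assumed):

```python
# Sketch (assumes PyTorch): one manual gradient-descent step,
# theta <- theta - learning_rate * grad_theta(loss).
import torch

theta = torch.randn(3, requires_grad=True)      # randomly chosen parameters
x, y = torch.randn(10, 3), torch.randn(10)

loss = ((x @ theta - y) ** 2).mean()
loss.backward()                                  # fills in theta.grad

with torch.no_grad():
    theta -= 0.1 * theta.grad                    # descend: the loss decreases
    theta.grad.zero_()
```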
nutpeabutter t1_iy3kb5n wrote
Reply to comment by Own-Archer7158 in Can someone pls help me with this question and explain to me why you chose your answers? Many thanks! by CustardSignificant24
>Bad initialization is rarely a problem
What if all weights are the same?
Own-Archer7158 t1_iy3h8pp wrote
Reply to comment by Own-Archer7158 in Can someone pls help me with this question and explain to me why you chose your answers? Many thanks! by CustardSignificant24
If the learning rate is zero, the update rule leaves the parameters unchanged.
Data balancing does not change the loss (it mainly affects overfitting), and the same goes for a regularization strength that is too low.
Bad initialization is rarely a problem (with bad luck you could land directly in a local minimum, but that is a rare event).
Own-Archer7158 t1_iy3h1oa wrote
Reply to Can someone pls help me with this question and explain to me why you chose your answers? Many thanks! by CustardSignificant24
3 is the only possible solution
Rishh3112 t1_iy3dnqd wrote
Reply to Deep Learning for Computer Vision: Workstation or some service like AWS? by Character-Ad9862
Hey, I work for a start-up, and our AI models are trained on AWS. I suggest going with an AWS server, since the company will need hosting later on anyway, and a cloud server is easier to run and manage. AWS's security is pretty good, and hosting APIs is a lot easier from a cloud server.

Training models can also be quicker in practice: building a workstation to a comparable standard is quite expensive, and not just the hardware cost, the power consumption will be high as well. When comparing the cost of AWS against a physical system in the office, the power bill alone offsets much of the monthly cloud fee. A cloud system can also be used from anywhere in the world, whereas for a physical office machine you would need to set up a VPN and keep the machine powered whenever you want to use it. AWS charges you only while you are using the server, plus a minimal monthly fee. In comparison, a physical system is expensive up front, and the cost of electricity, the VPN, and a cooled server room will add up to more than an AWS server in the long run. Hope you found this helpful.
Character-Ad9862 OP t1_iy3ccby wrote
Reply to comment by CKtalon in Deep Learning for Computer Vision: Workstation or some service like AWS? by Character-Ad9862
Yeah, I'm a bit worried the graphics card could be a little out of date as well. However, the RTX 6000 will most likely have a significantly higher price tag, which might be too much considering my budget. Is there any alternative card out there that could meet my requirements?
thefizzlee t1_iy38vtp wrote
Reply to Best GPU for deep learning by somebodyenjoy
I'm gonna assume the Nvidia A100 80GB edition is out of your budget, but that is the gold standard for machine learning. They're usually deployed in clusters of 8, but one is already better than two 3090s for deep learning.

If, however, you want to choose between two 3090s and a 4090 and you're running into VRAM issues, I'd go for the dual 3090s: multi-GPU training is very well supported in machine learning frameworks, so you'll essentially be getting double the performance, and from what I know one 4090 isn't faster than two 3090s, plus you'll double your VRAM (see the sketch below).

Edit: if you want to save a buck and your software supports it, you could also look into the new Radeon RX 7900 XTX, as long as you don't need CUDA support or tensor cores.
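For what it's worth, a minimal sketch of the single-machine multi-GPU route (my own example, PyTorch assumed; DistributedDataParallel is the recommended option for serious training):

```python
# Sketch (assumes PyTorch and 2 CUDA GPUs): nn.DataParallel splits each batch
# across the available GPUs and gathers the outputs on the default device.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)     # wrap once; the training loop is unchanged
model = model.cuda()

x = torch.randn(64, 512).cuda()
out = model(x)                         # forward pass runs on both GPUs
```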
BipolarStoicist t1_iy2u5fu wrote
Reply to comment by CKtalon in Best GPU for deep learning by somebodyenjoy
Also, the CPU might become a bottleneck for this reason.
VinnyVeritas t1_iy2nq1w wrote
Reply to Best GPU for deep learning by somebodyenjoy
Just get two 4090s and power-limit them to 350 watts using "nvidia-smi"; there's almost no performance loss, and you don't need NVLink anyway to do multi-GPU training.
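For reference, the CLI form is something like `sudo nvidia-smi -i <gpu> -pl 350`; a hedged Python equivalent via the pynvml bindings (assuming the nvidia-ml-py package and root privileges):

```python
# Sketch (assumes pynvml / nvidia-ml-py and root): cap every GPU at 350 W.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    pynvml.nvmlDeviceSetPowerManagementLimit(handle, 350_000)  # in milliwatts
pynvml.nvmlShutdown()
```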
CKtalon t1_iy2n7t0 wrote
Reply to Best GPU for deep learning by somebodyenjoy
Get the 4090. Besides, you only have 32GB of RAM, and feeding two GPUs with data can be a bottleneck.
CKtalon t1_iy2n56h wrote
Reply to comment by --dany-- in Best GPU for deep learning by somebodyenjoy
NVLink doesn’t pool VRAM no matter what Nvidia’s marketing says. I have NVLink. It just doesn’t.
somebodyenjoy OP t1_iy2a1r9 wrote
Reply to comment by --dany-- in Best GPU for deep learning by somebodyenjoy
Exactly what I was thinking. Thanks!
--dany-- t1_iy29wqs wrote
Reply to comment by somebodyenjoy in Best GPU for deep learning by somebodyenjoy
I'm saying 2x 3090s are not much better than a single 4090. According to Lambda Labs benchmarks, a 4090 is about 1.3 to 1.9 times faster than a 3090. If you're after speed, then a 4090 definitely makes more sense: it's only slightly slower than 2x 3090s, but much more power efficient and cheaper.
somebodyenjoy OP t1_iy23k2m wrote
Reply to comment by --dany-- in Best GPU for deep learning by somebodyenjoy
I do hyperparameter tuning too, so the same model will have to train multiple times. The more times the better, as I can try more architectures, so speed is important. But you're saying the 4090 is not much better than the 3090 in terms of speed, huh.
--dany-- t1_iy2149l wrote
Reply to comment by somebodyenjoy in Best GPU for deep learning by somebodyenjoy
Not by much, according to some benchmarks. So speed is not the deciding factor here; your main concern is whether the model and the training data can fit in your VRAM.
somebodyenjoy OP t1_iy20svg wrote
Reply to comment by --dany-- in Best GPU for deep learning by somebodyenjoy
Hi, thanks for your reply. So 2 3090s will be faster than one 4090, correct?
--dany-- t1_iy1zq6n wrote
Reply to Best GPU for deep learning by somebodyenjoy
The 3090 has an NVLink bridge to connect two cards and pool their memory. Theoretically you'll have 2x the computing power and 48GB of VRAM to do the job. If VRAM size is important for your big model and you have a beefy PSU, then this is the way to go. Otherwise just go with a 4090.
If you don't need to train a model frequently, Colab or some paid GPU rental service might be easier on your wallet and power bill. For example, it's only about $2 per hour to rent 4x RTX A6000 from some rental services.
CKtalon t1_iy1mmlw wrote
Reply to Deep Learning for Computer Vision: Workstation or some service like AWS? by Character-Ad9862
The A6000 is almost 2 years old. The newer version, the RTX 6000 (yes, a confusing naming convention), is coming out in about 3 months' time, although it might not be easy to get your hands on one.
Character-Ad9862 OP t1_iy0n2uw wrote
Reply to comment by sweeetscience in Deep Learning for Computer Vision: Workstation or some service like AWS? by Character-Ad9862
Really appreciate your insights. Having that extra dependency layer is something that has also worried me.
jazzzzzzzzzzzzzzzy t1_iy7fk3v wrote
Reply to Neural Networks are just a bunch of Decision Trees by Difficult-Race-1188
https://www.youtube.com/watch?v=_okxGdHM5b8
Discussion of the original paper.