LW_Master OP t1_j0llruo wrote on December 17, 2022 at 4:24 PM

Reply to comment by botfiddler in About PlaidML... by LW_Master

Last time I used it the step is just "Choose your engine", in the command line plaidml first detect the drivers, then you choose which driver you want to use, after that the system will run according to your preferences. (This is reference to use plaidml btw, for using amd outside plaidml I heard you use ROCm but it's locked to enterprise grade gpu afaik)

But yeah, cuda literally just install tensorflow cuda and you're good to go (I think, I never do it myself since I only have laptop version of 1050 and my family's PC use Radeon)...

botfiddler t1_j0ll3rr wrote on December 17, 2022 at 4:19 PM

Reply to comment by LW_Master in About PlaidML... by LW_Master

How difficult is it, compared to using a Nvidia GPU? I know AMD can be used, but the claim is that it is extra work and a lot of programs in the field are focused on Nvidia or require such GPUs.

LW_Master OP t1_j0lkdbz wrote on December 17, 2022 at 4:14 PM

Reply to comment by nlgranger in About PlaidML... by LW_Master

Last time I check the github there is an update early 2022 iirc, hopefully someone revive it again since it's the only way so far for non-Nvidia user can use gpu power for deep learning. If only recent generation nvidia isn't that expensive... Is there any news for ROCm in consumer AMD gpu (the rx 6000 series and the like)?

nlgranger t1_j0lf7c9 wrote on December 17, 2022 at 3:36 PM

Reply to About PlaidML... by LW_Master

That's the idea except the project is essentially dead.

LW_Master OP t1_j0l06bq wrote on December 17, 2022 at 1:26 PM

Reply to comment by Present-Ad-8531 in About PlaidML... by LW_Master

https://plaidml.github.io/plaidml/ Tips by my experience, uninstall tensorflow if you already have one but keep the keras. Somehow plaidml don't want to run if tensorflow already there. It should work whether you have tensorflow or not but I found it's the otherwise

Edit: if you have any progress to it please tell me about the experience.

Present-Ad-8531 t1_j0kzwpn wrote on December 17, 2022 at 1:24 PM

Reply to comment by LW_Master in About PlaidML... by LW_Master

Wow. It’s great that it works. Can you send some links here? It’s peculiar how it’s not known.

LW_Master OP t1_j0kzr4y wrote on December 17, 2022 at 1:22 PM

Reply to comment by Present-Ad-8531 in About PlaidML... by LW_Master

Yeah, looking at some forums talking about it (and I have tested their benchmark myself in RX 5600XT) it is doable. But the question is why no one literally talking or hyping about it? This thing exist since 2017 iirc...

Present-Ad-8531 t1_j0kz9qg wrote on December 17, 2022 at 1:17 PM

Reply to comment by LW_Master in About PlaidML... by LW_Master

Damn. I answered a question right above this here by mistake. I apologise.

Btw, if what you’ve said is true, then we can run DL in AMD also. MacBooks can also be used in training. It’s a giant leap no?

LW_Master OP t1_j0kz2z8 wrote on December 17, 2022 at 1:15 PM

Reply to comment by [deleted] in About PlaidML... by LW_Master

I think you are in the wrong subreddit..

[deleted] t1_j0kwxqg wrote on December 17, 2022 at 12:52 PM

Reply to About PlaidML... by LW_Master

[deleted]

elbiot t1_j0k91nm wrote on December 17, 2022 at 7:26 AM

Reply to comment by Logon1028 in Efficient Max Pooling Implementation by Logon1028

I think unravel is a tuple so you can just star unpack it to use it as indices without having to do anything else with it

Logon1028 OP t1_j0k4avf wrote on December 17, 2022 at 6:26 AM

Reply to comment by elbiot in Efficient Max Pooling Implementation by Logon1028

I had thought about that already, but I decided not to cast the tuple just because I can't see it being faster that what I already have. The problem is how restrictive numpy.unravel_index is. It only operates on 1D arrays and you can't pick an axis. So I have to use for loops to account for that. And I am already saving the unravelled coordinates into two numpy arrays. One for the x axis and one for the y axis of the coordinates (arrays created on layer initialization only for efficiency). I see no way to improve beyond what I currently have without having an alternative to the numpy.unravel_index function. Do you know any better alternatives to unravel_index? Or am I just screwed unless I write my own function to unravel the indices.

I don't really have anyone in my life that knows deep learning implementation. Which is why I am asking random people on reddit lol. It is a sad world for me sometimes.

elbiot t1_j0k3evv wrote on December 17, 2022 at 6:16 AM

Reply to comment by Logon1028 in Efficient Max Pooling Implementation by Logon1028

I'm away from a computer for a while but you could cast the tuple to an array I assume. And since creating an array is expensive and you'll keep needing an array of the same shape every step, you could just hold onto it and assign values into it instead of re-creating it every time

Logon1028 OP t1_j0jmq8c wrote on December 17, 2022 at 3:27 AM

Reply to comment by elbiot in Efficient Max Pooling Implementation by Logon1028

Well I actually did end up making a performance improvement after thinking over your suggestion a little bit. Essentially inside my triple for loop I was doing...

np.unravel_index(np.argmax(strided_result[depth][x][y], axis=None), strided_result[depth][x][y].shape)

But if I take part of your suggestion and squash the last two dimensions of the strided array I can perform argmax on axis 3 all at once OUTSIDE the for loop (which numpy probably parallelizes). This resulted in a roughly 30% improvement in performance.

layers = [
    Convolutional((1, 28, 28), 5, 2),
    Relu(),
    MaxPooling2D((2, 24, 24), 2, stride=(2,2), padding=(0,0)),
    Convolutional((2, 12, 12), 5, 2),
    Relu(),
    Flatten((2, 8, 8)),
    Dense(2 * 8 * 8, 10),
    Softmax()
]

The above model in my library takes about 10 minutes to train on the entire mnist dataset (for 5 epochs with batch size 1). Which in my opinion is acceptable as this is just an educational library.

@elbiot However, I still need the triple nested for loop. Unfortunately np.unravel_index returns a python tuple for some strange reason instead of a numpy array. Which makes it extremely awkward to work with. Not sure if you have any suggestions on a np.unravel_index function that returns a numpy result for better parallelization?

elbiot t1_j0ivmop wrote on December 16, 2022 at 11:44 PM

Reply to comment by Logon1028 in Efficient Max Pooling Implementation by Logon1028

Yeah, I was just thinking in 1D. Im not at a computer so I can't try anything but roughly what I'm thinking is you have a (H, W, D) array and use stride tricks to get a (H, W, D, wx, wy). If you could get that to be (H, W, D, wx*wy) then argmax could give you a (H, W, D) array of indices. I dunno if you can reshape a strided array or use strides to get the shape in question

rubbledubbletrubble OP t1_j0iib4p wrote on December 16, 2022 at 10:05 PM

Reply to comment by BrotherAmazing in Why does adding a smaller layer between conv and dense layers break the model? by rubbledubbletrubble

The 1000 layer is the softmax layer. I am using a pretrained model and training the classification layers. My logic is to reduce the number of output layers the feature extractor to reduce the number of total parameters.

For example: If mobilenet outputs 1280 and I had a 1000 unit dense layer. The parameters would be 1.28 million. But if I added a 500 unit layer in the middle, it would make the network smaller.

I know the question is bit vague. I was just curious

Logon1028 OP t1_j0ifail wrote on December 16, 2022 at 9:43 PM

Reply to comment by elbiot in Efficient Max Pooling Implementation by Logon1028

Not really. The strided result is a 5 dimensional array. [depth][output_x][output_y][stride_x][stride_y]. I am basically applying an argmax (to the last two axes) then an unravel to get the 2d array of indices for every element in [depth][x][y]. You can see the unravel method (for 2D) on this page...https://numpy.org/doc/stable/reference/generated/numpy.argmax.html

My problem is I basically have a 3d array of strides. And each stride itself is also a 2d array of values. I need to apply the unravel and argmax to every stride (hence the triple nested for loop to iterate over the first 3 dimensions). I don't see how reshaping would allow me to apply a function to the first 3 dimensions more efficiently.

I have been reading to see if I can somehow vectorize the unravel and argmax then apply it to the first 3 dimensions.

M4mb0 t1_j0gwl88 wrote on December 16, 2022 at 3:41 PM

Reply to comment by Outrageous_Room_3167 in I have 6x3090 looking to build a rig by Outrageous_Room_3167

Usually you'd get something like this https://www.supermicro.com/en/products/system/GPU/4U/SYS-420GP-TNR to host many cards, but with FE one I think they are too large. My best guess is to go for a 5H or even 6H mining server chassis.

vade t1_j0gpows wrote on December 16, 2022 at 2:54 PM

Reply to comment by Outrageous_Room_3167 in I have 6x3090 looking to build a rig by Outrageous_Room_3167

It’s fine - we haven’t had a huge issue due to it but we work on video related projects so memory is always a boon. That’s all!

Outrageous_Room_3167 OP t1_j0gpkqs wrote on December 16, 2022 at 2:53 PM

Reply to comment by vade in I have 6x3090 looking to build a rig by Outrageous_Room_3167

Isn't that amount of RAM good enough for the setup? Or you think it's a bottle neck in the training, just new to all of this :)

Outrageous_Room_3167 OP t1_j0gpeu8 wrote on December 16, 2022 at 2:52 PM

Reply to comment by M4mb0 in I have 6x3090 looking to build a rig by Outrageous_Room_3167

The FE is 3 slot, 2 slot blower were too expensive. No, we don't care about noise.

vade t1_j0gp6ha wrote on December 16, 2022 at 2:50 PM

Reply to comment by Outrageous_Room_3167 in I have 6x3090 looking to build a rig by Outrageous_Room_3167

We have a Ryzen 3950 max ram (128gb) - but I’d like to get a system that supports more ram and 16x on all pci slots. But alas - $$$

Outrageous_Room_3167 OP t1_j0gp4cf wrote on December 16, 2022 at 2:50 PM

Reply to comment by 100drunkenhorses in I have 6x3090 looking to build a rig by Outrageous_Room_3167

Yeah we have a ton of extrude here our company does robotics stuff so bunch of tools and parts to build anything.

Outrageous_Room_3167 OP t1_j0goxzp wrote on December 16, 2022 at 2:49 PM

Reply to comment by vade in I have 6x3090 looking to build a rig by Outrageous_Room_3167

>I run 3x 3090 in a single case, without water cooling, but using one PCI riser and keeping the case open to allow for airflow. This is on a single 1600w PSU, no NVLink.

Oh sweet. So maybe we go the route of 2, 3x3090 machines. Might be the better direction. Was considering 2000W PSU & 4x3090. How much memory & what CPU are you using?

Outrageous_Room_3167 OP t1_j0gopig wrote on December 16, 2022 at 2:47 PM

Reply to comment by suflaj in I have 6x3090 looking to build a rig by Outrageous_Room_3167

You know typing it out, i had some doubts LOL thanks

Recent comments in /f/deeplearning