Recent comments in /f/deeplearning
BobDope t1_iqwqnaw wrote
JOMAMA Classification Model 192
beingsubmitted t1_iqvs8ix wrote
Reply to comment by camaradorjk in What is the best audio classification model to use? by camaradorjk
All you need to know over time is the pitch being played, which is a frequency. The audio file represents a waveform, and all you need to know is the frequency of that waveform over time. There's no need for anything sequential. 440Hz is "A" no matter where it comes in a sequence. It's A if it comes after C, and it's A if it comes after F#.
A sequential model might be useful for natural language, for example, because meaning is carried between words. "Very tall" and "not tall" are different things. "He was quite tall" and "No one ever accused him of not being tall" are remarkably similar things. Transcribing music is just charting the frequency over time.
That said, you cannot get a frequency from a single data point, so there is a somewhat sequential nature to things, but it's really just that you need to transform the positions of the waveform over time into frequencies, which the Fourier transform does. When music visualizations show you an EQ (equalizer) chart to go with your music, this is what they're doing - showing you how much of each frequency is present at a given moment in the music, using an FFT. A digital equalizer similarly transforms audio into a frequency spectrum, lets you adjust that spectrum, and then transforms it back into a waveform.
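For instance, here's a minimal sketch of reading a pitch off a waveform with numpy's FFT (the 440 Hz tone is synthetic stand-in data, not a real recording):

```python
import numpy as np

sample_rate = 44100                       # samples per second
t = np.arange(sample_rate) / sample_rate  # one second of timestamps
wave = np.sin(2 * np.pi * 440.0 * t)      # a pure 440 Hz "A"

spectrum = np.abs(np.fft.rfft(wave))                  # magnitude per frequency bin
freqs = np.fft.rfftfreq(len(wave), 1 / sample_rate)   # bin centers in Hz
print(freqs[np.argmax(spectrum)])                     # ~440.0
```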
camaradorjk OP t1_iqvmzun wrote
Reply to comment by camaradorjk in What is the best audio classification model to use? by camaradorjk
Not really the music sheets, just the musical notes. I will display the proper finger position on a flute based on the musical notes predicted. I'm trying to create a learning tool.
camaradorjk OP t1_iqvje9f wrote
Reply to comment by beingsubmitted in What is the best audio classification model to use? by camaradorjk
Thank you so much for taking the time to answer my question. You're right on the first one: my goal is to transcribe music, or flute music, into notes. But I'm a little confused about why there's no need for a deep learning model, because I initially thought I could also use sequential models. Could you elaborate on that for me? Thank you so much.
PS: I will surely look into your recommendation about FFT.
beingsubmitted t1_iqvdo7x wrote
I'm a little unclear - there are three different things you might be trying to do here. The first is transcription: taking an audio file and interpreting it into notes. That wouldn't typically require deep learning on its own, just a Fourier transform. The second is isolating a specific instrument in an ensemble - finding just the recorder in a collection of different instruments all playing different things. The third is generation: inferring unplayed future notes from previous notes.
Are you wanting to transcribe, isolate, generate, or some combination?
I'm thinking you're wanting to transcribe. If that's the case, FFT (fast fourier transform) would be the algo to choose. If you google "FFT music transcription" you'll get a lot of info. https://ryan-mah.com/files/posts.amt_part_2.main.pdf
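As a hedged sketch of the frame-by-frame idea (the frame and hop sizes here are illustrative choices, not taken from the linked paper):

```python
import numpy as np

def rough_transcription(wave, sample_rate, frame_len=4096, hop=2048):
    """Return the dominant frequency (Hz) in each short frame of audio."""
    freqs = np.fft.rfftfreq(frame_len, 1 / sample_rate)
    pitches = []
    for start in range(0, len(wave) - frame_len, hop):
        # window each frame to reduce spectral leakage, then FFT it
        frame = wave[start:start + frame_len] * np.hanning(frame_len)
        spectrum = np.abs(np.fft.rfft(frame))
        pitches.append(freqs[np.argmax(spectrum)])
    return pitches

# e.g. rough_transcription(wave, 44100) on a mono float array
```

Real transcription systems do more (onset detection, harmonics vs. fundamentals), but charting the peak frequency per frame is the core of it.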
Iamhandsomesorry t1_iqv8tt6 wrote
Ligma Classification Network
PleaseKillMeNowOkay OP t1_iqufvaa wrote
Reply to comment by sydjashim in Neural network that models a probability distribution by PleaseKillMeNowOkay
This seems interesting. I'll give this a shot. Thanks!
sydjashim t1_ique162 wrote
Reply to comment by PleaseKillMeNowOkay in Neural network that models a probability distribution by PleaseKillMeNowOkay
I have a quick guess here that may be of help to you: take the weights of the first n-1 layers of your trained model, then try fine-tuning with the 4 outputs and observe whether your validation loss improves.
If so, you can then take the untrained initial weights of your first model (up to the (n-1)th layer) and train them from scratch with 4 outputs. That way you have a model trained from scratch on the 4-output task, but both models start from the same initial weights.
Why am I saying this?
Because you want to keep as many of the parameters, esp. the model weights, the same while running the comparison between the two models.
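A minimal PyTorch sketch of the idea (the layer sizes are made up, since OP's exact architecture isn't stated; the point is copying everything except the output layer):

```python
import torch.nn as nn

def make_model(n_outputs):
    # same hidden layers in both models; only the head differs
    return nn.Sequential(
        nn.Linear(30, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, n_outputs),
    )

model_2out = make_model(2)   # stands in for the already-trained model
model_4out = make_model(4)   # same body, new 4-output head

# Copy every layer's weights except the final Linear (key prefix "4."
# in this Sequential), then fine-tune model_4out on the 4-output task.
body_weights = {k: v for k, v in model_2out.state_dict().items()
                if not k.startswith("4.")}
model_4out.load_state_dict(body_weights, strict=False)
```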
PleaseKillMeNowOkay OP t1_iqu4rs2 wrote
Reply to comment by sydjashim in Neural network that models a probability distribution by PleaseKillMeNowOkay
Same initialization but not the exact weights. However, I've run the experiments enough times with the same result for me to be sure that the initial weights aren't an issue.
Chigaijin t1_iqu05wy wrote
Reply to comment by Icy-Put177 in New Laptop for Deep/Machine Learning by MyActualUserName99
Razer makes them with input from Lambda Labs, a deep learning company that hosts cloud GPUs and supports onsite builds as well. Lambda provides support and has been very helpful the few times I've needed to reach them. The base model (Linux) is $3.5k with a 1-year warranty, $4.1k for the same with a 2-year warranty, and $5k for a dual Linux/Windows machine with a 3-year warranty. All machines have the same specs, so it's really the support/warranty you're paying for.
thebear96 t1_iqtxrnx wrote
Reply to comment by sydjashim in Neural network that models a probability distribution by PleaseKillMeNowOkay
Well, I assumed the network had more layers and therefore more parameters. More parameters can represent the data better and faster. For example, if you had a dataset with 30 features, a linear layer with 64 neurons should be able to represent each data point more easily than, say, a linear layer with 16 neurons. That's why I thought the model would converge quicker. But in OP's case the hidden layers are the same; only the output layer has more neurons. In that case we won't get quicker convergence.
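To make that concrete, here's a quick sketch counting parameters for the two heads (the 30-feature / 64-neuron sizes are just the example above; only the output layer differs):

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

two_out  = nn.Sequential(nn.Linear(30, 64), nn.ReLU(), nn.Linear(64, 2))
four_out = nn.Sequential(nn.Linear(30, 64), nn.ReLU(), nn.Linear(64, 4))
print(n_params(two_out), n_params(four_out))  # 2114 vs 2244
```

The two extra output neurons add only 130 parameters here, so capacity is essentially unchanged.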
sydjashim t1_iqtuqdt wrote
Reply to comment by thebear96 in Neural network that models a probability distribution by PleaseKillMeNowOkay
Can you explain why the model would converge quicker?
sydjashim t1_iqtu9hp wrote
Did you keep the same initial weights for both networks?
PleaseKillMeNowOkay OP t1_iqscxo9 wrote
Reply to comment by SimulatedAnnealing in Neural network that models a probability distribution by PleaseKillMeNowOkay
The simpler model had a lower training loss with the same number of epochs. I tried training the second model until it had the same training loss as the first model, which took much longer. The validation loss did not improve and had a slight upward trend, which I understand means it's overfitting.
SimulatedAnnealing t1_iqs94b6 wrote
Reply to comment by PleaseKillMeNowOkay in Neural network that models a probability distribution by PleaseKillMeNowOkay
The most plausible explanation is overfitting. How do they compare in terms of error on the training set?
Best_Definition_4385 t1_iqr6aca wrote
Reply to comment by Chigaijin in New Laptop for Deep/Machine Learning by MyActualUserName99
>The Tensorbook is only $3500 unless you're looking at the dual boot model
This is not a comment about you, it's just a general comment. Considering that setting up a dual-boot system takes minimal time and expertise, if someone decides to spend $500 for a dual-boot model, do you really think they have the computer skills that would call for a powerful laptop? I mean, they can't figure out how to dual boot, but they want to do deep learning? lmao
Best_Definition_4385 t1_iqr5bxp wrote
Something tells me you don't do deep learning yet
thebear96 t1_iqr04o9 wrote
Reply to comment by PleaseKillMeNowOkay in Neural network that models a probability distribution by PleaseKillMeNowOkay
Ideally it should. In that case the second architecture will perform worse, and you'll have to note that when you compare them. But since it's pretty much expected that the second architecture won't perform as well as the first, I'm not sure there's much use in comparing. It's definitely doable, though.
PleaseKillMeNowOkay OP t1_iqqz3lp wrote
Reply to comment by thebear96 in Neural network that models a probability distribution by PleaseKillMeNowOkay
I could add more linear layers, and based on my experiments it would probably help, but my intention is to compare my new model with the old one, so I presume the architectures should be as close as possible.
thebear96 t1_iqqykoz wrote
Reply to comment by PleaseKillMeNowOkay in Neural network that models a probability distribution by PleaseKillMeNowOkay
That shouldn't create a lot of difference, but yes, the performance should be worse than the first network in that case. It's far easier to predict two outputs than four. You can try adding linear layers and using a lower learning rate to see if the model improves.
PleaseKillMeNowOkay OP t1_iqqxw6h wrote
Reply to comment by thebear96 in Neural network that models a probability distribution by PleaseKillMeNowOkay
I wouldn't call it a bigger network necessarily. The second network has two more output neurons compared to the first; the rest is the same. How much difference that makes, I don't know.
thebear96 t1_iqqxkb4 wrote
Reply to comment by PleaseKillMeNowOkay in Neural network that models a probability distribution by PleaseKillMeNowOkay
That's strange. It could be a data quantity issue. Bigger networks typically will need more data to perform well.
PleaseKillMeNowOkay OP t1_iqqxd7o wrote
Reply to comment by thebear96 in Neural network that models a probability distribution by PleaseKillMeNowOkay
Yes, I trained until the validation loss stopped improving, and then some more just to make sure.
mr_birrd t1_iqxl8ol wrote
Reply to What is the best audio classification model to use? by camaradorjk
If it's only a pure recorder, and since it can only play one note at a time, you could find the frequency using the Fourier transform. It's best to apply denoising first. Then you can just map the loudest frequencies to actual notes.
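A minimal sketch of that frequency-to-note mapping, assuming standard A4 = 440 Hz tuning:

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(freq_hz):
    """Map a frequency in Hz to the nearest note name (A4 = 440 Hz)."""
    semitones = round(12 * math.log2(freq_hz / 440.0))  # distance from A4
    midi = 69 + semitones                               # A4 is MIDI note 69
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

print(freq_to_note(440.0))   # -> A4
print(freq_to_note(523.25))  # -> C5
```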