Recent comments in /f/deeplearning
DrXaos t1_irl3dr7 wrote
Reply to comment by perfopt in Help regularization and dropout are hurting accuracy by perfopt
It's information theory. If the prior is uniform across the 100 classes (i.e. p = 1/100, the worst case), it takes -log2(p) = log2(100) bits to specify one actual label. Imagine there were 64 labels; then the explicit encoding is obvious: 6 bits. Information theory still works without an explicit physical encoding in the appropriate limit. If the priors are non-uniform, the entropy is even lower. There are 6865 examples. That's all the independent information about the labels that exists.
If you were to write out all the labels in a file, it could be compressed to no fewer than 45.5k bits if their probability distribution were uniform. So with roughly 45.5k bits of arbitrary free parameters you could memorize the labels outright. Of course, in real modeling there are practical constraints and regularization, so this doesn't happen at full strength, but it should give you some pause. I know there are non-classical statistical behaviors with big models, like double descent, but I'm not sure we're in that regime on this problem.
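A quick sanity check of that bound, in plain Python (nothing here is model-specific):

import math

num_classes = 100
num_examples = 6865

# Entropy per label under a uniform prior over 100 classes.
bits_per_label = math.log2(num_classes)       # ~6.64 bits

# Total bits needed to memorize every training label.
total_bits = num_examples * bits_per_label    # ~45.6k bits, the ~45.5k figure above
print(f"{bits_per_label:.2f} bits/label, {total_bits:.0f} bits total")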
I think you may be trying to do too much blind modeling without thinking about the signal. If you had to classify or cluster the signals by eyeball, what would you look at? Can you start with a linear model, and what features would you put into it? If you're doing something like the MFCC from 'librosa' (as in the YouTube video), there are all sorts of complex time-domain and frequency-domain signal-processing parameters in there that will strongly influence the results; I would concentrate on those foremost. As a first cut, instead of going directly to a high-parameter classifier that requires iterative stochastic training, I would use a preliminary criterion that is fast to compute and (almost) deterministically optimizable to help choose your input space and signal-processing parameters. What about clustering? If you had to do simple clustering in a Euclidean input space, you could literally program this and measure performance: how many observations are closer to their own class centroid than to some other class's centroid? (Or just measure the distance to the correct centroid when it isn't the nearest.) What space would you use, and can you optimize it to get good performance there? Once you do that, a high-effort complex classifier like a deep net would have a good head start and could push performance further.
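A minimal sketch of that nearest-centroid check with scikit-learn; the synthetic X and y are stand-ins for your own flattened features and labels:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NearestCentroid

# Stand-in data: swap in your own (n_samples, n_features) matrix and labels.
X, y = make_classification(n_samples=6865, n_features=64, n_informative=32,
                           n_classes=100, n_clusters_per_class=1, random_state=0)

# How often is an observation closest to its own class centroid?
clf = NearestCentroid(metric="euclidean")
scores = cross_val_score(clf, X, y, cv=5)
print(f"nearest-centroid accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")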
Or even: what would a Naive Bayes model look like? Can you make or select features for that?
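That baseline is a couple of lines too (reusing the stand-in X and y from the sketch above):

from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Gaussian Naive Bayes: cheap, deterministic, no iterative training to babysit.
nb = GaussianNB()
print(f"naive Bayes accuracy: {cross_val_score(nb, X, y, cv=5).mean():.3f}")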
Also, one big consideration: audio classification often has a time-translation invariance, in that the exact moment a sound starts isn't a physically important parameter, much as image classification has 2-D x-y translational invariance. If that's true here, you could do lots of augmentation, making more signals of the same class by applying translation operators to your training set.
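One hedged way to implement that on raw 1-D signals (zero-padding the gap; the clip length and shift range below are made up):

import numpy as np

def time_shift(signal: np.ndarray, max_shift: int, rng: np.random.Generator) -> np.ndarray:
    """Shift a 1-D signal in time by a random offset, zero-padding the gap."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(signal)
    if shift > 0:
        shifted[shift:] = signal[:-shift]
    elif shift < 0:
        shifted[:shift] = signal[-shift:]
    else:
        shifted = signal.copy()
    return shifted

# Example: five shifted copies of one (synthetic) one-second clip at 16 kHz.
rng = np.random.default_rng(0)
clip = rng.standard_normal(16000)
augmented = [time_shift(clip, max_shift=1600, rng=rng) for _ in range(5)]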
Also consider performance measures other than 0/1 accuracy. Is that 'top-1' accuracy? If the background accuracy is 0.01 (a 1/100 chance of getting it right), then 0.2 might be considered good.
The no-information baseline is a score proportional to the prior probabilities, or maybe the log-odds thereof. Measure lift above that.
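Concretely, assuming roughly uniform priors as above:

num_classes = 100
baseline = 1.0 / num_classes      # no-information top-1 accuracy: 0.01
model_top1 = 0.20                 # the 0.2 accuracy mentioned above

# Lift: how many times better than chance the model is.
print(f"lift over baseline: {model_top1 / baseline:.0f}x")   # 20x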
perfopt OP t1_irkuz3j wrote
Reply to comment by DrXaos in Help regularization and dropout are hurting accuracy by perfopt
I don’t follow the computation. How do you get 45.5k bits?
I tried a model with [512, 512, 512] units in each layer and it performed very poorly, < 0.2 accuracy.
DrXaos t1_irjz674 wrote
log2(100) is about 6.64, and with 6865 samples that's roughly 45.5K bits needed to fully encode/memorize the labels. You have far more than that in the effective number of bits in your free parameters. 25 million parameters? I train binary classifiers with 5000 parameters on a million observations.
You need some feature engineering and simplification of the model.
Are you doing something like this? https://en.wikipedia.org/wiki/Mel-frequency_cepstrum
Your frequency grid might be far too fine, and you may need some windowing/filtering preprocessing first. What's the structure of the (1723, 13) input?
Given this is there some sort of informed unsupervised transformation to lower dimensionality you could use before the supervised classifier?
What you're seeing is the limits of purely blind statistical modeling. Since your dataset isn't that big, you'll have to build in some priors about the underlying 'physics', either through preprocessing or through the structure of your model.
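For example, an unsupervised reduction like PCA on the flattened frames could shrink the input before the supervised classifier sees it; a sketch with made-up stand-in data of your (1723, 13) shape:

import numpy as np
from sklearn.decomposition import PCA

# Stand-in for a few hundred examples of shape (1723, 13); use your real MFCCs.
rng = np.random.default_rng(0)
X_raw = rng.standard_normal((500, 1723, 13))
X_flat = X_raw.reshape(len(X_raw), -1)        # (500, 22399)

# Project 22399 raw dimensions down to 128 unsupervised components.
pca = PCA(n_components=128, svd_solver="randomized", random_state=0)
X_low = pca.fit_transform(X_flat)
print(X_low.shape, f"variance kept: {pca.explained_variance_ratio_.sum():.2f}")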
[deleted] t1_irjkoem wrote
[deleted]
perfopt OP t1_iritkyy wrote
Reply to comment by kingfung1120 in Help regularization and dropout are hurting accuracy by perfopt
Certainly. I've got to travel for a couple of days, but Tuesday after work I'll be back on this.
kingfung1120 t1_irirqor wrote
Reply to comment by perfopt in Help regularization and dropout are hurting accuracy by perfopt
Look forward to receiving updates from you ;)
perfopt OP t1_iriqicz wrote
Reply to comment by kingfung1120 in Help regularization and dropout are hurting accuracy by perfopt
Yes, you are correct. I am flattening data of shape (1723, 13).
I will try out a CNN as well.
kingfung1120 t1_iripkba wrote
Reply to comment by perfopt in Help regularization and dropout are hurting accuracy by perfopt
I haven’t handled audio data before, but it seems like you are flattening data of shape [1723, 13] into a vector (correct me if I am wrong), which is definitely going to lose information the model could learn from, since the data is sequential and 2-D.
Unfortunately, I haven’t studied or read anything on deep learning for audio, so I can't give you a more in-depth opinion. But based on my understanding, a CNN or something recurrent should improve performance more than fine-tuning an MLP; see the sketch below.
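A minimal, untuned example of what that could look like in Keras, treating the 1723 frames as time steps over 13 MFCC channels (layer sizes are illustrative):

import tensorflow as tf

# 1-D convolutions over the time axis of the (1723, 13) MFCC sequence.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1723, 13)),
    tf.keras.layers.Conv1D(32, kernel_size=9, activation="relu"),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(64, kernel_size=9, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(100, activation="softmax"),   # assuming 100 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()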
[deleted] t1_iripikc wrote
[deleted]
perfopt OP t1_iril212 wrote
Reply to comment by kingfung1120 in Help regularization and dropout are hurting accuracy by perfopt
The data is MFCCs created from audio files, sort of like this: https://www.youtube.com/watch?v=szyGiObZymo
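For reference, the usual librosa recipe for features of this shape (the file name is a placeholder):

import librosa

# Load audio and compute 13 MFCCs per frame; "example.wav" is hypothetical.
y, sr = librosa.load("example.wav", sr=22050)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape (13, n_frames)
mfcc = mfcc.T                                        # (n_frames, 13), as above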
kingfung1120 t1_iriks76 wrote
What type of data are you inputting into the model?
perfopt OP t1_irij7vh wrote
Reply to comment by manuLearning in Help regularization and dropout are hurting accuracy by perfopt
I tried that as well, with similar results when adding L2 + dropout.
manuLearning t1_irij2hl wrote
Reply to comment by perfopt in Help regularization and dropout are hurting accuracy by perfopt
A rule of thumb is to use around 30% as the val set.
perfopt OP t1_irig9zc wrote
Reply to comment by chatterbox272 in Help regularization and dropout are hurting accuracy by perfopt
I see. I’ll try increasing the amount of data used. My fear is that it may leave some categories with much less data than others.
L2 = 0.001 and dropout = 0.1.
chatterbox272 t1_irifzyq wrote
Your model is a teeny-tiny MLP and your dataset is relatively small; it's entirely possible that you're unable to extract rich enough information to do better than 70% on the val set.
You also haven't mentioned how much L2 or dropout you're using, nor how each does on its own. Both methods come with their own hyperparameters, which need to be tuned.
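A crude joint sweep over the two, sketched in Keras; build_model and the train/val arrays are hypothetical placeholders for your own setup:

import itertools
import tensorflow as tf

def build_model(l2: float, dropout: float) -> tf.keras.Model:
    """Hypothetical small MLP with both regularization knobs exposed."""
    reg = tf.keras.regularizers.l2(l2)
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(22399,)),   # flattened (1723, 13), assumed
        tf.keras.layers.Dense(256, activation="relu", kernel_regularizer=reg),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(100, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# The two knobs interact, so sweep them together, not one at a time.
for l2, dropout in itertools.product([1e-4, 1e-3, 1e-2], [0.0, 0.1, 0.3, 0.5]):
    model = build_model(l2, dropout)
    hist = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                     epochs=30, verbose=0)       # x_train etc.: your own splits
    print(l2, dropout, max(hist.history["val_accuracy"]))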
perfopt OP t1_iriens4 wrote
Reply to comment by manuLearning in Help regularization and dropout are hurting accuracy by perfopt
For creating the test and val sets I used train_test_split from sklearn.
I'll manually examine it.
But in general shouldn't the distribution be OK?
inputs_train, inputs_test, targets_train, targets_test = train_test_split(inputs, targets, test_size=0.1)
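One thing worth noting: train_test_split doesn't stratify by default, and with 100 classes a random 10% split can under-represent the rare ones. A stratified variant, same variables as above:

from sklearn.model_selection import train_test_split

# stratify=targets keeps per-class proportions roughly equal in both splits.
inputs_train, inputs_test, targets_train, targets_test = train_test_split(
    inputs, targets, test_size=0.1, stratify=targets, random_state=0)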
manuLearning t1_irie761 wrote
Reply to comment by perfopt in Help regularization and dropout are hurting accuracy by perfopt
I've always had good experiences with dropout. Try putting a dropout layer of around 0.75 after your first layer and one dropout layer before your last layer. You can also put a light 0.15 dropout before your first layer.
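In Keras that placement would look something like this (layer widths, the input shape, and the 0.5 rate are assumptions):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(22399,)),   # flattened (1723, 13), assumed
    tf.keras.layers.Dropout(0.15),           # light dropout on the inputs
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dropout(0.75),           # heavy dropout after the first layer
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dropout(0.5),            # dropout before the last layer (rate a guess)
    tf.keras.layers.Dense(100, activation="softmax"),
])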
How similar is the test and val set?
perfopt OP t1_iridohw wrote
Reply to comment by jellyfishwhisperer in Help regularization and dropout are hurting accuracy by perfopt
Thank you for the response. I am breaking my data into train and validation sets. Do you mean another set for testing?
The baseline is overfitting: training accuracy is really high and validation accuracy is much lower. That is why I added L2 + dropout.
Since the validation accuracy is still very low (52%), shouldn't I focus on improving that?
jellyfishwhisperer t1_iriddyc wrote
Regularization and dropout help with overfitting. They will almost always reduce your training accuracy. What you need is a test dataset; compare performance there.
Knurpel t1_ireu5b5 wrote
Reply to comment by GPUaccelerated in Deeplearning and multi-gpu or not by ronaldxd2
Oops. I'm only familiar with the 90.
GPUaccelerated t1_ireeqet wrote
Reply to Deeplearning and multi-gpu or not by ronaldxd2
It’s definitely worth a test! Have some fun and play around with TensorFlow. Once you have the three cards set up to work on the same job, test it. Compare your results to running individual jobs. I personally think they'll do better alone, but you should check it out for yourself. :) Have fun!
GPUaccelerated t1_ireefhv wrote
Reply to comment by Knurpel in Deeplearning and multi-gpu or not by ronaldxd2
The 3070 Ti and 3080 Ti do not support NVLink.
Knurpel t1_ird8vtd wrote
Reply to comment by MyActualUserName99 in Deeplearning and multi-gpu or not by ronaldxd2
>extremely easy
... depends on your coding proficiency. As I said, "it's not as easy as sticking in another GPU."
MyActualUserName99 t1_irapufa wrote
Reply to comment by Knurpel in Deeplearning and multi-gpu or not by ronaldxd2
If you’re using TensorFlow, adding multiple GPUs is extremely easy. You just have to call a few functions and create a strategy:
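Something along these lines with MirroredStrategy (a generic sketch, not the poster's exact code):

import tensorflow as tf

# MirroredStrategy replicates the model across all visible GPUs
# and averages the gradients for you.
strategy = tf.distribute.MirroredStrategy()
print(f"devices in sync: {strategy.num_replicas_in_sync}")

with strategy.scope():
    # Build and compile the model inside the strategy's scope.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# model.fit(...) then trains data-parallel across the GPUs.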
kingfung1120 t1_irmd87k wrote
Reply to comment by DrXaos in Help regularization and dropout are hurting accuracy by perfopt
Hi, I am still quite new to data science, and this is the first time I've seen someone use information theory to gauge whether a neural network has a suitable number of parameters.
Do you mind sharing more, like a reference or some examples? I would love to know more about this. Thank you!