Recent comments in /f/deeplearning

kingfung1120 t1_irmd87k wrote

Hi, I am still quite new to data science. This is the first time I have seen someone use information theory to measure whether a neural network has a suitable number of parameters.

Do you mind sharing more? Like a reference, or some examples. I would love to know more about this. Thank you!

1

DrXaos t1_irl3dr7 wrote

It's information theory. If the prior is uniform across the 100 classes (i.e. 1/100, the worst case), it hypothetically takes -log2(p) = log2(100) bits to specify one actual label. Imagine there were 64 labels: then the explicit encoding is obvious, 6 bits. Information theory still works without an explicit physical encoding in the appropriate limit. If the priors are non-uniform it's even lower. There are 6865 examples. That's all the independent information about the labels that exists.

If you were to write out all the labels in a file, it could be compressed to no fewer than ~45.5k bits if their probability distribution were uniform. So with hypothetically 45.5k bits of arbitrary free parameters you could memorize the labels. Of course, in practice there are modeling constraints and regularization, so this doesn't happen at quite that level, but it should give you some pause. I know there are non-classical statistical behaviors with big models, like double descent, but I'm not sure we're there in this problem.
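The arithmetic is easy to sanity-check (the class count and sample size are the ones quoted in this thread):

```python
import math

n_classes = 100    # uniform prior assumed, worst case
n_examples = 6865

bits_per_label = math.log2(n_classes)     # ~6.64 bits per label
total_bits = n_examples * bits_per_label  # ~45.5k bits for all labels

print(f"{bits_per_label:.2f} bits/label, {total_bits / 1000:.1f}k bits total")
```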

I think you may be trying to do too much blind modeling without thinking. If you had to classify or cluster the signals by eyeball, what would you look at? Can you start with a linear model? What features would you put in for that? If you're doing something like the MFCC from 'librosa' (as in the YouTube video), there are all sorts of complex time-domain and frequency-domain signal-processing parameters in there that will strongly influence the results; I would concentrate on those foremost.

As a first cut, instead of going directly to a high-parameter classifier that requires iterative stochastic training, I would use a preliminary but fast-to-compute and (almost) deterministically optimizable criterion to help choose your input space and signal-processing parameters. What about clustering? If you had to do simple clustering in a Euclidean input space, what space would you use? You could literally program this and measure performance: how many observations are closer to their own class centroid than to someone else's centroid? Or just measure the distance to the correct centroid. Can you optimize to get good performance on that? Once you do, a high-effort complex classifier like a deep net would have a good head start and would help push performance further.
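The centroid check described above is only a few lines of numpy (a sketch, not a definitive recipe; `X` as a feature matrix and `y` as labels are assumed names, not from the thread):

```python
import numpy as np

def nearest_centroid_accuracy(X, y):
    """Fraction of observations that are closer (Euclidean) to their own
    class centroid than to any other class's centroid."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    # distance from every observation to every class centroid
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    preds = classes[d.argmin(axis=1)]
    return float((preds == y).mean())
```

If this number is already decent in some candidate feature space, that space is a promising input for the heavier classifier.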

Or even what would a Naive Bayes model look like? Can you make/select features for that?
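For concreteness, a Gaussian Naive Bayes over per-feature means and variances is small enough to write by hand (a hypothetical sketch with assumed names, not a tuned implementation):

```python
import numpy as np

def gaussian_nb_fit(X, y):
    """Per-class feature means, variances, and log priors."""
    classes = np.unique(y)
    mu = np.stack([X[y == c].mean(axis=0) for c in classes])
    var = np.stack([X[y == c].var(axis=0) + 1e-9 for c in classes])
    logprior = np.log([np.mean(y == c) for c in classes])
    return classes, mu, var, logprior

def gaussian_nb_predict(X, classes, mu, var, logprior):
    # log N(x | mu, var), summed over (assumed independent) features
    ll = -0.5 * (np.log(2 * np.pi * var)[None] +
                 (X[:, None, :] - mu[None]) ** 2 / var[None]).sum(axis=2)
    return classes[(ll + logprior).argmax(axis=1)]
```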

Also, one big consideration: often in audio classification there is a time-translation invariance, in that the exact moment of onset isn't a physically important parameter, akin to image classification with 2-D x-y spatial translation invariance. If that's true, then you could do lots of augmentation, making more signals of the same class by applying translation operators to your train set.
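Assuming that invariance really holds, one cheap translation operator is to roll each example along its time axis (a sketch; the (time, n_coeffs) layout of the MFCC matrix is an assumption):

```python
import numpy as np

def time_shift(mfcc, max_shift, rng=None):
    """Roll an MFCC matrix of shape (time, n_coeffs) along the time axis.
    Only valid if the class label doesn't depend on when the event starts."""
    if rng is None:
        rng = np.random.default_rng()
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(mfcc, shift, axis=0)
```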

Also consider performance measures other than 0/1 accuracy. Is that 'top-1' accuracy? If the background accuracy is 0.01 (a 1/100 chance of getting it right), then 0.2 might be considered good.

The no-information baseline is a score proportional to the prior probabilities, or maybe the log-odds thereof. Measure lift above that.
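Concretely, with a flat 1/100 prior the no-information top-1 baseline is 0.01, so an accuracy of 0.2 is a 20x lift (a sketch; `priors` is assumed to hold the empirical class frequencies):

```python
import numpy as np

def lift_over_prior(accuracy, priors):
    """Lift of top-1 accuracy over the best no-information strategy,
    which is to always predict the most frequent class."""
    baseline = float(np.max(priors))
    return accuracy / baseline
```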

2

DrXaos t1_irjz674 wrote

log2(100) is about 6.64, and with 6865 samples that's ~45.5k bits needed to fully encode/memorize the labels. You have far more than that in the effective number of bits in the free parameters. 25 million parameters? I train binary-classification models with 5000 params on a million observations.

You need some feature engineering and simplification of the model.

Are you doing something like this? https://en.wikipedia.org/wiki/Mel-frequency_cepstrum

Your frequency grid might be far too fine, and you may need some windowing/filtering first. What's the structure of the (1723, 13) input?

Given this is there some sort of informed unsupervised transformation to lower dimensionality you could use before the supervised classifier?
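One informed unsupervised option is a PCA projection computed via SVD before the supervised classifier (a numpy-only sketch; `X` of shape (n_samples, n_features), i.e. your flattened features, is an assumption):

```python
import numpy as np

def pca_project(X, k):
    """Project rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    # economy SVD: rows of Vt are principal directions, ordered by variance
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```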

What you're seeing is the limits of purely blind statistical modeling. Since your dataset isn't so big, you'll have to build in some priors about the underlying 'physics', either through preprocessing or through the structure of your model.

2

kingfung1120 t1_iripkba wrote

I haven’t handled audio data before, but it seems like you are flattening data of shape [1723, 13] into a vector (correct me if I am wrong). That is definitely going to affect the information the model can learn, since the data is sequential and 2-D.

Unfortunately, I haven’t studied or read anything related to deep learning on audio data, so I can’t give you a more in-depth opinion. But based on my understanding, using a CNN or anything recurrent should improve performance more than fine-tuning an MLP.

2

chatterbox272 t1_irifzyq wrote

Your model is a teeny-tiny MLP and your dataset is relatively small; it's entirely possible that you're unable to extract rich enough information to do better than 70% on the val set.

You also haven't mentioned how much L2 or Dropout you're using, nor how they do on their own. Both of those methods come with their own hyperparameters which need to be tuned.

4

perfopt OP t1_iridohw wrote

Thank you for the response. I am splitting my data into train and validation sets. Do you mean another set for testing?

The baseline is overfitting: train accuracy is really high and val accuracy is much lower. That is why I added L2+Dropout.

Since the validation accuracy is still very low (52%) should I not focus on improving that?

1

GPUaccelerated t1_ireeqet wrote

It’s definitely worth the test! Have some fun and play around with TensorFlow. Once you have the 3 cards set up to work on the same job, test it, and compare your results to running individual jobs. I personally think they’ll do better alone, but you should check it out for yourself. :) Have fun!

1