Recent comments in /f/deeplearning
No_Cryptographer9806 t1_j6nfqhq wrote
Reply to Best practice for capping a softmax by neuralbeans
I'm curious, why do you want to do that? You can always post-process the logits, but forcing the network to learn it will harm the underlying representation, IMO.
chatterbox272 t1_j6n3vx6 wrote
Reply to comment by neuralbeans in Best practice for capping a softmax by neuralbeans
My proposed function does that. Let's say you have two outputs, and don't want either to go below 0.25. Your minimum value already adds up to 0.5, so you rescale the softmax to add up to 0.5 as well, giving you a sum of 1 and a valid distribution.
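For concreteness, here is that worked example as a quick PyTorch sketch (the logits are arbitrary illustrative values, not taken from the thread):

```python
import torch
import torch.nn.functional as F

floor = 0.25
logits = torch.tensor([2.0, 0.0])        # arbitrary example logits for two outputs
p = F.softmax(logits, dim=-1)            # ~[0.881, 0.119]
out = floor + (1 - floor * logits.numel()) * p
print(out, out.sum())                    # ~[0.690, 0.310], sums to 1.0, both >= 0.25
```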
nutpeabutter t1_j6n2eaf wrote
Reply to Best practice for capping a softmax by neuralbeans
Taking a leaf out of RL, you can add an additional entropy loss.
Alternatively, clip the logits but apply STE (copy gradients) on backprop
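A minimal PyTorch sketch of both ideas, assuming a classification-style setup (the clip bounds and the entropy weight `beta` are placeholder values):

```python
import torch
import torch.nn.functional as F

def ste_clip(logits, lo=-4.0, hi=4.0):
    # forward pass uses the clipped logits; backward pass copies the
    # gradient straight through as if no clipping had happened (STE)
    return logits + (logits.clamp(lo, hi) - logits).detach()

def loss_with_entropy_bonus(logits, targets, beta=0.01):
    # cross-entropy plus an entropy bonus (as in RL policy losses) that
    # penalizes over-confident, peaky distributions
    log_p = F.log_softmax(logits, dim=-1)
    entropy = -(log_p.exp() * log_p).sum(dim=-1).mean()
    return F.cross_entropy(logits, targets) - beta * entropy
```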
neuralbeans OP t1_j6n0ima wrote
Reply to comment by chatterbox272 in Best practice for capping a softmax by neuralbeans
I want the output to remain a proper distribution.
chatterbox272 t1_j6myph4 wrote
Reply to Best practice for capping a softmax by neuralbeans
If the goal is to keep all predictions above a floor, the easiest way is to make the activation into floor + (1 - floor * num_logits) * softmax(logits). This doesn't have any material impact on the model, but it imposes a floor.
If the goal is to actually change something about how the predictions are made, then adding a floor isn't going to be the solution though. You could modify the activation function some other way (e.g. by scaling the logits, normalising them, etc.), or you could impose a loss penalty for the difference between the logits or the final predictions.
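A minimal sketch of the floored activation described above, assuming PyTorch (the floor value is a placeholder):

```python
import torch
import torch.nn.functional as F

def floored_softmax(logits, floor=0.05):
    # floor + (1 - floor * num_logits) * softmax(logits):
    # every output is >= floor and the outputs still sum to 1,
    # provided floor * num_logits < 1
    num_logits = logits.shape[-1]
    assert floor * num_logits < 1, "floor is too large for this many classes"
    return floor + (1 - floor * num_logits) * F.softmax(logits, dim=-1)
```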
Lankyie t1_j6mjvpy wrote
Reply to comment by neuralbeans in Best practice for capping a softmax by neuralbeans
yeah true, you can get around that by rescaling everything so it sums back to 1, though
emilrocks888 t1_j6mjnk7 wrote
Reply to comment by neuralbeans in Best practice for capping a softmax by neuralbeans
Sorry, autocorrect issue. I meant self-attention (I've edited the previous answer).
neuralbeans OP t1_j6mjhog wrote
Reply to comment by emilrocks888 in Best practice for capping a softmax by neuralbeans
What's this about del attention?
emilrocks888 t1_j6mjf7m wrote
Reply to Best practice for capping a softmax by neuralbeans
I would scale the logits before the softmax, like it's done in self-attention. That scaling in self-attention is actually there to make the final distribution of attention weights smooth.
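A small sketch of that kind of scaling, assuming PyTorch (the 1/sqrt(d) factor is borrowed from attention-score scaling as an illustrative default, not something prescribed in the thread):

```python
import math
import torch
import torch.nn.functional as F

def scaled_softmax(logits, scale=None):
    # divide the logits by a scale factor before the softmax; a larger
    # scale flattens (smooths) the resulting distribution
    if scale is None:
        scale = math.sqrt(logits.shape[-1])  # analogous to 1/sqrt(d_k) in attention
    return F.softmax(logits / scale, dim=-1)
```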
neuralbeans OP t1_j6miw6o wrote
Reply to comment by Lankyie in Best practice for capping a softmax by neuralbeans
It needs to remain a valid softmax distribution.
FastestLearner t1_j6mhjd2 wrote
Reply to Best practice for capping a softmax by neuralbeans
Use composite loss, i.e. add extra terms in the loss function to make the optimizer force the logits to stay within a fixed range.
For example, if current min logit = m and allowed minimum = u, current max logit = n and allowed maximum = v, then the following loss function should help:
Overall loss = CrossEntropy loss + lambda1 * max(u - m, 0) + lambda2 * max(n - v, 0)
The max terms ensure that no loss is added when the logits are all within the allowed range. Use lambda1 and lambda2 to scale each term so that it roughly matches the CE loss in strength.
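A minimal PyTorch sketch of that composite loss (the bounds u, v and the lambda weights are placeholder values):

```python
import torch
import torch.nn.functional as F

def capped_logit_loss(logits, targets, u=-5.0, v=5.0, lambda1=1.0, lambda2=1.0):
    # cross-entropy plus hinge penalties that only activate when the
    # smallest logit drops below u or the largest logit exceeds v
    ce = F.cross_entropy(logits, targets)
    m = logits.min()   # current minimum logit
    n = logits.max()   # current maximum logit
    penalty = lambda1 * torch.clamp(u - m, min=0) + lambda2 * torch.clamp(n - v, min=0)
    return ce + penalty
```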
Lankyie t1_j6mf6pt wrote
Reply to Best practice for capping a softmax by neuralbeans
max[softmax, lowest accepted probability]
neuralbeans OP t1_j6md46u wrote
Reply to comment by like_a_tensor in Best practice for capping a softmax by neuralbeans
That will just make the model learn larger logits to undo the effect of the temperature.
like_a_tensor t1_j6mcv1v wrote
Reply to Best practice for capping a softmax by neuralbeans
I'm not sure how to fix a minimum probability, but you could try softmax with a high temperature.
LiquidDinosaurs69 t1_j6kzkhm wrote
Reply to comment by hugio55 in Hobbyist: desired software to run evolution by hugio55
Check out Lenia artificial life simulator on YouTube. Similar concept to evolution, pretty sick. Might scratch your itch
hugio55 OP t1_j6jies6 wrote
Reply to comment by Extra-most-best in Hobbyist: desired software to run evolution by hugio55
Thanks for these replies - I appreciate it. I have a lot to dig into. I think the barrier of entry may be higher than I had hoped for.
hugio55 OP t1_j6jhyb3 wrote
Reply to comment by LiquidDinosaurs69 in Hobbyist: desired software to run evolution by hugio55
Hey LiquidDinosaur - thanks for this info. I have been hearing about OpenAI quite a bit, so I will dive deep into what they have to offer. I will say that anything C++ (or C, or A through Z for that matter) will be beyond my depth, but that's OK. I still enjoy watching it happen from some of the pros on YouTube.
suflaj t1_j6hfkdj wrote
Reply to comment by XecutionStyle in Why did the original ResNet paper not use dropout? by V1bicycle
> BN is used to reduce covariate shift, it just happened to regularize.
The first part was hypothesized, but never proven. It is a popular belief, like all the other hypotheses about why BN works so well.
> Dropout as a regularizing technique didn't become big before ResNet (2014 vs. 2015).
What does becoming big mean? Dropout was introduced in 2012 and used ever since. It was never big in the sense that you would always use it.
It is certainly false that Dropout was used because of ResNets or immediately after them for CNNs, as the first paper proving that there is benefit in using Dropout for convolutional layers was in 2017: https://link.springer.com/chapter/10.1007/978-3-319-54184-6_12
> I doubt what you're saying is true, that they're effectively the same.
I never said that.
raulkite OP t1_j6h7xth wrote
Reply to comment by Fourstrokeperro in M2 pro vs M2 max by raulkite
Explanation in source link. https://twitter.com/danielgross/status/1619417503561818112.
It's an implementation using the Neural Engine, and it's near half the performance of an A100 when doing inference.
florisjuh t1_j6h1ffh wrote
Reply to comment by Kuchenkiller in How can I start to study Deep learning? by Ill-Sprinkles9588
Probably good to accompany it with a more practical book (or courses) though, such as Sebastian Raschka's Machine Learning with PyTorch and scikit-learn or Francois Chollet's Deep Learning with Python (Keras/Tensorflow). Also I found Dive into Deep Learning https://d2l.ai to be a pretty nice resource to learn about more SOTA deep learning models and techniques.
Autogazer t1_j6h0bi5 wrote
Reply to comment by Severe-Improvement32 in If anyone know answer of my question, please tell me by Severe-Improvement32
That’s not how unsupervised training works. All training requires data, unsupervised just means that the data isn’t labeled.
XecutionStyle t1_j6ggq37 wrote
Reply to comment by suflaj in Why did the original ResNet paper not use dropout? by V1bicycle
BN is used to reduce covariate shift, it just happened to regularize. Dropout as a regularizing technique didn't become big before ResNet (2014 vs. 2015).
I doubt what you're saying is true, that they're effectively the same. Try putting one right after the other to see the effect. Two dropout layers or two BN layers, in contrast, have no problem coexisting.
Edit: sorry, what I mean is that the variants of dropout that work with CNNs (the ones without detrimental effects) didn't exist back then.
neuralbeans OP t1_j6nmccc wrote
Reply to comment by No_Cryptographer9806 in Best practice for capping a softmax by neuralbeans
It's for reinforcement learning to keep the model exploring possibilities.