Recent comments in /f/deeplearning

chatterbox272 t1_j6myph4 wrote

If the goal is to keep all predictions above a floor, the easiest way is to change the activation to floor + (1 - floor * num_logits) * softmax(logits). The outputs still sum to 1, so this has no material impact on the model, but it does impose the floor.
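
A minimal PyTorch sketch of that rescaled activation (the floor value and tensor shapes below are just for illustration):

```python
import torch
import torch.nn.functional as F

def floored_softmax(logits: torch.Tensor, floor: float = 0.01) -> torch.Tensor:
    """Softmax rescaled so every class probability is at least `floor`.

    The outputs still sum to 1: num_logits * floor + (1 - floor * num_logits) * 1 = 1.
    """
    num_logits = logits.size(-1)
    assert floor * num_logits < 1, "floor is too large for this many classes"
    return floor + (1 - floor * num_logits) * F.softmax(logits, dim=-1)

# Example: 4-class logits; every probability ends up >= 0.01 and each row sums to 1
probs = floored_softmax(torch.randn(2, 4), floor=0.01)
print(probs.min(), probs.sum(dim=-1))
```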

If the goal is to actually change something about how the predictions are made, though, then adding a floor isn't going to be the solution. You could modify the activation function some other way (e.g. by scaling or normalising the logits), or you could impose a loss penalty on the differences between the logits or between the final predictions.

1

FastestLearner t1_j6mhjd2 wrote

Use a composite loss, i.e. add extra terms to the loss function so that the optimizer forces the logits to stay within a fixed range.

For example, if current min logit = m and allowed minimum = u, current max logit = n and allowed maximum = v, then the following loss function should help:

Overall loss = CrossEntropy loss + lambda1 * max(u - m, 0) + lambda2 * max(n - v, 0)

The max terms ensure that no loss is added when the logits are all within the allowed range. Use lambda1 and lambda2 to scale each term so that they roughly match the CE loss in strength.
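
A rough PyTorch sketch of this composite loss (the bounds, lambda values, and tensor shapes are placeholders):

```python
import torch
import torch.nn.functional as F

def range_penalized_loss(logits, targets, u=-10.0, v=10.0, lambda1=1.0, lambda2=1.0):
    """Cross-entropy plus hinge penalties that push the logits into [u, v]."""
    ce = F.cross_entropy(logits, targets)
    m = logits.min()                             # current minimum logit
    n = logits.max()                             # current maximum logit
    lower_penalty = torch.clamp(u - m, min=0.0)  # non-zero only if m < u
    upper_penalty = torch.clamp(n - v, min=0.0)  # non-zero only if n > v
    return ce + lambda1 * lower_penalty + lambda2 * upper_penalty

# Example usage with made-up logits, some of which fall outside [-10, 10]
logits = 20 * torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
print(range_penalized_loss(logits, targets))
```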

5

hugio55 OP t1_j6jhyb3 wrote

Hey LiquidDinosaur - thanks for this info. I have been hearing about OpenAI quite a bit and will dive deep into what they have to offer. I will say that anything C++ (or C, or A through Z for that matter) will be beyond my breadth, but that's OK. I still enjoy watching it happen from some of the pros on YouTube.

1

suflaj t1_j6hfkdj wrote

> BN is used to reduce covariate shift, it just happened to regularize.

The first part was hypothesized, but not proven. It is a popular belief, like all the other hypotheses about why BN works so well.

> Dropout as a regularizing technique didn't become big before ResNet (2014 vs. 2015).

What does becoming big mean? Dropout was introduced in 2012 and has been used ever since. It was never big in the sense that you would always use it.

It is certainly false that Dropout was adopted for CNNs because of ResNets or immediately after them, as the first paper demonstrating a benefit from using Dropout in convolutional layers appeared in 2017: https://link.springer.com/chapter/10.1007/978-3-319-54184-6_12

> I doubt what you're saying is true, that they're effectively the same.

I never said that.

0

florisjuh t1_j6h1ffh wrote

Probably good to accompany it with a more practical book (or courses) though, such as Sebastian Raschka's Machine Learning with PyTorch and Scikit-Learn or Francois Chollet's Deep Learning with Python (Keras/TensorFlow). I also found Dive into Deep Learning (https://d2l.ai) to be a pretty nice resource for learning about more SOTA deep learning models and techniques.

2

XecutionStyle t1_j6ggq37 wrote

BN is used to reduce covariate shift, it just happened to regularize. Dropout as a regularizing technique didn't become big before ResNet (2014 vs. 2015).

I doubt what you're saying is true, that they're effectively the same. Try putting one right after the other and see the effect. Two drop-out layers or two BN layers, in contrast, have no problem co-existing.
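
For instance, a minimal PyTorch sketch of that kind of experiment (the layer sizes and dropout rate are arbitrary):

```python
import torch
import torch.nn as nn

# A block with BatchNorm immediately followed by Dropout, to compare against
# stacking two Dropout layers or two BatchNorm layers.
block = nn.Sequential(
    nn.Linear(128, 128),
    nn.BatchNorm1d(128),  # normalizes activation statistics per feature
    nn.Dropout(p=0.5),    # randomly zeroes activations during training
    nn.ReLU(),
)

x = torch.randn(32, 128)
block.train()
print(block(x).std())  # training-mode statistics (dropout active)
block.eval()
print(block(x).std())  # inference-mode statistics for comparison
```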

edit: sorry, what I mean is that the variants of drop-out that work with CNNs (without detrimental effects) didn't exist back then.

1