Submitted by neuralbeans t3_10puvih in deeplearning
I'd like to train a neural network where the softmax output has a minimum possible probability. During training, none of the probabilities should go below this minimum. Basically I want to avoid the logits from becoming too different from each other so that none of the output categories are ever completely excluded in a prediction, a sort of smoothing. What's the best way to do this during training?
FastestLearner t1_j6mhjd2 wrote
Use a composite loss, i.e. add extra terms to the loss function so that the optimizer is pushed to keep the logits within a fixed range.
For example, if the current minimum logit is `m` and the allowed minimum is `u`, and the current maximum logit is `n` and the allowed maximum is `v`, then the following loss function should help:

Overall loss = CrossEntropy loss + lambda1 * max(u - m, 0) + lambda2 * max(n - v, 0)

The `max` terms ensure that no extra loss is added while all logits stay within the allowed range. Use `lambda1` and `lambda2` to scale each term so that they roughly match the CE loss in strength.
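A minimal PyTorch sketch of this composite loss, assuming a standard classification setup; the bound values `u`, `v` and the weights `lambda1`, `lambda2` below are illustrative placeholders, not values prescribed in the comment:

```python
import torch
import torch.nn.functional as F

def composite_loss(logits, targets, u=-5.0, v=5.0, lambda1=1.0, lambda2=1.0):
    """Cross-entropy plus hinge penalties that discourage logits
    from drifting outside the range [u, v]."""
    ce = F.cross_entropy(logits, targets)
    m = logits.min()  # current minimum logit
    n = logits.max()  # current maximum logit
    # Each penalty is zero while the logits stay inside [u, v]
    low_penalty = torch.clamp(u - m, min=0.0)
    high_penalty = torch.clamp(n - v, min=0.0)
    return ce + lambda1 * low_penalty + lambda2 * high_penalty

# Usage with illustrative shapes (batch of 32, 10 classes):
logits = torch.randn(32, 10, requires_grad=True)
targets = torch.randint(0, 10, (32,))
loss = composite_loss(logits, targets)
loss.backward()
```

In practice you would tune `lambda1` and `lambda2` so the penalty terms are comparable in magnitude to the cross-entropy term, as suggested above.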