Recent comments in /f/deeplearning

Internal-Diet-514 t1_itwdhg2 wrote

To start I’d downsample the number of images that don’t have any mass in them (or upsample the ones with mass) for the training data, while keeping an even balance in the test/validation sets. Others have said above that the loss function works better when it sees an even representation. This is an easy way to do it without writing a custom data loader, and you can see whether that’s the problem before diving deeper.
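Something like this is usually enough (a rough sketch, assuming your training split is tracked as a list of `(image_path, mask_path, has_mass)` tuples; all names here are made up):

```python
import random

# Hypothetical setup: train_samples is a list of (image_path, mask_path, has_mass) tuples.
def rebalance(train_samples, mode="down", seed=0):
    rng = random.Random(seed)
    positives = [s for s in train_samples if s[2]]      # images that contain a mass
    negatives = [s for s in train_samples if not s[2]]  # images with no mass

    if mode == "down":
        # Downsample the majority class (no-mass images) to match the minority.
        negatives = rng.sample(negatives, k=min(len(negatives), len(positives)))
    else:
        # Upsample the minority class by drawing extra samples with replacement.
        positives = positives + rng.choices(positives, k=len(negatives) - len(positives))

    balanced = positives + negatives
    rng.shuffle(balanced)
    return balanced

# Apply this only to the training split; leave validation/test untouched.
train_samples_balanced = rebalance(train_samples, mode="down")
```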

2

Yeinstein20 t1_ituv2bx wrote

Could you give a few more details on what kind of images you have, what you are trying to segment, your model...? Are you calculating your Dice score and Dice loss on foreground and background? It's usually a good idea to calculate it on the foreground only, and if you have more than one foreground class, take the mean. That should already help a lot with class imbalance. Also, I would add cross-entropy or focal loss on top of the Dice loss; that's something I have found to work well in general. You can also modify your data loader so that it oversamples foreground during training (say you have a batch size of 2 and force at least one image per batch to contain foreground). It's probably also a good idea to find a good baseline to compare against so you get a better sense of where your performance stands.
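Something like this for the combined loss (a rough PyTorch sketch, assuming multi-class logits with class 0 as background):

```python
import torch
import torch.nn.functional as F

def dice_ce_loss(logits, target, smooth=1e-5):
    """Cross-entropy plus soft Dice computed on foreground classes only.

    logits: (N, C, H, W) raw scores, target: (N, H, W) integer labels,
    with class 0 assumed to be background.
    """
    ce = F.cross_entropy(logits, target)

    probs = torch.softmax(logits, dim=1)
    num_classes = logits.shape[1]
    target_onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()

    dice_per_class = []
    for c in range(1, num_classes):  # skip background (class 0)
        p, t = probs[:, c], target_onehot[:, c]
        inter = (p * t).sum(dim=(1, 2))
        union = p.sum(dim=(1, 2)) + t.sum(dim=(1, 2))
        dice_per_class.append(((2 * inter + smooth) / (union + smooth)).mean())
    dice = torch.stack(dice_per_class).mean()  # mean over foreground classes

    return ce + (1.0 - dice)
```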

3

Deep_Quarter t1_ittr5p4 wrote

Hey, what you are trying is a form of sample weighting. It basically says data imbalance is the loss function's problem.

What you need to do is write a better data loader. Make sure the imbalance is handled at the data loader by customising it to load batches that are balanced. Easier said than done, I know, but this is where concepts like sampling and class weighting come in.
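One low-effort way to get roughly balanced batches is PyTorch's `WeightedRandomSampler`. A sketch, assuming a `dataset` whose items are `(image, mask)` pairs and a one-off pass over the masks (all names here are placeholders):

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# One-off pass: record which samples contain any foreground pixels at all.
has_fg = torch.tensor([bool(dataset[i][1].any()) for i in range(len(dataset))])

# Weight each sample inversely to its group's frequency so batches come out
# roughly 50/50 foreground vs. background-only.
n_fg, n_bg = has_fg.sum().item(), (~has_fg).sum().item()
weights = has_fg.float() / max(n_fg, 1) + (~has_fg).float() / max(n_bg, 1)

sampler = WeightedRandomSampler(weights, num_samples=len(dataset), replacement=True)
loader = DataLoader(dataset, batch_size=8, sampler=sampler)
```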

The second thing you can do is train at a smaller resolution. A proper data pipeline paired with a good loss function like Dice, Tversky, or focal loss can give you a benchmark to improve on. Just search for segmentation losses on GitHub.
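For reference, a soft Tversky loss is only a few lines; a sketch for the binary case (with alpha = beta = 0.5 it reduces to 1 - Dice):

```python
import torch

def tversky_loss(probs, target, alpha=0.3, beta=0.7, smooth=1e-5):
    """Soft Tversky loss for binary segmentation.

    probs: (N, H, W) foreground probabilities, target: (N, H, W) in {0, 1}.
    alpha weights false positives, beta weights false negatives.
    """
    target = target.float()
    tp = (probs * target).sum(dim=(1, 2))
    fp = (probs * (1 - target)).sum(dim=(1, 2))
    fn = ((1 - probs) * target).sum(dim=(1, 2))
    tversky = (tp + smooth) / (tp + alpha * fp + beta * fn + smooth)
    return (1.0 - tversky).mean()
```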

Lastly, you can reframe the problem as something simpler like box regression or heatmap regression. This helps if the mask region is very large or very small relative to the input resolution.
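If you try the heatmap route, one plausible target is a Gaussian centred on the mask; a sketch (the centroid and sigma choices here are just one option):

```python
import numpy as np

def mask_to_heatmap(mask, sigma_scale=0.5):
    """Turn a binary mask (H, W) into a Gaussian heatmap centred on the mask
    centroid, with sigma tied to the mask's equivalent radius.
    Returns all zeros if the mask is empty."""
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return np.zeros((h, w), dtype=np.float32)
    cy, cx = ys.mean(), xs.mean()
    sigma = max(1.0, sigma_scale * np.sqrt(mask.sum() / np.pi))
    yy, xx = np.mgrid[0:h, 0:w]
    heat = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    return heat.astype(np.float32)
```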

4

pornthrowaway42069l t1_itsbufj wrote

I'd try some baseline/simpler models on the same data and see how they perform. Maybe the model just can't do any better; that's always a good thing to check before panicking.

You can also try K-means or DBSCAN or something like that to get 2 clusters of results, and see if those algorithms can segment your data better than your network. If so, maybe the network is set up incorrectly somehow; if not, maybe something funky is happening to your data in the pipeline.
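A quick way to run that sanity check with scikit-learn (a sketch; matching cluster ids to classes is left to you):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_segment(image, n_clusters=2, seed=0):
    """Crude clustering baseline: cluster pixel values into n_clusters and
    return a (H, W) label map. `image` is a (H, W) or (H, W, C) numpy array."""
    h, w = image.shape[:2]
    pixels = image.reshape(h * w, -1).astype(np.float32)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(pixels)
    return labels.reshape(h, w)

# Compare this label map against your network's prediction; if K-means already
# beats the network, suspect the model setup rather than the data.
```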

2

beingsubmitted t1_itht9o5 wrote

Reply to Two GAN's by manli29

I don't see how you would train them that way - you can't use the output of a discriminator as the input of a generator, so that wouldn't get you what you want. You could train them in parallel, one generator/discriminator pair doing only b&w restoration and the other doing only colorization.

The way images and the eye work (part of the science behind why JPEG is so effective) is that we're much more sensitive to luminance information than to color information. You could take the output of the colorization generator in HSL color space and replace its luminance with that of the generated restored photo. Doing it this way, you could also force the separation of the two generators using only one discriminator - one generator only affecting the hue and saturation of the final image, and the other only affecting the luminance.
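A rough sketch of that luminance swap using OpenCV's HLS conversion (assuming both generator outputs are aligned uint8 arrays; the function and argument names are made up):

```python
import cv2
import numpy as np

def combine_color_and_luminance(colorized_rgb, restored_gray):
    """Keep hue/saturation from the colorization generator and take the
    lightness channel from the restored grayscale output.

    colorized_rgb: (H, W, 3) uint8 RGB, restored_gray: (H, W) uint8.
    """
    hls = cv2.cvtColor(colorized_rgb, cv2.COLOR_RGB2HLS)
    hls[:, :, 1] = restored_gray  # HLS channel order is (H, L, S); index 1 is lightness
    return cv2.cvtColor(hls, cv2.COLOR_HLS2RGB)
```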

That said, with the more recent breakthroughs, it seems that networks are proving more successful as generalists than as specialists. For example, it's believed that Whisper performs better on each language because it's trained on all languages, as counter-intuitive as that may seem.

1

kaarrrlll t1_itfyuqj wrote

Reply to comment by manli29 in Two GAN's by manli29

Having two GANs is not a problem; the idea has existed for a long time, albeit for a different purpose (CycleGAN). What's important is that your loss and/or other constraints are precise enough to avoid one GAN learning both tasks while the other learns the identity mapping. It also doubles the concerns about instability during training. Good luck!

2

Yeinstein20 t1_itfpan2 wrote

Reply to Two GAN's by manli29

I feel like I've read a paper where they do something similar to this but I'm not completely sure. I'll try finding it.

Edit: maybe remind me of that in case I forget about it

1

TheRealSerdra t1_itfiwui wrote

Reply to Two GAN's by manli29

What exactly do you want to do that requires two GANs? And are you planning on just chaining the generators?

1

suflaj t1_it686mk wrote

This would depend on whether or not you believe newer noisy data is more important. I would not use it in general, because it's not something you can guarantee on all data and it would have to be theoretically confirmed beforehand, which might be impossible for a given task.

If I wanted to reduce the noisiness of pseudo-labels, I would not want to introduce additional biases on the data itself, so I'd rather do sample selection, which seems to be what the newest papers suggest. Weight averaging introduces biases akin to what weight normalization techniques did, and those were partially abandoned in favour of different approaches, e.g. larger batch sizes, because the alternatives proved more robust and performant in practice as models grew more different from the ML baselines those findings were based on.

Now, if I wasn't aware of papers that came out this year, maybe I wouldn't be saying this. That's why I recommended you stick to newer papers, because problems are never really fully solved and newer solutions tend to make bigger strides than optimizing older ones.

1

suflaj t1_it66q7y wrote

Reply to comment by Lee8846 in EMA / SWA / SAM by Ttttrrrroooowwww

While it is true that the age of a method does not determine its value, the older a method is, the more likely it is that its performance gains have been surpassed by some other method or model.

Specifically, I do not see why I would use any form of weight averaging over a better model or training technique.

> In this case, an ensemble of models might not help.

Because you'd just use a bigger batch size.

1