Recent comments in /f/deeplearning

emad_eldeen t1_j5uvdma wrote

One way is to use data augmentation to increase the sample size.

Another way is to use a related dataset with more samples that's available online, treat it as a source domain, and use it to train your CNN model. Then you can use either transfer learning or semi-supervised domain adaptation to adapt the model to your target domain.
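
A minimal sketch of the transfer-learning route, assuming PyTorch/torchvision, an ImageNet-pretrained backbone as the stand-in source domain, and a hypothetical `data/target_domain` folder of labelled images; paths and hyperparameters are illustrative, not prescriptive:

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Backbone pre-trained on a large source dataset (ImageNet here), used as the
# "source domain" model. Requires a recent torchvision with the weights enum API.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the backbone so only the new head is trained on the small target set.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)  # e.g. crack / no-crack

# Hypothetical target-domain folder with the ~100 labelled images.
target = datasets.ImageFolder(
    "data/target_domain",
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ]),
)
loader = torch.utils.data.DataLoader(target, batch_size=16, shuffle=True)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=3e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(10):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```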

2

AtmarAtma t1_j5unz1i wrote

Is it the case that you have only 100 images of cracks? Or do you have 100 images total, with and without cracks? For a similar problem (micro-scratch or scratch detection), it is quite common to get only a handful of defective wafers while plenty of wafers without that defect class are available.

1

ShadowStormDrift t1_j5ufflp wrote

With 100 images, all data augmentation is going to give you is an overfit network.

You do not have enough images. Try to get a few thousand; then maybe you'll get results that aren't complete bullshit.

Speak to whoever is funding this. 100 images to solve a non-trivial problem is a joke.

7

Practical_Square4577 t1_j5u1yxi wrote

Give it a try with data augmentation (and don't forget to split your dataset into a train set and a test set).

For example, flips and 90° rotations alone will multiply your number of images by up to 8 (the symmetries of the square).

Creating a black-and-white version multiplies that by an extra factor of 2.

Then you can go further with random crops, random rotations, random colour modifications, random shear, and random scaling.

This gives you a potentially infinite number of image variations.

You can also use dropout in your network to avoid overfitting.

And on top of that, remember that when working with convolutional neural networks, an image is not a single datapoint. Each pixel (and its attached neighbourhood) is a datapoint, so you potentially have thousands of training samples per image, depending on the receptive field of your CNN.

One thing to be careful about when designing your data augmentation pipeline is to make sure the chip / crack is still visible after cropping, so visually check what you feed into your network.
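
A rough sketch of such an augmentation pipeline, assuming PyTorch/torchvision; the specific transforms and parameters are illustrative and should be checked visually as suggested above:

```python
from torchvision import transforms

# Random augmentations applied on the fly, so every pass over the 100 images
# sees slightly different variations.
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=15),
    transforms.RandomAffine(degrees=0, shear=10, scale=(0.9, 1.1)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    # Keep crops large enough that the crack stays visible.
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomGrayscale(p=0.5),
    transforms.ToTensor(),
])

# Keep the test-set transform deterministic: no augmentation, just resize.
test_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
```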

5

Zealousideal-Copy463 OP t1_j5tk24z wrote

Ohh, I didn't know that about GCP. So you can point a VM to a bucket and it just "reads" the data? You don't have to "upload" the data into the VM?

As I said in a previous comment, my problem with AWS (S3 and SageMaker) is that the data lives on a different network, and even though it's still an AWS network, you have to move data around, and that takes a while (when it's 200 GB of data).

1

Zealousideal-Copy463 OP t1_j5tjusi wrote

Thanks for your comment! I have tried using EC2 and keeping the data in EBS, but I'm not sure it's the best solution. What is your workflow there?

I'm playing around mostly with NLP and image models. Right now I'm trying to process videos, about 200 GB, for a retrieval problem. What I do is: extract frames, get feature vectors from pre-trained ResNet and ResNeXt models (this takes a lot of time), and then train a siamese network on all of those vectors. As I said, I have tried S3 and SageMaker, but I have to move data into SageMaker notebooks and I waste a lot of time there. I also tried processing things on EC2, but setting everything up took me a while (downloading data, installing libraries, writing shell scripts to process videos, etc.).
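
For reference, a minimal sketch of the feature-extraction step described above, assuming PyTorch/torchvision and frames already decoded into tensors; the backbone choice and batch size are illustrative:

```python
import torch
from torchvision import models

# Pre-trained backbone with the classification head removed, so the forward
# pass returns a feature vector per frame instead of class logits.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
backbone.to(device)

@torch.no_grad()
def extract_features(frames: torch.Tensor) -> torch.Tensor:
    """frames: a batch of preprocessed frames, shape (N, 3, 224, 224)."""
    return backbone(frames.to(device)).cpu()

# Example: 16 dummy frames -> a (16, 2048) feature matrix for the siamese net.
features = extract_features(torch.randn(16, 3, 224, 224))
```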

1

Zealousideal-Copy463 OP t1_j5tj30h wrote

My first idea was a 3090, but I'm not based in the US, and getting a used GPU here is risky; it's easy to get scammed. A 4080 is around $2000 here, a new 3090 is $1800, and a 4090 is $2500. So I thought that if I decide to get a desktop, I should "just" go for the 4090, because it's $500-700 more but I'd get roughly double the speed of a 3090 and 8+ GB more VRAM.

1

Zealousideal-Copy463 OP t1_j5tih5j wrote

Sorry, I wrote it in a hurry and now I realize it came out wrong.

What I meant is that, in my experience, moving data between buckets/VMs, uploading data, logging into a terminal via SSH, using notebooks that crash from time to time (SageMaker is a bit buggy), or just training models in the cloud all have annoyances that are hard to avoid and make the whole experience horrible. So maybe I should "just buy a good GPU" (the 4090 is a "good" deal where I live) and stop messing around in the cloud.

1

FuB4R32 t1_j5tdt1c wrote

We use Google Cloud buckets + TensorFlow. It works well since you can always point a VM at a cloud bucket (e.g. TFRecords) and it just has access to the data. I know you can do something similar in JAX; I haven't tried PyTorch. It's the same in a Colab notebook. I'm not sure if you can point to a cloud location from a local machine, though, but as others are saying, the 4090 might not be the best fit for this use case (e.g. you can use a TPU in a Colab notebook to get similar performance).
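
A minimal sketch of that pattern, assuming a hypothetical `gs://my-bucket` path and TFRecords containing serialized examples with an "image" (JPEG bytes) and "label" feature; adjust the feature spec to your actual data:

```python
import tensorflow as tf

# TensorFlow reads gs:// paths directly, so a VM (or Colab) streams records
# out of the bucket instead of copying them locally first.
files = tf.io.gfile.glob("gs://my-bucket/frames/*.tfrecord")  # hypothetical bucket
dataset = tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)

feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse(record):
    example = tf.io.parse_single_example(record, feature_spec)
    image = tf.io.decode_jpeg(example["image"], channels=3)
    return tf.image.resize(image, (224, 224)) / 255.0, example["label"]

dataset = dataset.map(parse, num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.shuffle(1024).batch(32).prefetch(tf.data.AUTOTUNE)
```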

1

cyranix t1_j5s8lbz wrote

While this comment is getting a handful of downvotes (probably for its sarcastic tone), I do want to add something here. Personally, I think the best way to learn is by doing, and there are a lot of really great tutorials on things you can do with deep learning (yes, you can find them with a Google search). However, I found that I was really taxing my laptop trying to follow some of the tutorials, for instance from sentdex... BUT it turns out that Google has a research platform you can use for FREE that gives you access to GPUs and TPUs specifically for ML tasks. So check out that channel, and then check out https://colab.research.google.com/ for a great platform to start putting your code together!

1

v2thegreat t1_j5s39fb wrote

It really depends on how often you think you'll train with the model.

If it's something that you'll do daily for at least 3 months, then I'd argue you can justify the 4090.

Otherwise, if this is a single model you want to play around with, then use an appropriate EC2 instance with GPUs (remember: start with a small instance and upgrade as you need more compute, and remember to turn off the instance when you're not using it).

I don't really know what type of data you're playing around with (image, text, or audio, for example), but you should be able to get pretty far without a GPU by doing small-scale experiments and debugging, and then finally using a GPU for the final training.

You can also use TensorFlow datasets, which can stream data from disk during training, meaning you won't need to hold all of your files in memory and can get away with a fairly decent computer.
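
A small sketch of that streaming setup, assuming an illustrative `data/train` directory on local disk with one subfolder per class; tf.data loads and batches files lazily rather than holding the whole dataset in memory:

```python
import tensorflow as tf

# Builds a dataset that reads image files lazily from disk, batch by batch,
# instead of loading everything into memory up front.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train",          # hypothetical folder, one subdirectory per class
    image_size=(224, 224),
    batch_size=32,
)

# Overlap disk reads with training so the GPU is not starved for data.
train_ds = train_ds.prefetch(tf.data.AUTOTUNE)

# Tiny placeholder model just to show the dataset plugging into fit().
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(train_ds, epochs=5)
```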

Good luck!

7

suflaj t1_j5r5bfw wrote

For the learning rate, you should just pick a good starting point based on the batch size and architecture and relegate everything else to the scheduler and optimizer. I don't think there's much point messing with the learning rate once you find one that doesn't blow up your model; just use warmup or plateau schedulers to manage it for you after that.

Since you mentioned Inception, I believe that unless you are using quite big batch sizes, your starting LR should be the magical 3e-4 for Adam or 1e-2 for SGD, and you would just use a ReduceOnPlateau scheduler with, e.g., a patience of 3 epochs, a cooldown of 2, and a factor of 0.1, and probably employ early stopping if the metric doesn't improve after 6 epochs.
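
In PyTorch terms, that setup might look roughly like the following sketch; the model and validation pass are placeholders, and the scheduler arguments just mirror the numbers above:

```python
import torch
import torch.nn as nn

# Placeholder model and data; the point is the scheduler + early-stopping wiring.
model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)  # the "magical" 3e-4
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=3, cooldown=2
)

def validation_loss(model):
    # Stand-in for a real validation pass.
    with torch.no_grad():
        x, y = torch.randn(32, 10), torch.randn(32, 1)
        return nn.functional.mse_loss(model(x), y).item()

best_val, stale_epochs = float("inf"), 0
for epoch in range(100):
    # ... one training epoch would go here ...

    val = validation_loss(model)
    scheduler.step(val)  # cuts the LR by 10x after 3 stagnant epochs, then waits out the cooldown

    # Simple early stopping: quit if the metric hasn't improved for 6 epochs.
    if val < best_val:
        best_val, stale_epochs = val, 0
    else:
        stale_epochs += 1
        if stale_epochs >= 6:
            break
```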

2