Recent comments in /f/deeplearning
CuriousCesarr OP t1_j5utezn wrote
Reply to comment by Lankyie in Looking for someone with good NN/ deep learning experience for a paid project by CuriousCesarr
I'll keep you up to date once things gain more momentum. ;)
AtmarAtma t1_j5unz1i wrote
Reply to Classify dataset with only 100 images? by Murii_
Is it the case that you have only 100 images of cracks? Or do you have 100 images total, with and without cracks? For a similar problem (microscratches on wafers), it is quite common to get only a handful of defective wafers while plenty of wafers without that defect class are available.
ShadowStormDrift t1_j5ufflp wrote
Reply to Classify dataset with only 100 images? by Murii_
With 100 images, all data augmentation is going to give you is an overfit network.
You do not have enough images. Try to get a few thousand; then maybe you'll get results that aren't complete bullshit.
Speak to whoever is funding this. 100 images to solve a non-trivial problem is a joke.
NinjaUnlikely6343 OP t1_j5udl0q wrote
Reply to comment by PsecretPseudonym in Efficient way to tune a network by changing hyperparameters? by NinjaUnlikely6343
Makes sense!
Murii_ OP t1_j5ubusu wrote
Reply to comment by Practical_Square4577 in Classify dataset with only 100 images? by Murii_
I love you <3 Thank you!
PsecretPseudonym t1_j5ub8bd wrote
Reply to comment by NinjaUnlikely6343 in Efficient way to tune a network by changing hyperparameters? by NinjaUnlikely6343
> “Didn't know you could use a portion of the dataset and expect approximately what you'd get with the whole set”
See: sampling
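For what it's worth, a minimal PyTorch sketch of that idea, with a stand-in dataset and an illustrative 10% fraction: tune hyperparameters on a random subsample, then use the full set only for the final run.

```python
import torch
from torch.utils.data import Subset, TensorDataset

# Stand-in dataset; substitute your real Dataset object.
full = TensorDataset(torch.randn(10_000, 16), torch.randint(0, 2, (10_000,)))

idx = torch.randperm(len(full))[: len(full) // 10]  # random 10% of the indices
subset = Subset(full, idx.tolist())  # tune hyperparameters on this, train on `full` at the end
```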
Practical_Square4577 t1_j5u1yxi wrote
Reply to Classify dataset with only 100 images? by Murii_
Give it a try with data augmentation. (And don't forget to split your dataset into a train set and a test set.)
For example, flips and 90° rotations will multiply your number of images by up to 8.
Creating a black-and-white version will multiply that by an extra factor of 2.
And then you can go with random crops, random rotations, random colour modifications, random shear, and random scaling.
This will give you a potentially infinite amount of image variation.
You can also use dropout in your network to avoid overfitting.
And on top of that, remember that when working with convolutional neural networks, an image is not a single datapoint. Each pixel (and its attached neighbourhood) is a datapoint, so you potentially have thousands of training samples per image, depending on the receptive field of your CNN.
One thing to be careful about when designing your data augmentation pipeline is to make sure the chip/crack is still visible after cropping, so visually check what you feed into your network.
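A minimal sketch of such a pipeline, assuming TensorFlow/Keras; the layer choices and parameter values are illustrative, not prescribed by the comment above:

```python
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),  # flips
    layers.RandomRotation(0.1),                    # random rotations (fraction of a full turn)
    layers.RandomZoom(0.2),                        # random scaling
    layers.RandomTranslation(0.1, 0.1),            # random shifts, similar in spirit to crops
    layers.RandomContrast(0.2),                    # a simple colour modification
])

# Placed as the first block of a model, these layers are active during
# training and become no-ops at inference time:
# model = tf.keras.Sequential([augment, base_cnn])
```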
EnlightenMePlss t1_j5tzjo3 wrote
Check out the relevant Kaggle competitions and study the most popular notebooks for them:
https://www.kaggle.com/c/tumor-diagnosis
https://www.kaggle.com/c/rsna-miccai-brain-tumor-radiogenomic-classification
Also check out the other RSNA competitions:
https://www.kaggle.com/competitions/rsna-2022-cervical-spine-fracture-detection
https://www.kaggle.com/competitions/rsna-pneumonia-detection-challenge
https://www.kaggle.com/competitions/rsna-str-pulmonary-embolism-detection
https://www.kaggle.com/competitions/rsna-intracranial-hemorrhage-detection
https://www.kaggle.com/competitions/rsna-breast-cancer-detection
NinjaUnlikely6343 OP t1_j5tmucg wrote
Reply to comment by thatpretzelife in Efficient way to tune a network by changing hyperparameters? by NinjaUnlikely6343
Thanks for the advice! I'm actually already SSH tunneling to the immense computing resources at Compute Canada. It still takes extremely long haha
Zealousideal-Copy463 OP t1_j5tk24z wrote
Reply to comment by FuB4R32 in Best cloud to train models with 100-200 GB of data? by Zealousideal-Copy463
Ohh, I didn't know that about GCP. So you can point a VM to a bucket and it just "reads" the data? You don't have to "upload" the data into the VM?
As I said in a previous comment, my problem with AWS (S3 and SageMaker) is that the data lives in a different network, and even though it's still an AWS network, you have to move data around, and that takes a while (when it's 200 GB of data).
Zealousideal-Copy463 OP t1_j5tjusi wrote
Reply to comment by v2thegreat in Best cloud to train models with 100-200 GB of data? by Zealousideal-Copy463
Thanks for your comment! I have tried using EC2 and keeping data in EBS, but I'm not sure it's the best solution. What is your workflow there?
I'm playing around mostly with NLP and image models. Right now I'm trying to process videos, about 200 GB, for a retrieval problem. What I do is: extract frames, then get feature vectors from a pre-trained ResNet and ResNeXt (this takes a lot of time). Then I train a Siamese network on all of those vectors. As I said, I have tried S3 and SageMaker, but I have to move the data into the SageMaker notebooks and I waste a lot of time there. I also tried to process things on EC2, but setting the whole thing up took me a while (downloading data, installing libraries, writing shell scripts to process videos, etc.).
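A minimal sketch of the feature-extraction step described here, assuming torchvision; the exact backbone and output size are illustrative:

```python
import torch
import torchvision.models as models
from torchvision.models import ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.fc = torch.nn.Identity()  # drop the classifier head; output is the 2048-d pooled feature
model.eval()

preprocess = weights.transforms()  # the preprocessing the weights were trained with

@torch.no_grad()
def frame_features(frames):
    """frames: a list of PIL images extracted from the video."""
    batch = torch.stack([preprocess(f) for f in frames])
    return model(batch)  # shape: (num_frames, 2048)
```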
Zealousideal-Copy463 OP t1_j5tj30h wrote
Reply to comment by incrediblediy in Best cloud to train models with 100-200 GB of data? by Zealousideal-Copy463
My first idea was a 3090, but I'm not based in the US, and getting a used GPU here is risky; it's easy to get scammed. A 4080 is around $2000 here, a new 3090 is $1800, and a 4090 is $2500. So I thought that if I decide to get a desktop, I should "just" go for the 4090, since it's $500-700 more but I'd get roughly double the speed of a 3090 and 8 GB more VRAM than the 4080.
Zealousideal-Copy463 OP t1_j5tih5j wrote
Reply to comment by agentfuzzy999 in Best cloud to train models with 100-200 GB of data? by Zealousideal-Copy463
Sorry, I wrote it in a hurry and now I realize it came out wrong.
What I meant is that, in my experience, moving data between buckets/VMs, uploading data, logging into a terminal via SSH, and using notebooks that crash from time to time (SageMaker is a bit buggy) all make training models in the cloud full of annoyances that are hard to avoid, and the whole experience becomes horrible. So maybe I should "just buy a good GPU" (a 4090 is a "good" deal where I live) and stop messing around in the cloud.
FuB4R32 t1_j5tdt1c wrote
We use Google Cloud buckets + TensorFlow. It works well since you can always point a VM to a cloud bucket (e.g. TFRecords) and it just has access to the data. I know you can do something similar in JAX; I haven't tried PyTorch. It's the same in a Colab notebook. I'm not sure you can point to a cloud location from a local machine though, but as others are saying, the 4090 might not be the best fit for this use case (e.g. you can use a TPU in a Colab notebook to get similar performance).
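For illustration, a minimal sketch of pointing tf.data at TFRecords in a GCS bucket; the bucket path and feature spec are made up:

```python
import tensorflow as tf

# Hypothetical bucket; tf.data reads gs:// paths directly, no upload step.
files = tf.io.gfile.glob("gs://your-bucket/train-*.tfrecord")

dataset = (
    tf.data.TFRecordDataset(files)
    .map(lambda rec: tf.io.parse_single_example(
        rec,
        {"image": tf.io.FixedLenFeature([], tf.string),
         "label": tf.io.FixedLenFeature([], tf.int64)}))
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)
```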
ChingBlue t1_j5tdh79 wrote
Off the top of my head: you can use grid search to test hyperparameter combinations exhaustively, random search to sample hyperparameters at random, or neural search, which uses ML to optimize the hyperparameter tuning itself. You can use ready-made tuners for this as well.
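A minimal sketch of random search in plain Python (the search space and the train_and_evaluate function are hypothetical):

```python
import random

search_space = {
    "learning_rate": [1e-2, 3e-3, 1e-3, 3e-4, 1e-4],
    "dropout": [0.0, 0.2, 0.4, 0.5],
}

best_score, best_params = float("-inf"), None
for _ in range(20):  # 20 random trials
    params = {name: random.choice(values) for name, values in search_space.items()}
    score = train_and_evaluate(**params)  # hypothetical: short training run, returns val accuracy
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```

Grid search is the same loop, except it iterates over every combination (e.g. itertools.product over the value lists) instead of drawing at random.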
thatpretzelife t1_j5sy10d wrote
As another option, if you haven't already, try looking into a cloud computing solution. For me it cut my image-processing uni assignment down from a couple of hours to a minute. Google Colab is free, or use something like Paperspace, which costs ~US$8 but is much faster.
like_a_tensor t1_j5st0c1 wrote
Reply to comment by agentfuzzy999 in Best cloud to train models with 100-200 GB of data? by Zealousideal-Copy463
Even bigger gains with FP8 support in CUDA 12.
cyranix t1_j5s8lbz wrote
Reply to comment by JJJJJJtti in What are the best ways to learn about deep learning? by Tureep
While this comment is getting a handful of downvotes (probably for its sarcastic tone), I do want to add something here. Personally, I think the best way to learn is by doing, and there are a lot of really great tutorials on things you can do with deep learning (yes, you can find them with a Google search). However, I found that I was really taxing my laptop trying to do some of the tutorials, for instance from sentdex... BUT it turns out that Google has a research platform you can use for FREE that gives you access to GPUs and TPUs specifically for ML tasks. So check out that channel, and then check out https://colab.research.google.com/ for a great platform to start putting your code together!
v2thegreat t1_j5s39fb wrote
It really depends on how often you think you'll train with the model.
If it's something that you'll do daily for at least 3 months, then I'd argue you can justify the 4090.
Otherwise, if this is a single model you want to play around with, then use an appropriate EC2 instance with GPUs (remember: start with a small instance, upgrade as you need more compute, and turn off your instance when you're not using it).
I don't really know what type of data you're playing around with (image, text, or audio, for example), but you should be able to get pretty far without a GPU by doing small-scale experiments and debugging, and then finally using a GPU for the final training.
You can also use TensorFlow datasets that can stream data from disk during training, meaning you won't need to hold all of your files in memory and can get away with a fairly decent computer. A sketch of this is below.
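A minimal sketch of that streaming setup, assuming a Keras image-folder layout (the path and sizes are illustrative):

```python
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train",              # hypothetical path: one subfolder per class
    image_size=(224, 224),
    batch_size=32,
).prefetch(tf.data.AUTOTUNE)   # files are read lazily, batch by batch, not held in memory
```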
Good luck!
incrediblediy t1_j5rokou wrote
Reply to comment by agentfuzzy999 in Best cloud to train models with 100-200 GB of data? by Zealousideal-Copy463
Or even try to get a used 3090. But if OP can afford a 4090, just go with that.
Final-Rush759 t1_j5rkm71 wrote
The best way to learn is to go through tutorials from various sources just to get a feel for what it's like to do ML and DL. Then go through the theory, including data pipelines, targets, models, loss functions, gradients, optimizers, etc.
NinjaUnlikely6343 OP t1_j5rjhvi wrote
Reply to comment by suflaj in Efficient way to tune a network by changing hyperparameters? by NinjaUnlikely6343
Thanks a lot! I'll try that and keep you posted
suflaj t1_j5r5bfw wrote
Reply to comment by NinjaUnlikely6343 in Efficient way to tune a network by changing hyperparameters? by NinjaUnlikely6343
For the learning rate, you should just use a good starting point based on the batch size and architecture and relegate everything else to the scheduler and optimizer. I don't think there's any point messing with the learning rate once you find one that doesn't blow up your model; just use warmup or plateau schedulers to manage it for you after that.
Since you mentioned Inception, I believe that unless you are using quite big batch sizes, your starting LR should be the magical 3e-4 for Adam or 1e-2 for SGD, and you would just use a ReduceOnPlateau scheduler with e.g. a patience of 3 epochs, a cooldown of 2, and a factor of 0.1, and probably employ EarlyStopping if the metric doesn't improve after 6 epochs.
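A minimal PyTorch sketch of that recipe (the model and the evaluate function are placeholders; the scheduler values follow the comment):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in for an Inception-style CNN
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)    # the "magical" 3e-4
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=3, cooldown=2)

best, stale = float("inf"), 0
for epoch in range(100):
    # ... train one epoch here, then measure validation loss:
    val_loss = evaluate(model)  # hypothetical validation function
    scheduler.step(val_loss)    # cuts the LR by 10x after 3 epochs without improvement
    if val_loss < best:
        best, stale = val_loss, 0
    else:
        stale += 1
        if stale >= 6:          # simple EarlyStopping after 6 epochs without improvement
            break
```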
NinjaUnlikely6343 OP t1_j5r4d3x wrote
Reply to comment by suflaj in Efficient way to tune a network by changing hyperparameters? by NinjaUnlikely6343
Thanks a lot for the detailed response! Didn't know you could use a portion of the dataset and expect approximately what you'd get with the whole set. I'm currently just testing different learning rates, but I thought about having a go at dropout rate as well.
emad_eldeen t1_j5uvdma wrote
Reply to Classify dataset with only 100 images? by Murii_
One way is to use data augmentation to increase the sample size.
Another way is to use a dataset with more samples that's available online, treat it as a source domain, and use it to train your CNN model. Then you can use either transfer learning or semi-supervised domain adaptation to adapt the model to your target domain.
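A minimal transfer-learning sketch of that idea, assuming torchvision and a 2-class target (crack / no crack); the backbone choice is illustrative:

```python
import torch.nn as nn
import torchvision.models as models
from torchvision.models import ResNet18_Weights

model = models.resnet18(weights=ResNet18_Weights.DEFAULT)  # pre-trained on the source domain (ImageNet)
for p in model.parameters():
    p.requires_grad = False                    # freeze the pre-trained backbone
model.fc = nn.Linear(model.fc.in_features, 2)  # fresh head for the target domain
# Then train only model.fc on the ~100 target images, ideally with augmentation.
```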