Recent comments in /f/deeplearning
[deleted] t1_iyeob8t wrote
muchomuchacho t1_iydxu0y wrote
Reply to If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
Data streaming
somebodyenjoy OP t1_iydqmu0 wrote
Reply to comment by HiPattern in If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
This is perfect, I won’t have to invest in additional RAM. Thanks for the tip!
HiPattern t1_iyd91t0 wrote
Reply to comment by somebodyenjoy in If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
hdf5 files are quite nice for that. You can write your X / y datasets into the file in chunks. When you access a batch, only the part of the hdf5 file where that batch lives gets read.
You can also use multiple numpy files, e.g. one per batch, and handle the file management in the sequence generator.
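A minimal sketch of that chunked-write / batch-read pattern with h5py (the file name, shapes, and dummy data below are placeholders, not from this thread):
import h5py
import numpy as np

n_samples, batch_size = 10_000, 32

# Write X / y incrementally so the full arrays never sit in RAM.
with h5py.File("train.h5", "w") as f:
    X = f.create_dataset("X", shape=(n_samples, 64, 64, 3), dtype="float32")
    y = f.create_dataset("y", shape=(n_samples,), dtype="int64")
    for start in range(0, n_samples, batch_size):
        stop = min(start + batch_size, n_samples)
        # replace with your real preprocessing of samples start..stop
        X[start:stop] = np.random.rand(stop - start, 64, 64, 3)
        y[start:stop] = np.random.randint(0, 10, size=stop - start)

# Reading one batch only touches the corresponding slice of the file.
with h5py.File("train.h5", "r") as f:
    X_batch = f["X"][0:batch_size]
    y_batch = f["y"][0:batch_size]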
Rishh3112 t1_iyd0opv wrote
Reply to If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
I would suggest splitting the dataset, saving the weights each time you finish training on one split, and training the next split starting from those saved weights.
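In Keras that could look roughly like this (the split file names, the tiny model, and the checkpoint path are just illustrative assumptions):
import numpy as np
import tensorflow as tf

# Tiny illustrative model; swap in your real architecture.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Train on one split at a time; the weights simply carry over between splits.
for i in range(4):  # assumes the data was pre-split into X_0.npy .. X_3.npy
    X_part = np.load(f"X_{i}.npy")
    y_part = np.load(f"y_{i}.npy")
    model.fit(X_part, y_part, epochs=1, batch_size=32)
    model.save_weights("checkpoint.weights.h5")  # resume later with load_weights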
somebodyenjoy OP t1_iycyur2 wrote
Reply to comment by HiPattern in If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
I do the same using numpy files, but they only let me load the whole array, which is too big in the first place. TensorFlow lets us load in batches, huh? I'll look into this.
HiPattern t1_iycypm1 wrote
Reply to comment by somebodyenjoy in If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
You can preprocess once, write the data into an hdf5 file, and then read the preprocessed data batch-wise from the hdf5 file!
somebodyenjoy OP t1_iycwvf4 wrote
Reply to comment by HiPattern in If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
Very interesting, but I want the data to be preprocessed only once. This way, it'll be preprocessed at every epoch.
robbsc t1_iycws54 wrote
Reply to If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
For TensorFlow, you have to learn to use TensorFlow Datasets: https://www.tensorflow.org/datasets
You could also save your dataset as an hdf5 file using h5py, then use tensorflow_io's from_hdf5() to load your data: https://www.tensorflow.org/io
Hdf5 is the "traditional" (for lack of a better word) way of loading numpy data that is too big to fit in memory. The downside is that it is slow at random indexing, so people don't use it as much anymore for training networks.
Pytorch datasets are a little easier in my opinion.
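A rough sketch of streaming an hdf5 file through tf.data with a plain h5py generator (file name, dataset keys, and shapes are assumptions):
import h5py
import tensorflow as tf

def hdf5_batches(path="train.h5", batch_size=32):
    # Yields one batch at a time straight from disk.
    with h5py.File(path, "r") as f:
        n = f["X"].shape[0]
        for start in range(0, n, batch_size):
            yield f["X"][start:start + batch_size], f["y"][start:start + batch_size]

ds = tf.data.Dataset.from_generator(
    hdf5_batches,
    output_signature=(
        tf.TensorSpec(shape=(None, 64, 64, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.int64),
    ),
).prefetch(tf.data.AUTOTUNE)

# model.fit(ds, epochs=10)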
HiPattern t1_iycvpfp wrote
Reply to If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
Write a generator that feeds the data in batches:
https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
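Following that blog post's pattern, a keras.utils.Sequence subclass might look roughly like this (the per-batch .npy file layout is an assumption):
import numpy as np
import tensorflow as tf

class NpyBatchSequence(tf.keras.utils.Sequence):
    """Loads one pre-saved .npy batch file per training step."""

    def __init__(self, n_batches, batch_dir="batches"):
        super().__init__()
        self.n_batches = n_batches
        self.batch_dir = batch_dir

    def __len__(self):
        return self.n_batches

    def __getitem__(self, idx):
        # Only this one batch is read into RAM.
        X = np.load(f"{self.batch_dir}/X_{idx}.npy")
        y = np.load(f"{self.batch_dir}/y_{idx}.npy")
        return X, y

# model.fit(NpyBatchSequence(n_batches=1000), epochs=10)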
somebodyenjoy OP t1_iycub7d wrote
Reply to comment by Ttttrrrroooowwww in If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
I haven’t heard of mem mapping, seems like something I should look into, thanks!
Ttttrrrroooowwww t1_iyctkhw wrote
Reply to If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
Normally your dataloader fetches single samples from your dataset, such as reading images one at a time. In that case RAM is never a problem.
If that is not an option for you (why, I wouldn't know), then numpy memmaps might be for you: basically an array that's read from disk, not from RAM. I use it to handle arrays with billions of values.
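For example, a rough np.memmap sketch (the path, shape, and dtype are placeholders):
import numpy as np

# Create a disk-backed array once.
big = np.memmap("features.dat", dtype="float32", mode="w+", shape=(10_000_000, 128))
big[0:1000] = np.random.rand(1000, 128)  # writes go to disk, not RAM
big.flush()

# Later: open read-only and slice batches; only the slices you touch get paged in.
data = np.memmap("features.dat", dtype="float32", mode="r", shape=(10_000_000, 128))
batch = np.asarray(data[0:32])  # copy one batch into RAM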
IshanDandekar t1_iycnrjg wrote
Reply to If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
How big is your RAM? Maybe you can try cloud resources to get a better machine; leverage GPUs too if it is an image dataset.
suflaj t1_iyclnwf wrote
Reply to If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
Images are loaded from disk, perhaps with some caching.
The most efficient simple solution would be to have workers that fill up a buffer that acts like a queue for data.
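A bare-bones sketch of that producer/consumer buffer with the Python standard library (load_batch_from_disk and the buffer size are hypothetical):
import queue
import threading

def producer(buffer, n_batches):
    # Worker thread: loads/preprocesses batches and pushes them into the buffer.
    for i in range(n_batches):
        batch = load_batch_from_disk(i)  # hypothetical loading function
        buffer.put(batch)                # blocks when the buffer is full
    buffer.put(None)                     # sentinel: no more data

buffer = queue.Queue(maxsize=8)          # bounded buffer acting as the queue
threading.Thread(target=producer, args=(buffer, 1000), daemon=True).start()

while True:
    batch = buffer.get()                 # training loop consumes from the buffer
    if batch is None:
        break
    # train_step(batch)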
somebodyenjoy OP t1_iyclml5 wrote
Reply to comment by Alone_Bee_6221 in If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
I’ve changed the tuner class before. I should try this when I run into this issue
somebodyenjoy OP t1_iycl8b8 wrote
Reply to comment by incrediblediy in If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
I meant RAM. I know I can reduce the batch size for VRAM. I’ve solved problems by loading the whole dataset into the RAM and training it. But your answer is interesting as well
incrediblediy t1_iycjg6d wrote
Reply to If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
You can use your own preprocessing on top of the Keras preprocessing utilities and data loader, or you can use custom code for the whole pipeline.
According to https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator ,
Deprecated: tf.keras.preprocessing.image.ImageDataGenerator is not recommended for new code. Prefer loading images with tf.keras.utils.image_dataset_from_directory and transforming the output tf.data.Dataset with preprocessing layers
You can do mini-batch training depending on available VRAM, even with a batch size of 1. I assume you are referring to VRAM as RAM, as we hardly do deep learning on the CPU for image datasets.
Example: you can use a data_augmentation pipeline step to have control over preprocessing, like this (I used this code with an older TF version, 2.4.0 or maybe 2.9.0.dev, so the function locations might need changing for newer versions, as noted above):
import tensorflow

# Load images lazily from disk in batches instead of reading everything into RAM.
train_ds = tensorflow.keras.preprocessing.image_dataset_from_directory(
    image_directory,
    labels='inferred',
    label_mode='int',
    class_names=classify_names,
    validation_split=0.3,
    subset="training",
    shuffle=shuffle_value,
    seed=seed_value,
    image_size=image_size,
    batch_size=batch_size,
)

# Augmentation layers applied on the fly to each batch.
data_augmentation = tensorflow.keras.Sequential(
    [
        tensorflow.keras.layers.experimental.preprocessing.RandomFlip("horizontal"),
        tensorflow.keras.layers.experimental.preprocessing.RandomRotation(0.1),
    ]
)

augmented_train_ds = train_ds.map(lambda x, y: (data_augmentation(x, training=True), y))
Alone_Bee_6221 t1_iycimeo wrote
Reply to If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
I would probably suggest splitting the data into chunks, or you could try to implement your own dataset class to load images lazily.
Difficult-Race-1188 OP t1_iyc8451 wrote
Reply to comment by BrotherAmazing in Neural Networks are just a bunch of Decision Trees by Difficult-Race-1188
https://arxiv.org/pdf/2210.05189.pdf
Read this paper: it's been proven that neural networks are decision trees, not a mere approximation but exactly equivalent. See the third line of the abstract.
suflaj t1_iyc4hvj wrote
Reply to comment by majinLawliet2 in Building ResNet for Tabular Data Regression Problem by eternalmathstudent
I know; however, try finding a ResNet pretrained on something other than a CV dataset.
Redditagonist t1_iybkfcc wrote
Friends don’t let friends use deep learning on tabular data
rjog74 t1_iybbf02 wrote
Reply to comment by eternalmathstudent in Building ResNet for Tabular Data Regression Problem by eternalmathstudent
Any particular reason why ResNet specifically? Are you looking for general-purpose residual blocks?
carbocation t1_iybb5a8 wrote
Reply to comment by eternalmathstudent in Building ResNet for Tabular Data Regression Problem by eternalmathstudent
Yes, which is why I think you’ll find that link of particular interest since they comment on it (and attention).
majinLawliet2 t1_iyb9sec wrote
You need to understand why you want to use the ResNet architecture. The key reason for using ResNet is that as NNs get deeper, the signal reaching successive layers becomes smaller and smaller. This can be circumvented by adding the input of a block to the output of some layers later (a skip connection).
So in the case of tabular data, you need to ask why you want an NN and whether it is the only thing that works for you. The next question is whether you necessarily need a very deep NN, or whether transforming the inputs would be enough. If you do need one, you should be able to build skip connections trivially.
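For what it's worth, a skip connection over dense layers is only a few lines in Keras; a minimal sketch (the input width, units, and regression head are illustrative):
import tensorflow as tf

def residual_block(x, units):
    # Two dense layers with a skip connection adding the block input to its output.
    h = tf.keras.layers.Dense(units, activation="relu")(x)
    h = tf.keras.layers.Dense(units)(h)
    if x.shape[-1] != units:
        x = tf.keras.layers.Dense(units)(x)  # project input so shapes match
    return tf.keras.layers.Activation("relu")(tf.keras.layers.Add()([x, h]))

inputs = tf.keras.Input(shape=(30,))   # e.g. 30 tabular features (assumed)
x = residual_block(inputs, 64)
x = residual_block(x, 64)
outputs = tf.keras.layers.Dense(1)(x)  # regression head
model = tf.keras.Model(inputs, outputs)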
BrotherAmazing t1_iyeq8zq wrote
Reply to comment by Difficult-Race-1188 in Neural Networks are just a bunch of Decision Trees by Difficult-Race-1188
Interesting—I will have a read when I have time to read and check the math/logic. Thanks!
I do think I am allowed to remain skeptical for now because this was just posted as a pre-print with a single author a month ago and has not been vetted by the community.
Besides, if there is an equivalence between recurrent neural networks, convolutional neural networks, fully connected networks, and policies learned with deep reinforcement learning, and it holds regardless of the architecture, how the network is trained, and so on, so that there always exists an equivalent decision tree, then I would say:
Very interesting.
Decision trees are then more flexible and powerful than we give them credit for; it is not that NNs are less flexible and less powerful than they have been proven to be.
What is it about decision trees that makes people not use them in practice for anything too complicated, like full-motion video? And how does one construct the decision tree "from scratch" via training, except by training the NN first and then building a decision tree that represents it? I wouldn't say "they're the same" from an engineering and practical point of view if one can be trained efficiently and the other cannot, and can only be built once the trained NN already exists.