Recent comments in /f/deeplearning
eternalmathstudent OP t1_iyb3ny8 wrote
Reply to comment by carbocation in Building ResNet for Tabular Data Regression Problem by eternalmathstudent
I didn't want to use ResNet as-is; I don't need the convolutional layers themselves. I'm looking for general-purpose residual blocks with skip connections.
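Something like this is what I have in mind (a rough Keras sketch on my end; the layer sizes and feature count are purely illustrative):

import tensorflow as tf
from tensorflow.keras import layers

def dense_residual_block(x, units):
    # Two dense layers with a skip connection from the block input to its output.
    h = layers.Dense(units, activation="relu")(x)
    h = layers.Dense(units)(h)
    if x.shape[-1] != units:
        # Project the input so the shapes match before adding.
        x = layers.Dense(units)(x)
    return layers.Activation("relu")(layers.Add()([x, h]))

inputs = tf.keras.Input(shape=(32,))   # 32 tabular features, illustrative
h = dense_residual_block(inputs, 64)
h = dense_residual_block(h, 64)
outputs = layers.Dense(1)(h)           # regression head
model = tf.keras.Model(inputs, outputs)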
carbocation t1_iyb3f36 wrote
While convolution is a bit funky with tabular data (what locality are you exploiting?), I think that attention is a mechanism that might make sense in the deep learning context for tabular data. For example, take a look at recent work such as https://openreview.net/forum?id=i_Q1yrOegLY (code and PDF linked from there).
BrotherAmazing t1_iyazrs4 wrote
Reply to comment by Difficult-Race-1188 in Neural Networks are just a bunch of Decision Trees by Difficult-Race-1188
Again, they can approximate any function or algorithm. This is proven mathematically.
Just because people are confounded by examples of DNNs that don't seem to do what they want, and just because people do not yet know how to construct the DNNs that can indeed do these things, does not mean DNNs are "dumb" or limited.
Perhaps you are constructing them wrong. Perhaps the engineers are the dumb ones? 🤷🏼
Sometimes people literally argue, in plain English rather than mathematics, that basic, mathematically proven concepts are not true.
If you had a mathematical proof showing that DNNs were equivalent to decision trees, or incapable of performing certain tasks, neat! But if you argue that DNNs can't perform tasks that can be reduced to functions or algorithms, and do it in mere language without a mathematical proof, I'm not impressed yet!
Difficult-Race-1188 OP t1_iyaxhbe wrote
Reply to comment by BrotherAmazing in Neural Networks are just a bunch of Decision Trees by Difficult-Race-1188
The argument goes much further: NNs are not exactly learning the data distribution. If they were, the affine-transformation problem would already have been taken care of and there would be no need for data augmentation by rotating or flipping. Also, approximating any algorithm doesn't necessarily mean the underlying data follows a distribution generated by any known algorithm. And neural networks struggle even to learn simple mathematical functions; all their approximation does is make piecewise assumptions about the algorithm.
Here's the review of the grokking paper, which showed that a NN couldn't generalize on this equation:
x³ + xy² + y (mod 97)
Article: https://medium.com/p/9dbbec1055ae
Original paper: https://arxiv.org/abs/2201.02177
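For reference, a tiny sketch of how that dataset looks (my reading of the setup, not the authors' code):

# x^3 + x*y^2 + y (mod 97), every (x, y) pair with its label
P = 97
dataset = [((x, y), (x**3 + x * y**2 + y) % P) for x in range(P) for y in range(P)]
print(len(dataset), dataset[:3])  # 9409 examples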
BrotherAmazing t1_iyaux7r wrote
A deep neural network can approximate any function.
A deep recurrent neural network can approximate any algorithm.
These are mathematically proven facts. Can the same be said about “a bunch of decision trees in hyperspace”? If so, then I would say “a bunch of decision trees in hyperspace” are pretty darn powerful, as are deep neural networks. If not, then I would say the author has made a logical error somewhere along the way in his very qualitative reasoning. Plenty of thought experiments in language with “bulletproof” arguments have led to “contradictions” in the past, only for a subtle logical error to be unveiled when we stop using language and start using mathematics.
Difficult-Race-1188 OP t1_iya8hg7 wrote
Reply to comment by Creepy_Disco_Spider in Neural Networks are just a bunch of Decision Trees by Difficult-Race-1188
I've tried adding information from a lot of other resources. Not just one paper. And all of them are mentioned in the article.
VinnyVeritas t1_iya34at wrote
Correlation is not causation.
Creepy_Disco_Spider t1_iya0x5i wrote
You can just cite the original paper
Youness_Elbrag t1_iy9lq76 wrote
I think a NN is a general-purpose structure that can learn anything from data, depending on the problem, and it can approximate data distributions; a NN acting as an automaton is a good example.
freaky1310 t1_iy91uxy wrote
Reply to comment by Salt-Improvement9604 in Neural Networks are just a bunch of Decision Trees by Difficult-Race-1188
TL;DR: Each model tries to solve the problems that affect the current state-of-the-art model.
Theoretically, yes. Practically, definitely not.
I’ll try to explain myself, please let me know if something I say is not clear. The whole point of training NNs is to find an approximator that could provide correct answers to our questions, given our data. The different architectures that have been designed through the years address different problems.
Namely, CNNs addressed the curse of dimensionality: MLPs and similar architectures wouldn't scale to "large" images (larger than 64x64, say) because the number of connections blows up with the number of neurons in each layer. Convolution was found to provide a nice way of aggregating pixels (called "features" from now on), and CNNs were born.
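A rough parameter count makes the point (illustrative sizes, not from any particular paper):

import tensorflow as tf
from tensorflow.keras import layers

# Dense layer on a flattened 64x64 RGB image vs. a small convolution on the same image.
dense_model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    layers.Flatten(),
    layers.Dense(1024),                 # (64*64*3 + 1) * 1024 weights
])
conv_model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, 3),               # (3*3*3 + 1) * 32 weights
])
print(dense_model.count_params())       # 12,583,936
print(conv_model.count_params())        # 896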
After that, expressiveness became a problem: for example, stacking too many convolutions would erase too much information on one side and significantly increase inference time on the other. To address this, researchers found recurrent units useful for retaining information that would otherwise be lost and propagating it through the network. Et voilà, RNNs were born.
Long story short: each different type of architecture was born to solve the problems of another kind of models, while introducing new issues and limitations at the same time.
So, to go back to your first question: can NNs approximate everything? Not literally everything, but a "wide variety of interesting functions". In practice they can try to approximate everything you'll need them to, even though some limitations will always remain.
suflaj t1_iy8xun6 wrote
If you have tabular data, the solution is to use XGBoost. ResNets are pretrained on ImageNet, meaning that if you need one pretrained for any other task, you'll have to do the pretraining yourself. I don't see how the task would benefit from a ResNet.
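Something along these lines is usually enough for a strong baseline (a minimal sketch; the synthetic data and hyperparameters are placeholders, not recommendations):

import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = xgb.XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=6)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
print(mean_squared_error(y_val, model.predict(X_val)) ** 0.5)  # validation RMSE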
BellyDancerUrgot t1_iy8x636 wrote
If so, I wonder why NNs are so much better on unstructured data while being trickier and generally less useful on structured data compared to tree-based and boosted classifiers.
pornthrowaway42069l t1_iy8srkr wrote
Reply to comment by Sadness24_7 in Keras metrics and losses by Sadness24_7
Ah, I see. During training, the loss and metrics you see are actually running averages over the batches, not the exact losses/metrics at the end of that epoch. I can't find the documentation right now, but I know I've seen it before. This means the losses/metrics shown during training aren't a good gauge to compare against, since they fold in batches computed earlier, while the weights were still changing.
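If you want exact numbers to compare against, one option (a rough sketch, not tested against your exact setup) is to re-evaluate at the end of each epoch, so nothing is averaged over batches run with stale weights:

import tensorflow as tf

class ExactTrainMetrics(tf.keras.callbacks.Callback):
    def __init__(self, x, y):
        super().__init__()
        self.x, self.y = x, y

    def on_epoch_end(self, epoch, logs=None):
        # A fresh pass with the end-of-epoch weights, not a running average.
        results = self.model.evaluate(self.x, self.y, verbose=0)
        print(f"epoch {epoch}: exact train loss/metrics {results}")

# usage: model.fit(x, y, epochs=500, callbacks=[ExactTrainMetrics(x, y)])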
Salt-Improvement9604 t1_iy8rho5 wrote
Reply to comment by hp2304 in Neural Networks are just a bunch of Decision Trees by Difficult-Race-1188
>Any ML classifier or regressor is basically a function approximator.
so what is the point of designing all these different learning algorithms?
All we need is more data and even the simplest linear model with interaction terms will be enough?
Sadness24_7 OP t1_iy8qtwh wrote
Reply to comment by pornthrowaway42069l in Keras metrics and losses by Sadness24_7
I did write my own metric based on examples from Keras. But since I have to do it using callbacks and their backend, it works only on one output at a time, meaning both the predictions and the true values are vectors.
What I meant is that when I call model.fit(...) it tells me something like this at each epoch:
Epoch 1/500
63/63 [==============================] - 1s 6ms/step - loss: 4171.9570 - root_mean_squared_error: 42.4592 - val_loss: 2544.3647 - val_root_mean_squared_error: 44.4907
where root_mean_squared_error is a custom metric, as follows:
from tensorflow.keras import backend as K
def root_mean_squared_error(y_true, y_pred):
    # mean over every element: batch and output dimensions alike
    return K.sqrt(K.mean(K.square(y_pred - y_true)))
which, when called directly, wants the data in the form of a vector, meaning this function has to be called for each output separately.
In order to better optimize my model I need to understand how the losses/metrics are calculated so that they result in one number (as shown above during training).
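For example, my current understanding (toy numbers, and I may be wrong) is that K.mean collapses both the batch axis and the output axis, which is why a single number comes out:

import numpy as np
from tensorflow.keras import backend as K

y_true = np.array([[1.0, 2.0], [3.0, 4.0]])   # batch of 2 samples, 2 outputs each
y_pred = np.array([[1.5, 2.0], [2.0, 4.5]])

rmse_all = K.sqrt(K.mean(K.square(y_pred - y_true)))                 # one number
rmse_per_output = K.sqrt(K.mean(K.square(y_pred - y_true), axis=0))  # one per output
print(float(rmse_all), K.eval(rmse_per_output))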
hp2304 t1_iy8lyyf wrote
Any ML classifier or regressor is basically a function approximator.
The function space isn't continuous but rather discrete, discretized by the dataset points. Hence, increasing the size of the dataset can help increase overall accuracy; this is analogous to the Nyquist criterion. With less data it's more likely our approximation is wrong. Given the dimensionality of the input space and the range of each input variable, any dataset is tiny. E.g. for a 224x224 RGB input image, the input space has 256^(224x224x3) possible values, an unimaginably large number; mapping each one to a correct class label (out of 1000 classes) is very difficult for any approximator. Hence, one can never get 100% accuracy.
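Just to put a number on it (a quick back-of-the-envelope calculation):

import math

# number of decimal digits in 256^(224*224*3), the count of possible 8-bit RGB images
digits = int(224 * 224 * 3 * math.log10(256)) + 1
print(digits)  # roughly 360,000 digits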
RichardBJ1 t1_iy7xvbr wrote
I was interested when I first heard about this concept. People seemed to respond either by thinking it was ground-shaking, or alternatively that it stood to reason that, given enough splits, it would be the case! Do you think, though, that from a practical usage perspective this doesn't help much because there are so many decisions? The article has a lot more than just that, though, and a nice provocative title.
Difficult-Race-1188 OP t1_iy7pdev wrote
Reply to comment by freaky1310 in Neural Networks are just a bunch of Decision Trees by Difficult-Race-1188
So the paper on the spline theory of DL says that even in the latent representation NNs are incapable of interpolation, and that's a very important thing to know. If we know this, then we can design loss functions that work to better capture the global manifold structure.
Difficult-Race-1188 OP t1_iy7p779 wrote
Reply to comment by ivan_kudryavtsev in Neural Networks are just a bunch of Decision Trees by Difficult-Race-1188
It might behave in a similar fashion to a DT, but a DT doesn't build abstract feature representations, and that is something important.
Difficult-Race-1188 OP t1_iy7p4ap wrote
Reply to comment by xtof54 in Neural Networks are just a bunch of Decision Trees by Difficult-Race-1188
It does: if we know that a NN behaves like a DT, then we can design new loss functions that take this internal structure into account. One research area in this regard is Lipschitz regularization; adding such regularization makes the NN behave more smoothly.
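A rough sketch of the idea (a generic gradient penalty on the inputs; the exact form varies from paper to paper, and this isn't taken from any specific one):

import tensorflow as tf

def gradient_penalty(model, x, weight=0.1):
    # Penalize input gradients whose norm drifts away from 1, which nudges the
    # network toward a smoother (more Lipschitz-like) function of its inputs.
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = model(x, training=True)
    grads = tape.gradient(y, x)
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=-1) + 1e-12)
    return weight * tf.reduce_mean(tf.square(norms - 1.0))

# added to the task loss during training, e.g. loss = task_loss + gradient_penalty(model, x_batch)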
ivan_kudryavtsev t1_iy7ncq9 wrote
Maybe a decision tree is an example of a NN? I mean that a NN is the more generic structure, because it may include arbitrary neuron designs and custom layer designs.
xtof54 t1_iy7kgtw wrote
Researchers know that, but it does not help in any way to better understand DNNs. A bunch of DTs is not more explainable than a DNN.
freaky1310 t1_iy7ielr wrote
Thanks for pointing out the article, it’s going to be useful for a lot of people.
Anyway, when we refer to the “black box” nature of DNNs we don’t mean “we don’t know what’s going on”, but rather “we know exactly what’s going on in theory, but there are so many simple calculations that it’s impossible for a human being to keep track of them”. Just think of a relatively small ConvNet like AlexNet: it has ~62M parameters, meaning that all the simple calculations (gradient updates and whatnot) are performed A LOT of times in a single backward pass.
Also, DNNs often work with a latent representation, which adds another layer of abstraction for the user: the “reasoning” part happens in a latent space that we don’t know anything about, except some of its properties (and again, if we make the calculations we actually do know exactly what it is, it’s just unfeasible to do them).
To address these points, several research projects have focused on network interpretability, that is, finding ways of making sense of NNs’ reasoning process. Here’s a review written in 2021 regarding this.
Difficult-Race-1188 OP t1_iy7gm4d wrote
Reply to comment by jazzzzzzzzzzzzzzzy in Neural Networks are just a bunch of Decision Trees by Difficult-Race-1188
This paper is a bit short; I've drawn conclusions from multiple papers, like the spline theory of deep learning, why adversarial attacks exist, and the interpolation/extrapolation regime of neural nets.
majinLawliet2 t1_iyb97za wrote
Reply to comment by suflaj in Building ResNet for Tabular Data Regression Problem by eternalmathstudent
ResNet is just the architecture. It has nothing to do with the pretraining data. You can have skip connections with linear layers as well.