Recent comments in /f/deeplearning
alam-ai t1_ja8czf5 wrote
Reply to Why does my validation loss suddenly fall dramatically while my training loss does not? by Apprehensive_Air8919
Maybe dropout regularization is applied during training but not during validation? Without dropout the model does better, sort of like how you see better with both eyes open together than with either eye individually.
Also, you could just swap your training and validation sets and rerun to check that the actual data in the splits isn't somehow the issue.
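For example, in PyTorch (just a toy sketch, not your actual model) dropout is only applied in train() mode and is turned off in eval() mode, which is the usual reason validation loss can look better than training loss:

```python
import torch
import torch.nn as nn

# Toy model with fairly heavy dropout, just to illustrate the point.
model = nn.Sequential(
    nn.Linear(100, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes activations, but only in train() mode
    nn.Linear(64, 1),
)

x = torch.randn(8, 100)

model.train()            # dropout ON: training loss is computed on a "handicapped" network
train_out = model(x)

model.eval()             # dropout OFF: validation uses the full network, so loss is often lower
with torch.no_grad():
    val_out = model(x)
```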
trajo123 t1_ja8cyw2 wrote
Reply to Why does my validation loss suddenly fall dramatically while my training loss does not? by Apprehensive_Air8919
How is your loss defined? How is your validation set created? Does it happen for any test/validation split?
yannbouteiller t1_ja8cd7n wrote
Reply to Why does my validation loss suddenly fall dramatically while my training loss does not? by Apprehensive_Air8919
That is pretty strange indeed. Perhaps it is a magical effect of dropout?
FunBit9789 t1_ja8aim7 wrote
Reply to comment by augusts99 in Implementation of RNN as post-processing by augusts99
Ah ok, so if it were also a NN you could have the outputs of one model feed directly into the other with multiple heads, but with a random forest as the first model my suggestion doesn't really make sense.
augusts99 OP t1_ja7jo91 wrote
Reply to comment by thehallmarkcard in Implementation of RNN as post-processing by augusts99
Okay, thank you for the feedback! That could be interesting! Model 1 is a Random Forest model and uses different input than the LSTM, and at the moment I think making the models predict simultaneously would be too big of a hassle for my skill level. Also, what is meant by stacking the models, if I may ask?
nibbajenkem t1_ja7d93f wrote
Reply to comment by JJ_00ne in How would you approach this task? by JJ_00ne
What I mean is it doesn't make sense to use deep learning here.
JJ_00ne OP t1_ja7087f wrote
Reply to comment by nibbajenkem in How would you approach this task? by JJ_00ne
Yes, it's more a way to practice than a real necessity.
JJ_00ne OP t1_ja706gr wrote
Reply to comment by usesbinkvideo in How would you approach this task? by JJ_00ne
Basically it's a "to do what you want to do, do what you want to do" kind of answer.
usesbinkvideo t1_ja6stdw wrote
Reply to How would you approach this task? by JJ_00ne
Here let me ChatGPT this for you:
Cost Function: Since you want to create an organization chart that meets specific criteria, such as having at least one male and one female employee in each sector, you could use a custom cost function that takes these criteria into account. One option could be to penalize the model heavily for each violation of these criteria. For example, you could add a large penalty to the cost function if a sector does not have at least one male and one female employee.
Activation Function: The choice of activation function depends on the structure of your model and the specific problem you're trying to solve. Since assigning each worker to one of several sectors is a multi-class classification problem, you could use a softmax activation for the output layer to produce a probability score for each sector. The hidden layers could use the ReLU activation function, which works well in many types of neural networks.
Setting Minimum Employees Based on Sex: You mentioned that each sector requires at least one male and one female employee. You could enforce this by adding a constraint to the model, for example a custom check after each batch that counts the male and female employees assigned to each sector and penalizes any sector that falls short. This would keep the model aligned with the specific requirements of your problem.
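If you really do want to try it, here is a rough sketch of what that penalty-based loss could look like in PyTorch (names, shapes, and the penalty weight are all made up for illustration, not a tested recipe):

```python
import torch
import torch.nn as nn

n_features, n_sectors = 10, 5

# ReLU hidden layer, one score per sector at the output (softmax is applied inside the loss)
model = nn.Sequential(
    nn.Linear(n_features, 32),
    nn.ReLU(),
    nn.Linear(32, n_sectors),
)

def assignment_loss(logits, target_sector, is_male, penalty_weight=10.0):
    """Cross-entropy plus a heavy penalty for sectors missing a male or a female employee."""
    base = nn.functional.cross_entropy(logits, target_sector)
    probs = logits.softmax(dim=1)                 # soft sector assignment per worker
    male_count = probs[is_male].sum(dim=0)        # expected number of men per sector
    female_count = probs[~is_male].sum(dim=0)     # expected number of women per sector
    # penalize any sector whose expected count drops below one
    shortfall = torch.relu(1 - male_count) + torch.relu(1 - female_count)
    return base + penalty_weight * shortfall.sum()

# Toy batch: 16 workers with random features, sector labels, and sexes
x = torch.randn(16, n_features)
y = torch.randint(0, n_sectors, (16,))
is_male = torch.rand(16) > 0.5
loss = assignment_loss(model(x), y, is_male)
loss.backward()
```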
I_will_delete_myself t1_ja6e9do wrote
Dude, this is like saying you are trying to build a ship with a rubber ducky. Use the right tools.
thehallmarkcard t1_ja604kn wrote
Reply to comment by augusts99 in Implementation of RNN as post-processing by augusts99
So with no other info on your methodology I can't think of any issue with this. In some sense your RNN may be modeling the trend component and the other model measuring the volatility. But that's hard to say without knowing more. I am curious whether you tried stacking the models directly such that the weights optimize through both models simultaneously. But that depends on what kind of models you have and isn't necessarily better, just different.
augusts99 OP t1_ja5y2rt wrote
Reply to comment by thehallmarkcard in Implementation of RNN as post-processing by augusts99
Yeah! Currently, Model 1 makes predictions from certain input features, timestep by timestep. The LSTM model then uses that predicted sequence together with other variable sequences to make the predictions more robust and stable, as well as to capture the trends more correctly. At least, that is the idea.
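Roughly, the pipeline looks something like this (a simplified sketch with made-up shapes and placeholder data, not my actual code, assuming scikit-learn for Model 1 and PyTorch for the LSTM):

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestRegressor

# --- stage 1: Random Forest predicts each timestep independently ---
T, n_feat = 200, 6
X_rf = np.random.randn(T, n_feat)      # per-timestep input features (placeholder data)
y = np.random.randn(T)                 # true target sequence
rf = RandomForestRegressor(n_estimators=50).fit(X_rf, y)
rf_pred = rf.predict(X_rf)             # per-timestep predictions, no temporal context

# --- stage 2: LSTM post-processes the RF sequence plus other variable sequences ---
other_seqs = np.random.randn(T, 3)     # e.g. additional variables over time
lstm_in = torch.tensor(
    np.column_stack([rf_pred, other_seqs]), dtype=torch.float32
).unsqueeze(0)                         # shape (batch=1, T, features=4)

class PostProcessor(nn.Module):
    def __init__(self, n_in=4, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_in, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out).squeeze(-1)   # refined prediction per timestep

post = PostProcessor()
target = torch.tensor(y, dtype=torch.float32).unsqueeze(0)
loss = nn.functional.mse_loss(post(lstm_in), target)
loss.backward()
```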
thehallmarkcard t1_ja5shr4 wrote
Reply to Implementation of RNN as post-processing by augusts99
Am I understanding correctly that you train one model from input features to output, minimizing the error to the true output, then take the predictions of this first model and feed them into the RNN together with other features, again minimizing the loss to the true output?
nibbajenkem t1_ja5568v wrote
Reply to How would you approach this task? by JJ_00ne
Doesn't seem like anything you need deep learning for
Schlonksi t1_ja4v96x wrote
>im doing a convolutional Neural Newtork in R code
Why on earth would you do that? Why are you even implementing your own net if you're just using a vanilla CNN?
jazzzzzzzzzzzzzzzy t1_ja3yg3d wrote
Why R? It seems you want to make things hard for yourself. Just use PyTorch.
bigfoot1144 t1_ja3xywq wrote
Why are you doing it in R? Not saying you shouldn't, but it's much harder than in Python. If you're doing it as a learning exercise, that's fair enough.
augusts99 OP t1_ja3n6aa wrote
Reply to Implementation of RNN as post-processing by augusts99
Perhaps I should elaborate that the predicted sequence made by Model 1 is not the only input sequence to the LSTM model. I also feed in different variable sequences, which I hope the LSTM uses to learn the correct trends.
trajo123 t1_ja3lwj9 wrote
The architecture depends on what task you want to solve: classification, semantic segmentation, detection/localization?
On another note, by choosing to do deep learning on image-like data such as MRI in R you are making your job more difficult from the get-go, as there are many more tools and documentation resources available for Python.
[deleted] OP t1_ja32trh wrote
Reply to comment by PaleontologistDue620 in My Neural Net is stuck, I've run out of ideas by [deleted]
Another update: I am reading the first YOLO paper:
>We also train YOLO using VGG-16. This model is more accurate but also significantly slower than YOLO. It is useful for comparison to other detection systems that rely on VGG-16 but since it is slower than real-time the rest of the paper focuses on our faster models.
Which also explains that my main error was using VGG-16 without a good idea of how to make it understand where the objects are, which is what they did.
Some-Assistance-7812 t1_j9yb5oh wrote
Reply to comment by AbCi16 in Using Jupyter via GPU by AbCi16
Sure, let me know!
AnDaoLe t1_j9ul0cf wrote
There are a bunch of papers showing that large neural networks are actually just memorizing data as well.
suflaj t1_j9sxn6g wrote
Reply to comment by Dropkickmurph512 in Why bigger transformer models are better learners? by begooboi
You say it doesn't help, yet double descent says otherwise. You do not early stop transformer models the way you do with other models, outside of maybe finetuning on a similar task.
But for pretraining - no way. Big transformers are trained by setting some hyperparameters and then checking on them the next day. If the model learned something, you keep doing that; if it diverged, you load the last good checkpoint, change the hyperparameters and train with that.
Early stopping would imply that you're confident your hyperparameters are good and that you have a general idea of how long training will take and how much the model can learn. For big transformers, neither is the case.
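To sketch what I mean (simplified, with made-up thresholds and names, not how any particular lab's pipeline actually looks):

```python
import torch

def pretrain(model, optimizer, data_loader, ckpt_path="last_good.pt", check_every=10_000):
    """Run until you stop it by hand: save good checkpoints, roll back on divergence."""
    best_loss = float("inf")
    for step, (x, y) in enumerate(data_loader):
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()

        if step % check_every == 0:
            if torch.isnan(loss) or loss.item() > 10 * best_loss:
                # diverged: reload the last good checkpoint, tweak hyperparameters, keep going
                model.load_state_dict(torch.load(ckpt_path))
                for group in optimizer.param_groups:
                    group["lr"] *= 0.5
            else:
                best_loss = min(best_loss, loss.item())
                torch.save(model.state_dict(), ckpt_path)
        # note: there is no early-stopping criterion anywhere -- you just watch and decide
```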
Oceanboi t1_ja8egtc wrote
Reply to Why does my validation loss suddenly fall dramatically while my training loss does not? by Apprehensive_Air8919
It could be too much dropout. But also how large is your test data in relation to your train data and are you leaking any information from one into the other?