Recent comments in /f/deeplearning

alam-ai t1_ja8czf5 wrote

Maybe it applies dropout regularization during training but not during validation? Without dropout the model does better, sort of like how you see better with both eyes open than with either eye individually.

Also, you could probably just swap your training and validation sets and rerun the test, to check that the actual data in the splits isn't somehow the issue.
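For intuition, here is a minimal NumPy sketch of inverted dropout (the function name and the 0.5 rate are just for illustration): units are only dropped in training mode, while validation sees the full network.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, training=True):
    # Inverted dropout: only active in training mode.
    if not training:
        return x  # validation/inference uses the full network unchanged
    mask = rng.random(x.shape) >= p  # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)      # rescale so the expected activation matches

x = np.ones(8)
train_out = dropout(x, training=True)   # some units zeroed, survivors scaled up
val_out = dropout(x, training=False)    # identical to x
```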

2

augusts99 OP t1_ja7jo91 wrote

Okay, thank you for the feedback! That could be interesting! Model 1 is a Random Forest model and uses different input than the LSTM, and at the moment I think it may be too big of a hassle for my skill level to make the models predict simultaneously. Also, what is meant by stacking the models, if I may ask?

1

usesbinkvideo t1_ja6stdw wrote

Here let me ChatGPT this for you:

Cost Function: Since you want to create an organization chart that meets specific criteria, such as having at least one male and one female employee in each sector, you could use a custom cost function that takes these criteria into account. One option could be to penalize the model heavily for each violation of these criteria. For example, you could add a large penalty to the cost function if a sector does not have at least one male and one female employee.

Activation Function: The choice of activation function depends on the structure of your model and the specific problem you're trying to solve. Since you have a binary classification problem (assigning each worker to a sector), you could use the sigmoid activation function for the output layer to produce a probability score for each sector. The input layer and hidden layers could use the ReLU activation function, which has been shown to work well in many types of neural networks.

Setting Minimum Employees Based on Sex: You mentioned that each sector requires at least one male and one female employee. You could enforce this requirement by adding constraints to the model. For example, you could use a custom constraint that checks the number of male and female employees in each sector after each batch and enforces the requirement that each sector has at least one male and one female employee. This would ensure that your model meets the specific requirements of your problem.
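A rough sketch of that penalty term (the function, the weight of 100, and the data layout are all illustrative assumptions, not a specific library API):

```python
import numpy as np

def constraint_penalty(assignments, sexes, n_sectors, weight=100.0):
    # Add a large penalty for every sector missing a male or a female employee.
    # assignments: sector index per worker; sexes: "M"/"F" label per worker.
    penalty = 0.0
    for s in range(n_sectors):
        in_sector = sexes[assignments == s]
        if not np.any(in_sector == "M"):
            penalty += weight
        if not np.any(in_sector == "F"):
            penalty += weight
    return penalty

# total cost = base loss + constraint_penalty(assignments, sexes, n_sectors)
```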

0

thehallmarkcard t1_ja604kn wrote

So with no other info on your methodology, I can't think of any issue with this. In some sense your RNN may be modeling the trend component and the other model the volatility, but that's hard to say without knowing more. I am curious whether you tried stacking the models directly, such that the weights optimize through both models simultaneously. But that depends on what kind of models you have, and it isn't necessarily better, just different.
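To illustrate "optimize through both models simultaneously" with a toy example (scalar "models", hand-derived gradients, all values made up): when the models are stacked, the same loss gradient updates both sets of weights via the chain rule, instead of fitting the first model in isolation and freezing its predictions.

```python
# Two stacked linear "models": y_hat = w2 * (w1 * x), fit to a single target.
x, y = 2.0, 8.0
w1, w2 = 1.0, 1.0
lr = 0.01
for _ in range(200):
    h = w1 * x          # first model's output
    y_hat = w2 * h      # second model consumes it
    err = y_hat - y     # gradient of 0.5 * err**2 w.r.t. y_hat
    w1 -= lr * err * w2 * x  # chain rule: the loss gradient reaches w1 through w2
    w2 -= lr * err * h
```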

1

augusts99 OP t1_ja5y2rt wrote

Yeah! Currently, Model 1 makes predictions timestep by timestep based on certain input features. The LSTM model then uses that predicted sequence together with other variable sequences to make the predictions more robust and stable, as well as to capture the correct trends. At least, that is the idea.
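Concretely, that pipeline amounts to treating Model 1's per-timestep predictions as one feature column among the LSTM's input sequences; a small shape-only sketch (the sizes are made up):

```python
import numpy as np

T = 24                             # timesteps in one sequence
model1_preds = np.random.rand(T)   # Model 1's prediction per timestep
other_vars = np.random.rand(T, 3)  # three other variable sequences
# LSTM input of shape (timesteps, features), with Model 1's output as feature 0
lstm_input = np.column_stack([model1_preds, other_vars])
```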

1

thehallmarkcard t1_ja5shr4 wrote

Am I understanding correctly that you train one model from input features to output, minimizing the error to the true output, then take the predictions of this first model and feed them into the RNN with other features, again minimizing the loss to the true output?

1

augusts99 OP t1_ja3n6aa wrote

Perhaps I should elaborate: the predicted sequence made by Model 1 is not the only input sequence to the LSTM model. I also use different variable sequences, which I hope the LSTM uses to learn the correct trends.

1

trajo123 t1_ja3lwj9 wrote

The architecture depends on what task you want to solve: classification, semantic segmentation, detection/localization?

On another note, by choosing to do deep learning on image-like data such as MRI in R, you are making your job harder from the get-go, as there are many more tools and documentation resources available for Python.

4

[deleted] OP t1_ja32trh wrote

Another update, I am reading the first yolo paper:

>We also train YOLO using VGG-16. This model is more accurate but also significantly slower than YOLO. It is useful for comparison to other detection systems that rely on VGG-16 but since it is slower than real-time the rest of the paper focuses on our faster models.

Which also explains my main error: using VGG-16 without a good idea of how to make it understand where the objects are, which is what they did.

1

suflaj t1_j9sxn6g wrote

You say it doesn't help, yet double descent says otherwise. You do not early stop transformer models the way you do with other models, outside of maybe finetuning on a similar task.

But for pretraining, no way. Big transformers are trained by setting some hyperparameters and then checking on the run the next day. If the model learned something, you keep doing that; if it diverged, you load the last good checkpoint, change the hyperparameters, and train with those.

Early stopping would imply that you're confident your hyperparameters are good and that you have a general idea of how long training will take and how much the model can learn. For big transformers, neither is the case.
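That workflow can be sketched as a checkpoint-and-rollback loop (everything here, including `train_day`, is a hypothetical stand-in for a real training run, not any library's API):

```python
import copy

def run_with_rollback(params, hyperparams, train_day):
    # train_day: runs "one day" of training, returns (new_params, loss).
    best = copy.deepcopy(params)
    best_loss = float("inf")
    for hp in hyperparams:
        candidate, loss = train_day(copy.deepcopy(best), hp)
        if loss < best_loss:
            best, best_loss = candidate, loss  # model learned something: keep it
        # otherwise it diverged: `best` still holds the last good checkpoint,
        # and the next iteration retries from it with new hyperparameters
    return best, best_loss
```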

0