Recent comments in /f/deeplearning

alam-ai t1_ja8czf5 wrote

Maybe it applies dropout regularization during training but not during validation? Without dropout the model does better, sort of like how you see better with both eyes open than with either eye individually.

Also, you could probably just swap your training and validation sets and rerun the test, to check that the actual data in the splits isn't somehow the issue.
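For intuition, here is a minimal NumPy sketch of inverted dropout (the function name and the 0.5 rate are just for illustration): units are only dropped in training mode, while validation sees the full network.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, training=True):
    # Inverted dropout: only active in training mode.
    if not training:
        return x  # validation/inference uses the full network unchanged
    mask = rng.random(x.shape) >= p  # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)      # rescale so the expected activation matches

x = np.ones(8)
train_out = dropout(x, training=True)   # some units zeroed, survivors scaled up
val_out = dropout(x, training=False)    # identical to x
```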

2

augusts99 OP t1_ja7jo91 wrote

Okay, thank you for the feedback! That could be interesting! Model 1 is a Random Forest model and uses different input than the LSTM, and at the moment I think it may be too big of a hassle for my skill level to make the models predict simultaneously. Also, what is meant by stacking the models, if I may ask?

1

usesbinkvideo t1_ja6stdw wrote

Here let me ChatGPT this for you:

Cost Function: Since you want to create an organization chart that meets specific criteria, such as having at least one male and one female employee in each sector, you could use a custom cost function that takes these criteria into account. One option could be to penalize the model heavily for each violation of these criteria. For example, you could add a large penalty to the cost function if a sector does not have at least one male and one female employee.

Activation Function: The choice of activation function depends on the structure of your model and the specific problem you're trying to solve. Since you have a binary classification problem (assigning each worker to a sector), you could use the sigmoid activation function for the output layer to produce a probability score for each sector. The input layer and hidden layers could use the ReLU activation function, which has been shown to work well in many types of neural networks.

Setting Minimum Employees Based on Sex: You mentioned that each sector requires at least one male and one female employee. You could enforce this requirement by adding constraints to the model. For example, you could use a custom constraint that checks the number of male and female employees in each sector after each batch and enforces the requirement that each sector has at least one male and one female employee. This would ensure that your model meets the specific requirements of your problem.
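A rough sketch of that penalty term (the function, the weight of 100, and the data layout are all illustrative assumptions, not a specific library API):

```python
import numpy as np

def constraint_penalty(assignments, sexes, n_sectors, weight=100.0):
    # Add a large penalty for every sector missing a male or a female employee.
    # assignments: sector index per worker; sexes: "M"/"F" label per worker.
    penalty = 0.0
    for s in range(n_sectors):
        in_sector = sexes[assignments == s]
        if not np.any(in_sector == "M"):
            penalty += weight
        if not np.any(in_sector == "F"):
            penalty += weight
    return penalty

# total cost = base loss + constraint_penalty(assignments, sexes, n_sectors)
```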

0

thehallmarkcard t1_ja604kn wrote

So with no other info on your methodology, I can't think of any issue with this. In some sense your RNN may be modeling the trend component and the other model the volatility, but that's hard to say without knowing more. I am curious whether you tried stacking the models directly, such that the weights optimize through both models simultaneously. But that depends on what kind of models you have, and it isn't necessarily better, just different.
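To illustrate "optimize through both models simultaneously" with a toy example (scalar "models", hand-derived gradients, all values made up): when the models are stacked, the same loss gradient updates both sets of weights via the chain rule, instead of fitting the first model in isolation and freezing its predictions.

```python
# Two stacked linear "models": y_hat = w2 * (w1 * x), fit to a single target.
x, y = 2.0, 8.0
w1, w2 = 1.0, 1.0
lr = 0.01
for _ in range(200):
    h = w1 * x          # first model's output
    y_hat = w2 * h      # second model consumes it
    err = y_hat - y     # gradient of 0.5 * err**2 w.r.t. y_hat
    w1 -= lr * err * w2 * x  # chain rule: the loss gradient reaches w1 through w2
    w2 -= lr * err * h
```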

1

augusts99 OP t1_ja5y2rt wrote

Yeah! Currently, Model 1 makes predictions timestep by timestep based on certain input features. The LSTM model then uses that predicted sequence together with other variable sequences to make the predictions more robust and stable, as well as to capture the correct trends. At least, that is the idea.
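Concretely, that pipeline amounts to treating Model 1's per-timestep predictions as one feature column among the LSTM's input sequences; a small shape-only sketch (the sizes are made up):

```python
import numpy as np

T = 24                             # timesteps in one sequence
model1_preds = np.random.rand(T)   # Model 1's prediction per timestep
other_vars = np.random.rand(T, 3)  # three other variable sequences
# LSTM input of shape (timesteps, features), with Model 1's output as feature 0
lstm_input = np.column_stack([model1_preds, other_vars])
```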

1

thehallmarkcard t1_ja5shr4 wrote

Am I understanding correctly that you train one model from input features to output, minimizing the error to the true output, then take the predictions of this first model and feed them into the RNN with other features, again minimizing the loss to the true output?

1

augusts99 OP t1_ja3n6aa wrote

Perhaps I should elaborate: the predicted sequence made by Model 1 is not the only input sequence to the LSTM model. I also use different variable sequences, which I hope the LSTM uses to learn the correct trends.

1

trajo123 t1_ja3lwj9 wrote

The architecture depends on what task you want to solve: classification, semantic segmentation, detection/localization?

On another note, by choosing to do deep learning on image-like data such as MRI in R, you are making your job harder from the get-go, as there are many more tools and documentation resources available for Python.

4

[deleted] OP t1_ja32trh wrote

Another update, I am reading the first yolo paper:

>We also train YOLO using VGG-16. This model is more accurate but also significantly slower than YOLO. It is useful for comparison to other detection systems that rely on VGG-16 but since it is slower than real-time the rest of the paper focuses on our faster models.

Which also explains my main error: using VGG-16 without a good idea of how to make it understand where the objects are, which is what they did.

1

suflaj t1_j9sxn6g wrote

You say it doesn't help, yet double descent says otherwise. You do not early stop transformer models the way you do with other models, outside of maybe finetuning on a similar task.

But for pretraining, no way. Big transformers are trained by setting some hyperparameters and then checking on the run the next day. If the model learned something, you keep doing that; if it diverged, you load the last good checkpoint, change the hyperparameters, and train with those.

Early stopping would imply that you're confident your hyperparameters are good and that you have a general idea of how long training will take and how much the model can learn. For big transformers, neither is the case.
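That workflow can be sketched as a checkpoint-and-rollback loop (everything here, including `train_day`, is a hypothetical stand-in for a real training run, not any library's API):

```python
import copy

def run_with_rollback(params, hyperparams, train_day):
    # train_day: runs "one day" of training, returns (new_params, loss).
    best = copy.deepcopy(params)
    best_loss = float("inf")
    for hp in hyperparams:
        candidate, loss = train_day(copy.deepcopy(best), hp)
        if loss < best_loss:
            best, best_loss = candidate, loss  # model learned something: keep it
        # otherwise it diverged: `best` still holds the last good checkpoint,
        # and the next iteration retries from it with new hyperparameters
    return best, best_loss
```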

0