Recent comments in /f/deeplearning

xolotl96 t1_j9izg4o wrote

The price of four RTX 4090s is very high. If you plan on spending that much money, I assume it is for business or research. It could make sense to invest in a server-grade processor and motherboard with support for many more PCIe lanes. Also, servers often use multiple power supplies for redundancy, but in this case they can also help manage the wattage of four cards. In my experience, limiting the max power does not dramatically increase training time, so I would do that, especially if you are planning to air-cool the cards (which is probably the best option for an offsite server).
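As a sketch of the power-limiting step: `nvidia-smi` can cap per-card wattage. The 300 W figure below is an illustrative value, not a recommendation from the comment above; query your cards' supported range first.

```shell
# Query the valid power-limit range for each GPU (values vary by card).
nvidia-smi --query-gpu=index,power.min_limit,power.max_limit --format=csv

# Enable persistence mode so the setting is retained between CUDA contexts,
# then cap each of the four cards at an example 300 W (requires root).
sudo nvidia-smi -pm 1
sudo nvidia-smi -i 0,1,2,3 --power-limit=300
```

Note the limit resets on reboot unless reapplied, e.g. from a systemd unit or startup script.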

4

hayAbhay t1_j9i9nfv wrote

If you have the hardware, and if you have a lot of those input-output examples, you can use one of the smaller models in the GPT family instead.

This should work reasonably well, especially if the variance in the input-output pairs isn't too high. (A lot depends on your dataset here.)

There are definite tradeoffs here in terms of model development, inference, and maintenance. If the expected costs aren't too high, I'd strongly recommend GPT-3 as a base.

1

hayAbhay t1_j9duoda wrote

Create a corpus C like this:

    <source text from corpus A> <human-generated text from corpus B>
    <source text from corpus A> <human-generated text from corpus B>
    . . .

Make sure you add unique tokens marking the start and end of each example, as well as the input and output within it.

Then fine-tune any pretrained LLM on it (tuning GPT-3 is trivial with ~10-20 lines of code).

For inference, give the tuned model the input and let it complete the output. Pass the "end" marker token as a stop sequence so generation terminates cleanly.
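A minimal sketch of the corpus-construction step above. The marker tokens (`<|start|>`, `<|sep|>`, `<|end|>`) are arbitrary illustrative choices, not prescribed by the comment — any strings that cannot occur in the data will do:

```python
# Build corpus C by pairing source texts (corpus A) with human-generated
# outputs (corpus B), wrapped in unique marker tokens so the model can
# learn where each example's input ends and its output begins.
START, SEP, END = "<|start|>", "<|sep|>", "<|end|>"

def build_corpus(corpus_a, corpus_b):
    examples = []
    for source, target in zip(corpus_a, corpus_b):
        examples.append(f"{START}{source}{SEP}{target}{END}")
    return "\n".join(examples)

corpus_c = build_corpus(
    ["The cat sat."],          # corpus A: source text
    ["Le chat s'est assis."],  # corpus B: human-generated text
)
```

At inference time you would prompt the tuned model with `<|start|>{input}<|sep|>` and register `<|end|>` as the stop sequence, so generation halts once the output is complete.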

[Source: trained/tuned several language models including gpt3]

3

Morteriag t1_j90stfo wrote

Put it to the test with real data. Putting a lot of effort into tuning a model on a fixed data set that will eventually be deployed is a waste of time. And don't freak out when it fails! Just add more quality data collected after it is deployed.

2

Oceanboi t1_j8zdely wrote

Oh I see, I missed the major point that the training data is basically incomplete for modeling the entire relationship.

Why embed priors into neural networks? Doesn't Bayesian modeling using MCMC do pretty much what this is attempting? We did something similar in one of my courses, though we didn't get to spend enough time on it, so forgive me if my questions are stupid. I'd also need someone to walk me through a motivating example for a PINN, because I'd just get lost in generalities otherwise. I get the example, but I'm failing to see the larger use case.

1