Recent comments in /f/deeplearning
vk6flab t1_jax2hd7 wrote
Reply to comment by PepperoniMozz in Meta’s LLaMa weights leaked on torrent... and the best thing about it is someone put up a PR to replace the google form in the repo with it 😂 by RandomForests92
I'm guessing that someone got hold of the thing that you needed to submit a Google Form for and decided that they could instead distribute it via a torrent. They updated the source code as a community service to show the torrent to the next poor sod who went looking through the source to get rid of the Google Form.
But I'm not the developer and I'm not sure what actually happened; it just seems plausible.
diepala OP t1_jaw16wx wrote
Reply to comment by goedel777 in General NN Architecture guidelines for a regression problem with tabular data by diepala
Because the problem benefits from linear operations, and under certain circumstances the output of the model is almost equal to one of the input features. That is harder for tree-based models to generalize.
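For illustration, one way to encode that prior is a skip connection from the relevant feature straight to the output, so the network only has to learn the residual. A rough PyTorch sketch (the feature index and layer sizes are placeholders, not my actual setup):

```python
import torch
import torch.nn as nn

class SkipRegressor(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        # small MLP that learns a correction on top of the skipped feature
        self.mlp = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        # x[:, :1] is the feature the target is sometimes almost equal to;
        # adding it directly lets the net fall back to a near-identity mapping
        return x[:, :1] + self.mlp(x)

model = SkipRegressor(n_features=10)
pred = model(torch.randn(32, 10))  # shape (32, 1)
```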
Jaffa6 t1_javzwj6 wrote
Reply to comment by inFamous_16 in [R] Variable size input to pre-trained BERT model by inFamous_16
No worries, shoot me a message if you need a hand!
OzzyKampha t1_javy6i8 wrote
PepperoniMozz t1_javx5wi wrote
Reply to Meta’s LLaMa weights leaked on torrent... and the best thing about it is someone put up a PR to replace the google form in the repo with it 😂 by RandomForests92
I am clueless. Could someone explain the whole shebang?
goedel777 t1_javtr0h wrote
Reply to comment by diepala in General NN Architecture guidelines for a regression problem with tabular data by diepala
How do you know you can get better results?
inFamous_16 OP t1_javmu8a wrote
Reply to comment by Jaffa6 in [R] Variable size input to pre-trained BERT model by inFamous_16
Ohh ok... super clear, thanks for your time! I will check this out.
Jaffa6 t1_javl6ef wrote
Reply to comment by inFamous_16 in [R] Variable size input to pre-trained BERT model by inFamous_16
No problem.
I believe that if you're using a BERT-esque model, you do indeed need to do "full" tokenisation (part of which is creating the attention mask and padding) because BERT expects its input to be a list of token indices. E.g. Given the token mapping {"a": 1, "cow": 2, "cat": 3, "dog": 4}, tokenisation would turn "a cat" into [1, 3] which is in the form that BERT expects.
And since BERT comes with a token mapping (due to pre-training), if you're just putting in your own features (say, number of likes and number of retweets), they'll quite possibly just get interpreted as random tokens if their numbers match up with known token indices.
If your features are already the right kind (tokenised text, with the resultant indices matching the correct BERT token indices), I suppose you could do truncation/padding yourself and feed that input directly to BERT.
But it'll probably end up simpler and less error-prone to let BERT tokenise it for you (e.g. via HuggingFace's `AutoTokenizer.from_pretrained('bert-base-uncased')`).
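Roughly what that looks like in practice (a minimal sketch; the model name and example texts are just illustrative, not specific to your data):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

batch = tokenizer(
    ["a cat", "a cow and a dog"],
    padding=True,        # pad the shorter text up to the longest in the batch
    truncation=True,     # cut anything longer than the model's max length
    return_tensors="pt",
)

print(batch["input_ids"])       # the token indices BERT expects
print(batch["attention_mask"])  # 1 for real tokens, 0 for padding
```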
issam_28 t1_javazog wrote
Reply to Meta’s LLaMa weights leaked on torrent... and the best thing about it is someone put up a PR to replace the google form in the repo with it 😂 by RandomForests92
That was bound to happen.
diepala OP t1_javaont wrote
Reply to comment by big_ol_tender in General NN Architecture guidelines for a regression problem with tabular data by diepala
I already use that, but it is not giving me the results I want, and I know I can get better performance.
mumbo1134 t1_jav893u wrote
Reply to comment by DingWrong in Meta’s LLaMa weights leaked on torrent... and the best thing about it is someone put up a PR to replace the google form in the repo with it 😂 by RandomForests92
no shortage of seeders on the original torrent
nirajandhakal37 t1_jav7o33 wrote
inFamous_16 OP t1_jav6112 wrote
Reply to comment by Jaffa6 in [R] Variable size input to pre-trained BERT model by inFamous_16
Ahhh... thank you! I wasn't aware of the concept of an attention mask. Also, I had one more doubt: as I already have tweet features of variable size after concatenation, is there a way to skip the tokenization step, since I don't require it? I only need padding and the attention mask.
Jaffa6 t1_jav3gj2 wrote
Reply to comment by inFamous_16 in [R] Variable size input to pre-trained BERT model by inFamous_16
I believe that you don't really lose the context because you also have an attention mask which basically says "don't pay attention to these tokens" and every pad token is masked in it.
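To make that concrete, here's a toy illustration (the token ids are made up, and in practice the tokenizer builds all of this for you):

```python
# made-up token ids: two "sentences" of different lengths
sequences = [[101, 2023, 2003, 102], [101, 2460, 102]]
max_len = max(len(s) for s in sequences)

# pad with 0s and mark real tokens with 1, padding with 0
input_ids = [s + [0] * (max_len - len(s)) for s in sequences]
attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]

print(input_ids)       # [[101, 2023, 2003, 102], [101, 2460, 102, 0]]
print(attention_mask)  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```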
inFamous_16 OP t1_jauvj21 wrote
Reply to comment by I_will_delete_myself in [R] Variable size input to pre-trained BERT model by inFamous_16
Yeah, thanks... That's the first thought that came to my mind, but won't we lose the context of the original feature vector that way?
I_will_delete_myself t1_jauuhhi wrote
You add padding
DingWrong t1_jauuc0z wrote
Reply to Meta’s LLaMa weights leaked on torrent... and the best thing about it is someone put up a PR to replace the google form in the repo with it 😂 by RandomForests92
Now if only there were seeders on that torrent...
big_ol_tender t1_jatvx64 wrote
If you have tabular data, just use XGBoost and forget the NN.
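Something like this is usually enough as a baseline (the data and hyperparameters here are just placeholders):

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# stand-in tabular data: 10 numeric features, noisy linear-ish target
X = np.random.rand(1000, 10)
y = 3.0 * X[:, 0] + np.random.randn(1000) * 0.1

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # R^2 on the held-out split
```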
djaym7 t1_jatu0sf wrote
average-joee OP t1_jasr95t wrote
Reply to comment by fundamental_entropy in What do you recommend for a text summarization task? by average-joee
Many thanks for your input!
fundamental_entropy t1_jasqy64 wrote
Reply to comment by average-joee in What do you recommend for a text summarization task? by average-joee
Flan models are trained on almost every open dataset available for generic English tasks. Recent research suggests that models trained to perform multiple tasks are better than models trained only on a single task (in fact, the ratios of the different tasks matter too; see the Flan 2022 paper). Flan-T5 beats T5 on almost every task, and Flan-T5 XXL sometimes matches GPT-3-style prompted generation.
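If you want to try it quickly, something along these lines should work (the model size and prompt wording are just illustrative choices):

```python
from transformers import pipeline

summarizer = pipeline("text2text-generation", model="google/flan-t5-base")

article = "Long article text goes here..."
result = summarizer("Summarize the following text: " + article, max_new_tokens=80)
print(result[0]["generated_text"])
```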
silva_p t1_jax6viy wrote
Reply to Meta’s LLaMa weights leaked on torrent... and the best thing about it is someone put up a PR to replace the google form in the repo with it 😂 by RandomForests92
219GB? Oh dear!