pyepyepie
pyepyepie t1_j8e7gjp wrote
Reply to comment by bballerkt7 in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
Thanks :) I agree it's useful, but I don't see how it's related to AGI. Also, it was already done a long time ago; many "AI" agents used the internet before. I feel that the real challenge is to control language models using structured data, perform planning, etc., not to use language models to interact with the world (which seems trivial to me, sorry) - but of course, it's just my opinion, which is probably not even that smart.
pyepyepie t1_j8dvci2 wrote
Reply to comment by EducationalCicada in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
I would tell you my opinion if I knew what the definition of AGI is xD
pyepyepie t1_j8dv3wv wrote
Reply to comment by bballerkt7 in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
Why do you think it's a step in this direction? Did you read the paper (serious question, it's interesting)?
pyepyepie t1_j8dgah3 wrote
Reply to comment by belacscole in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
Did it learn to master tools, though? I see it more as a neuro-symbolic system (is that the correct term?). That kind of setup happens a lot in production.
pyepyepie t1_j1hi1tv wrote
Reply to comment by idrajitsc in [D] Has anyone integrated ChatGPT with scientific papers? by justrandomtourist
ChatGPT will do it too; it happily invented papers for me (with nice ideas, although most of the time it just merged two existing ideas) when I asked it to write a literature review. Then again, we face the challenge of grounding correctly vs. flexibility. My hypothesis is that the model was trained using feedback from non-domain experts as well, so unless we solve grounding fundamentally, I would even say this is the expected behavior of the model. That is, it was probably rewarded for stating facts that sound good even if incorrect, over facts that sound bad, which makes its hallucinations trickier even if they happen less often. There is no reason to think fine-tuning will solve that.
pyepyepie t1_j1hhdv1 wrote
Reply to comment by trnka in [D] Simple Questions Thread by AutoModerator
Interesting answer, thanks :)
pyepyepie t1_j14a34r wrote
Reply to [D] Simple Questions Thread by AutoModerator
Why do so many papers put the emphasis on performance comparisons and ignore the model's behavior?
Background - My first ML project was done around 2016-2017. It's funny to say, but SOTA for NLP was nowhere near what it is today, so even though I am relatively new to the field, I observed how transformers completely changed the world, and not only the world of NLP.
Now, I am nowhere close to a research scientist (my experience is in implementing stuff), but I have read a fair number of NLP papers (for work and a little for grad school), and I see many papers that are just improvements on a specific task, using "cheap tricks" or fine-tuning yet another model (BERT version 100X) to get better quantitative performance.
That being said, I have yet to see a situation where getting 96% vs. 95% accuracy (hopefully with more details, but not always) on datasets that are often imbalanced is a meaningful signal, or even ethical to report as an improvement, without statistical significance tests and qualitative analysis (I sketch the kind of test I mean below).
Again, looking at it as someone who builds products, I can't see when I would ever want to use "the best" model if I don't know how it fails, which means I would take a model with 93% accuracy over one with 95% if I can understand it better (even just because the paper was more explicit while the better model is a complete black box).
My question to the smarter and more experienced people here (probably a large portion of the subreddit) is: what is the counter to my argument? Do you see qualitative improvements to models (e.g., classification with less bias, better grounding of language models) as more or less important than quantitative ones? And, honestly, do you ever read papers that just improved SOTA without introducing significant novel ideas? If so, why (I can see a few reasons but would like to hear more)?
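To be concrete about the significance testing I mean: here is a minimal sketch of a paired bootstrap comparison between two classifiers on the same test set. Everything in it (the labels, the fake predictions, the resample count) is a placeholder for illustration, not taken from any real paper.

```python
# Minimal sketch of a paired bootstrap test comparing two classifiers.
# All data here is synthetic and only for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Placeholder ground truth and predictions from two models on the same test set.
y_true = rng.integers(0, 2, size=1000)
pred_a = np.where(rng.random(1000) < 0.95, y_true, 1 - y_true)  # ~95% accuracy
pred_b = np.where(rng.random(1000) < 0.96, y_true, 1 - y_true)  # ~96% accuracy

def paired_bootstrap(y, a, b, n_boot=10_000):
    """Resample test examples and count how often model B fails to beat model A."""
    n = len(y)
    observed = (b == y).mean() - (a == y).mean()
    worse_or_equal = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample test examples with replacement
        diff = (b[idx] == y[idx]).mean() - (a[idx] == y[idx]).mean()
        if diff <= 0:
            worse_or_equal += 1
    return observed, worse_or_equal / n_boot

gain, p_value = paired_bootstrap(y_true, pred_a, pred_b)
print(f"accuracy gain: {gain:.3f}, bootstrap p-value: {p_value:.3f}")
```

If the bootstrap p-value is not small, then a 96% vs. 95% gap is exactly the kind of "improvement" I'm skeptical about reporting on its own.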
pyepyepie t1_j08pa80 wrote
Reply to comment by Internal-Diet-514 in [P] Implemented Vision Transformers 🚀 from scratch using TensorFlow 2.x by TensorDudee
Ideas > performance, for sure :)
pyepyepie t1_j07gugl wrote
Reply to comment by nucLeaRStarcraft in [P] Implemented Vision Transformers 🚀 from scratch using TensorFlow 2.x by TensorDudee
I think it's kind of important to state what our models do better. I really dislike this chasing-SOTA-on-some-dataset thing; Internal-Diet has a point here.
pyepyepie t1_j07bgek wrote
Reply to comment by Internal-Diet-514 in [P] Implemented Vision Transformers 🚀 from scratch using TensorFlow 2.x by TensorDudee
Just my 2 cents, ignoring the specific model details (as I don't do vision): you would expect every model to behave differently on different data. For example, try to train a large NN on 10 examples drawn from y = mx + b, and then do the same with a linear model (a toy sketch of this is right below). The same applies in less clear-cut situations, e.g. larger models that require more data vs. smaller models that are more sample-efficient but introduce more bias.
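Here is that toy example as a sketch; the slope, intercept, and layer sizes are arbitrary choices of mine. Fit ten noiseless points from y = 3x + 2 with plain linear regression and with an oversized MLP, then compare how they extrapolate outside the training range.

```python
# Toy sketch: 10 points from y = 3x + 2, fit by a linear model vs. an
# oversized MLP. The numbers (slope 3, intercept 2, layer sizes) are arbitrary.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(10, 1))
y = 3.0 * X.ravel() + 2.0  # noiseless y = mx + b

linear = LinearRegression().fit(X, y)
mlp = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=5000,
                   random_state=0).fit(X, y)

# Compare extrapolation well outside the training range.
X_test = np.array([[-5.0], [5.0]])
print("true:  ", 3.0 * X_test.ravel() + 2.0)
print("linear:", linear.predict(X_test))
print("mlp:   ", mlp.predict(X_test))
```

The linear model recovers m and b essentially exactly from 10 points, while the big network typically fits in-range but extrapolates poorly - that's the bias vs. data trade-off I mean.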
pyepyepie t1_izgms77 wrote
Reply to comment by MetaAI_Official in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
Interesting. I was completely surprised by the results (I honestly thought Diplomacy would take 10 years) - it's a great demo of how to utilize large language models without messing up :) Congrats.
pyepyepie t1_izci4w8 wrote
Reply to comment by dborowiec10 in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
Not from the Meta team, but you might want to take a look at the SM (supplementary material) and search for "GPU"/"GPUs"; they actually did a very nice job describing it (it doesn't answer your question re: region, but I thought it might be helpful, e.g. the number of GPUs).
pyepyepie t1_izbzd9r wrote
Reply to [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
I feel like the agent was implemented incredibly well; however, the grounding and "information selection" of the language model was not "clean", since classifiers were used to filter messages. Since the Diplomacy team is extremely competent, I wonder whether you put effort into grounding better (in a general context) and whether it is part of your future plans, as I feel it's very important for the community (arguably one of the most important problems in NLP).
edit: I know that the language model was conditioned on in-game orders, etc., but I wonder if you intend to work on novel algorithms for it in the future.
pyepyepie t1_iz0wj9m wrote
Reply to comment by trnka in [D] What is the advantage of multi output regression over doing it individually for each target variable by triary95
Super interesting. I like the story about the metrics; very useful for people who are new to data science. Even when it's not solvable (I assume in your case it was, but in MARL, for example, sometimes if you just aim for Pareto optimality you get a weird division of "goods"), most of the time you would rather have two models at x-5% accuracy than one at x+15% and another at x-15%. We get paid to understand the systems we build :)
BTW, what you're talking about seems related to deep double descent: https://openai.com/blog/deep-double-descent/. That phenomenon is clearly magic :D I have heard some explanations about weight initialization at a conference, but to be honest I don't have anything intelligent to say about it; it would be interesting to see whether this is still the standard type of network in 20 years.
pyepyepie t1_iyzvql1 wrote
Reply to comment by michaelaalcorn in [D] What is the advantage of multi output regression over doing it individually for each target variable by triary95
Great answer, but I am a little unsure about the last line. If you are using an ANN you can get stuck in a local minimum of the loss function, and I am not sure that learning multiple tasks in parallel wouldn't be beneficial for the model. I am not saying you are incorrect, just trying to learn something new :).
edit: my TL;DR question is whether sharing weights can help avoid getting stuck in a local minimum in the case of an ANN, i.e., improve performance (a minimal sketch of what I mean by weight sharing is below).
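To make the question concrete, here is a minimal sketch of the setup I have in mind, written in Keras; the synthetic data, layer sizes, and training settings are all made up. It compares one model with a shared trunk and two regression heads against two fully independent single-output models.

```python
# Hypothetical sketch of weight sharing across targets: one shared trunk
# with two regression heads vs. two independent single-output models.
# The data, layer sizes, and training settings are placeholders.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8)).astype("float32")
# Two targets that depend on shared structure in the inputs.
y1 = (X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.normal(size=1000)).astype("float32")
y2 = (X[:, 0] - X[:, 1] ** 2 + 0.1 * rng.normal(size=1000)).astype("float32")

# Shared trunk, two heads: gradients from both targets update the trunk.
inputs = tf.keras.Input(shape=(8,))
trunk = tf.keras.layers.Dense(64, activation="relu")(inputs)
head1 = tf.keras.layers.Dense(1, name="y1")(trunk)
head2 = tf.keras.layers.Dense(1, name="y2")(trunk)
shared = tf.keras.Model(inputs, [head1, head2])
shared.compile(optimizer="adam", loss="mse")
shared.fit(X, [y1, y2], epochs=10, verbose=0)

# Baseline: one independent model per target, no weight sharing.
def single_head_model():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(8,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

sep1, sep2 = single_head_model(), single_head_model()
sep1.fit(X, y1, epochs=10, verbose=0)
sep2.fit(X, y2, epochs=10, verbose=0)
```

My question is basically whether the gradients from both heads flowing through the shared trunk can land the optimizer in a better minimum than training the two separate models would.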
pyepyepie t1_iyx0k1s wrote
Reply to comment by maxToTheJ in [D] NeurIPS 2022 Outstanding Paper modified results significantly in the camera ready by Even_Stay3387
True. Who am I to say what is good and what's not, but I tend to enjoy simple papers with good ideas much more than papers that contain many moving parts (I am 100% unable to get that kind of result but I can enjoy it :) ).
I kind of treat complicated papers without robust code as noise, or maybe as a source of ideas, but when I try to implement them they mostly don't work as well as expected. For example, I had to implement a model for a speech-related task, a field where I have no expertise; most of the models I tried were really bad compared to a good, simple solution (inspired by ResNet), and the one model that performed better did so only because of preprocessing. It's hard to come up with new ideas, so I am happy there is so much information out there, but sometimes it's too much.
pyepyepie t1_iywowgo wrote
Reply to comment by lemlo100 in [D] NeurIPS 2022 Outstanding Paper modified results significantly in the camera ready by Even_Stay3387
Thank you, sir, for making SIGNIFICANT contributions. It takes a lot to go against your supervisor's opinions, but it seems like you did the moral thing.
pyepyepie t1_iywmmd9 wrote
Reply to comment by Comfortable_Use_5033 in [D] NeurIPS 2022 Outstanding Paper modified results significantly in the camera ready by Even_Stay3387
I agree. It's even worse when the cause of the improvement is different from the one stated in the paper (you might as well retitle some papers "really? Adam is better than SGD most of the time"), which causes a huge waste of time.
pyepyepie t1_iywkmz6 wrote
Reply to comment by lemlo100 in [D] NeurIPS 2022 Outstanding Paper modified results significantly in the camera ready by Even_Stay3387
I was a software engineer for a few years (I would probably say I am a little more skilled as a coder than as a data scientist), and I still find it difficult not to mess up experiments if I don't recheck myself. Mostly, I just assume my results are garbage and attack them until I conclude that they are actually real. It's even more important when the task is not supervised (i.e., difficult to implement: MARL, GANs...); in RL, for example, you might think you developed a nice algorithm just to find out you accidentally modified the rewards.
pyepyepie t1_j93ary5 wrote
Reply to [D] Please stop by [deleted]
OP - Honestly, I don't really see many low-quality posts here (should I sort by new?); the worst I saw today is this one. Your clickbait title and conversational topic made me spend too much time. Next time, say in the title that you are going to preach about something I don't care about, so I know not to click it. I wonder what the mods are doing, because this nonsense should stop.