Recent comments in /f/MachineLearning
tonicinhibition t1_jdn4v86 wrote
Reply to comment by Veggies-are-okay in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
There's a YouTuber named Letitia, with a little Miss Coffee Bean character, who covers new models at a decent level.
CodeEmporium does a great job at introducing aspects of the GPT/ChatGPT architecture with increasing depth. Some of the videos have code.
Andrej Karpathy walks you through building GPT in code.
As for the lesser known models, I just read the abstracts and skim the papers. It's a lot of the same stuff with slight variations.
Puzzleheaded_Acadia1 t1_jdn4sly wrote
Reply to comment by fv42622 in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
So from what I understand, it's faster than GPT, saves more VRAM, and it can run on a GPU. What else did I miss?
[deleted] t1_jdn4kvn wrote
Reply to [N] GPT-4 has 1 trillion parameters by mrx-ai
[removed]
Deep-Station-1746 t1_jdn3vxg wrote
Reply to [N] GPT-4 has 1 trillion parameters by mrx-ai
If you say so.
philipgutjahr t1_jdn2p2u wrote
Reply to comment by michaelthwan_ai in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
from https://www.reddit.com/r/MachineLearning/comments/11uk8ti/d_totally_open_alternatives_to_chatgpt/
OpenChatKit (based on GPT-NeoX-20B) https://www.together.xyz/blog/openchatkit
Instruct-GPT https://carper.ai/instruct-gpt-announcement/
gamerx88 t1_jdn1dd3 wrote
Reply to comment by currentscurrents in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
> In the long run I expect this will flip; computers will get very fast and data will be the limiting factor.
I agree, but I think data is already a limiting factor today, with the largest publicly known models at 175B. The data used to train these models supposedly already covers a majority of the open internet.
yaru22 t1_jdn17j5 wrote
Reply to [D] Simple Questions Thread by AutoModerator
Hello,
GPT-4 has a context length of 32K tokens, while some others have 2-4K tokens. What decides the limit on these context lengths? Is it simply the bigger the model, the larger the context length? Or is it possible to have a large context length even on a smaller model like LLaMA 7/13/30B?
Thank you!
londons_explorer t1_jdn0t7k wrote
Paper after paper has shown that bigger models outperform smaller models.
Sure, you can use tricks to make a small model work better. But apply those same tricks to a big model, and it works even better.
currentscurrents t1_jdn0opn wrote
Reply to [N] GPT-4 has 1 trillion parameters by mrx-ai
The Nvidia H100 marketing material does advertise a configuration for linking 256 of them to train trillion-parameter language models:
>With NVIDIA NVLink® Switch System, up to 256 H100 GPUs can be connected to accelerate exascale workloads. The GPU also includes a dedicated Transformer Engine to solve trillion-parameter language models.
Doesn't necessarily mean GPT-4 is that big, but it's possible. Microsoft and Nvidia were working closely to build the new Azure GPU cloud.
currentscurrents t1_jdmzphs wrote
Reply to comment by gamerx88 in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
That's true, but only for the given compute budget used in training.
Right now we're really limited by compute power, while training data is cheap. Chinchilla and LLaMA are intentionally trading more data for less compute. Larger models still perform better than smaller ones given the same amount of data.
In the long run I expect this will flip; computers will get very fast and data will be the limiting factor.
currentscurrents t1_jdmyjrb wrote
Reply to comment by Crystal-Ammunition in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Bigger models are more sample efficient, so it should need less data.
But - didn't the Chinchilla paper say bigger models need more data? Yes, but that's only true because right now compute is the limiting factor. They're intentionally trading off more data for less model size.
As computers get faster and models bigger, data will increasingly become the limiting factor, and people will trade off in the opposite direction instead.
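The trade-off described above can be made concrete with a rough sketch. Assuming the common approximation that training compute is C ≈ 6·N·D FLOPs (N parameters, D tokens) and the Chinchilla-style rule of thumb of roughly 20 tokens per parameter at the compute-optimal point, you can back out how a fixed compute budget splits between model size and data. The constants here are assumptions, not exact values from any one paper:

```python
# Rough sketch of the compute-optimal trade-off discussed above.
# Assumed approximations: training compute C ~= 6 * N * D FLOPs,
# and a compute-optimal ratio of ~20 tokens per parameter (Chinchilla-style).

def training_flops(n_params, n_tokens):
    """Approximate training compute: C ~= 6 * N * D."""
    return 6 * n_params * n_tokens

def compute_optimal(compute_budget, tokens_per_param=20):
    """Split a FLOP budget between params and tokens, assuming D = k * N."""
    # C = 6 * N * (k * N)  =>  N = sqrt(C / (6 * k))
    n_params = (compute_budget / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a roughly Chinchilla-scale budget (~5.76e23 FLOPs)
n, d = compute_optimal(5.76e23)
print(f"params ~ {n / 1e9:.0f}B, tokens ~ {d / 1e12:.1f}T")
```

Under these assumptions a ~5.76e23 FLOP budget lands near a 70B-parameter model trained on ~1.4T tokens; training a smaller model on the same data would use less compute but, per the comment above, perform worse.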
Art10001 t1_jdmyazo wrote
Reply to comment by sweatierorc in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
Quantum computing solves new types of problems, and solving them, or the findings that come from them, improves our lives.
jomobro117 t1_jdmx6h7 wrote
Reply to comment by nicku_a in [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up by nicku_a
Interesting! Is there a discord or slack channel you hang out on for development?
Fit-Recognition9795 t1_jdmwd4g wrote
Reply to [N] GPT-4 has 1 trillion parameters by mrx-ai
It is not
DigThatData t1_jdmvquq wrote
Reply to comment by Puzzleheaded_Acadia1 in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
the fact that it's comparable at all is pretty wild and exciting
DigThatData t1_jdmvjyb wrote
Reply to comment by michaelthwan_ai in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
dolly is important precisely because the foundation model is old. they were able to get chatgpt-level performance out of it, and they only trained it for three hours. just because the base model is old doesn't mean this isn't recent research. it demonstrates:
- the efficacy of instruct finetuning
- that instruct finetuning doesn't require the world's biggest, most modern model, or even all that much data
dolly isn't research from a year ago, it was only just described for the first time a few days ago.
EDIT: ok I just noticed you have an ERNIE model up there so this "no old foundation models" thing is just inconsistent.
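The instruct-finetuning recipe referenced in the comment above mostly comes down to data formatting: turning (instruction, response) pairs into training strings for an existing base model. A minimal sketch of that step, loosely in the Alpaca/Dolly style (the field names and prompt template here are illustrative assumptions, not Dolly's actual code):

```python
# Hypothetical sketch of instruct-finetuning data prep. The
# "instruction"/"response" fields and the prompt template are assumptions
# in the style of Alpaca/Dolly-like datasets, not any project's real code.

def format_example(example):
    """Turn an (instruction, response) pair into one training string."""
    return (
        "Below is an instruction. Write a response that completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['response']}"
    )

examples = [
    {"instruction": "Summarize: LLMs are large neural networks trained on text.",
     "response": "LLMs are big text-trained neural networks."},
]
training_texts = [format_example(e) for e in examples]
print(training_texts[0])
```

The formatted strings would then be fed to an ordinary causal language-modeling finetune of the base model, which is why the approach needs comparatively little data and compute.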
massimosclaw2 t1_jdmvjlp wrote
Reply to comment by learn-deeply in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
When you haven’t done much, best to obscure it in some complicated language /s
zbyte64 t1_jdmvaak wrote
Reply to comment by Blacky372 in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Sounds like we're realizing that a model is only as good as the experts that wrote the training data.
DigThatData t1_jdmv87n wrote
don't forget Dolly, the databricks model that was successfully instruct-finetuned on gpt-j-6b in 3 hours
Cherubin0 t1_jdmt5el wrote
I think having the particular knowledge inside the model is a bad approach. I think it would make much more sense for the model to know how to search and reason about the found data.
Fit-Recognition9795 t1_jdmsxa9 wrote
Reply to [D] What happens if you give as input to bard or GPT4 an ASCII version of a screenshot of a video game and ask it from what game it has been taken or to describe the next likely action or the input? by Periplokos
As an AI language model, both GPT-4 and its predecessors like me, ChatGPT, are designed to process and generate text, not to analyze images or visual data. Giving an ASCII representation of a screenshot to GPT-4 or any text-based language model would likely result in a poor understanding of the actual image, as the model doesn't have the capability to process images in the same way that a human or a dedicated image recognition AI can.
However, if the ASCII representation is clear enough and contains easily recognizable elements that are unique to a particular video game, there is a chance that GPT-4 might be able to make an educated guess about the game in question, but the accuracy would be significantly lower compared to proper image recognition AI.
Regarding the prediction of the next likely action or input, GPT-4 might be able to provide some generic suggestions based on the text description, but again, its ability to understand the actual visual information would be limited.
For analyzing images and making predictions about visual content, you would be better off using a dedicated image recognition AI model, such as OpenAI's CLIP or an AI model specifically trained for video game analysis.
Anis_Mekacher OP t1_jdmshzq wrote
Reply to comment by These-Assignment-936 in [D] Keeping track of ML advancements by Anis_Mekacher
It's an inspiring idea. I'll try to found such a club at my company or university.
Thank you for your contribution!!
wrossmorrow t1_jdmsbvf wrote
Reply to comment by shanereid1 in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Probably related https://arxiv.org/abs/2106.09685
dreamingleo12 t1_jdn511a wrote
Reply to comment by Daveboi7 in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
By platform you mean?