Recent comments in /f/MachineLearning
tonicinhibition t1_jdn4v86 wrote
Reply to comment by Veggies-are-okay in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
There's a YouTuber named Letitia, with a little Miss Coffee Bean character, who covers new models at a decent level.
CodeEmporium does a great job at introducing aspects of the GPT/ChatGPT architecture with increasing depth. Some of the videos have code.
Andrej Karpathy walks you through building GPT in code.
As for the lesser known models, I just read the abstracts and skim the papers. It's a lot of the same stuff with slight variations.
Puzzleheaded_Acadia1 t1_jdn4sly wrote
Reply to comment by fv42622 in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
So from what I understand, it's faster than GPT, saves more VRAM, and it can run on a GPU. What else did I miss?
[deleted] t1_jdn4kvn wrote
Reply to [N] GPT-4 has 1 trillion parameters by mrx-ai
[removed]
Deep-Station-1746 t1_jdn3vxg wrote
Reply to [N] GPT-4 has 1 trillion parameters by mrx-ai
If you say so.
philipgutjahr t1_jdn2p2u wrote
Reply to comment by michaelthwan_ai in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
from https://www.reddit.com/r/MachineLearning/comments/11uk8ti/d_totally_open_alternatives_to_chatgpt/
OpenChatKit (based on GPT-NeoX-20B) https://www.together.xyz/blog/openchatkit
Instruct-GPT https://carper.ai/instruct-gpt-announcement/
gamerx88 t1_jdn1dd3 wrote
Reply to comment by currentscurrents in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
> In the long run I expect this will flip; computers will get very fast and data will be the limiting factor.
I agree, but I think data is already a limiting factor today, with the largest publicly known models at 175B. The data used to train these models supposedly already covers a majority of the open internet.
yaru22 t1_jdn17j5 wrote
Reply to [D] Simple Questions Thread by AutoModerator
Hello,
GPT-4 has a context length of 32K tokens, while some others have 2-4K tokens. What decides the limit on these context lengths? Is it simply the bigger the model, the larger the context length? Or is it possible to have a large context length even on a smaller model like LLaMA 7/13/30B?
Thank you!
londons_explorer t1_jdn0t7k wrote
Paper after paper has shown that bigger models outperform smaller models.
Sure, you can use tricks to make a small model work better. But apply those same tricks to a big model, and it works even better.
currentscurrents t1_jdn0opn wrote
Reply to [N] GPT-4 has 1 trillion parameters by mrx-ai
The Nvidia H100 marketing material does advertise a configuration for linking 256 of them to train trillion-parameter language models:
>With NVIDIA NVLink® Switch System, up to 256 H100 GPUs can be connected to accelerate exascale workloads. The GPU also includes a dedicated Transformer Engine to solve trillion-parameter language models.
Doesn't necessarily mean GPT-4 is that big, but it's possible. Microsoft and Nvidia were working closely to build the new Azure GPU cloud.
currentscurrents t1_jdmzphs wrote
Reply to comment by gamerx88 in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
That's true, but only for the given compute budget used in training.
Right now we're really limited by compute power, while training data is cheap. Chinchilla and LLaMA are intentionally trading more data for less compute. Larger models still perform better than smaller ones given the same amount of data.
In the long run I expect this will flip; computers will get very fast and data will be the limiting factor.
currentscurrents t1_jdmyjrb wrote
Reply to comment by Crystal-Ammunition in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Bigger models are more sample efficient, so it should need less data.
But - didn't the Chinchilla paper say bigger models need more data? Yes, but that's only true because right now compute is the limiting factor. They're intentionally trading off more data for less model size.
As computers get faster and models bigger, data will increasingly become the limiting factor, and people will trade off in the opposite direction instead.
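The trade-off described above can be made concrete with a rough sketch. Assuming the common approximation that training compute is C ≈ 6·N·D FLOPs (N parameters, D tokens) and the Chinchilla-style rule of thumb of roughly 20 tokens per parameter at the compute-optimal point, you can back out how a fixed compute budget splits between model size and data. The constants here are assumptions, not exact values from any one paper:

```python
# Rough sketch of the compute-optimal trade-off discussed above.
# Assumed approximations: training compute C ~= 6 * N * D FLOPs,
# and a compute-optimal ratio of ~20 tokens per parameter (Chinchilla-style).

def training_flops(n_params, n_tokens):
    """Approximate training compute: C ~= 6 * N * D."""
    return 6 * n_params * n_tokens

def compute_optimal(compute_budget, tokens_per_param=20):
    """Split a FLOP budget between params and tokens, assuming D = k * N."""
    # C = 6 * N * (k * N)  =>  N = sqrt(C / (6 * k))
    n_params = (compute_budget / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a roughly Chinchilla-scale budget (~5.76e23 FLOPs)
n, d = compute_optimal(5.76e23)
print(f"params ~ {n / 1e9:.0f}B, tokens ~ {d / 1e12:.1f}T")
```

Under these assumptions a ~5.76e23 FLOP budget lands near a 70B-parameter model trained on ~1.4T tokens; training a smaller model on the same data would use less compute but, per the comment above, perform worse.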
Art10001 t1_jdmyazo wrote
Reply to comment by sweatierorc in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
Quantum computing solves new types of problems, and solving them, or the findings that come from them, improves our lives.
jomobro117 t1_jdmx6h7 wrote
Reply to comment by nicku_a in [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up by nicku_a
Interesting! Is there a discord or slack channel you hang out on for development?
Fit-Recognition9795 t1_jdmwd4g wrote
Reply to [N] GPT-4 has 1 trillion parameters by mrx-ai
It is not
DigThatData t1_jdmvquq wrote
Reply to comment by Puzzleheaded_Acadia1 in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
the fact that it's comparable at all is pretty wild and exciting
DigThatData t1_jdmvjyb wrote
Reply to comment by michaelthwan_ai in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
dolly is important precisely because the foundation model is old. they were able to get chatgpt-level performance out of it, and they only trained it for three hours. just because the base model is old doesn't mean this isn't recent research. it demonstrates:
- the efficacy of instruct finetuning
- that instruct finetuning doesn't require the world's biggest, most modern model, or even all that much data
dolly isn't research from a year ago, it was only just described for the first time a few days ago.
EDIT: ok I just noticed you have an ERNIE model up there so this "no old foundation models" thing is just inconsistent.
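The instruct-finetuning recipe referenced in the comment above mostly comes down to data formatting: turning (instruction, response) pairs into training strings for an existing base model. A minimal sketch of that step, loosely in the Alpaca/Dolly style (the field names and prompt template here are illustrative assumptions, not Dolly's actual code):

```python
# Hypothetical sketch of instruct-finetuning data prep. The
# "instruction"/"response" fields and the prompt template are assumptions
# in the style of Alpaca/Dolly-like datasets, not any project's real code.

def format_example(example):
    """Turn an (instruction, response) pair into one training string."""
    return (
        "Below is an instruction. Write a response that completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['response']}"
    )

examples = [
    {"instruction": "Summarize: LLMs are large neural networks trained on text.",
     "response": "LLMs are big text-trained neural networks."},
]
training_texts = [format_example(e) for e in examples]
print(training_texts[0])
```

The formatted strings would then be fed to an ordinary causal language-modeling finetune of the base model, which is why the approach needs comparatively little data and compute.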
massimosclaw2 t1_jdmvjlp wrote
Reply to comment by learn-deeply in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
When you haven’t done much, best to obscure it in some complicated language /s
zbyte64 t1_jdmvaak wrote
Reply to comment by Blacky372 in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Sounds like we're realizing that a model is only as good as the experts that wrote the training data.
DigThatData t1_jdmv87n wrote
don't forget Dolly, the databricks model that was successfully instruct-finetuned on gpt-j-6b in 3 hours
Cherubin0 t1_jdmt5el wrote
I think having the particular knowledge inside the model is a bad approach. I think it would make much more sense for the model to know how to search and reason about the found data.
Fit-Recognition9795 t1_jdmsxa9 wrote
Reply to [D] What happens if you give as input to bard or GPT4 an ASCII version of a screenshot of a video game and ask it from what game it has been taken or to describe the next likely action or the input? by Periplokos
As an AI language model, both GPT-4 and its predecessors like me, ChatGPT, are designed to process and generate text, not to analyze images or visual data. Giving an ASCII representation of a screenshot to GPT-4 or any text-based language model would likely result in a poor understanding of the actual image, as the model doesn't have the capability to process images in the same way that a human or a dedicated image recognition AI can.
However, if the ASCII representation is clear enough and contains easily recognizable elements that are unique to a particular video game, there is a chance that GPT-4 might be able to make an educated guess about the game in question, but the accuracy would be significantly lower compared to proper image recognition AI.
Regarding the prediction of the next likely action or input, GPT-4 might be able to provide some generic suggestions based on the text description, but again, its ability to understand the actual visual information would be limited.
For analyzing images and making predictions about visual content, you would be better off using a dedicated image recognition AI model, such as OpenAI's CLIP or an AI model specifically trained for video game analysis.
Anis_Mekacher OP t1_jdmshzq wrote
Reply to comment by These-Assignment-936 in [D] Keeping track of ML advancements by Anis_Mekacher
It's an inspiring idea. I'll try to found such a club at my company or university.
Thank you for your contribution!!
wrossmorrow t1_jdmsbvf wrote
Reply to comment by shanereid1 in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Probably related https://arxiv.org/abs/2106.09685
dreamingleo12 t1_jdn511a wrote
Reply to comment by Daveboi7 in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
By platform you mean?