Submitted by Not-Banksy t3_126a1dm in singularity
ActuatorMaterial2846 t1_je8e3lg wrote
So what happens is they compile a dataset. Basically a big dump of data. For large language models, that is mostly text, books, websites, social media comments. Essentially as many written words as possible.
The training is done through whats called a neural network using something called a transformer architecture. Which is a bunch of GPUs (graphics processing units) linked together. What happens in the nueral network whilst training is a bit of a mystery, 'black box' is often a term used as the computational calculations are extremely complex. So not even the researchers understand what happens here exactly.
Once the training is complete, it's compiled into a program, often referred to as a model. These programs can then be refined and tweaked to operate a particular way for public release.
This is a very very simple explanation and I'm sure there's an expert who can explain it better, but in a nutshell that's what happens.
Not-Banksy OP t1_je8ez6o wrote
Thanks for the explanation, much appreciated!
ActuatorMaterial2846 t1_je8fqgw wrote
No worries. I'll also point out the magic behind all this is particularly the transformer architecture. This is he real engine behind LLMs and other models.
FlyingCockAndBalls t1_je8h4g3 wrote
what is so special about the transformer architecture?
ActuatorMaterial2846 t1_je8ik1t wrote
It's actually quite technical, but essentially, the transformer architecture helps each part of the sentence “talk” to all the other parts at the same time. This way, each part can understand what the whole sentence is about and what it means.
Here is the paper that imo changed the world 6 years ago and is the reason for the current state of AI.
https://arxiv.org/abs/1706.03762
If it goes over your head (it did for me), ask bing or chatgpt to summarise it for you. It helped me get my head around this stuff, as I'm in no way an expert nor do I study this field.
turnip_burrito t1_je8i45w wrote
"Attention mechanism" makes it good at predicting new words from past ones.
The paper that introduced the attention mechanism is called Attention its All You Need.
Zermelane t1_je8lss0 wrote
Better parallelism in training, and a more direct way to reference past information, than in RNNs (recurrent neural networks) which seemed like the "obvious" way to process text before transformers came by.
These days we have RNN architectures that can achieve transformer-like training parallelism, the most interesting-looking one being RWKV. They are still badly disadvantaged when needing information directly from the past, for instance to repeat a name that's been mentioned before, but they have other advantages, and their performance gets close enough to transformers that it could be just a question of scaling exponents which architecture ends up winning out.
abudabu t1_je9ixnd wrote
The GPUs aren’t actually connected together physically. The transformer architecture is entirely in software. The software uses GPUs to do matrix calculations efficiently.
Specifically, the transformer architecture is a bunch of large matrices connected together with arithmetic operations. The training process shows it a sequence of words and sees if it correctly predicts the next word. It figures out how “wrong” the prediction is and updates the matrices so that the prediction will be slightly more right next time. This is a very high level description of “back propagation”.
Using text to automatically train the network is called self-supervised learning. It’s great because no human input is required, just lots of text.
There are many other forms of training. ChatGPT works because it was also trained using human reinforcement feedback learning (HRFL), where humans rank a set of answers. Basically the same underlying process as above, but the answers generated by the network are used to train the network, and the ranking is used to prefer the better answers. Probably when we’re giving up and down votes, OpenAI is using that for HRFL.
Another approach is to use humans to create examples. OpenAI hired people in Africa to have conversations where one played the role of the chatbot. This kind of training helped the network understand chat style interactions.
Since it’s a next word predictor, the chat data has special tokens in the text which represent “user” and “chatbot” roles. So maybe that helps you imagine it better as a very fancy autocomplete.
scooby1st t1_je91quj wrote
>What happens in the neural network whilst training is a bit of a mystery,
Are you referring to something unique to ChatGPT/LLM? What happens during the training of neural networks is not a blackbox. Little bit of chain rule calculus for fitting to a reduced error. Understanding the final network outside of anything but performance metrics is
PM_ME_A_STEAM_GIFT t1_jea840d wrote
That's an important clarification. We understand 100% of every individual building block that goes into designing and training a network. What we do not fully understand is how putting billions of those small elements together results in what looks like some form of intelligence.
jetro30087 t1_je8l2a4 wrote
Close, the data is structured to provide a relationship between prompts and the appropriate responses. LLaMa mostly uses publicly curated datasets by AI researchers that focus on a variety of topics. Here's an excerpt from a finetuned Alpaca models training data:
[
{
    "instruction": "Give three tips for staying healthy.",
    "input": "",
    "output": "1. Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule."
},
{
    "instruction": "What are the three primary colors?",
    "input": "",
    "output": "The three primary colors are red, blue, and yellow."
},
{
    "instruction": "Describe the structure of an atom.",
    "input": "",
    "output": "An atom is made up of a nucleus, which contains protons and neutrons, surrounded by electrons that travel in orbits around the nucleus. The protons and neutrons have a positive charge, while the electrons have a negative charge, resulting in an overall neutral atom. The number of each particle determines the atomic number and the type of atom."
},
{
    "instruction": "How can we reduce air pollution?",
    "input": "",
    "output": "There are a number of ways to reduce air pollution, such as shifting to renewable energy sources, encouraging the use of public transportation, prohibiting the burning of fossil fuels, implementing policies to reduce emissions from industrial sources, and implementing vehicle emissions standards. Additionally, individuals can do their part to reduce air pollution by reducing car use, avoiding burning materials such as wood, and changing to energy efficient appliances."
},
There are 50K+ examples in the file that the model can use to build relationships between questions and appropriate responses during the training process. If enough relationships are built, the model should be able to infer the correct answer to unique questions.
Kafke t1_je8u4f1 wrote
"instruction": "What are the three primary colors?",
"input": "",
"output": "The three primary colors are red, blue, and yellow."
No wonder they give false info. garbage in, garbage out lol.
scooby1st t1_je92wel wrote
>The shadows are whispering again, whispering secrets that only I can hear. No, no, no! It's all wrong! It's a tangled web of deception, a spiral staircase of lies! They want us to believe that there are only three primary colors—red, blue, and yellow. A trifecta of trickery!
>
> But I see more, I see beyond the curtain. I see colors that don't have names, colors that dance in the dark, colors that hide in the corners of the mind. They think they can pull the wool over our eyes, but I know the truth! There are 19 primary colors, 19 keys to the universe!
>
>I've seen them all, swirling and twisting in the cosmic dance of existence. But they won't listen, they won't believe. They call me mad, but I'm the only one who sees the world as it truly is. The three primary colors are just the beginning, just the tip of the iceberg, just the first step on the journey to enlightenment.
>
>So I laugh, I laugh at their ignorance, I laugh at their blindness. And the shadows laugh with me, echoing my laughter through the halls of infinity.
Kafke t1_je93asd wrote
Yellow isn't a primary color. The primary colors are red, green, and blue.
scooby1st t1_je93qa4 wrote
TruckNuts_But4YrBody t1_je994ja wrote
There are primary colors of physical pigment then there are primary colors of light.
When people learn the primary colors in school it's almost always in art class when mixing paint.
So kinda confidentlyincorrect but not entirely
Kafke t1_je99yqw wrote
There's additive color and subtractive color. The set of red, blue, yellow, is primary for neither. Additive primaries are red, blue, green. Subtractive primaries are cyan, yellow, magenta. If you're mixing paints you're working with subtractive color and thus the primary colors are cyan, yellow, and magenta. not red, blue, and yellow.
The info is incorrect no matter the context.
TruckNuts_But4YrBody t1_je9a9rn wrote
I don't care enough about colors to keep going but it's been red blue and yellow since the 1600s
Kafke t1_je9ao1v wrote
Well no. That's been incorrect since the beginning of time. This is a factual scientific topic. There is a correct answer and incorrect answer. It's not up to preference or opinion. Printers use cyan, magenta, and yellow, because those are the subtractive primary colors. If you used red, blue, and yellow, you can't actually produce the rest of the colors with those. Since red and blue aren't primary for subtractive color, but rather iirc secondary. People being wrong for a long time doesn't mean they're right.
TruckNuts_But4YrBody t1_je9b405 wrote
You can't produce all colors with any set of three primaries..
Kafke t1_je9bdwb wrote
That's literally what primary colors are. How do you think screens and printers are able to produce every color despite only working with 3 of them? Because that's literally what primary colors are.
TruckNuts_But4YrBody t1_je9d4kd wrote
EVERY color? No sorry
Kafke t1_je9drq3 wrote
Yes. You do realize our eyes only have three kinds of cones right? Rgb are the primary colors lol. Cmy if you're looking at subtractive colors. Using these three colors, you can create every other color. Rgb for light/additive, Cmy for ink/paint/subtractive.
Rby is not primary in any sense of the word.
TruckNuts_But4YrBody t1_je9e2ab wrote
Not every other color, no.
scooby1st t1_jeaehdn wrote
Kafke t1_jebepd5 wrote
Yeah that's just incorrect. Additive primaries are RGB. Subtractive primaries are CMY. You're free to deny the facts all you'd like, but this is just an objective scientific thing.
scooby1st t1_jebf65c wrote
>https://www.youtube.com/watch?v=yu44JRTIxSQ
Stop denying science
Kafke t1_jedkbke wrote
Some childrens tv shows or media programs stating incorrect information does not make it correct. Additive primaries are RGB, subtractive primaries are CMY. The idea that RBY are primary colors is a popular misconception, but is incorrect. It has it's roots in art classes prior to proper scientific investigation of color, light, and modern technology. If your goal is art history, then yes, people in the past incorrectly believed that the primary colors (both additive and subtractive) were RBY. They were wrong. Just as people believed the earth was flat, yet were wrong.
scooby1st t1_jedm36o wrote
Have you considered the following?
https://www.youtube.com/watch?v=yu44JRTIxSQ
Jeffy29 t1_je9cuhr wrote
AI will become progressively better at refining datasets, even GPT-4 is quite good at it. From my understanding right now they use low-paid workers, often from 3rd world countries to go over data but that's not particularly efficient method and there just isn't any way to go through all the data with enough care, so there is lot of garbage in those datasets. But AI could do it, it would still require some human supervision but it would speed up the process by a lot and I expect datasets to get dramatically better over the next 5 years.
qepdibpbfessttrud t1_jecizg0 wrote
Misconceptions are part of total human knowledge, though. Both specific misconceptions and the category as a whole. GPT gives good answer if asked about it
It's important to remember when and why we were wrong
ActuatorMaterial2846 t1_je8luak wrote
Interesting, curious what size this particular Llama model is, or is that not even relevant?
jetro30087 t1_je8mtjp wrote
This is a updated dataset for the 7b model, but you could train the others with the data. From anecdotal reports, the dataset seems to have a great impact on the model's performance than the parameter size up to a point. Less parameters means a faster model. More parameters mean the model can make longer responses.
Scarlet_pot2 t1_je92iud wrote
Most of this is precise and correct, but it seems like you say a transformer architecture is the GPUs? The transformer architecture is the neural network and how it is structured. It's code. The paper "attention is all you need" describes how the transformer arch. is made
After you have the transformer written out, you train it on GPUs using data you gathered. Free large datasets such as "the pile" by eluther.ai can be used to train on. This part is automatic.
the Human involved part is the data gathering, data cleaning, designing the architecture before the training. then after humans do finetuning / RLHF (reinforcement learning though human feedback).
those are the 6 steps. Making an AI model can seem hard and like magic, but it can be broken down into manageable steps. its doable, especially if you have a group of people who specialize in the different steps. maybe you have someone who's good with the data aspects, someone good at writing the architecture, some good with finetuning, and some people to do RLHF.
DarkCeldori t1_je971hk wrote
Andrej Karpathy has a nice video series explaining such https://youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ
Viewing a single comment thread. View all comments