Raywuo t1_jeal232 wrote on March 30, 2023 at 4:52 PM

Reply to comment by learn-deeply in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed

Exactly, they could never appeal without contradicting themselves.

SatoshiNotMe t1_jeakml0 wrote on March 30, 2023 at 4:49 PM

Reply to comment by matterhayes in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry

thanks!

A_Light_Spark t1_jeaim48 wrote on March 30, 2023 at 4:36 PM

Reply to comment by saintshing in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama

The real VIP is in the comments again. TIL about RWKV!
Now I just need to read up on it and see if it can do sequence classification...

ZestyData t1_jeagwa8 wrote on March 30, 2023 at 4:25 PM

Reply to comment by ChuckSeven in [D] Can large language models be applied to language translation? by matthkamis

Thank you for repeating half of what I said back to me. Much like ChatGPT, you catch on quickly to new information.

So, let's be clear here then. Contrary to your incorrect first comment, Google Translate is an LLM, it is autoregressive, and it is pretrained, at least by the definition of pre-training given in the GPT paper, which was the parallel I first used in my own comment for OP, who was coming into this thread with knowledge of the latest GPT-3+ and ChatGPT products.

>It's funny how you mention unrelated stuff, like RLHF

I did so because I had naively assumed you were also a newcomer to the field who knew nothing outside of ChatGPT, given how severely wrong your first comment was. I'll grant you that it wasn't related, except to extend an olive branch and a reasonable exit plan if that were the case for you. Alas.

>LLMs tend to be >>1B parameter models

Again, no. ELMo was 94 million, GPT was 120 million, GPT-2 was 1.5 billion. BERT has ~300 million parameters. These are all Large Language Models and have been called so for years. There is no hard definition of what constitutes "large". 2018's large is nearly today's consumer-hardware level. Google Translate (and Google's search) are among the most widely used LLMs actually out there.

Man. Why do you keep talking about things that you don't understand, even when corrected?

>Lastly, modelling p(y|x) is significantly easier and thus less general than modelling p(x).

Sure! It is easier! But that's not what you said. You initially brought up P(Y|X) as justification that translation isn't pre-trained. Those are two unrelated concepts. The ultimate modelling goal is P(Y|X), but both GPT (Generative Pre-training) and Google Translate pretrain their ability to predict P(X|context) in the decoder, just like any hot new LLM of today, hence my correction. The application towards the ultimate P(Y|X) is not connected to the pretraining of their decoders.
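
For readers following the notation, the two objectives in this exchange can be written as standard autoregressive factorizations. This is a sketch in generic notation, not taken from either paper: $x$ is the input text, $y$ the translation, and $T$, $T'$ are assumed sequence lengths.

```latex
p(x) = \prod_{t=1}^{T} p(x_t \mid x_{<t}),
\qquad
p(y \mid x) = \prod_{t=1}^{T'} p(y_t \mid y_{<t},\, x)
```

In both cases the decoder is trained to predict the next token given everything before it; the translation case adds the source sentence to the conditioning context.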

ghostfaceschiller t1_jeagl0y wrote on March 30, 2023 at 4:23 PM

Reply to comment by Nhabls in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-

What?

HerculeanSubmarine t1_jeaeqow wrote on March 30, 2023 at 4:11 PM

Reply to comment by EvilMegaDroid in [D] FOMO on the rapid pace of LLMs by 00001746

Alpaca LoRA cost pretty much nothing to get the dataset from GPT-3

GPT4All was fine-tuned using a 430k dataset that cost $100 in OpenAI API fees

Crow-Scare t1_jeael4m wrote on March 30, 2023 at 4:10 PM

Reply to [D] Directed Graph-based Machine Learning Pipeline tool? by Driiper

Airflow?

[deleted] t1_jeae8re wrote on March 30, 2023 at 4:08 PM

Reply to comment by Raywuo in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed

[deleted]

Raywuo t1_jeadybx wrote on March 30, 2023 at 4:06 PM

Reply to [D] Instruct Datasets for Commercial Use by JohnyWalkerRed

Well, data generated by GPT cannot be used commercially to train a new AI, but what about data generated by an AI that was itself trained on GPT data? (2 levels of abstraction) haha

Barton5877 t1_jeadc3o wrote on March 30, 2023 at 4:02 PM

Reply to comment by pengo in [R] The Debate Over Understanding in AI’s Large Language Models by currentscurrents

On 2:

Competence is used sociologically to describe the ability to perform, such as to speak or act, in a manner demonstrating some level of mastery, but it isn't necessarily a sign of understanding.

I'd be loath to have to design a metric or assessment by which to "measure" understanding. One can measure or rate competence; the degree to which the person "understands" what they are doing, why, how, for what purpose and so on is another matter.

In linguistics, there's also a distinction between practical and discursive reason that can be applied here: ability to reason vs ability to describe the reasoning. Again, understanding escapes measurement, insofar as what we do and how we know what we are doing isn't the same as describing it (which requires both reflection on our actions and translation into speech that communicates them accurately).

The long and short of it being that "understanding" is never going to be the right term for us to use.

That said, there should be terminology for describing the conceptual connectedness that LLMs display. Some of this is in the models and design. Some of it is in our projection and psychological interpretation of their communication and actions.

I don't know to what degree LLMs have "latent" conceptual connectedness, or whether this is presented only in the response to prompts.

Swolnerman t1_jead4wo wrote on March 30, 2023 at 4:01 PM

Reply to comment by drizel in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama

How can it do that with a context window of 32k?

On top of that, I don’t think GPT-4 can make informed decisions when picking between academic research papers as of yet

pier4r t1_jead39m wrote on March 30, 2023 at 4:01 PM

Reply to [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama

As a semi-layman, while I was amazed by the progress in ML, I was skeptical of ever-growing models needing more and more parameters to do well. It felt like "more parameters can improve things, and the other factors follow".

I asked myself whether there was any effort to be more efficient by shrinking things. Recently I read about LLaMA and realized that this direction is now being pursued as well.

matterhayes t1_jeacmx0 wrote on March 30, 2023 at 3:58 PM

Reply to comment by SatoshiNotMe in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry

It’s been released here: https://huggingface.co/databricks/dolly-v1-6b

matterhayes t1_jeackdz wrote on March 30, 2023 at 3:57 PM

Reply to comment by Daveboi7 in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry

It’s been released here: https://huggingface.co/databricks/dolly-v1-6b

ChuckSeven t1_jeab590 wrote on March 30, 2023 at 3:48 PM

Reply to comment by ZestyData in [D] Can large language models be applied to language translation? by matthkamis

It's funny how you mention unrelated stuff, like RLHF, which has nothing to do with the point of discussion. A bit like an LLM I reckon.

See, Google Translate models are (as far as publicly known) trained on a parallel corpus. This is supervised data since it provides the same text in different languages. The model is trained to model, e.g., p(y=German|x=English). There is much less supervised data available, which means that the models you train will be significantly smaller. Note that translation models are usually only auto-regressive in the decoding part. The encoder part, which usually makes up about 50% of the parameters, is not auto-regressive.

LLMs tend to be >>1B parameter models trained on billions or trillions of tokens. The vast amount of data is believed to be necessary to train such large models. The models are modelling p(x), which in some cases is monolingual or virtually so. An LLM that is trained on a vast but English-only corpus will not be capable of translating at all. LLMs trained on a multilingual corpus can be prompted to translate, but they are far inferior to actual translation models.

Lastly, modelling p(y|x) is significantly easier and thus less general than modelling p(x).

sEi_ t1_jeaaxcy wrote on March 30, 2023 at 3:47 PM

Reply to comment by cc-test in [D] What do you think about all this hype for ChatGPT? by Dear-Vehicle-3215

Thank you for your response.

Yes, it's hard to predict, especially about the future.

cc-test t1_jea9ejf wrote on March 30, 2023 at 3:37 PM

Reply to comment by Smallpaul in [D] What do you think about all this hype for ChatGPT? by Dear-Vehicle-3215

>Then you are asking them to waste time.

Having inexperienced staff gain more knowledge about languages and tooling in the context of the codebases they work in isn't a waste of time.

Sure, for example, I'm not going to explain every function in each library or package that we use, and will point juniors towards the documentation. Equally, I'm not going to say "hey ask ChatGPT instead of just looking at the docs", mainly because ChatGPT's knowledge is out of date and the junior would likely be getting outdated information.

>The first time, I wasted 30 minutes trying to interpret an extremely obscure error message, then asked my colleague, then kicked myself because I had run into the same problem six months ago.

So you weren't learning a new language or codebase, you were working with something you already knew. I don't care if anyone, regardless of seniority, uses GPT or any other LLM or any type of model for that matter to solve problems with. You were able to filter through the incorrect outputs or less than ideal outputs and arrive at the solution that suited the problem best.

How are you supposed to do that when you have no foundation to work with?

I do care about people new to a subject matter using it to learn because of the false positives the likes of ChatGPT can spew out.

Telling a junior to use ChatGPT to learn something new is just lazy mentoring and I'd take that as a red flag for any other senior or lead I found doing that.

caffeine_potent t1_jea8km0 wrote on March 30, 2023 at 3:31 PM

Reply to [D] What do you think about all this hype for ChatGPT? by Dear-Vehicle-3215

ChatGPT is writing 40% of my code now. It reads doco faster than I do and will synthesize complicated code 80% of the way. The rest is tweaking it. The hype is real.

timo_kk t1_jea88ah wrote on March 30, 2023 at 3:29 PM

Reply to comment by arnowaczynski in [D] Alternatives to fb Hydra? by alyflex

Pyrallis quite nicely builds on top of that to support e.g. command line arguments.

I'm quite happy with it.

lgastako t1_jea7kb3 wrote on March 30, 2023 at 3:24 PM

Reply to comment by WokeAssBaller in [D] The best way to train an LLM on company data by jaxolingo

Would love to see an example of it adding knowledge effectively. I haven't been able to find any at all.

EverythingGoodWas t1_jea7gbo wrote on March 30, 2023 at 3:23 PM

Reply to comment by AlmightySnoo in [D] The best way to train an LLM on company data by jaxolingo

You wouldn’t; that would be a direct violation of that license. I would imagine they have a commercial use license as well, though.

ZestyData t1_jea73i3 wrote on March 30, 2023 at 3:21 PM

Reply to comment by ChuckSeven in [D] Can large language models be applied to language translation? by matthkamis

LLM simply means Large Language Model: a language model with a large number of parameters. The term has referred to all sorts of deep learning architectures over the past 20 years.

Google invented the Transformer architecture and, most importantly, discovered how well transformers scale in power as they scale in size. This invention kickstarted the new arms race, in which "LLM" came to refer to transformer models with large numbers of parameters.

Google Translate's current prod architecture is a (large) transformer to encode and an RNN to decode.[1] This falls into the category of LLMs, which weren't just invented when OpenAI applied RLHF and published ChatGPT at the end of 2022. GPT is similar, but uses a transformer decoder only, with no separate encoder.

The decoding RNN in Google Translate absolutely is an autoregressive model.

I re-read the original GPT paper[2] to try and get a better understanding of the actual "pre-training" term here and I genuinely can't see a difference between that and what Google write about in their papers & blogs [3]; it just defines X & Y differently but they're both predicting a token based on the context window. GPT calls it pretraining because it does an additional step after learning P(X | context). But both approaches perform this fundamental autoregressive training.
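
A minimal sketch of that point, assuming toy word-level tokens and a hypothetical helper `next_token_pairs` (neither comes from the GPT paper or from Google's papers). It builds the (context, next token) training pairs for plain language modelling and for translation, and the only difference is whether the source sentence is prepended to the context.

```python
def next_token_pairs(target, source=()):
    """Return (context, next_token) pairs for autoregressive training.

    With an empty `source`, each context holds only the preceding target
    tokens (language modelling). With a non-empty `source`, each context
    also starts with the full source sentence (translation).
    """
    pairs = []
    for i, token in enumerate(target):
        # Context = whole source (if any) + target tokens seen so far.
        context = tuple(source) + tuple(target[:i])
        pairs.append((context, token))
    return pairs


# Language modelling: predict each token from the tokens before it.
lm_pairs = next_token_pairs(["the", "cat", "sat"])

# Translation: same loop, but the context also carries the source sentence.
mt_pairs = next_token_pairs(["die", "Katze"], source=["the", "cat"])

for context, token in lm_pairs + mt_pairs:
    print(context, "->", token)
```

This says nothing about model size or data volume, which is the other half of the disagreement in this thread; it only illustrates that the per-token prediction task has the same shape in both setups.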

[1] - https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html

[2] - https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf

[3] - https://arxiv.org/pdf/1609.08144.pdf

LanchestersLaw t1_jea6q5k wrote on March 30, 2023 at 3:18 PM

Reply to [D] What do you think about all this hype for ChatGPT? by Dear-Vehicle-3215

Devil’s advocate: why shouldn't the biggest leap in progress towards AGI and the shocking rate of progress be hyped? Even if you limit the news to just publications by MS/closedAI, a lot is happening, with progress that was expected to take years happening in weeks.

[deleted] t1_jea6oxf wrote on March 30, 2023 at 3:18 PM

Reply to [D] Improvements/alternatives to U-net for medical images segmentation? by viertys

[removed]

AlmightySnoo t1_jea6dra wrote on March 30, 2023 at 3:16 PM

Reply to comment by EverythingGoodWas in [D] The best way to train an LLM on company data by jaxolingo

I'm just curious, how are you supposed to fine-tune a model on company data if the current licences (either explicitly, or implicitly through the licence of the training data) on model weights prohibit commercial use?

Recent comments in /f/MachineLearning