Recent comments in /f/MachineLearning

Smallpaul t1_jea4whk wrote

>You get a zero cost tutor that may or may not be correct about something objective, and as a student you are supposed to trust that?

No. I did not say to trust that.

Also: if you think that real teachers never make mistakes, you're incorrect yourself. My kids have textbooks full of errata. Even Donald Knuth issues corrections for his books (rarely).

>I also pay, well my company does, to access GPT-4 and it's still not that close to being a reliable tutor. I wouldn't tell my juniors to ask ChatGPT about issues they are having instead of asking me or another of the seniors or lead engineer.

Then you are asking them to waste time.

I am "junior" on a particular language and I wasted a bunch of time on a problem because I don't want to bug the more experience person every time I have a problem.

The situation actually happened twice in one day.

The first time, I wasted 30 minutes trying to interpret an extremely obscure error message, then asked my colleague, then kicked myself because I had run into the same problem six months ago.

Then I asked GPT-4, and it gave me six possible causes, which included the one I had seen before. Had I asked GPT-4 first, I would have saved myself 30 minutes and spared my colleague an interruption.

The second time, I asked GPT-4 directly. It gave me five possible causes, and by process of elimination I immediately knew which one it was. That saved me from trying to figure it out on my own before interrupting someone else.

You are teaching your juniors to be helpless instead of teaching them how to use tools appropriately.

> Code working is not equivocal to the code being written correctly or well. If you're the kind of engineer that just think "oh well it works at least, that's good enough" then you're the kind of engineer who will be replaced by AI tooling in the near future.

One of the ways you can use this tool is to ask it how to make the code more reliable, easier to read, etc.

If you use the tool appropriately, it can help with that too.

0

JustOneAvailableName t1_jea2dzf wrote

Software engineer perspective on attention (self quote):

> You have to think about searching. If you search, you have a query (the search term), some way to correlate the query to the actual (size unknown/indifferent) knowledge base, and the knowledge base itself. If you have to write this as a mathematical function, you need something that matches a query against each key by similarity and then returns the value corresponding to that key. The transformer equation is a pretty straightforward formula from that perspective. Each layer learns what it searches for, how it can be found, and which value it wants to transfer when requested.
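
To make the search analogy concrete, here is a minimal numpy sketch of scaled dot-product attention (just the core formula, not a full transformer layer; shapes and names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Q: (n_queries, d), K: (n_keys, d), V: (n_keys, d_v)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # how well each query matches each key
    weights = softmax(scores, axis=-1)       # normalized relevance over the keys
    return weights @ V                       # weighted sum of the retrieved values

out = attention(np.random.randn(4, 8), np.random.randn(6, 8), np.random.randn(6, 16))
print(out.shape)  # (4, 16)
```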

RWKV changes this by removing the query, so data is no longer requested, only pushed. I am frankly surprised this seems to work so far. Pushing data (each token determining for itself how important it is to the others) does not depend on other states, which is what enables it to be an RNN.

Edit: one thing I should mention: in RWKV, importance also fades over time, so it has a recency bias.
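
Very roughly, the "push only, with decay" idea looks something like this (a heavily simplified sketch, not the actual RWKV/WKV equations):

```python
import numpy as np

def push_only_mix(keys, values, decay=0.9):
    # keys: (T,) importance each token assigns to itself (no query involved)
    # values: (T, d) information each token pushes forward
    T, d = values.shape
    out = np.zeros((T, d))
    num, den = np.zeros(d), 0.0  # running (decayed) sums of pushed values and weights
    for t in range(T):           # recurrent: each step depends only on the previous state
        w = np.exp(keys[t])                # self-determined importance
        num = decay * num + w * values[t]  # older contributions fade => recency bias
        den = decay * den + w
        out[t] = num / (den + 1e-9)
    return out

out = push_only_mix(np.random.randn(10), np.random.randn(10, 16))
```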

3

ChuckSeven t1_jea2b99 wrote

Google Translate is certainly not an LLM. LLMs can do translation, but they are significantly worse than translation models trained on translation data. Translation models use an encoder-decoder architecture, since translation is a sequence-to-sequence problem, rather than the decoder-only autoregressive architecture that LLMs use.

They are also not pretrained, afaik, since language modelling models p(x) whereas translation models p(y|x).

−6

LetGoAndBeReal t1_jea1id9 wrote

I believe you are referring to this statement from the link: "Ability to train on more examples than can fit in a prompt." Correct?

If so, as I explained, the key word here is "examples." And if you understand why, you will see that there is no contradiction. I will try to clarify why.

There are two methods that we are discussing for extending the capability of an LLM:

  1. Prompt engineering
  2. Fine-tuning

There are also different types of capability that might be extended. We are discussing the following two:

  1. Adding new knowledge/facts to the model
  2. Improving downstream processing tasks, such as classification, sentiment analysis, etc.

Both of these capabilities can readily be extended through prompt engineering. Adding new knowledge with prompt engineering involves including that knowledge as context in the prompt. Improving tasks such as classification is done by including examples of the processing you want in the prompt.

What the article says is that for the case where you want to provide examples in the prompt to make the model perform better, you can alternatively use fine-tuning. The article does not say "Ability to add more knowledge than can fit in a prompt." Examples = downstream processing tasks. Examples != new knowledge.
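
To make the distinction concrete, here is a hypothetical sketch (the prompt text and facts are invented for illustration):

```python
# 1. Adding new knowledge via prompt engineering: put the facts in the prompt as context.
knowledge_prompt = (
    "Context: Acme Corp's return window for members is 45 days.\n\n"
    "Question: How long do members have to return an item?\n"
    "Answer:"
)

# 2. Improving a downstream task: show examples of the processing you want (few-shot).
classification_prompt = (
    "Classify each review as positive or negative.\n\n"
    "Review: 'Loved it, works perfectly.' -> positive\n"
    "Review: 'Broke after two days.' -> negative\n"
    "Review: 'Exceeded my expectations.' ->"
)

# Fine-tuning can replace the *examples* in the second prompt with training data;
# it is not a reliable way to inject the *facts* used in the first one.
```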

1

DigThatData t1_jea10sg wrote

yeah... i hate to say it but I agree with the other commenters. If you have access to medical support, I strongly recommend you get seen by a physician. I'm concerned you might be experiencing some kind of psychiatric episode. If you're skeptical, that's fine; you can even tell them that.

> "Strangers on the internet expressed concern that I might be experiencing a psychiatric episode of some kind. I don't see it, but enough people suggested it that I felt it merited a professional opinion, so here I am."

6

WokeAssBaller t1_jea0o2f wrote

Again you are using an incredibly limited definition of fine tuning based on what the open ai api allows, which once again tells me you don’t know ML.

Fine-tuning is ANY additional training on a foundation model; this can be MLM training on the base model or selectively training the subsequent layers.

OF COURSE this can add knowledge, since you are doing the same kind of training that gave the model its knowledge in the first place. Glad to see you jumped on the ChatGPT bandwagon last week; build a transformer from scratch and then come talk to me.
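
For what it's worth, here is a minimal PyTorch sketch of "selectively training the subsequent layers"; the architecture is a stand-in, not any particular pretrained checkpoint:

```python
import torch
from torch import nn

# Stand-in for a pretrained model: embedding -> transformer encoder -> LM head.
model = nn.Sequential(
    nn.Embedding(30000, 256),
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
        num_layers=4,
    ),
    nn.Linear(256, 30000),
)

# Freeze everything, then unfreeze only the last encoder block and the head.
for p in model.parameters():
    p.requires_grad = False
for p in model[1].layers[-1].parameters():
    p.requires_grad = True
for p in model[2].parameters():
    p.requires_grad = True

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```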

2

LetGoAndBeReal t1_je9zfyb wrote

>And there is no reason why the same methods wouldn't work on LLMs too, for example there is already Lora for LLMs too.

It's really not helpful to make strong assertions like this without referring to specific, verifiable sources. Fine-tuning is very typically done with certain layers/parameters of the model frozen, precisely to avoid the sort of loss we are discussing. The LoRA paper itself states that LoRA "freezes the pre-trained model weights".
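
For reference, a minimal sketch of the LoRA idea (illustrative only, not the paper's exact implementation): the pretrained weight stays frozen and only a small low-rank update is trained on top of it.

```python
import torch
from torch import nn

class LoRALinear(nn.Module):
    def __init__(self, pretrained: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = pretrained
        for p in self.base.parameters():
            p.requires_grad = False                            # frozen pretrained weights
        out_f, in_f = pretrained.weight.shape
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)  # trainable
        self.B = nn.Parameter(torch.zeros(out_f, rank))        # trainable, zero-init => no change at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only A and B are trained
```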

0

xander76 OP t1_je9w2ax wrote

Yeah, that's definitely one of the things it offers right now. If you want a particular data shape out of GPT, we handle that, both on the side of crafting the prompt to elicit the type and on the parsing side to get the data out of the raw GPT response.

We're also building more tools to make the development process easier, which depend on the fact that imaginary functions are easy to do static analysis on. The first is an IDE plugin that lets you run and test imaginary functions directly in VS Code and compare different versions of an imaginary function to see how they do on various test inputs. We also plan to add simple annotations to the comment format so you can easily switch to other LLMs for your runtime to manage the cost/quality/privacy tradeoff.

ETA: One thing it also does right now is let you switch between models (ada, babbage, curie, davinci, gpt-3.5-turbo, gpt-4) with just a configuration change. If you use OpenAI's APIs directly, you need to change your client code, because the GPT-3 models have a different API than GPT-3.5 and GPT-4.
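
For anyone curious, the difference looks roughly like this with the (pre-1.0) openai Python package; details aside, the point is that the two model families hit different endpoints:

```python
import openai  # pre-1.0 SDK

# GPT-3 models (ada/babbage/curie/davinci) use the completions endpoint:
resp = openai.Completion.create(
    model="text-davinci-003",
    prompt="Summarize: ...",
    max_tokens=100,
)
text = resp["choices"][0]["text"]

# gpt-3.5-turbo and gpt-4 use the chat completions endpoint instead:
resp = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize: ..."}],
)
text = resp["choices"][0]["message"]["content"]
```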

2

Goldenier t1_je9uruu wrote

This is false; actually, most of the time the opposite is the problem: the model learns too much from the new data it's fine-tuned on (overfitting on it) but forgets the "knowledge" in the original model. The simplest and most common example right now is using DreamBooth, LoRA, or other fine-tuning methods on parts of the big image diffusion models: if you overtrain, the model will place the newly trained face or object in almost all of its outputs, so it easily learns new data but also easily forgets the old. (One mitigation is to use a preservation loss to make sure it also keeps the old knowledge.) And there is no reason the same methods wouldn't work on LLMs too; for example, there is already LoRA for LLMs.
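
A rough sketch of the preservation-loss idea (generic and simplified, not DreamBooth's exact prior-preservation loss): penalize the fine-tuned model for drifting away from a frozen copy of the original model on "prior" data while it learns the new data.

```python
import torch
import torch.nn.functional as F

def finetune_step(model, frozen_original, new_x, new_y, prior_x, lambda_prior=1.0):
    # Loss on the new concept we want the model to learn.
    loss_new = F.mse_loss(model(new_x), new_y)

    # Preservation loss: stay close to what the original (frozen) model
    # would have produced on old "prior" data, so old knowledge is kept.
    with torch.no_grad():
        prior_target = frozen_original(prior_x)
    loss_prior = F.mse_loss(model(prior_x), prior_target)

    return loss_new + lambda_prior * loss_prior
```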

2