Recent comments in /f/MachineLearning
lifesthateasy t1_je9ftvv wrote
And how exactly is your post different?
xander76 OP t1_je9fh7e wrote
Reply to comment by Educational_Ice151 in [P] Imaginary programming: implementation-free TypeScript functions for GPT-powered web development by xander76
Thanks! It's definitely a little mind-bending, but we are enjoying exploring how far the idea can go.
xander76 OP t1_je9fehv wrote
Reply to comment by icedrift in [P] Imaginary programming: implementation-free TypeScript functions for GPT-powered web development by xander76
Thanks for the response!
I may not be completely understanding the question, but from my perspective, the OpenAI APIs are just as non-deterministic as imaginary functions. If you call OpenAI directly multiple times with the exact same prompt and a temperature above 0, you will get different responses each time. The same is true of imaginary functions. (As an interesting side note, we default temperature in imaginary functions to 0, so unless you modify it in the comment, imaginary functions do by default return the same responses for the same set of arguments.)
Now, I do think that introducing this kind of non-determinism into your web code, whether through OpenAI's APIs or imaginary programming, presents some interesting wrinkles. For a traditional web developer like me, the fuzziness and non-determinism is frankly a bit scary. The thing we're working on now is tools that you can use to consistently test your imaginary functions and make sure that they are returning acceptable answers. Our hope is that this will give frontend devs the ability to use AI in their apps with reasonable confidence that the AI is doing what they want it to.
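(The temperature point above can be illustrated with a toy sampling sketch in plain Python. This is not the actual imaginary-programming runtime or the OpenAI API, just the general idea: at temperature 0 sampling collapses to argmax, so the same logits always produce the same token.)

```python
import math
import random

def sample_token(logits, temperature):
    """Sample a token index from logits; temperature 0 means greedy argmax."""
    if temperature == 0:
        # Deterministic: always pick the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature, then sample randomly.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.5]
# Temperature 0: identical output on every call.
print({sample_token(logits, 0) for _ in range(100)})  # {0}
```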
joeiyoma t1_je9fegm wrote
Reply to [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
There is a lot of buzz about prompt engineering. Can it last as a skill set going forward, or is it just hype that will fade with time?
cc-test t1_je9fd2k wrote
Reply to comment by Smallpaul in [D] What do you think about all this hype for ChatGPT? by Dear-Vehicle-3215
>In exchange for this vigilance, you get a zero-cost tutor that answers questions immediately, and can take you down a personalized learning path.
You get a zero cost tutor that may or may not be correct about something objective, and as a student you are supposed to trust that?
I also pay (well, my company does) to access GPT-4, and it's still not that close to being a reliable tutor. I wouldn't tell my juniors to ask ChatGPT about issues they're having instead of asking me, another senior, or the lead engineer.
Code working is not equivalent to the code being written correctly or well. If you're the kind of engineer who just thinks "oh well, it works at least, that's good enough," then you're the kind of engineer who will be replaced by AI tooling in the near future.
saintshing t1_je9fciu wrote
Reply to comment by EquipmentStandard892 in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
> I was curious if as our brains do, use another instance of the same LLM to generate little hypothesis about the ongoing conversation, and store those on a vector space database, then use those generated thesis during reasoning.
I just learned about LangChain recently. If I understand correctly, they have agents that integrate LLMs with external tools like internet search, SQL queries, and vector store queries, plus a memory module to store the ongoing dialog and intermediate results.
They use the ReAct or MRKL framework to break tasks into subproblems, decide which tools to use, and react to the results returned by those tools.
example: https://tsmatz.files.wordpress.com/2023/03/20230307_paper_example.jpg?w=446&zoom=2
https://python.langchain.com/en/latest/getting_started/getting_started.html
https://tsmatz.wordpress.com/2023/03/07/react-with-openai-gpt-and-langchain/
https://twitter.com/yoheinakajima/status/1640934493489070080
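(A minimal, self-contained sketch of the ReAct-style loop described above. This is not LangChain's actual API; the LLM and tool here are toy stand-ins. The model proposes an action, a tool runs it, and the observation is fed back until a final answer appears.)

```python
def fake_llm(prompt):
    """Stand-in for an LLM: decides the next step from the scratchpad."""
    if "Observation:" not in prompt:
        return "Action: calculator[2 * 21]"
    return "Final Answer: 42"

def calculator(expr):
    # Toy tool: evaluate a simple arithmetic expression.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def react_loop(question, max_steps=5):
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        step = fake_llm(scratchpad)
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        # Parse "Action: tool[input]" and run the tool.
        tool_name, _, tool_input = step.removeprefix("Action: ").partition("[")
        observation = TOOLS[tool_name](tool_input.rstrip("]"))
        scratchpad += f"{step}\nObservation: {observation}\n"
    return None

print(react_loop("What is 2 * 21?"))  # 42
```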
joeiyoma t1_je9f2ma wrote
Reply to comment by Calamero in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
That is the utopia, and we all want it!
statmlsn t1_je9f0mb wrote
Reply to comment by Dear-Vehicle-3215 in [D] What do you think about all this hype for ChatGPT? by Dear-Vehicle-3215
Touché*
Business-Lead2679 OP t1_je9erdj wrote
Reply to comment by Justice43 in [D] Training a 65b LLaMA model by Business-Lead2679
Just checked it out - looks interesting. Unfortunately, the availability of this instance is quite limited, so I'm not sure if I can get access to it
learn-deeply t1_je9eovt wrote
Reply to comment by ustainbolt in [D] Training a 65b LLaMA model by Business-Lead2679
Tensor parallelism (aka model parallelism) with model checkpointing works better than FSDP from my experience (though they can be used in conjunction). FSDP is easier to work with, though.
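(For intuition, tensor parallelism shards individual weight matrices across devices. A column-parallel linear layer can be sketched in plain NumPy, with list entries standing in for devices; this is the idea, not a real multi-GPU implementation.)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # batch of activations (replicated on each device)
W = rng.standard_normal((8, 6))   # full weight matrix

# Column-parallel: each "device" holds a slice of W's output columns.
shards = np.split(W, 2, axis=1)               # two devices, 3 columns each
partials = [x @ w for w in shards]            # computed independently per device
y_parallel = np.concatenate(partials, axis=1) # "all-gather" of the outputs

# The sharded computation matches the full matmul exactly.
assert np.allclose(y_parallel, x @ W)
```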
xander76 OP t1_je9emb1 wrote
Reply to comment by utopiah in [P] Imaginary programming: implementation-free TypeScript functions for GPT-powered web development by xander76
We have tried out other runtimes, and a lot of them seem to work decently well. If memory serves, Claude was quite good. I'm definitely interested in supporting other hosting and other models as a way to balance quality, cost, and privacy, and we are currently building IDE tools that will let you test your imaginary functions in ways that will hopefully surface those tradeoffs.
gmork_13 t1_je9e6wu wrote
Reply to comment by CasulaScience in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
I'm wondering the same thing.
In the LoRA paper they had some pros vs cons on other adapters (where LoRA won out). Though you technically could do both, you'd probably pick one.
Indeed, this adapter wins out vs LoRA when looking at weight size, but since we're talking about MB it's an almost negligible difference (in this scenario). It's a shame they didn't include LoRA training time in their comparison.
They say 1hr on 8*A100, whereas the alpaca-LoRA github says 4-5 hrs on 1*4090.
8*A100s is 640GB of VRAM (assuming the 80GB variant) as opposed to the 4090's 24GB. There are also differences in speed, and the alpaca-LoRA repo may have run inference on an 8-bit quantized model.
Since the adapter paper says nothing about quantization, I'm assuming that's 640GB of VRAM for the full fp32 (or fp16?) 7B model for one hour, compared to the alpaca-LoRA repo, which runs an int8 7B model in 24GB of VRAM for 4.5 hrs.
They both train on the Stanford Alpaca dataset, but alpaca-LoRA trains for 3 epochs on the cleaned dataset, whereas LLaMA-Adapter trains on the full dataset for 5 epochs.
That's a lot of small differences to account for if you're trying to figure out what's faster.
It can be done, but the question remains whether the end result is comparable and whether it was trained to an optimal point.
Since the authors trained alpaca-LoRA, why didn't they write how long alpaca-LoRA took in their comparison table? They trained on the same hardware and dataset, I assume.
If the only difference between this adapter and others is, as they mention in the paper, the gating, zero init and multi-modality then the downsides mentioned in the LoRA paper might still hold (bottlenecks). I'm no expert though.
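(The weight-size point is easy to sanity-check: a rank-r LoRA update stores two small factors B and A instead of a full delta matrix, so adapter weights come out in the MB range. A quick back-of-the-envelope in Python, with illustrative dimensions rather than the exact LLaMA shapes:)

```python
def lora_params(d, k, r):
    """Trainable parameters for a LoRA update W + B @ A on a d x k weight."""
    return d * r + r * k  # B is d x r, A is r x k

d = k = 4096                  # a typical attention projection size
full = d * k                  # ~16.8M params for a full delta matrix
lora = lora_params(d, k, 8)   # rank 8: 65,536 params
print(full // lora)           # LoRA is ~256x smaller per matrix
```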
saintshing t1_je9e5q1 wrote
Reply to comment by silva_p in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Natural language processing for cats and dogs
Smallpaul t1_je9e0am wrote
Reply to comment by cc-test in [D] What do you think about all this hype for ChatGPT? by Dear-Vehicle-3215
Note: although I have learned many things from ChatGPT, I have not learned a whole language. I haven't run that experiment yet.
ChatGPT is usually good at distilling common wisdom, i.e. professional standards. It has read hundreds of blogs and can summarize "both sides" of any issue which is controversial, or give you best practices when the question is not.
If the question is whether the information it gives you is factually correct, you will need your discernment to decide whether the thing you are learning is trivially verifiable ("does the code run") or more subtle, in which case you might verify with Google.
In exchange for this vigilance, you get a zero-cost tutor that answers questions immediately, and can take you down a personalized learning path.
It might end up being more trouble than it is worth, but it might also depend on the optimal learning style of the student.
I use GPT-4, and there are far fewer hallucinations.
VertexMachine t1_je9dyb9 wrote
Reply to comment by Nhabls in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
Because reddit? :D
badabummbadabing t1_je9cdf7 wrote
Reply to comment by Nhabls in [D] Training a 65b LLaMA model by Business-Lead2679
They just had their Series B funding, they should upscale their resources soon.
LetGoAndBeReal t1_je9c66v wrote
Reply to comment by WokeAssBaller in [D] The best way to train an LLM on company data by jaxolingo
Instead of insisting that fine-tuning reliably adds new knowledge to an LLM, why not show some evidence for this claim? Per my links above, this is a notoriously challenging problem in ML.
Apart from these resources, let's think critically for a second. If the approach were viable at this point, then there would be tons of commercial solutions using fine-tuning instead of RAG for incorporating external knowledge in an LLM application. Can you find even one?
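(For contrast, the RAG pattern is straightforward to sketch: retrieve the most relevant document at query time and prepend it to the prompt, with no fine-tuning involved. Here a toy bag-of-words cosine similarity stands in for a real embedding model, and the documents are made up for illustration.)

```python
import math
import re
from collections import Counter

DOCS = [
    "Acme's refund policy allows returns within 30 days.",
    "Acme was founded in 1999 in Berlin.",
]

def embed(text):
    # Toy "embedding": bag-of-words term counts.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def retrieve(query):
    q = embed(query)
    return max(DOCS, key=lambda d: cosine(q, embed(d)))

def build_prompt(query):
    # Retrieved knowledge is injected into the prompt at query time,
    # instead of being baked into the model weights by fine-tuning.
    return f"Context: {retrieve(query)}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("When was Acme founded?"))
```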
cc-test t1_je9c5lb wrote
Reply to comment by Smallpaul in [D] What do you think about all this hype for ChatGPT? by Dear-Vehicle-3215
If you're learning something new for the first time and you want to verify that it is correct and up to professional standards, how would you check?
FWIW I use AI tooling daily and I'm a huge fan of it, not to mention my job has me working closely with an in-house model created by our Data Science & ML team to integrate into our current systems. My concern is with people treating the recent versions of GPT like a silver bullet, which they aren't, and blindly trusting them.
Smallpaul t1_je9bwuq wrote
Reply to comment by cc-test in [D] What do you think about all this hype for ChatGPT? by Dear-Vehicle-3215
It is easy to verify anything ChatGPT tells you about programming.
9182763498761234 t1_je9bhu7 wrote
Reply to comment by silva_p in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
I’ll dm you
seedbrage t1_je9b2zz wrote
Reply to comment by TheAdvisorZabeth in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
What
mofoss t1_je9ayyx wrote
Try SegFormer after augmentation
silva_p t1_je9aw63 wrote
Reply to comment by 9182763498761234 in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Can you share your niche?
Maleficent_Refuse_11 t1_je9fzb6 wrote
Reply to [D] What do you think about all this hype for ChatGPT? by Dear-Vehicle-3215
Very tired, to the point of reflecting on whether I can deal with this type of BS for the rest of my career