Recent comments in /f/MachineLearning

xander76 OP t1_je9fehv wrote

Thanks for the response!

I may not be completely understanding the question, but from my perspective, the OpenAI APIs are just as non-deterministic as imaginary functions. If you call OpenAI directly multiple times with the exact same prompt and a temperature above 0, you will get a different response each time. The same is true of imaginary functions. (As an interesting side note, we default the temperature of imaginary functions to 0, so unless you override it in the comment, an imaginary function returns the same response for the same set of arguments.)
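To make that concrete, here's a rough sketch of the temperature point using the OpenAI Python client (the exact client call depends on which SDK version you're on, so treat this as illustrative):

```python
# Minimal sketch: temperature > 0 samples stochastically, temperature = 0 is
# effectively greedy decoding, so repeated calls generally match.
import openai

prompt = [{"role": "user", "content": "Suggest a title for a blog post about LLMs."}]

# temperature > 0: repeated calls can return different completions
for _ in range(3):
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=prompt, temperature=0.9
    )
    print(resp.choices[0].message.content)

# temperature = 0: repeated calls generally return the same completion,
# which is why imaginary functions default to 0
for _ in range(3):
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=prompt, temperature=0
    )
    print(resp.choices[0].message.content)
```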

Now, I do think that introducing this kind of non-determinism into your web code, whether through OpenAI's APIs or imaginary programming, presents some interesting wrinkles. For a traditional web developer like me, the fuzziness and non-determinism are frankly a bit scary. What we're working on now is tooling you can use to consistently test your imaginary functions and make sure they return acceptable answers. Our hope is that this will give frontend devs the ability to use AI in their apps with reasonable confidence that the AI is doing what they want it to.
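For illustration, one plain-vanilla way to test an LLM-backed function today (not our tooling, just a sketch; `suggest_title` and the thresholds here are hypothetical):

```python
# Hypothetical sketch: treat the LLM-backed function like any other function
# under test, asserting on properties of the answer rather than exact text.
# `suggest_title` is a stand-in for an imaginary/LLM-backed function.
import openai
import pytest

def suggest_title(topic: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": f"Suggest one short blog post title about {topic}."}],
        temperature=0,  # deterministic-ish, so the test is repeatable
    )
    return resp.choices[0].message.content

@pytest.mark.parametrize("topic", ["gardening", "rust programming"])
def test_title_is_acceptable(topic):
    title = suggest_title(topic)
    assert isinstance(title, str)
    assert 0 < len(title) < 200   # arbitrary sanity bound on length
```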

2

cc-test t1_je9fd2k wrote

>In exchange for this vigilance, you get a zero-cost tutor that answers questions immediately, and can take you down a personalized learning path.

You get a zero-cost tutor that may or may not be correct about something objective, and as a student you are supposed to trust that?

I also pay (well, my company does) to access GPT-4, and it's still not that close to being a reliable tutor. I wouldn't tell my juniors to ask ChatGPT about issues they're having instead of asking me, another senior, or the lead engineer.

Code working is not equivalent to the code being written correctly or well. If you're the kind of engineer who just thinks "oh well, it works at least, that's good enough", then you're the kind of engineer who will be replaced by AI tooling in the near future.

0

saintshing t1_je9fciu wrote

> I was curious if as our brains do, use another instance of the same LLM to generate little hypothesis about the ongoing conversation, and store those on a vector space database, then use those generated thesis during reasoning.

I just learned about LangChain recently. If I understand correctly, it has agents that integrate LLMs with external tools like internet search, SQL queries, and vector store queries. It also has a memory module to store the ongoing dialog and intermediate results.

They use the ReAct or MRKL framework to break a task into subproblems, decide which tools to use, and react to the results returned by those tools.

example: https://tsmatz.files.wordpress.com/2023/03/20230307_paper_example.jpg?w=446&zoom=2

https://python.langchain.com/en/latest/getting_started/getting_started.html

https://tsmatz.wordpress.com/2023/03/07/react-with-openai-gpt-and-langchain/

https://twitter.com/yoheinakajima/status/1640934493489070080
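For concreteness, a rough sketch of a LangChain agent wired up with tools and memory (API details vary by LangChain version, and the search tool assumes a SerpAPI key):

```python
# Rough sketch of a LangChain agent: an LLM plus external tools plus memory.
# (API details vary by LangChain version; "serpapi" assumes a SerpAPI key.)
from langchain.llms import OpenAI
from langchain.agents import initialize_agent, load_tools
from langchain.memory import ConversationBufferMemory

llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)   # web search + calculator
memory = ConversationBufferMemory(memory_key="chat_history")

agent = initialize_agent(
    tools,
    llm,
    agent="conversational-react-description",  # ReAct-style thought/action loop
    memory=memory,
    verbose=True,  # prints the Thought / Action / Observation trace
)

agent.run("Who is the CEO of OpenAI, and what is 7 raised to the 0.5 power?")
```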

7

xander76 OP t1_je9emb1 wrote

We have tried out other runtimes, and a lot of them seem to work decently well. If memory serves, Claude was quite good. I'm definitely interested in supporting other hosting and other models as a way to balance quality, cost, and privacy, and we are currently building IDE tools that will let you test your imaginary functions in ways that will hopefully surface those tradeoffs.

2

gmork_13 t1_je9e6wu wrote

I'm wondering the same thing.
In the LoRA paper they had some pros and cons versus other adapters (where LoRA won out). Though you could technically do both, you'd probably pick one.

Indeed, this adapter wins out over LoRA on weight size, but since we're talking about megabytes, the difference is almost negligible (in this scenario). It's a shame they didn't include LoRA training time in their comparison.

They say 1 hr on 8×A100, whereas the alpaca-LoRA GitHub says 4-5 hrs on 1×4090.
8×A100 is 640 GB of VRAM (assuming the 80 GB variant) as opposed to the 4090's 24 GB. There are also differences in speed, plus the fact that the alpaca-LoRA GitHub may have run on an 8-bit quantized model.

Since the adapter paper says nothing about quantization, I'm assuming it's 640 GB of VRAM used on the full fp32 7B model (or fp16?) for one hour, compared to the alpaca-LoRA repo, which runs in 24 GB of VRAM on an int8 7B model for 4.5 hrs.
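For reference, the alpaca-LoRA-style setup looks roughly like this (a sketch using the HuggingFace transformers/peft APIs; exact function names and the checkpoint are version-dependent):

```python
# Rough sketch of the alpaca-LoRA-style setup: int8 base model + LoRA adapters.
# (APIs and checkpoint name here are illustrative and version-dependent.)
import torch
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,           # ~7 GB of weights instead of ~28 GB in fp32
    torch_dtype=torch.float16,
    device_map="auto",
)
base = prepare_model_for_int8_training(base)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # inject low-rank updates into attention
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()   # a few million trainable params vs 7B total
```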

They both train on the Stanford Alpaca dataset, but the alpaca-LoRA repo trains for 3 epochs on the cleaned dataset whereas LLaMA-Adapter trains on the full dataset for 5 epochs.
That's a lot of small differences to account for if you're trying to figure out what's faster.
It can be done, but the question remains whether the end result is comparable and whether it was trained to an optimal point.

Since the authors trained alpaca-LoRA themselves, why didn't they list how long it took in their comparison table? They trained on the same hardware and dataset, I assume.

If the only differences between this adapter and others are, as they mention in the paper, the gating, zero init, and multi-modality, then the downsides mentioned in the LoRA paper (bottlenecks) might still hold. I'm no expert though.

8

Smallpaul t1_je9e0am wrote

Note: although I have learned many things from ChatGPT, I have not learned a whole language. I haven't run that experiment yet.

ChatGPT is usually good at distilling common wisdom, i.e., professional standards. It has read hundreds of blogs and can summarize "both sides" of any controversial issue, or give you best practices when the question is not controversial.

As for whether the information it gives you is factually correct, you will need discernment to decide whether the thing you are learning is trivially verifiable ("does the code run?") or more subtle, in which case you might verify with Google.

In exchange for this vigilance, you get a zero-cost tutor that answers questions immediately, and can take you down a personalized learning path.

It might end up being more trouble than it is worth, but it might also depend on the optimal learning style of the student.

I use GPT-4, and there are far fewer hallucinations.

4

LetGoAndBeReal t1_je9c66v wrote

Instead of insisting that fine-tuning reliably adds new knowledge to an LLM, why not show some evidence for this claim? Per my links above, this is a notoriously challenging problem in ML.

Apart from these resources, let's think critically for a second. If the approach were viable at this point, then there would be tons of commercial solutions using fine-tuning instead of RAG for incorporating external knowledge in an LLM application. Can you find even one?
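For contrast, the RAG pattern those solutions rely on looks roughly like this (a minimal sketch; the model names and client calls are only illustrative):

```python
# Minimal sketch of retrieval-augmented generation (RAG): embed the external
# docs, retrieve the closest ones at question time, and stuff them into the
# prompt, rather than trying to bake the knowledge in via fine-tuning.
import numpy as np
import openai

docs = ["Our refund window is 30 days.", "Support hours are 9am-5pm ET."]

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

doc_vectors = [embed(d) for d in docs]

def answer(question, k=1):
    q = embed(question)
    # cosine similarity against every stored document vector
    scores = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v)) for v in doc_vectors]
    context = "\n".join(docs[i] for i in np.argsort(scores)[-k:])
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": f"Answer using only this context:\n{context}\n\nQ: {question}"}],
        temperature=0,
    )
    return resp.choices[0].message.content

print(answer("How long do customers have to request a refund?"))
```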

2

cc-test t1_je9c5lb wrote

If you're learning something new for the first time and you want to verify that it's correct and up to professional standards, how would you check?

FWIW I use AI tooling daily and I'm a huge fan of it; my job also has me working closely with an in-house model created by our Data Science & ML team to integrate into our current systems. My concern is with people treating the recent versions of GPT like a silver bullet, which they aren't, and blindly trusting them.

−1