Recent comments in /f/MachineLearning

blueSGL t1_jdl756z wrote

> with specialized expert data from literally 50 experts in various fields that worked on the response quality in their domain.

Sounds like a future goal for Open Assistant.

If one were being unethical... create a bot that posts the current Open Assistant answers to technical questions in small specialist subreddits and wait for Cunningham's Law to take effect. (I'm only half joking)

20

Blacky372 t1_jdl62vl wrote

GPT-J-6B with instruction finetuning will surely never be better than GPT-4. With RLHF you may reach similar response quality in some contexts for some types of instruction, but you will never match the vast amounts of proprietary data that ClosedAI fed into a probably 250+B parameter model, with specialized expert data from literally 50 experts in various fields who worked on the response quality in their domain. This cannot be surpassed easily, unfortunately. But maybe future open source models will reach similar capabilities with advanced training techniques. I would definitely hope so.

56

WarAndGeese t1_jdl5t0z wrote

Boo hoo to OpenAI, people should do it anyway. Are the terms of service the only reason not to do it, or are there actual material barriers? If it's a problem of money, then as long as people know how much is needed, it can be crowdfunded. If it's a matter of people power, then there are already large volunteer networks. Or is it just something that isn't practical or feasible?

7

cyborgsnowflake t1_jdl47n8 wrote

I think the simpler explanation is that it's easier than some people believed to reproduce certain knowledge tasks statistically, rather than the alternative theory, which everyone else in this thread seems to be jumping on board with, that shuffling tensors creates living, thinking beings.

2

dreamingleo12 t1_jdl3qgp wrote

It’s just a shameless copy of Stanford’s work. The innovative thing about Stanford Alpaca is that it builds a ChatGPT-style assistant from an existing language model, Meta's LLaMA, at low cost. Databricks just followed Stanford’s approach with a different base model and claims it’s a big innovation. Alpaca can actually be fine-tuned on the same dataset in 3 hours and performs better than Databricks’ model.
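
For anyone curious, the recipe both projects follow is roughly the sketch below. This is a minimal illustration, not either team's actual training script: the prompt template and hyperparameters are stand-in assumptions, and Dolly starts from GPT-J-6B while Alpaca starts from LLaMA-7B.

```python
# Alpaca-style instruction fine-tuning: take a base causal LM and train it
# on the ~52K instruction/response pairs. Everything below is illustrative.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM, AutoTokenizer,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)

base = "EleutherAI/gpt-j-6B"  # Dolly's base; Alpaca uses LLaMA-7B instead
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token  # GPT-J has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base)

ds = load_dataset("tatsu-lab/alpaca")["train"]

def format_example(ex):
    # Render one instruction/response pair as a single training string.
    # (Ignoring the optional 'input' field for brevity.)
    text = (f"### Instruction:\n{ex['instruction']}\n\n"
            f"### Response:\n{ex['output']}{tok.eos_token}")
    return tok(text, truncation=True, max_length=512)

ds = ds.map(format_example, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="alpaca-style-ft",
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=2e-5,
    ),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```

Which is kind of the point: almost all the value is in the dataset and the base model, not the training loop, so swapping the base model is a small change.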

4

learn-deeply t1_jdl1bmp wrote

Anyone else tired of papers that obscure a simple concept with endless paragraphs of verbose gibberish? This 17-page paper could be a few sentences.

TL;DR: the authors wrote prompts telling GPT-4 to fix code, given some unit tests and the output of the broken code. It performs better than GPT-4 without access to the output of the code execution.

https://github.com/noahshinn024/reflexion-human-eval/blob/main/reflexion.py#L7-L12
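
The loop is essentially this (a minimal sketch; `llm` stands in for whatever chat-completion call you use, and the helper names are mine, not the repo's):

```python
import os
import subprocess
import tempfile

def run_tests(code: str, tests: str) -> tuple[bool, str]:
    """Write the candidate code plus its unit tests to a temp file,
    run it, and return (passed, captured output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + tests)
        path = f.name
    try:
        result = subprocess.run(["python", path],
                                capture_output=True, text=True, timeout=10)
        return result.returncode == 0, result.stdout + result.stderr
    finally:
        os.unlink(path)

def reflexion_repair(llm, task: str, tests: str, max_iters: int = 3) -> str:
    """Generate code, run the tests, and on failure feed the captured
    execution output back into the next prompt so the model can retry."""
    code = llm(f"Write Python code for this task:\n{task}")
    for _ in range(max_iters):
        passed, output = run_tests(code, tests)
        if passed:
            break
        code = llm(
            f"Task:\n{task}\n\nYour previous attempt:\n{code}\n\n"
            f"It failed the unit tests with this output:\n{output}\n\n"
            "Reflect on what went wrong and return a corrected version."
        )
    return code
```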

369

blueSGL t1_jdl02u6 wrote

>So with GPT-2 medium, what we really do here is to parent a dumb kid, instead of a "supernaturally precocious child" like GPT-3. What interested me is that RLHF does actually help to parent this dumb kid to be more socially acceptable.

> In other words, if we discover the power of alignment and RLHF earlier, we might foresee the ChatGPT moment much earlier when GPT-2 is out in 2019.

That just reads to me as capability overhang. If there is "one simple trick" to make the model "behave", what's to say this is the only one? (Or that the capabilities derived from the current behavior modification are the best they can be.) Scary thought.

2