Recent comments in /f/MachineLearning

Carrasco_Santo t1_jczz5k9 wrote

In theory, Open Assistant should at least match the best corporate models if enough people start using the project and each contribute at least a little every week: creating prompts, sorting prompts, and so on.

In theory, if 10,000 people do this work every month, that's far more people than any AI team at a large corporation. The issue is the quality of the work.

30

i_sanitize_my_hands OP t1_jczyrsl wrote

Not expecting a magic bullet solution. Been in the field long enough to know that.

However, any written record of the intelligent ways you mentioned is valuable and worth going through.

One of the reasons it gets asked a lot is that image quality analysis doesn't seem to get enough air time. There are only a few papers, some as old as 2016, and they don't reflect the trends since 'Attention Is All You Need'.

1

currentscurrents t1_jczods2 wrote

I'm hoping that non-von-Neumann chips will scale up in the next few years. There are some you can buy today, but they're small:

>NDP200 is designed to natively run deep neural networks (DNN) on a variety of architectures, such as CNN, RNN, and fully connected networks, and it performs vision processing with highly accurate inference at under 1mW.

>Up to 896k neural parameters in 8bit mode, 1.6M parameters in 4bit mode, and 7M+ in 1bit mode

An Arduino idles at about 10 mW, for comparison.

The idea is that if you're not shuffling the entire set of network weights across the memory bus on every inference cycle, you save ludicrous amounts of time and energy. Someday we'll use this kind of tech to run LLMs on our phones.
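To put rough numbers on the weight-shuffling cost (a back-of-envelope sketch; the ~10 pJ/byte DRAM figure and the 7B/fp16 model are assumptions, not specs for any particular chip):

```python
# Back-of-envelope: energy spent just moving weights over the memory bus.
# Assumes ~10 pJ per byte of off-chip DRAM traffic (ballpark assumption)
# and a 7B-parameter model in fp16.
DRAM_PJ_PER_BYTE = 10
params = 7e9
bytes_per_param = 2  # fp16

bytes_per_token = params * bytes_per_param
joules_per_token = bytes_per_token * DRAM_PJ_PER_BYTE * 1e-12
print(f"~{joules_per_token:.2f} J per token for weight movement alone")
# ~0.14 J/token -> ~1.4 W at 10 tokens/s, before any actual compute.
# Keeping weights resident next to the compute removes that entire term.
```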

14

lucidraisin t1_jcznnvh wrote

actually, i'm keeping an eye on Hyena! there are however a number of issues i still have with the paper (i'm not going to play reviewer 2, as it is not my place nor is reddit a good forum for that), but i intend to reserve judgement and try it out on a few difficult problems like genomics and EEG later this year. proof is in the pudding.

2

Joel_Duncan t1_jczkbc0 wrote

I keep seeing this getting asked like people are expecting a magic bullet solution.

In general you can only get out something within the realm of what you put in.

There are intelligent ways to structure training and models, but you can't fill in expected gaps without training on a reference, or a close approximation, of what those gaps are.

My best suggestion is to limit your input data or muxed model to specific high-resolution subsets.

ex. You can train a LoRA on a small, focused subset of data.
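A minimal sketch of that with the Hugging Face `peft` library (the base model and hyperparameters are illustrative placeholders, not recommendations):

```python
# LoRA sketch: freeze a base model, train only small low-rank adapters
# on the focused subset. Model choice and hyperparameters are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # any causal LM

config = LoraConfig(
    r=8,              # low rank is often enough for a narrow subset
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# ...then run a standard fine-tuning loop on the small, focused dataset;
# the frozen base model covers everything outside that subset.
```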

0

Unlucky_Excitement_2 t1_jczk2lm wrote

Since you're the OG with this, can I pick your brain? Do you not see value in Hyena Hierarchy? Inference with a 64k context window, but 100x more efficient than FlashAttention. I noticed on GitHub that you plan on implementing FlashAttention in all your transformer-based models? HH perplexity actually improves as parameter count scales. Thoughts?
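For scale, here's a crude op-count comparison at a 64k context (constants ignored, so this shows the asymptotics rather than the paper's measured speedup):

```python
import math

# Crude operation counts at a 64k context, per head dimension d.
# Constants and memory traffic are ignored; real speedups are smaller.
L, d = 64_000, 64

attention_ops = L * L * d            # self-attention: O(L^2 * d)
hyena_ops = L * math.log2(L) * d     # FFT-based long convolution: ~O(L log L * d)

print(f"attention ~{attention_ops:.1e} ops, hyena ~{hyena_ops:.1e} ops, "
      f"ratio ~{attention_ops / hyena_ops:.0f}x")
# ~4000x in raw op count; measured end-to-end gains (like the 100x above)
# come out far smaller once constants and memory bandwidth enter.
```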

2