Recent comments in /f/MachineLearning

RedditLovingSun t1_jddyo6g wrote

I can see a future where Apple and Android start including APIs and tool interfaces that let LLMs navigate and use features of the phone; smart home appliance makers could do the same, along with certain web apps and platforms (as long as the user is authenticated). If that kind of thing takes off, so businesses can advertise being "GPT friendly" (the same way they say "works with Alexa"), we could see actual Jarvis-level tech soon.

Imagine talking to Google Assistant and it's actually intelligent: it can operate your phone, computer, and home, execute code, analyze data, and pull info from the web and your Google account.
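
Roughly, I'm picturing something like a platform-level tool registry the model can call into. Just a toy sketch of the idea, none of these names are real Apple/Google/Alexa APIs:

```python
# Toy sketch of a "GPT friendly" tool interface: the platform exposes a small
# set of typed tools, the LLM emits a structured call, the platform executes it.
# All names here are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Tool:
    name: str
    description: str            # what the model reads to decide when to call it
    handler: Callable[..., str]

def set_thermostat(temp_c: float) -> str:
    return f"Thermostat set to {temp_c} C"   # would call the vendor API here

def send_text(contact: str, body: str) -> str:
    return f"Sent '{body}' to {contact}"     # would require user auth/consent

TOOLS: Dict[str, Tool] = {
    t.name: t for t in [
        Tool("set_thermostat", "Set home temperature in Celsius", set_thermostat),
        Tool("send_text", "Send an SMS to a contact", send_text),
    ]
}

def dispatch(tool_call: dict) -> str:
    """Execute a structured tool call emitted by the model,
    e.g. {"name": "set_thermostat", "args": {"temp_c": 21.5}}."""
    tool = TOOLS[tool_call["name"]]
    return tool.handler(**tool_call["args"])

print(dispatch({"name": "set_thermostat", "args": {"temp_c": 21.5}}))
```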

Obviously there are a lot of safety and alignment concerns that need to be thought through first, but I can't see us not doing something like this in the coming years. It would suck, though, if companies got anti-competitive with it (for example, if Google's phone and home ML interfaces were kept available only to Google's own assistant model).

83

adventuringraw t1_jddte0k wrote

No one else mentioned this, so I figured I'd add that there's also much more exotic research going into low-power techniques that could match what we're seeing with modern LLMs. One of the most interesting areas to me personally is the recent progress in spiking neural networks, an approach much more inspired by biological intelligence. The idea: instead of continuous-valued parameters sending vectors between layers, you've got spiking neurons sending sparse digital signals. Progress has historically been stalled because they're so hard to train, but there was some big movement just this month with SpikeGPT. The authors basically figured out how to leverage normal deep learning training; that, along with a few other tricks, got them performance comparable to an equivalently sized DNN with 22x reduced power consumption.
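
To give a feel for the "sparse digital signals" part, here's a toy leaky integrate-and-fire neuron in NumPy. This isn't how SpikeGPT itself is implemented (that relies on surrogate-gradient tricks so ordinary backprop works), just the basic mechanic:

```python
# Minimal leaky integrate-and-fire (LIF) neurons: integrate input current,
# fire a binary spike when the membrane crosses a threshold, then reset.
import numpy as np

def lif_forward(inputs, beta=0.9, threshold=1.0):
    """inputs: (timesteps, features) array of input currents.
    Returns binary spikes of the same shape."""
    membrane = np.zeros(inputs.shape[1])
    spikes = np.zeros_like(inputs)
    for t, x in enumerate(inputs):
        membrane = beta * membrane + x           # leaky integration
        spikes[t] = (membrane >= threshold)      # fire if over threshold
        membrane = membrane * (1 - spikes[t])    # reset neurons that fired
    return spikes

rng = np.random.default_rng(0)
out = lif_forward(rng.uniform(0, 0.5, size=(20, 8)))
print(out.mean())  # fraction of neuron-timesteps that fired: sparse activity
```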

The real promise of SNNs, though, is that in theory you could develop large-scale specialized 'neuromorphic' hardware, the equivalent of what GPUs and TPUs are for traditional DNNs, built to run SNNs optimally. A chip like that could end up being a cornerstone of efficient ML if things work out that way, and who knows? Maybe it'd even open the door to tighter coupling and progress between ML and neuroscience.

There are plenty of other things being researched too, of course. I'm nowhere near knowledgeable enough to give a proper overview, but it's a pretty vast space once you start looking at the more exotic research efforts. I'm sure carbon-nanotube- or superconductor-based computing breakthroughs would massively change the equation, for example. Twenty years from now, we might find ourselves in a completely new paradigm... that'd be pretty cool.

1

underPanther t1_jddpryu wrote

Another reason: wide single-hidden-layer MLPs with polynomial activations cannot be universal approximators, whereas lots of other activations do give universality with a single hidden layer.

The technical reason behind this is that discriminatory activations give universality with a single hidden layer (Cybenko 1989 is the reference).

But polynomials are not discriminatory (https://math.stackexchange.com/questions/3216437/non-trivial-examples-of-non-discriminatory-functions), so they fail to meet this criterion.
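
For reference, here's the relevant definition and result as I understand them from Cybenko (1989), paraphrased:

```latex
% \sigma is discriminatory if, for every finite signed regular Borel measure \mu on [0,1]^n,
\int_{[0,1]^n} \sigma(w^\top x + b)\, d\mu(x) = 0
  \quad \text{for all } w \in \mathbb{R}^n,\ b \in \mathbb{R}
  \;\Longrightarrow\; \mu = 0.
% If \sigma is continuous and discriminatory, then finite sums of the form
G(x) = \sum_{j=1}^{N} \alpha_j\, \sigma\!\left(w_j^\top x + b_j\right)
% are dense in C([0,1]^n), i.e. a single hidden layer suffices for universality.
```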

Also, if you craft a multilayer perceptron with polynomial activations, does it offer any benefit over fitting a Taylor series directly?

2

jarmosie t1_jddmvp9 wrote

What are some informative blogs, RSS feeds, or newsletters you've subscribed to for regular content on machine learning? In general, the software development community has an abundance of people maintaining high-quality online content through individual blogs or newsletters.

I know there's Towards Data Science and Machine Learning Mastery, to name a few, but what other lesser-known yet VERY informative resources did you stumble across, ones that have helped you further your knowledge even more?

1

GamerMinion t1_jddlqit wrote

Yes, theory is one thing, but you can't build ASICs for everything due to the cost involved.

Did you look into sparsity at latency-equivalent scales? i.e. same latency, bigger but sparser model.

I would be very interested to see results like that, especially for GPU-like accelerators (e.g., Nvidia's AGX computers use their Ampere GPU architecture), since latency is a primary focus in high-value computer vision applications such as autonomous driving.

2

brownmamba94 t1_jddhxdb wrote

Hi, yes, this is a great question. When we say FLOP-equivalent, we mean that on ideal hardware that can accelerate unstructured weight sparsity, the total compute time would also be equivalent. The point is that we can actually improve the accuracy of the original dense model for the same compute budget with these Sparse Iso-FLOP Transformations (e.g., Sparse Wide, Sparse Parallel, etc.).
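
To make the FLOP accounting concrete, here's a schematic of the trade-off for the Sparse Wide case on a single linear layer (widen the layer, then sparsify so the non-zero FLOPs match the dense baseline). This is only a simplified sketch; biases and layer-type-specific details are ignored:

```python
# Schematic iso-FLOP bookkeeping for a single linear layer:
# widen both fan-in and fan-out by k = 1/sqrt(1 - s), keep only a (1 - s)
# fraction of weights, and the non-zero FLOPs match the dense layer.
def dense_flops(d_in, d_out):
    return 2 * d_in * d_out                         # one multiply-accumulate per weight

def sparse_wide_flops(d_in, d_out, sparsity):
    widen = (1.0 - sparsity) ** -0.5                # k = 1 / sqrt(1 - s)
    d_in_w, d_out_w = d_in * widen, d_out * widen
    return 2 * d_in_w * d_out_w * (1.0 - sparsity)  # only non-zero weights count

d_in, d_out, s = 1024, 4096, 0.75
print(dense_flops(d_in, d_out))           # 8388608
print(sparse_wide_flops(d_in, d_out, s))  # ~8388608: same budget, 2x wider layer
```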

In Section 4 of our paper, we make comparisons for inference and training on hardware with and without support for sparsity acceleration.

In theory, there should be no increase in wall-clock time, but on GPUs there would be a significant increase. However, emerging hardware accelerators like the Cerebras CS-2 are doing hardware-software co-design for sparse techniques, which lets us take advantage of sparse acceleration during training.

0

GamerMinion t1_jddeprr wrote

When you say "FLOP-equivalent", does that also mean compute-time equivalent?

I ask this because on GPUs, models like EfficientNet, which technically have far fewer FLOPs and parameters, can be way slower than a standard ResNet of the same accuracy, because they parallelize much less efficiently.
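
For context, this is the kind of quick latency check I have in mind, a rough PyTorch/torchvision sketch (absolute numbers depend heavily on the GPU, batch size, and runtime, e.g. TensorRT):

```python
# Compare GPU inference latency of two models with very different FLOP counts.
import torch, torchvision

def gpu_latency_ms(model, input_shape=(1, 3, 224, 224), iters=100):
    model = model.cuda().eval()
    x = torch.randn(*input_shape, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    with torch.no_grad():
        for _ in range(10):              # warm-up to exclude kernel compilation
            model(x)
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            model(x)
        end.record()
        torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average milliseconds per forward pass

print("resnet50:       ", gpu_latency_ms(torchvision.models.resnet50()))
print("efficientnet_b0:", gpu_latency_ms(torchvision.models.efficientnet_b0()))
```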

Did you look into inference latency on GPUs in your paper?

3

brownmamba94 t1_jdd1otu wrote

Hi, thank you for the feedback. This was a genuine oversight, and we will correct the paper with a new acronym in the revised version of the manuscript. You can expect the changes soon. I look forward to any feedback you have on the research itself. Cheers!

8