Recent comments in /f/MachineLearning
ZenDragon t1_jddxepi wrote
Reply to [N] ChatGPT plugins by Singularian2501
Wolfram plugin 👀
adventuringraw t1_jddte0k wrote
No one else mentioned this, so I figured I'd add that there's also much more exotic research going into low-power techniques that could match what we're seeing with modern LLMs. One of the most interesting areas to me personally is the recent progress in spiking neural networks, an approach much more directly inspired by biological intelligence. The idea: instead of continuous-valued neurons sending dense vectors between layers, you've got spiking neurons sending sparse digital signals. Progress has historically been stalled out because they're so hard to train, but there's been some big movement just this month actually, with SpikeGPT. The authors basically figured out how to leverage normal deep learning training; that, along with a few other tricks, got them performance comparable to an equivalently sized DNN with roughly 22x less power consumption.
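To make the contrast concrete, here's a toy sketch of a leaky integrate-and-fire (LIF) neuron, the standard minimal model for SNNs (all constants here are made up for illustration; this is not SpikeGPT's neuron model):

```python
# Minimal leaky integrate-and-fire (LIF) neuron; constants are illustrative.
import numpy as np

def lif_spikes(input_current, threshold=1.0, leak=0.9):
    """Integrate input over time, leaking a fraction of the membrane
    potential each step, and emit a binary spike (then reset) whenever
    the potential crosses the threshold."""
    v, spikes = 0.0, []
    for i in input_current:
        v = leak * v + i      # leaky integration of incoming current
        if v >= threshold:    # threshold crossing: fire
            spikes.append(1)
            v = 0.0           # reset after spiking
        else:
            spikes.append(0)
    return spikes

rng = np.random.default_rng(0)
print(lif_spikes(rng.uniform(0.0, 0.5, size=20)))
```

The output is a sparse 0/1 spike train rather than a dense real-valued vector, which is exactly the property event-driven neuromorphic hardware exploits for power efficiency.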
The real promise of SNNs, though, is that in theory you could develop large-scale specialized 'neuromorphic' hardware to run them optimally... the equivalent of what GPUs and TPUs are for traditional DNNs. A chip like that could end up being a cornerstone of efficient ML if things work out that way, and who knows? Maybe it'd even open the door to tighter coupling and progress between ML and neuroscience.
There are plenty of other things being researched too, of course. I'm nowhere near knowledgeable enough to give a proper overview, but it's a pretty vast space once you start looking at the more exotic research efforts. I'm sure carbon-nanotube- or superconductor-based computing breakthroughs would massively change the equation, for example. 20 years from now we might find ourselves in a completely new paradigm... that'd be pretty cool.
underPanther t1_jddpryu wrote
Reply to comment by andrew21w in [D] Simple Questions Thread by AutoModerator
Another reason: wide single-hidden-layer MLPs with polynomial activations cannot be universal approximators, whereas lots of other activations do give universality with a single hidden layer.
The technical reason is that discriminatory activations give universality with a single hidden layer (Cybenko 1989 is the reference).
But polynomials are not discriminatory (https://math.stackexchange.com/questions/3216437/non-trivial-examples-of-non-discriminatory-functions), so they fail to meet this criterion.
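For reference, here's the relevant definition and theorem from Cybenko (1989), paraphrased:

```latex
% Cybenko (1989), paraphrased. A function \sigma is discriminatory if,
% for every finite signed regular Borel measure \mu on I_n = [0,1]^n,
\int_{I_n} \sigma(w^{\top}x + b)\, d\mu(x) = 0
  \quad \forall\, w \in \mathbb{R}^n,\ b \in \mathbb{R}
  \;\Longrightarrow\; \mu = 0.
% Theorem 1: if \sigma is continuous and discriminatory, then finite sums
G(x) = \sum_{j=1}^{N} \alpha_j\, \sigma(w_j^{\top} x + b_j)
% are dense in C(I_n), i.e., single-hidden-layer universality.
% With a degree-d polynomial \sigma, every G is itself a polynomial of
% degree at most d, and polynomials of bounded degree form a closed,
% proper subspace of C(I_n), so density fails.
```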
Also, if you craft a multilayer perceptron with polynomial activations, does this offer any benefit over fitting a Taylor series directly?
mouldygoldie t1_jddorlu wrote
Reply to comment by brownmamba94 in [R] Introducing SIFT: A New Family of Sparse Iso-FLOP Transformations to Improve the Accuracy of Computer Vision and Language Models by CS-fan-101
Good to hear! I admit I've not actually read the paper - I'll add it to the list and get back if I have any pointers
edthewellendowed t1_jddoq57 wrote
Reply to comment by Icko_ in [P] Open-source GPT4 & LangChain Chatbot for large PDF docs by radi-cho
Can you give me a little bit more info on this? I'm interested but also very slow
jarmosie t1_jddmvp9 wrote
Reply to [D] Simple Questions Thread by AutoModerator
What are some informative blogs, RSS feeds, or newsletters you've subscribed to for regular content on Machine Learning? In general, the Software Development community has an abundance of people maintaining high-quality online content through individual blogs or newsletters.
I know there's Towards Data Science & Machine Learning Mastery, to name a few, but what other lesser-known yet VERY informative resources did you stumble across, ones which have helped you further your knowledge even more?
GamerMinion t1_jddlqit wrote
Reply to comment by brownmamba94 in [R] Introducing SIFT: A New Family of Sparse Iso-FLOP Transformations to Improve the Accuracy of Computer Vision and Language Models by CS-fan-101
Yes, theory is one thing, but you can't build ASICs for everything due to the cost involved.
Did you look into sparsity at latency-equivalent scales? i.e. same latency, bigger but sparser model.
I would be very interested to see results like that, especially for GPU-like accelerators (e.g., NVIDIA's AGX computers use their Ampere GPU architecture), since latency is a primary focus in high-value computer vision applications such as autonomous driving.
ommerike t1_jddjvvn wrote
Is there an APK out there to sideload? Would be fun to try on my Pixel 6 Pro without becoming an expert in the whole build process...
brownmamba94 t1_jddhxdb wrote
Reply to comment by GamerMinion in [R] Introducing SIFT: A New Family of Sparse Iso-FLOP Transformations to Improve the Accuracy of Computer Vision and Language Models by CS-fan-101
Hi, yes, this is a great question. When we say FLOP-equivalent, we mean that on ideal hardware that can accelerate unstructured weight sparsity, the total compute time would also be equivalent. Except we're showing we can actually improve the accuracy of the original dense model for the same compute budget with these Sparse Iso-FLOP Transformations (e.g., Sparse Wide, Sparse Parallel, etc.).
In Section 4 of our paper, we actually make comparisons for inference and training on hardware with and without support for sparsity acceleration.
In theory there should be no increase in wall-clock time, but on GPUs there would be a significant increase. However, emerging hardware accelerators like the Cerebras CS-2 are doing hardware-software co-design for sparse techniques, which can let us take advantage of sparse acceleration during training.
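To illustrate the FLOP-equivalence bookkeeping behind something like Sparse Wide (an illustrative sketch, not code from the paper; it assumes ideal hardware where pruned weights cost nothing):

```python
# FLOP accounting: a dense layer vs. a "Sparse Wide" variant at matched FLOPs.
# Assumes ideal hardware that skips zero weights entirely.

def linear_flops(d_in, d_out, density=1.0):
    """Multiply-accumulate FLOPs for one forward pass of a linear layer."""
    return int(2 * d_in * d_out * density)

d_in, d_out = 1024, 1024
dense = linear_flops(d_in, d_out)

# Widen both dimensions 2x but keep only 25% of the weights:
# FLOPs scale by 2 * 2 * 0.25 = 1, i.e. iso-FLOP with the dense baseline.
sparse_wide = linear_flops(2 * d_in, 2 * d_out, density=0.25)

assert dense == sparse_wide
print(f"dense: {dense:,} FLOPs, sparse-wide: {sparse_wide:,} FLOPs")
```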
passerby251 t1_jddhszk wrote
Reply to comment by nokpil in [D] ICML 2023 Reviewer-Author Discussion by zy415
Congrats! Do you mean that you didn't receive any email notification but found that they responded and changed the scores?
linverlan t1_jddepw6 wrote
Reply to comment by currentscurrents in [D] Do you have a free and unlimited chat that specializes only in teaching programming or computing in general? by Carrasco_Santo
lol you got me there. Although we are probably saving some compute by not generating.
GamerMinion t1_jddeprr wrote
Reply to [R] Introducing SIFT: A New Family of Sparse Iso-FLOP Transformations to Improve the Accuracy of Computer Vision and Language Models by CS-fan-101
When you say "FLOP-equivalent, does that also mean compute-time equivalent?
I ask this because on GPUs, models like EfficientNet, which technically have far less flops and parameters can be way slower than a standard ResNet of same accuracy because they're that much less efficiently parallelizable.
Did you look into inference latency on GPUs in your paper?
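For anyone who wants to sanity-check the FLOPs-vs-latency gap themselves, here's a rough GPU timing sketch (illustrative; assumes torch and torchvision are installed and a CUDA device is available):

```python
# Rough batch-1 GPU latency comparison: ResNet-50 vs. EfficientNet-B0.
import torch
import torchvision

def gpu_latency_ms(model, iters=50):
    model = model.cuda().eval()
    x = torch.randn(1, 3, 224, 224, device="cuda")
    with torch.no_grad():
        for _ in range(10):        # warm-up: cudnn autotuning, lazy init
            model(x)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            model(x)
        end.record()
        torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# EfficientNet-B0 has roughly 10x fewer FLOPs than ResNet-50, but its
# depthwise convolutions parallelize poorly, so the latency gap is far smaller.
print(f"resnet50:        {gpu_latency_ms(torchvision.models.resnet50()):.2f} ms")
print(f"efficientnet_b0: {gpu_latency_ms(torchvision.models.efficientnet_b0()):.2f} ms")
```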
rikiiyer t1_jddanig wrote
Reply to comment by djmaxm in [D] Simple Questions Thread by AutoModerator
The model's 30B parameters are being loaded into your GPU's VRAM (which is presumably 24GB), which is causing the issue. You can try loading the model in 8-bit, which could reduce its size.
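If you're on the Hugging Face stack, the usual route is something like this (a sketch; the model ID is a placeholder for whatever 30B checkpoint you're using, and it assumes bitsandbytes and accelerate are installed):

```python
# 8-bit loading via transformers + bitsandbytes
# (pip install transformers accelerate bitsandbytes).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-30b-model",  # placeholder: substitute your checkpoint
    load_in_8bit=True,          # int8 weights: roughly half the memory of fp16
    device_map="auto",          # spread layers across GPU(s) and CPU as needed
)
```

One caveat: even at one byte per parameter, 30B parameters is still about 30GB of weights, so on a 24GB card device_map="auto" will end up offloading some layers to CPU, which slows inference down.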
elegantrium t1_jdd8xyw wrote
Reply to [D] ICML 2023 Reviewer-Author Discussion by zy415
I had scores of 4, 7, 8, and the 4's questions are orthogonal to the paper. That reviewer hasn't responded to our rebuttal either... What are my chances?
Smallpaul t1_jdd8q6m wrote
Reply to comment by Different_Prune_3529 in [P] Open-source GPT4 & LangChain Chatbot for large PDF docs by radi-cho
It *is* OpenAI's GPT. Through an API.
CommunismDoesntWork t1_jdd87tx wrote
Reply to comment by KerfuffleV2 in [P] New toolchain to train robust spiking NNs for mixed-signal Neuromorphic chips by FrereKhan
I haven't, that's really cool though!
KerfuffleV2 t1_jdd5b3d wrote
Reply to comment by CommunismDoesntWork in [P] New toolchain to train robust spiking NNs for mixed-signal Neuromorphic chips by FrereKhan
Have you already seen this? https://github.com/ridgerchu/SpikeGPT
dwarfarchist9001 t1_jdd33ha wrote
Reply to comment by andrew21w in [D] Simple Questions Thread by AutoModerator
Short answer: polynomials can have very large derivatives compared to sigmoid or rectified linear functions, which leads to exploding gradients.
https://en.wikipedia.org/wiki/Vanishing_gradient_problem#Recurrent_network_model
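A toy illustration of why (not from the linked article; the chain of scalar activations stands in for network depth, and the constants are arbitrary):

```python
# Gradient magnitude through a deep stack of scalar activations:
# a cubic polynomial vs. tanh. By the chain rule, the end-to-end gradient
# is the product of the local derivatives at each layer.
import numpy as np

def backprop_gain(act, dact, depth=6, x0=1.5):
    """Product of local derivatives along the chain f(f(...f(x0)))."""
    x, gain = x0, 1.0
    for _ in range(depth):
        gain *= dact(x)
        x = act(x)
    return gain

# depth is kept small so the cubic chain stays finite in float64
cubic = backprop_gain(lambda x: x**3, lambda x: 3 * x**2)
tanh = backprop_gain(np.tanh, lambda x: 1 - np.tanh(x)**2)
print(f"cubic chain gradient: {cubic:.3e}")  # ~1e131 after 6 layers: explodes
print(f"tanh chain gradient:  {tanh:.3e}")   # ~1e-2 and shrinking: vanishes
```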
__Maximum__ t1_jdd2hjg wrote
Reply to comment by brownmamba94 in [R] Introducing SIFT: A New Family of Sparse Iso-FLOP Transformations to Improve the Accuracy of Computer Vision and Language Models by CS-fan-101
Under which license?
brownmamba94 t1_jdd1otu wrote
Reply to comment by mouldygoldie in [R] Introducing SIFT: A New Family of Sparse Iso-FLOP Transformations to Improve the Accuracy of Computer Vision and Language Models by CS-fan-101
Hi, thank you for the feedback. This was a genuine oversight, and we will correct the paper with a new acronym in the revised version of the manuscript. You can expect the changes soon. I look forward to any feedback you have on the research itself, cheers!
RedditLovingSun t1_jddyo6g wrote
Reply to [N] ChatGPT plugins by Singularian2501
I can see a future where Apple and Android start including APIs and tool interfaces for LLMs to navigate and use features of the phone; smart-home appliance makers could do the same, along with certain web apps and platforms (as long as the user is authenticated). If that kind of thing takes off, so businesses can say they're "GPT friendly" (the same way they say "works with Alexa") or something, we could see actual Jarvis-level tech soon.
Imagine being able to talk to Google Assistant and it's actually intelligent and can operate your phone, computer, and home, execute code, analyze data, and pull info from the web and your Google account.
Obviously there are a lot of safety and alignment concerns that need to be thought through better first, but I can't see us not doing something like that in the coming years. It would suck, though, if companies got anti-competitive with it (like if Google phone and home ML interfaces were kept available only to Google's own assistant model).