currentscurrents t1_j525hto wrote
Reply to comment by hapliniste in [D] is it time to investigate retrieval language models? by hapliniste
Retrieval language models do have some downsides. Keeping a copy of the training data around is suboptimal for a couple reasons:
- Training data is huge. Retro's retrieval database is 1.75 trillion tokens. This isn't a very efficient way of storing knowledge, since a lot of the text is irrelevant or redundant.
- Training data is still a mix of knowledge and language. You haven't achieved separation of the two types of information, so it doesn't help you perform logic on ideas and concepts.
- Most training data is copyrighted. It's currently legal to train a model on copyrighted data, but distributing a copy of the training data with the model puts you on much less firm ground.
Ideally I think you want to condense the knowledge from the training data down into a structured representation, perhaps a knowledge graph. Knowledge graphs are easy to perform logic on and can be human-editable. There's also already an entire sub-field studying them.
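To make the idea concrete, here's a toy sketch of what "performing logic on a knowledge graph" can look like: facts stored as (subject, relation, object) triples, with a query that chains `is_a` edges. The facts and relation names are made up for illustration, not from any real KG system.

```python
# Toy knowledge graph: (subject, relation, object) triples.
triples = {
    ("wheat", "is_a", "cereal"),
    ("cereal", "is_a", "plant"),
    ("plant", "needs", "water"),
}

def is_a_closure(entity):
    """Follow 'is_a' edges transitively (wheat -> cereal -> plant)."""
    found = set()
    frontier = {entity}
    while frontier:
        nxt = {o for (s, r, o) in triples
               if r == "is_a" and s in frontier} - found
        found |= nxt
        frontier = nxt
    return found

def needs(entity):
    """Inherit 'needs' facts from anything the entity is a kind of."""
    ancestors = {entity} | is_a_closure(entity)
    return {o for (s, r, o) in triples if r == "needs" and s in ancestors}
```

The key property is that inference like `needs("wheat")` is explicit and inspectable, and a human can add or delete a triple by hand — which is exactly what's hard to do with knowledge baked into model weights.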
currentscurrents t1_j4s2n9t wrote
Reply to comment by _Arsenie_Boca_ in [P] RWKV 14B Language Model & ChatRWKV : pure RNN (attention-free), scalable and parallelizable like Transformers by bo_peng
It looks like he goes into a lot more detail on his github.
currentscurrents t1_j4rcc3e wrote
Reply to [P] RWKV 14B Language Model & ChatRWKV : pure RNN (attention-free), scalable and parallelizable like Transformers by bo_peng
Interesting! I haven't heard of RWKV before.
Getting rid of attention seems like a good way to increase training speed (since training all those attention heads at once is slow), but how can it work so well without attention?
Also aren't RNNs usually slower than transformers because they can't be parallelized?
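For context on that last question, the sequential bottleneck in a plain RNN is easy to see in a sketch (the update rule here is a made-up stand-in for a learned recurrence):

```python
def rnn_forward(inputs, h0=0.0):
    """Each hidden state depends on the previous one, so the timesteps
    must be computed one after another -- unlike a transformer, which
    can process every token position in parallel during training."""
    h = h0
    states = []
    for x in inputs:
        h = 0.5 * h + x  # stand-in for the learned recurrence
        states.append(h)
    return states
```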
currentscurrents t1_j4jj1l6 wrote
Reply to comment by junetwentyfirst2020 in [D] What kinds of interesting models can I train with just an RTX 4080? by faker10101891
It's a little discouraging when every interesting paper has a cluster of 64 A100s in their methods section.
currentscurrents t1_j4ijvez wrote
Reply to comment by RuairiSpain in [P] I built arxiv-summary.com, a list of GPT-3 generated paper summaries by niclas_wue
A Snappy Headline Is All You Need
currentscurrents t1_j4ijlqv wrote
You can fine-tune image generator models and some smaller language models.
You can also do tasks that don't require super large models, like image recognition.
>that's beyond just some toy experiment?
Don't knock toy experiments too much! I'm having a lot of fun trying to build a differentiable neural computer or memory-augmented network in pytorch.
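The core trick in those memory-augmented designs is a read operation that's soft, so gradients flow through it. Here's a minimal sketch of content-based addressing in plain Python — memory contents and the query are made up, and real DNCs add learned projections, write heads, and more:

```python
import math

# Toy external memory: 3 slots, 3 dims each (one-hot for clarity).
memory = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

def soft_read(query):
    """Read a blend of memory rows, weighted by similarity to the query.
    Because the weighting is a softmax rather than a hard argmax, the
    whole read is differentiable and trainable end to end."""
    scores = [sum(q * m for q, m in zip(query, row)) for row in memory]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * row[d] for w, row in zip(weights, memory))
            for d in range(len(memory[0]))]
```

A query that strongly matches slot 0 reads back mostly slot 0's contents, but never exactly — that softness is what makes the memory usable by gradient descent.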
currentscurrents t1_j4a2las wrote
Reply to comment by iamnotlefthanded666 in [D] Is MusicGPT a viable possibility? by markhachman
>Specifically, 1) we design an expert system to generate a melody by developing musical elements from motifs to phrases then to sections with repetitions and variations according to pre-given musical form; 2) considering the generated melody is lack of musical richness, we design a Transformer based refinement model to improve the melody without changing its musical form. MeloForm enjoys the advantages of precise musical form control by expert systems and musical richness learning via neural models.
currentscurrents t1_j49x0ev wrote
Reply to comment by blueSGL in [D] Is MusicGPT a viable possibility? by markhachman
Also MeloForm, a Microsoft project that composes music using expert systems.
currentscurrents t1_j49u28o wrote
Reply to comment by mycall in [D] Is MusicGPT a viable possibility? by markhachman
I don't think it's that simple - whether or not generative AI is considered "transformative" has not yet been tested by the courts.
Until somebody actually gets sued over this and it goes to court, we don't know how the legal system is going to handle it. There is currently a lawsuit against Github Copilot, so we will probably know in the next couple years.
currentscurrents t1_j499l3p wrote
Reply to [D] Combining Machine Learning + Expert Knowledge (Question for Agriculture Research) by Tigmib
Are you trying to do research, or solve a problem? Building expert systems out of neural networks is still a new, experimental idea. If you just want to get the job done you may want to pick more proven methods.
currentscurrents t1_j490rvn wrote
Reply to comment by BarockMoebelSecond in [D] Bitter lesson 2.0? by Tea_Pearce
It's meaningful right now because there's a threshold where LLMs become awesome, but getting there requires expensive specialized GPUs.
I'm hoping in a few years consumer GPUs will have 80GB of VRAM or whatever and we'll be able to run them locally. While datacenters will still have more compute, it won't matter as much since there's a limit where larger models would require more training data than exists.
currentscurrents t1_j48csbo wrote
Reply to comment by RandomCandor in [D] Bitter lesson 2.0? by Tea_Pearce
If it is true that performance scales infinitely with compute power - and I kinda hope it is, since that would make superhuman AI achievable - datacenters will always be smarter than PCs.
That said, I'm not sure that it does scale infinitely. You need not just more compute but also more data, and there's only so much data out there. GPT-4 reportedly won't be any bigger than GPT-3 because even terabytes of scraped internet data isn't enough to train a larger model.
currentscurrents t1_j4716tp wrote
Reply to comment by ml-research in [D] Bitter lesson 2.0? by Tea_Pearce
Try to figure out systems that can generalize from smaller amounts of data? It's the big problem we all need to solve anyway.
There's a bunch of promising ideas that need more research:
- Neurosymbolic computing
- Expert systems built out of neural networks
- Memory augmented neural networks
- Differentiable neural computers
currentscurrents t1_j4702g0 wrote
Reply to comment by mugbrushteeth in [D] Bitter lesson 2.0? by Tea_Pearce
Compute is going to get cheaper over time though. My phone today has the FLOPs of a supercomputer from 1999.
Also if LLMs become the next big thing you can expect GPU manufacturers to include more VRAM and more hardware acceleration directed at them.
currentscurrents OP t1_j44ycdz wrote
Reply to comment by Farconion in [D] What's your opinion on "neurocompositional computing"? (Microsoft paper from April 2022) by currentscurrents
From what I've seen, it's a promising field that should be possible. But so far nobody's made it work for more than toy problems.
currentscurrents t1_j44pu0u wrote
Reply to I just started out guys, wish me luck by 47153
Is it though? These days it seems like even a lot of research papers are just "we stuck together a bunch of pytorch components like lego blocks" or "we fed a transformer model a bunch of data".
Math is important if you want to invent new kinds of neural networks, but for end users it doesn't seem very important.
currentscurrents OP t1_j44nngb wrote
Reply to comment by omniron in [D] What's your opinion on "neurocompositional computing"? (Microsoft paper from April 2022) by currentscurrents
The paper does talk about this and calls transformers "first generation compositional systems" - but limited ones.
>Transformers, on the other hand, use graphs, which in principle can encode general, abstract structure, including webs of inter-related concepts and facts.
> However, in Transformers, a layer’s graph is defined by its data flow, yet this data flow cannot be accessed by the rest of the network—once a given layer’s data-flow graph has been used by that layer, the graph disappears. For the graph to be a bona fide encoding, carrying information to the rest of the network, it would need to be represented with an activation vector that encodes the graph’s abstract, compositionally-structured internal information.
>The technique we introduce next—NECST computing—provides exactly this type of activation vector.
They then talk about a more advanced variant called NECSTransformers, which they consider a 2nd generation compositional system. But I haven't heard of this system before and I'm not clear if it actually performs better.
currentscurrents OP t1_j43s8ki wrote
Reply to comment by Diffeologician in [D] What's your opinion on "neurocompositional computing"? (Microsoft paper from April 2022) by currentscurrents
In the paper they talk about "first generation compositional systems" and I believe they would include differentiable programming in that category. It has some compositional structure, but the structure is created by the programmer.
Ideally the system would be able to create its own arbitrarily complex structures and systems to understand abstract ideas, like humans can.
currentscurrents t1_j3jst6n wrote
Reply to comment by Immarhinocerous in [Discussion] Is there any alternative of deep learning ? by sidney_lumet
I know there's a whole field of decision tree learning, but I'm not super up to date on it.
I assume neural networks are better or else we'd be using trees instead.
currentscurrents t1_j3fop2j wrote
Reply to comment by tdgros in [Discussion] Is there any alternative of deep learning ? by sidney_lumet
You can represent any neural network as a decision tree, and I believe you can represent any decision tree as a series of if statements...
But the interesting bit about neural networks is the training process, automatically creating that decision tree.
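To illustrate the "series of if statements" point: a small decision tree really is just nested conditionals. The thresholds below are hand-written stand-ins (not learned from data) — and writing them by hand is exactly the part that training automates.

```python
def classify(petal_length, petal_width):
    """A hand-coded stand-in for a learned decision tree:
    each internal node is an if statement on one feature."""
    if petal_length < 2.5:
        return "setosa"
    if petal_width < 1.8:
        return "versicolor"
    return "virginica"
```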
currentscurrents t1_j3epeo7 wrote
Reply to comment by junetwentyfirst2020 in [Discussion] Is there any alternative of deep learning ? by sidney_lumet
Transformers are just deep learning with attention.
And attention is just another neural network telling the first one where to look.
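Loosely, that "where to look" operation can be sketched as scaled dot-product attention for a single query — toy numbers, one head, and no learned projection matrices, so this is an illustration of the mechanism rather than a full transformer layer:

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.
    The softmax over query-key similarities is the 'where to look'
    signal; the output is a similarity-weighted blend of the values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

A query that lines up with the first key pulls the output toward the first value row — that's the whole "telling it where to look" idea in miniature.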
currentscurrents t1_j3eo4uc wrote
Reply to comment by singularpanda in [D] Will NLP Researchers Lose Our Jobs after ChatGPT? by singularpanda
There's plenty of work to be done in researching language models that train more efficiently or run on smaller machines.
ChatGPT is great, but it needed 600GB of training data and megawatts of power. It must be possible to do better; the average human brain runs on 12W and has seen maybe a million words tops.
currentscurrents t1_j3emas4 wrote
Reply to comment by suflaj in [D] Will NLP Researchers Lose Our Jobs after ChatGPT? by singularpanda
>I hate to break your bubble, but the task is also achievable even with GPT2
Is it? I would love to know how. I can run GPT2 locally, and that would be a fantastic level of zero-shot learning to be able to play around with.
I have no doubt you can fine-tune GPT2 or T5 to achieve this, but in my experience they aren't nearly as promptable as GPT3/ChatGPT.
>Specifically the task you gave it is likely implicitly present in the dataset, in the sense that the dataset allowed the model to learn the connections between the words you gave it
I'm not sure what you're getting at here. It has learned the connections and meanings between words of course, that's what a language model does.
But it still followed my instructions, and it can follow a wide variety of other detailed instructions you give it. These tasks are too specific to have been in the training data; it is successfully generalizing zero-shot to new NLP tasks.
currentscurrents t1_j573tug wrote
Reply to [D] Did YouTube just add upscaling? by Avelina9X
They announced upscaling support in Chrome at CES 2023.
>The new feature will work within the Chrome and Edge browsers, and also requires an Nvidia RTX 30-series or 40-series GPU to function. Nvidia didn't specify what exactly is required from those two GPU generations to get the new upscaling feature working, nor if there's any sort of performance impact, but at least this isn't a 40-series only feature.
Interesting though that it's working with your GTX 1660 Ti. Maybe Chrome is implementing a simpler upscaler as a fallback for older GPUs?
Check your chrome://flags for anything that looks related.