
currentscurrents t1_j3eiw5w wrote

I think you're missing some of the depth of what it's capable of. You can "program" it to do new tasks just by explaining them in plain English, or by providing examples. For example, many people are using it to generate prompts for image generators:

>I want you to act as a prompt creator for an AI image generator.

>Prompts are descriptions of artistic images that include visual adjectives and art styles or artist names. The image generator can understand complex ideas, so use detailed language and describe emotions or feelings in detail. Use terse words separated by commas, and make short descriptions that are efficient in word use.

>With each image, include detailed descriptions of the art style, using the names of artists known for that style. I may provide a general style with the prompt, which you will expand into detail. For example if I ask for an "abstract style", you would include "style of Picasso, abstract brushstrokes, oil painting, cubism"

>Please create 5 prompts for a mob of grandmas with guns. Use a fantasy digital painting style.

This is a complex and poorly-defined task, and it certainly wasn't trained on this specific task, since its training data stops in 2021. But the resulting output is exactly what I wanted:

>An army of grandmas charging towards the viewer, their guns glowing with otherworldly energy. Style of Syd Mead, futuristic landscapes, sleek design, fantasy digital painting.

Once I copy-pasted it into an image generator, it created a very nice image.

I think we're going to see a lot more use of language models for controlling computers to do complex tasks.
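For anyone who wants to go past the chat UI, here's a rough sketch of driving the same trick through an API. The SDK shape and model name are assumptions on my part (I just used the web interface), but it shows how the plain-English instructions become the "program":

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The "program" is just plain-English instructions, passed as the system
# message; each request is then a short user message.
system_prompt = (
    "I want you to act as a prompt creator for an AI image generator. "
    "Prompts are descriptions of artistic images that include visual "
    "adjectives and art styles or artist names. Use terse words separated "
    "by commas, and include detailed descriptions of the art style, "
    "using the names of artists known for that style."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumption: any chat-tuned model works here
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Create 5 prompts for a mob of grandmas "
                                    "with guns. Use a fantasy digital painting style."},
    ],
)
print(response.choices[0].message.content)
```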

0

currentscurrents OP t1_j34uma6 wrote

>That alone I doubt it, even if it could theoretically reproduce how the brain works with the same power efficiency it doesn't mean you would have the algorithm to efficiently use this hardware.

I meant just in terms of compute efficiency, using the same kinds of algorithms we use now. It's clear they won't magically give you AGI, but Innatera claims 10,000x lower power usage with their chip.

This makes sense to me; instead of emulating a neural network with math, you're building a physical model of one in silicon. Plus, SNNs are very sparse, and an analog implementation would only draw power when a neuron fires.

>Usual ANNs are designed for current tasks and current tasks are often designed for usual ANNs. It's easier to use the same datasets but I don't think the point of SNNs is just to try to perform better on these datasets but rather to try more innovative approaches on some specific datasets.

I feel like a lot of SNN research is motivated by understanding the brain rather than building the best possible AI. It also seems harder to get traditional forms of data into and out of the network; for example, you have to convert images into spike timings, and there are several methods for doing that, each with upsides and downsides (a sketch of one is below).
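Here's a minimal sketch of one common scheme, latency coding, where brighter pixels fire earlier. The function name and threshold are my own choices:

```python
import numpy as np

def latency_encode(image, t_max=100.0):
    """Map pixel intensities in [0, 1] to spike times in [0, t_max].

    Brighter pixels spike earlier; very dark pixels never spike (np.inf).
    One downside: each pixel gets exactly one spike, so intensity
    resolution is limited by timing precision.
    """
    img = np.clip(np.asarray(image, dtype=float), 0.0, 1.0)
    times = t_max * (1.0 - img)   # high intensity -> early spike
    times[img < 0.05] = np.inf    # below threshold: no spike at all
    return times

# Example: encode a random 4x4 "image" into spike times
rng = np.random.default_rng(0)
print(latency_encode(rng.random((4, 4))))
```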

2

currentscurrents t1_j2zidye wrote

I think that actually measures how good it is at getting popular on social media, which is not the same task as making good art.

There's also some backlash against AI art right now, so this might favor models that can't be distinguished from human art rather than models that are better than human art.

8

currentscurrents t1_j2uwlrh wrote

I think interpretability will help us build better models too. For example, in this paper they deeply analyzed a model trained to do a toy problem - addition mod 113.

They found that it was actually working by applying a Discrete Fourier Transform to turn the numbers into sine waves. Sine waves are great for gradient descent because they're smooth and differentiable (unlike modular addition on the integers, which is not), and if you choose the right frequency they repeat every 113 numbers. The learned algorithm then did a handful of addition and multiplication operations on these sine waves, which gave the same result as modular addition.
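A quick numeric check of the periodicity claim (my own illustration, not code from the paper):

```python
import numpy as np

# A sinusoid with frequency w = 2*pi*k/113 repeats every 113 integers,
# which is what lets a smooth, differentiable function encode "mod 113".
k, n = 7, 113
w = 2 * np.pi * k / n
x = np.arange(500)
assert np.allclose(np.cos(w * x), np.cos(w * (x + n)))
```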

This lets you answer an important question: why didn't the network generalize to bases other than 113? Well, the frequency of the sine waves was hardcoded into the network, so it couldn't work for any other base.

This opens up the possibility of doing neural network surgery and changing the frequency to make it work with any base.

34

currentscurrents t1_j2trd40 wrote

If only it could run on a card that doesn't cost as much as a car.

I wonder if we will eventually hit a wall where more compute is required for further improvement, and we can only wait for GPU manufacturers. Similar to how they could never have created these language models in the 80s, no matter how clever their algorithms - they just didn't have enough compute power, memory, or the internet to use as a dataset.

5

currentscurrents OP t1_j2hdsvv wrote

Someone else posted this example, which is kind of what I was interested in. They trained a neural network to do a toy problem, addition mod 113, and then were able to determine the algorithm it used to compute it.

>The algorithm learned to do modular addition can be fully reverse engineered. The algorithm is roughly:

>Map inputs x, y → cos(wx), cos(wy), sin(wx), sin(wy) with a Discrete Fourier Transform, for some frequency w.

>Multiply and rearrange to get cos(w(x+y)) = cos(wx)cos(wy) − sin(wx)sin(wy) and sin(w(x+y)) = cos(wx)sin(wy) + sin(wx)cos(wy)

>By choosing a frequency w = 2πk/n we get period dividing n, so this is a function of x + y (mod n)

>Map to the output logits z with cos(w(x+y))cos(wz) + sin(w(x+y))sin(wz) = cos(w(x+y−z)) - this has the highest logit at z ≡ x + y (mod n), so softmax gives the right answer.

>To emphasise, this algorithm was purely learned by gradient descent! I did not predict or understand this algorithm in advance and did nothing to encourage the model to learn this way of doing modular addition. I only discovered it by reverse engineering the weights.

This is a very different way to do modular addition, but it makes sense for the network. Sine and cosine functions represent waves that repeat with a fixed period, so if you choose the right frequency you can implement the non-differentiable modular addition function using only differentiable functions.

Extracting this algorithm is useful for generalization; while the original network only worked mod 113, with the algorithm extracted we can plug in any modulus by changing the frequency (see the sketch below). Of course, this is a toy example and there are much faster ways to do modular addition, but maybe the approach could work for more complex problems too.
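Here's a sketch of the extracted algorithm with the modulus as a parameter. This is my own re-implementation of the quoted steps, not the paper's code:

```python
import numpy as np

def fourier_mod_add(x, y, n=113, k=5):
    """Compute (x + y) mod n using only differentiable operations."""
    w = 2 * np.pi * k / n   # frequency whose period divides n
    z = np.arange(n)        # every candidate answer

    # Product-to-sum identities give cos(w(x+y)) and sin(w(x+y))
    # without ever forming x + y directly:
    cos_xy = np.cos(w * x) * np.cos(w * y) - np.sin(w * x) * np.sin(w * y)
    sin_xy = np.cos(w * x) * np.sin(w * y) + np.sin(w * x) * np.cos(w * y)

    # "Logits" cos(w(x+y-z)) peak exactly at z = (x+y) mod n,
    # as long as gcd(k, n) = 1 so no smaller shift closes the circle.
    logits = cos_xy * np.cos(w * z) + sin_xy * np.sin(w * z)
    return int(np.argmax(logits))

assert fourier_mod_add(100, 50) == (100 + 50) % 113
assert fourier_mod_add(6, 9, n=17) == (6 + 9) % 17  # any base works now
```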

6

currentscurrents OP t1_j2h21zn wrote

>For example, a sort function is necessarily general over the entire domain of the entries for which it is valid, whereas a neural network will only approximate a function over the subfield of the domain for which it was trained, all bets are off elsewhere; it doesn't generalize.

But you can teach neural networks to do things like solve arbitrary mazes. Isn't that pretty algorithmic?

1

currentscurrents OP t1_j2gdumb wrote

By classical algorithm, I mean something that doesn't use a neural network. Traditional programming and neural networks are two very different ways to solve problems, but they can solve many of the same problems.

That sounds like a translation problem, which neural networks are good at. Just like in translation, it would have to understand the higher-level idea behind the implementation.

It's like text-to-code, but network-to-code instead.

3

currentscurrents OP t1_j2g9mvy wrote

Thanks, that is the question I'm trying to ask! I know explainability is a bit of a dead-end field right now, so it's a hard problem.

An approximate or incomprehensible algorithm could still be useful if it's faster or uses less memory. But I think to accomplish that you would need to convert it into higher-level ideas; otherwise you're just emulating the network.

Luckily, neural networks are capable of converting things into higher-level ideas? It doesn't seem fundamentally impossible.

3

currentscurrents OP t1_j2g5x92 wrote

Thanks for the link, that's good to know about!

But maybe I should have titled this differently. I'm interested in taking a network that solves a problem networks are good at, and converting it into a code representation as a way to speed it up. Like translating between the two different forms of computation.

1