
currentscurrents t1_j3eiw5w wrote

I think you're missing some of the depth of what it's capable of. You can "program" it to do new tasks just by explaining them in plain English, or by providing examples. For example, many people are using it to generate prompts for image generators:

>I want you to act as a prompt creator for an AI image generator.

>Prompts are descriptions of artistic images that include visual adjectives and art styles or artist names. The image generator can understand complex ideas, so use detailed language and describe emotions or feelings in detail. Use terse words separated by commas, and make short descriptions that are efficient in word use.

>With each image, include detailed descriptions of the art style, using the names of artists known for that style. I may provide a general style with the prompt, which you will expand into detail. For example if I ask for an "abstract style", you would include "style of Picasso, abstract brushstrokes, oil painting, cubism"

>Please create 5 prompts for a mob of grandmas with guns. Use a fantasy digital painting style.

This is a complex and poorly-defined task, and it certainly wasn't trained on this specific task, since its training data stops in 2021. But the resulting output is exactly what I wanted:

>An army of grandmas charging towards the viewer, their guns glowing with otherworldly energy. Style of Syd Mead, futuristic landscapes, sleek design, fantasy digital painting.

Once I copy-pasted it into an image generator, it created a very nice image.

I think we're going to see a lot more use of language models for controlling computers to do complex tasks.
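For anyone who wants to go past the chat UI, here's a rough sketch of driving the same trick through an API. The SDK shape and model name are assumptions on my part (I just used the web interface), but it shows how the plain-English instructions become the "program":

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The "program" is just plain-English instructions, passed as the system
# message; each request is then a short user message.
system_prompt = (
    "I want you to act as a prompt creator for an AI image generator. "
    "Prompts are descriptions of artistic images that include visual "
    "adjectives and art styles or artist names. Use terse words separated "
    "by commas, and include detailed descriptions of the art style, "
    "using the names of artists known for that style."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumption: any chat-tuned model works here
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Create 5 prompts for a mob of grandmas "
                                    "with guns. Use a fantasy digital painting style."},
    ],
)
print(response.choices[0].message.content)
```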

0

currentscurrents OP t1_j34uma6 wrote

>That alone I doubt it, even if it could theoretically reproduce how the brain works with the same power efficiency it doesn't mean you would have the algorithm to efficiently use this hardware.

I meant just in terms of compute efficiency, using the same kinds of algorithms we use now. It's clear they won't magically give you AGI, but Innatera claims 10,000x lower power usage with their chip.

This makes sense to me; instead of emulating a neural network with math, you're building a physical model of one in silicon. Plus, SNNs are very sparse, and an analog implementation would only draw power when a neuron fires.

>Usual ANNs are designed for current tasks and current tasks are often designed for usual ANNs. It's easier to use the same datasets but I don't think the point of SNNs is just to try to perform better on these datasets but rather to try more innovative approaches on some specific datasets.

I feel like a lot of SNN research is motivated by understanding the brain rather than building the best possible AI. It also seems harder to get traditional forms of data into and out of the network; for example, you have to convert images into spike timings, and there are several methods for doing that, each with upsides and downsides (a sketch of one is below).
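Here's a minimal sketch of one common scheme, latency coding, where brighter pixels fire earlier. The function name and threshold are my own choices:

```python
import numpy as np

def latency_encode(image, t_max=100.0):
    """Map pixel intensities in [0, 1] to spike times in [0, t_max].

    Brighter pixels spike earlier; very dark pixels never spike (np.inf).
    One downside: each pixel gets exactly one spike, so intensity
    resolution is limited by timing precision.
    """
    img = np.clip(np.asarray(image, dtype=float), 0.0, 1.0)
    times = t_max * (1.0 - img)   # high intensity -> early spike
    times[img < 0.05] = np.inf    # below threshold: no spike at all
    return times

# Example: encode a random 4x4 "image" into spike times
rng = np.random.default_rng(0)
print(latency_encode(rng.random((4, 4))))
```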

2

currentscurrents t1_j2zidye wrote

I think that actually measures how good it is at getting popular on social media, which is not the same task as making good art.

There's also some backlash against AI art right now, so this might favor models that can't be distinguished from human art rather than models that are better than human art.

8

currentscurrents t1_j2uwlrh wrote

I think interpretability will help us build better models too. For example, in this paper they deeply analyzed a model trained to do a toy problem - addition mod 113.

They found that it was actually working by applying a Discrete Fourier Transform to turn the numbers into sine waves. Sine waves are great for gradient descent because they're smooth and differentiable (unlike modular addition on the integers, which is not), and if you choose the right frequency they repeat every 113 numbers. The learned algorithm then did a handful of addition and multiplication operations on these sine waves, which gave the same result as modular addition.
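A quick numeric check of the periodicity claim (my own illustration, not code from the paper):

```python
import numpy as np

# A sinusoid with frequency w = 2*pi*k/113 repeats every 113 integers,
# which is what lets a smooth, differentiable function encode "mod 113".
k, n = 7, 113
w = 2 * np.pi * k / n
x = np.arange(500)
assert np.allclose(np.cos(w * x), np.cos(w * (x + n)))
```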

This lets you answer an important question: why didn't the network generalize to bases other than 113? Well, the frequency of the sine waves was hardcoded into the network, so it couldn't work for any other base.

This opens up the possibility of doing neural network surgery and changing the frequency to make it work with any base.

34

currentscurrents t1_j2trd40 wrote

If only it could run on a card that doesn't cost as much as a car.

I wonder if we will eventually hit a wall where more compute is required for further improvement, and we can only wait for GPU manufacturers. Similar to how they could never have created these language models in the 80s, no matter how clever their algorithms - they just didn't have enough compute power, memory, or the internet to use as a dataset.

5

currentscurrents OP t1_j2hdsvv wrote

Someone else posted this example, which is kind of what I was interested in. They trained a neural network to do a toy problem, addition mod 113, and then were able to determine the algorithm it used to compute it.

>The algorithm learned to do modular addition can be fully reverse engineered. The algorithm is roughly:

>Map inputs x, y → cos(wx), cos(wy), sin(wx), sin(wy) with a Discrete Fourier Transform, for some frequency w.

>Multiply and rearrange to get cos(w(x+y)) = cos(wx)cos(wy) − sin(wx)sin(wy) and sin(w(x+y)) = cos(wx)sin(wy) + sin(wx)cos(wy)

>By choosing a frequency w = 2πk/n we get period dividing n, so this is a function of x + y (mod n)

>Map to the output logits z with cos(w(x+y))cos(wz) + sin(w(x+y))sin(wz) = cos(w(x+y−z)) - this has the highest logit at z ≡ x + y (mod n), so softmax gives the right answer.

>To emphasise, this algorithm was purely learned by gradient descent! I did not predict or understand this algorithm in advance and did nothing to encourage the model to learn this way of doing modular addition. I only discovered it by reverse engineering the weights.

This is a very different way to do modular addition, but it makes sense for the network. Sine and cosine functions represent waves that repeat with a fixed period, so if you choose the right frequency you can implement the non-differentiable modular addition function using only differentiable functions.

Extracting this algorithm is useful for generalization; while the original network only worked mod 113, with the algorithm extracted we can plug in any modulus by changing the frequency (see the sketch below). Of course, this is a toy example and there are much faster ways to do modular addition, but maybe the approach could work for more complex problems too.
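Here's a sketch of the extracted algorithm with the modulus as a parameter. This is my own re-implementation of the quoted steps, not the paper's code:

```python
import numpy as np

def fourier_mod_add(x, y, n=113, k=5):
    """Compute (x + y) mod n using only differentiable operations."""
    w = 2 * np.pi * k / n   # frequency whose period divides n
    z = np.arange(n)        # every candidate answer

    # Product-to-sum identities give cos(w(x+y)) and sin(w(x+y))
    # without ever forming x + y directly:
    cos_xy = np.cos(w * x) * np.cos(w * y) - np.sin(w * x) * np.sin(w * y)
    sin_xy = np.cos(w * x) * np.sin(w * y) + np.sin(w * x) * np.cos(w * y)

    # "Logits" cos(w(x+y-z)) peak exactly at z = (x+y) mod n,
    # as long as gcd(k, n) = 1 so no smaller shift closes the circle.
    logits = cos_xy * np.cos(w * z) + sin_xy * np.sin(w * z)
    return int(np.argmax(logits))

assert fourier_mod_add(100, 50) == (100 + 50) % 113
assert fourier_mod_add(6, 9, n=17) == (6 + 9) % 17  # any base works now
```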

6

currentscurrents OP t1_j2h21zn wrote

>For example, a sort function is necessarily general over the entire domain of the entries for which it is valid, whereas a neural network will only approximate a function over the subfield of the domain for which it was trained, all bets are off elsewhere; it doesn't generalize.

But you can teach neural networks to do things like solve arbitrary mazes. Isn't that pretty algorithmic?

1

currentscurrents OP t1_j2gdumb wrote

By classical algorithm, I mean something that doesn't use a neural network. Traditional programming and neural networks are two very different ways to solve problems, but they can solve many of the same problems.

That sounds like a translation problem, which neural networks are good at. Just like in translation, it would have to understand the higher-level idea behind the implementation.

It's like text-to-code, but network-to-code instead.

3

currentscurrents OP t1_j2g9mvy wrote

Thanks, that is the question I'm trying to ask! I know explainability is a bit of a dead-end field right now, so it's a hard problem.

An approximate or incomprehensible algorithm could still be useful if it's faster or uses less memory. But I think to accomplish that you would need to convert it into higher-level ideas; otherwise you're just emulating the network.

Luckily, neural networks are capable of converting things into higher-level ideas? It doesn't seem fundamentally impossible.

3

currentscurrents OP t1_j2g5x92 wrote

Thanks for the link, that's good to know about!

But maybe I should have titled this differently. I'm interested in taking a network that solves a problem networks are good at, and converting it into a code representation as a way to speed it up. Like translating between the two different forms of computation.

1