Recent comments in /f/MachineLearning

marcus_hk t1_jcrgwqm wrote

Just browsing on my phone and haven’t dug deep yet, but in the notebook it says that build.py targets M2 by default but can also target CUDA. What about CPU?

I’d love to see a super minimal example, like running a small nn.Linear layer, for pedagogical purposes and to abstract away the complexity of a larger model like Stable Diffusion.
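Something in this spirit is all I'm after (plain PyTorch, not from the repo; just the kind of thing I'd want a tutorial to build up from):

```python
import torch
import torch.nn as nn

# Tiny model: a single linear layer, 4 inputs -> 2 outputs.
layer = nn.Linear(4, 2)

x = torch.randn(1, 4)            # one example input
with torch.no_grad():
    y = layer(x)                 # forward pass only
print(y.shape)                   # torch.Size([1, 2])
```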

1

marcus_hk t1_jcrdufd wrote

Reply to comment by race2tb in [P] Web Stable Diffusion by crowwork

For weights, yes, and for inference. If you can decompose and distribute a model across enough nodes, then you can get meaningful compute out of CPUs too — for instance for tokenization and smaller models.
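As a toy sketch of the decomposition I mean (single process here, made-up sizes; in a real deployment each stage would sit on its own CPU node and activations would travel over the network between them):

```python
import torch
import torch.nn as nn

# A small model split into two stages. In practice, stage1 would run
# on node A and stage2 on node B; only the activation tensor `h`
# needs to cross the wire.
stage1 = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
stage2 = nn.Sequential(nn.Linear(16, 4))

x = torch.randn(1, 8)
with torch.no_grad():
    h = stage1(x)   # computed on node A
    y = stage2(h)   # computed on node B
print(y.shape)      # torch.Size([1, 4])
```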

1

currentscurrents t1_jcqzjil wrote

I haven't heard of anybody running LLaMA as a paid API service. I think doing so might violate the license terms against commercial use.

>(or any other) model

OpenAI has a ChatGPT API that costs pennies per request. Anthropic also recently announced one for their Claude language model but I have not tried it.
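For reference, a minimal call looks roughly like this with the openai Python package (0.x-era API; the key and prompt are placeholders):

```python
import openai

openai.api_key = "sk-..."  # your API key here

# gpt-3.5-turbo is the ChatGPT model id
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response["choices"][0]["message"]["content"])
```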

5

theAbominablySlowMan t1_jcqlq7g wrote

Bash a big ole dataset through as an integration test and call it a job done. In my experience, DS moves too fast for testing to be as effective as it is for SWEs (no matter how carefully I've written my tests, they've never lasted more than 12 months before becoming a nuisance that people started ignoring).
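Roughly this, as a hypothetical pytest sketch (the model and batch are stand-ins for the real pipeline):

```python
import torch
import torch.nn as nn

def test_model_survives_a_big_batch():
    """Crude integration test: push a large batch through and
    assert only basic sanity, not exact values."""
    model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1))
    big_batch = torch.randn(10_000, 32)   # stand-in for a real dataset
    with torch.no_grad():
        out = model(big_batch)
    assert out.shape == (10_000, 1)
    assert torch.isfinite(out).all()      # no NaNs/Infs anywhere
```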

−1

Sad-Comedian-711 t1_jcqgv1x wrote

This approach has been shown to work. Longformer even provided a script that did this for you: https://github.com/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb
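As I understand it, the core trick in that script is just tiling the pretrained position embeddings out to the new max length, roughly like this (my own sketch, not the actual script):

```python
import torch

def extend_position_embeddings(old_emb: torch.Tensor, new_len: int) -> torch.Tensor:
    """Tile pretrained position embeddings [old_len, dim] out to new_len rows."""
    old_len, dim = old_emb.shape
    new_emb = old_emb.new_empty(new_len, dim)
    for start in range(0, new_len, old_len):
        chunk = min(old_len, new_len - start)
        new_emb[start:start + chunk] = old_emb[:chunk]
    return new_emb

# e.g. stretch a 512-position table to 4096 positions
extended = extend_position_embeddings(torch.randn(512, 768), 4096)
print(extended.shape)  # torch.Size([4096, 768])
```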

I think for flash attention you don't want to use Longformer's attention, though; you want Big Bird's block-sparse attention with specific block sizes or something like that.
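If you go the Big Bird route, the knobs live in the Hugging Face config, something like this (the values shown are just the library defaults, not a tuned recommendation):

```python
from transformers import BigBirdConfig, BigBirdModel

# block_sparse is the long-sequence attention mode; block_size must
# divide the sequence length you plan to use.
config = BigBirdConfig(
    attention_type="block_sparse",
    block_size=64,
    num_random_blocks=3,
)
model = BigBirdModel(config)
```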

1

spadel_ t1_jcqdxi4 wrote

I went into a quant research position at a prop trading firm and am very happy with that decision. While I unfortunately haven't used any deep learning so far, the work involves a lot of stats and machine learning. There are also some interesting applications of physics-informed neural networks that I want to look into at some point. It's definitely fun to now work on problems that just need to be solved creatively, instead of continuously having to come up with new research ideas.

2

127-0-0-1_1 t1_jcqd8se wrote

It's not unlimited memory in a single run, which remains unchanged, but that doesn't seem super relevant to what people want (nothing wrong with multiple runs!). Think about a Turing machine, or heck, yourself. A Turing machine only has access to a single cell of memory at a time, and in practice, modern CPUs only have direct access to their registers. Long-term storage goes into RAM, which is accessed on demand.

Similarly, your own memory is not large enough to contain all the information you'd need to complete most complex tasks. That's why you have to write things down and actively try to remember things.

While that uses OpenAI's embedding networks, like the autoregressive LLM itself, it's not like OpenAI has a monopoly on text embeddings by any means (far from it: embeddings have a very straightforward business use and show up on practically every major site you know of, for things like similarity queries).

While I think OP is overhyping the degree to which this is "infinite memory" right now, in a hypothetical Turing-machine formulation where the network can more proactively store and restore memory, it would at least make the system Turing complete.
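The store/restore loop itself is simple to sketch. Here embed() is a fake stand-in for any real embedding model (OpenAI's or otherwise), so the retrieval quality is meaningless, but the shape of the idea is:

```python
import numpy as np

# Toy external memory: store (text, vector) pairs, restore by similarity.
def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

memory: list[tuple[str, np.ndarray]] = []

def remember(text: str) -> None:
    memory.append((text, embed(text)))

def recall(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    scored = sorted(memory, key=lambda item: -float(item[1] @ q))
    return [text for text, _ in scored[:k]]

remember("user prefers concise answers")
remember("project deadline is Friday")
print(recall("when is the deadline?"))
```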

1