Recent comments in /f/MachineLearning
simpleuserhere OP t1_jcrfjsh wrote
Reply to comment by schorhr in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
Thanks! What error are you getting? With the VS compiler and CMake we can build it easily.
simpleuserhere OP t1_jcreufr wrote
Reply to comment by Pale-Dentist330 in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
Hi, please check this branch https://github.com/rupeshs/alpaca.cpp/tree/linux-android-build-support
kross00 t1_jcre2hi wrote
Reply to [P] The next generation of Stanford Alpaca by [deleted]
I'm a newbie... but maybe take a look at this model: https://github.com/BlinkDL/RWKV-LM
marcus_hk t1_jcrdufd wrote
Reply to comment by race2tb in [P] Web Stable Diffusion by crowwork
For weights, yes, and for inference. If you can decompose and distribute a model across enough nodes, then you can get meaningful compute out of CPUs too — for instance for tokenization and smaller models.
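A toy sketch of that decomposition idea (all names are illustrative, and both "nodes" run in one process here just to show the split; a real setup would ship the intermediate activations over the network between CPU machines):

```python
import torch
import torch.nn as nn

# "Node A" and "node B" each hold one stage of the model.
stage_a = nn.Sequential(nn.Linear(16, 32), nn.ReLU())  # node A
stage_b = nn.Sequential(nn.Linear(32, 8))               # node B

x = torch.randn(4, 16)
h = stage_a(x)   # computed on node A, then sent to node B
y = stage_b(h)   # node B finishes the forward pass
print(y.shape)   # torch.Size([4, 8])
```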
starstruckmon t1_jcrbf0m wrote
Reply to comment by timedacorn369 in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
You can see some benchmarks here
Pale-Dentist330 t1_jcr3e9m wrote
Can you add the steps here?
VelvetyPenus t1_jcr1usl wrote
Reply to [D] LLama model 65B - pay per prompt by MBle
Wait two weeks, it will all be free.
NotARedditUser3 t1_jcr0vsb wrote
Reply to comment by CommunicationLocal78 in [R] ChatGLM-6B - an open source 6.2 billion parameter Eng/Chinese bilingual LLM trained on 1T tokens, supplemented by supervised fine-tuning, feedback bootstrap, and RLHF. Runs on consumer grade GPUs by MysteryInc152
You'd know exactly how you were wrong if those topics weren't forbidden and you'd actually heard about them
currentscurrents t1_jcqzjil wrote
Reply to [D] LLama model 65B - pay per prompt by MBle
I haven't heard of anybody running LLaMA as a paid API service. I think doing so might violate the license terms against commercial use.
>(or any other) model
OpenAI has a ChatGPT API that costs pennies per request. Anthropic also recently announced one for their Claude language model but I have not tried it.
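For reference, a minimal sketch of a ChatGPT API call using the openai Python client (model name and interface details are assumptions on my part; check the current docs):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed model name
    messages=[{"role": "user", "content": "Summarize LLaMA in one sentence."}],
)
print(response.choices[0].message.content)
```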
the320x200 t1_jcqxqrs wrote
Reply to comment by CommunicationLocal78 in [R] ChatGLM-6B - an open source 6.2 billion parameter Eng/Chinese bilingual LLM trained on 1T tokens, supplemented by supervised fine-tuning, feedback bootstrap, and RLHF. Runs on consumer grade GPUs by MysteryInc152
Please... That's ridiculous. Name one historical event people in the west are afraid to even admit to knowing about in public.
schorhr t1_jcqwzek wrote
Reply to comment by simpleuserhere in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
That's amazing!
Thank you for that link. With my old laptop and slow internet connection I'm struggling to download Visual Studio and get everything working. I do have the weights, but I'm still figuring out why the build fails. Is there any way to download a prebuilt version?
CommunicationLocal78 t1_jcqw9zq wrote
Reply to comment by BalorNG in [R] ChatGLM-6B - an open source 6.2 billion parameter Eng/Chinese bilingual LLM trained on 1T tokens, supplemented by supervised fine-tuning, feedback bootstrap, and RLHF. Runs on consumer grade GPUs by MysteryInc152
There are a lot fewer forbidden topics in China than in the West.
theAbominablySlowMan t1_jcqlq7g wrote
Bash a big ole data set through as an integration test and call it a done job. In my experience, DS moves too fast for testing to be as effective as it is for SWEs (no matter how carefully I've written my tests, they've never lasted more than 12 months before becoming a nuisance that people started ignoring).
Sad-Comedian-711 t1_jcqgv1x wrote
Reply to comment by super_deap in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
This approach has been shown to work. Longformer even provided a script that did this for you: https://github.com/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb
I don't think you want to use Longformer's attention with flash attention, though; you want Big Bird's block-sparse attention with specific block sizes, or something like that.
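A rough illustration of the Big Bird side of this (parameter values are my assumptions, not taken from the Longformer script): Hugging Face's `BigBirdConfig` exposes block-sparse attention with a configurable block size.

```python
from transformers import BigBirdConfig, BigBirdModel

# Sketch: block-sparse attention with an explicit block size and a longer
# context, the kind of pattern referred to above. Values are illustrative.
config = BigBirdConfig(
    attention_type="block_sparse",   # sparse pattern instead of full attention
    block_size=64,                   # assumed block size; tune per hardware
    num_random_blocks=3,
    max_position_embeddings=4096,
)
model = BigBirdModel(config)         # randomly initialized, just to show the setup
```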
BalorNG t1_jcqgc4x wrote
Reply to [R] ChatGLM-6B - an open source 6.2 billion parameter Eng/Chinese bilingual LLM trained on 1T tokens, supplemented by supervised fine-tuning, feedback bootstrap, and RLHF. Runs on consumer grade GPUs by MysteryInc152
It has 6B parameters, but I bet it can't answer what happened in Tiananmen Square in 1989 :3
timedacorn369 t1_jcqg4v6 wrote
Reply to comment by simpleuserhere in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
What is the performance hit with various levels of quantization??
spadel_ t1_jcqdxi4 wrote
I went into a quant research position at a prop trading firm and am very happy with that decision. While I unfortunately haven't used any deep learning so far, the work involves a lot of stats and machine learning. There are also some interesting applications of physics-informed neural networks, which I want to look into at some point. It's definitely fun to work on problems that just need to be solved in a creative way, instead of continuously having to come up with new research ideas.
127-0-0-1_1 t1_jcqd8se wrote
Reply to comment by KerfuffleV2 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
It's not unlimited memory in a single run, which remains unchanged, but that doesn't seem super relevant to what people want (nothing wrong with multiple runs!). Think about a Turing machine, or heck, yourself. A Turing machine only has access to a single cell of memory at a time, and in practice, modern CPUs only have direct access to their registers. Longer-term storage goes into RAM, which is accessed on demand.
Similarly, your own memory is not large enough to contain all the information you'd need to complete most complex tasks. That's why you have to write things down and actively try to remember things.
While that uses OpenAI's embedding networks, like the autoregressive LLM itself, it's not like OpenAI has a monopoly on text embeddings by any means (far from it: embeddings have a very straightforward business use and are used on practically every major site you know of, for things like similarity queries).
While I think OP is overhyping the degree to which this is "infinite memory", in a hypothetical Turing machine formulation where the network can more proactively store and retrieve memory, it would allow it to be, at least, Turing complete.
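A minimal sketch of what such embedding-backed external memory might look like, assuming any embedding model that returns fixed-length vectors (the `remember`/`recall` helpers are hypothetical names, not an existing API):

```python
import numpy as np

memory = []  # external "memory": list of (text, unit-length embedding) pairs

def remember(text: str, embedding: np.ndarray) -> None:
    """Store a note together with its normalized embedding."""
    memory.append((text, embedding / np.linalg.norm(embedding)))

def recall(query_embedding: np.ndarray, k: int = 3) -> list[str]:
    """Return the k stored notes most similar to the query, by cosine similarity."""
    q = query_embedding / np.linalg.norm(query_embedding)
    ranked = sorted(memory, key=lambda item: -float(item[1] @ q))
    return [text for text, _ in ranked[:k]]
```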
light24bulbs t1_jcqco8i wrote
Reply to comment by Prymu in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
Old reddit does too
marcus_hk t1_jcrgwqm wrote
Reply to [P] Web Stable Diffusion by crowwork
Just browsing on my phone and haven’t dug deep yet, but in the notebook it says that build.py targets M2 by default but can also target CUDA. What about CPU?
I’d love to see a super minimal example, like running a small nn.Linear layer, for pedagogical purposes and to abstract away the complexity of a larger model like Stable Diffusion.
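Something like this is presumably the scale of example being asked for, in plain PyTorch on CPU (this doesn't use the Web Stable Diffusion build pipeline at all; it's only meant to show how small a pedagogical example could be):

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=4, out_features=2)  # tiny single-layer "model"
x = torch.randn(1, 4)                             # one sample with 4 features
with torch.no_grad():
    y = layer(x)                                  # y = x @ W.T + b
print(y.shape)                                    # torch.Size([1, 2])
```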