currentscurrents
currentscurrents t1_jebm09t wrote
Reply to comment by tvetus in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
No, do you have a link?
currentscurrents t1_jeb4shv wrote
Reply to comment by DigThatData in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Yeah, I think that's why they're starting with whales - they're an easy subject since their vocalizations can be heard through the water from miles away. They also seem to have a fairly complex vocal language, unlike for example songbirds with memorized mating calls.
currentscurrents t1_jean0il wrote
Reply to comment by saintshing in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Other researchers are working on an LLM for whales.
Looks feasible to me; whale calls are no more alien to the computer than English is. The hard part is collecting enough data.
currentscurrents OP t1_je83z1p wrote
Reply to comment by midasp in [R] The Debate Over Understanding in AI’s Large Language Models by currentscurrents
I think these are all ideas from the internet, but it did understand that they would be appropriate for the task of making jeans useful on Mars.
It seems to have understood the instructions and then pulled relevant information out of its associative memory to build the response.
currentscurrents OP t1_je7faup wrote
Reply to comment by throwaway957280 in [R] The Debate Over Understanding in AI’s Large Language Models by currentscurrents
This seems to be the delay of the publishing process; it went up on arXiv in October but is getting attention now because it was finally published on March 21st.
I think the most interesting change since October is that GPT-4 is much better at many of the tricky sentences that linguists used to probe GPT-3. But it's still hard to prove the difference between "understanding" and "memorization" if you don't know what was in the training data, and we don't.
currentscurrents t1_je7c29r wrote
The architecture probably isn't the problem. You only have 100 images, that's your problem.
If you can't get more labeled data, you should pretrain on unlabeled data that's as close as possible to your task - preferably other dental x-rays. Then you can finetune on your real dataset.
currentscurrents OP t1_je631oa wrote
TL;DR:
- This is a survey paper. The authors summarize a variety of arguments about whether or not LLMs truly "understand" what they're learning.
- The major argument in favor of understanding is that LLMs are able to complete many real and useful tasks that seem to require understanding.
- The major argument against understanding is that LLMs are brittle in non-human ways, especially to small changes in their inputs. They also don't have real-world experience to ground their knowledge in (although multimodal LLMs may change this).
- A key issue is that no one has a solid definition of "understanding" in the first place. It's not clear how you would test for it. Tests intended for humans don't necessarily test understanding in LLMs.
I tend to agree with their closing summary. LLMs likely have a type of understanding, and humans have a different type of understanding.
>It could thus be argued that in recent years the field of AI has created machines with new modes of understanding, most likely new species in a larger zoo of related concepts, that will continue to be enriched as we make progress in our pursuit of the elusive nature of intelligence.
Submitted by currentscurrents t3_125uxab in MachineLearning
currentscurrents t1_je34ui9 wrote
This is code for running the LLaMa model, sort of like llama.cpp.
It's a reimplementation of Facebook's original GPL-licensed open source client under a more permissive Apache license. The GPL requires anything you distribute that incorporates GPL code to be GPL-licensed too, so you can't use it in closed-source projects.
This doesn't affect the license for the model weights, which you will still have to download from somewhere else.
currentscurrents t1_je1cjaq wrote
Reply to "[D]" Is wandb.ai worth using? by frodo_mavinchotil
Wandb can be run locally. There is also TensorBoard.
currentscurrents t1_je1ai1i wrote
Reply to comment by nixed9 in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
I asked it for a parody and got something similar to, but different from, Weird Al's song: https://pastebin.com/FKrZiEi9
When I asked it to be original I got quite different lyrics: https://pastebin.com/uwpqAnyz
Here are the actual lyrics for reference. This reminds me of how you can get LLMs to be less toxic/biased just by telling them to treat people fairly.
currentscurrents t1_je17y5v wrote
Reply to comment by hardmaru in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
>Why are deep learning technologists so overconfident
>A Narayanan, S Kapoor
>Substack newsletter. AI Snake Oil
You can get your blogposts listed on Google Scholar?
currentscurrents t1_je15i85 wrote
Reply to comment by krali_ in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
That's still on a waitlist unfortunately.
GPT-4 is good but slow; at least for now, I mostly still use the GPT-3.5 model.
currentscurrents t1_je14pi5 wrote
Reply to comment by cegras in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
Clearly, the accuracy is going to have to get better before it can replace Google. It's pretty accurate when it knows what it's talking about, but if you go "out of bounds" the accuracy drops off a cliff without warning.
But the upside is that it can integrate information from multiple sources and you can interactively ask it questions. Google can't do that.
currentscurrents t1_je13kdr wrote
Reply to comment by londons_explorer in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
True! But also, problems in general are never 100% novel. That's why metalearning works.
You can make up for poor reasoning abilities with lots of experience. This isn't bad exactly, but it makes testing their reasoning abilities tricky.
currentscurrents t1_je12d3k wrote
Reply to comment by ianitic in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
Nobody knows exactly what it was trained on, but there exist several datasets of published books.
>I'm highly surprised I haven't seen any news of authors/publishers suing yet tbh.
They still might. But they don't have a strong motivation; it doesn't really directly impact their revenue because nobody's going to sit in the chatgpt window and read a 300-page book one prompt at a time.
currentscurrents t1_jdy3q2b wrote
Reply to comment by VectorSpaceModel in [D] Is French the most widely used language in ML circles after English? If not, what are some useful (natural) languages in the field of machine learning? by Subject_Ad_9680
The overwhelming majority of ML research comes out of the US and China because that's where the big tech companies are.
currentscurrents t1_jdvxu6g wrote
Reply to comment by s0n0fagun in [D] Can we train a decompiler? by vintergroena
Those languages don't compile to machine code; they compile to a special bytecode that runs in a VM.
currentscurrents t1_jdvxga6 wrote
Reply to comment by ultraminxx in [D] Can we train a decompiler? by vintergroena
Possibly! But it also seems like a good sequence-to-sequence translation problem: just line up the two streams of tokens and let the model figure it out.
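To make "line up the two streams of tokens" concrete, here is a minimal sketch of how training pairs might be built. It uses Python's own bytecode as a stand-in for compiled output, purely because the standard library can produce it; a real decompiler would target machine code or JVM/CLR bytecode instead. The function names (`bytecode_tokens`, `source_tokens`, `make_pairs`) are made up for illustration.

```python
import dis
import io
import tokenize

# Token types that only describe layout, not program content.
_LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
           tokenize.DEDENT, tokenize.ENDMARKER}


def bytecode_tokens(source):
    """Compile source and flatten its bytecode into a list of opcode names."""
    tokens = []
    stack = [compile(source, "<example>", "exec")]
    while stack:
        code = stack.pop()
        tokens.extend(ins.opname for ins in dis.get_instructions(code))
        # Nested functions/classes live in co_consts as separate code objects.
        stack.extend(c for c in code.co_consts if hasattr(c, "co_code"))
    return tokens


def source_tokens(source):
    """Split source text into lexical tokens, dropping layout-only tokens."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type not in _LAYOUT]


def make_pairs(snippets):
    """Build (input, target) pairs: bytecode stream in, source stream out."""
    return [(bytecode_tokens(s), source_tokens(s)) for s in snippets]


pairs = make_pairs(["x = a + b", "print(x)"])
for compiled, original in pairs:
    print(compiled, "->", original)
```

Each pair could then be fed to any sequence-to-sequence model, with the opcode stream as the input and the source tokens as the target. This sketch throws away operands (constants, variable names), which a serious attempt would need to keep.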
currentscurrents t1_jdu3vwq wrote
Reply to comment by Smallpaul in [D] Can we train a decompiler? by vintergroena
Yeah, but they're hand-crafted algorithms and produce code that's hard to read.
currentscurrents t1_jdrt3gv wrote
Reply to comment by liqui_date_me in [D] GPT4 and coding problems by enryu42
I think all tests designed for humans are worthless here.
They're all meant to compare humans against each other, so they assume you don't have the ability to read and remember the entire internet. You can make up for a lack of reasoning with an abundance of data. We need synthetic tests designed specifically for LLMs.
currentscurrents t1_jdrpl3u wrote
Reply to [D] GPT4 and coding problems by enryu42
I'm not really surprised. Anybody who's extensively used one of these tools has probably already run into their reasoning limitations.
Today's entire crop of self-supervised models can learn complex ideas, but they have a hard time manipulating them in complex ways. They can do a few operations on ideas (style transfer, translation, etc) but high-level reasoning involves many more operations that nobody understands yet.
But hey, at least there will still be problems left to solve by the time I graduate!
currentscurrents t1_jdn7spo wrote
Reply to comment by pornthrowaway42069l in [N] GPT-4 has 1 trillion parameters by mrx-ai
Bigger models are more sample efficient for a given amount of data.
Scale is a triangle of three factors: model size, data size, and compute size. If you want to make more efficient use of data, you need to increase the other two.
In practice LLMs are not data-limited right now; they're limited by compute and model size. That's why you see models like LLaMA that throw huge amounts of data at smaller models.
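A rough way to see how the three factors trade off is the common rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per token. The sketch below applies that approximation; the two configurations are illustrative numbers chosen to make the point, not figures for any particular model.

```python
import math


def training_flops(n_params, n_tokens):
    """Rule-of-thumb training cost: about 6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens


# Two hypothetical ways to spend the same compute budget.
small_model = training_flops(n_params=7e9, n_tokens=1e12)    # small model, lots of data
large_model = training_flops(n_params=70e9, n_tokens=1e11)   # 10x parameters, 1/10 the data

print(f"small model: {small_model:.2e} FLOPs")
print(f"large model: {large_model:.2e} FLOPs")
print("same budget:", math.isclose(small_model, large_model))
```

Under this approximation, compute is fixed by the product of model size and data size, so with a fixed budget you can only grow one by shrinking the other.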
currentscurrents t1_jdn0opn wrote
Reply to [N] GPT-4 has 1 trillion parameters by mrx-ai
The Nvidia H100 marketing material does advertise a configuration for linking 256 of them to train trillion-parameter language models:
>With NVIDIA NVLink® Switch System, up to 256 H100 GPUs can be connected to accelerate exascale workloads. The GPU also includes a dedicated Transformer Engine to solve trillion-parameter language models.
Doesn't necessarily mean GPT-4 is that big, but it's possible. Microsoft and Nvidia were working closely to build the new Azure GPU cloud.
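As a back-of-the-envelope check on whether a trillion parameters could even fit on 256 GPUs, here is a small sketch. It assumes 80 GB of memory per GPU and about 16 bytes of training state per parameter (a commonly cited figure for mixed-precision Adam: fp16 weights and gradients plus fp32 master weights and two optimizer moments). It ignores activations and communication buffers entirely, so treat it as a lower bound, and it says nothing about how GPT-4 was actually trained.

```python
GPU_COUNT = 256            # from the marketing quote above
GPU_MEMORY_BYTES = 80e9    # assumption: 80 GB per GPU
BYTES_PER_PARAM = 16       # assumption: mixed-precision Adam training state


def training_state_bytes(n_params):
    """Memory for weights, gradients, and optimizer state (no activations)."""
    return n_params * BYTES_PER_PARAM


def fits_in_cluster(n_params):
    """True if the training state alone fits in the cluster's total memory."""
    return training_state_bytes(n_params) <= GPU_COUNT * GPU_MEMORY_BYTES


needed_tb = training_state_bytes(1e12) / 1e12
available_tb = GPU_COUNT * GPU_MEMORY_BYTES / 1e12
print(f"needed: {needed_tb:.2f} TB, available: {available_tb:.2f} TB")
print("fits:", fits_in_cluster(1e12))
```

Under these assumptions the training state for a trillion parameters takes up most of the pooled memory, which leaves little headroom once activations are counted.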
currentscurrents t1_jefj08w wrote
Reply to [D] [R] On what kind of machine does Midjorney, the art generating AI, runs on? by shn29
Definitely a GPU farm, probably Nvidia A100s, but nobody knows for sure because it's closed source.
If you want to run an image generator locally, head over to /r/StableDiffusion