Recent comments in /f/MachineLearning
DigThatData t1_jdpza0l wrote
Reply to comment by Puzzleheaded_Acadia1 in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
it's an RNN
karius85 t1_jdpz4k7 wrote
Reply to [D] Title: Best tools and frameworks for working with million-billion image datasets? by v2thegreat
PyTorch Lightning is a simpler alternative to plain PyTorch, and Kornia has a lot of standard scikit-image/OpenCV operations implemented for PyTorch on the GPU, many of which support autograd. WebDataset is scheduled for inclusion in PyTorch; it uses sharded tar files with label data, but is currently somewhat underdocumented.
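The sharded-tar layout WebDataset reads is simple enough to sketch with just the stdlib; the keys, extensions, and shard name below are illustrative, not from any real dataset:

```python
import io
import os
import tarfile
import tempfile

# A WebDataset-style shard is just a tar file in which files sharing a
# basename ("000001") form one sample, and extensions name its fields.
samples = [("000001", b"<jpeg bytes>", b"3"), ("000002", b"<jpeg bytes>", b"7")]

shard_path = os.path.join(tempfile.mkdtemp(), "shard-000000.tar")
with tarfile.open(shard_path, "w") as tar:
    for key, image_bytes, label_bytes in samples:
        for ext, payload in (("jpg", image_bytes), ("cls", label_bytes)):
            info = tarfile.TarInfo(name=f"{key}.{ext}")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

# Reading back: members appear in archive order, grouped by basename.
with tarfile.open(shard_path) as tar:
    names = tar.getnames()

print(names)  # ['000001.jpg', '000001.cls', '000002.jpg', '000002.cls']
```

The webdataset library streams such shards sequentially, which is what makes them fast on spinning disks and object stores.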
[deleted] t1_jdpz1d5 wrote
Reply to comment by maizeq in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
[deleted]
Poseidon_22 t1_jdpyo9u wrote
Apparently, for a linear improvement in accuracy, we would need exponentially more parameters. GPT-4, with more than 1 trillion parameters, would need to be trained on 6,700 GPUs for a whole year!
michaelthwan_ai OP t1_jdpyb80 wrote
Reply to comment by Puzzleheaded_Acadia1 in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
Added to the backlog. I need some time to study it. Thanks.
michaelthwan_ai OP t1_jdpy7yf wrote
Reply to comment by StellaAthena in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
I may include Bard once it is fully released.
So LaMDA -> Bard (maybe). But it is still in alpha/beta.
michaelthwan_ai OP t1_jdpy5dy wrote
Reply to comment by tonicinhibition in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
Thanks for sharing the above!
My choice is yk - Yannic Kilcher. His "AI News" videos are brief introductions, and he sometimes goes through certain papers in detail. Very insightful!
michaelthwan_ai OP t1_jdpy0l2 wrote
Reply to comment by DigThatData in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
added, thank you.
michaelthwan_ai OP t1_jdpy06p wrote
Reply to comment by philipgutjahr in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
I only include recent LLMs (Feb/Mar 2023), which are usually the ones at the bottom, and their two-generation predecessors (parent/grandparent). See whether the one you mentioned is related to them.
michaelthwan_ai OP t1_jdpxuzs wrote
Reply to comment by Small-Fall-6500 in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
ChatGPT-like GitHub repos -> added most; the rest are in the TODO (e.g. PaLM)
RWKV -> added to the backlog
michaelthwan_ai OP t1_jdpxtp1 wrote
Reply to comment by philipgutjahr in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
Open alternatives -> added most; the rest are in the TODO (e.g. PaLM)
OpenChatKit -> added
InstructGPT -> it seems this is not a released model, just a plan.
elnaqnely t1_jdpty1s wrote
Reply to [D] Title: Best tools and frameworks for working with million-billion image datasets? by v2thegreat
> accelerate the image processing tasks using a GPU
You can find some working code to do simple manipulations of images (scaling, flipping, cropping) on a GPU. Search for "gpu image augmentation".
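For a sense of what those GPU manipulations look like, here is a minimal sketch using plain PyTorch tensor ops (libraries like Kornia wrap the same ideas in autograd-friendly transform classes); all sizes here are made up:

```python
import torch
import torch.nn.functional as F

# Run on the GPU when one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# A batch of 8 RGB images, 64x64, as a float tensor.
images = torch.rand(8, 3, 64, 64, device=device)

flipped = torch.flip(images, dims=[3])      # horizontal flip
cropped = images[:, :, 8:56, 8:56]          # 48x48 center crop
scaled = F.interpolate(images, size=(32, 32), mode="bilinear",
                       align_corners=False)  # downscale to 32x32

print(flipped.shape, cropped.shape, scaled.shape)
```

Because these are ordinary tensor ops, they batch naturally and participate in autograd if you need that.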
> image dataset as some form of a database
With millions of images, the metadata alone may be difficult to navigate. I recommend storing the images/metadata on a good SSD (plus a backup), with the metadata in Parquet format, partitioned by categories that are meaningful to you. That will allow the metadata to be efficiently queried using Arrow or Spark, both of which have Python wrappers (pyarrow, pyspark).
For the images themselves, store them in a similar nested directory structure to match the metadata. This means your images will be grouped by the same meaningful attributes you chose to partition the metadata. Also, this will hopefully keep the number of images per directory from becoming too large. Doing that will allow you to browse thumbnails using whatever file browser comes with your operating system. To rapidly page through thousands of images, I found that the default Ubuntu image viewer, Eye of Gnome, works really well.
MrFlufypants t1_jdpt0vm wrote
Reply to [D] Keeping track of ML advancements by Anis_Mekacher
We do a journal series at work. The rule is every engineer has to present one paper before anyone gets to do another. It builds presenting skills and forces us to hear new material, since we all have different preferences.
The big issue is that recently many of the coolest advancements have come from Facebook, OpenAI, and Google, and they are increasingly releasing "reports" instead of "papers". We are getting a lot more "and then they did this incredibly revolutionary thing, but they only said they used a 'model'". They aren't giving details because they want to keep their work private. Big bummer.
I also read any papers that make the top of this sub, and I'll usually read a couple of the best-performing papers from the big conferences.
sam__izdat t1_jdps8rk wrote
Reply to comment by Ilforte in Modern language models refute Chomsky’s approach to language [R] by No_Draft4778
>Why would that be a disingenuous definition?
Doesn't matter if it's disingenuous. What it's implying is ridiculous. It would be more surprising if the linear regression model didn't work at all. The fact that it can correlate fMRI data better than random doesn't mean you've replicated how language works in the brain, let alone how it's acquired.
> In general, your defense of generative linguistics is very weak. It's just invective and strawmen, and it reeks of desperation.
I don't have any horse in the race or anything to be desperate about. It's just an astonishingly stupid proposition.
I should say, I am not qualified to defend or refute generative linguistics (though that clearly was no obstacle for the author), and I don't know anything about it. I do feel qualified (because I can read and check sources) to dismiss this embarrassing pile of nonsense, though, as it's so plainly nonsense that it doesn't take an expert to dismiss its bombastic claims as pseudoscience -- and I'm talking about Piantadosi here, not his references, which, for all I know, are serious research misrepresented by a dunce. I'm not in academia, and I don't feel the need to be any more diplomatic about this than he was toward linguists in his pdf-format blog post.
PilotThen t1_jdppmpl wrote
Reply to comment by currentscurrents in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
There's also the point that they optimize for compute at training time.
In mass deployment, compute at inference time starts to matter.
rdeternalkid t1_jdpo6ra wrote
Reply to comment by Zealousideal_Low1287 in [D] Do you use a website or program to organise and annotate your papers? by who_here_condemns_me
I do the same if the file is not too voluminous. I prefer taking notes with a pen rather than doing it digitally. Maybe I'm an old soul :)
PilotThen t1_jdpnoul wrote
Reply to comment by ganzzahl in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
I didn't find a paper, but I think that is sort of what EleutherAI was doing with their Pythia models.
You'll find the models on Hugging Face, and I'd say they are also interesting from an open-source perspective because of their license (Apache-2.0).
(Also, Open Assistant seems to be building on top of them.)
PilotThen t1_jdpn8eb wrote
I'm down the rabbit hole of finding the best model to build on and learn with this weekend.
Currently poking at PygmalionAI/pygmalion-1.3b.
Beware: the different-size Pygmalion models are fine-tuned from different pretrained models, so they have inherited different licenses.
I like my results with 6b better, but 1.3b has the better license (AGPL-3.0).
ILOVETOCONBANDITS t1_jdpmxfe wrote
Reply to comment by [deleted] in [D] ICML 2023 Reviewer-Author Discussion by zy415
Er, I'm probably not the best person to answer this, but I think that's a pretty good score. While the AC can still choose to reject, I'd say you have about a 60% chance. Usually scores below 5 are a pretty strong reject and above 6.5 are pretty safe, but in the middle it's more of a toss-up. However, that was before the change in the rating scale, where 5 went from borderline reject to borderline accept; now the range seems to have shifted down about half a point. That's how I got my 60% estimate. Best of luck!
noobgolang t1_jdpmeq9 wrote
Reply to comment by learn-deeply in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
Stop gatekeeping researchhhh!!!! It is already bad enough
[deleted] t1_jdplqud wrote
Reply to [D] ICML 2023 Reviewer-Author Discussion by zy415
[deleted]
[deleted] t1_jdpl0gb wrote
[deleted]
Ilforte t1_jdpkqlz wrote
Reply to comment by sam__izdat in Modern language models refute Chomsky’s approach to language [R] by No_Draft4778
>If you define "developmentally plausible" as "100 million tokens"
Why would that be a disingenuous definition?
In general, your defense of generative linguistics is very weak. It's just invective and strawmen, and it reeks of desperation.
> overconfident doe-eyed futurists guzzling the silicon valley kool aid
Come on now.
SatoshiNotMe t1_jdpgrat wrote
Reply to comment by SatoshiNotMe in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Looking at the repo, well, it does look like we need to run this in a DB notebook.
LowPressureUsername t1_jdq0nsn wrote
Reply to comment by yaru22 in [D] Simple Questions Thread by AutoModerator
It's mostly about the computational power available, AFAIK. More context = more tokens = more processing power required.
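The usual intuition (assuming standard self-attention) is that every token attends to every other token, so the attention score matrix alone grows quadratically with context length:

```python
def attention_score_entries(context_len: int) -> int:
    # Standard self-attention builds a (context_len x context_len) score
    # matrix per head, so this part of the cost grows quadratically.
    return context_len * context_len

print(attention_score_entries(2048))  # 4194304
print(attention_score_entries(4096))  # 16777216 -- 2x the context, 4x the scores
```

This is why doubling the context window costs far more than double in compute and memory, and why longer-context models are expensive to serve.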