Recent comments in /f/MachineLearning
DigThatData t1_jdpza0l wrote
Reply to comment by Puzzleheaded_Acadia1 in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
it's an RNN
karius85 t1_jdpz4k7 wrote
Reply to [D] Title: Best tools and frameworks for working with million-billion image datasets? by v2thegreat
PyTorch Lightning is a simpler alternative to plain PyTorch, and Kornia has a lot of standard scikit-image/OpenCV operations implemented for PyTorch on the GPU, many of which support autograd. WebDataset is scheduled for inclusion in PyTorch; it uses sharded tar files with label data, but is currently somewhat underdocumented.
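The sharded-tar layout WebDataset reads is simple enough to sketch with just the stdlib; the keys, extensions, and shard name below are illustrative, not from any real dataset:

```python
import io
import os
import tarfile
import tempfile

# A WebDataset-style shard is just a tar file in which files sharing a
# basename ("000001") form one sample, and extensions name its fields.
samples = [("000001", b"<jpeg bytes>", b"3"), ("000002", b"<jpeg bytes>", b"7")]

shard_path = os.path.join(tempfile.mkdtemp(), "shard-000000.tar")
with tarfile.open(shard_path, "w") as tar:
    for key, image_bytes, label_bytes in samples:
        for ext, payload in (("jpg", image_bytes), ("cls", label_bytes)):
            info = tarfile.TarInfo(name=f"{key}.{ext}")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

# Reading back: members appear in archive order, grouped by basename.
with tarfile.open(shard_path) as tar:
    names = tar.getnames()

print(names)  # ['000001.jpg', '000001.cls', '000002.jpg', '000002.cls']
```

The webdataset library streams such shards sequentially, which is what makes them fast on spinning disks and object stores.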
[deleted] t1_jdpz1d5 wrote
Reply to comment by maizeq in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
[deleted]
Poseidon_22 t1_jdpyo9u wrote
Apparently, for a linear improvement in accuracy, we would need exponentially more parameters. GPT-4, with more than 1 trillion parameters, would need to be trained on 6,700 GPUs for a whole year!
michaelthwan_ai OP t1_jdpyb80 wrote
Reply to comment by Puzzleheaded_Acadia1 in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
Added to the backlog. I need some time to study it. Thanks.
michaelthwan_ai OP t1_jdpy7yf wrote
Reply to comment by StellaAthena in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
I may include Bard once it is fully released.
So LaMDA -> Bard (maybe). But it is still in alpha/beta.
michaelthwan_ai OP t1_jdpy5dy wrote
Reply to comment by tonicinhibition in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
Thanks for sharing the above!
My choice is yk - Yannic Kilcher. His "AI News" videos are brief introductions, and he sometimes goes through certain papers in detail. Very insightful!
michaelthwan_ai OP t1_jdpy0l2 wrote
Reply to comment by DigThatData in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
added, thank you.
michaelthwan_ai OP t1_jdpy06p wrote
Reply to comment by philipgutjahr in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
I only include recent LLMs (Feb/Mar 2023), which are usually the ones at the bottom, and their two-generation predecessors (parent/grandparent). See whether the one you mentioned is related to them.
michaelthwan_ai OP t1_jdpxuzs wrote
Reply to comment by Small-Fall-6500 in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
ChatGPT-like GitHub repos -> added most; the rest are in the TODO (e.g. PaLM)
RWKV -> added to the backlog
michaelthwan_ai OP t1_jdpxtp1 wrote
Reply to comment by philipgutjahr in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
Open alternatives -> added most; the rest are in the TODO (e.g. PaLM)
OpenChatKit -> added
InstructGPT -> it seems this is not a released model, just a plan.
elnaqnely t1_jdpty1s wrote
Reply to [D] Title: Best tools and frameworks for working with million-billion image datasets? by v2thegreat
> accelerate the image processing tasks using a GPU
You can find some working code to do simple manipulations of images (scaling, flipping, cropping) on a GPU. Search for "gpu image augmentation".
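For a sense of what those GPU manipulations look like, here is a minimal sketch using plain PyTorch tensor ops (libraries like Kornia wrap the same ideas in autograd-friendly transform classes); all sizes here are made up:

```python
import torch
import torch.nn.functional as F

# Run on the GPU when one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# A batch of 8 RGB images, 64x64, as a float tensor.
images = torch.rand(8, 3, 64, 64, device=device)

flipped = torch.flip(images, dims=[3])      # horizontal flip
cropped = images[:, :, 8:56, 8:56]          # 48x48 center crop
scaled = F.interpolate(images, size=(32, 32), mode="bilinear",
                       align_corners=False)  # downscale to 32x32

print(flipped.shape, cropped.shape, scaled.shape)
```

Because these are ordinary tensor ops, they batch naturally and participate in autograd if you need that.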
> image dataset as some form of a database
With millions of images, the metadata alone may be difficult to navigate. I recommend storing the images/metadata on a good SSD (plus a backup), with the metadata in Parquet format, partitioned by categories that are meaningful to you. That will allow the metadata to be efficiently queried using Arrow or Spark, both of which have Python wrappers (pyarrow, pyspark).
For the images themselves, store them in a similar nested directory structure to match the metadata. This means your images will be grouped by the same meaningful attributes you chose to partition the metadata. Also, this will hopefully keep the number of images per directory from becoming too large. Doing that will allow you to browse thumbnails using whatever file browser comes with your operating system. To rapidly page through thousands of images, I found that the default Ubuntu image viewer, Eye of Gnome, works really well.
MrFlufypants t1_jdpt0vm wrote
Reply to [D] Keeping track of ML advancements by Anis_Mekacher
We do a journal series at work. The rule is every engineer has to present one paper before anyone gets to do another. It builds presenting skills and forces us to hear new material, since we all have different preferences.
The big issue is that recently many of the coolest advancements have come from Facebook, OpenAI, and Google, and they are increasingly releasing "reports" instead of "papers". We are getting a lot more "and then they did this incredibly revolutionary thing, but they only said they used a 'model'". They aren't giving details because they want to keep their work private. Big bummer.
I also read any papers that make the top of this sub, and I'll usually read a couple of the best-performing papers from the big conferences.
sam__izdat t1_jdps8rk wrote
Reply to comment by Ilforte in Modern language models refute Chomsky’s approach to language [R] by No_Draft4778
>Why would that be a disingenuous definition?
Doesn't matter if it's disingenuous. What it's implying is ridiculous. It would be more surprising if the linear regression model didn't work at all. The fact that it can correlate fMRI data better than random doesn't mean you've replicated how language works in the brain, let alone how it's acquired.
> In general, your defense of generative linguistics is very weak. It's just invective and strawmen, and it reeks of desperation.
I don't have any horse in the race or anything to be desperate about. It's just an astonishingly stupid proposition.
I should say, I am not qualified to defend or refute generative linguistics (though that clearly was no obstacle for the author), and I don't know anything about it. I do feel qualified (because I can read and check sources) to dismiss this embarrassing pile of nonsense, though, as it's so plainly nonsense that it doesn't take an expert to dismiss its bombastic claims as pseudoscience -- and I'm talking about Piantadosi here, not his references, which, for all I know, are serious research misrepresented by a dunce. I'm not in academia, and I don't feel the need to be any more diplomatic about this than he was toward linguists in his pdf-format blog post.
PilotThen t1_jdppmpl wrote
Reply to comment by currentscurrents in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
There's also the point that they optimize for compute at training time.
In mass deployment, compute at inference time starts to matter.
rdeternalkid t1_jdpo6ra wrote
Reply to comment by Zealousideal_Low1287 in [D] Do you use a website or program to organise and annotate your papers? by who_here_condemns_me
I do the same if the file is not too voluminous. I prefer taking notes with a pen rather than doing it digitally. Maybe I'm an old soul :)
PilotThen t1_jdpnoul wrote
Reply to comment by ganzzahl in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
I didn't find a paper, but I think that is sort of what EleutherAI was doing with their Pythia models.
You'll find the models on Hugging Face, and I'd say they are also interesting from an open-source perspective because of their license (Apache-2.0).
(Also, Open Assistant seems to be building on top of them.)
PilotThen t1_jdpn8eb wrote
I'm down the rabbit hole of finding the best model to build on and learn with this weekend.
Currently poking at PygmalionAI/pygmalion-1.3b.
Beware: the different-size Pygmalion models are fine-tuned from different pretrained models, so they have inherited different licenses.
I like my results with 6b better, but 1.3b has the better license (AGPL-3.0).
ILOVETOCONBANDITS t1_jdpmxfe wrote
Reply to comment by [deleted] in [D] ICML 2023 Reviewer-Author Discussion by zy415
Er, I'm probably not the best person to answer this, but I think that's a pretty good score. While the AC can still choose to reject, I'd say you have about a 60% chance. Usually scores below 5 are a pretty strong reject and above 6.5 are pretty safe, but in the middle it's more of a toss-up. However, that was before the change in the rating scale, where 5 went from borderline reject to borderline accept; now the range seems to have shifted down about half a point. That's how I got my 60% estimate. Best of luck!
noobgolang t1_jdpmeq9 wrote
Reply to comment by learn-deeply in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
Stop gatekeeping researchhhh!!!! It is already bad enough
[deleted] t1_jdplqud wrote
Reply to [D] ICML 2023 Reviewer-Author Discussion by zy415
[deleted]
[deleted] t1_jdpl0gb wrote
[deleted]
Ilforte t1_jdpkqlz wrote
Reply to comment by sam__izdat in Modern language models refute Chomsky’s approach to language [R] by No_Draft4778
>If you define "developmentally plausible" as "100 million tokens"
Why would that be a disingenuous definition?
In general, your defense of generative linguistics is very weak. It's just invective and strawmen, and it reeks of desperation.
> overconfident doe-eyed futurists guzzling the silicon valley kool aid
Come on now.
SatoshiNotMe t1_jdpgrat wrote
Reply to comment by SatoshiNotMe in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Looking at the repo, well, it does look like we need to run this in a DB notebook.
LowPressureUsername t1_jdq0nsn wrote
Reply to comment by yaru22 in [D] Simple Questions Thread by AutoModerator
It's mostly about the computational power available, AFAIK. More context = more tokens = more processing power required.
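The usual intuition (assuming standard self-attention) is that every token attends to every other token, so the attention score matrix alone grows quadratically with context length:

```python
def attention_score_entries(context_len: int) -> int:
    # Standard self-attention builds a (context_len x context_len) score
    # matrix per head, so this part of the cost grows quadratically.
    return context_len * context_len

print(attention_score_entries(2048))  # 4194304
print(attention_score_entries(4096))  # 16777216 -- 2x the context, 4x the scores
```

This is why doubling the context window costs far more than double in compute and memory, and why longer-context models are expensive to serve.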