Recent comments in /f/MachineLearning
theogognf t1_jdqx32u wrote
Reply to [D] Title: Best tools and frameworks for working with million-billion image datasets? by v2thegreat
You can use whatever you want for the actual data storage. As other comments have mentioned, and at least in the PyTorch space, it really just comes down to defining a dataset object that samples or pulls data from your storage whenever it's indexed, so you aren't spending all your RAM just holding the dataset. That storage can be a relational or non-relational database, plain files on your local system, or files in a cloud provider's object storage; it doesn't really matter so long as you can quickly load samples into memory. With billions of images, you may want to look into a cloud provider at least for storing your dataset (depending on the images' size)
You can certainly preprocess your data and store it in a processed format if you want to, and if you think that's a bottleneck in your data loading. It sounds like you should focus on figuring out how to store your data first, though
Regarding ML frameworks, that just comes down to preference. I usually see PyTorch for experimentation with custom models, while I see TensorFlow for mature models that're being deployed/served
Jeffy29 t1_jdquk5v wrote
Reply to comment by 3deal in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
Literally doomsayer. I know I know “bUt ThIs TiMe iTs dIfFeRenT”. I am sure you guys will be right one day.
[deleted] t1_jdqmk5x wrote
Reply to comment by StellaAthena in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
[deleted]
signed7 t1_jdqm8lt wrote
Reply to comment by ganzzahl in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
I'm probably wrong but I think I read somewhere Google has a patent on encoder-decoder, thus everyone else uses decoder-only
Camillo_Trevisan t1_jdqjtbp wrote
Reply to [D] Simple Questions Thread by AutoModerator
Hello everyone,
I should say up front that I'm a complete beginner.
I'm looking for machine learning software that can analyze large datasets composed as follows: a 3D surface defined by triplets of XYZ values (at least 150 triplets, defined on a regular, constant grid or possibly an irregular grid that differs for each set) and the related outputs produced by my software, which contain about seventy numerical parameters calculated on that surface. I would like to analyze a few thousand datasets, each consisting of at least 500-600 numerical values.
The idea is both to analyze the entered data and to run simulations such as: if I define a new set of output values, which 3D surface could generate them through my software?
The utility comes from the fact that my software takes many hours of computation to generate a set of output values, and it only works in one direction (input grid -> output values).
Thanks in advance for any suggestion
Camillo
tdgros t1_jdqjc8q wrote
Reply to comment by Co0k1eGal3xy in Is it possible to merge transformers? [D] by seraphaplaca2
there's also the weight averaging in ESRGAN that I knew about, but that always irked me. The permutation argument from your third point is the usual reason I invoke on this subject, and the paper does show why it's not as simple as just blending weights! The same reasoning also shows why blending subsequent checkpoints isn't like blending random networks.
Co0k1eGal3xy t1_jdqgwlh wrote
BioBERT base and LegalBERT use the same architecture, so a technique like Git Re-Basin could improve performance over using just one or the other model. However, if you want to merge the models and get the best of both, you should retrain on a merged dataset or use a model ensemble instead (i.e., load and run both models and intelligently pick which model to listen to for which type of data).
You cannot (easily) merge BioBERT large, since that checkpoint uses a custom vocabulary, but BioBERT base looks perfectly fine.
Co0k1eGal3xy t1_jdqfxcr wrote
Reply to comment by tdgros in Is it possible to merge transformers? [D] by seraphaplaca2
- Most stable diffusion UIs DO merge weights by averaging them
- Averaging weights between checkpoints works really well with CLIP fine-tuning, improving performance over both checkpoints for their respective validation sets. https://github.com/mlfoundations/wise-ft
- Git Re-Basin found that their method of merging weights works even for checkpoints with completely different pretraining data and init weights, improving accuracy on a mixed validation set over using either model alone. https://arxiv.org/abs/2209.04836
You're right that merging the model outputs has higher quality than merging the weights, but OP was asking if it was possible and it is very much possible if the weight tensors have the same shape.
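For the simple averaging case, the operation is just elementwise interpolation over matching parameters. A toy sketch with floats standing in for tensors (real checkpoints would be `state_dict`s of same-shape tensors, but the arithmetic is the same):

```python
def interpolate_weights(sd_a, sd_b, alpha=0.5):
    """Blend two same-architecture checkpoints: alpha * A + (1 - alpha) * B."""
    assert sd_a.keys() == sd_b.keys(), "architectures must match exactly"
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

# Hypothetical parameter names, scalars in place of weight tensors.
ckpt_a = {"fc.weight": 1.0, "fc.bias": 0.0}
ckpt_b = {"fc.weight": 3.0, "fc.bias": 2.0}
merged = interpolate_weights(ckpt_a, ckpt_b)  # {"fc.weight": 2.0, "fc.bias": 1.0}
```

Wise-ft is essentially this interpolation between a zero-shot and a fine-tuned checkpoint; Git Re-Basin first permutes hidden units so that corresponding weights line up before averaging.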
Jean-Porte t1_jdqfhrf wrote
Model averaging sounds stupid, but it actually kind of works; you could try it. Does it make sense, though? It may not work as well as the individual models
mLalush t1_jdqc5kk wrote
Reply to [D] Title: Best tools and frameworks for working with million-billion image datasets? by v2thegreat
https://github.com/webdataset/webdataset
The above library is being integrated into torchdata and will eventually become part of the PyTorch stack.
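For context, WebDataset's core trick is storing samples as files inside tar "shards" that are read sequentially, which keeps I/O streaming-friendly at scale. A stdlib-only sketch of that idea (the real library adds sharding, decoding, shuffling, and PyTorch integration on top):

```python
import io
import tarfile

def iter_tar_samples(tar_bytes):
    """Stream (name, payload) pairs from a tar shard without loading it whole."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tf:
        for member in tf:
            if member.isfile():
                yield member.name, tf.extractfile(member).read()

# Build a tiny in-memory "shard" holding one sample (image bytes + label file).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    for name, payload in [("000.jpg", b"fake-jpeg-bytes"), ("000.cls", b"7")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))

samples = list(iter_tar_samples(buf.getvalue()))
```

WebDataset groups files sharing a basename (here `000.jpg` and `000.cls`) into one training sample; sequential tar reads are what make it fast on network and spinning storage.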
[deleted] t1_jdqc0ax wrote
Reply to comment by tdgros in Is it possible to merge transformers? [D] by seraphaplaca2
[removed]
tdgros t1_jdqbgqy wrote
the model merging offered by some stable diffusion UIs does not merge the weights of a network! It merges the denoising results for a single diffusion step from 2 different denoisers, which is very different!
Merging the weights of two different models does not, in general, produce something functional, and it can only work for 2 models with exactly the same structure. It certainly does not "mix their functionality".
tdgros t1_jdqarbq wrote
Reply to comment by [deleted] in Is it possible to merge transformers? [D] by seraphaplaca2
what's the connection between LoRA and the question about merging weights here?
edit: weird, I saw a notification for an answer from you, but can't see the message...
LoRA is a parameter-efficient fine-tuning method that learns low-rank updates to (frozen) weight matrices for single tasks. It does not merge models or weights
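To make the distinction concrete, here's a toy sketch of the low-rank idea (plain-Python lists, illustrative only): the pretrained weight `W` stays frozen, only the small factors `B` and `A` are trained, and the effective weight is `W + B @ A` with rank r much smaller than the matrix dimensions.

```python
def matmul(X, Y):
    """Naive matrix multiply over nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

# Frozen 2x2 pretrained weight; rank-1 LoRA factors B (2x1) and A (1x2).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.5]]

delta = matmul(B, A)  # the low-rank update; only B and A hold trainable params
W_eff = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
```

Note that this only ever adapts one model's weights; nothing here combines two independently trained networks.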
Important_Tonight433 t1_jdqamia wrote
Reply to comment by Mxbonn in [D] ICML 2023 Reviewer-Author Discussion by zy415
> [D] ICML 2023 Paper Reviews
I spent multiple rounds answering questions that could have been answered with a quick Google search, explaining why something was not really an issue, and pointing out that many other papers evaluate benchmarks in the same way.
In the end, the response boiled down to "don't argue with me".
I wish they could at least pay me some tuition fee for educating them.
incrapnito t1_jdqamby wrote
I think you are looking for federated learning, which is a complete research field of its own. It digs into combining the weights of two neural networks such that both tasks can still be accomplished. Existing approaches should apply to transformers too.
Anis_Mekacher OP t1_jdq6htf wrote
Reply to comment by Necessary_Ad_9800 in [D] Keeping track of ML advancements by Anis_Mekacher
not specifically papers, but this newsletter covers AI topics https://runtheai.com/
it's a daily newsletter about the latest AI news.
FirstBabyChancellor t1_jdq6cbb wrote
Reply to [D] Title: Best tools and frameworks for working with million-billion image datasets? by v2thegreat
Look into Nvidia DALI. It's designed more as a highly efficient, faster alternative to PyTorch's default dataloader, but you can also use it to do a number of preprocessing operations on images -- using GPUs.
Necessary_Ad_9800 t1_jdq69kz wrote
Reply to [D] Keeping track of ML advancements by Anis_Mekacher
Is there a website that puts out newsletters, or YouTube channels covering the weekly news?
mskogly t1_jdq58o4 wrote
Reply to [D] What happens if you give as input to bard or GPT4 an ASCII version of a screenshot of a video game and ask it from what game it has been taken or to describe the next likely action or the input? by Periplokos
I believe gpt4 can read and describe the content of images. No need to go via ascii.
Anis_Mekacher OP t1_jdq4r88 wrote
Reply to comment by MrFlufypants in [D] Keeping track of ML advancements by Anis_Mekacher
>>> We do a journal series at work. Rule is every engineer has to do one before we get to do another one. Gives presenting skills and forces us to hear new stuff since we all have different preferences.
Is it "open source" for anyone to access? I think such blogs are excellent advertisements for the companies
>>> Big issue is that recently many of the coolest advancements have been by Facebook, openai...
I've noticed that trend too, it's disappointing, especially when considering that most of these companies' AI teams were built on published papers and open-source stuff
>>> I also read any papers that make the top of this sub, and I’ll usually read a couple of the best performing papers from the big conferences
I've been doing that too. Tbh, I like to take my time and read papers thoroughly, so doing that for every paper that reaches the top of this sub is pretty time-consuming, but overall this sub is an amazing resource to start with.
thanks !!
philipgutjahr t1_jdq43bq wrote
Reply to comment by michaelthwan_ai in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
not sure if this is true, but afaik ChatGPT is basically an implementation of InstructGPT (where OpenAI has been very thorough with the RLHF)
"instance of" https://nextword.dev/blog/chatgpt-instructgpt-gpt3-explained-in-plain-english
"sibbling but a lot better" https://openai.com/blog/chatgpt
[deleted] t1_jdq3nn5 wrote
[deleted]
AntelopeStatus8176 t1_jdq1t5p wrote
Reply to [D] Simple Questions Thread by AutoModerator
I have a set of 20,000 raw measurement data slices, each of which contains 3,000 measurement sample points. Each data slice has a target value assigned to it, and the target values are continuous.
My first approach was to do feature engineering on the raw measurement slices to reduce the data and speed up ML training. This approach works reasonably well at estimating the target value for unseen slices from the test set.
My second approach would be to use the raw data slices as input. On second thought, this appears to be dramatically compute-intensive, or at least way more than I can handle on my standard PC. To my understanding, it would mean constructing an ANN with 3,000 input nodes and several deep layers.
Can anyone advise whether training on raw measurement data with datasets this large even makes sense and, if so, which algorithms to use? Preferably with examples in Python.
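The first approach described above can be as simple as collapsing each 3,000-point slice into a handful of summary statistics before fitting any model. A stdlib-only sketch (the specific feature set is made up; in practice you'd use numpy/scipy and pick features that capture your signal's physics):

```python
import statistics

def slice_features(samples):
    """Collapse one raw measurement slice into a few summary features."""
    return [
        statistics.fmean(samples),   # central tendency
        statistics.pstdev(samples),  # spread
        min(samples),
        max(samples),
    ]

features = slice_features([1.0, 2.0, 3.0, 4.0])
```

With slices reduced this way, a gradient-boosted tree or a small MLP is a reasonable baseline on a standard PC; a 1D CNN on the raw 3,000 points is also feasible but considerably heavier to train.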
rsha256 t1_jdq13w4 wrote
Reply to comment by nekize in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
What does CV have That makes it “solved”? Stable Diffusion?
NovelspaceOnly t1_jdqx9t2 wrote
Reply to [D] Keeping track of ML advancements by Anis_Mekacher
This might sound a bit corny. I try to have a sparse BFS understanding of the field at any given time and a DFS on topics I'm interested in like interpretability, NLP, and GNNs.
Four things that I think are important are: contributing to open source, joining discord communities, at the very minimum "skimming" papers (reading abstracts, conclusions, and charts), and topic-modeling researchers' GitHub repos I find on paperswithcode. As a 5th - ML Twitter, if you can maintain your sanity.
The sixth sense and the most important one is to have a strong math background, IMO it is the most important aspect that helps generalize new research. grok linear algebra, probability, and calc. Mostly linear algebra though because the tensor notation really helps with probability and functional analysis. A lot of physics can be understood through the lens of tensor analysis and probability.