Recent comments in /f/MachineLearning

KallistiTMP t1_jd8tlq0 wrote

I've heard of added value, especially in terms of highlighting risk areas and developing strategies to minimize those risks. They have a pretty good sense of how to make the AI less likely to say racist/sexist/offensive stuff.

Unfortunately, whenever it comes to a question of money vs. ethics, companies always side with money. So in practice, the only impact they can have is on ethics improvements that don't threaten the bottom line.

1

StablePunFusion t1_jd8qm6b wrote

Thanks for releasing the training data (https://github.com/sahil280114/codealpaca/blob/master/data/code_alpaca_20k.json).

Where was the training data gathered from? Has the data been verified to be correct?

I'm a tad sad to see that most of the training data doesn't have the programming language tagged anywhere (some entries do, but most don't). The resulting model might confuse languages, so it may not end up being super useful, I guess.

1

throwaway2676 t1_jd8qe6f wrote

When training LLMs to write code, is it standard to just make indentation and newlines their own tokens? Like '<\n>' and '<\ind>' or something?

Follow up: Are there any good models on HuggingFace that specialize in writing and explaining code?
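
For what it's worth, real code models generally don't seem to use special markers like '<\n>'; BPE tokenizers just learn newlines and runs of spaces as ordinary tokens (and some, like the Codex tokenizer, I believe include dedicated tokens for common indentation runs). A toy sketch of that idea, not any real tokenizer:

```python
import re

# Toy "tokenizer": treat newlines and runs of spaces as tokens of their
# own, alongside runs of non-whitespace characters.
def toy_tokenize(code):
    return re.findall(r"\n|[ ]+|\S+", code)

toy_tokenize("def f():\n    return 1")
# -> ['def', ' ', 'f():', '\n', '    ', 'return', ' ', '1']
```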

2

brownmamba94 t1_jd8lqry wrote

Also, the N:M sparsity structure is much more constrained in terms of mask diversity compared to unstructured sparsity. Table 1 in the N:M Transposable sparsity paper compares mask diversity across different sparsity techniques (both unstructured and structured), and, as expected, unstructured sparsity achieves the highest diversity. I think this is important, especially for dynamic sparse training, because the algorithm then has a much larger search space of sparse subnetworks to explore. Also, imposing structured sparsity like N:M tends to reduce the expressivity of a weight matrix at higher sparsity levels, which can be a constraint if you want high compression ratios.
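
As a back-of-the-envelope illustration (my own numbers, not from the paper), here's how much smaller the mask search space gets under 2:4 structured sparsity versus unstructured sparsity, for a single 8-element weight row at 50% sparsity:

```python
from math import comb

row_len = 8
kept = 4  # 50% sparsity: keep 4 of 8 weights

# Unstructured: any 4 of the 8 positions may be kept.
unstructured_masks = comb(row_len, kept)  # C(8, 4) = 70

# 2:4 structured: keep exactly 2 of every 4 consecutive weights.
n, m = 2, 4
nm_masks = comb(m, n) ** (row_len // m)   # C(4, 2)^2 = 36
```

Even on a tiny row the N:M constraint roughly halves the number of admissible masks, and the gap grows combinatorially with matrix size.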

3

Carrasco_Santo t1_jd8e34b wrote

Imagine the situation where, after so many studies, some international team manages to optimize the functioning of artificial neurons to the point where they are more efficient than biological neurons. We would automatically be outclassed.

And this is possible. Scientists around the world have studied ways to optimize natural processes for some purpose, for example, reducing the number of steps photosynthesis needs to produce sugar, making the process faster and more economical. The same may happen with the functioning of neurons and their capacities.

2

Mxbonn t1_jd8e2ab wrote

I had all borderline scores, with 3/4 reviewers asking at least 4 questions and saying they were willing to reconsider their rating based on the answers.

No one has replied or changed their rating so far. The discussion period isn't over yet, but it's frustrating not knowing whether they even read my rebuttal.

EDIT:
Received no responses in the end.

14

Nice_Cod7781 t1_jd8avf1 wrote

Why release without the weights? All it does is force people to expend extra energy and time on something that could have been provided originally. It's bad from a cooperative perspective and doesn't help the environment either.

You're not commercializing this so it's not like you're going to get into any legal trouble for releasing the model.

16

trnka t1_jd82eo1 wrote

If you're using some API, it's probably best to look at the API docs.

If I had to guess, I'd say that top_k is about the beam width in beam search. And top_p is dynamically adjusting the beam width to cover the amount of the probability distribution you specify.

top_k=1 is probably what we'd call a greedy search. It's going left to right and picking the most probable token. The sequence of tokens selected in this way might not be the most probable sequence though.

Again, check the API docs to be sure.

All that said, these are just settings for discovering the most probable sequence in a computationally efficient way. It's still deterministic and still attempting to find the most probable sequence. What I was describing in the previous response was adding some randomness so that it's not deterministic.
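
For concreteness, here's a minimal sketch of the more common reading of top_k/top_p in sampling APIs: truncate the next-token distribution, renormalize, then sample from what's left (toy numbers, not any specific API):

```python
# Toy next-token distribution.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zzz": 0.05}

def top_k_filter(probs, k):
    # Keep the k most probable tokens, then renormalize.
    kept = dict(sorted(probs.items(), key=lambda kv: -kv[1])[:k])
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

def top_p_filter(probs, p):
    # Keep the smallest set of top tokens whose cumulative probability
    # reaches p, then renormalize.
    kept, cum = {}, 0.0
    for tok, pr in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[tok] = pr
        cum += pr
        if cum >= p:
            break
    total = sum(kept.values())
    return {tok: pr / total for tok, pr in kept.items()}

top_k_filter(probs, 1)    # {'the': 1.0} -- k=1 collapses to greedy
top_p_filter(probs, 0.8)  # keeps 'the' and 'a', renormalized
```

Under this reading, the model then samples randomly from the surviving tokens, which is where the nondeterminism comes in.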

1