Recent comments in /f/MachineLearning
KallistiTMP t1_jd8tlq0 wrote
I've heard of added value, especially in terms of highlighting risk areas and developing strategies to minimize those risks. They have a pretty good sense of how to make the AI less likely to say racist/sexist/offensive stuff.
Unfortunately, whenever it comes down to money vs. ethics, companies always side with money. So in practice, the only impact they can have is on ethics improvements that don't threaten the bottom line.
[deleted] t1_jd8tbce wrote
Reply to comment by StablePunFusion in [P] CodeAlpaca Code and Data release by immune_star
[removed]
C0demunkee t1_jd8svm2 wrote
Reply to comment by whyvitamins in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Tesla P40, 24 GB VRAM, $150, only 1 or 2 gens behind the 3090
pixiegirl417 OP t1_jd8s4nc wrote
Reply to comment by BayesMind in [P] OpenAssistant is now live on reddit (Open Source ChatGPT alternative) by pixiegirl417
I haven't tried to run it locally since I don't have the hardware requirements, and haven't tried to find a way to do it.
However you can check my GitHub if you want to try the server attached inference API (I know it may not be what you're looking for).
StablePunFusion t1_jd8r3xb wrote
Reply to comment by immune_star in [P] CodeAlpaca Code and Data release by immune_star
Do you (or anyone) know of any higher quality sources of training sets for code?
Seems to be lacking, at least when I searched around last time. Maybe it's time to spin up a community initiative around it?
immune_star OP t1_jd8qqsg wrote
Reply to comment by StablePunFusion in [P] CodeAlpaca Code and Data release by immune_star
The data was generated using text-davinci-003 and has not been verified to be correct.
StablePunFusion t1_jd8qm6b wrote
Reply to [P] CodeAlpaca Code and Data release by immune_star
Thanks for releasing the training data (https://github.com/sahil280114/codealpaca/blob/master/data/code_alpaca_20k.json).
Where was the training data gathered from? Has the data been verified to be correct?
I'm a tad sad to see that most of the training data doesn't have the language tagged anywhere (some entries do, but most don't), so the resulting model might confuse languages and end up less useful.
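To show the kind of check I mean, here's a minimal sketch that estimates how many instructions name a language. The `instruction` field follows the standard Alpaca-style schema; the regex is just a hypothetical heuristic, not anything from the repo:

```python
import re

# Heuristic: does an instruction explicitly name a programming language?
# Assumes Alpaca-style records with an "instruction" field.
LANG_RE = re.compile(
    r"\b(python|javascript|java|c\+\+|c#|ruby|go|rust|sql|php|html|css)\b",
    re.IGNORECASE,
)

def tagged_fraction(examples):
    """Fraction of examples whose instruction mentions a language by name."""
    tagged = sum(1 for ex in examples if LANG_RE.search(ex.get("instruction", "")))
    return tagged / len(examples)

# Toy records standing in for entries of code_alpaca_20k.json
sample = [
    {"instruction": "Write a Python function that reverses a string."},
    {"instruction": "Create an array of the first 10 integers."},
    {"instruction": "Sort the given list in descending order."},
]
print(tagged_fraction(sample))  # 1 of 3 instructions names a language
```

Running something like this over the full JSON would quantify how often the model has to guess the target language.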
throwaway2676 t1_jd8qe6f wrote
Reply to [D] Simple Questions Thread by AutoModerator
When training LLMs to write code, is it standard to make indentation and newlines their own tokens? Like '<\n>' and '<\ind>' or something?
Follow up: Are there any good models on HuggingFace that specialize in writing and explaining code?
BayesMind t1_jd8ps8g wrote
Reply to comment by pixiegirl417 in [P] OpenAssistant is now live on reddit (Open Source ChatGPT alternative) by pixiegirl417
Is there an example script somewhere for how to run this? All I've seen is the heavy inference server example in the repo.
[deleted] t1_jd8o44j wrote
Reply to [D] ICML 2023 Reviewer-Author Discussion by zy415
[deleted]
midasp t1_jd8lyq4 wrote
Reply to comment by Mxbonn in [D] ICML 2023 Reviewer-Author Discussion by zy415
Give it some time. When I was a reviewer, I'd usually sit on responses for at least a day or two to let them gestate before answering.
brownmamba94 t1_jd8lqry wrote
Reply to comment by maizeq in [R] SPDF - Sparse Pre-training and Dense Fine-tuning for Large Language Models by CS-fan-101
Also, the N:M sparsity structure is much more constrained in terms of mask diversity compared to unstructured sparsity. In Table 1 of the N:M Transposable sparsity paper, they present the mask diversity of different sparsity techniques (both unstructured and structured), and, as expected, unstructured sparsity achieves the highest diversity. I think this is especially important for dynamic sparse training, because the algorithm then has a much larger search space of sparse subnetworks to explore. Also, imposing structured sparsity like N:M tends to reduce the expressivity of a weight matrix at higher sparsity levels, which can be limiting if you want high compression ratios.
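To make the mask-diversity point concrete, here's a minimal Python sketch counting admissible masks for unstructured vs. N:M sparsity (the counting formulas are standard combinatorics, not numbers taken from the paper's Table 1):

```python
from math import comb

def unstructured_diversity(n_weights, n_nonzero):
    # Unstructured: any subset of n_nonzero positions may be kept.
    return comb(n_weights, n_nonzero)

def nm_diversity(n_weights, n, m):
    # N:M structure: weights are split into groups of m,
    # and each group must keep exactly n nonzeros.
    assert n_weights % m == 0
    return comb(m, n) ** (n_weights // m)

# 16 weights at 50% sparsity
print(unstructured_diversity(16, 8))  # 12870 possible masks
print(nm_diversity(16, 2, 4))         # 6**4 = 1296 possible masks
```

Even at this toy size the N:M constraint cuts the mask count by an order of magnitude, and the gap widens rapidly with matrix size.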
C0demunkee t1_jd8l0tg wrote
Reply to comment by currentscurrents in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
maybe consider Tesla P40s
24 GB, lots of CUDA cores, $150 each
[deleted] t1_jd8ijx5 wrote
Reply to [D] ICML 2023 Reviewer-Author Discussion by zy415
[deleted]
radi-cho t1_jd8gdzt wrote
Reply to [P] CodeAlpaca Code and Data release by immune_star
Great. Congratulations. I was planning on attempting the same basically, so thanks for open-sourcing it:)
2muchnet42day t1_jd8fnje wrote
Reply to comment by immune_star in [P] CodeAlpaca Code and Data release by immune_star
Would you consider doing a LoRA version of CodeAlpaca and comparing the outputs of the two models?
MazzMyMazz t1_jd8fbij wrote
If there is a novelization of Lethal Weapon, do that using Obama for Glover's character and Trump for Mel Gibson's.
Traditional-Ad-8715 OP t1_jd8eauf wrote
Reply to comment by SeaMeasurement9 in [Project] AI Voice Narrated Audiobooks by Traditional-Ad-8715
money
Carrasco_Santo t1_jd8e34b wrote
Reply to comment by brownmamba94 in [R] SPDF - Sparse Pre-training and Dense Fine-tuning for Large Language Models by CS-fan-101
Imagine the situation where, after so many studies, some international team manages to optimize artificial neurons to the point where they are more efficient than biological neurons. We would automatically be outclassed.
And this is possible: scientists around the world have studied ways to optimize natural processes for some purpose. For example, there is work on reducing the number of steps photosynthesis needs to produce sugar, making the process faster and more economical. The same may happen with the functioning of neurons and their capacities.
Mxbonn t1_jd8e2ab wrote
Reply to [D] ICML 2023 Reviewer-Author Discussion by zy415
I had all borderline scores, with 3/4 reviewers asking at least 4 questions and saying they were willing to reconsider their ratings based on the answers.
No one has replied or changed their rating so far. The discussion period isn't over yet, but it's frustrating not knowing whether they even read my rebuttal.
EDIT:
Received no responses in the end.
le4mu t1_jd8cyno wrote
Don't believe a paper until you have their code and run it.
Nice_Cod7781 t1_jd8avf1 wrote
Reply to [P] CodeAlpaca Code and Data release by immune_star
Why release without the weights? All it does is force people to expend extra energy and time on something that could have been provided originally. It's bad from a cooperative perspective and doesn't help the environment either.
You're not commercializing this so it's not like you're going to get into any legal trouble for releasing the model.
immune_star OP t1_jd892n1 wrote
Reply to comment by 2muchnet42day in [P] CodeAlpaca Code and Data release by immune_star
Primarily, I had the hardware needed to do a full finetune, so I just went ahead with it. Also, LoRA can lead to a slight loss in quality.
trnka t1_jd82eo1 wrote
Reply to comment by disastorm in [D] Simple Questions Thread by AutoModerator
If you're using some API, it's probably best to look at the API docs.
If I had to guess, I'd say that top_k is about the beam width in beam search. And top_p is dynamically adjusting the beam width to cover the amount of the probability distribution you specify.
top_k=1 is probably what we'd call a greedy search. It's going left to right and picking the most probable token. The sequence of tokens selected in this way might not be the most probable sequence though.
Again, check the API docs to be sure.
All that said, these are just settings for discovering the most probable sequence in a computationally efficient way. It's still deterministic and still attempting to find the most probable sequence. What I was describing in the previous response was adding some randomness so that it's not deterministic.
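For intuition, here's a rough NumPy sketch of how top_k/top_p are often applied in sampling-based decoders, where they filter the next-token distribution before a token is drawn (this is one common interpretation, as opposed to the beam-search reading above; any given API's internals may differ, so check its docs):

```python
import numpy as np

def sample_next_token(logits, top_k=0, top_p=1.0, temperature=1.0, rng=None):
    """Draw one token id after top-k / top-p (nucleus) filtering."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())  # stable softmax
    probs /= probs.sum()

    if top_k > 0:
        # Keep only the k most probable tokens.
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)

    if top_p < 1.0:
        # Keep the smallest set of tokens whose cumulative mass reaches top_p.
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cum, top_p) + 1]
        filtered = np.zeros_like(probs)
        filtered[keep] = probs[keep]
        probs = filtered

    probs /= probs.sum()  # renormalize over the surviving tokens
    return int(rng.choice(len(probs), p=probs))

print(sample_next_token([0.1, 2.0, 0.3], top_k=1))  # always 1 (greedy)
```

With top_k=1 this collapses to greedy decoding; larger k or smaller top_p trade determinism for diversity.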
2muchnet42day t1_jd8u97a wrote
Reply to comment by Nice_Cod7781 in [P] CodeAlpaca Code and Data release by immune_star
We need to start iterating on the same weights and not start from scratch every time.