Recent comments in /f/MachineLearning
_Arsenie_Boca_ t1_je9n0ea wrote
Reply to comment by RicketyCricket in [D] Alternatives to fb Hydra? by alyflex
Thanks, I basically use only the config part of Hydra and am regularly annoyed that it's so clunky, so Spock might be a good alternative. Gonna check it out :)
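For context, by "only the config part" I mean something like this minimal sketch (Python, using Hydra's compose API instead of the @hydra.main decorator; the config path, file name and keys here are placeholders, not a real project):

from hydra import compose, initialize
from omegaconf import OmegaConf

# Compose a config without wrapping the whole program in @hydra.main.
# "conf" and "train" are hypothetical names for this sketch.
with initialize(version_base=None, config_path="conf"):
    cfg = compose(config_name="train", overrides=["optimizer.lr=0.001"])

print(OmegaConf.to_yaml(cfg))  # inspect the merged config
print(cfg.optimizer.lr)        # attribute-style access to values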
matthkamis OP t1_je9mpyd wrote
Reply to comment by ZestyData in [D] Can large language models be applied to language translation? by matthkamis
Don’t give me a passive-aggressive answer then: “..Uh. I’m gonna assume you’re…”
viertys OP t1_je9mpwr wrote
Reply to comment by itsyourboiirow in [D] Improvements/alternatives to U-net for medical images segmentation? by viertys
I didn't mention it in the post, but I'm using the albumentations module. I rotate, shift, blur, horizontally flip, downscale and add Gaussian noise. I get around 400 images after doing this. Is there anything you would suggest?
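Roughly what that pipeline looks like (a minimal sketch; the probabilities and limits are placeholders, not my exact values):

import albumentations as A

# Augmentation pipeline sketch - parameter values are illustrative only
transform = A.Compose([
    A.Rotate(limit=15, p=0.5),
    A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=0, p=0.5),
    A.Blur(blur_limit=3, p=0.3),
    A.HorizontalFlip(p=0.5),
    A.Downscale(scale_min=0.5, scale_max=0.9, p=0.2),
    A.GaussNoise(var_limit=(10.0, 50.0), p=0.3),
])

# image and mask are numpy arrays (H, W[, C]) loaded elsewhere;
# the mask gets the same spatial transforms as the image
augmented = transform(image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]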
viertys OP t1_je9mocy wrote
Reply to comment by deep-yearning in [D] Improvements/alternatives to U-net for medical images segmentation? by viertys
I have an accuracy of 98.50% and a Dice score of around 0.30-0.65 for each image
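For reference, the Dice number comes from something like this (a sketch, not necessarily my exact implementation; it also shows why pixel accuracy can look great while Dice stays low when the foreground is a tiny fraction of the image):

import numpy as np

def dice_score(pred, target, eps=1e-7):
    # Dice coefficient for binary 0/1 masks
    intersection = np.sum(pred * target)
    return (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)

# Toy illustration: a 100x100 image with a 5x5 foreground region
# that the model misses entirely
target = np.zeros((100, 100))
target[:5, :5] = 1
pred = np.zeros((100, 100))
accuracy = np.mean(pred == target)  # 0.9975 -> "99.75% accurate"
dice = dice_score(pred, target)     # ~0.0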
viertys OP t1_je9mlvj wrote
Reply to comment by Adventurous-Mouse849 in [D] Improvements/alternatives to U-net for medical images segmentation? by viertys
I didn't mention it in the post, but I'm using the albumentations module. I rotate, shift, blur, horizontally flip, downscale and add Gaussian noise. I get around 400 images after doing this. Is there anything you would suggest?
viertys OP t1_je9mjxr wrote
Reply to comment by trajo123 in [D] Improvements/alternatives to U-net for medical images segmentation? by viertys
Thanks a lot! I will try SMP
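A minimal sketch of what I'll try with segmentation_models_pytorch (the encoder and channel choices below are placeholders, not a recommendation):

import segmentation_models_pytorch as smp

# U-Net with a pretrained encoder; values are illustrative only
model = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    in_channels=1,   # e.g. single-channel medical images
    classes=1,       # binary segmentation
)
loss_fn = smp.losses.DiceLoss(mode="binary")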
ZestyData t1_je9mf4n wrote
Reply to comment by matthkamis in [D] Can large language models be applied to language translation? by matthkamis
Indeed. But that's the answer to your question. Don't downvote me for it.
"Can large language models be applied to language translation". Yes, they already are.
jan_antu t1_je9me91 wrote
Reply to comment by TheAdvisorZabeth in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
I read a few of your posts. It seems like you're having a break from reality. I'm a scientist but not a psychologist; I think you should speak with one, or a psychiatrist. Things may be fine for now but you don't want to end up hurting yourself or someone else by accident as this progresses.
matthkamis OP t1_je9m25q wrote
Reply to comment by ZestyData in [D] Can large language models be applied to language translation? by matthkamis
And it doesn’t work well
ZestyData t1_je9ly2p wrote
.. Uh. I'm going to assume you're relatively new to the world of ML. Translation is one of the most common uses for SOTA LLMs.
It's how Google Translate works, to take just the most famous example.
What the SOTA translation tools don't yet use is instruct-tuning to give them conversational interfaces (i.e. the difference between GPT and ChatGPT). So they look different from using ChatGPT. But it's largely the same Generative (technically Pretrained) Transformers under the hood.
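For a concrete example of the non-instruct route, the standard Hugging Face usage looks roughly like this (a minimal sketch; the checkpoint is just one of the public Marian en-de translation models):

from transformers import pipeline

# A pretrained encoder-decoder translation model (not instruct-tuned):
# you feed it source text directly, no conversational prompt needed.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

result = translator("Large language models are widely used for translation.")
print(result[0]["translation_text"])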
RicketyCricket t1_je9lp7z wrote
Reply to comment by _Arsenie_Boca_ in [D] Alternatives to fb Hydra? by alyflex
Mainly that Spock is much lighter weight and really focuses on just configuration management and statefulness. Hydra has all these crazy bells and whistles (Ray integration, etc.) that could be useful for certain things, but it kinda starts meandering from the original purpose of configuration management, imo. Hydra is great, and if it works for you then use it. We built Spock internally when I was at Fidelity because Hydra didn't exist… it just so happens that FB/Meta was doing the same thing at the same time, so both libraries end up covering a very similar usage space
cc-test t1_je9knor wrote
Reply to comment by RobertAngelo in [D] What do you think about all this hype for ChatGPT? by Dear-Vehicle-3215
Where did I make that claim?
I'm dumb as hell, just have enough brain capacity to be a senior SWE in fintech, which definitely doesn't require you to be some kind of genius.
Thanks for your input, I guess...
EquipmentStandard892 t1_je9kmvi wrote
Reply to comment by saintshing in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
This is exactly what I was talking about. I'm studying llama.cpp to understand how this whole ML/LLM world works, and I've found it's pretty "simple" in terms of the programming itself. I'm a software engineer outside the ML field, and it was pretty interesting to do this deep dive. I'll take a deeper look into this RWKV proposal and maybe build something on top of it to test. If I find something interesting I'll comment here 😊
cc-test t1_je9kdtr wrote
Reply to comment by sEi_ in [D] What do you think about all this hype for ChatGPT? by Dear-Vehicle-3215
Unfortunately I read the whole thing; it's incoherent, and you seem to assume the reader of your comment lives inside your head when you make vague references.
Educational_Ice151 t1_je9k2u8 wrote
Meta.
SleekEagle t1_je9jlkg wrote
I think hallucination is a serious concern in some fields but for general business-y creative work it's going to be a game changer. Just look at Jasper - a $100M series A.
EDIT: This corresponds to GPT-4 more than ChatGPT
sEi_ t1_je9j5md wrote
Reply to comment by cc-test in [D] What do you think about all this hype for ChatGPT? by Dear-Vehicle-3215
Did you read more than the first line? Did it fit your assumption from the first line?
Any constructive comments?
saintshing t1_je9iw85 wrote
Reply to comment by EquipmentStandard892 in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Jeremy Howard tweeted about this new model that is an RNN but can be trained in parallel. I haven't read the details, but it seems people are hyped that it can bypass the context length limit.
>RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.
>So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).
https://github.com/BlinkDL/RWKV-LM#the-rwkv-language-model-and-my-tricks-for-lms
https://twitter.com/BlinkDL_AI/status/1638555109373378560
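A toy sketch of the recurrence idea (heavily simplified from the real RWKV time-mixing, ignoring the per-channel decay and bonus terms; it just shows that step t+1 only needs the running state from step t, so memory stays constant in sequence length):

import numpy as np

def rnn_mode_step(state, k_t, v_t, w):
    # One "RNN mode" step of a simplified, attention-free WKV-style recurrence.
    # state = (a, b): decayed running sum of weighted values and of weights.
    a, b = state
    a = np.exp(-w) * a + np.exp(k_t) * v_t
    b = np.exp(-w) * b + np.exp(k_t)
    out_t = a / b  # normalized output at this position
    return out_t, (a, b)

# Only (a, b) is carried forward, which is what lets the "RNN mode"
# stream over arbitrarily long context.
d, w = 8, 0.5
state = (np.zeros(d), np.full(d, 1e-9))
for t in range(1000):
    k_t, v_t = np.random.randn(d), np.random.randn(d)
    out, state = rnn_mode_step(state, k_t, v_t, w)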
RobertAngelo t1_je9i4wn wrote
Reply to comment by cc-test in [D] What do you think about all this hype for ChatGPT? by Dear-Vehicle-3215
Experienced ≠ smart
icedrift t1_je9i0wk wrote
Reply to comment by xander76 in [P] Imaginary programming: implementation-free TypeScript functions for GPT-powered web development by xander76
What I mean is, why generate the function when only the data needs to be generated? Let's say I need a function that takes the text content of a post and returns an array of recommended flairs for the user to click. Why do this
/**
 * This function takes a passage of text, and recommends up to 8
 * unique flairs for a user to select. Flairs can be thought of as labels
 * that categorize the type of post.
 *
 * @param textContent - the text content of a user's post
 *
 * @returns an array of flairs represented as strings
 *
 * @imaginary
 */
declare function recommendedFlairs(textContent: string): Promise<string[]>
When you could write out the function and only generate the data?
async function recommendedFlairs(textContent: string): Promise<string[]> {
  const OAIrequest = await someRequest(textContent);
  const flairs = formatResponse(OAIrequest);
  return flairs;
}
In writing all this out I think I figured it out. You're abstracting away a lot of the headaches that come with trying to get the correct outputs out of GPT?
Amster2 t1_je9h981 wrote
Reply to comment by Purplekeyboard in [R] The Debate Over Understanding in AI’s Large Language Models by currentscurrents
I'm not sure they aren't conscious. They can clearly reference themselves, and seem to understand that they are an LLM with an information cutoff in 2021, etc.
It behaves like it is self-conscious. How can we determine whether they really are or not?
NoLifeGamer2 t1_je9gi5u wrote
I recommend using bootstrapping to create more datapoints, then approving the ones you like and adding them to the dataset. Then train on the larger dataset.
EquipmentStandard892 t1_je9gc9y wrote
Reply to comment by saintshing in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
I've already seen LangChain and it's truly amazing. The issue I've encountered and was trying to overcome is more of an architectural problem actually: the token context span limit. I was looking to add a layer on top of the transformer architecture to bypass this limitation. I've seen MRKL is able to handle longer contexts, even claiming unlimited context span, although I need to study it more. I was not thinking about prompt engineering at all.
KingsmanVince t1_je9g82b wrote
We are clearly not tired of ChatGPT posts. As a matter of fact, we really want you to speak more about it. /s
ZestyData t1_je9n2f0 wrote
Reply to comment by matthkamis in [D] Can large language models be applied to language translation? by matthkamis
¯\_(ツ)_/¯
not that aggressive to have assumed but ok