fasttosmile
fasttosmile t1_janaaex wrote
Reply to comment by harharveryfunny in [D] OpenAI introduces ChatGPT and Whisper APIs (ChatGPT API is 1/10th the cost of GPT-3 API) by minimaxir
GCP, speechmatics, rev, otter.ai, assemblyai etc. etc. offer similar or better performance, as well as streaming and much richer output.
fasttosmile t1_j97r2fc wrote
Reply to [D] Things you wish you knew before you started training on the cloud? by I_will_delete_myself
byobu > tmux
fasttosmile t1_j8v03xu wrote
Reply to comment by drinkingsomuchcoffee in [D] HuggingFace considered harmful to the community. /rant by drinkingsomuchcoffee
> I don't know what hackable means. You haven't defined it. I'm going to use the most generous interpretation to mean, you can modify it without impacting other places. Well you can do that if it's centralized, just copy paste it into your file and then edit it- that's no excuse to completely ban centralization! Alternatively decompose the centralized function more and only use the pieces you need.
Your definition of hackable is almost it. What’s missing is that being decentralized makes things much, much easier to understand because the code is very straightforward and doesn’t have to take 10 different things into account.
You can’t just copy paste a file if it’s centralized, you’ll have to copy paste multiple, and the main issue is it’s gonna take a while to understand which ones (and you'll have to modify the imports etc., unless you copy the entire repo! are you seriously suggesting that lmao) and what’s safe to modify inside of them. Decomposing is just going to make things more complicated for no gain.
Deep learning is about the details, and whenever you start breaking things apart and putting the details in different corners that’s how you end up with code that is hard to understand and people making mistakes and not understanding what’s going on.
> Maybe it should cause 100s of failures if it's a breaking change (a bug). That's a pretty good sign you really did screw something up.
It's a syntax/interface/some-other-not-fundamental bug. A real bug would have already been spotted when checking the test-set performance.
> No it's not. If new code uses a battle tested core, I don't have to review those parts as thoroughly. If it's copy pasted, I still have to review it and make sure they didn't copy an old version with bugs or slightly modified it and broke something. Sounds like this is common as many people have complained about dozens of bugs!
The way code is shown to be correct is by getting SOTA results. If it does that it is "battle tested". If it didn't do that no one would even think of merging it in the first place.
> Yep, you've identified a place where you shouldn't try to fit every idea under a single "Attention" class. That's just common sense programming, not an argument against writing good shared functions or classes.
It is an argument against having shared classes. At the same time, sure you can have some shared code, Huggingface does that.
> It can sometimes. But not always. Having one massive file named main.py is not more readable than a well split program. This seems like basic common sense to me, but here's an actual paper on the subject:
There is an important distinction that you're ignoring here. Having semantically separate objects in one file is indeed confusing. But if you put everything related to the model in one file, that simplifies things and reduces the working memory people require to read your code.
> Then why does the Bert module have changes as recent as this week with changes from dozens of authors going back years?
The recent change for Bert is some inference interface code which has to be kept common across all models. That’s their decision, I wouldn’t even do that, just make kwargs mandatory imo.
> Maybe you should check your assumptions before you make a fundamental decision (you know, basic engineering). There's plenty of forked libraries that are not modified and are forked for archival purposes. Nor should you cater to a small minority if most people aren't doing this.
Everyone in deep learning likes to gamble on making some tweaks to the model hoping they’ll get the next ICLR oral. Why else would they care about modifying the model code?
--
I suggest you go read some modeling code from different frameworks, one example is fairseq. I like fairseq, I think it's well done considering its aims and constraints. But you're crazy if you think it's easier to understand and modify the code for some specific model there than in huggingface. Here's the link to fairseq's roberta, you'll need to look at a dozen files to see what's happening. In contrast, huggingface is one file.
Spent too much time on this already, not gonna reply anymore.
fasttosmile t1_j8tudd4 wrote
Reply to comment by drinkingsomuchcoffee in [D] HuggingFace considered harmful to the community. /rant by drinkingsomuchcoffee
You don't need to explain what DRY is. You need to understand that there is an unavoidable trade-off between centralizing a codebase (creating shared functions/classes in modules that many other modules import from) and keeping it hackable.
fasttosmile t1_j7ws729 wrote
Reply to comment by DigThatData in [D] Using LLMs as decision engines by These-Assignment-936
So cool!
fasttosmile t1_j6jzvyw wrote
Reply to comment by uhules in [D] What's stopping you from working on speech and voice? by jiamengial
That does not make sense. You don't need kaldi to use the new libraries. And lhotse can be used totally independently of k2 or icefall.
fasttosmile t1_j6jk30j wrote
Reply to comment by jiamengial in [D] What's stopping you from working on speech and voice? by jiamengial
Everyone has been moving on from kaldi so it's a little weird to bring that up now.
If you're interested in modern formats for speech data look into lhotse.
fasttosmile t1_j5bh9am wrote
Reply to [D]Can a bachelor get a job in ML? by alphapussycat
Yes, but it's hard and you have to be good and get lucky. Internships are the way.
fasttosmile t1_j0w1bug wrote
Reply to comment by retard-moron in [D] Will there be a replacement for Machine Learning Twitter? by MrAcurite
lmao, fitting username.
fasttosmile t1_izgxj4n wrote
Reply to comment by SeucheAchat9115 in [D] Workflows for quickly iterating over ideas without free access to super computers by [deleted]
Careful. There are literally dozens of LMing papers that get an improvement on PTB which do not scale to larger datasets.
fasttosmile t1_ixw4q6i wrote
Reply to comment by [deleted] in [D] Pytorch or TensorFlow for development and deployment? by CodaholicCorgi
lmao
fasttosmile t1_ix4gti4 wrote
Reply to comment by parabellum630 in [R] Tips on training Transformers by parabellum630
This is a great reference to follow: https://github.com/karpathy/minGPT
fasttosmile t1_ix4er1a wrote
Reply to [R] Tips on training Transformers by parabellum630
None of the things you mentioned are close to as important as what your dataset is.
Also it's important to use AdamW with high weight decay.
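To make the "AdamW with weight decay" point concrete, here is a rough pure-Python sketch of a single AdamW step. The function name `adamw_step` and the default `weight_decay=0.1` are made up for illustration (the comment above doesn't give a number), and this is a toy on plain lists of floats, not any library's actual code. The thing to notice is that the decay shrinks the weight directly instead of being added to the gradient.

```python
import math

def adamw_step(params, grads, m, v, t, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.1):
    """Apply one AdamW update to lists of floats; t is the 1-based step count.

    Hypothetical helper for illustration; weight_decay=0.1 is an assumed value.
    """
    b1, b2 = betas
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Decoupled weight decay: shrink the weight itself, not the gradient.
        p = p * (1.0 - lr * weight_decay)
        # Exponential moving averages of the gradient and its square.
        m_i = b1 * m_i + (1.0 - b1) * g
        v_i = b2 * v_i + (1.0 - b2) * g * g
        # Bias correction for the zero-initialised averages.
        m_hat = m_i / (1.0 - b1 ** t)
        v_hat = v_i / (1.0 - b2 ** t)
        # Standard Adam step on top of the decayed weight.
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v

# Usage: one step on two weights with made-up gradients.
params, m, v = [2.0, -1.0], [0.0, 0.0], [0.0, 0.0]
params, m, v = adamw_step(params, [0.5, -0.25], m, v, t=1)
```

In practice you'd just use `torch.optim.AdamW` and set its `weight_decay` argument; which value counts as "high" depends on the model and dataset.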
fasttosmile t1_itiivv7 wrote
Reply to [D] Building the Future of TensorFlow by eparlan
Interesting they've decided to keep investing into it. I suppose with the amount of existing code they must have it was hard not to.
fasttosmile t1_it243op wrote
Reply to [D] is a strong background in math/stats/cs in a necessary condition for becoming a renowned researcher in the ML community? *A passive rant* by [deleted]
You skimmed through multiple one hour long videos?
lmao
fasttosmile t1_iqrolwa wrote
Reply to comment by ClearlyCylindrical in [Discussion] If we had enough memory to always do full batch gradient descent, would we still need rmsprop/momentum/adam? by 029187
There was for a while the belief that the stochasticity was key for good performance (one paper supporting the hypothesis from 2016). Your framing makes it sound like that is still the case - you suggest no other reason for not doing full batch descent - and I think it's important to point out it's not.
fasttosmile t1_iqrel80 wrote
Reply to comment by ClearlyCylindrical in [Discussion] If we had enough memory to always do full batch gradient descent, would we still need rmsprop/momentum/adam? by 029187
This is wrong, see: https://www.youtube.com/watch?v=kcVWAKf7UAg
The real reason is it's just faster to train on smaller batches (because the steps are quicker).
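A toy sketch of why that is, under heavy assumptions: noiseless 1-D linear regression, a made-up learning rate of 0.5, and hypothetical helper names (`grad`, `train_one_epoch`). Both runs touch every example exactly once, so the compute is the same, but the mini-batch run gets 8 updates out of that pass while full batch gets 1. It says nothing about wall-clock time on real hardware.

```python
def grad(w, batch):
    # Gradient of the mean of 0.5 * (w*x - y)^2 over the batch, w.r.t. w.
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def train_one_epoch(data, batch_size, lr=0.5, w=0.0):
    """Run one pass over data in fixed order; return (final w, number of updates)."""
    steps = 0
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        w -= lr * grad(w, batch)
        steps += 1
    return w, steps

TRUE_W = 3.0  # assumed ground-truth slope for the toy data
data = [(i / 64, TRUE_W * i / 64) for i in range(1, 65)]

# Same data, same number of per-example gradient evaluations (64 each).
w_full, steps_full = train_one_epoch(data, batch_size=len(data))
w_mini, steps_mini = train_one_epoch(data, batch_size=8)

print(f"full batch: {steps_full} update,  error {abs(w_full - TRUE_W):.3f}")
print(f"mini batch: {steps_mini} updates, error {abs(w_mini - TRUE_W):.3f}")
```

After one pass the mini-batch run ends up closer to the true slope, simply because it took more steps for the same amount of gradient computation.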
fasttosmile t1_japaes4 wrote
Reply to comment by MonstarGaming in [D] OpenAI introduces ChatGPT and Whisper APIs (ChatGPT API is 1/10th the cost of GPT-3 API) by minimaxir
To be fair, they are technically very competent and the pricing is very cheap. And their marketing is great.
But yeah dealing with B2B customers (where the money is) and integrating feedback from them is a very different thing than what they've been doing so far. They might be angling to serve as a platform for AI companies that then have to deal with average customers. That way they get to only deal with people who understand the limitations of AI. Could work. Will change the company to be less researchy though.