Recent comments in /f/MachineLearning
yaosio t1_je56tet wrote
Reply to [Discussion] IsItBS: asking GPT to reflect x times will create a feedback loop that causes it to scrutinize itself x times? by RedditPolluter
There's a limit, otherwise you would be able to ask it to self-reflect on anything and always get a correct answer eventually. Finding out why it can't get the correct answer the first time would be incredibly useful. Finding out where the limits are and why is also incredibly useful.
ChuckSeven t1_je55o02 wrote
Reply to comment by Haycart in [D] Very good article about the current limitations of GPT-n models by fripperML
The Transformer is not a universal function approximator. This is shown simply by the fact that it cannot process arbitrarily long inputs, due to its finite context window.
Your conclusion does not follow obviously, or even likely, from your facts. It reads like hindsight given the strong performance of large models.
It's hard to think of ChatGPT as a very large transformer ... because we don't know how to think about very large transformers.
mkffl t1_je53xf1 wrote
Reply to [D] FOMO on the rapid pace of LLMs by 00001746
What impact has GPT delivered beyond sparking interest in generative models among the general population (which is not insignificant)? Not much, so there's potentially a lot of work needed to turn it into something useful, and I would focus on that.
Jean-Porte t1_je53acs wrote
Reply to [D] Alternatives to fb Hydra? by alyflex
This is much lighter: it's a pure-Python config/flow manager I made where you can chain experiment classes by adding them (xp1() + xp2()): https://github.com/sileod/xpflow
r_linux_mod_isahoe t1_je5107o wrote
If something lets you replicate someone's IP with a few trivial steps, and you use it to get financial gains, you will be in trouble.
[deleted] t1_je50yk7 wrote
Reply to [D] Alternatives to fb Hydra? by alyflex
[removed]
harharveryfunny t1_je50vw9 wrote
Reply to [Discussion] IsItBS: asking GPT to reflect x times will create a feedback loop that causes it to scrutinize itself x times? by RedditPolluter
There's no indication that I've seen that it maintains any internal state from one generated word to the next. Therefore the only way it can build upon its own "thoughts" is by generating "step-by-step" output that is fed back into it. Its own output seems to be its only working memory, at least for now (GPT-4), although that's an obvious area for improvement.
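The "output as working memory" idea above can be sketched in a few lines. This is a hedged toy sketch, not how GPT-4 is actually served: `generate_next_token` is a hypothetical stand-in for an LLM call, and the point is only that a stateless model's entire "state" is the transcript it is re-fed on each step.

```python
def generate_next_token(transcript):
    # Toy stand-in for a language model: emits a step counter
    # derived purely from the transcript it is given.
    step = transcript.count("Step")
    return f"Step {step + 1}."

def think_step_by_step(prompt, n_steps):
    transcript = prompt
    for _ in range(n_steps):
        # The full transcript (prompt + everything generated so far)
        # is passed back in each time; nothing persists outside it.
        transcript += " " + generate_next_token(transcript)
    return transcript

print(think_step_by_step("Let's think. Go.", 3))
```

If the model kept hidden state between calls, the loop wouldn't need to resend the transcript; because it doesn't, "step-by-step" prompting is effectively a way of externalizing memory into the context window.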
deep-yearning t1_je4zmdc wrote
Reply to "[D]" Is wandb.ai worth using? by frodo_mavinchotil
Yes it's worth using.
WindForce02 t1_je4zh7m wrote
Reply to comment by lostmsu in [D] Prediction time! Lets update those Bayesian priors! How long until human-level AGI? by LanchestersLaw
I don't know if IQ is a good metric here, because LLMs replicate their training data, so it's likely that the training data (which is very large) contains information about IQ tests. It would be an indirect comparison, because you'd be comparing sheer quantity of training data with a person's ability to produce thoughts. It would be far more interesting to give GPT-4 complex situations that require advanced problem-solving skills. Say you get a message that you need to decode: it has multiple layers of encryption and you only have a few hints about how to go about it. Since there's no way to produce the answer from previous training data, I'd be curious to see how far it gets. Or take a hacking CTF, which requires not only pure coding skill but also a creative thought process.
itshouldjustglide t1_je4yj0x wrote
Reply to [Discussion] IsItBS: asking GPT to reflect x times will create a feedback loop that causes it to scrutinize itself x times? by RedditPolluter
It seems capable of handling the request, but it's hard to tell how much of this is just a trick of the light and whether it's actually doing the reflection. It would probably help to know more about how the model actually works.
Hackerjurassicpark t1_je4xk41 wrote
Reply to "[D]" Is wandb.ai worth using? by frodo_mavinchotil
Mlflow
RubenC35 t1_je4xj87 wrote
The weights are part of the model, so they will fall under the same umbrella. You cannot use the weights without the model: it is created internally even if you cannot see it.
U03B1Q t1_je4xekj wrote
https://dl.acm.org/doi/10.1145/3324884.3416545
There was an ASE paper that found that even under identical hyperparameter and seed settings, networks showed a variance of about 2% due to non-determinism in the parallel computing workflow. If they chose to retrain instead of copying the old numbers, this performance discrepancy is in line with that work.
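One root cause of that kind of non-determinism is that parallel reductions accumulate floats in an order that varies with scheduling, and float addition is not associative. A hedged, CPU-only illustration (a seeded shuffle stands in for the varying accumulation order of a real parallel reduction):

```python
import random

# Values chosen so order matters: 1e16 + 1.0 rounds back to 1e16,
# so whether the small terms "survive" depends on summation order.
vals = [1e16, 1.0, -1e16] * 500

def reduce_in_order(xs, seed):
    # Simulates a parallel reduction whose accumulation order
    # depends on scheduling (modeled here as a seeded shuffle).
    xs = list(xs)
    random.Random(seed).shuffle(xs)
    total = 0.0
    for x in xs:
        total += x
    return total

results = {reduce_in_order(vals, s) for s in range(20)}
print(len(results))  # typically > 1: same data, different sums
```

On GPUs the effect is amplified by atomics and non-deterministic kernel choices, which is why identical seeds don't guarantee identical trained weights.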
rshah4 t1_je4ved0 wrote
Reply to comment by royalemate357 in [D] Do model weights have the same license as the modem architecture? by murphwalker
I agree with this as well. Just because Meta isn't enforcing their license, this does not mean the license has gone away. At some point in the future, Meta could enforce it.
alyflex t1_je4uq2y wrote
Reply to comment by SnooMarzipans3021 in [D] Simple Questions Thread by AutoModerator
Another solution is to use a memory-efficient neural network: https://arxiv.org/pdf/1905.10484.pdf With this type of network you can easily fit images of that size. The problem with them is that they are very difficult to implement (you have to code up the backpropagation manually), so depending on your math proficiency and ambitions this might just be too much.
alyflex t1_je4u0rr wrote
Reply to comment by RecoilS14 in [D] Simple Questions Thread by AutoModerator
It really depends on what you intend to use this for. There are many sides to machine learning, but you don't have to know all of them. To name a few very different concepts:
- MLOps (Coursera has an excellent series on this)
- Reinforcement learning
- GANs
- Graph neural networks
I would say that once you have an idea of what most of these topics involve, it's time to dive into some of them actively, by trying to code up solutions yourself or by downloading well-known GitHub projects and trying to run them.
kaphed OP t1_je4slfg wrote
Reply to comment by suflaj in Variance in reported results on ImageNet between papers [D] by kaphed
thanks!
keepthepace t1_je4s97b wrote
Honestly, at this point I am not sure weights can be copyrighted: they have no human "author". It is a total gray zone. Courts will rule in a few years, and the habits formed now will become the jurisprudence.
SzilvasiPeter t1_je4pknf wrote
Reply to comment by cegras in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
Should I bet a coffee? No way... that is too big a stake.
[deleted] t1_je4of2r wrote
Reply to comment by W_O_H in [D] Build a ChatGPT from zero by manuelfraile
[removed]
Philpax t1_je4nils wrote
Changing the video player you're using to watch a movie doesn't make the movie any less copyrighted; the same kind of mechanics would apply here.
emissaryo t1_je4jvzt wrote
Reply to [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
Now I'm even more concerned about privacy. Governments will use it for surveillance and the more modalities we add, the more surveillance there will be.
obolli t1_je4juzh wrote
Reply to comment by mlresearchoor in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
I think they used to. Things change when you come under the pressure of returning profits.
WarmSignificance1 t1_je57s2a wrote
Reply to comment by trajo123 in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
So I actually think that senior devs copy and paste a lot less than everyone imagines.
I can’t remember the last time I’ve copied code from StackOverflow. Actually, I rarely even use StackOverflow at this point. Going directly to the official docs is always best.