Recent comments in /f/MachineLearning

michaelthwan_ai OP t1_jcxrcfu wrote

I agree with you. Three thoughts from me:

- I think one direction for so-called safe AI that gives genuine answers is to ground it in factual/external information. I mean 1) a retrieval-based model like searchGPT, or 2) API calling like Toolformer (e.g., checking a weather API).

- An LLM is essentially solving a compression problem (an idea I picked up from Lambda Labs), so it cannot remember everything. An efficient workaround is retrieval: search a very large space (like PageRank/Google search), obtain a smaller result set, and let the LLM organize and filter the relevant content from it.

- Humans are basically like that, right? Given a query, we may need to read books (external retrieval), which is pretty slow. However, humans have a cool feature, long-term memory, to store things permanently. Imagine if an LLM could select the appropriate parts of your queries/chats and store them as text or a knowledge base: that would be a knowledge profile that permanently remembers the context shared between you and the AI, instead of the current situation where ChatGPT forgets everything after a restart. (A minimal sketch of this retrieval-plus-memory idea follows below.)
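A rough sketch of that retrieval-plus-memory idea, using toy word-overlap retrieval and a local JSON file standing in for the permanent knowledge profile (all helper names and the file name are hypothetical; a real system would use a proper search index and an actual LLM API):

```python
import json
from pathlib import Path

MEMORY_FILE = Path("memory.json")  # hypothetical persistent "knowledge profile"

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Toy retrieval: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def load_memory() -> list[str]:
    """Facts stored in previous sessions; survives restarts unlike chat context."""
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

def remember(fact: str) -> None:
    """Append a selected fact to the persistent profile."""
    MEMORY_FILE.write_text(json.dumps(load_memory() + [fact]))

def build_prompt(query: str, corpus: list[str]) -> str:
    """Combine retrieved snippets and long-term memory into one prompt."""
    context = retrieve(query, corpus) + load_memory()
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

# The resulting prompt would then be passed to the LLM, which only has to
# organize and filter the retrieved snippets rather than memorize everything.
```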

3

Alternative_iggy t1_jcxrabk wrote

I guess I’m at year 10+ now. In the last few years I’ve switched back to academia/ research!

I started out in research, where I was happy but made no money; switched to a startup, where I made a little more money but often got bored of the problems; when the startup got bought by a bigger company, found myself working on sometimes WAY cooler problems but dealing with a lot of bureaucracy; moved up into management roles for a bit; and now have hopped back to straight-up research, where I'm incredibly happy and have a lot more flexibility.

6

darthstargazer t1_jcxqcgw wrote

Subject: Variational inference and generative networks

I've been trying to grasp the ideas behind variational autoencoders (Kingma et al.) vs. normalizing flows (e.g., RealNVP).

If someone can explain the link between the two I'd be thankful! Aren't they trying to do the same thing?

1

henkje112 t1_jcxlc7t wrote

Look into Convolutional Neural Networks as your architecture type and different types of spectrograms as your input features. The different layers of the CNN should do the feature transformation, and your final layer should be dense, with a softmax (or any other desired) activation function.
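A minimal sketch of that setup, assuming PyTorch and a 1×128×128 mel-spectrogram input (the shapes, layer sizes, and class count are placeholders):

```python
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    """Two conv blocks for feature transformation, then a dense softmax layer."""
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 32 * 32, n_classes)  # 128 -> 64 -> 32 after pooling

    def forward(self, x):                    # x: (batch, 1, 128, 128)
        h = self.features(x).flatten(1)      # (batch, 32 * 32 * 32)
        return torch.softmax(self.classifier(h), dim=1)
```

In practice you would usually return raw logits and let nn.CrossEntropyLoss apply the softmax internally; the explicit softmax here just mirrors the description above.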

2

SnooMarzipans3021 t1_jcxk1a3 wrote

Hello, does anyone have experience with vision transformers?

I get weird grid artifacts, especially on white/bright, textureless walls or sky.

Here is what it looks like: https://imgur.com/a/dwF69Z3
I'm using the MAXIM architecture: https://github.com/vztu/maxim-pytorch

My general task is image enhancement (making the image prettier).
I have also tried simple GAN methods (https://github.com/eezkni/UEGAN), which don't have such issues.

I have researched a bit but I'm unable to formulate this problem properly. I have found that guided filters might help here but haven't tested them yet. Thanks
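Not a tested fix, but a quick guided-filter post-processing sketch along those lines (requires opencv-contrib-python; file names and parameters are placeholders):

```python
import cv2

# Use the network input as the guide and the enhanced output as the source,
# so flat regions (walls, sky) get smoothed while edges present in the
# original image are preserved.
original = cv2.imread("input.png")      # network input
enhanced = cv2.imread("enhanced.png")   # MAXIM output with grid artifacts

filtered = cv2.ximgproc.guidedFilter(
    guide=original,
    src=enhanced,
    radius=8,
    eps=650.0,  # roughly (0.1 * 255)**2 for 8-bit images; tune to taste
)
cv2.imwrite("enhanced_filtered.png", filtered)
```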

1

henkje112 t1_jcxjx44 wrote

I'm assuming you're using sklearn for LinearRegression. You're initializing an instance of the LinearRegression class with a normalize parameter, but this is not valid for this class (for a list of possible parameters, see the documentation).

I'm not sure what you're trying to do, but I think you want to normalize your input data? In that case you should look at MinMaxScaler. This transforms your features by scaling each feature to a given range.
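A small sketch of that approach, assuming scikit-learn (the toy data is just for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Toy data; your X and y will differ.
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
y = np.array([1.0, 2.0, 3.0])

# Scale each feature to [0, 1] first, then fit the regression,
# instead of passing a `normalize` argument to LinearRegression.
model = make_pipeline(MinMaxScaler(), LinearRegression())
model.fit(X, y)
print(model.predict([[1.5, 250.0]]))
```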

1

fromnighttilldawn t1_jcxgr6b wrote

I don't read any of the papers because there is basically no way to re-implement them or independently verify them. People can feel shocked, surprised, amazed, or enlightened all they want while reading a paper, but in truth people still have no idea how any of this truly works.

Before, at least you had a mathematical model to work with, showing that even at small scale the idea could lead to something that works as promised in a larger-scale ML model.

Nowadays OpenAI could claim that Jesus came back and cleaned the data for their model, and we would have no way to verify the veracity of that claim.

18

Civil_Collection7267 t1_jcx9jri wrote

LLaMA 13B/30B and LLaMA 7B with the Alpaca LoRA are the best that can be run locally on consumer hardware. LLaMA 65B exists but I wouldn't count that as something that can be run locally by most people.

From my own testing, the 7B model with the LoRA is comparable to 13B in coherency, and it's generally better than the recently released OpenAssistant model. If you'd like to see some examples, I answered many prompts in an r/singularity AMA for Alpaca. Go to this post and sort by new to see the responses. I continued where the OP left off.

10

mike94025 t1_jcx5xvg wrote

Data type?

SDPA currently has three kernel implementations, selected by a kernel picker:

  • sdpa_math
  • sdpa_flash
  • sdpa_mem_eff

The kernel picker picks the best one given your constraints:

  • Math is the trusted kernel from the equation in the paper.
  • Flash only works for FP16 and BF16, and on SM80 (e.g., A100).
  • The mem_efficient kernel works on older architectures and supports FP32, but the upside is limited by the lower FP32 compute capacity; FP16 or BF16 should help. Also, there are requirements on alignment, dropout values, etc. to qualify for the high-performance SDPA implementations; dropout is required to be 0 in PT 2.0.

Also, different kernels parallelize across different dimensions, so B=1 will not work with all of those kernels.

In a nutshell, performance comes at the price of generality; GPUs are finicky about when they deliver that performance, so inputs must adhere to those requirements, and parallelization strategies matter for different combinations of dimensions.
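For illustration, a hedged sketch of how the picker can be constrained in PyTorch 2.0 to check whether your inputs qualify for the flash kernel (assumes a CUDA device and FP16 inputs; shapes are placeholders):

```python
import torch
import torch.nn.functional as F

# Shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# The picker normally chooses among math / flash / mem_efficient; restricting
# it to flash makes the call fail loudly if the inputs don't qualify.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0)
```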

5

kaikraul t1_jcx0hhm wrote

Yes, but you're working with older models. The advantage is that you can adapt them to yourself and get results you can't get with any of the online models. But they lag behind the current models, and I just wonder: is it worth it? By the time I have downloaded the whole thing, trained it, and gotten results, the field is already 2-3 versions ahead. I'm always behind on quality.

2

disastorm t1_jcwyjyv wrote

I noticed that "text-generation" models have variable output, but a lot of other models, like chatbots, often give the exact same response for the same input prompt. Is there a reason for this? Is there perhaps a setting that would allow a chatbot, for example, to have variable responses, or is my understanding of this just wrong?

1