Recent comments in /f/MachineLearning

rylo_ren_ t1_jcvak4c wrote

Hi everyone! This is a simple troubleshooting question. I'm in a master's program that uses Python, and I keep running into an issue when I run this code for a linear regression model:

    airfares_lm = LinearRegression(normalize=True)
    airfares_lm.fit(train_X, train_y)

    print('intercept ', airfares_lm.intercept_)
    print(pd.DataFrame({'Predictor': X.columns, 'coefficient': airfares_lm.coef_}))

    print('Training set')
    regressionSummary(train_y, airfares_lm.predict(train_X))
    print('Validation set')
    regressionSummary(valid_y, airfares_lm.predict(valid_X))

It keeps returning this error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    /var/folders/j1/1b6bkxw165zbtsk8tyf9y8dc0000gn/T/ipykernel_21423/2993181547.py in <cell line: 1>()
    ----> 1 airfares_lm = LinearRegression(normalize=True)
          2 airfares_lm.fit(train_X, train_y)
          3
          4 # print coefficients
          5 print('intercept ', airfares_lm.intercept_)

    TypeError: __init__() got an unexpected keyword argument 'normalize'

I'm really lost; any help would be greatly appreciated! I know there are other ways to do this, but I was hoping to use this technique since it's the primary way my TA codes regression models. Thank you!
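For anyone hitting the same thing: `normalize` was deprecated in scikit-learn 1.0 and removed in 1.2, which is exactly what this TypeError is saying. A minimal sketch of the documented replacement, scaling features in a pipeline (reusing the variable names above; the coefficients will then be in scaled units, not an exact reproduction of the old normalize=True behavior):

    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # scikit-learn >= 1.2: scale explicitly instead of normalize=True
    airfares_lm = make_pipeline(StandardScaler(), LinearRegression())
    airfares_lm.fit(train_X, train_y)

    # the fitted LinearRegression is the last step of the pipeline
    lm = airfares_lm[-1]
    print('intercept ', lm.intercept_)
    print('coefficients ', lm.coef_)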

1

BalorNG t1_jcv99cz wrote

Just like humans, LLMs learn patterns and relationships, not "facts", unless you force memorization by repeating training data over and over, which degrades other aspects of the system.

So LLMs should be given all the tools humans use to augment their thought (spreadsheets, calculators, databases, CADs, etc.) and be allowed to interface with them quickly and efficiently.

11

mike94025 t1_jcv94un wrote

SDPA is used by F.multi_head_attention_forward (if need_weights=False), which is used by nn.MHA and nn.Transformer*, as well as by other libraries.

Public service announcement: need_weights defaults to True, and guts performance, because allocating and writing the attention-weight tensor defeats the memory-bandwidth advantage of flash attention.

Also, if `key_padding_mask is not None`, performance will suffer, because the padding mask is converted into an attention mask and only the causal attention mask is supported by Flash Attention. Use nested tensors for variable-sequence-length batches.
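A minimal sketch of that fast path through nn.MultiheadAttention (sizes are illustrative):

    import torch
    import torch.nn as nn

    mha = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
    x = torch.randn(4, 128, 256)  # (batch, seq, embed)

    # need_weights=False skips materializing the attention-weight tensor,
    # letting the fused SDPA kernel run; the second return value is None
    out, attn_weights = mha(x, x, x, need_weights=False)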

1

mike94025 t1_jcv83hu wrote

Yes - use the backend context manager to disable all other backends and verify that you're running the one you want. (If that backend can't actually be used, you'll get an error instead of a silent fallback, since all the others are disabled.)

The SDPA context manager is intended to facilitate debugging (for perf or correctness) and is not (and should not be) required for normal operational usage.

Check out the SDPA tutorial at https://pytorch.org/tutorials/intermediate/scaled_dot_product_attention_tutorial.html#explicit-dispatcher-control
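A minimal sketch of that explicit dispatcher control, assuming PyTorch 2.0 and a CUDA device:

    import torch
    import torch.nn.functional as F

    # (batch, heads, seq, head_dim), half precision for the flash kernel
    q = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
    k, v = torch.randn_like(q), torch.randn_like(q)

    # Disable every backend except FlashAttention; if flash can't run here,
    # this raises an error rather than silently falling back
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=False, enable_mem_efficient=False
    ):
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)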

1

Secret-Fox-5238 t1_jcv5dhh wrote

This is completely false. Elastic was invented by SQL: you use things like "LIKE" and a few other choice keywords. Just Google them, or go to Microsoft directly and look at SQL SELECT statements. You can string together CTEs, which immediately gives you elasticity. So, sorry, but this is a nonsensical response.

−5

nenkoru t1_jcus6rg wrote

Opened a few issues and a pull request against the source code adding support for DuckDuckGo. So if anyone is willing to ditch Bing (and, in the future, OpenAI) as a dependency, keep an eye on this project.

I liked that it all lives in the terminal. No need to open a browser to ask questions. Pretty useful for searching without switching cognitive context from a vim tab with the code to a browser. In December I did something similar with just a wrapper around OpenAI completion and asked it coding questions. In combination with codequestion it was pretty useful. This one (XOXO) makes for a much more pleasant experience.


Cheers!

11

millenial_wh00p t1_jcuq0zo wrote

No, unfortunately most of my work is with tabular data, plus a bit of computer vision; I haven't looked into applications of language models in that area. In theory, tokenization in language models shouldn't be much different from features in tabular/imagery data. There are probably some parallels worth exploring there; I'm just not aware of any papers.

7

alfredr OP t1_jcuoqg4 wrote

Point taken on the "gold rush". My background is CS theory, so the incorporation of combinatorial methods feels right at home. Along these lines, are you aware of any work incorporating (combinatorial) logic verification into generative language models? The end goal would be improved argument synthesis (mathematical proofs, say).

4

millenial_wh00p t1_jcun8jw wrote

Well, beware open-ended questions about AI/ML research in the current "gold rush" environment. If you're into explainability and interpretability, some folks are looking into combinatorial methods for features and their interactions to predict data coverage. This, plus Anthropic's papers, starts to open up new ground in interpretability for CV.

https://arxiv.org/pdf/2201.12428.pdf

11

alfredr OP t1_jcumejc wrote

I'm an outsider interested in learning the landscape, so my intent is to leave the question open-ended, but I'm broadly interested in architectural topics like layer design, attention mechanisms, regularization, and model compression, as well as bigger-picture considerations like interpretability, explainability, and fairness.

9

millenial_wh00p t1_jcuksz1 wrote

What aspects? New models? Interpretability? Pipelines and scalability? Reinforcement learning? Data assurance? Too many subfields to narrow this question down to a decent list, imo.

With that said, my subfield is assurance, and some of Anthropic's work on interpretability and privileged bases is extremely interesting. Their toy-models paper and the one they released last week about privileged bases in the transformer residual stream present a very novel way of thinking about model explainability.

29

Jonathan358 t1_jcuh7ya wrote

Hello, I have a very simple question but can't find any info on it:

How do I create an exponential range for hyperparameter values to be tuned, e.g. from 2 to 64, increasing in powers of 2?

I'm not looking for a complicated solution involving lists, etc.

    ff_dim = hp.Int('ff_dim', min_value=2, max_value=64, step=n^2)

edit: solved with sampling="log"
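A minimal sketch of that fix, assuming KerasTuner's hp.Int (with sampling="log", step acts as the multiplier between successive values rather than a linear increment):

    import keras_tuner as kt

    hp = kt.HyperParameters()
    # With sampling="log", step is multiplicative, so the candidate
    # values are 2, 4, 8, 16, 32, 64 instead of a linear grid
    ff_dim = hp.Int("ff_dim", min_value=2, max_value=64, step=2, sampling="log")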

1

ninjasaid13 t1_jcufsqf wrote

I'm getting a new error:

    C:\Users\ninja\source\repos\alpaca.cpp>make chat
    process_begin: CreateProcess(NULL, uname -s, ...) failed.
    process_begin: CreateProcess(NULL, uname -p, ...) failed.
    process_begin: CreateProcess(NULL, uname -m, ...) failed.
    'cc' is not recognized as an internal or external command, operable program or batch file.
    'g++' is not recognized as an internal or external command, operable program or batch file.
    I llama.cpp build info:
    I UNAME_S:
    I UNAME_P:
    I UNAME_M:
    I CFLAGS:   -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2
    I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
    I LDFLAGS:
    I CC:
    I CXX:
    g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC chat.cpp ggml.o utils.o -o chat
    process_begin: CreateProcess(NULL, g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC chat.cpp ggml.o utils.o -o chat, ...) failed.
    make (e=2): The system cannot find the file specified.
    Makefile:195: recipe for target 'chat' failed
    make: *** [chat] Error 2
1