Recent comments in /f/MachineLearning

Screye t1_jcl549n wrote

Context length is also a hard limit on how many logical hops the model can make.

If each back-and-forth takes 500-ish tokens, then the model can only reason over 16 hops with 8k tokens of context, while 32k tokens allow 64 hops. This might allow for emergent behavior on tasks that were previously deemed impossible because they need at least a minimum number of reasoning hops.
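Back-of-the-envelope (the 500-token figure per hop is just an assumption):

```python
# Rough hop budget: how many reasoning "hops" fit in a context window,
# assuming each back-and-forth consumes ~500 tokens (an assumption, not a measurement).
TOKENS_PER_HOP = 500

for context_tokens in (8_000, 32_000):
    hops = context_tokens // TOKENS_PER_HOP
    print(f"{context_tokens} tokens -> ~{hops} hops")
# 8000 tokens -> ~16 hops
# 32000 tokens -> ~64 hops
```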

For what it's worth, I think memory retrieval will work just fine for 90% of scenarios and will stay relevant even at 32k tokens, especially if the wiki you are retrieving from is millions of lines long.

3

lucidraisin t1_jcl2rkh wrote

yea, literature is scant and all over the place in the efficient attention field. in this paper, i believe they claim it is query-key dimension (d_dot), but i think it should depend on the number of heads too. i don't know of any other papers that explore this topic. i just don't want people to be surprised if they fine tune to greater context lengths and things don't work as well as gpt4

2

lucidraisin t1_jcl0y16 wrote

it is important for everyone to know that there may be a capacity limit to the context length, as explored by this paper. gpt4 may not have this limit, but smaller variants like llama may. it also depends on the task you are trying to solve. you cannot just get 'infinite context', as some would sell you that their network can do. more research needed... hopefully pytorch 2.0 leads to that

27

Taenk t1_jckzuxm wrote

Sorry, I am not an expert, just an enthusiast, so this is a stupid question: Where can I see a list of these few hundred tests and is there some page where I can see comparisons between different models?

3

juliensalinas OP t1_jcky8ok wrote

You're welcome.

A token is a unit of text that can be a short word, part of a word, or a punctuation mark.
On average, 1 token is made up of 4 characters, and 100 tokens are roughly equivalent to 75 words.
Natural Language Processing models need to turn your text into tokens in order to process it.
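If you want to see it concretely, here is a rough sketch using a Hugging Face tokenizer (GPT-2's tokenizer is just an example choice, other models use different vocabularies):

```python
# Minimal sketch: counting tokens with a Hugging Face tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Natural Language Processing models need to turn your text into tokens."
tokens = tokenizer.tokenize(text)   # sub-word pieces
ids = tokenizer.encode(text)        # integer IDs the model actually sees

print(tokens)                       # e.g. ['Natural', 'ĠLanguage', 'ĠProcessing', ...]
print(len(ids), "tokens for", len(text.split()), "words")
```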

9

juliensalinas OP t1_jckwtdj wrote

No. If you want such a model to "remember" previous prompts, you will need to prepend them to each request you make.

The output can be up to 2048 tokens. But on a Tesla T4 you might not have enough VRAM, so you may be limited to 1024 tokens because the GPU will run out of memory above that.
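Roughly something like this (a minimal sketch, not our actual code; `model_generate` stands in for whatever call you make to the model):

```python
# Minimal sketch of "memory" by prepending previous exchanges to each new request.
history = []

def ask(model_generate, user_prompt, max_history=10):
    # keep only the most recent exchanges so the prompt stays under the context limit
    context = "\n".join(history[-max_history:])
    full_prompt = f"{context}\n{user_prompt}" if context else user_prompt
    answer = model_generate(full_prompt)
    history.append(user_prompt)
    history.append(answer)
    return answer
```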

9

juliensalinas OP t1_jcktk8o wrote

This is maybe something I'll focus on in the future. But for the moment I find this fp16 version well suited for small budgets as it runs on a 16GB GPU while the native fp32 version of GPT-J requires at least 24GB of VRAM.

Also, with the bitsandbytes integration in HF Transformers you can run the model in 8-bit: https://huggingface.co/blog/hf-bitsandbytes-integration
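For reference, the 8-bit loading looks roughly like this (assuming the EleutherAI/gpt-j-6B checkpoint and that bitsandbytes and accelerate are installed; a sketch, not a tuned setup):

```python
# Rough sketch: loading GPT-J in 8-bit via the bitsandbytes integration in HF Transformers.
# Requires `bitsandbytes` and `accelerate` to be installed, and a CUDA GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # let accelerate place the weights
    load_in_8bit=True,   # quantize linear layers to int8 with bitsandbytes
)

inputs = tokenizer("GPT-J is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```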

12

royalemate357 t1_jckqgsr wrote

Pretty sure the main improvement is "torch.compile", which can optimize your code with a nice easy one-liner. There are some other nice quality-of-life improvements like the built-in flash attention OP is using, and I think some distributed training stuff. But it's fully backwards compatible, which is great (looking at you, TensorFlow). https://pytorch.org/get-started/pytorch-2.0/#pytorch-2x-faster-more-pythonic-and-as-dynamic-as-ever
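For example (toy model, just to show the two features):

```python
# Sketch of the two PyTorch 2.0 features mentioned above.
import torch
import torch.nn.functional as F

model = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.ReLU())
compiled_model = torch.compile(model)   # the one-liner: compiles the model on first call
out = compiled_model(torch.randn(4, 128))

# Built-in fused scaled dot-product ("flash") attention.
q = k = v = torch.randn(1, 8, 64, 32)   # (batch, heads, seq, head_dim)
attn = F.scaled_dot_product_attention(q, k, v)
```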

43

Spiritual-Reply5896 t1_jckq519 wrote

Let's say the Linux kernel manual is embedded as memories. If we can get an accurate semantic representation of the question, then we should be able to find the relevant context in memory and use just enough of it to answer the question in far fewer tokens than providing the whole Linux manual as context. If we assume that computing the attention is as fast as vector search, then it's a no-brainer that retrieving only the relevant context from memory is a better approach than using the whole manual. It's of course a trade-off between accuracy and speed/scalability, but I'd argue it's a good trade-off, as text often isn't that information-dense.

The ability to produce semantically coherent embeddings from text is the bread and butter of LLMs, so why would it be any harder to retrieve these memories from an external, effectively infinite database than from the context window?

I'm just hypothesizing with my limited knowledge, please correct me if I make stupid assumptions :)
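What I mean by retrieval, as a rough sketch (sentence-transformers and the model name are just assumptions on my side, and the "manual" is a toy example):

```python
# Sketch of retrieving only the relevant chunks instead of stuffing the whole manual
# into the context window. The embedding model choice is arbitrary.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

manual_chunks = [
    "The scheduler decides which runnable task gets the CPU next.",
    "ext4 is a journaling file system for Linux.",
    "cgroups limit and account for resource usage of process groups.",
]
chunk_vecs = embedder.encode(manual_chunks, normalize_embeddings=True)

def retrieve(question, top_k=2):
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q_vec                # cosine similarity (vectors are normalized)
    best = np.argsort(scores)[::-1][:top_k]
    return [manual_chunks[i] for i in best]

print(retrieve("How does Linux pick the next process to run?"))
```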

2

f-d-t777 t1_jckpovo wrote

Subject: Spacecraft image analysis using computer vision


Hi guys,

I'm looking to develop a system that uses computer vision algorithms to analyze images captured by spacecraft cameras and identify potential safety hazards or security threats. For example, the system could detect debris or other objects in orbit that could pose a risk to spacecraft.

I am looking to do this using all AWS tools. I am pretty new to this and am developing a technology architecture project around this topic to present for a program I'm doing.

How would I go about approaching/doing this? I am looking to find/create my own mock datasets as well as present the algorithm/code I used to train my model. More specifically, I am focusing on these aspects for my project:

Preprocess the images: Improve their quality and prepare them for analysis. This could include cropping, resizing, and adjusting the brightness and contrast of the images.
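For the preprocessing part, I'm thinking of something like this OpenCV sketch (the file name and parameter values are placeholders, not tuned settings):

```python
# Rough preprocessing sketch with OpenCV: crop, resize, and adjust brightness/contrast.
import cv2

img = cv2.imread("frame.png")                      # placeholder input image

cropped = img[100:900, 200:1000]                   # crop a region of interest (y, x)
resized = cv2.resize(cropped, (640, 640))          # resize to the input size the model expects
adjusted = cv2.convertScaleAbs(resized, alpha=1.2, beta=10)  # contrast (alpha), brightness (beta)

cv2.imwrite("frame_preprocessed.png", adjusted)
```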

Train the computer vision algorithms: Train them on the dataset of images. There are various computer vision techniques that could be used, such as object detection, segmentation, or classification. The specific technique will depend on the requirements of the system.
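And for the training part, a minimal classification sketch like this is what I have in mind (placeholder dataset path and hyperparameters; object detection would need a different setup):

```python
# Minimal sketch: fine-tune a pretrained classifier on two classes (e.g. "debris" vs "no debris").
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("data/train", transform=transform)  # placeholder path
loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)      # 2 classes: debris / no debris

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```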


In addition, it would be cool to have some sort of hardware/interactive portion that actually uses a camera to detect things in space and can be integrated into the system. Once the computer vision algorithms have been trained and evaluated, implement the system. This could involve integrating the algorithms into a larger software system that processes images captured by spacecraft cameras in real time.

Thank you

1