Recent comments in /f/MachineLearning
_sbmaruf t1_je8iuvl wrote
Reply to comment by WarmSignificance1 in [N] OpenAI may have benchmarked GPT-4’s coding ability on its own training data by Balance-
We just released the dataset last week. We are in the process of training some autoregressive models.
GFrings t1_je8i7ro wrote
Reply to comment by Technical-Vast1314 in [R] You Only Segment Once: Towards Real-Time Panoptic Segmentation [CVPR 2023] by Technical-Vast1314
Isn't semantic segmentation made redundant by instance segmentation? Or is there a difference in coverage between the two tasks in terms of the ground-truth labels?
lgastako t1_je8i6dw wrote
Reply to comment by WokeAssBaller in [D] The best way to train an LLM on company data by jaxolingo
Not generally very well.
elbiot t1_je8i0i2 wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
The second link says fine-tuning is a substitute for lengthy prompts, including cases where you want to include more than can fit in the longest prompt. Prompts are a way to give the model new information. What definition of knowledge are you using that excludes anything you can put into a prompt?
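To make the "new information via the prompt" point concrete, here's a minimal sketch; the fact, question, and wording are hypothetical placeholders, not anything from this thread:

```python
# Minimal sketch: "teach" the model a fact it was never trained on by
# putting that fact directly into the prompt (fact and question are
# made-up placeholders).
new_fact = (
    "Q3 revenue for the Acme Anvils product line was $4.2M, "
    "up 12% quarter over quarter."
)
question = "How did Acme Anvils perform in Q3?"

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{new_fact}\n\n"
    f"Question: {question}\nAnswer:"
)

# Send `prompt` to whatever completion endpoint you use; the point is
# that the "knowledge" arrives at inference time through the prompt,
# not through fine-tuning.
print(prompt)
```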
ghostfaceschiller t1_je8habj wrote
Reply to comment by EquipmentStandard892 in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Could you elaborate on what you mean here? I'm not sure I'm following.
VelvetyPenus t1_je8gktg wrote
First person to use AI to embezzle majority of company profits arrested. Convict used Reddit to ask how to turn company data into AI dataset.
m98789 t1_je8gdwb wrote
Reply to comment by MadScientist-1214 in [D] Improvements/alternatives to U-net for medical images segmentation? by viertys
CVPR is not a journal; it's a conference.
huyouare t1_je8fby1 wrote
Reply to comment by Appropriate_Ant_4629 in [D] The best way to train an LLM on company data by jaxolingo
I was wondering how this relates to retrieval or SQL queries, but it sounds like you’re suggesting that OP fine-tunes on their dataset regularly. Might be good to try in combination with retrieval, but how would you represent the tabular data as training examples?
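One way to do the tabular-to-text conversion, as a rough sketch; the table, column names, and prompt/completion format here are assumptions, not anything OP described:

```python
import json
import pandas as pd

# Hypothetical table; in practice this would come from OP's database or a SQL query.
df = pd.DataFrame({
    "customer": ["Acme", "Globex"],
    "region": ["EMEA", "APAC"],
    "q3_revenue_usd": [4_200_000, 1_750_000],
})

# Serialize each row into a prompt/completion pair (JSONL), one common
# format for fine-tuning. A retrieval setup would instead index these
# same strings rather than train on them.
with open("train.jsonl", "w") as f:
    for _, row in df.iterrows():
        example = {
            "prompt": f"What was {row.customer}'s Q3 revenue in {row.region}?",
            "completion": f" ${row.q3_revenue_usd:,}",
        }
        f.write(json.dumps(example) + "\n")
```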
r_linux_mod_isahoe t1_je8f4nc wrote
Reply to [P] Imaginary programming: implementation-free TypeScript functions for GPT-powered web development by xander76
RIP this sub. Was nice knowing ya all
netham91 t1_je8dxrt wrote
Reply to comment by athos45678 in [D] The best way to train an LLM on company data by jaxolingo
Thanks
light24bulbs t1_je8d6bh wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
Continuous retraining is something else.
I'll be training LLaMA soon; I'll get back to you with how it goes.
xander76 OP t1_je8d46w wrote
Reply to comment by bluenigma in [P] Imaginary programming: implementation-free TypeScript functions for GPT-powered web development by xander76
Yep, I should have been clearer that (at least for now) it only supports JSON-compatible types and doesn’t support type aliases or classes (although we have a prototype version that does). The limitations are documented at https://imaginary.dev/docs/writing-an-imaginary-function .
hitechnical t1_je8d0ev wrote
Reply to comment by [deleted] in [D] Simple Questions Thread by AutoModerator
I heard Stanford’s LLM can run on smaller devices. Please Google it.
HangOutWithMyYangOut t1_je8b3d6 wrote
Reply to comment by xander76 in [P] Imaginary programming: implementation-free TypeScript functions for GPT-powered web development by xander76
Awesome, thanks so much. I'll be diving into it tomorrow.
LetGoAndBeReal t1_je8akb1 wrote
Reply to comment by light24bulbs in [D] The best way to train an LLM on company data by jaxolingo
I would agree with that last statement. You think you understand this, but you don’t seem to understand what does and doesn’t happen during fine-tuning, or to realize that adding knowledge to LLMs is a notoriously difficult problem that ongoing research is still trying to solve.
Try looking at some of the research: https://openreview.net/forum?id=vfsRB5MImo9
Or read what OpenAI says fine-tuning accomplishes: https://platform.openai.com/docs/guides/fine-tuning
Or, better yet, try actually getting an LLM to learn new facts by fine-tuning it. Then you will understand.
[deleted] t1_je8ajeo wrote
Reply to comment by Hands0L0 in [D] The best way to train an LLM on company data by jaxolingo
[removed]
Appropriate_Ant_4629 t1_je89aho wrote
Databricks announced this week that they're trying to make it easy:
Haven't tried it yet, but we will soon.
machineko t1_je88wj9 wrote
Reply to [D] Training a 65b LLaMA model by Business-Lead2679
I'm working on xTuring, an open-source library focused on resource-efficient fine-tuning methods: https://github.com/stochasticai/xturing
Here's how you would perform int8 LoRA fine-tuning in three lines:
python: https://github.com/stochasticai/xturing/blob/main/examples/llama/llama_lora_int8.py
colab notebook: https://colab.research.google.com/drive/1SQUXq1AMZPSLD4mk3A3swUIc6Y2dclme?usp=sharing
Of course, the Colab still only works with smaller models. In the example above, the 7B model required 9 GB of VRAM.
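For reference, the linked example boils down to roughly the following; this is a sketch of the usage pattern, so treat the exact class names and the dataset path as assumptions and defer to the linked files if they differ:

```python
from xturing.datasets.instruction_dataset import InstructionDataset
from xturing.models import BaseModel

# Load an instruction dataset (the path is a placeholder) and fine-tune
# LLaMA with LoRA adapters in int8 precision.
dataset = InstructionDataset("./alpaca_data")
model = BaseModel.create("llama_lora_int8")
model.finetune(dataset=dataset)
```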
machineko t1_je888dc wrote
Why not use open-source models? Since it seems you aren't trying to sell the model commercially, you can easily replace it with an open-source model. Also, for retrieval-augmented generation, smaller models can be very effective.
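For anyone unfamiliar with the pattern, here's a bare-bones sketch of retrieval-augmented generation with a small open embedding model; the documents and question are placeholders, and the final generation step is left to whichever (smaller, open) model you pick:

```python
from sentence_transformers import SentenceTransformer, util

# Small open-source embedding model; the documents are placeholders for
# whatever internal data you want the LLM to answer questions about.
docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday through Friday, 9am-5pm CET.",
]
question = "How long do customers have to return an item?"

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = embedder.encode(docs, convert_to_tensor=True)
q_emb = embedder.encode(question, convert_to_tensor=True)

# Pick the most relevant document and stuff it into the prompt for the
# generation model of your choice.
best = int(util.cos_sim(q_emb, doc_emb).argmax())
prompt = f"Context: {docs[best]}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```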
Warhouse512 t1_je86y92 wrote
Mask2Former?
machineko t1_je86hwt wrote
Reply to comment by ortegaalfredo in [D] llama 7b vs 65b ? by deck4242
What GPUs are you using to run them? Are you using any compression (i.e. quantization)?
light24bulbs t1_je863pu wrote
Reply to comment by WokeAssBaller in [D] The best way to train an LLM on company data by jaxolingo
Yeah, he doesn't get it. That's OK, but being wrong and being sure about it is a bummer.
SatisfactionFine6298 t1_je85d51 wrote
A personal language model (PLM) would be way better for that than an LLM. There are models you can train like www.personal.ai
kromem t1_je84zam wrote
Reply to comment by Tostino in [D] The best way to train an LLM on company data by jaxolingo
> Automating this is going to be nuts.
Yes, yes it is.
elbiot t1_je8iym9 wrote
Reply to [D] Improvements/alternatives to U-net for medical images segmentation? by viertys
Looks like this was trained on just 150 x-rays and does very well: https://paperswithcode.com/paper/xnet-a-convolutional-neural-network-cnn
Edit: did you look for pre-existing solutions? This was like the second Google result. If I were you, I'd be looking for public datasets I could use for pretraining and then fine-tune on my own data.
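If you do go the pretrain-then-fine-tune route, one common shortcut is to start from an encoder that is already pretrained. A rough sketch with segmentation_models_pytorch; the channel count, class count, and loss are assumptions about the task, and this illustrates the transfer-learning idea rather than the XNet architecture from the link:

```python
import torch
import segmentation_models_pytorch as smp

# U-Net with an ImageNet-pretrained encoder as a starting point;
# fine-tune the whole model on the (small) labeled x-ray set.
model = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    in_channels=1,   # grayscale x-rays (assumption)
    classes=1,       # binary mask, e.g. lesion vs background (assumption)
)

loss_fn = smp.losses.DiceLoss(mode="binary")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# images: (N, 1, H, W) float tensor, masks: (N, 1, H, W) in {0, 1}
def train_step(images, masks):
    optimizer.zero_grad()
    loss = loss_fn(model(images), masks)
    loss.backward()
    optimizer.step()
    return loss.item()
```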