Recent comments in /f/MachineLearning

TheGuywithTehHat t1_jcrsjlo wrote

Most of that makes sense. The only thing I would be concerned about is the model training test. Firstly, a unit test should test the smallest possible unit. You should have many unit tests to test your model, and you should focus on those tests being as simple as possible. Nearly every function you write should have its own unit test, and no unit test should test more than one function. Secondly, there is an important difference between verification and validation testing. Verification testing shouldn't test for any particular accuracy threshold or anything like that, it should at most verify things like "model.fit() causes the model to change" or "a linear regression model that is all zeroes produces an output of zero." Verification testing is what you put on your CI pipeline to sanity check your code before it gets merged to master. Validation testing, however, should test model accuracy. It should go on your CD pipeline, and should validate that the model you're trying to push to production isn't low quality.
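Something like the following is what I have in mind for verification tests — a rough sketch with scikit-learn and pytest, not tied to any particular project:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def test_fit_changes_model():
    # Verification: fitting on data with a real linear signal should move the
    # coefficients away from zero, i.e. model.fit() actually changed the model.
    rng = np.random.default_rng(0)
    X = rng.random((64, 4))
    y = X @ np.array([1.0, 2.0, 3.0, 4.0]) + 0.5
    model = LinearRegression().fit(X, y)
    assert not np.allclose(model.coef_, 0.0)

def test_all_zero_linear_model_outputs_zero():
    # Verification: a linear model whose weights and bias are all zero must
    # predict zero for any input.
    rng = np.random.default_rng(0)
    X = rng.random((8, 4))
    coef, intercept = np.zeros(4), 0.0
    assert np.allclose(X @ coef + intercept, 0.0)
```

Accuracy thresholds would then live in separate validation tests on the CD side.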

2

A1-Delta t1_jcrpd05 wrote

Interesting project! I’ve seen many suggest that the training data for transfer learning might actually be the biggest thing holding Alpaca back from a ChatGPT-like experience. In other words, although the OpenAI model allows for the creation of a lot of training data, that data might include a lot of low-quality pairs that in an ideal world wouldn’t be included. Do you have any plans to increase the quality of your dataset in addition to its size?

I hear your concern about the LLaMA license. It might be bad advice, but personally I wouldn’t worry about it. This is a very popular model that people are using for all sorts of things. The chance they are going to come after you seems small to me, and my understanding is that it’s sort of uncharted legal ground once you’ve done significant fine-tuning. That being said, I’m not a lawyer.

LLaMA is a very powerful model, and I would hate for you to put all this effort into creating something that ends up being limited and not clearly better than Alpaca simply because of license fears. If I were you, though, I’d go with the 13B version. It’s still small enough to run on many high-end consumer GPUs after quantization while providing significantly better baseline performance than the 7B version.

20

relevantmeemayhere t1_jcrotun wrote

Mm, not really.

Bootstrapping is used to estimate the standard error of estimates via resampling. From there we can derive tools like confidence intervals or other interval estimates.

Generally speaking, you do not use the bootstrap to tweak the parameters of your model; you use cross-validation for that.
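Rough sketch of the difference (numpy/scikit-learn, illustrative numbers only):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=200)

# Bootstrap: resample with replacement to estimate the standard error of a
# statistic (here the sample mean), then build an interval estimate from it.
boot_means = [rng.choice(data, size=data.size, replace=True).mean() for _ in range(2000)]
standard_error = np.std(boot_means, ddof=1)
ci_95 = np.percentile(boot_means, [2.5, 97.5])

# Cross-validation: the tool you actually use to tweak model parameters.
X, y = rng.random((200, 3)), rng.random(200)
search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=5).fit(X, y)
print(standard_error, ci_95, search.best_params_)
```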

10

baffo32 t1_jcronvh wrote

- offloading and accelerating (moving some parts to memory-mapped disk or GPU RAM; this can also make for quicker loading)

- pruning (removing parts of the model that didn’t end up impacting outputs after training; see the sketch after this list)

- further quantization below 4 bits

- distilling to a mixture of experts?

- factoring and distilling parts out into heuristic algorithms?

- fine-tuning to specific tasks (e.g. distilling/pruning out all information related to non-relevant languages or domains); this would likely make it very small

EDIT:

- numerous techniques published in papers over the past few years

- distilling into an architecture not limited by e.g. a constraint of being feed forward
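To make the pruning/quantization items concrete, here's a minimal PyTorch sketch — a toy stand-in model rather than LLaMA itself, and only coarse int8 rather than the sub-4-bit schemes above:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a trained model's layers.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# Pruning: zero out the 30% lowest-magnitude weights per Linear layer, then
# bake the mask into the weights permanently.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")

# Quantization: dynamic int8 quantization of the Linear layers (weights stored
# in 8 bits, activations quantized on the fly).
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
```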

3

MysteryInc152 t1_jcrnqc8 wrote

You can try training ChatGLM. 6B parameters, initially trained on 1T English/Chinese tokens, and completely open source. However, it's already been fine-tuned and had RLHF, but that was optimized for Chinese Q&A, so it could use some English work.
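Getting started with it is roughly this (hub ID and chat helper are as I remember them from the model card, so double-check):

```python
from transformers import AutoModel, AutoTokenizer

# ChatGLM-6B ships its own modelling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

response, history = model.chat(tokenizer, "Summarize RLHF in one sentence.", history=[])
print(response)
```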

Another option is RWKV. There are 7B and 14B models (I would go with the 14B; it's the better of the two) fine-tuned to a context length of 8192 tokens. The author plans on increasing the context further too.

17

Fender6969 OP t1_jcrnppi wrote

Thanks for the response. I think hardcoding things might make the most sense. Ignoring testing the actual data for a minute, let us say I have an ML pipeline with the following units:

1. Data Engineering: method that queries data and performs further aggregation in Pandas/PySpark
   - Unit test: hardcode an input to pass into this function and leverage Pytest/unittest to check for the exact output (sketched below)
2. Model Training: method that engineers features and passes data into an Sklearn pipeline, which scales/encodes the data and trains the ML model
   - Unit test: check for successful predictions on training data to a degree of accuracy based on your evaluation metric
3. Model Serving: first method performs ETL on the prediction data, second method loads the Sklearn pipeline object to serve predictions
   - Unit tests:
     - Module 1: same as Data Engineering
     - Module 2: check for successful predictions

Do the above unit tests make sense to add to a CI pipeline?
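For concreteness, something like this is what I'm picturing for 1.1 (hypothetical `aggregate_daily_sales` function and column names, pytest + pandas):

```python
import pandas as pd

# Hypothetical unit under test: the aggregation step of the Data Engineering module.
def aggregate_daily_sales(df: pd.DataFrame) -> pd.DataFrame:
    return df.groupby("day", as_index=False)["amount"].sum()

def test_aggregate_daily_sales_exact_output():
    # Hardcoded input...
    raw = pd.DataFrame({"day": ["mon", "mon", "tue"], "amount": [1.0, 2.0, 3.0]})
    # ...checked against a hardcoded expected output.
    expected = pd.DataFrame({"day": ["mon", "tue"], "amount": [3.0, 3.0]})
    pd.testing.assert_frame_equal(aggregate_daily_sales(raw), expected)
```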

1

username001999 t1_jcrn1aq wrote

We Americans live in a country where kids are regularly gunned down in school so we make ourselves feel better by making jokes about how much worse other countries are for events that happened over 30 years ago. Or we don’t even know our own history, like the Kent State Massacre.

−7

Philpax t1_jcrgxbb wrote

As the other commenter said, it's unlikely anyone will advertise a service like this, as LLaMA's license terms don't allow for it. In your situation, I'd just rent a cloud GPU server (Lambda Labs, etc.) and test the models you care about. It'll only end up costing a dollar or two if you're quick with your use.

2