Recent comments in /f/MachineLearning

myself991 t1_jcmhn3k wrote

Hi everybody,

I forgot to submit my file for a conference, but the CMT3 submission section stayed open for about 45 minutes past the deadline, so I was able to upload it there.

Has anybody had experience with submitting supplementary material to CMT3 an hour after the deadline? Will they remove the paper, even though they kept the upload section open?

Also, do conferences normally set the CMT3 deadline a little later than the announced deadline?

Thanks,

1

HateRedditCantQuitit t1_jcmdot7 wrote

I think of context as an end-to-end connected version of retrieval. You can backprop from the loss to the retrieved info, but you also want to backprop from the loss to the non-retrieved info, which would basically be equivalent to having it all in context (in a handwavy way). Which is to say that just having more context is a simple solution.

I think everyone knows increasing context length is not 100% sufficient, but it sure is a simple, convenient solution.
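
To make the handwavy part concrete, here's a toy PyTorch sketch of the gradient-flow difference (the `memory` tensor standing in for retrievable documents is purely illustrative):

```python
import torch

# Toy "memory" of 5 retrievable items and a query vector.
memory = torch.randn(5, 8, requires_grad=True)
query = torch.randn(8)

# Full-context style: soft attention over *all* items, so the
# loss backprops to every memory entry.
weights = torch.softmax(memory @ query, dim=0)
(weights @ memory).sum().backward()
print(memory.grad.abs().sum(dim=1))  # nonzero for every row

memory.grad = None

# Retrieval style: hard top-1 lookup; the argmax choice is not
# differentiable, so only the selected row receives gradient.
idx = torch.argmax(memory.detach() @ query)
memory[idx].sum().backward()
print(memory.grad.abs().sum(dim=1))  # nonzero only at row idx
```

In the soft version every entry gets gradient signal; with hard retrieval only the fetched row does, which is exactly the non-retrieved-info gradient that's missing.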

3

bo_peng OP t1_jcmajpx wrote

  • RWKV-LM is now mainly for training, while ChatRWKV is for optimal inference.
  • Someone in the RWKV Discord tried it with LoRA (https://github.com/Blealtan/RWKV-LM-LoRA) and the results are quite nice. Join the RWKV Discord for the latest updates :)
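
For anyone unfamiliar with LoRA, here's a minimal generic sketch of the idea in PyTorch (not the RWKV-LM-LoRA code itself): freeze the pretrained weight and train a low-rank update on top of it.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (generic sketch)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # y = base(x) + scale * (B A) x; only A and B are trained
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale

layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(2, 512))  # (2, 512)
```
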
3

Necessary-Meringue-1 t1_jcm5mye wrote

>These models are learning vastly more than language alone

A child growing up does too.

>These models are learning in an extraordinarily difficult way with *only* "predict next word" feedback and nothing else

That's literally the point: LLMs do not learn language the way humans do at all. Unless you're trying to say that you and I are pure Skinner-type behaviorist learners.

1

Necessary-Meringue-1 t1_jcm4o9d wrote

>the Transformer is proof by demonstration that you don't need a language-specific architecture to learn language, and also that you can learn language via prediction feedback, which is highly likely how our brain does it too.

Where to even start? How about this:

The fact that a transformer can appear to learn language on a non-specific architecture does not at all mean that humans work the same way.

Did you ingest billions of tokens of English growing up? How did you manage to have decent proficiency at the age of 6? Did you read the entire Common Crawl corpus by age 10?

This kind of argument is on stilts. LLMs are extremely impressive, but that does not mean they tell you much about how humans do language.

1

royalemate357 t1_jclz4t0 wrote

Hmm, I'm not too sure, but their blog post says this:

>TorchInductor uses a pythonic define-by-run loop level IR to automatically map PyTorch models into generated Triton code on GPUs and C++/OpenMP on CPUs.

So it seems like they support CPU. I also tried it briefly on a CPU-only Google Colab, and it seems to work (I didn't benchmark speed, though). I doubt it supports non-CUDA GPUs, but then again, support for those isn't very good even in the general case.
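
A quick way to check on any machine (a minimal sketch; `torch.compile` uses the TorchInductor backend by default in PyTorch 2.x):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
compiled = torch.compile(model)  # TorchInductor backend by default

x = torch.randn(8, 64)           # CPU tensor, no GPU needed
out = compiled(x)                # first call triggers C++/OpenMP codegen on CPU
print(out.shape)                 # torch.Size([8, 10])
```

Per the quoted blog post, the same script on a CUDA device would generate Triton kernels instead.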

8

gonomon t1_jclwgtg wrote

Subject: Generating Synthetic Data for Human Action Recognition
Hello,

In my master's thesis, I generated a realistic synthetic dataset for human action recognition using the Unity engine. The dataset contains 2D and 3D pose information and RGB videos. I wanted to test the effect of this dataset on real-world action recognition (directly on YouTube videos) when the classifier is trained on synthetic data in addition to real data (NTU 120).

I want to use a skeleton-based action recognition methodology, since it outperforms RGB-only methodologies on NTU 120. To achieve this, I applied a pose estimator to the YouTube videos, our synthetic dataset, and NTU 120, and trained on the estimated poses. My reasoning is that instead of directly using the sterile ground-truth pose information of our dataset, I can run the pose estimator on everything and use its outputs, rather than worrying about domain adaptation strategies.

My question is: should I have directly used the ground-truth pose information of our synthetic data in training with real data, or does what I did make sense? If pose estimators have been used as a domain adaptation method anywhere, I would be extremely happy if you could share the papers in your comments.
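
For concreteness, a sketch of the pipeline described above (`estimate_poses` and the video lists are hypothetical placeholders, not a specific library API):

```python
# Sketch of the pipeline described above; `estimate_poses` is a
# hypothetical placeholder for whatever pose estimator is used.

def build_training_set(real_videos, synthetic_videos, estimate_poses):
    samples = []
    # Run the SAME pose estimator on both domains, so real and
    # synthetic training data share the estimator's noise profile.
    for video, label in list(real_videos) + list(synthetic_videos):
        skeleton_seq = estimate_poses(video)  # e.g. (T, num_joints, 2 or 3)
        samples.append((skeleton_seq, label))
    return samples

# The skeleton classifier is then trained on estimated poses only;
# the "sterile" synthetic ground-truth poses are deliberately unused.
```
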
Best,

1