DigThatData

DigThatData t1_iue7pne wrote

Very thought-provoking stuff! I wonder if an alternative interpretation of these observations might be something along the lines of deep image prior, i.e. maybe randomly initialized deep architectures are capable of performing edge detection simply by virtue of how the gradient responds to the stacked operators?

DigThatData t1_itqbww4 wrote

CLIP is definitely what you want here, and it's unclear to me why you're so convinced that a categorical text representation is an important feature, considering you're planning on projecting it into a dense text embedding anyway.

You should really learn about CLIP, or at least survey the state of multi-modal representation learning, before committing to your current design.
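
To make the point about categorical labels concrete: a one-hot category pushed through a learned linear projection reduces to an embedding lookup, i.e. one free vector per category, with nothing shared between labels. The sketch below is plain Python with hypothetical labels (`cat`, `dog`, `car`), a hypothetical embedding size, and hypothetical helper names; it is not CLIP, which instead computes the text embedding from the label's words with a pretrained text encoder trained jointly with an image encoder.

```python
import random


def one_hot(index, size):
    """Encode a categorical label as a one-hot vector of length `size`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def project(vec, weights):
    """Dense linear projection: a length-V vector times a V x D matrix -> length-D vector."""
    dim = len(weights[0])
    return [sum(vec[i] * weights[i][j] for i in range(len(vec))) for j in range(dim)]


vocab = ["cat", "dog", "car"]  # hypothetical category labels
dim = 4                        # hypothetical embedding size
random.seed(0)
# Randomly initialized projection matrix: one row per category.
W = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in vocab]

idx = vocab.index("dog")
dense = project(one_hot(idx, len(vocab)), W)
# The projection just selects the row for "dog": a plain lookup table.
print(dense == W[idx])
```

In other words, the categorical representation contributes only an index; all of the semantics would have to be learned into that row from your own data.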

DigThatData t1_it8hvbj wrote

Each point on the curve represents a decision threshold. Given a particular threshold, your model classifies observations a certain way. As you move the threshold, it crosses the score of one or more observations, and those observations move from one bin to the other; that jump is what creates the step function.
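
A minimal sketch of that threshold sweep, in plain Python, assuming binary labels (1 = positive), that both classes are present, and that an observation is predicted positive when its score is at or above the threshold. The function name `roc_points` and the toy data are hypothetical. Each time the threshold passes an observation's score, the curve steps up (a positive moved into the positive bin) or steps right (a negative moved); tied scores move together and produce a diagonal segment.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per distinct decision threshold.

    Assumes labels are 0/1 and both classes appear at least once.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Start above every score (nothing predicted positive), then lower the
    # threshold to each distinct score, so the sweep runs from (0, 0) to (1, 1).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


scores = [0.9, 0.8, 0.7, 0.6, 0.55]  # hypothetical model scores
labels = [1, 1, 0, 1, 0]             # hypothetical ground truth
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Between consecutive thresholds nothing changes, which is why the curve is flat there and only jumps when a score is crossed.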

DigThatData t1_it59gaq wrote

It depends on the data. Considering the kind of data you're working with (video) is one of the least mature media in the analytics industry, it might be both significantly more cost-effective and more likely to produce a high-quality result if you buy the dataset. That said, if you were thinking of spinning up an in-house data annotation resource, this might be a good opportunity to go that route, and I'm sure the ML team wouldn't have any complaints if you gave them a persistent data-generating resource like that.

3