Recent comments in /f/deeplearning

eternal-abyss-77 OP t1_iwfpfz7 wrote

Sir, firstly, thanks for responding.

I have already implemented this as a working program. But now I am enhancing it, and I have a feeling that I am somehow missing something from the paper and not understanding it properly.

For example:

The equations [2, 4, 6, 8, 10, 15-18] on pages 3, 4 and 5

The training of the model on the generated features with linear LSE, mentioned on pages 6-7

And section B, Local pixel difference descriptor, para 2 regarding directions, and its related figures, Figure 3(a, b).

If you can explain these things, I can follow your explanation properly and ask my doubts about my present work on this paper, with code.

0

sEi_ t1_iwfmwgh wrote

Bad link. FFS, ALWAYS check that the links you post anywhere work! I'm a senior webdev and know the importance of always checking posted links. Learned the hard way.

With some wizardry I can deduce the right URL from the bad double URL, but it would be better if the link were corrected so anyone can view what you posted.

EDIT: If you cannot edit the OP, then delete it and make a new, better one.

EDIT2: I need to register to read the paper!

1

BugSlayerJohn t1_iwa32dc wrote

First of all, you don't want an identical or nearly identical weight matrix. You won't achieve that and you don't need to. In principle a well designed model should NOT make radically different predictions when retrained, particularly with the same data, even though the weight matrices will certainly differ at least a little and possibly a lot. The same model trained two different times on the same data with the same hyperparameters will generally converge to nearly identical behaviors, right down to which types of inputs the final model struggles with. If you have the original model, original data, and original hyperparameters, definitely don't be frightened to retrain a model.

If your use case requires you to be able to strongly reason about similarity of inference, you could filter your holdout set for the inputs that both models should accurately predict, run inference for that set against both models, and prepare a small report indicating the similarity of predictions. This should ordinarily be unnecessary, but since it sounds like achieving this similarity is a point of concern, this would allow you to measure it, if for no other purpose than to assuage fears. You should likely expect SOME drift in similarity; the two versions won't be identical, so if the similarity is not as high as you would like, consider manually reviewing a list of inputs the two models predicted differently to confirm how often the difference really is undesirable.
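A minimal sketch of that kind of report, assuming both model versions expose a predict method that returns class labels and that holdout_inputs / holdout_labels are numpy arrays (all of these names are placeholders):

import numpy as np

def prediction_agreement_report(model_a, model_b, holdout_inputs, holdout_labels):
  # Run inference for the same holdout set against both model versions.
  preds_a = model_a.predict(holdout_inputs)
  preds_b = model_b.predict(holdout_inputs)
  # Fraction of holdout inputs on which the two versions agree.
  agreement = float(np.mean(preds_a == preds_b))
  # Per-version accuracy, for context on how each model performs.
  accuracy_a = float(np.mean(preds_a == holdout_labels))
  accuracy_b = float(np.mean(preds_b == holdout_labels))
  # Indices where the two versions disagree, for manual review.
  disagreements = np.where(preds_a != preds_b)[0]
  return {"agreement": agreement,
          "accuracy_a": accuracy_a,
          "accuracy_b": accuracy_b,
          "disagreement_indices": disagreements}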

1

HMasterSunday t1_iw9qr8l wrote

Interesting, I didn't do a test run to time both approaches; I'll do that more often. As for your other point, though, my code does account for that already: the number of individual cuts is 1/4 of the length of the full array (len(input_array)/4), so it splits it up into arrays of length 4 anyway. That much I do know at least.

1

sckuzzle t1_iw9fe1i wrote

Writing "short" code isn't a always good thing. Yes your suggestion has less lines, but:

  • It takes ~6 times as long to run

  • It does not return the correct output (split does not take every nth value, but rather groups it into n groups)

I'm absolutely not claiming my code was optimized, but it did clearly show the steps required to calculate the necessary output, so it was easy to understand. Writing "short" code is much more difficult to understand what is happening, and often leads to a bug (as seen here). Also, depending on how you are doing it, it often takes longer to run (the way it was implemented requires it to do extra steps which aren't necessary).
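A minimal sketch of the difference, with a made-up array just for illustration:

import numpy as np

values = np.arange(12)        # [0, 1, 2, ..., 11]

# np.split cuts the array into consecutive groups...
groups = np.split(values, 3)  # [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([8, 9, 10, 11])]

# ...whereas taking every 4th value keeps one element per group of four.
every_fourth = values[3::4]   # array([ 3,  7, 11])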

1

HMasterSunday t1_iw9allv wrote

Also: numpy.split can create several cuts from a numpy array, so it simplifies to:

import numpy as np

def process_data(input_array):
  # Split the array into len(input_array)//4 consecutive cuts of length 4.
  cut_array = np.split(input_array, len(input_array) // 4)
  max_array = []
  for cut in cut_array:
    max_array.append(max(cut))
  return max_array

Much shorter; I used this method recently so it's at the front of my mind.

edit: don't know how to format on here, sorry

1

ContributionWild5778 t1_iw97xid wrote

I believe that is an iterative process when doing transfer learning. First, you will always freeze the top layers, because low-level feature extraction (extracting lines and contours) is done there. Unfreeze the last layers and try to train only those layers, where high-level features are extracted. At the same time, it also depends on how different the new dataset you are training the model on is. If it has similar characteristics/features, freezing the top layers would be my choice.
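A minimal Keras sketch of that kind of partial freezing, assuming a pretrained base such as MobileNetV2 and a new classification head (the number of layers to unfreeze, the input shape and the 10-class head are placeholders):

import tensorflow as tf

# Pretrained convolutional base without its original classification head.
base = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                         input_shape=(224, 224, 3))

# Freeze everything first, then unfreeze only the last few layers,
# where the higher-level features are extracted.
base.trainable = True
for layer in base.layers[:-20]:
  layer.trainable = False

# New head for the new dataset.
model = tf.keras.Sequential([
  base,
  tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # shows how many parameters remain trainable after freezing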

1

sckuzzle t1_iw8jv72 wrote

Why are you using a "model" / MLPs at all for this? This is a strictly data processing problem with no model creation required.

Just process your data by throwing away 75% of it, then take the max, then check if each value is equal to the maximum.

Something like (python):

import numpy as np

def process_data(input_array):
  # Keep only every 4th value (throw away the other 75%).
  every_fourth = []
  for i in range(len(input_array)):
    if (i + 1) % 4 == 0:
      every_fourth.append(input_array[i])
  # Flag which of the remaining values equal their maximum.
  max_value = max(every_fourth)
  matching_values = (np.array(every_fourth) == max_value)
  return matching_values

8

RichardBJ1 t1_iw733qt wrote

Yes… obviously freezing the only two layers would be asinine! There is a Keras blog on it; I do not know why those particular layers (TL;DR). It doesn't say top and bottom, that's for sure. …I agree it would be nice to have a method for choosing which layers to freeze rather than picking arbitrarily. I guess visualising layer output might help choose, if it's a small model, but I've never tried that. So I do have experience of trying transfer learning, but (apart from tutorials) no experience of success with transfer learning!
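A minimal sketch of one way to visualise an intermediate layer's output in Keras, assuming model is an already-built Keras model and sample_batch is a small batch of inputs (both names, and the layer name "conv2", are placeholders):

import tensorflow as tf

def layer_outputs(model, sample_batch, layer_name):
  # Sub-model that maps the original inputs to one intermediate layer's output.
  probe = tf.keras.Model(inputs=model.input,
                         outputs=model.get_layer(layer_name).output)
  return probe(sample_batch)

# e.g. activations = layer_outputs(model, sample_batch, "conv2")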

1

RichardBJ1 t1_iw71rpv wrote

Good question; I do not have a source for that, I have just heard colleagues saying it. Obviously the reason for freezing layers is that we are trying to avoid losing all the information we have already gained; it should also speed up further training by reducing the number of trainable parameters, etc. As to WHICH layers are actually best preserved, I don't know. When I have read on it, people typically say "it depends". But actually my point was that I have never found transfer learning to be terribly effective (apart from years ago when I ran a specific transfer learning tutorial!). In my models it only takes a few days to train from scratch, and so that is what I do! Transfer learning obviously makes enormous sense if you are working with someone else's extravagantly trained model and maybe you don't even have the data. But in my case I always do have all the data…

1