Recent comments in /f/deeplearning

RShuk007 t1_izpxx5b wrote

InsightFace uses ResNet-50/100 or ViT-B/L for its best-performing models; those are deep models that capture a lot. It seems that, because of the lack of synthetic cartoons in the training data, the model doesn't learn whether a face is human, but rather whether a face has human proportions/shape/topography?

You can check this out by implementing

https://arxiv.org/abs/2110.11001

Or

https://arxiv.org/abs/1610.02391

On your models. These papers come under explainable AI, a field that tries to explain where models look when making their final decisions. In this case I can see the model looks at the T region and mouth to make decisions; when the face is occluded it only looks at the T region around the eyes, and lower-than-usual resolution of real images does not seem to change the model's attention. This indicates a lack of understanding of human face texture and details.

I can see this using a custom package I developed for my work, but I can't show the results here due to confidentiality.
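If you don't want to implement it from scratch, the pytorch-grad-cam package covers the second paper (Grad-CAM). A rough sketch; the ResNet-50 backbone and target layer below are placeholders, so point it at your own model's last conv block:

```python
import numpy as np
import torch
from torchvision.models import resnet50, ResNet50_Weights
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget
from pytorch_grad_cam.utils.image import show_cam_on_image

# Placeholder backbone -- substitute your face model here
model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
target_layers = [model.layer4[-1]]  # last conv block of the backbone

input_tensor = torch.randn(1, 3, 224, 224)  # your preprocessed face crop
cam = GradCAM(model=model, target_layers=target_layers)
heatmap = cam(input_tensor=input_tensor,
              targets=[ClassifierOutputTarget(0)])[0]  # (H, W) in [0, 1]

# Overlay on the original image (float32, HWC, values in [0, 1])
rgb_img = np.zeros((224, 224, 3), dtype=np.float32)  # replace with real image
visualization = show_cam_on_image(rgb_img, heatmap, use_rgb=True)
```

The overlay makes it obvious which facial regions drive the decision.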

1

sqweeeeeeeeeeeeeeeps t1_izphlmd wrote

? You are proving your SWIN model is overparameterized for CIFAR. Try making an EVEN simpler model than those; you probably won't be able to with off-the-shelf distillation. Doing this just for ImageNet literally doesn't change anything, it's just a different, more complex dataset.

What's your end goal? To come up with a distillation technique to make NNs more efficient and smaller?
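For context, by off-the-shelf distillation I mean the standard soft-target recipe; a minimal sketch (the temperature and loss weighting are illustrative, not tuned):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style distillation: KL between softened teacher and student
    distributions, mixed with ordinary cross-entropy on the true labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients don't shrink with temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```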

1

abhijit1247 OP t1_izoxgnp wrote

I understand that, but these are state-of-the-art face detection models (as per Papers with Code); one would assume they would have taken care of these kinds of false positives. This has been a common issue across many face detection models, and I hoped that someone could suggest a model that has been trained against it. Fine-tuning the detector would be my last resort.

−8

abhijit1247 OP t1_izow7qy wrote

These are pretrained models, which we can access through their respective libraries, and if we check the Papers with Code rankings, they are among the best for face detection. I had hoped that someone would have used these libraries in their application and solved this issue.

−6

suflaj t1_izorabe wrote

I don't think it's SWIN per se. I think the detectors (which take 5 feature maps at different levels of detail) are incompatible with SWIN's 4 transformer stages, which lack the spatial bias that convolutional networks provide, and the Tiny model is simply too small.

Other than that, pretraining (near-)SOTA models has been impractical for anyone other than big corpo for quite some time now. But you could always try asking your mentor for your university's compute; my faculty offered GPUs ranging from 1080 Tis to A100s.

Although I don't understand why you insist on pretraining SWIN yourself; many SWIN models pretrained on ImageNet are already available, not only as part of MMCV but on Hugging Face as well. So you just have to do the distillation part on some part of the pretraining input distribution.
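For example, pulling an ImageNet-pretrained SWIN teacher from Hugging Face takes a few lines (the checkpoint name below is one of Microsoft's published ones; swap in Small/Base as needed):

```python
import torch
from transformers import AutoImageProcessor, SwinForImageClassification

# ImageNet-1k pretrained Swin-Tiny checkpoint
name = "microsoft/swin-tiny-patch4-window7-224"
processor = AutoImageProcessor.from_pretrained(name)
teacher = SwinForImageClassification.from_pretrained(name).eval()

# In practice: inputs = processor(images=pil_image, return_tensors="pt")
with torch.no_grad():
    logits = teacher(pixel_values=torch.randn(1, 3, 224, 224)).logits
```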

3

MazenAmria OP t1_izon556 wrote

> I would expect it to perform more similarly to the full SWIN model on CIFAR-10 because of the lower data complexity.

And that's the problem. If I got, say, 98% accuracy on CIFAR-10 using SWIN-Tiny and then got the same 98% with a smaller model, I wouldn't be proving anything; many simple models can already get 98% on CIFAR-10, so what improvement did I introduce over SWIN-Tiny? Doing the same thing on ImageNet would be different.

1

suflaj t1_izoh23q wrote

As someone who tried finetuning SWIN as part of my graduate thesis, I'll warn you that you shouldn't expect good results from the Tiny version. No matter what detector I used, it performed worse than the ancient RetinaNet for some reason... Regression was near perfect, albeit with many duplicate detections, but classification was complete garbage, topping out at 0.45 mAP (whereas Retina can get like 0.8 no problem).

So, take at least the small version.

5

gahaalt OP t1_izo28vv wrote

Hello! Thanks for your feedback. Actually, Progress Table is flexible and you can display arbitrary data in table cells. It can be, for example, a string f"{epoch}/{total_epochs}". You define what gets displayed :)

To make it clearer, I created integrations.md where you can see an example of Progress Table integration with PyTorch and Keras.
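A minimal sketch of what that looks like (the metric values are placeholders; see integrations.md for the full PyTorch/Keras versions):

```python
from progress_table import ProgressTable

total_epochs = 10
table = ProgressTable(columns=["epoch", "train loss"])

for epoch in range(total_epochs):
    # Cells can hold arbitrary data, e.g. a formatted string
    table["epoch"] = f"{epoch + 1}/{total_epochs}"
    table["train loss"] = round(1.0 / (epoch + 1), 4)  # placeholder metric
    table.next_row()

table.close()
```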

3