Recent comments in /f/deeplearning

RShuk007 t1_izpxx5b wrote

InsightFace uses ResNet-50/100 or ViT-B/L for its best-performing models; those are deep models that capture a lot. It seems that, because of the lack of synthetic cartoons in the training data, the model doesn't learn whether a face is human, but rather whether a face has human proportions/shape/topography?

You can check this out by implementing

https://arxiv.org/abs/2110.11001

Or

https://arxiv.org/abs/1610.02391

On your models. These papers come under explainable AI, a field that tries to explain where models look when making their final decisions. In this case I can see the model looks at the T region and mouth to make decisions; when the face is occluded it only looks at the T region around the eyes, and lower-than-usual resolution of real images does not seem to change the model's attention. This indicates a lack of understanding of human face texture and details.

I can see this using a custom package I developed for my work, but I can't show the results here due to confidentiality.
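If you don't want to implement it from scratch, the pytorch-grad-cam package covers the second paper (Grad-CAM). A rough sketch; the ResNet-50 backbone and target layer below are placeholders, so point it at your own model's last conv block:

```python
import numpy as np
import torch
from torchvision.models import resnet50, ResNet50_Weights
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget
from pytorch_grad_cam.utils.image import show_cam_on_image

# Placeholder backbone -- substitute your face model here
model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
target_layers = [model.layer4[-1]]  # last conv block of the backbone

input_tensor = torch.randn(1, 3, 224, 224)  # your preprocessed face crop
cam = GradCAM(model=model, target_layers=target_layers)
heatmap = cam(input_tensor=input_tensor,
              targets=[ClassifierOutputTarget(0)])[0]  # (H, W) in [0, 1]

# Overlay on the original image (float32, HWC, values in [0, 1])
rgb_img = np.zeros((224, 224, 3), dtype=np.float32)  # replace with real image
visualization = show_cam_on_image(rgb_img, heatmap, use_rgb=True)
```

The overlay makes it obvious which facial regions drive the decision.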

1

sqweeeeeeeeeeeeeeeps t1_izphlmd wrote

? You are proving your SWIN model is overparameterized for CIFAR. Try making an EVEN simpler model than those; you probably won't be able to with off-the-shelf distillation. Doing this just for ImageNet literally doesn't change anything, it's just a different, more complex dataset.

What's your end goal? To come up with a distillation technique to make NNs more efficient and smaller?
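For context, by off-the-shelf distillation I mean the standard soft-target recipe; a minimal sketch (the temperature and loss weighting are illustrative, not tuned):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style distillation: KL between softened teacher and student
    distributions, mixed with ordinary cross-entropy on the true labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients don't shrink with temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```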

1

abhijit1247 OP t1_izoxgnp wrote

I understand that, but these are state-of-the-art face detection models (as per Papers with Code); one would assume they would have taken care of these kinds of false positives. This has been a common issue across many face detection models, and I hoped that someone could suggest a model that has been trained against it. Fine-tuning the detector would be my last resort.

−8

abhijit1247 OP t1_izow7qy wrote

These are pretrained models, which we can access through their respective libraries, and if we check the Papers with Code rankings, they are among the best for face detection. I had hoped that someone would have used these libraries in their application and solved this issue.

−6

suflaj t1_izorabe wrote

I don't think it's SWIN per se. I think the detectors (which take 5 feature maps at different levels of detail) are incompatible with SWIN's 4 transformer stages, which lack the spatial bias that convolutional networks provide, and the Tiny model is simply too small.

Other than that, pretraining (near-)SOTA models has been impractical for anyone other than big corpo for quite some time now. But you could always try asking your mentor for your university's compute; my faculty offered GPUs ranging from 1080 Tis to A100s.

Although I don't understand why you insist on pretraining SWIN yourself; many SWIN models pretrained on ImageNet are already available, not only as part of MMCV but on Hugging Face as well. So you just have to do the distillation part on some part of the pretraining input distribution.
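For example, pulling an ImageNet-pretrained SWIN teacher from Hugging Face takes a few lines (the checkpoint name below is one of Microsoft's published ones; swap in Small/Base as needed):

```python
import torch
from transformers import AutoImageProcessor, SwinForImageClassification

# ImageNet-1k pretrained Swin-Tiny checkpoint
name = "microsoft/swin-tiny-patch4-window7-224"
processor = AutoImageProcessor.from_pretrained(name)
teacher = SwinForImageClassification.from_pretrained(name).eval()

# In practice: inputs = processor(images=pil_image, return_tensors="pt")
with torch.no_grad():
    logits = teacher(pixel_values=torch.randn(1, 3, 224, 224)).logits
```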

3

MazenAmria OP t1_izon556 wrote

> I would expect it to perform more similarly to the full SWIN model on CIFAR-10 because of the lower data complexity.

And that's the problem. If I got, say, 98% accuracy on CIFAR-10 using SWIN-Tiny and then got the same 98% with a smaller model, I wouldn't be proving anything; many simple models can already get 98% on CIFAR-10, so what improvement did I introduce over SWIN-Tiny? Doing the same thing on ImageNet would be different.

1

suflaj t1_izoh23q wrote

As someone who tried finetuning SWIN as part of my graduate thesis, I'll warn you that you shouldn't expect good results from the Tiny version. No matter what detector I used, it performed worse than the ancient RetinaNet for some reason... Regression was near perfect, albeit with many duplicate detections, but classification was complete garbage, topping out at 0.45 mAP (whereas Retina can get like 0.8 no problem).

So, take at least the small version.

5

gahaalt OP t1_izo28vv wrote

Hello! Thanks for your feedback. Actually, Progress Table is flexible and you can display arbitrary data in table cells. It can be, for example, a string f"{epoch}/{total_epochs}". You define what gets displayed :)

To make it clearer, I created integrations.md where you can see an example of Progress Table integration with PyTorch and Keras.
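A minimal sketch of what that looks like (the metric values are placeholders; see integrations.md for the full PyTorch/Keras versions):

```python
from progress_table import ProgressTable

total_epochs = 10
table = ProgressTable(columns=["epoch", "train loss"])

for epoch in range(total_epochs):
    # Cells can hold arbitrary data, e.g. a formatted string
    table["epoch"] = f"{epoch + 1}/{total_epochs}"
    table["train loss"] = round(1.0 / (epoch + 1), 4)  # placeholder metric
    table.next_row()

table.close()
```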

3