Recent comments in /f/deeplearning
pr0d_ t1_izqj9d8 wrote
Reply to Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
any chance you've read the DeiT papers?
sqweeeeeeeeeeeeeeeps t1_izq7367 wrote
Reply to comment by abhijit1247 in Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247
Is this a shitpost? These models are trained on real human faces. Humans look very different from cartoons.
sqweeeeeeeeeeeeeeeps t1_izq6vbc wrote
Reply to comment by MazenAmria in Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
It is.
RShuk007 t1_izpydka wrote
Reply to comment by RShuk007 in Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247
A simple retraining (fine-tuning only the parameters of the later classifier layers, for fewer epochs) will probably do the trick. I believe the encoder is still good, so you can keep the backbone (ResNet-50 or ViT) frozen.
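Something like this sketch (using a torchvision ResNet-50 as a hypothetical stand-in for the backbone; not InsightFace's actual training code):

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical stand-in: a torchvision ResNet-50, not InsightFace's real backbone.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze the encoder/backbone so only the new head gets gradient updates.
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier head, e.g. with a 2-way human-face vs. cartoon-face output.
model.fc = nn.Linear(model.fc.in_features, 2)

# Only pass the head's parameters to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
# ...then run a short fine-tuning loop on real faces plus cartoon negatives.
```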
RShuk007 t1_izpxx5b wrote
Reply to Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247
InsightFace uses ResNet-50/100 or ViT-B/L for its best performance; those are deep models that understand a lot. It seems that, because of the lack of synthetic cartoons in the training data, the model does not learn whether a face is human, but rather whether a face has human proportions/shape/topography.
You can check this out by implementing
https://arxiv.org/abs/2110.11001
Or
https://arxiv.org/abs/1610.02391
on your models. These papers fall under explainable AI, a field that tries to explain where models look when making their final decisions. In this case I can see it looks at the T-region and mouth to make decisions; when the face is occluded it looks only at the T-region with the eyes, and lower-than-usual resolution (compared to real images) does not seem to change the model's attention. This indicates a lack of understanding of human face texture and details.
I can see this using a custom package I developed for my work; however, I can't show the results here due to confidentiality.
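If you want to try it yourself, the open-source grad-cam package implements the second paper. A minimal sketch (with a torchvision ResNet-50 as a hypothetical stand-in for your detector's backbone):

```python
import torch
from torchvision import models
from pytorch_grad_cam import GradCAM  # pip install grad-cam

# Hypothetical stand-in model; substitute your detector's actual backbone.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()
target_layers = [model.layer4[-1]]  # last conv block is the usual CAM target

input_tensor = torch.randn(1, 3, 224, 224)  # replace with a preprocessed face image
cam = GradCAM(model=model, target_layers=target_layers)
heatmap = cam(input_tensor=input_tensor)[0]  # (H, W) saliency map in [0, 1]
# Overlay `heatmap` on the input image to see which regions drive the decision.
```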
MazenAmria OP t1_izpii1s wrote
Reply to comment by sqweeeeeeeeeeeeeeeps in Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
To examine whether SWIN itself is overparameterized or not.
sqweeeeeeeeeeeeeeeps t1_izphlmd wrote
Reply to comment by MazenAmria in Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
? You are proving your SWIN model is overparameterized for CIFAR. Make an EVEN simpler model than those; you probably won't be able to with off-the-shelf distillation. Doing this just for ImageNet literally doesn't change anything; it's just a different, more complex dataset.
What's your end goal? To come up with a distillation technique that makes NNs more efficient and smaller?
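For context, "off-the-shelf" distillation here means the plain Hinton-style objective, roughly this sketch (the temperature and weighting values are made up):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hinton-style soft-target distillation; T and alpha values are illustrative."""
    # KL divergence between temperature-softened teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale by T^2 to keep gradient magnitudes comparable
    # Ordinary supervised loss on the hard labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```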
mr_birrd t1_izpew7e wrote
Reply to comment by abhijit1247 in Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247
That's not how it actually learns, and that's a good thing: it would be completely overfit if it only gave true positives for the exact images it saw in training.
CauseSigns t1_izp90t9 wrote
Reply to comment by abhijit1247 in Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247
Unless a model claims to detect only non-cartoon faces, it’s not a false positive. A face is a face
abhijit1247 OP t1_izp78pp wrote
Reply to comment by Final-Rush759 in Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247
I think these models were trained with real human faces (e.g. http://shuoyang1213.me/WIDERFACE/) and not cartoon faces, so the example that I have shown would be a false positive.
Final-Rush759 t1_izp316u wrote
Reply to Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247
It's not a false positive. The models were trained on pictures that are not far from cartoons. I think the models performed really well.
deepneuralnetwork t1_izoylbj wrote
Reply to comment by abhijit1247 in Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247
I wouldn’t assume anything with SOTA models. “State of the art” is far less impressive in the AI world than it might sound.
abhijit1247 OP t1_izoyefq wrote
Reply to comment by abhijit1247 in Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247
Retraining the models might be my last resort to solve the issue, as I still want the high performance of these models, and retraining them would definitely come at the cost of that performance.
abhijit1247 OP t1_izoxgnp wrote
Reply to comment by deepneuralnetwork in Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247
I understand that, but these are state-of-the-art face detection models (as per Papers with Code); one would assume they would have taken care of these kinds of false positives. This has been a common issue across many face detection models, and I hoped that someone could suggest a model that has been trained against it. Fine-tuning the detector would be my last resort.
abhijit1247 OP t1_izow7qy wrote
Reply to comment by sqweeeeeeeeeeeeeeeps in Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247
These are pretrained models, which we can access through their respective libraries, and if we check the Papers with Code rankings, they are among the best for face detection. I had hoped that someone would have used these libraries in their application and solved this issue.
suflaj t1_izorabe wrote
Reply to comment by MazenAmria in Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
I don't think it's SWIN per se. I think the detectors (which take 5 feature maps at different levels of detail) are incompatible with the 4 transformer blocks, which lack the spatial bias that convolutional networks provide, and that the Tiny model is simply too small.
Other than that, pretraining (near-)SOTA models has been impractical for anyone other than big corporations for quite some time now. But you could always ask your mentor for your university's compute; my faculty offered GPUs ranging from 1080 Tis to A100s.
That said, I don't see why you insist on pretraining SWIN; many SWIN models pretrained on ImageNet are already available, so you only have to do the distillation part on some portion of the pretraining input distribution. They're offered not only as part of MMCV, but by Hugging Face as well.
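For example, grabbing a pretrained teacher from Hugging Face is just a few lines (a sketch; the checkpoint is the standard ImageNet-1k Swin-Tiny, swap in whatever variant you need):

```python
from transformers import AutoImageProcessor, SwinForImageClassification

# ImageNet-1k-pretrained Swin-Tiny; larger variants live under similar names.
ckpt = "microsoft/swin-tiny-patch4-window7-224"
processor = AutoImageProcessor.from_pretrained(ckpt)
teacher = SwinForImageClassification.from_pretrained(ckpt).eval()

# During distillation, feed the same batch through the teacher and use
# teacher(**processor(images=batch, return_tensors="pt")).logits as soft targets.
```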
MazenAmria OP t1_izonquh wrote
Reply to comment by suflaj in Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
That's sad; I'm starting to believe that this research idea is impractical or, maybe more accurately, overly ambitious.
MazenAmria OP t1_izon556 wrote
Reply to comment by sqweeeeeeeeeeeeeeeps in Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
> I would expect it to perform more similarly to the full SWIN model on CIFAR-10 because of the lower data complexity.
And that's the problem. If I got, say, 98% accuracy on CIFAR-10 using SWIN-Tiny and then got the same 98% with a smaller model, I wouldn't be proving anything; there are many simple models that can get 98% on CIFAR-10, so what improvement did I introduce to SWIN-Tiny? But doing the same thing on ImageNet would be different.
suflaj t1_izoh23q wrote
Reply to Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
As someone who tried fine-tuning SWIN as part of my graduate thesis, I will warn you that you shouldn't expect good results from the Tiny version. No matter what detector I used, it performed worse than the ancient RetinaNet for some reason... Regression was near perfect, albeit with many duplicate detections, but classification was complete garbage, getting me up to 0.45 mAP (whereas RetinaNet can get like 0.8 no problem).
So take at least the Small version.
deepneuralnetwork t1_izocssw wrote
Reply to Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247
I mean, it looks like a face. It’s not crazy for a CNN to come to the same conclusion.
If you want the model to ignore cartoon faces, you need to train it to do so. Simple as that.
sqweeeeeeeeeeeeeeeps t1_izob3yb wrote
Reply to Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
MNIST to ImageNet is a huge range. Try something in between, preferably multiple datasets, for example CIFAR-10 and CIFAR-100. I would expect it to perform more similarly to the full SWIN model on CIFAR-10 because of the lower data complexity.
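Something like this (a torchvision sketch, assuming that's your data pipeline):

```python
from torchvision import datasets, transforms

# CIFAR-10/100 as intermediate-complexity benchmarks between MNIST and ImageNet.
tfm = transforms.Compose([
    transforms.Resize(224),  # Swin variants typically expect 224x224 inputs
    transforms.ToTensor(),
])
cifar10 = datasets.CIFAR10(root="./data", train=True, download=True, transform=tfm)
cifar100 = datasets.CIFAR100(root="./data", train=True, download=True, transform=tfm)
```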
David202023 t1_izo8xam wrote
Reply to comment by gahaalt in Progress Table - is it better than TQDM for your use case? by gahaalt
Amazing, will try it soon, thanks so much :)
sqweeeeeeeeeeeeeeeps t1_izo7ejq wrote
Reply to Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247
Are you even retraining these models on cartoon faces?
gahaalt OP t1_izo28vv wrote
Reply to comment by RichardBJ1 in Progress Table - is it better than TQDM for your use case? by gahaalt
Hello! Thanks for your feedback. Actually, Progress Table is flexible and you can display arbitrary data in table cells. It can be, for example, a string f"{epoch}/{total_epochs}". It's you who defines what will be displayed :)
To make it clearer, I created integrations.md where you can see an example of Progress Table integration with PyTorch and Keras.
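Roughly like this (a minimal sketch of the idea; see integrations.md for the exact, current API):

```python
from progress_table import ProgressTable  # pip install progress-table

table = ProgressTable()
total_epochs = 10
for epoch in range(total_epochs):
    table["epoch"] = f"{epoch + 1}/{total_epochs}"  # any string works as a cell
    table["loss"] = 0.1 / (epoch + 1)               # or any metric you track
    table.next_row()  # finalize this row and move to the next
table.close()
```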
pr0d_ t1_izqjmmk wrote
Reply to comment by MazenAmria in Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
yeah, as per my comment, the DeiT papers explored knowledge distillation with Vision Transformers. What you want to do here is probably similar, and the resources needed to prove it are huge, to say the least. Any chance you've discussed this with your advisor?