Submitted by Ananth_A_007 t3_zgpmtn in MachineLearning
I am aware that a 1x1 convolution is needed for depthwise separable convolution, but when else is it useful? I see it used in MobileNetV2 before the depthwise separable convolution later in the bottleneck, but I'm not sure why. I also see it used with stride 2 where max pooling could be used instead. Could someone please explain the logic behind this? Thanks.
MathChief t1_izjarfb wrote
1x1 conv is essentially a linear transformation (over the channel dimension), as the other redditor suggests, the same as nn.Linear in PyTorch. What I would like to add is that in PyTorch the 1x1 conv by default accepts tensors of shape (B, C, *), for example (B, C, H, W) in 2D, which is convenient for implementation purposes. If you use nn.Linear, the channel dimension has to be permuted to the last position first, then the linear transformation applied, and then the result permuted back. With the 1x1 conv, which is essentially a wrapper for the C function that does the einsum automatically, it is just a single line, so the code is cleaner and less error-prone.
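To make the equivalence concrete, here is a minimal sketch (the shapes B, C_in, C_out, H, W are arbitrary, not from the thread) showing that nn.Conv2d with kernel_size=1 produces the same output as nn.Linear applied over the channel dimension, once the channel axis is permuted to last and back:

```python
import torch
import torch.nn as nn

# Hypothetical sizes just for the demo.
B, C_in, C_out, H, W = 2, 8, 16, 32, 32

conv = nn.Conv2d(C_in, C_out, kernel_size=1, bias=True)
linear = nn.Linear(C_in, C_out, bias=True)

# Copy the conv weights into the linear layer so both compute the same map.
with torch.no_grad():
    linear.weight.copy_(conv.weight.view(C_out, C_in))
    linear.bias.copy_(conv.bias)

x = torch.randn(B, C_in, H, W)

# 1x1 conv: operates directly on the (B, C, H, W) tensor.
y_conv = conv(x)

# nn.Linear: permute channels to the last dim, apply, permute back.
y_linear = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

print(torch.allclose(y_conv, y_linear, atol=1e-6))  # True
```

The permute-apply-permute dance on the last two lines is exactly the boilerplate the 1x1 conv saves you from writing.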