Recent comments in /f/deeplearning

rikonaka t1_j3jyzos wrote

Well, I'm not sure how feasible running two power supplies in one host is 😂, and the motherboard is still a problem. I can't comment on the stability of connecting four 3090s through graphics card expansion/risers (because I haven't done it myself), but I think you should consider your plan carefully; the cost of trial and error is not low.

1

soupstock123 OP t1_j3jx9ko wrote

Yeah, there's no way to add two PSUs on PCPartPicker, so that's meant to be 2 of the 1000W ones.

The B650 supports 4. It has enough slots. Slot blocking isn't an issue because I'm going to be using GPU risers to fit the 4 GPUs.

To respond to your first comment, Threadripper is also very expensive, and I'm waiting until Sept 2023 when Threadripper 7 comes out and drops prices on existing Threadrippers.

1

rikonaka t1_j3jvvne wrote

I read your shopping list and there are two problems: the motherboard and the power supply. One 3090 draws 350 watts, so four draw 1400 watts, which means your power supply should be at least 2000 watts (the exact figure depends on which CPU you pick). The problem with the motherboard is that the B650 does not support four 3090s; it only has two video card slots. 😉

1

Blasket_Basket t1_j3h24nj wrote

Move experience above education since you have significant work experience. Similarly, move the team lead CV role to the top of that section, above the research assistant roles. Recruiters want to know you have work experience first and foremost. You come across as significantly less competent/senior to recruiters if the first thing they hear about is the stuff you're doing as a grad assistant.

2

dtjon1 t1_j3g0zhe wrote

The PointNet family of NNs uses special mathematical functions called "symmetric" functions to leverage the unordered and unstructured nature of point clouds. For a given set of points, no affine transformation (rotation, scale, translation) nor any reordering of the points should have any effect on the output of the model, and these symmetric functions are what enable PointNet to handle these cases (reordering in particular).

It's hard to discuss PointNet in simple terms beyond this point, but the training process basically has two parts:

  1. We learn these symmetric functions for our dataset and use them to build a representation of the data that is meaningful to the model. This is called feature extraction, and gives us a feature vector. Think of this as an abbreviated form of the data that the model can work with.

  2. We can then use this feature vector to perform whatever task we want. For classification we typically just throw the feature vector into a second neural network (usually an MLP) which outputs a probability distribution over our classes.

All of this happens together during training - the model learns to extract meaningful features from your data and also learns to perform whatever task you have in mind. I really recommend watching the authors' presentation for more info.

PointNet is really easy to use with some programming experience and doesn't require massive compute, yet is still really powerful. PointNet++ can be a little trickier, as the main implementation I've seen requires custom CUDA kernels.
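
If it helps to see the two stages in code, here's a stripped-down PointNet-style classifier sketch (no T-Net alignment modules; the layer sizes and num_classes=10 are illustrative, not the paper's exact architecture):

    import torch
    import torch.nn as nn

    class TinyPointNet(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            # Shared per-point MLP: the same weights applied to every point independently
            # (Conv1d with kernel_size=1 is the usual trick for this).
            self.point_mlp = nn.Sequential(
                nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
                nn.Conv1d(64, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
            )
            # Classification head on the global feature vector.
            self.head = nn.Sequential(
                nn.Linear(1024, 256), nn.ReLU(),
                nn.Linear(256, num_classes),
            )

        def forward(self, points):                  # points: (batch, 3, num_points)
            feats = self.point_mlp(points)          # (batch, 1024, num_points)
            # Max over the point dimension is the symmetric function: any reordering
            # of the input points yields the same global feature vector.
            global_feat = feats.max(dim=2).values   # (batch, 1024)
            return self.head(global_feat)           # class logits

    model = TinyPointNet()
    logits = model(torch.randn(2, 3, 1024))  # 2 clouds of 1024 points each
    print(logits.shape)                      # torch.Size([2, 10])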

10

BalanceStandard4941 t1_j3frfqe wrote

Because points are not arranged on a regular grid the way pixels are, PointNet++ first samples a few anchor points from the point set. Then every anchor point finds its k nearest neighbors (like a CNN working on windows of pixels). Then, with shared MLP layers, each point gets a higher-dimensional latent feature. Last, to aggregate the features of the local points, max-pooling is applied to every group of points that was clustered previously.

This is one layer, which they call a Set Abstraction (SA) layer, and it is repeated 4 times. After the SA layers, Feature Propagation layers can be used if your task is segmentation; they just upsample the points back.
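
For the curious, here's a rough PyTorch sketch of one Set Abstraction layer along those lines. It substitutes random sampling for farthest point sampling and uses a made-up SetAbstraction class with illustrative sizes, so treat it as a sketch rather than the reference implementation:

    import torch
    import torch.nn as nn

    class SetAbstraction(nn.Module):
        def __init__(self, in_dim, out_dim, num_anchors=128, k=16):
            super().__init__()
            self.num_anchors, self.k = num_anchors, k
            self.mlp = nn.Sequential(               # shared MLP applied to every grouped point
                nn.Linear(in_dim + 3, out_dim), nn.ReLU(),
                nn.Linear(out_dim, out_dim), nn.ReLU(),
            )

        def forward(self, xyz, feats):              # xyz: (B, N, 3), feats: (B, N, C)
            B, N, _ = xyz.shape
            idx = torch.randperm(N)[: self.num_anchors]       # stand-in for farthest point sampling
            anchors = xyz[:, idx]                             # (B, A, 3)
            knn = torch.cdist(anchors, xyz).topk(self.k, largest=False).indices  # (B, A, k)
            b = torch.arange(B).view(B, 1, 1)
            grouped_xyz = xyz[b, knn] - anchors.unsqueeze(2)  # local coordinates, (B, A, k, 3)
            grouped = torch.cat([grouped_xyz, feats[b, knn]], dim=-1)
            new_feats = self.mlp(grouped).max(dim=2).values   # symmetric max-pool over each group
            return anchors, new_feats                         # (B, A, 3), (B, A, out_dim)

    xyz = torch.rand(2, 1024, 3)      # 2 clouds of 1024 points
    feats = torch.rand(2, 1024, 32)   # per-point input features
    anchors, new_feats = SetAbstraction(in_dim=32, out_dim=64)(xyz, feats)
    print(anchors.shape, new_feats.shape)  # (2, 128, 3) (2, 128, 64)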

5

bitemenow999 t1_j3e5jms wrote

I would suggest changing the publications to the standard citation format; nobody needs to know the journal/platform, all of them are bad anyway... Also drop "PhD" from the prof.'s name, they are assumed to have a PhD by default. Drop "selected" from "selected projects".

3

i_do_too_ t1_j3e0q70 wrote

Good CV overall. I'd decrease the number of personal projects and increase the space allocated to experience. This is probably contrary to what you have heard, but if you're going for full-time roles, you should put experience first and then education. I say this because you have good experience and you wouldn't want to join at entry level, but at least at L4. For that, you should promote your experience.

14

ASalvail t1_j3dxr6l wrote

It looks a bit cramped, so I'd ditch the summary. I never read those anyway, and if I can't tell at a glance what you've worked on, something is wrong. If you want to keep it, I would emphasize which sub-branch of AI you're interested in and/or specialized in.

I would emphasize that full-time industry experience: it tells me I won't need to show you how to work in a team, or that MNIST isn't the usual dataset quality you should expect. Do point out that it's full-time. You can deduce it from the dates, but I typically look at a CV for at most a minute during initial triage.

Otherwise it looks pretty great!

10

trajo123 t1_j3c38rx wrote

Several things I noticed in your code:

  • your model doesn't use any transfer (activation) function
  • the combination of final activation function and loss function is incorrect
  • for CNNs you should be using BatchNorm2d layers

The code should look something like this:

    import torch
    import torch.nn as nn

    class CNNClassifier(nn.Module):
        def __init__(self, input_size, num_classes):
            super(CNNClassifier, self).__init__()
            self.input_size = input_size
            self.num_classes = num_classes
            self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1, padding=1)  # increase the number of channels
            self.bn1 = nn.BatchNorm2d(32)
            self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
            self.conv2 = nn.Conv2d(in_channels=32, out_channels=128, kernel_size=3, stride=1, padding=1)  # increase the number of channels; in_channels must match conv1's out_channels
            self.bn2 = nn.BatchNorm2d(128)
            self.fc1 = nn.Linear(128, 256)  # note the smaller numbers
            self.fc2 = nn.Linear(256, num_classes)
            self.final_pool = nn.AdaptiveAvgPool2d(1)  # before flatten, use AdaptiveMaxPool2d or AdaptiveAvgPool2d to get rid of the spatial dimensions, essentially treating each filter as one feature
            # self.softmax = nn.Softmax(dim=1) - not needed, see below. Also, Softmax is not correct for use with NLLLoss, the correct one would be LogSoftmax(dim=1)
            self.f = nn.ReLU()

        def forward(self, x):
            x = self.conv1(x)
            x = self.pool(x)
            x = self.f(x)    # apply the transfer function
            x = self.bn1(x)  # apply batch norm (this can also be placed before the transfer function)

            x = self.conv2(x)
            x = self.pool(x)
            x = self.f(x)    # apply the transfer function
            x = self.bn2(x)  # apply batch norm (this can also be placed before the transfer function)

            # since you are now using batchnorm, you could add a few more blocks like the one above; vanishing gradients are less of a concern now

            x = self.final_pool(x)
            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = self.f(x)    # apply the transfer function; here you could try tanh as well
            x = self.fc2(x)
            # x = self.softmax(x)  # not needed here, because it is incorporated into the loss function for numerical/computational efficiency reasons
            return x

Also, the loss should be:

    # criterion = nn.NLLLoss()
    criterion = nn.CrossEntropyLoss()  # the more natural choice of loss function for classification
    # (for binary classification the even more natural choice would be BCEWithLogitsLoss,
    # but then you need to set the number of output units to 1)

1

FastestLearner t1_j3c0yju wrote

You are not using non-linearity. Yours is just a linear model. Deep CNNs thrive on non-linearity. Try adding a ReLU layer after every MaxPool. Also, for better convergence, add BN layers after each Conv. Don’t use two Linear layers (mostly redundant). Use AvgPool instead of Flatten. Replace Softmax with LogSoftmax. Set Adam lr=1e-4, decay=1e-4.
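
A minimal sketch of those suggestions wired together (the channel counts, input size and 3-class output are illustrative assumptions, not taken from the original post):

    import torch
    import torch.nn as nn
    import torch.optim as optim

    model = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, padding=1),
        nn.BatchNorm2d(32),          # BN after each Conv
        nn.MaxPool2d(2),
        nn.ReLU(),                   # ReLU after every MaxPool
        nn.Conv2d(32, 64, kernel_size=3, padding=1),
        nn.BatchNorm2d(64),
        nn.MaxPool2d(2),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),     # AvgPool instead of flattening the full feature map
        nn.Flatten(),                # only flattens the 1x1 spatial output to (N, 64)
        nn.Linear(64, 3),            # a single Linear layer
        nn.LogSoftmax(dim=1),        # LogSoftmax pairs with NLLLoss
    )

    criterion = nn.NLLLoss()
    optimizer = optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)

    out = model(torch.randn(4, 3, 64, 64))  # -> (4, 3) log-probabilities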

PM me if you face any more issues.

3

trajo123 t1_j3busy6 wrote

First of all, the dataset size is way too small to train a model from scratch and get meaningful results on this relatively complex task (more complex than MNIST, for example, which has a training set of 60,000 images). Second, your model is way too small/simple for this task even if you had 100 times more data. I strongly suggest "Transfer Learning" - fine-tuning a pre-trained model by replacing the classification head, freezing the rest of the model in place and training on your dataset.

Something along these lines:

    import torch.nn as nn
    from torchvision import transforms, models

    # ...

    model = models.swin_b(weights=models.Swin_B_Weights.IMAGENET1K_V1)
    for param in model.parameters():
        param.requires_grad = False  # freeze the pre-trained backbone
    # replace the classification head (for swin_b the classifier attribute is `head`)
    model.head = nn.Linear(model.head.in_features, 1, bias=True)
    # ...

In the pre-trained model documentation you will see what training recipe was used and what transforms were applied to the image. Typically:

    transforms.Normalize(
        mean=(0.485, 0.456, 0.406),
        std=(0.229, 0.224, 0.225),
    )

    transforms.Resize((224, 224), interpolation=transforms.InterpolationMode.BICUBIC)

See more at <https://pytorch.org/vision/stable/models.html#table-of-all-available-classification-weights>. You can also find pre-trained vision models on HuggingFace.

Hope this helps, good luck!

3

suflaj t1_j3bubtm wrote

Another problem you will likely have is your very small convolutions. Basically, output channels of 8 and 16 are probably only enough to solve MNIST. You should then probably use something more like 32 and 64, and use larger kernels and strides to hopefully reduce reliance on the linears to do the work for you.

Finally, you are not using nonlinear activations between layers. Your whole network essentially acts like one smaller convolutional layer with a flatten and softmax.
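
A quick way to convince yourself of that last point: two Linear layers with no non-linearity in between collapse into a single Linear layer (toy sizes, biases omitted for brevity):

    import torch
    import torch.nn as nn

    f1 = nn.Linear(16, 32, bias=False)
    f2 = nn.Linear(32, 4, bias=False)

    # One Linear layer whose weight is the product of the two weight matrices
    # computes exactly the same function as the stack f2(f1(x)).
    combined = nn.Linear(16, 4, bias=False)
    with torch.no_grad():
        combined.weight.copy_(f2.weight @ f1.weight)

    x = torch.randn(8, 16)
    print(torch.allclose(f2(f1(x)), combined(x), atol=1e-5))  # True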

1