Lecture 9: Hardware and Software

Source: YouTube

How do FP32 cores compare to tensor cores?

Where an FP32 core does 2 FLOPs per cycle (one fused multiply-add), a tensor core on the Titan RTX does 128 FLOPs per cycle: it performs a 4x4 matrix multiply-accumulate each cycle, which is 4x4x4 = 64 multiplies plus 64 adds. That is a 64x speedup per core for matrix multiply-add.

How to utilize tensor cores in PyTorch?

Use 16-bit precision, ideally automatic mixed precision (torch.cuda.amp), which runs eligible ops in FP16 on the tensor cores while keeping FP32 where it matters for stability.
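A minimal sketch of one mixed-precision training step with torch.cuda.amp; the model, data, and hyperparameters here are arbitrary placeholders:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler()  # rescales the loss so FP16 gradients don't underflow

x = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():  # eligible ops run in FP16 on tensor cores
    loss = torch.nn.functional.mse_loss(model(x), target)
scaler.scale(loss).backward()
scaler.step(optimizer)  # unscales gradients, then updates parameters
scaler.update()
```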

Why are many variables/hyperparams set to powers of 2?

The underlying hardware operates on power-of-two-sized blocks (warps of 32 threads, tensor-core tiles, and so on), so power-of-two dimensions fully use the available compute: the matrix multiply-add needs no zero padding to fill out the last block, for instance.
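A hypothetical way to observe this on your own GPU: time a half-precision matmul at a power-of-two size against one element larger (sizes and iteration count are arbitrary):

```python
import time
import torch

def bench(n, iters=10):
    a = torch.randn(n, n, device="cuda", dtype=torch.half)
    b = torch.randn(n, n, device="cuda", dtype=torch.half)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()  # wait for the asynchronous GPU work to finish
    return (time.time() - start) / iters

print(bench(4096))  # power of two: tiles line up with the hardware
print(bench(4097))  # one extra row/column forces padded tiles
```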

How do Google’s TPUs stack up against Nvidia’s tensor cores?

Google advertises 420 TFLOPs for a Cloud TPU v3, while Nvidia advertises 130 TFLOPs for the Titan RTX’s tensor cores.

What are important differences between the needs of GPUs for gaming and for deep learning?

Gaming cares about latency: each frame must be rendered quickly, one at a time. Deep learning cares about throughput and memory: it processes large batches and benefits from high FP16/tensor-core FLOPs and large VRAM.

What are important software frameworks?

The most important are PyTorch (Facebook) and TensorFlow (Google); other notable ones include Caffe2, MXNet, CNTK, and JAX.

What are the three important concepts in PyTorch?

Tensor: an imperative ndarray that can run on GPU. Autograd: builds computation graphs from tensor operations and computes gradients automatically. Module: an object-oriented layer abstraction that stores learnable weights.

How does PyTorch’s autograd work?

Tensors created with requires_grad=True are tracked: every operation on them adds a node to a computation graph that is built up dynamically during the forward pass. Calling .backward() on a scalar output walks the graph in reverse, accumulating gradients into the .grad field of each input tensor; the graph is then discarded and rebuilt on the next forward pass.
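A minimal sketch (variable names are arbitrary):

```python
import torch

x = torch.randn(3, requires_grad=True)  # ask autograd to track x
y = (x ** 2).sum()                      # each op adds a node to the graph
y.backward()                            # reverse pass through the graph
print(x.grad)                           # dy/dx = 2x
```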

Why would you define an autograd.Function in PyTorch?

A class inheriting from autograd.Function implements a forward and a backward pass by hand, and the whole function shows up as a single node in the computation graph PyTorch builds. The same function could instead be written with standard PyTorch primitives, but then every primitive adds its own node to the graph, and the automatically derived backward pass through all those nodes can be slower, memory-hungry, and numerically unstable (e.g., produce NaNs). That said, defining your own autograd.Function is not very common in practice.
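A sketch using sigmoid as the example; the backward formula y(1 - y) is the standard sigmoid derivative, and the names are illustrative:

```python
import torch

class Sigmoid(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = 1.0 / (1.0 + torch.exp(-x))
        ctx.save_for_backward(y)  # stash the output for the backward pass
        return y

    @staticmethod
    def backward(ctx, grad_output):
        y, = ctx.saved_tensors
        return grad_output * y * (1.0 - y)  # d(sigmoid)/dx = y(1 - y)

x = torch.randn(5, requires_grad=True)
out = Sigmoid.apply(x)  # one node in the graph, not a chain of primitives
out.sum().backward()
```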

What is the nn module in PyTorch?

The nn module (torch.nn) is an object-oriented way of building networks. It makes model building easier because standard layers, loss functions, and the tracking of parameters with their gradients are all already implemented.
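A small sketch of a two-layer network built from torch.nn pieces (the layer sizes are arbitrary):

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(784, 100),  # weights and biases are registered as parameters
    nn.ReLU(),
    nn.Linear(100, 10),
)

x = torch.randn(32, 784)
labels = torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), labels)
loss.backward()  # gradients land in p.grad for every registered parameter
```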

What does the optimizer module in PyTorch do?

It updates all of a model's parameters for you: an optimizer from torch.optim loops over the parameters and applies its update rule using the learning rate (and, depending on the algorithm, running statistics such as momentum).
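A sketch of the standard training-step pattern with torch.optim; the model and data are placeholders:

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x, y = torch.randn(8, 10), torch.randn(8, 1)
for _ in range(100):
    optimizer.zero_grad()                # clear gradients from the previous step
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()                      # fill p.grad for every parameter
    optimizer.step()                     # plain SGD: p -= lr * p.grad
```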

How to create static computation graphs in PyTorch?

Decorate the function (or wrap the module) with torch.jit.script; PyTorch then compiles it once into a static TorchScript graph instead of rebuilding a dynamic graph on every call.
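A minimal sketch:

```python
import torch

@torch.jit.script
def f(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Data-dependent control flow is compiled into the static graph.
    if bool(x.sum() > 0):
        return x + y
    return x - y

print(f.graph)  # inspect the compiled static computation graph
```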

Why would one use a dynamic or static computation graph?

A static graph can be optimized once up front (e.g., by fusing operations) and serialized, so the model can run without Python, which helps deployment. A dynamic graph is built fresh on every forward pass, so it handles data-dependent control flow naturally and is much easier to debug with ordinary Python tools.