Deep Computer Vision: CNN
This track builds intuition for convolutional neural networks (CNNs) and how they power real computer vision systems that predict class labels, bounding boxes, masks, depth, object tracks, and more, often under noise, data drift, and latency constraints that benchmarks leave out.
Work through the lessons in order; each one sets up vocabulary and motivation for the next.
Prerequisites
Deep Neural Networks
Stacking layers, backpropagation, activation functions, overfitting
Foundations of Regression
Logistic regression, loss functions, gradient descent
Lessons
The visual revolution
Deep learning displaced decades of hand-crafted vision pipelines
Deep learning's role in computer vision
Learned features outperform hand-crafted ones on most major vision benchmarks
From pixels to perception
Images are 3D tensors (H × W × C); models predict labels from them
Feature detection & spatial hierarchy
Networks learn edges → textures → shapes → objects automatically
Preserving spatial structure with CNNs
Convolutions respect locality; fully-connected layers discard it
Filters, features & the power of convolutions
Slide a small learned filter across the image to detect one pattern
Learning to see: CNN internals
Conv–ReLU–Pool stacks compress space while deepening channels
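As a preview of the last three lessons, here is a minimal sketch of one Conv–ReLU–Pool step. It rests on several simplifying assumptions that are not part of the lessons themselves: a single-channel (grayscale) image rather than a full H × W × C tensor, a hand-written vertical-edge filter standing in for a learned one, stride 1 with no padding, and plain Python lists so that it runs without dependencies. The helper names (conv2d, relu, max_pool2x2) are illustrative only; real code would use a tensor library.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    return the map of weighted sums, one value per position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Zero out negative responses."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Keep the strongest response in each non-overlapping 2x2 window."""
    h = len(fmap) // 2 * 2
    w = len(fmap[0]) // 2 * 2
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


# 6x6 grayscale image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-written vertical-edge filter; in a CNN these weights are learned.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

features = conv2d(image, kernel)   # 4x4 map, responds only near the edge
activated = relu(features)         # negatives clipped to zero
pooled = max_pool2x2(activated)    # 2x2 map, spatial size halved

for row in features:
    print(row)
```

The filter only ever looks at a 3 × 3 neighbourhood, which is the locality the outline refers to, and the same nine weights are reused at every position. Pooling then halves each spatial dimension; a real CNN would also apply many filters per layer, which is where the deepening of channels comes from.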
Unlocks
CNN Architectures (ResNet, EfficientNet)
Classic and modern networks that scale vision to ImageNet
Object Detection & Segmentation
Extend classification to localise and segment objects
Vision Transformers (ViT)
Apply attention to image patches as an alternative to convolutions