MIT 18.065 Matrix Methods in Data Analysis, Signal Processing, and Machine Learning, Spring 2018
Lecture 27: Backpropagation: Find Partial Derivatives
Duration: 52:38
Description
In this lecture, Professor Strang presents Professor Sra's theorem, which proves the convergence of stochastic gradient descent (SGD). He then reviews backpropagation, a method that uses the chain rule to compute derivatives quickly.
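As a rough illustration of what SGD does (it does not reproduce Professor Sra's theorem or its conditions), here is a minimal Python sketch on a least-squares loss \(F(x) = \frac{1}{n}\sum_i \tfrac{1}{2}(a_i^\top x - b_i)^2\). Instead of the full gradient of \(F\), each update uses the gradient of one randomly sampled term. The function names, the synthetic data, and the constant step size `lr` are assumptions chosen for the example.

```python
import random


def make_consistent_problem(x_true, n=20, seed=1):
    """Hypothetical synthetic data: random rows a_i with b_i = a_i . x_true exactly."""
    rng = random.Random(seed)
    A = [[rng.uniform(-1.0, 1.0) for _ in x_true] for _ in range(n)]
    b = [sum(aj * tj for aj, tj in zip(row, x_true)) for row in A]
    return A, b


def sgd_least_squares(A, b, lr=0.05, epochs=500, seed=0):
    """Minimise F(x) = (1/n) * sum_i 0.5 * (a_i . x - b_i)**2 with plain SGD.

    Each update uses the gradient of ONE randomly sampled term,
        grad f_i(x) = (a_i . x - b_i) * a_i,
    instead of the full gradient of F.
    """
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    for _ in range(epochs * n):
        i = rng.randrange(n)  # sample one training example (with replacement)
        a_i = A[i]
        residual = sum(aj * xj for aj, xj in zip(a_i, x)) - b[i]
        # Stochastic gradient step: x <- x - lr * residual * a_i
        x = [xj - lr * residual * aj for xj, aj in zip(x, a_i)]
    return x


if __name__ == "__main__":
    x_true = [1.0, -2.0, 0.5]
    A, b = make_consistent_problem(x_true)
    print(sgd_least_squares(A, b))  # should be close to x_true
```

A constant step size is enough here only because the synthetic data are noise-free, so every term \(f_i\) shares the same minimiser. With noisy data, SGD with a fixed step typically keeps fluctuating near the minimiser instead of converging, which is one reason convergence analyses place conditions on the step size.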
Summary
Computational graph: Each step in computing \(F(x)\) from the weights
Derivative of each step + chain rule gives gradient of \(F\).
Reverse mode: Backwards from output to input
The key step in optimizing the weights is backpropagation + stochastic gradient descent.
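The summary items above can be made concrete with a small sketch of reverse-mode differentiation in Python. The `Node` class, the one-neuron loss \(F = (w_2 \tanh(w_1 x) - y)^2\), and the sample values are hypothetical choices for illustration, not code from the lecture. Each operation records one node of the computational graph together with its local derivatives; `backward()` then visits the nodes from the output back to the inputs and applies the chain rule on every edge.

```python
import math


class Node:
    """One scalar step in the computational graph of F (hypothetical helper)."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # nodes this step was computed from
        self.local_grads = local_grads  # d(self)/d(parent), one per parent
        self.grad = 0.0                 # dF/d(self), filled in by backward()

    @staticmethod
    def _wrap(other):
        return other if isinstance(other, Node) else Node(other)

    def __add__(self, other):
        other = Node._wrap(other)
        return Node(self.value + other.value, (self, other), (1.0, 1.0))

    def __sub__(self, other):
        other = Node._wrap(other)
        return Node(self.value - other.value, (self, other), (1.0, -1.0))

    def __mul__(self, other):
        other = Node._wrap(other)
        return Node(self.value * other.value, (self, other),
                    (other.value, self.value))

    def tanh(self):
        t = math.tanh(self.value)
        return Node(t, (self,), (1.0 - t * t,))

    def backward(self):
        # Topological order: every node appears after the nodes it uses.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # dF/dF = 1
        # Reverse mode: walk from the output back to the inputs and apply
        # the chain rule once per edge, accumulating into parent.grad.
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad


def F(w1, w2, x=0.5, y=0.8):
    """Hypothetical one-neuron loss on a single sample (x, y)."""
    h = (w1 * x).tanh()
    r = w2 * h - y
    return r * r


def gradient(w1_val, w2_val):
    """Return F and (dF/dw1, dF/dw2) computed by backpropagation."""
    w1, w2 = Node(w1_val), Node(w2_val)
    out = F(w1, w2)
    out.backward()
    return out.value, (w1.grad, w2.grad)


def finite_difference(w1_val, w2_val, eps=1e-6):
    """Central differences, used only to check the backward pass."""
    f = lambda a, b: F(Node(a), Node(b)).value
    d1 = (f(w1_val + eps, w2_val) - f(w1_val - eps, w2_val)) / (2 * eps)
    d2 = (f(w1_val, w2_val + eps) - f(w1_val, w2_val - eps)) / (2 * eps)
    return d1, d2


def sgd_step(w1_val, w2_val, lr=0.1):
    """One gradient step on the single sample baked into F."""
    _, (g1, g2) = gradient(w1_val, w2_val)
    return w1_val - lr * g1, w2_val - lr * g2


if __name__ == "__main__":
    print(gradient(1.0, -0.7)[1], finite_difference(1.0, -0.7))
```

Comparing against central finite differences is a cheap way to check a hand-written backward pass. Note that `r * r` uses the same node twice, so its gradient is accumulated from both edges, giving \(2r\) as the chain rule requires. `sgd_step` takes one gradient step on the single sample fixed inside `F`; in actual SGD, a fresh sample (or mini-batch) would be drawn at random for each step.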
Related section in textbook: VII.3
Instructor: Prof. Gilbert Strang