Computer Vision (CMU)

Computer Vision: Algorithms and Applications

Multiple View Geometry (best reference for geometry and vision)

Image processing

Basics of filtering

Image pyramids

Hough lines

Feature detection and correspondences

Corner detection

Feature descriptors

Transformations and geometry

Transformation $x'=f(x;p)$


Image alignment

| | Structure (scene geometry) | Motion (camera geometry) | Measurements |
|---|---|---|---|
| Pose Estimation | known | estimate | 3D to 2D correspondences |
| Triangulation | estimate | known | 2D to 2D correspondences |
| Reconstruction | estimate | estimate | 2D to 2D correspondences |

Camera calibration


Epipolar geometry

Structure from motion

Physics-based vision (not finished)

Objects, faces, and learning

Dealing with motion

Optical flow


$I(x+u\delta t,y+v\delta t,t+\delta t)=I(x,y,t)$

$\dfrac{dI}{dt}=\dfrac{\partial I}{\partial x}\dfrac{dx}{dt}+\dfrac{\partial I}{\partial y}\dfrac{dy}{dt}+\dfrac{\partial I}{\partial t}=0$ ($\dfrac{dI}{dt}$ is the total derivative)
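The step from brightness constancy to the total derivative can be made explicit with a first-order Taylor expansion. This is a sketch that assumes $\delta t$ is small and $I$ is differentiable; the shorthand $I_x$, $I_y$, $I_t$ for the partial derivatives is the same notation used in the least-squares system later in these notes.

\[I(x+u\delta t,y+v\delta t,t+\delta t)\approx I(x,y,t)+\dfrac{\partial I}{\partial x}u\,\delta t+\dfrac{\partial I}{\partial y}v\,\delta t+\dfrac{\partial I}{\partial t}\delta t\]

Substituting this into the brightness constancy equation and dividing by $\delta t$, with $u=\dfrac{dx}{dt}$ and $v=\dfrac{dy}{dt}$, gives the optical flow constraint:

\[I_xu+I_yv+I_t=0\]

This is one equation in two unknowns per pixel, which is why an extra assumption (constant flow or smoothness) is needed.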


Constant flow


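Using a $5\times 5$ image patch and assuming the flow $(u,v)$ is constant within it gives 25 equations, one per pixel $p_i$:

\[I_x(p_i)u+I_y(p_i)v=-I_t(p_i),\quad i=1,\dots,25\]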


These can be solved by least squares (LS):

\[\begin{bmatrix}\sum\limits_iI_x(p_i)I_x(p_i) & \sum\limits_iI_x(p_i)I_y(p_i)\\\sum\limits_iI_x(p_i)I_y(p_i) &\sum\limits_iI_y(p_i)I_y(p_i)\end{bmatrix}\begin{bmatrix}u\\ v\end{bmatrix}=-\begin{bmatrix}\sum\limits_iI_x(p_i)I_t(p_i)\\ \sum\limits_iI_y(p_i)I_t(p_i)\end{bmatrix}\]

The matrix on the left is the same second-moment matrix used in the Harris corner detector.
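The least-squares system above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `lucas_kanade_patch`, the central-difference gradients from `np.gradient`, and the frame difference used for $I_t$ are all assumptions made for the example.

```python
import numpy as np

def lucas_kanade_patch(I0, I1, cy, cx, half=2):
    """Estimate constant flow (u, v) for the (2*half+1)^2 patch centred at row cy, column cx."""
    I0 = I0.astype(float)
    I1 = I1.astype(float)
    # np.gradient returns derivatives along axis 0 (y) first, then axis 1 (x).
    Iy, Ix = np.gradient(I0)
    It = I1 - I0  # temporal derivative approximated by the frame difference
    win = (slice(cy - half, cy + half + 1), slice(cx - half, cx + half + 1))
    ix, iy, it = Ix[win].ravel(), Iy[win].ravel(), It[win].ravel()
    # Normal equations: the 2x2 matrix of summed gradient products times [u, v]
    # equals minus the summed products of the gradients with I_t.
    ATA = np.array([[ix @ ix, ix @ iy],
                    [ix @ iy, iy @ iy]])
    ATb = -np.array([ix @ it, iy @ it])
    u, v = np.linalg.solve(ATA, ATb)
    return u, v
```

`np.linalg.solve` raises `LinAlgError` when the matrix is singular, which is what happens on a flat patch or a straight edge (the aperture problem).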

Aperture problem

Horn-Schunck optical flow

\[\min\limits_{u,v} \sum_{i, j}\left\{E_{s}(i, j)+\lambda E_{d}(i, j)\right\}\]
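The two energy terms are not spelled out above. One common discretization (stated here as an assumption, following the usual Horn-Schunck formulation) takes $E_d$ as the brightness-constancy data term and $E_s$ as a smoothness term over neighbouring pixels:

\[E_d(i,j)=\left[I_xu_{ij}+I_yv_{ij}+I_t\right]^2\]

\[E_s(i,j)=\dfrac{1}{4}\left[(u_{ij}-u_{i+1,j})^2+(u_{ij}-u_{i,j+1})^2+(v_{ij}-v_{i+1,j})^2+(v_{ij}-v_{i,j+1})^2\right]\]

With this reading, $\lambda$ weights how strongly the flow must satisfy brightness constancy relative to being smooth.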

Image Alignment

$\min\limits_p \sum\limits_x[I(W(x;p))-T(x)]^2$, where $W(x;p)$ is the warp with parameters $p$, $I$ is the image, and $T$ is the template.

Lucas-Kanade alignment

$I(W(x;p+\Delta p))\approx I(W(x;p))+\dfrac{\partial I(W(x;p))}{\partial x'}\dfrac{\partial W(x;p)}{\partial p}\Delta p$

Assume we have an initial guess of $p$, then solve for $\Delta p$ using least squares:

\[\min\limits_{\Delta p} \sum\limits_x\left[I(W(x;p))+\nabla I \frac{\partial W}{\partial p} \Delta p-T(x)\right]^2\]
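The linearized problem can be sketched for the simplest warp, a pure translation $W(x;p)=x+p$, for which $\dfrac{\partial W}{\partial p}$ is the identity. The names `bilinear` and `lk_translation`, the border clamping, and the stopping rule are assumptions chosen for this illustration.

```python
import numpy as np

def bilinear(img, ys, xs):
    """Sample img at float coordinates (ys, xs), clamping to the image border."""
    h, w = img.shape
    xs = np.clip(xs, 0, w - 1.001)
    ys = np.clip(ys, 0, h - 1.001)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    ax, ay = xs - x0, ys - y0
    return ((1 - ay) * (1 - ax) * img[y0, x0] + (1 - ay) * ax * img[y0, x0 + 1]
            + ay * (1 - ax) * img[y0 + 1, x0] + ay * ax * img[y0 + 1, x0 + 1])

def lk_translation(I, T, top_left, p0=(0.0, 0.0), iters=50, tol=1e-4):
    """Additive Lucas-Kanade for W(x; p) = x + p, with p = (shift in x, shift in y)."""
    I = I.astype(float)
    T = T.astype(float)
    gy, gx = np.gradient(I)
    th, tw = T.shape
    ys, xs = np.mgrid[0:th, 0:tw].astype(float)
    ys = ys + top_left[0]  # template coordinates expressed in the image frame
    xs = xs + top_left[1]
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        wx, wy = xs + p[0], ys + p[1]
        # Residual T(x) - I(W(x; p)); the linearized problem is J @ dp = err.
        err = (T - bilinear(I, wy, wx)).ravel()
        # Each row of J is the image gradient at the warped point (dW/dp = identity).
        J = np.stack([bilinear(gx, wy, wx).ravel(),
                      bilinear(gy, wy, wx).ravel()], axis=1)
        dp, *_ = np.linalg.lstsq(J, err, rcond=None)
        p += dp  # additive update
        if np.linalg.norm(dp) < tol:
            break
    return p
```

For an affine or projective warp only the Jacobian changes: each row of `J` becomes the gradient multiplied by $\dfrac{\partial W}{\partial p}$ at that pixel.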

Left: Lucas Kanade (Additive alignment), Right: Shum-Szeliski (Compositional alignment)

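In conclusion, the update rules are $p\leftarrow p+\Delta p$ for additive alignment and $W(x;p)\leftarrow W(x;p)\circ W(x;\Delta p)$ for compositional alignment.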

Kalman Filtering

Tracking (KLT, Mean-Shift)

Kanade-Lucas-Tomasi (KLT)

  1. Find corners satisfying $\min(\lambda_1,\lambda_2)>\lambda$
  2. For each corner compute displacement to next frame using the Lucas-Kanade method
  3. Store displacement of each corner, update corner position
  4. (optional) Add more corner points every $M$ frames using step 1
  5. Repeat steps 2 to 4
  6. Return the long trajectory of each corner point
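Step 1 of the list above can be sketched with NumPy alone. The helper names `box_sum`, `min_eigenvalue_map`, and `select_corners`, the window size, and the use of an unweighted box window are assumptions for illustration; steps 2 to 5 would then apply the Lucas-Kanade solve to each selected corner.

```python
import numpy as np

def box_sum(a, half):
    """Sum of a over a (2*half+1)^2 window around each pixel (zero padding)."""
    k = 2 * half + 1
    padded = np.pad(a, half)
    out = np.zeros(a.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def min_eigenvalue_map(img, half=2):
    """Smaller eigenvalue of the windowed second-moment matrix at every pixel."""
    gy, gx = np.gradient(img.astype(float))
    sxx = box_sum(gx * gx, half)
    syy = box_sum(gy * gy, half)
    sxy = box_sum(gx * gy, half)
    # Closed form for the smaller eigenvalue of [[sxx, sxy], [sxy, syy]].
    return 0.5 * (sxx + syy - np.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2))

def select_corners(img, thresh, half=2):
    """Return (row, col) pairs whose smaller eigenvalue exceeds thresh, plus the score map."""
    score = min_eigenvalue_map(img, half)
    ys, xs = np.nonzero(score > thresh)
    return list(zip(ys.tolist(), xs.tolist())), score
```

On a straight edge one eigenvalue is near zero, so the score stays low even though the gradient is strong; only points with gradients in two directions pass the threshold.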


Mean-shift behaves like gradient ascent on the kernel density estimate $P(x)$: each iteration moves the point toward a nearby mode.

Mean-Shift for images

Each pixel is a point with a weight.
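A weighted mean-shift iteration can be sketched as follows. The Gaussian kernel, the `bandwidth` parameter, and the function name `mean_shift` are assumptions for this example; for an image, `points` would hold pixel coordinates and `weights` the per-pixel weights.

```python
import numpy as np

def mean_shift(points, weights, start, bandwidth=1.0, iters=100, tol=1e-6):
    """Move `start` to a mode of the weighted Gaussian kernel density estimate."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    x = np.asarray(start, dtype=float)
    for _ in range(iters):
        # Kernel response of every point, scaled by its weight.
        d2 = np.sum((points - x) ** 2, axis=1)
        k = weights * np.exp(-0.5 * d2 / bandwidth ** 2)
        # The new estimate is the kernel-weighted mean of the points.
        new_x = (k[:, None] * points).sum(axis=0) / k.sum()
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x
```

The difference `new_x - x` is the mean-shift vector; it points uphill on the density estimate, so repeating the update climbs to a local mode.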

Non-rigid object tracking

Temporal inference and SLAM

Temporal state model with states $X_t$ and observations $E_t$. Assumptions:

Not finished