Computer Vision (CMU)

http://www.cs.cmu.edu/~16385/

Computer Vision: Algorithms and Applications

Multiple View Geometry (best reference for geometry and vision)


Image processing

Basics of filtering

Image pyramids

Hough lines

Feature detection and correspondences

Corner detection

Feature descriptors

Transformations and geometry

Transformation $x'=f(x;p)$

Homographies

Image alignment

| | Structure (scene geometry) | Motion (camera geometry) | Measurements |
|---|---|---|---|
| Pose estimation | known | estimate | 3D to 2D correspondences |
| Triangulation | estimate | known | 2D to 2D correspondences |
| Reconstruction | estimate | estimate | 2D to 2D correspondences |

Camera calibration

Triangulation

Epipolar geometry

Structure from motion

Physics-based vision (not finished)

Objects, faces, and learning

Dealing with motion

Optical flow

Assumptions: brightness constancy and small motion.

$I(x+u\delta t,y+v\delta t,t+\delta t)=I(x,y,t)$

$\dfrac{dI}{dt}=\dfrac{\partial I}{\partial x}\dfrac{dx}{dt}+\dfrac{\partial I}{\partial y}\dfrac{dy}{dt}+\dfrac{\partial I}{\partial t}=0$ (here $\dfrac{dI}{dt}$ is the total derivative)

$I_xu+I_yv+I_t=0$
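
As a sketch of the intermediate step, assuming the motion is small enough for a first-order Taylor expansion to hold, and writing $u=\dfrac{dx}{dt}$, $v=\dfrac{dy}{dt}$, the constraint follows from brightness constancy:

$I(x+u\delta t,y+v\delta t,t+\delta t)\approx I(x,y,t)+I_xu\,\delta t+I_yv\,\delta t+I_t\,\delta t$

Setting the right-hand side equal to $I(x,y,t)$ and dividing by $\delta t$ leaves $I_xu+I_yv+I_t=0$. This is one equation in two unknowns $(u,v)$, so a single pixel cannot determine the flow on its own.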

Constant flow

Assumptions: the flow $(u,v)$ is constant within a small image patch.

Using a $5\times5$ image patch gives 25 equations:

$I_x(p_i)u+I_y(p_i)v+I_t(p_i)=0$

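Stacking the equations as $A\begin{bmatrix}u\\v\end{bmatrix}=b$, the overdetermined system can be solved by least squares (LS).

The $2\times2$ matrix $A^\top A$ of the normal equations is the same second-moment matrix used in the Harris corner detector.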
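
The least-squares solve for one patch can be sketched in NumPy as follows. This is a minimal illustration, not the course's implementation: the function name `lucas_kanade_patch`, the synthetic derivatives, and the chosen `true_flow` are all assumptions made for the example. Each row of `A` holds $(I_x(p_i), I_y(p_i))$ and `b` holds $-I_t(p_i)$.

```python
import numpy as np

def lucas_kanade_patch(Ix, Iy, It):
    """Estimate a single flow vector (u, v) for one patch (hypothetical helper).

    Ix, Iy, It are arrays of spatial and temporal derivatives over the patch.
    """
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # shape (N, 2)
    b = -It.ravel()                                 # shape (N,)
    ATA = A.T @ A                                   # 2x2 second-moment matrix
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)    # least-squares solution
    return flow, ATA

# Synthetic 5x5 patch: 25 equations, 2 unknowns.
rng = np.random.default_rng(0)
Ix = rng.normal(size=(5, 5))
Iy = rng.normal(size=(5, 5))
true_flow = np.array([0.5, -0.25])                  # assumed ground truth
It = -(Ix * true_flow[0] + Iy * true_flow[1])       # satisfies Ix*u + Iy*v + It = 0

flow, ATA = lucas_kanade_patch(Ix, Iy, It)
```

When `ATA` is close to singular (for example, along a straight edge where all gradients point the same way), the solution is poorly determined, which is one way to see the aperture problem.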

Aperture problem

Horn-Schunck optical flow

Image Alignment

$\min\limits_p \sum\limits_x[I(W(x;p))-T(x)]^2$ where $W(x;p)$ is the warp with parameters $p$, $I$ is the image, and $T$ is the template.

Lucas-Kanade alignment

$I(W(x;p+\Delta p))\approx I(W(x;p))+\dfrac{\partial I(W(x;p))}{\partial x'}\dfrac{\partial W(x;p)}{\partial p}\Delta p$

Assume we have an initial guess of $p$, then solve for $\Delta p$ using least squares (LS).
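
The iteration can be sketched for the simplest case, a pure translation warp $W(x;p)=x+p$, for which $\dfrac{\partial W}{\partial p}$ is the identity. To keep the example self-contained, the image here is an analytic Gaussian blob (an assumption for illustration), so warped values and gradients are evaluated exactly instead of by interpolation. The names `image`, `image_grad`, and `lk_translation` are hypothetical.

```python
import numpy as np

def image(x, y, cx=12.0, cy=10.0, s=4.0):
    """Analytic test image: a Gaussian blob (assumed for illustration)."""
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * s ** 2))

def image_grad(x, y, cx=12.0, cy=10.0, s=4.0):
    """Exact spatial gradient (dI/dx, dI/dy) of the blob."""
    val = image(x, y, cx, cy, s)
    return -(x - cx) / s ** 2 * val, -(y - cy) / s ** 2 * val

def lk_translation(T, xs, ys, p0, n_iters=50, tol=1e-10):
    """Additive Lucas-Kanade for W(x; p) = x + p."""
    p = np.asarray(p0, dtype=float).copy()
    for _ in range(n_iters):
        xw, yw = xs + p[0], ys + p[1]               # warped coordinates
        residual = (T - image(xw, yw)).ravel()      # T(x) - I(W(x; p))
        gx, gy = image_grad(xw, yw)                 # gradient at warped points
        J = np.stack([gx.ravel(), gy.ravel()], axis=1)  # dW/dp is the identity
        dp, *_ = np.linalg.lstsq(J, residual, rcond=None)
        p += dp                                     # additive update
        if np.linalg.norm(dp) < tol:
            break
    return p

ys, xs = np.mgrid[0:21, 0:25].astype(float)
p_true = np.array([1.5, -0.8])                      # assumed ground-truth shift
T = image(xs + p_true[0], ys + p_true[1])           # template = shifted image
p_est = lk_translation(T, xs, ys, p0=[0.0, 0.0])
```

With real images, `image` and `image_grad` would be replaced by interpolated lookups into the image and its gradient, and the initial guess must be close enough for the linearization to be valid.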

Left: Lucas Kanade (Additive alignment), Right: Shum-Szeliski (Compositional alignment)

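In conclusion, the update rules are $p\leftarrow p+\Delta p$ for additive alignment and $W(x;p)\leftarrow W(x;p)\circ W(x;\Delta p)$ for compositional alignment.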

Kalman Filtering

Tracking (KLT, Mean-Shift)

Kanade-Lucas-Tomasi (KLT)

  1. Find corners satisfying $\min(\lambda_1,\lambda_2)>\lambda$, where $\lambda_1,\lambda_2$ are the eigenvalues of the second-moment matrix and $\lambda$ is a threshold
  2. For each corner, compute the displacement to the next frame using the Lucas-Kanade method
  3. Store the displacement of each corner and update the corner position
  4. (optional) Add more corner points every M frames using step 1
  5. Repeat steps 2 to 4
  6. Return the long trajectory of each corner point
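
Step 1 of the list above can be sketched as a per-pixel minimum-eigenvalue map. This is a minimal illustration under stated assumptions: gradients come from `np.gradient`, the window is an unweighted box of half-width `half_win`, and the names `box_sum`, `min_eigenvalue_map`, and `lam_thresh` are hypothetical.

```python
import numpy as np

def box_sum(a, r):
    """Sum of `a` over a (2r+1) x (2r+1) window around each pixel (zero padded)."""
    h, w = a.shape
    padded = np.pad(a, r, mode="constant")
    out = np.zeros_like(a)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def min_eigenvalue_map(img, half_win=2):
    """Smaller eigenvalue of the 2x2 second-moment matrix at every pixel."""
    Iy, Ix = np.gradient(img.astype(float))   # derivatives along rows, columns
    Sxx = box_sum(Ix * Ix, half_win)
    Syy = box_sum(Iy * Iy, half_win)
    Sxy = box_sum(Ix * Iy, half_win)
    # Closed-form smaller eigenvalue of [[Sxx, Sxy], [Sxy, Syy]].
    return (Sxx + Syy) / 2 - np.sqrt(((Sxx - Syy) / 2) ** 2 + Sxy ** 2)

# Synthetic test image: a bright square on a dark background.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0

lam = min_eigenvalue_map(img)
lam_thresh = 0.5                               # assumed threshold
corners = np.argwhere(lam > lam_thresh)        # (row, col) corner candidates
```

On this image the response is large near the four corners of the square and vanishes along straight edges and in flat regions, which is why the criterion selects points that Lucas-Kanade can track reliably.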

Mean-Shift

Like gradient ascent on the kernel density estimate $P(x)$: each step moves the current estimate toward a nearby mode of the density.

Mean-Shift for images

Each pixel is a point with a weight.
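
A weighted mean-shift iteration over pixels can be sketched as follows. This is a minimal illustration, assuming a Gaussian kernel with a hand-picked `bandwidth` and a synthetic weight image; the function name `mean_shift_mode` and all parameter values are hypothetical.

```python
import numpy as np

def mean_shift_mode(points, weights, start, bandwidth=3.0, n_iters=100, tol=1e-8):
    """Move `start` to a nearby mode of the weighted kernel density estimate."""
    x = np.asarray(start, dtype=float)
    for _ in range(n_iters):
        d2 = np.sum((points - x) ** 2, axis=1)               # squared distances
        k = weights * np.exp(-d2 / (2 * bandwidth ** 2))     # weighted kernel values
        new_x = (k[:, None] * points).sum(axis=0) / k.sum()  # weighted mean
        shift = np.linalg.norm(new_x - x)
        x = new_x
        if shift < tol:
            break
    return x

# Every pixel is a point (x, y); its weight is the value of a synthetic
# weight image with a single blob centred at x=7, y=12.
ys, xs = np.mgrid[0:20, 0:20].astype(float)
weight_img = np.exp(-((xs - 7.0) ** 2 + (ys - 12.0) ** 2) / (2 * 2.0 ** 2))
points = np.stack([xs.ravel(), ys.ravel()], axis=1)
weights = weight_img.ravel()

mode = mean_shift_mode(points, weights, start=[10.0, 10.0])
```

In a tracking setting the weights would typically come from how well each pixel matches the target's appearance model, so the mode follows the object from frame to frame.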

Non-rigid object tracking

Temporal inference and SLAM

Temporal state model. States $X_t$ and observation $E_t$. Assumptions:

Not finished