14 Years

I’ve been working on model based 3D tracking on and off for quite a while now.

Year 1 (2005)

This was my main contribution to the field of 3D tracking. To my knowledge, it was the joint first (there was another paper from my lab mate using a different technique) real time tacking system that processed the entire image frame. Both techniques were much more robust than the ones that went before. My one also debuted an early version of the FAST corner detector (I didn’t put that page there).

You can see the tracking works because the model (rendered as purple lines) stays stuck to the image.  The tracker operated in real time, well field rate, which was 50Hz fields of 756×288 pixels of analogue video from some sort of Pulnix camera, captured on a BT878 card of some sort on a dual PIII at 850 MHz (running Redhat of some description). It wasn’t mobile (I had two 21″ CRT monitors), so I wasn’t watching the screen as I was capturing video; I found a long spool of thin 75 ohm co-ax which is why it had any kind of mobility. It, somewhat unexpectedly, tracked almost until I put the camera down on the table at the end. It was a bit of an anticlimactic finish, but I didn’t expect it to work quite so well.

Year 14 (2019)

This is the project I’ve been working on recently (landmarkers). It’s nice to see technology move from a proof of concept, academic curiosity to a robust production system usable in the wild by people who aren’t computer vision researchers. Also, I didn’t do the graphics in this one which is why it looks rather cooler than a bunch of purple lines.




Linear 3D triangulation

I came across this 3D linear triangular method in TheiaSFM:

bool TriangulateNView(const std::vector<Matrix3x4d>& poses,
                      const std::vector<Vector2d>& points,
Vector4d* triangulated_point)
  CHECK_EQ(poses.size(), points.size());

  Matrix4d design_matrix = Matrix4d::Zero();
  for (int i = 0; i < points.size(); i++) {
    const Vector3d norm_point = points[i].homogeneous().normalized();
    const Eigen::Matrix cost_term =
        poses[i].matrix() -
        norm_point * norm_point.transpose() * poses[i].matrix();
    design_matrix = design_matrix + cost_term.transpose() * cost_term;
  Eigen::SelfAdjointEigenSolver eigen_solver(design_matrix);
  *triangulated_point = eigen_solver.eigenvectors().col(0);
  return eigen_solver.info() == Eigen::Success;

I was aware of the DLT (direct linear transform), but it didn't look like any formulation I've seen before. It's actually pretty neat. Let's say you're trying to find an unknown homogeneous point in 3D, \mathbf{X} = [X, Y, Z, 1]. What we have is N poses, P, represented as 3\times 4 matrices and the corresponding 2D coordinates represented as homogeneous points in \mathbb R^3. The 2D points are written as \mathbf{x} = [ x, y, 1].

Since we're triangulating the 3D point, and we have homogeneous coordinate (i.e. \alpha \mathbf{x} \equiv \mathbf{x}) then for all i we should have:
\alpha_i \mathbf{x}_i \approx P_i \mathbf X
given an scale factor \alpha.

Now let's pick apart the code above. Let's call design_matrix D and cost_term C. On line 12, we have:
\displaystyle D = \sum_{i=1}^{N} C_i^\top C_i
And line 15 we’re finding the eigenvector corresponding to the smallest eigenvalue of D (SelfAdjointSolver produces them in a sorted order), i.e.
\mathbf{X} \approx \displaystyle \underset{\mathbf{v}, |\mathbf{v}|=1}{\text{argmin}}\ \mathbf{v}^\top D \mathbf{v}

We can rewrite D = \mathbf{C}^\top\mathbf{C} where:
\mathbf{C} = \left[ \begin{matrix} C_1\\ C_2\\ \vdots\\ C_N\\ \end{matrix}\right], which substituting in above gives:
\mathbf{X} \approx \displaystyle \underset{\mathbf{v}, |\mathbf{v}|=1}{\text{argmin}}\ \|\mathbf{C v}\|_2^2,
which is of course the right singular vector corresponding to the smallest singular value of C. Using eigen decomposition is much more efficient the size is O(1), not O(N), but probably at the penalty of less numerical precision.

Either way we’re trying to find the approximate nullspace of \mathbf{C}, which means finding something that’s roughly in the null space of all the C_is. But why?

On lines 8–11, we have:
C_i = P_i - \mathbf{\hat{x}\hat{x}^\top}P_i,
and we’re claiming \mathbf{X} is about in the null space. Let’s see what happens when we multiply by it:
(P_i - \mathbf{\hat{x}\hat{x}^\top}P_i) \mathbf{X} = P_i \mathbf{X} -\mathbf{\hat{x}\hat{x}^\top}P_i \mathbf{X}\\
Now, substituring in the first equation we have all the way back at the top gives:
\approx \alpha \mathbf{x} - \alpha\mathbf{\hat{x}\hat{x}^\top x} = \alpha \mathbf{x} - \alpha\mathbf{\hat{x} |x|} = \alpha \mathbf{x} - \alpha\mathbf{x} = 0
taa daa!

So there you go! If there is no noise, \mathbf{X} is in the right null space of C_i and so the null space of \mathbf C and of course D. If there is noise, then it’s closest to being in the null space of all of C_i measured in a least squares sense.

Note though that it’s just an algebraic error, not a reprojection error so it will give slightly oddly scaled results which will give more weight to some points than others. It is however a rather elegant solution.