- Camera model
- Bundle adjustment with planar checkerboard
- VAE outlier corner detection
- Experimental results
We model each camera as the usual pinhole camera following Zhang's method [1]. A 3D world point is denoted as $M = [X, Y, Z]^\top$ and its projected image point as $m = [u, v]^\top$, related by:
$$ s\,\tilde{m} = A \left[ R \mid t \right] \tilde{M}, $$
where $[R \mid t]$ are the extrinsic parameters with rotation $R$ (parameterized as the angle-axis vector $r \in \mathbb{R}^3$) and translation $t$, $\tilde{m}$ and $\tilde{M}$ are homogeneous coordinates, and $s$ is an arbitrary scale factor. $A = \begin{bmatrix} f_x & \gamma & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$ is the camera intrinsic matrix with focal lengths $(f_x, f_y)$ and principal point $(c_x, c_y)$ in pixel units. $\gamma$ describes the skew of the two image axes, which can be assumed zero for most modern cameras [2].
The lens distortions are modeled using the coefficients $(k_1, k_2, k_3)$ for radial and $(p_1, p_2)$ for tangential distortion, following the widely used lens model [2] relating a distorted image point $(x_d, y_d)$ and an undistorted image point $(x_u, y_u)$ by:
$$ x_d = x_u \left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + 2 p_1 x_u y_u + p_2 \left(r^2 + 2 x_u^2\right), $$
$$ y_d = y_u \left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + p_1 \left(r^2 + 2 y_u^2\right) + 2 p_2 x_u y_u, $$
where $r^2 = x_u^2 + y_u^2$.
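The distortion model above can be sketched directly; a minimal Python version (function name hypothetical, operating on normalized image coordinates):

```python
# Sketch of the radial/tangential (Brown-Conrady) lens model described above.
# Maps an undistorted normalized point (xu, yu) to its distorted position.
def distort(xu, yu, k1, k2, k3, p1, p2):
    r2 = xu * xu + yu * yu                           # squared radial distance
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3   # radial scaling factor
    xd = xu * radial + 2 * p1 * xu * yu + p2 * (r2 + 2 * xu * xu)
    yd = yu * radial + p1 * (r2 + 2 * yu * yu) + 2 * p2 * xu * yu
    return xd, yd
```

With all five coefficients zero the mapping reduces to the identity, which is a convenient sanity check.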
As a result, each camera is modeled with 15 parameters: 6 extrinsic ($r$, $t$), 4 intrinsic ($f_x$, $f_y$, $c_x$, $c_y$), and 5 lens distortion parameters ($k_1$, $k_2$, $k_3$, $p_1$, $p_2$).
We denote the positions of the checkerboard corners w.r.t. the body coordinate frame in a vectorized form $X_B = [x_{B,1}^\top, \ldots, x_{B,N_C}^\top]^\top \in \mathbb{R}^{3 N_C}$, where $N_C$ is the number of corners on the checkerboard. To enforce the rigid planar assumption, we keep $X_B$ constant and set the $z$-coordinate of every corner to zero. Then, a rigid body transformation is applied to each corner to obtain its world position $x_{W,i}$. The transformation is parameterized by the angle-axis vector $r$ for rotation and the translation vector $t$:
$$ x_{W,i} = R(r)\, x_{B,i} + t, \qquad i = 1, \ldots, N_C, $$
where $R(\cdot)$ computes a rotation matrix using Rodrigues' formula, and $x_{B,i}$ denotes the position of corner $i$ in $X_B$.
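As a concrete reference, Rodrigues' formula and the board-to-world transformation can be sketched in a few lines of numpy (function names hypothetical; corners stored as an (N, 3) array with z = 0):

```python
import numpy as np

def rodrigues(r):
    """Angle-axis vector r (3,) -> 3x3 rotation matrix via Rodrigues' formula."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)                      # zero rotation
    k = r / theta                             # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])        # cross-product matrix of k
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def board_to_world(X_B, r, t):
    """Apply the rigid transform x_W = R(r) x_B + t to all corners (N, 3)."""
    return X_B @ rodrigues(r).T + t
```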
Using this planar model, we minimize the following sum of mean reprojection errors over all frames:
$$ \min_{\{\theta_j\},\, \{(r_i, t_i)\}} \; \sum_i \sum_j v_{ij}\, \frac{1}{N_C} \left\lVert \hat{x}_{ij} - \pi\!\left(\theta_j,\, r_i,\, t_i,\, X_B\right) \right\rVert^2, $$
where $i$ is the frame index, $j$ is the camera index with $v_{ij} \in \{0, 1\}$ encoding visibility information, $\theta_j$ is the vector of 15 parameters of camera $j$, $(r_i, t_i)$ is the pose of the checkerboard in frame $i$, $\hat{x}_{ij}$ are the vectorized positions of the observed corners, $\pi(\cdot)$ projects each of the $N_C$ world points onto camera $j$'s image plane, and $1/N_C$ is for averaging.
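One term of this objective can be sketched as follows; a minimal numpy version that, for brevity, takes the rotation matrix directly and omits lens distortion (all names hypothetical):

```python
import numpy as np

def project(K, R, t, X_W):
    """Pinhole projection of world points X_W (N, 3) with intrinsics K, extrinsics R, t."""
    Xc = X_W @ R.T + t                  # world frame -> camera frame
    uv = Xc[:, :2] / Xc[:, 2:3]         # perspective division
    return uv @ K[:2, :2].T + K[:2, 2]  # apply focal lengths and principal point

def mean_reprojection_error(K, R, t, X_W, observed, visible):
    """(1/N_C) * squared pixel residual for one (frame, camera) pair, gated by visibility."""
    if not visible:
        return 0.0
    res = project(K, R, t, X_W) - observed
    return float(np.sum(res**2)) / len(X_W)
```

In the actual bundle adjustment, these per-pair terms are summed over all frames and cameras and handed to a nonlinear least-squares solver.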
OpenCV/MATLAB corner detectors are known to return incorrect or inaccurate corners, especially when the checkerboard is tilted at a large angle w.r.t. the image plane, is under poor lighting, or other objects are present in the image [6, 7, 8]. Therefore, we treat incorrectly or inaccurately detected corners as outliers and identify them using a VAE.
A VAE deals with an unknown underlying probability distribution $p^*(x)$ defined over data points $x$ in some potentially high-dimensional space $\mathcal{X}$ [3]. The aim of the VAE is to approximate the underlying distribution with a chosen model $p_\theta(x)$, parameterized by $\theta$ and marginalized over the latent variables $z$:
$$ p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz. $$
The objective is to estimate the parameters $\theta$ via Maximum Likelihood Estimation (MLE) on the likelihood of the data set $\mathcal{D} = \{x^{(1)}, \ldots, x^{(N)}\}$ consisting of $N$ i.i.d. samples. Using variational inference to approximate the intractable true posterior $p_\theta(z \mid x)$ involved in $p_\theta(x)$ with some tractable distribution $q_\phi(z \mid x)$, it is sufficient for MLE to maximize the evidence lower bound $\mathcal{L}(\theta, \phi; x)$:
$$ \log p_\theta(x) \geq \mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right] - D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right), $$
where $D_{\mathrm{KL}}$ is the Kullback-Leibler (KL) divergence [4].
To make the maximization of the evidence lower bound tractable, the prior $p(z)$ is set as an isotropic unit Gaussian $\mathcal{N}(0, I)$ and the posterior as a Gaussian $q_\phi(z \mid x) = \mathcal{N}\!\left(\mu, \operatorname{diag}(\sigma^2)\right)$. This parameterization also allows the use of the reparameterization trick to make the random sampling of $z$ in $\mathcal{L}$ differentiable w.r.t. $\phi$ by reparameterizing it as $z = \mu + \sigma \odot \epsilon$, with $\epsilon \sim \mathcal{N}(0, I)$. Moreover, by choosing $p_\theta(x \mid z) = \mathcal{N}(x; \hat{x}, \sigma_x^2 I)$, where $\sigma_x$ is pre-determined, and collecting the constant terms into a weight $\beta$, maximizing $\mathcal{L}$ becomes equivalent to minimizing:
$$ \lVert x - \hat{x} \rVert^2 + \beta\, D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right). $$
For VAEs, $q_\phi(z \mid x)$ refers to a probabilistic encoder and $p_\theta(x \mid z)$ to a probabilistic decoder.
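The pieces of this loss have simple closed forms for the diagonal-Gaussian posterior and unit-Gaussian prior; a minimal numpy sketch of the reparameterized sample, the KL term, and the weighted objective (names hypothetical; a real implementation would use an autodiff framework):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, sigma):
    """z = mu + sigma * eps with eps ~ N(0, I): sampling made differentiable in (mu, sigma)."""
    return mu + sigma * rng.standard_normal(mu.shape)

def kl_to_unit_gaussian(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * np.sum(mu**2 + sigma**2 - np.log(sigma**2) - 1.0)

def vae_loss(x, x_hat, mu, sigma, beta):
    """Squared reconstruction error plus beta-weighted KL divergence."""
    return np.sum((x - x_hat)**2) + beta * kl_to_unit_gaussian(mu, sigma)
```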
For input images, we first remove shading information through binarization using Otsu's method after Gaussian blurring. The last layer of the decoder is a sigmoid activation, which returns pixel values in the range [0, 1].
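The binarization step can be sketched in pure numpy; this illustrative version implements Otsu's between-class-variance threshold directly and omits the Gaussian blurring for brevity (in practice a library routine such as OpenCV's would be used):

```python
import numpy as np

def otsu_threshold(img):
    """Return the gray level maximizing between-class variance (Otsu's method)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                   # gray-level probabilities
    omega = np.cumsum(p)                    # class-0 probability per threshold
    mu = np.cumsum(p * np.arange(256))      # class-0 cumulative mean
    mu_total = mu[-1]
    # between-class variance for every candidate threshold (0/0 -> nan -> 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu)**2 / (omega * (1 - omega))
    sigma_b[np.isnan(sigma_b)] = 0.0
    return int(np.argmax(sigma_b))

def binarize(img):
    """Map pixels above Otsu's threshold to 1.0 and the rest to 0.0."""
    return (img > otsu_threshold(img)).astype(np.float32)
```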
Effects of corner outliers on the calibration accuracy
Figure 1: (a) 16-camera studio setup and (b) its virtual counterpart with a randomly moving checkerboard visualized.

We generate four sets of synthetic image points of a virtual planar checkerboard (Figure 1). We include a different percentage of outlier corners (0%, 0.01%, 0.1%, and 1%) in each image set, where each outlier corner is randomly offset from its ground-truth location by 10-15 pixels. Then, we estimate the parameters of the 16 cameras using each of the four image sets and compare the root mean square error (RMSE) against the ground truth.
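The outlier-injection step described above amounts to displacing a random subset of corners by a random direction and magnitude; a small numpy sketch (function name hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def inject_outliers(corners, fraction, lo=10.0, hi=15.0):
    """Offset a random fraction of 2-D corner points (N, 2) by lo-hi pixels
    in a uniformly random direction; returns the corrupted set and indices."""
    corrupted = corners.copy()
    n_out = int(round(fraction * len(corners)))
    idx = rng.choice(len(corners), size=n_out, replace=False)
    angles = rng.uniform(0.0, 2 * np.pi, n_out)   # random offset directions
    radii = rng.uniform(lo, hi, n_out)            # offset magnitudes in pixels
    corrupted[idx] += np.stack([radii * np.cos(angles),
                                radii * np.sin(angles)], axis=1)
    return corrupted, idx
```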
Figure 2: RMSE comparisons of estimated camera parameters, grouped by percentages of outlier corners.

The results are shown in Figure 2, where the geodesic distance on the unit sphere is used for computing the error in camera orientations $r$, and $t$ denotes camera position. Since each calibration is obtained in an arbitrary coordinate system, we align them with the reference coordinates using Procrustes analysis before computing the RMSE.
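The alignment step can be sketched as a rigid Procrustes (Kabsch) fit between corresponding camera positions; a minimal numpy version (function name hypothetical; scale is assumed fixed since the checkerboard provides metric scale):

```python
import numpy as np

def procrustes_align(P, Q):
    """Rigid alignment: rotation R and translation t minimizing ||R p_i + t - q_i||
    over corresponding 3-D point sets P, Q of shape (N, 3) (Kabsch algorithm)."""
    Pc, Qc = P - P.mean(0), Q - Q.mean(0)         # center both point sets
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    D = np.eye(3)
    D[2, 2] = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ D @ U.T
    t = Q.mean(0) - R @ P.mean(0)
    return R, t
```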
Performance evaluations
We generate 10k synthetic corner images with 1% outliers by cropping 15×15 pixels from a template corner after applying random affine transformations. The outlier corners are offset from the center of the crop by 2-4 pixels (Figure 3b).
We set the latent spaces of the VAE and AE to two dimensions, and the VAE's KL divergence weight to $\beta = 1$. Then, the VAE and AE are trained for 1k epochs using the Adam optimizer with learning rate 1e-3. For PCA and kPCA (with a Gaussian kernel), we use 2 principal axes for reconstruction.
We use the same binarized inputs for both training and testing (Figure 3) and compare the results using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Note that when the data for binary classification is heavily imbalanced, AUPRC can be less misleading than AUROC [5].
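AUPRC can be computed directly from the cumulative true/false positives of samples ranked by anomaly score; a minimal numpy sketch (function name hypothetical; assumes at least one positive label):

```python
import numpy as np

def precision_recall_auc(scores, labels):
    """Area under the precision-recall curve, with higher score meaning
    'more likely outlier' and labels a 0/1 numpy array."""
    order = np.argsort(-scores)          # rank samples by descending score
    labels = labels[order]
    tp = np.cumsum(labels)               # true positives at each cutoff
    fp = np.cumsum(1 - labels)           # false positives at each cutoff
    precision = tp / (tp + fp)
    recall = tp / labels.sum()
    # prepend (recall=0, precision=1) and integrate with the trapezoid rule
    r = np.r_[0.0, recall]
    p = np.r_[1.0, precision]
    return float(np.sum(np.diff(r) * (p[1:] + p[:-1]) / 2))
```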
Figure 3: Reconstructions and normalized losses of synthetic corners: (a) inliers and (b) outliers.

From Figure 4, we observe that the AUPRC of the VAE is the largest, outperforming the rest. For the VAE, the precision remains very close to 1 even as the recall approaches 1, implying the VAE can remove the outliers without also removing inliers.
Figure 4: Area under the receiver operating characteristic curve (AUROC) and the precision-recall curve (AUPRC).

The performance differences are more pronounced in the plot of normalized reconstruction losses of all data points in Figure 5. In contrast to the AE, PCA, and kPCA, a clear classification margin is achieved by the VAE, which implies the outliers can be identified with high confidence.
Figure 5: Normalized reconstruction losses of all corner crops by VAE, AE, PCA, and kPCA.

We capture images (~172 frames per camera) of a freely moving checkerboard and crop 15×15 pixels centered at each detected corner. We then train the VAE with a heuristically determined KL divergence weight $\beta = 0.01$ for 200 epochs.
Figure 6: Original grayscale crops (above) and VAE reconstructions (below), sorted by normalized losses.

The result in Figure 6 shows that the outlier corners are reconstructed with large losses; via visual inspection, 37 outliers can be identified (~0.015% of corners). Similar to Figure 5, a clear classification margin can be observed in the reconstruction loss plot in Figure 7. Using this margin, we can determine the outlier corners with high confidence across tens to thousands of images.
Figure 7: Normalized VAE reconstruction losses of all corner crops, displaying a distinct classification margin.
b. Improvements on 3D reconstruction accuracy
Figure 8: (a) Bowling ball captured at three different locations in three frames and (b) its stereo triangulations visualized.

We capture images (~172 frames per camera) of a freely moving checkerboard and calibrate the cameras using two different methods: (1) with and (2) without the outlier corners. Then, we capture images of a polyester bowling ball at three different locations and reconstruct its 34 manually marked visible feature points (Figure 8a). We triangulate the feature points using pairs of calibrated cameras {(1, 2), (2, 3), ..., (16, 1)} to obtain a set of sparse point clouds (Figure 8b). The radius of the ball is measured to be ~108 mm, but we do not rely solely on this value since it is measured by hand while the reconstruction error is usually very small (<1 mm). Instead, we evaluate the reconstruction consistency based on the radii between the triangulated points and the centers of spheres fitted to each point cloud via least-squares fitting. If the calibrations are accurate, all radii should deviate very little from their mean.
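The consistency metric above needs a least-squares sphere fit; a minimal numpy sketch using the standard algebraic formulation (function names hypothetical):

```python
import numpy as np

def fit_sphere(points):
    """Algebraic least-squares sphere fit to points (N, 3): returns (center, radius).
    Solves 2 c.x + (r^2 - |c|^2) = |x|^2 as a linear system in (c, d)."""
    A = np.c_[2 * points, np.ones(len(points))]
    b = np.sum(points**2, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)   # r = sqrt(d + |c|^2)
    return center, radius

def radius_consistency(points, center):
    """Per-point radii and their standard deviation (the consistency measure)."""
    radii = np.linalg.norm(points - center, axis=1)
    return radii, float(radii.std())
```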
Figure 9: Total of 102 feature points triangulated using camera pairs calibrated with/without corner outliers (mean, standard deviation).

The box plots of the obtained radii are shown in Figure 9. The radii display smaller deviations, with a statistically significant difference (p < 0.05, paired-sample t-test), when images containing the outlier corners are removed.
[1] Zhang Z. A flexible new technique for camera calibration. IEEE Transactions on pattern analysis and machine intelligence. 2000 Nov;22(11):1330-4.
[2] Szeliski R. Computer vision: algorithms and applications. Springer Science & Business Media; 2010 Sep 30.
[3] Kingma DP, Welling M. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114. 2013 Dec 20.
[4] Murphy KP. Machine learning: a probabilistic perspective. MIT Press; 2012 Sep 7.
[5] Davis J, Goadrich M. The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning 2006 Jun 25 (pp. 233-240).
[6] Shu C, Brunton A, Fiala M. Automatic grid finding in calibration patterns using Delaunay triangulation. National Research Council of Canada; 2003 Aug.
[7] Albarelli A, Rodolà E, Torsello A. Robust camera calibration using inaccurate targets. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2009;31:376-83.
[8] Wang Z, Wang Z, Wu Y. Recognition of corners of planar checkboard calibration pattern image. In 2010 Chinese Control and Decision Conference 2010 May 26 (pp. 3224-3228). IEEE.