Multiple View Imaging

Depth Rendering in Computer Vision



University of Bern, ARTORG Center, AIMI

Outline

  1. Depth Sensing Technologies

  2. Stereo Vision
    a. Parallax Model
    b. Stereo Algorithms
    c. Prerequisites
    d. Deep Learning

  3. Light-Fields
    a. Fundamentals
    b. Plenoptic Camera
      1. Ray Geometry
      2. Calibration
      3. Color Consistency
    c. Deep Learning

  4. Conclusions

Depth Sensing Technologies

  1. Time of Arrival (purely active):

    • measured in time domain (across different spectra)
    • Radar (10 m - 4 mm), Ultrasound (20 mm - 0.2 mm), Lidar (200 nm - 1000 nm)
  2. Parallax (active or passive):

    • measured in ray geometry domain:

      1. passive: diffuse light source for stereo and light-fields
      2. active: laser point projection (e.g. structured light)
  3. Other

    • Depth from Defocus (e.g. microscopy)
    • Interferometry (e.g. optical coherence tomography)

Stereo Vision

a. Parallax

b. Stereo Algorithms

c. Prerequisites

d. Deep Learning

Stereo - Parallax Model

Triangulation:

$$\frac{Z}{B}=\frac{b}{d} \quad \text{and} \quad Z=\frac{B\times b}{d}$$

  • disparity $d$
  • baseline $B$
  • image distance $b$
  • distance $Z$
  • epipolar lines

goal in stereo vision:

point correspondence detection
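
A minimal numpy sketch of the triangulation relation above, assuming a rectified pair and consistent units (function and variable names are illustrative):

```python
import numpy as np

def depth_from_disparity(disparity, baseline, image_distance):
    """Z = B * b / d for each pixel of a disparity map (units must match)."""
    d = np.where(disparity > 0, disparity.astype(float), np.nan)  # mask invalid
    return baseline * image_distance / d

# e.g. B = 0.1 m, b = 0.02 m, d = 0.004 m  ->  Z = 0.5 m
Z = depth_from_disparity(np.array([[0.004]]), baseline=0.1, image_distance=0.02)
```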

Stereo - Block Matching

$$\forall d , \, \, C(\mathbf{x}, d)= \sum_{\mathbf{m} \in \mathcal{R}}\left\|I_L(\mathbf{x}+\mathbf{m})-I_R(\mathbf{x}+\mathbf{m}+d)\right\|_1$$

  • $C(\mathbf{x}, d)$ cost volume (similarity metric)
  • $I_L(\mathbf{x})$, $I_R(\mathbf{x})$ left and right image pair
  • $\mathbf{x}=(x, y)$ global anchor coordinate pair
  • $\mathcal{R}$ block region at $\mathbf{x}$
  • $\mathbf{m}=(m, n)$ local block coordinate pair
  • $d$ horizontal disparity
  • $\left\|\cdot\right\|_1$ absolute value, a.k.a. $\ell_1$ norm

disparity map

$$d^\star(\mathbf{x}) = \underset{d}{\operatorname{arg\,min}} \, C(\mathbf{x}, d)$$
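
A compact numpy/scipy sketch of the cost volume and winner-takes-all arg min above; the block size and the uniform (mean) filter standing in for the block sum are illustrative choices:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def block_matching(I_L, I_R, max_disp, block=7):
    """SAD block matching on a rectified grayscale pair; returns d*(x)."""
    I_L, I_R = I_L.astype(float), I_R.astype(float)
    H, W = I_L.shape
    C = np.full((max_disp, H, W), np.inf)            # cost volume C(x, d)
    for d in range(max_disp):
        ad = np.abs(I_L[:, :W - d] - I_R[:, d:])     # |I_L(x) - I_R(x + d)|
        # mean over the block region R (proportional to the SAD sum)
        C[d, :, :W - d] = uniform_filter(ad, size=block)
    return np.argmin(C, axis=0)                      # winner-takes-all arg min
```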


Stereo - Semi-Global Matching

$$\forall d , \, \, S(\mathbf{x}, d)= C(\mathbf{x}, d) + \sum_{r \in \mathcal{N}} \, R(\mathbf{x}, d, q_r)$$

  • $q_r$ disparity of adjacent pixel at index $r$
  • $\mathcal{N}$ neighborhood directions (e.g. 4, 8 or 16)
  • $R(\mathbf{x}, d, q_r)$ regularizer term for cost aggregation:

$$R(\mathbf{x}, d, q_r) = \begin{cases} 0 \quad &d = q_r \\ P_1 &|d - q_r| = 1 \\ P_2 &|d - q_r| > 1 \end{cases}$$

  • $P_1, P_2$ penalizers where $P_1<P_2$

$$d^\star(\mathbf{x}) = \underset{d}{\operatorname{arg\,min}} \, S(\mathbf{x}, d)$$
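
A hedged numpy sketch of the cost aggregation for a single direction (left-to-right); full SGM sums such passes over all directions in $\mathcal{N}$. The $P_1$/$P_2$ values and boundary handling are simplified:

```python
import numpy as np

def aggregate_left_to_right(C, P1=10.0, P2=120.0):
    """One directional pass of S(x, d) = C(x, d) + R(...); C has shape (H, W, D)."""
    H, W, D = C.shape
    L = np.empty_like(C)
    L[:, 0] = C[:, 0]
    for x in range(1, W):
        prev = L[:, x - 1]                                  # neighbor costs over q_r
        prev_min = prev.min(axis=1, keepdims=True)
        same = prev                                         # d = q_r       -> + 0
        near = np.minimum(np.roll(prev, 1, axis=1),         # |d - q_r| = 1 -> + P1
                          np.roll(prev, -1, axis=1)) + P1   # (disparity-edge wrap ignored)
        far = prev_min + P2                                 # |d - q_r| > 1 -> + P2
        L[:, x] = C[:, x] + np.minimum(np.minimum(same, near), far) - prev_min
    return L

# after summing all directional passes: d*(x) = argmin_d S(x, d)
```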


Stereo Vision - Prerequisites (1)

image rectification

  1. distortion (esp. wide-angle lenses)
     • Brown-Conrady model:
       $\mathbf{x}_{\text{distorted}}=\mathbf{x}(1+k_1r^2+k_2r^4+k_3r^6)$
     • assumes radial distortion (point-symmetric)
  2. camera coplanarity
     • intrinsic calibration (focal length, image center)
     • extrinsic calibration (rotation, translation)
     • goal: horizontal epipolar lines
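
A hedged OpenCV sketch of this rectification pipeline; the intrinsics, Brown-Conrady coefficients, and extrinsics below are placeholders that would normally come from cv2.calibrateCamera / cv2.stereoCalibrate:

```python
import cv2
import numpy as np

size = (640, 480)
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])  # intrinsics
dist = np.array([-0.25, 0.07, 0., 0., 0.])           # k1, k2, p1, p2, k3
R = np.eye(3)                                        # extrinsic rotation
T = np.array([0.1, 0., 0.])                          # extrinsic translation
img_L = np.zeros((480, 640), np.uint8)               # dummy left image

# rotate both image planes onto a common (coplanar) plane
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K, dist, K, dist, size, R, T)
mx, my = cv2.initUndistortRectifyMap(K, dist, R1, P1, size, cv2.CV_32FC1)
rect_L = cv2.remap(img_L, mx, my, cv2.INTER_LINEAR)  # horizontal epipolar lines
```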


Stereo Vision - Prerequisites (2)

image alignment

  • histogram matching
  • noise reduction (e.g. Gaussian filter)
  • down-scale to boost performance (sketched below):

$$\mathcal{O}(H \times W \times D)$$

$$n^2\approx H\times W$$

$$\mathcal{O}(n^2\times D)$$

  • hand-crafted features (e.g. census)
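
A minimal sketch of these alignment steps (the filter width and scale factor are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.exposure import match_histograms

def preprocess(I_L, I_R, scale=2, sigma=1.0):
    """Radiometric alignment, denoising, and down-scaling of a stereo pair."""
    I_R = match_histograms(I_R, I_L)                      # histogram matching
    I_L = gaussian_filter(I_L.astype(float), sigma)       # noise reduction
    I_R = gaussian_filter(I_R.astype(float), sigma)
    return I_L[::scale, ::scale], I_R[::scale, ::scale]   # shrink H, W by `scale`
```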

Stereo - Deep Learning

Siamese (or twin) neural network:

  • 2x CNN pipelines with identical weights

  • learned features outperform hand-crafted [1]

    • convolution layers
    • output ~1/4 of input size
    • spatial pyramid pooling
  • 2x dense layers (no activation layer)

  • similarity of dense layer outputs (e.g. SAD)

  • disparity regression (e.g. softmax)

  • subject to ongoing research:

    [1] MC-CNN by Zbontar and LeCun (2015)

    [2] PSMNet by Chang and Chen (2018)

    [3] DeepPruner by Duggal et al. (2019)
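
A hedged PyTorch sketch of the siamese idea; the layer count and channel widths are illustrative, not the published MC-CNN/PSMNet architectures:

```python
import torch
import torch.nn as nn

class SiameseMatcher(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.branch = nn.Sequential(                 # shared-weight CNN pipeline
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, left, right):
        f_l = self.branch(left)                      # identical weights for
        f_r = self.branch(right)                     # both views (twin network)
        return torch.abs(f_l - f_r).sum(dim=1)       # SAD-style feature similarity
```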

Light-Fields

a. Fundamentals

b. Plenoptic Camera

c. Deep Learning

Light-Fields - Definition

  • at least four views at consistent spacing

4-D light-field notation*:

$$L_F(u,v,s,t)$$

  • $L_F$ describes ray vector piercing through two stacked planes
  • 2-D coordinate pairs $(u,v)$ and $(s,t)$ correspond to angular and spatial domain


* Levoy and Hanrahan, Light Field Rendering (1996)
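
A numpy sketch of the two-plane indexing, with illustrative sampling rates:

```python
import numpy as np

U, V, S, T = 9, 9, 512, 512                     # angular and spatial sampling
L_F = np.zeros((U, V, S, T), dtype=np.float32)  # L_F(u, v, s, t)

center_view = L_F[U // 2, V // 2]   # fix (u, v): one sub-aperture view over (s, t)
epi = L_F[:, V // 2, :, T // 2]     # fix (v, t): an epipolar-plane slice over (u, s)
```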

Light-fields

Plenoptic Camera

Plenoptic Camera - Model Derivation

[Figure series: step-by-step ray-geometry derivation of the standard plenoptic camera model]

* Hahne et al., Refocusing Distance of a Standard Plenoptic Camera (2016, Optics Express)

Plenoptic Camera - Refocusing Model

[Figure series: step-by-step construction of the refocusing distance model]

* Hahne et al., Refocusing Distance of a Standard Plenoptic Camera (2016, Optics Express)

Refocusing Model

$$\text{Sharpness Metric:} \quad S = \frac{HE}{TE} \quad \text{where}$$

$$TE = \sum_{\omega=1}^{\Omega} \sum_{\psi=1}^{\Psi} \mathcal{X}\left[\sigma_{\omega} \, , \, \rho_{\psi}\right]^2$$

$$HE = TE - \sum_{\omega=1}^{Q_H} \sum_{\psi=1}^{Q_V} \mathcal{X}\left[\sigma_{\omega} \, , \, \rho_{\psi}\right]^2$$

Magnitude of the Discrete Fourier Transform:

$$\mathcal{X}\left[\sigma_{\omega} \, , \, \rho_{\psi}\right] = \left|\text{FFT} \left(E''_a\left[s_j \, , \, t_h\right] \right)\right|$$


Max. error: 0.35 % (Ray simulation in Zemax)*


* Hahne et al., (2016, Optics Express)
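
A hedged numpy sketch of this spectral sharpness score; taking the low-frequency block $(Q_H, Q_V)$ around the centered DC term is one plausible reading of the summation limits:

```python
import numpy as np

def sharpness(E, q_h=8, q_v=8):
    """S = HE / TE on a refocused slice E''_a; q_h, q_v are assumptions."""
    X = np.abs(np.fft.fftshift(np.fft.fft2(E)))      # centered DFT magnitude
    TE = np.sum(X ** 2)                              # total spectral energy
    cy, cx = X.shape[0] // 2, X.shape[1] // 2
    LE = np.sum(X[cy - q_v:cy + q_v, cx - q_h:cx + q_h] ** 2)  # low-freq energy
    return (TE - LE) / TE                            # HE / TE
```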

Plenoptic Camera - Parallax Model

[Figure series: baseline and triangulation geometry in the standard plenoptic camera]

Max. error: 0.33 % from experiments and ray simulations for several lens types and focus positions *

* Hahne et al., Baseline and Triangulation Geometry in a Standard Plenoptic Camera (2018, Int. J. of Comp. Vis.)

Plenoptic Camera - Software Tool: PlenoptiSign


* Hahne and Aggoun, PlenoptiSign (2019, SoftwareX), https://github.com/hahnec/plenoptisign

Plenoptic Camera - Calibration (1)

fundamental goal:

registration of the views' geometric properties

problem a):

  • arbitrary lens setups require generic method

solution a):

  • scale-space pyramid $P(\nu, \mathbf{x})$ via

$$\forall\nu, \, \, P(\nu+1, \mathbf{x}) = \mathcal{D}_2\left(P(\nu, \mathbf{x})\ast \nabla^2 G(\sigma, \mathbf{x})\right)$$

  • $\nu^{\star}$ corresponds to micro image size $M$

$$\nu^{\star} = \underset{\nu}{\operatorname{arg\,max}}\left\{\underset{\mathbf{x}}{\operatorname{max}} \, P\left(\nu, \mathbf{x}\right)\right\}$$


* Hahne and Aggoun (2021, IEEE Trans. on Image Proc.)
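
A hedged scipy sketch of this detection step, with a Laplacian-of-Gaussian per level and dyadic downsampling standing in for $\mathcal{D}_2$ (the level count and sigma are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def micro_image_scale(img, levels=6, sigma=1.0):
    """Return nu* whose peak response tracks the micro image size M."""
    cur, peaks = img.astype(float), []
    for _ in range(levels):
        cur = gaussian_laplace(cur, sigma)  # P(nu, x) convolved with LoG
        cur = cur[::2, ::2]                 # D_2: downsample by factor 2
        peaks.append(cur.max())             # max_x P(nu, x)
    return int(np.argmax(peaks))            # nu* = arg max over scales
```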

Plenoptic Camera - Calibration (2)

problem b): noisy peak estimates $\bar{\mathbf{c}}_{n}$

solution b): centroid grid $\hat{{\mathbf{c}}}_{n}$ regression

$$\hat{{\mathbf{c}}}_{n}\tilde{z}'_{n}=\tilde{{\mathbf{c}}}'_{n}=\mathbf{P}\tilde{\mathbf{g}}_{n} \quad \text{or} \quad \begin{bmatrix} \tilde{k}'_{n} \\ \tilde{l}'_{n} \\ \tilde{z}'_{n} \end{bmatrix} = \begin{bmatrix} p_1 & p_2 & p_3 \\ p_4 & p_5 & p_6 \\ p_7 & p_8 & 1 \end{bmatrix} \begin{bmatrix} \tilde{k}_{n} \\ \tilde{l}_{n} \\ \tilde{z}_{n} \end{bmatrix}$$

$$\underset{\mathbf{p}}{\operatorname{arg\,min}} \sum_{n=1}^{|{\mathbf{C}}|} f_n \quad \text{where} \quad \forall n, \, f_n=\lVert \bar{{\mathbf{c}}}_{n}-\hat{{\mathbf{c}}}_{n}\rVert_2$$

$$\mathbf{P} \in \mathbb{R}^{3 \times 3} \rightarrow \mathbf{p}=\begin{bmatrix}p_1 & p_2 & \dots & p_8 & 1\end{bmatrix}^{\intercal} \in \mathbb{R}^{9\times 1}$$

$$\mathbf{p}_{k+1}=\mathbf{p}_k-(\mathbf{J}^\intercal\mathbf{J}+\mu\mathbf{D}^\intercal\mathbf{D})^{-1}\mathbf{J}^\intercal\mathbf{f}$$

* Hahne and Aggoun (2021, IEEE Trans. on Image Proc.)
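
A hedged scipy sketch of this regression: the eight free entries of $\mathbf{P}$ are fitted with Levenberg-Marquardt (scipy's `method='lm'` plays the role of the damped update above); array shapes are assumptions:

```python
import numpy as np
from scipy.optimize import least_squares

def fit_centroid_grid(c_bar, g):
    """Fit P so the projected grid g matches noisy centroids c_bar; (N, 2) each."""
    g_h = np.column_stack([g, np.ones(len(g))])   # homogeneous grid points

    def residuals(p):
        P = np.append(p, 1.0).reshape(3, 3)       # p_9 fixed to 1
        proj = g_h @ P.T
        c_hat = proj[:, :2] / proj[:, 2:3]        # divide by z'_n
        return (c_hat - c_bar).ravel()            # stacked f_n terms

    p0 = np.eye(3).ravel()[:8]                    # start near the identity
    return least_squares(residuals, p0, method='lm').x
```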

Plenoptic Camera - Calibration (3)

problem c): vignetting at micro images

solution c): regularized regression

  • cost metric $\mathbf{F}_{j,h}$ given by

$$\mathbf{F}_{j,h}=\lVert \bar{{\mathbf{c}}}_{j,h}-\hat{{\mathbf{c}}}_{j,h}\rVert_2 + \beta R\left(\bar{{\mathbf{c}}}_{j,h}, \hat{{\mathbf{c}}}_{j,h}, M\right) , \quad \forall j, h$$

  • weight $\beta$ and regularizer $R(\cdot)$

$$R\left(\bar{{\mathbf{c}}}_{j,h}, \hat{{\mathbf{c}}}_{j,h}, M\right) = \begin{cases} 0, & \text{if}\ \bar{\mathbf{d}}_{j,h} + M/\hat{M} < 0 \\ \sum_{(\bar{k}, \bar{l})}\bar{\mathbf{d}}_{j,h}, & \text{otherwise} \end{cases}$$

  • distance measure $\bar{\mathbf{d}}_{j,h}$

$$\bar{\mathbf{d}}_{j,h}=\left|\bar{{\mathbf{c}}}_{j,h}-(p_3, p_6)\right|-\left|\hat{{\mathbf{c}}}_{j,h}-(p_3, p_6)\right|$$


* Hahne and Aggoun (2021, IEEE Trans. on Image Proc.)
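
A hedged numpy reading of this cost for one micro image; the element-wise handling of the case distinction is an interpretation of the slide's notation:

```python
import numpy as np

def regularized_cost(c_bar, c_hat, center, M, M_hat, beta=1.0):
    """F = ||c_bar - c_hat||_2 + beta * R(...); center is (p3, p6)."""
    d_bar = np.abs(c_bar - center) - np.abs(c_hat - center)  # distance measure
    if np.all(d_bar + M / M_hat < 0):   # centroid safely inside the micro image
        r = 0.0
    else:                               # penalize outward (vignetted) drift
        r = d_bar.sum()                 # sum over (k_bar, l_bar)
    return np.linalg.norm(c_bar - c_hat) + beta * r
```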

Plenoptic Camera - Color Equalization (1)

[Figure: original / target / aligned *]
  • transfer of Probability Density Function (PDF) denoted as $\mathcal{N}(\cdot)$ given by:

$$\mathcal{N}(\mathbf{R};\boldsymbol\mu_r, \mathbf{\Sigma}_{r}) = \frac{\exp\left(-\frac{1}{2} (\mathbf{R}-\boldsymbol\mu_r)^\intercal \mathbf{\Sigma}^{-1}_r (\mathbf{R}-\boldsymbol\mu_r) \right)}{\sqrt{(2\pi)^{\text{rank}\left(\mathbf{\Sigma}_r\right)} |\mathbf{\Sigma}_r |}}$$

* Hahne and Aggoun (2021, IEEE Trans. on Image Proc.), https://github.com/hahnec/color-matcher

Color Equalization (2)

  • source $\mathbf{R}\in\mathbb{R}^{3\times N}$ and target $\mathbf{Z}\in\mathbb{R}^{3\times N}$
  • $\mathbf{\Sigma}_{r}$ and $\mathbf{\Sigma}_{z}$ denote covariance matrices
  • $\boldsymbol\mu_r$ and $\boldsymbol\mu_z$ as mean vectors
  • color transfer $\hat{t}(\mathbf{R})$ requires the Multivariate Gaussian Distributions (MVGDs) to be

$$\mathcal{N}(\mathbf{Z}; \boldsymbol\mu_z, \mathbf{\Sigma}_z) \propto \mathcal{N}(\hat{t}(\mathbf{R}); \boldsymbol\mu_z, \mathbf{\Sigma}_z)$$

  • drop terms and substitute $\hat{t}(\mathbf{R})$ for $\mathbf{Z}$

$$(\mathbf{Z} - \boldsymbol\mu_z)^\intercal \mathbf{\Sigma}^{-1}_z \left(\hat{t}(\mathbf{R}) - \boldsymbol\mu_z\right) = (\mathbf{R} - \boldsymbol\mu_r)^\intercal \mathbf{\Sigma}^{-1}_r(\mathbf{R} - \boldsymbol\mu_r)$$

$$\hat{t}(\mathbf{R})-\boldsymbol\mu_z = \left((\mathbf{Z} - \boldsymbol\mu_z)^\intercal \mathbf{\Sigma}^{-1}_z\right)^{+}(\mathbf{R} - \boldsymbol\mu_r)^\intercal \mathbf{\Sigma}^{-1}_r(\mathbf{R} - \boldsymbol\mu_r)$$

$$\mathbf{M} = \left((\mathbf{Z} - \boldsymbol\mu_z)^\intercal \mathbf{\Sigma}^{-1}_z\right)^+ (\mathbf{R} - \boldsymbol\mu_r)^\intercal \mathbf{\Sigma}^{-1}_r$$

$$\hat{t}(\mathbf{R}) = \mathbf{M}(\mathbf{R} - \boldsymbol\mu_r) + \boldsymbol\mu_z$$

* Hahne and Aggoun (2021, IEEE Trans. on Image Proc.)
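
A numpy sketch of the closed-form transfer, following the pseudo-inverse construction of $\mathbf{M}$ above (source and target are assumed to hold equal pixel counts, as in the slide's $\mathbb{R}^{3\times N}$ layout):

```python
import numpy as np

def mvgd_transfer(R, Z):
    """Map source colors R (3 x N) onto the target distribution of Z (3 x N)."""
    mu_r = R.mean(axis=1, keepdims=True)
    mu_z = Z.mean(axis=1, keepdims=True)
    A = (Z - mu_z).T @ np.linalg.inv(np.cov(Z))   # (Z - mu_z)^T Sigma_z^-1
    B = (R - mu_r).T @ np.linalg.inv(np.cov(R))   # (R - mu_r)^T Sigma_r^-1
    M = np.linalg.pinv(A) @ B                     # 3 x 3 transfer matrix
    return M @ (R - mu_r) + mu_z                  # t_hat(R)
```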

Light-fields - Depth

  • conversion to Epipolar Plane Images (EPIs)

  • slope estimation of epipolar lines

  • solution by Wanner et al. (2012)

    • eigenvectors of the structure tensor (see the sketch below):

$$\begin{bmatrix} G_{\sigma} \ast (S_x S_x) & G_{\sigma} \ast (S_x S_y) \\ G_{\sigma} \ast (S_x S_y) & G_{\sigma} \ast (S_y S_y) \end{bmatrix} = \begin{bmatrix} J_{xx} & J_{xy} \\ J_{xy} & J_{yy} \end{bmatrix}$$

$$d_{y^*}= \frac{(J_{yy} - J_{xx})^2 + 4J_{xy}^2}{(J_{xx} + J_{yy})^2}$$

  • available at github.com/hahnec/depthy
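
A scipy sketch of the structure-tensor step on a single EPI (the Gaussian width and the small epsilon guarding flat regions are assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def epi_structure_tensor(S, sigma=1.0, eps=1e-12):
    """Per-pixel d_{y*} from the smoothed gradient products of an EPI S."""
    S = S.astype(float)
    Sx, Sy = sobel(S, axis=1), sobel(S, axis=0)   # EPI gradients
    Jxx = gaussian_filter(Sx * Sx, sigma)         # G_sigma * (Sx Sx)
    Jxy = gaussian_filter(Sx * Sy, sigma)         # G_sigma * (Sx Sy)
    Jyy = gaussian_filter(Sy * Sy, sigma)         # G_sigma * (Sy Sy)
    return ((Jyy - Jxx) ** 2 + 4 * Jxy ** 2) / ((Jxx + Jyy) ** 2 + eps)
```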

Plenoptic Camera - Software Tool: PlenoptiCam






* Hahne and Aggoun (2021, IEEE Trans. on Image Proc.), https://github.com/hahnec/plenopticam

Light-fields - Deep Learning

Learning to Synthesize a 4D RGBD Light Field from a Single Image *

  • dataset of 3,300 light-fields containing scenes of flowers and plants
  • two-stage CNN:
    1. geometry from a Lambertian emitter (10-layer network)
    2. prediction of occluded rays and non-Lambertian effects (5-layer network)

[Figure: single image input / 4-D depth rays / 4-D view output]

* Srinivasan et al. (2017), https://github.com/pratulsrinivasan/Local_Light_Field_Synthesis

Conclusions

  • light-fields extend stereo vision and share commonalities with it, e.g.:

    • triangulation and epipolar geometry
    • optical aberrations (distortion, vignetting)
    • similar pre-processing (histogram matching)
  • light-fields require dedicated signal processing

    • 4-D calibration
    • epipolar depth map extraction
    • refocusing

Thank you!
