Deep learning made easy with TERSE networks

Sebastiaan Vermeulen

EIPC 2020

Econometrics: Can we recognize \(f\)?

Machine learning: Can we approximate \(f\)?

\[y=f(x)+\epsilon\]

Neural Networks

\[ h_1{(x_t)} = \begin{bmatrix} u\left(a_{11} + x_t' b_{11}\right) \\ \vdots \\ u\left(a_{n_1 1} + x_t' b_{n_1 1}\right) \end{bmatrix} \quad h_k{(x_t)} = \begin{bmatrix} u\left(a_{1k} + h_{k-1}{(x_t)}' b_{1k}\right) \\ \vdots \\ u\left(a_{n_k k} + h_{k-1}{(x_t)}' b_{n_k k}\right) \end{bmatrix} \quad y_t = h_k{(x_t)}'\beta \]
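
To make the recursion concrete, here is a minimal NumPy sketch of the formula above; the ReLU choice for \(u\), the layer widths, and the random parameter values are illustrative assumptions, not part of the slides:

    import numpy as np

    def u(z):
        # activation function; ReLU is one common choice (an assumption here)
        return np.maximum(z, 0.0)

    def forward(x, layers, beta):
        # layers: list of (a_k, B_k) with bias vector a_k and weight matrix B_k
        # whose columns are the b_{jk} from the formula
        h = x
        for a_k, B_k in layers:
            h = u(a_k + B_k.T @ h)   # h_k(x) = [u(a_{jk} + h_{k-1}(x)' b_{jk})]_j
        return h @ beta              # y = h_k(x)' beta

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)                                     # d = 3 inputs
    layers = [(rng.normal(size=4), rng.normal(size=(3, 4))),   # n_1 = 4 nodes
              (rng.normal(size=2), rng.normal(size=(4, 2)))]   # n_2 = 2 nodes
    beta = rng.normal(size=2)
    print(forward(x, layers, beta))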

Activation functions

Partitioning the 'input space'

\[(x_2-x_1)^+ - (x_2-0.5)^+\]
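
A small numerical check of the partition this expression induces, assuming \(x=(x_1,x_2)\) and \((z)^+=\max(z,0)\): the gradient is constant inside each region cut out by the kinks \(x_2=x_1\) and \(x_2=0.5\), so the gradient identifies the region.

    import numpy as np

    def g(x1, x2):
        # (x_2 - x_1)^+ - (x_2 - 0.5)^+
        return np.maximum(x2 - x1, 0.0) - np.maximum(x2 - 0.5, 0.0)

    # one test point per region; the finite-difference gradient is constant
    # within a region, so it identifies which linear piece is active
    eps = 1e-6
    for x1, x2 in [(0.8, 0.2), (0.2, 0.4), (0.2, 0.9)]:
        grad = ((g(x1 + eps, x2) - g(x1, x2)) / eps,
                (g(x1, x2 + eps) - g(x1, x2)) / eps)
        print((x1, x2), tuple(round(v, 3) for v in grad))
    # prints (0, 0), (-1, 1) and (-1, 0): three different linear pieces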

Redundancy in the network formula

  • Every node is scale-invariant
  • Every layer is permutation-invariant (see the numerical check below)

\[ h_1{(x_t)} = \begin{bmatrix} u\left(a_{11} + x_t' b_{11}\right) \\ \vdots \\ u\left(a_{n_1 1} + x_t' b_{n_1 1}\right) \end{bmatrix} \]
\[ h_k{(x_t)} = \begin{bmatrix} u\left(a_{1k} + h_{k-1}{(x_t)}' b_{1k}\right) \\ \vdots \\ u\left(a_{n_k k} + h_{k-1}{(x_t)}' b_{n_k k}\right) \end{bmatrix} \]
\[ y_t = h_k{(x_t)}'\beta \]
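
A hedged numerical check of both redundancies, assuming a ReLU activation (which is positively homogeneous): multiplying a node's incoming weights by \(c>0\) while dividing its outgoing weight by \(c\), or permuting a layer's nodes together with their weights, leaves the output unchanged.

    import numpy as np
    rng = np.random.default_rng(1)

    relu = lambda z: np.maximum(z, 0.0)
    x = rng.normal(size=3)
    a, B = rng.normal(size=4), rng.normal(size=(3, 4))   # one hidden layer, n_1 = 4
    beta = rng.normal(size=4)
    y = relu(a + B.T @ x) @ beta

    # scale invariance: scale node 0's incoming weights, divide its outgoing weight
    c = 3.7
    a2, B2, beta2 = a.copy(), B.copy(), beta.copy()
    a2[0] *= c; B2[:, 0] *= c; beta2[0] /= c
    print(np.isclose(y, relu(a2 + B2.T @ x) @ beta2))            # True

    # permutation invariance: reorder the nodes together with their weights
    p = rng.permutation(4)
    print(np.isclose(y, relu(a[p] + B[:, p].T @ x) @ beta[p]))   # True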

TERSEnet: devoid of superfluity

  • Fix the partitioning
  • Ensure every region has an identifiable regression

TEsselated Regression Simplices Encoding network

TERSEnet is like a linear model:

\[ h_{1,i+dj}{(x_t)} = \left(j - n\,{(x_{t,i})}^+\right)^+ \]
\[ h_{2,k}{(x_t)} = \left(1-\sum_{i=1}^d h_{1,i+dj_i}{(x_t)} \right)^+, \quad j_i = w(k,i) \]
\[ \hat y_t = h_{2}{(x_t)}'\beta \]

Fit \(\beta\) by (regularized) least squares: OLS or LARS
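
Since the tessellation layers are fixed, only \(\beta\) is estimated and the problem is linear least squares in the basis \(H = h_2(X)\). A minimal sketch with a stand-in basis (the basis construction below is a placeholder, not the TERSEnet encoding; the LARS step uses scikit-learn's Lars):

    import numpy as np
    from sklearn.linear_model import Lars

    def fit_ols(H, y):
        # OLS: beta minimizes ||y - H beta||^2
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)
        return beta

    def fit_lars(H, y, n_nonzero):
        # LARS yields a sparse (regularized) beta on the same fixed design
        return Lars(n_nonzero_coefs=n_nonzero).fit(H, y).coef_

    # stand-in basis: hat-shaped features, NOT the actual TERSEnet encoding
    rng = np.random.default_rng(2)
    X = rng.uniform(size=(200, 2))
    H = np.maximum(0.0, 1.0 - np.abs(X @ rng.normal(size=(2, 10)) - 1.0))
    y = H @ rng.normal(size=10) + 0.1 * rng.normal(size=200)
    print(fit_ols(H, y).round(2))
    print(fit_lars(H, y, 5).round(2))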

Performance (theoretical)

  • TERSEnet is parameter-efficient for \(C^2\) functions: for a network \(g\) with \(W\) parameters and input dimension \(d\), \[\|f-g\|_\infty=O(W^{-2/d})\] (worked instance below)

  • TERSEnet's parameters have a unique global optimum that is easy to find: with the tessellation fixed, fitting \(\beta\) is convex least squares
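
As a worked instance of the bound (assuming \(W\) counts parameters and \(d\) the input dimension):

\[ d=2:\ \|f-g\|_\infty=O(W^{-1}), \qquad d=8:\ \|f-g\|_\infty=O(W^{-1/4}), \]

so quadrupling \(W\) shrinks the worst-case bound by a factor of about \(4\) when \(d=2\), but only \(\sqrt{2}\) when \(d=8\): the rate degrades with dimension.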

Performance (practical)

TERSEnet is

  • interpretable
  • robust
  • fast

(Currently) only for vanilla deep regression

Insights have wider applications