Neuron

A function that takes multiple inputs, combines them linearly, and passes the result through a (possibly non-linear) function to produce a single output.

\[ y = f\left( w_1\times x_1 + w_2\times x_2 + \dots + w_n \times x_n + w_0\right) \]

\[ y = f\left(\sum_{i=1}^n w_i x_i + w_0 \right) = f\left(\textbf{w}^T\textbf{x} + w_0\right) \]
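A minimal numerical sketch of this neuron equation, assuming NumPy and arbitrary example values for the weights, bias, and inputs (tanh is used here only as a placeholder for \(f\); activation functions are discussed next):

```python
import numpy as np

# Sketch of a single neuron: y = f(w^T x + w_0).
# Weights, bias, and inputs below are illustrative values, not from the text.
def neuron(x, w, w0, f):
    """Weighted sum of inputs plus bias, passed through activation f."""
    z = np.dot(w, x) + w0          # pre-activation, w^T x + w_0
    return f(z)                    # activation applied to z

x = np.array([0.5, -1.2, 3.0])     # n = 3 inputs (hypothetical values)
w = np.array([0.1, 0.4, -0.2])     # weights w_1 .. w_n
w0 = 0.05                          # bias w_0

y = neuron(x, w, w0, f=np.tanh)    # tanh chosen here as an example activation
print(y)
```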

Activation functions

The purpose of an activation function is to introduce non-linearity into the model.

Let \(z = \textbf{w}^T\textbf{x}+w_0\),

\(z\) is called the pre-activation (a scalar for a single neuron, a vector for a layer of neurons), where \(\textbf{w} = \{w_1,\dots,w_n\}\) and \(\textbf{x} = \{x_1,\dots,x_n\}\)

\[ y = f\left(z\right) \]

  • It has a linear component \(z\) composed with a non-linear function \(f\)

  • Some common activation functions (a code sketch follows this list):

    • Rectified Linear Unit (ReLU), \(y = \max(0,z)\)
    • Sigmoid function, \(y = \frac{1}{1+\exp(-z)}\)
    • Hyperbolic tangent function, \(y = \tanh(z)\)
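
A minimal sketch of these three activations in NumPy; the function names and test values are illustrative choices, not from the original text:

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Hyperbolic tangent, delegating to NumPy's implementation."""
    return np.tanh(z)

z = np.linspace(-3.0, 3.0, 7)      # a few pre-activation values
print(relu(z), sigmoid(z), tanh(z), sep="\n")
```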

Neural Network

  • Engineering problems often require functions that map a set of inputs to multiple outputs

  • Example: 2D incompressible flow over a flat plate, \(f_{NN}: (x,y) \to (u,v,p)\)

  • For each blue neuron in the hidden layer of the schematic, \(z_j = \textbf{w}^T_j\textbf{x} + w_{0,j},\;\; j=1,2,3,4; \; \; \textbf{x}=\{x,y\}^T\)
  • In matrix form, this becomes \[ \begin{bmatrix}z_1 \\ z_2 \\ z_3 \\ z_4\end{bmatrix} = \begin{bmatrix} \textbf{w}_1^T \\ \textbf{w}_2^T \\ \textbf{w}_3^T \\ \textbf{w}_4^T \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} w_{0,1} \\ w_{0,2} \\ w_{0,3} \\ w_{0,4} \end{bmatrix} \] \[ \textbf{z} = \textbf{W}\textbf{x}+\textbf{b} \]

  • \(\textbf{z},\textbf{b}\) are \((4\times 1)\) vectors, \(\textbf{x}=\{x,y\}^T\) is the \((2\times 1)\) input vector, and \(\textbf{W}\) is the \((4\times 2)\) weight matrix (a shape check is sketched below)
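
The sketch below checks the shapes in this matrix form, assuming NumPy and randomly chosen placeholder weights and biases for the 2-input, 4-neuron hidden layer:

```python
import numpy as np

# Hidden-layer pre-activation z = W x + b for a 2-input, 4-neuron layer.
# The numerical values are arbitrary placeholders.
rng = np.random.default_rng(0)

W = rng.standard_normal((4, 2))    # weight matrix, one row w_j^T per neuron
b = rng.standard_normal(4)         # biases w_{0,j}, stacked into a vector
x = np.array([0.3, 0.7])           # input (x, y)

z = W @ x + b                      # (4x2)(2x1) + (4x1) -> (4x1) pre-activation
print(z.shape)                     # (4,)
```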

Universal Approximation Theorem (Cybenko 1989)

Theorem 1 Let \(I_n\) denote the n-dimensional unit cube \([0,1]^n\) and \(C(I_n)\) be the space of continuous functions defined on \(I_n\). Let \(x\in\mathbb{R}^n\) and \(\sigma\) be any continuous discriminatory function. Then the finite sums of the form

\[ G(x) = \sum_{j=1}^N \alpha_j \sigma(y_j^T x + \theta_j) \]

are dense in \(C(I_n)\). In other words, given any \(f\in C(I_n)\) and \(\epsilon>0\), there is a sum \(G(x)\) of the above form for which \[ |G(x) - f(x)| < \epsilon \quad \forall \; x\in I_n \]

where \(y_j \in \mathbb{R}^n\) and \(\alpha_j, \theta_j \in \mathbb{R}\).
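
The theorem guarantees existence, not a construction. As a rough numerical illustration (not a proof), the sketch below fixes random \(y_j, \theta_j\), fits only the \(\alpha_j\) by least squares, and measures how closely \(G(x)\) tracks an arbitrarily chosen target \(f(x)=\sin(2\pi x)\) on \(I_1=[0,1]\); the number of terms, the target, and the grid are all assumptions made for illustration:

```python
import numpy as np

# Approximate f(x) = sin(2*pi*x) on [0, 1] by
# G(x) = sum_j alpha_j * sigma(y_j * x + theta_j), with sigma the sigmoid.
rng = np.random.default_rng(1)
N = 50                                     # number of terms in the sum

y = rng.uniform(-20.0, 20.0, size=N)       # inner weights y_j (fixed at random)
theta = rng.uniform(-20.0, 20.0, size=N)   # offsets theta_j (fixed at random)

x = np.linspace(0.0, 1.0, 200)             # grid on I_1 = [0, 1]
f = np.sin(2.0 * np.pi * x)                # target function in C(I_1)

sigma = 1.0 / (1.0 + np.exp(-(np.outer(x, y) + theta)))   # sigma(y_j x + theta_j)
alpha, *_ = np.linalg.lstsq(sigma, f, rcond=None)         # fit the alpha_j
G = sigma @ alpha                          # G(x) evaluated on the grid

print(np.max(np.abs(G - f)))               # sup-norm error on the grid
```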

Forward propagation

Let's take the same network, with one hidden layer of 4 neurons and an output layer of 3 neurons

  • Forward propagation - the process of passing input data through the network’s layers to compute the final output
  • Hidden layer \[ \begin{align} \textbf{z}_h &= \textbf{W}_{hi}\textbf{x}+\textbf{b}_h \\ \textbf{a}_h &= f_h\left(\textbf{z}_h\right) \end{align} \]

  • Output layer \[ \begin{align} \textbf{z}_o &= \textbf{W}_{oh}\textbf{a}_h+\textbf{b}_o \\ \textbf{a}_o &= f_o\left(\textbf{z}_o\right) \end{align} \]

    \(\textbf{a}_o = \{u,v,p\}\) is the estimated vector of outputs from the network (see the forward-pass sketch below)
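
Putting the two layers together, a minimal forward-pass sketch for this 2-4-3 network, assuming NumPy, random placeholder weights and biases, tanh for \(f_h\), and the identity for \(f_o\):

```python
import numpy as np

# Forward propagation through a 2-input, 4-hidden-neuron, 3-output network.
# Weights, biases, and activation choices are illustrative assumptions.
rng = np.random.default_rng(2)

W_hi = rng.standard_normal((4, 2))   # hidden-layer weights (4 neurons, 2 inputs)
b_h  = rng.standard_normal(4)        # hidden-layer biases
W_oh = rng.standard_normal((3, 4))   # output-layer weights (3 outputs, 4 hidden)
b_o  = rng.standard_normal(3)        # output-layer biases

def forward(x):
    """Forward pass: x -> hidden activations -> output a_o = (u, v, p)."""
    z_h = W_hi @ x + b_h             # hidden pre-activation
    a_h = np.tanh(z_h)               # hidden activation f_h (tanh assumed)
    z_o = W_oh @ a_h + b_o           # output pre-activation
    a_o = z_o                        # identity output activation f_o (assumed)
    return a_o

u, v, p = forward(np.array([0.3, 0.7]))   # estimate (u, v, p) at a point (x, y)
print(u, v, p)
```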