
Introduction to Deep Learning

Discover the fundamentals of neural networks, the building blocks of modern AI, from the humble perceptron to complex architectures.

Deep Learning · Beginner · 45 min

Topic 1: What is Deep Learning?

Deep Learning is a subfield of Machine Learning based on Artificial Neural Networks (ANNs), which are inspired by the structure of the human brain. While traditional ML models (like SVM or Random Forest) can be powerful, Deep Learning excels at finding complex patterns in massive datasets, especially unstructured data like images, audio, and text.

The "deep" in Deep Learning refers to having multiple layers in the network. A shallow network might have one "hidden" layer, while a deep network can have hundreds.

Key Components of a Neural Network

  • Neurons (or Perceptrons): The basic unit of a network. A neuron receives one or more inputs, multiplies each by a learned "weight," sums the results (plus a "bias"), and passes that sum through an "activation function" (see the sketch after this list).
  • Layers: Neurons are organized into layers.
    1. Input Layer: Receives the raw data (e.g., the 4 features of the Iris dataset).
    2. Hidden Layer(s): The "deep" part. These layers sit between the input and output. They are responsible for finding patterns and learning internal representations of the data.
    3. Output Layer: Produces the final result (e.g., the probability for each of the 3 Iris species).
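
To make this concrete, here is a minimal NumPy sketch of the computation a single neuron performs. The input values, weights, and bias are invented for illustration; in a real network the weights and bias would be learned:

import numpy as np

inputs = np.array([5.1, 3.5, 1.4, 0.2])    # e.g., the 4 Iris features
weights = np.array([0.2, -0.5, 0.1, 0.4])  # learned during training
bias = 0.1                                 # also learned

z = np.dot(inputs, weights) + bias         # weighted sum of the inputs
output = max(0.0, z)                       # ReLU activation: f(x) = max(0, x)
print(output)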

[Figure: Simple Neural Network Diagram]

Topic 2: The "Learning" Process

How does a network "learn" to map inputs (features) to outputs (predictions)? It's a four-step process repeated many times; each full pass through the training data is called an "epoch."

  1. Forward Propagation (Make a Guess): The input data is fed into the network. Each neuron performs its calculation, and the data "flows" forward through the layers until it reaches the output layer, which makes a prediction. Initially, this prediction is just a random guess.
  2. Loss Function (Measure the Error): The network's guess is compared to the actual, true label using a Loss Function (e.g., 'Mean Squared Error' or 'Categorical Cross-Entropy'). This function outputs a single number representing how "wrong" the guess was.
  3. Backpropagation (Find Who to Blame): This is the "magic" of deep learning. The error is sent *backward* through the network. Calculus (specifically, the chain rule) is used to calculate how much each individual neuron's **weight** contributed to the total error.
  4. Optimization (Update the Weights): An Optimizer (like 'Adam' or 'Gradient Descent') takes the information from backpropagation and "nudges" all the weights in the network in the correct direction to *reduce* the error.

By repeating this process thousands of times, the network's weights are gradually tuned until its predictions become highly accurate.
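
The sketch below shows this loop in miniature: a hypothetical one-weight model trained by plain gradient descent on a single data point, so all four steps are visible. The data, starting weight, and learning rate are invented for illustration:

x, y_true = 2.0, 10.0   # one training example: input and true label
w = 0.5                 # start from an arbitrary weight
lr = 0.05               # learning rate: how big each "nudge" is

for epoch in range(100):
    y_pred = w * x                    # 1. forward propagation: make a guess
    loss = (y_pred - y_true) ** 2     # 2. loss function: squared error
    grad = 2 * (y_pred - y_true) * x  # 3. backpropagation: dLoss/dw via the chain rule
    w -= lr * grad                    # 4. optimization: nudge w to reduce the error

print(w)  # converges toward 5.0, since 5.0 * 2.0 = 10.0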


Topic 3: Key Concepts in Keras

In our code practice, we will use Keras (TensorFlow's high-level API, imported as `tensorflow.keras`), a user-friendly library for building networks. You'll encounter these key terms:

1. The `Sequential` Model

This is the simplest way to build a model in Keras. It's a plain stack of layers, where you add them one by one.

from tensorflow.keras.models import Sequential
model = Sequential()

2. `Dense` Layers

A "Dense" layer is the most basic layer type. It means that every neuron in that layer is connected to *every* neuron in the *previous* layer.

from tensorflow.keras.layers import Dense
# Adds a hidden layer with 10 neurons
model.add(Dense(10, activation='relu'))
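
For example, if the previous layer has 4 neurons, this `Dense(10)` layer holds 4 × 10 = 40 weights plus 10 biases, for a total of 50 trainable parameters.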

3. Activation Functions

Activation functions introduce non-linearity, allowing the network to learn complex patterns. Without them, a neural network would just be a simple linear model (like Linear Regression).

  • `relu` (Rectified Linear Unit): The most popular choice for hidden layers. It's very simple: `f(x) = max(0, x)`.
  • `softmax` (Softmax): Used for the *output layer* in a multi-class classification problem. It converts the model's raw outputs into probabilities that sum to 1 (e.g., `[0.1, 0.8, 0.1]`, meaning an 80% chance the input belongs to the second class).
  • `sigmoid` (Sigmoid): Used for the *output layer* in a binary (0 or 1) classification problem.
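
These three functions are simple enough to compute by hand; the short NumPy sketch below applies each to an arbitrary example vector:

import numpy as np

z = np.array([2.0, -1.0, 0.5])           # raw outputs ("logits") from a layer

relu = np.maximum(0, z)                  # negatives clipped to 0
sigmoid = 1 / (1 + np.exp(-z))           # each value squashed into (0, 1)
softmax = np.exp(z) / np.sum(np.exp(z))  # the vector converted to probabilities

print(softmax.sum())  # 1.0 (softmax outputs always sum to 1)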

4. `model.compile()`

This step configures the model for training. It's where you define the learning process:

  • `optimizer`: The algorithm to use for updating weights (e.g., `'adam'` is a great default).
  • `loss`: The loss function to measure error (e.g., `'categorical_crossentropy'` for multi-class problems).
  • `metrics`: What to report during training (e.g., `['accuracy']`).
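
Putting these pieces together, here is a minimal sketch of a complete model for the Iris data (4 input features, 3 classes). The layer sizes are illustrative choices, not requirements:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

model = Sequential()
model.add(Input(shape=(4,)))               # 4 input features
model.add(Dense(10, activation='relu'))    # hidden layer
model.add(Dense(3, activation='softmax'))  # output: 3 class probabilities

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()  # prints the layers and their parameter counts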

Topic 4: Deep Learning vs. Traditional Machine Learning

The Main Difference: Feature Engineering

  • In Traditional ML (like KNN or SVM), you must perform manual feature engineering. You have to *tell* the model what's important. The model's success depends heavily on your domain knowledge and preprocessing steps such as feature scaling.
  • In Deep Learning, the network performs automatic feature representation learning. The hidden layers *learn* the most important features on their own. You can feed it raw pixels, and the first layer might learn edges, the next might learn shapes, and a deeper layer might learn to recognize faces.

Advantages of Deep Learning

  • State-of-the-art performance on many problems (especially unstructured data).
  • Learns features automatically from raw data.
  • Highly flexible architecture that can be adapted to many problem types (CNNs for images, RNNs/LSTMs for text).

Disadvantages of Deep Learning

  • Needs *a lot* of data to perform well.
  • Computationally expensive: Requires powerful GPUs for training.
  • "Black Box" problem: Can be very difficult to interpret *why* the model made a certain decision.