Loss function
A loss function measures how far the model's predictions are from the ground truth labels. Training adjusts the model's parameters to make this value as small as possible, which in turn improves the predictions.
In Keras, there are many different loss functions available. The most common loss function for binary classification is the binary crossentropy loss. This loss function is used when the model outputs a probability, as in the case of a single-unit layer with a sigmoid activation.
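To make the phrase "outputs a probability" concrete, here is a minimal sketch in plain Python (not Keras code) of the sigmoid function that such a single-unit output layer applies. The helper name `sigmoid` and the sample scores are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Map a real-valued score (logit) to a value between 0 and 1."""
    # Naive form: fine for moderate inputs, not hardened against overflow
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical raw scores from the last layer, before the activation
for z in (-4.0, 0.0, 4.0):
    print(f"logit {z:+.1f} -> probability {sigmoid(z):.3f}")
```

A score of 0 maps to a probability of exactly 0.5, and larger positive or negative scores move the output toward 1 or 0.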
For a single example, the binary crossentropy loss is defined as follows:
loss = -(y_true * tf.math.log(y_pred) + (1 - y_true) * tf.math.log(1 - y_pred))
where:
- `y_true` is the ground truth label (0 or 1)
- `y_pred` is the model's prediction, the predicted probability that the label is 1
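The formula above can be worked through by hand. The following is a sketch in plain Python rather than the Keras implementation: the function name `binary_crossentropy`, the clipping value `eps`, and the sample pairs are assumptions made for illustration. Clipping keeps `log` away from zero when a prediction is exactly 0 or 1.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-example binary crossentropy for a label in {0, 1}.

    eps is an assumed clipping value that keeps log() away from 0.
    """
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

# Hypothetical (label, predicted probability) pairs
examples = [(1, 0.9), (1, 0.1), (0, 0.1), (0, 0.9)]
losses = [binary_crossentropy(y, p) for y, p in examples]
for (y, p), loss in zip(examples, losses):
    print(f"y_true={y} y_pred={p:.1f} loss={loss:.4f}")

# A batch loss is typically reported as the mean of the per-example losses
mean_loss = sum(losses) / len(losses)
print(f"mean loss: {mean_loss:.4f}")
```

A confident correct prediction (0.9 for a positive label) costs about 0.105, while a confident wrong one (0.1 for a positive label) costs about 2.303, so the loss penalizes confident mistakes heavily.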
Optimizer
An optimizer is an algorithm that updates the model's parameters to minimize the loss function. There are many different optimizers available, each with its own strengths and weaknesses.
Adam is one of the most widely used optimizers for training deep learning models. It is a variant of stochastic gradient descent (SGD) that keeps running averages of each parameter's gradients and squared gradients and uses them to adapt the step size for each parameter. In practice it often converges faster than plain SGD and needs less learning rate tuning, although it is not the best choice for every problem.
In Keras, an Adam optimizer is created as follows:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
where:
- `learning_rate` is the learning rate, which controls the size of each parameter update (0.001 is the Keras default)
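To show what "adaptive learning rates for each parameter" means in practice, here is a rough sketch of the Adam update rule for a single parameter, written in plain Python. It is not the Keras implementation. The function name `adam_minimize`, the toy objective, and the hyperparameter values are assumptions chosen for illustration; in particular, the learning rate of 0.1 is much larger than the Keras default so that the toy problem converges in a few hundred steps.

```python
import math

def adam_minimize(grad_fn, x0, learning_rate=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a one-parameter function with the Adam update rule."""
    x = x0
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction offsets the zero initialization of m and v
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # The step is scaled by the recent magnitude of this parameter's gradient
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); the minimum is at x = 3
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(f"x after optimization: {x_min:.4f}")
```

Because the step is divided by the square root of the averaged squared gradient, the first update has a size close to `learning_rate` regardless of how large the raw gradient is. With many parameters, each one gets its own `m` and `v`, and therefore its own effective step size.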
Training the model
Once the loss function and optimizer have been chosen, the model can be trained. This is done by repeatedly feeding the model data and updating its parameters to minimize the loss.
Training is repeated until the model converges, meaning that the loss stops decreasing, or, as is common in practice, for a fixed number of passes over the training data (epochs).
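The loop described above can be sketched without any framework. The example below is a hedged illustration in plain Python, not what Keras runs internally: it fits a one-feature logistic model with ordinary gradient descent and stops once the loss stops decreasing. The toy data, the helper names `sigmoid` and `mean_bce`, and the values of `learning_rate`, `tolerance`, and `max_epochs` are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_bce(w, b, xs, ys, eps=1e-7):
    """Mean binary crossentropy of a one-feature logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(xs)

# Hypothetical toy data; the classes overlap, so the loss has a finite minimum
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

w, b = 0.0, 0.0
learning_rate = 0.5
tolerance = 1e-6     # stop once an epoch improves the loss by less than this
max_epochs = 10_000  # safety cap in case the stopping rule never triggers

history = [mean_bce(w, b, xs, ys)]
for epoch in range(max_epochs):
    # Gradient of the mean loss with respect to w and b
    errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
    grad_w = sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = sum(errors) / len(xs)
    # Update the parameters in the direction that lowers the loss
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    history.append(mean_bce(w, b, xs, ys))
    if history[-2] - history[-1] < tolerance:
        break

print(f"stopped after {len(history) - 1} epochs, loss {history[-1]:.4f}")
```

In Keras, `model.fit` handles the loop itself, and a comparable stopping rule is available through the `EarlyStopping` callback.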
Here is an example of how to use the binary crossentropy loss and Adam optimizer to train a model on the MNIST dataset.
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
# Normalize the data
x_train = x_train / 255.0
x_test = x_test / 255.0
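# MNIST has ten digit classes, so turn it into a binary task first:
# "is this digit a 5?" (the choice of 5 is arbitrary, for illustration)
y_train = (y_train == 5).astype('float32')
y_test = (y_test == 5).astype('float32')
# Create the model; the single sigmoid unit outputs the probability of the positive class
model = models.Sequential()
model.add(layers.Flatten(input_shape=(28, 28)))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
# Compile the model with the binary crossentropy loss and the Adam optimizer
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])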
# Train the model
model.fit(x_train, y_train, epochs=10)
# Evaluate the model
model.evaluate(x_test, y_test)
This code trains a model on the MNIST dataset for 10 epochs and then evaluates it on the test set. `model.fit` prints the loss and accuracy after each epoch, and `model.evaluate` prints and returns the loss and accuracy on the test set.