In this article, we will learn how to load a dataset into Keras and prepare it for training. We will use the `text_dataset_from_directory` utility to create a labeled `tf.data.Dataset`. We will also split the dataset into three splits: train, validation, and test.
Create a directory structure
First, we need to lay out our data on disk. `text_dataset_from_directory` expects a main directory containing one subdirectory per class. For example:
main_directory/
    class_a/
        a_text_1.txt
        a_text_2.txt
    class_b/
        b_text_1.txt
        b_text_2.txt
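If you want to experiment without a real corpus, the layout above can be generated with a short script. This is only a sketch: the directory and file names follow the example, and the file contents are placeholders.

```python
import pathlib

# Create main_directory/ with one subdirectory per class, matching the
# layout above, and write a placeholder file for each example.
samples = {
    "class_a": ["a_text_1.txt", "a_text_2.txt"],
    "class_b": ["b_text_1.txt", "b_text_2.txt"],
}
main_directory = pathlib.Path("main_directory")
for class_name, filenames in samples.items():
    class_dir = main_directory / class_name
    class_dir.mkdir(parents=True, exist_ok=True)
    for filename in filenames:
        (class_dir / filename).write_text(f"placeholder text for {class_name}\n")
```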
Load the data
Now, we can load the data into a `tf.data.Dataset` using the `text_dataset_from_directory` utility. The utility infers each example's label from the name of the subdirectory containing it. For example:
dataset = tf.keras.utils.text_dataset_from_directory(
    main_directory,
    batch_size=32,
    label_mode="binary",
    shuffle=True,
)
The `batch_size` argument specifies the number of examples in each batch. The `label_mode` argument specifies how labels are encoded; since this is a binary classification problem with two classes, we use "binary", which yields a single 0 or 1 label per example. The `shuffle` argument specifies whether to shuffle the data; when it is False, examples are yielded in alphanumeric file order.
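To see what the loader produces, we can build a tiny throwaway corpus and inspect one batch. The directory layout, file contents, and batch size here are chosen purely for illustration:

```python
import pathlib
import tempfile

import tensorflow as tf

# Build a minimal two-class corpus in a temporary directory.
root = pathlib.Path(tempfile.mkdtemp()) / "main_directory"
for class_name in ("class_a", "class_b"):
    class_dir = root / class_name
    class_dir.mkdir(parents=True)
    for i in range(4):
        (class_dir / f"text_{i}.txt").write_text(f"example {i} from {class_name}\n")

dataset = tf.keras.utils.text_dataset_from_directory(
    root,
    batch_size=2,
    label_mode="binary",
    shuffle=True,
)

# Each element is a (texts, labels) pair: a batch of raw strings and
# a column of 0.0/1.0 labels, one per example.
for texts, labels in dataset.take(1):
    print(texts.shape)   # (2,)
    print(labels.shape)  # (2, 1)
```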
Split the dataset
Finally, we need to split the dataset into three splits: train, validation, and test. We will use the `tf.keras.utils.split_dataset` utility to do this, applying it twice: first to hold out 20% of the data for testing, then to carve 25% of the remainder (20% of the total) out as validation data. For example:

train_dataset, test_dataset = tf.keras.utils.split_dataset(
    dataset,
    right_size=0.2,
)
train_dataset, validation_dataset = tf.keras.utils.split_dataset(
    train_dataset,
    right_size=0.25,
)
The `train_dataset` will contain 60% of the data, the `validation_dataset` will contain 20% of the data, and the `test_dataset` will contain 20% of the data.
Now that we have loaded the data and split it into three splits, we can start training our model. In the next article, we will learn how to create a simple model and train it on the train dataset.