ML Basics with Keras in TensorFlow: Regression: Split the data into training and test sets

What is a training set and a test set?

A training set is a subset of the data that is used to train the model. The model learns to predict the output values for the input values in the training set.

A test set is a subset of the data that is used to evaluate the model's performance. The model is not trained on the test set, so it is a fair way to evaluate the model's ability to generalize to new data.
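To see why the held-out test set matters, here is a minimal sketch in plain Python. The toy data (`y = 2x + 1`), the `predict` and `mean_abs_error` helpers, and the "model" that simply memorizes its training pairs are all assumptions made for illustration; none of them come from a library.

```python
import random

# Toy data, assumed for illustration: y = 2x + 1 for x = 0..19
xs = list(range(20))
ys = [2 * x + 1 for x in xs]

# Shuffle the row indices with a fixed seed, then hold out the last 20%
indices = list(range(len(xs)))
random.Random(0).shuffle(indices)
n_train = int(0.8 * len(indices))
train_idx, test_idx = indices[:n_train], indices[n_train:]

# A "model" that only memorizes the training pairs
memory = {xs[i]: ys[i] for i in train_idx}

def predict(x):
    """Return the stored y of the training x closest to the input."""
    nearest = min(memory, key=lambda seen: abs(seen - x))
    return memory[nearest]

def mean_abs_error(idx):
    """Average absolute prediction error over the given row indices."""
    return sum(abs(predict(xs[i]) - ys[i]) for i in idx) / len(idx)

train_mae = mean_abs_error(train_idx)  # zero: every training x is in memory
test_mae = mean_abs_error(test_idx)    # positive: these x values were never seen

print(f"train MAE: {train_mae}, test MAE: {test_mae}")
```

The training error is zero only because the model has already seen those rows, so it says nothing about new inputs. The test error is the number that reflects how well the model generalizes.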

How do we split the data into training and test sets?

To split the data into training and test sets, we can use the `train_test_split()` function from the `sklearn.model_selection` module of scikit-learn.

In its basic form, we pass the `train_test_split()` function three arguments:

  • The data
  • The target variable
  • The test size

The data is the input data (the features) for the model. The target variable is the output data that the model is trying to predict. The test size, passed as `test_size`, is the fraction of the data that will be used for the test set, given as a number between 0 and 1 (for example, 0.2 for 20%).
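As a quick illustration of the `test_size` argument, the sketch below uses a made-up eight-row dataset (the values in `X` and `y` are assumptions for the example). A float is read as a fraction of the rows, while an integer is read as an absolute number of rows. `random_state` is an optional extra argument that fixes the shuffle so the split is repeatable.

```python
from sklearn.model_selection import train_test_split

# Made-up data: 8 rows with one feature each, plus 8 target values
X = [[0], [1], [2], [3], [4], [5], [6], [7]]
y = [0, 2, 4, 6, 8, 10, 12, 14]

# A float test_size is a fraction of the rows: 0.25 * 8 = 2 test rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
print(len(X_train), len(X_test))  # 6 2

# An integer test_size is an absolute number of test rows
X_train_n, X_test_n, y_train_n, y_test_n = train_test_split(
    X, y, test_size=3, random_state=0
)
print(len(X_train_n), len(X_test_n))  # 5 3
```

Each feature row stays paired with its own target value after the split, because the data and the target are shuffled together.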

For example, to split the data into a training set and a test set with a test size of 20%, we would use the following code:

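import pandas as pd
from sklearn.model_selection import train_test_split

# Get the data (a CSV file with a 'target' column)
data = pd.read_csv('data.csv')

# Separate the input features from the target variable,
# so the target is not also passed to the model as an input
features = data.drop(columns=['target'])
target = data['target']

# Split the data into training and test sets (20% held out for testing);
# random_state is optional and makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)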

This code will shuffle the rows at random and split them into two sets:

  • The training set will contain 80% of the data.
  • The test set will contain 20% of the data.

The training set will be used to train the model. The test set will be used to evaluate the model's performance.
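To make the 80/20 split concrete, the following sketch does the same kind of shuffle-and-slice by hand in plain Python. The `simple_train_test_split` helper is hypothetical and written only for this example; it is not how scikit-learn implements `train_test_split()`, just a simplified picture of the idea.

```python
import random

def simple_train_test_split(rows, test_size=0.2, seed=0):
    """Hypothetical helper: shuffle the rows and hold out a fraction for testing."""
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))  # stand-in for 100 data rows
train_rows, test_rows = simple_train_test_split(rows, test_size=0.2)

print(len(train_rows), len(test_rows))        # 80 20
print(set(train_rows).isdisjoint(test_rows))  # True: no row is in both sets
```

Checking the sizes and confirming that no row appears in both sets is a cheap sanity check worth running on a real split as well.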
