Text classification with TensorFlow Hub: Movie reviews

In this article, we will learn how to use Keras and TensorFlow Hub to classify movie reviews as positive or negative. This is an example of binary—or two-class—classification, an important and widely applicable kind of machine learning problem.

Getting started

First, we need to install the necessary libraries.

pip install tensorflow
pip install tensorflow-hub
pip install tensorflow-datasets
pip install nltk

Next, we need to load the IMDB dataset. This dataset contains the text of 50,000 movie reviews from the Internet Movie Database. They are split into 25,000 reviews for training and 25,000 reviews for testing. The training and testing sets are balanced, meaning they contain an equal number of positive and negative reviews.

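import tensorflow as tf
import tensorflow_datasets as tfds

# Load the training split; each example is a dict with "text" and "label" entries
dataset = tfds.load("imdb_reviews", split="train")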

Preprocessing the data

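The raw review text must be converted to numbers before we can train a model on it. This involves cleaning the text, removing stop words, and vectorizing the words, that is, mapping each word to an integer ID so that every review becomes a fixed-length sequence of IDs.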

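import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")
# One pattern that matches any English stop word as a whole word
stop_words = r"\b(" + "|".join(map(re.escape, stopwords.words("english"))) + r")\b"

def clean_text(text):
  # Clean the text: lowercase it and replace non-letters with spaces
  text = tf.strings.lower(text)
  text = tf.strings.regex_replace(text, "[^a-z]", " ")
  # Remove stop words
  return tf.strings.regex_replace(text, stop_words, " ")

# Vectorize the words: build the vocabulary once, from the whole training set.
# max_tokens and output_sequence_length are assumed values; tune them as needed.
vectorizer = tf.keras.layers.TextVectorization(
    standardize=clean_text, max_tokens=10000, output_sequence_length=200)
vectorizer.adapt(dataset.map(lambda example: example["text"]).batch(512))

def preprocess_text(example):
  # Turn a batch of examples into (word IDs, labels) pairs
  return vectorizer(example["text"]), example["label"]

dataset = dataset.shuffle(10000).batch(32).map(preprocess_text)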
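To make the three preprocessing steps concrete, here is a minimal pure-Python sketch on two made-up reviews. The tiny stop-word set, the sample sentences, and the helper names (clean, build_vocab, vectorize) are assumptions for illustration only and do not come from any library. The point to notice is that the vocabulary is built once over all training texts, so a given word maps to the same ID in every review.

```python
import re

# Assumed, deliberately tiny stop-word list for illustration only
STOP_WORDS = {"the", "a", "was", "and", "it", "this"}

def clean(text):
    """Lowercase, replace non-letters with spaces, drop stop words."""
    text = re.sub(r"[^a-z]", " ", text.lower())
    return [word for word in text.split() if word not in STOP_WORDS]

def build_vocab(texts):
    """Assign each distinct word an ID in order of first appearance.
    ID 0 is reserved for padding and ID 1 for unknown words."""
    vocab = {}
    for text in texts:
        for word in clean(text):
            vocab.setdefault(word, len(vocab) + 2)
    return vocab

def vectorize(text, vocab, length):
    """Map words to IDs, then truncate or zero-pad to a fixed length."""
    ids = [vocab.get(word, 1) for word in clean(text)][:length]
    return ids + [0] * (length - len(ids))

reviews = ["The movie was great!", "A boring, boring movie."]
vocab = build_vocab(reviews)
print(vocab)  # {'movie': 2, 'great': 3, 'boring': 4}
print(vectorize("This movie was great fun", vocab, 5))  # [2, 3, 1, 0, 0]
```

The word "fun" never appeared in the training reviews, so it maps to the unknown-word ID 1, and the sequence is padded with zeros up to the requested length.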

Training the model

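Now that the data is preprocessed, we can train a model on it. We will use a simple neural network: an embedding layer that turns word IDs into vectors, an LSTM layer that reads the sequence, and a single sigmoid output unit that gives the probability that a review is positive.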

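model = tf.keras.Sequential([
  # input_dim must be at least the largest word ID + 1;
  # 10000 is an assumed cap on the vocabulary size
  tf.keras.layers.Embedding(input_dim=10000, output_dim=128),
  tf.keras.layers.LSTM(64),
  # One sigmoid unit: the predicted probability of a positive review
  tf.keras.layers.Dense(1, activation="sigmoid")
])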

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
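For intuition about the loss named in the compile call, binary cross-entropy penalizes a predicted probability according to how far it is from the true label. The following is a minimal pure-Python sketch of the formula, not Keras's implementation; the function name, the clipping constant, and the sample numbers are assumptions for illustration.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1]
confident_right = binary_crossentropy(labels, [0.9, 0.1, 0.8])
confident_wrong = binary_crossentropy(labels, [0.1, 0.9, 0.2])
print(confident_right < confident_wrong)  # True
```

Predictions that are confident and correct give a small loss, while predictions that are confident and wrong give a large one, which is what pushes the sigmoid output toward the right label during training.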

model.fit(dataset, epochs=10)

Evaluating the model

We can evaluate the model on the test set to see how well it performs.

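# Preprocess the test split the same way as the training data
test_dataset = tfds.load("imdb_reviews", split="test").batch(32).map(preprocess_text)
test_loss, test_accuracy = model.evaluate(test_dataset)

print("Test loss:", test_loss)
print("Test accuracy:", test_accuracy)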
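To see what the accuracy figure measures, here is a small pure-Python sketch with made-up labels and sigmoid outputs. It assumes a decision threshold of 0.5, which to my knowledge is the default Keras applies for binary accuracy; the function name and the numbers are illustrative, not the library's code.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction equals the label."""
    predictions = [1 if p > threshold else 0 for p in y_prob]
    correct = sum(int(pred == y) for pred, y in zip(predictions, y_true))
    return correct / len(y_true)

# Made-up labels and sigmoid outputs for four reviews
labels = [1, 0, 1, 0]
probabilities = [0.92, 0.30, 0.45, 0.10]
print(binary_accuracy(labels, probabilities))  # 0.75
```

The third review is positive but scores 0.45, just under the threshold, so it counts as a miss. Because the test set is balanced, a model that guesses at random would score about 0.5 on this metric.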