Learn NLP, including text preprocessing, embeddings, sentiment analysis, and sequence-to-sequence models.
Learn how to prepare raw text data for NLP tasks.
Content by: Nirav Khanpara
AI/ML Engineer
Text preprocessing cleans and standardizes raw text (lowercasing, tokenizing, and removing punctuation and stopwords), which typically improves model accuracy downstream.
import nltk
import string
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download the tokenizer models and stopword list (needed once)
nltk.download('punkt')
nltk.download('stopwords')

text = "Natural Language Processing is fun and powerful!"

# Lowercase and split into tokens
tokens = word_tokenize(text.lower())

# Drop punctuation tokens
tokens = [t for t in tokens if t not in string.punctuation]

# Drop common English stopwords; a set makes membership tests fast
stop_words = set(stopwords.words('english'))
tokens = [t for t in tokens if t not in stop_words]

print(tokens)  # ['natural', 'language', 'processing', 'fun', 'powerful']
Understand word embeddings and their role in NLP.
Word embeddings map words to dense numeric vectors in which semantically related words end up close together.
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens
sentences = [["natural", "language", "processing"], ["machine", "learning", "is", "fun"]]

# Train a Word2Vec model (CBOW by default): 50-dimensional vectors,
# context window of 5, keep every word (min_count=1), 4 worker threads
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, workers=4)

# The learned 50-dimensional vector for the word "natural"
print(model.wv['natural'])
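Once trained, the model supports standard gensim similarity queries. On a toy corpus this small the scores are essentially noise, so treat the calls below as an API illustration only.

# Cosine similarity between two in-vocabulary words
print(model.wv.similarity('natural', 'language'))

# The three words closest to "natural" in the embedding space
print(model.wv.most_similar('natural', topn=3))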
Learn to classify text sentiment using deep learning.
Sentiment analysis is the task of determining whether a piece of text expresses positive, negative, or neutral emotion.
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

# Tiny labeled dataset
sentences = ["I love NLP", "I hate spam emails"]
labels = np.array([1, 0])  # 1: positive, 0: negative

# Map words to integer IDs, keeping at most 1000 distinct words
tokenizer = Tokenizer(num_words=1000)
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)

# Pad every sequence to length 5 so the batch is rectangular
padded = pad_sequences(sequences, maxlen=5)

model = keras.Sequential([
    keras.layers.Embedding(1000, 16),            # word IDs -> 16-dim vectors
    keras.layers.GlobalAveragePooling1D(),       # average the word vectors
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')  # probability of positive
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(padded, labels, epochs=10, verbose=1)
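To score new text, apply the same tokenizer and padding used for training. The sentence below is made up for illustration; with only two training examples the prediction is not meaningful, but the pipeline is the one you would use on real data.

# Preprocess a new sentence exactly like the training data
test_seq = tokenizer.texts_to_sequences(["I love clean emails"])
test_padded = pad_sequences(test_seq, maxlen=5)

# Output near 1.0 suggests positive sentiment, near 0.0 negative
print(model.predict(test_padded))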
Explore sequence-to-sequence (Seq2Seq) models for translation and text generation.
A Seq2Seq model pairs an encoder, which compresses the input sequence into state vectors, with a decoder that generates the output sequence one token at a time starting from that state.
from tensorflow import keras

# Encoder: embed the source tokens and run an LSTM, keeping only its final states
encoder_inputs = keras.layers.Input(shape=(None,))
x = keras.layers.Embedding(1000, 64)(encoder_inputs)
encoder_outputs, state_h, state_c = keras.layers.LSTM(64, return_state=True)(x)
encoder_states = [state_h, state_c]  # the "summary" of the input sequence

# Decoder: embed the target tokens and run an LSTM initialized with the encoder states
decoder_inputs = keras.layers.Input(shape=(None,))
x = keras.layers.Embedding(1000, 64)(decoder_inputs)
decoder_lstm = keras.layers.LSTM(64, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(x, initial_state=encoder_states)

# Project each decoder step onto the 1000-word vocabulary
decoder_dense = keras.layers.Dense(1000, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Training model: maps (source sequence, shifted target sequence) to next-token probabilities
model = keras.models.Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')
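As a quick smoke test, the training model can be fit on synthetic data using teacher forcing, where the target at each step is the decoder input shifted one position. The random token IDs below are purely illustrative, and the loss is switched to sparse_categorical_crossentropy so targets can stay as integer IDs rather than one-hot vectors.

import numpy as np

# Synthetic batches of random token IDs (illustrative only)
num_samples, seq_len = 8, 10
encoder_in = np.random.randint(1, 1000, size=(num_samples, seq_len))
decoder_in = np.random.randint(1, 1000, size=(num_samples, seq_len))

# Teacher forcing: target at step t is the decoder input at step t+1
decoder_target = np.roll(decoder_in, -1, axis=1)
decoder_target[:, -1] = 0  # padding ID for the final step

# Sparse loss accepts integer targets of shape (batch, seq_len)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit([encoder_in, decoder_in], decoder_target, epochs=2, verbose=1)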