🌟What is NLP?
📚NLP (Natural Language Processing) enables computers to process and analyze natural language in forms such as:
Text 📚 (webpages 💻, SMS 📲, emails 📨, and menus 🍽️)
Audio 🔊 (Siri)
Signs and gestures 🖖
Others (songs 🎤, sheet music 🎼, and Morse code 🧑💻)
🌟There are countless more examples. By processing natural language, NLP enables a more direct interaction between machines 🤖 and humans 🧑💻.
💫NLP dates back to the 1950s, when Alan Turing's paper 📃 (Turing) proposed the Turing Test: a test of whether a computer can convince a human 🧑💻 that it, too, is human.
Tokenization
🌟Tokenization splits text 📃 into fragments (tokens) such as words, characters, or sentences. Tokens that carry little meaning for a given task (punctuation marks ⁉️, emoticons 😀, and digits 🔢) are often filtered out afterward.
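To make these ideas concrete, here is a minimal sketch that uses only Python's built-in re module rather than NLTK. The regular expressions and the helper names (tokenize_words, tokenize_chars, tokenize_sentences, keep_alpha) are illustrative assumptions; real tokenizers handle many more cases, such as contractions, abbreviations, and URLs.

```python
import re

def tokenize_words(text):
    """Split text into word/number tokens plus single punctuation or symbol tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def tokenize_chars(text):
    """Character-level tokens, skipping whitespace."""
    return [ch for ch in text if not ch.isspace()]

def tokenize_sentences(text):
    """Naive sentence split: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def keep_alpha(tokens):
    """Drop punctuation, digits, and symbols (including the pieces of ':)')."""
    return [t for t in tokens if t.isalpha()]

text = "Tokenize this text: 3 books, 2 emails :)"
words = tokenize_words(text)
print(words)
# ['Tokenize', 'this', 'text', ':', '3', 'books', ',', '2', 'emails', ':', ')']
print(keep_alpha(words))
# ['Tokenize', 'this', 'text', 'books', 'emails']
print(tokenize_chars("book."))
# ['b', 'o', 'o', 'k', '.']
print(tokenize_sentences("I'm reading a book. It is great!"))
# ["I'm reading a book.", 'It is great!']
```

Note that this naive regex splits the emoticon ":)" into two symbol tokens; NLTK's word_tokenize and sent_tokenize are far more robust than this sketch.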
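import nltk
from nltk.corpus import names  # list of first names, used in the first print()
from nltk.tokenize import word_tokenize  # to tokenize words
from nltk.tokenize import sent_tokenize  # to tokenize sentences
# Download only the resources this example needs
# ('punkt_tab' is the tokenizer model used by NLTK 3.9+; 'punkt' by older versions)
nltk.download('names')
nltk.download('punkt')
nltk.download('punkt_tab')
sent = """
I'm reading a book.
It is Python Machine Learning By Example,
2nd Edition, by Yuxi (Hayden) Liu.
"""
print(names.words()[:10])  # first 10 entries of the names corpus
print(word_tokenize(sent))  # word-level tokens
print(sent_tokenize(sent))  # sentence-level tokens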
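💻The code snippet prints the first 10 names in the names corpus, then the word tokens (punctuation marks such as "," and "." become separate tokens), and finally the list of sentences.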
📰Newsgroup data
📰The 20 Newsgroups dataset contains about 20,000 documents collected from 20 online newsgroups. 🌟 By default, fetch_20newsgroups() loads only the training subset (subset='train').
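from sklearn.datasets import fetch_20newsgroups
groups = fetch_20newsgroups()  # downloads the data on first use; training subset by default
groups.keys()  # in a notebook, this displays the available keys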
📰fetch_20newsgroups() returns a dictionary-like object (a scikit-learn Bunch) whose keys ⬆️ are data, filenames, target_names, target, and DESCR.
groups.target_names
📰target_names lists the names of the 20 newsgroups 🔽, while target stores each document's newsgroup encoded as an integer 🔢 (an index into target_names).
groups.target
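The integer encoding can be illustrated with a small sketch. The four labels below are hypothetical (not taken from the real dataset), and encode_labels is an illustrative helper, not scikit-learn's internal code; it mirrors the idea that each integer label is simply an index into an alphabetically sorted list of category names.

```python
# Hypothetical labels for four documents (not from the real dataset)
labels = ["sci.space", "rec.autos", "sci.space", "comp.graphics"]

def encode_labels(labels):
    """Map each distinct label to an integer index, with names sorted alphabetically."""
    target_names = sorted(set(labels))
    index = {name: i for i, name in enumerate(target_names)}
    target = [index[name] for name in labels]
    return target_names, target

target_names, target = encode_labels(labels)
print(target_names)  # ['comp.graphics', 'rec.autos', 'sci.space']
print(target)        # [2, 1, 2, 0]

# Decoding: look each integer up in target_names
print([target_names[i] for i in target])
```

With the real data, the same lookup works as groups.target_names[groups.target[0]].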
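import matplotlib.pyplot as plt
import seaborn as sns
sns.histplot(groups.target, discrete=True)  # distplot is deprecated since seaborn 0.11
plt.show()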
💻The Seaborn package 🎒 plots a histogram of the integer labels, showing how the news 📰 documents are distributed across the 20 categories.
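The same distribution can also be checked numerically. This is a minimal sketch using collections.Counter; label_distribution and the toy data are hypothetical, and with the real dataset you would pass groups.target and groups.target_names instead.

```python
from collections import Counter

def label_distribution(target, target_names):
    """Count documents per category and return {name: (count, share)}."""
    counts = Counter(target)
    total = len(target)
    return {
        target_names[i]: (counts[i], counts[i] / total)
        for i in range(len(target_names))
    }

# Hypothetical toy data standing in for groups.target / groups.target_names
toy_names = ["comp.graphics", "rec.autos", "sci.space"]
toy_target = [2, 1, 2, 0, 2, 1]

for name, (count, share) in label_distribution(toy_target, toy_names).items():
    print(f"{name:15s} {count:3d} {share:.0%}")
# comp.graphics     1 17%
# rec.autos         2 33%
# sci.space         3 50%
```

A roughly equal share per category indicates a balanced dataset, which is what the histogram shows visually.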
💻Seaborn Installation Guide 🎓
python -m pip install -U matplotlib
pip install seaborn
🌟Seaborn is built on top of matplotlib, which renders the histogram 📊. The pip commands above install both libraries.
conda install -c conda-forge matplotlib
conda install seaborn
🌟With conda, install both libraries using the commands above 🎓⬆️.
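After installing with pip or conda, a quick way to confirm that both packages are available is sketched below. It uses only the standard library (importlib.util.find_spec and importlib.metadata.version, Python 3.8+); check_installed is a hypothetical helper, and it assumes the import name matches the distribution name, which holds for matplotlib and seaborn.

```python
import importlib.metadata
import importlib.util

def check_installed(packages):
    """Return {package: version string, or None if it cannot be imported}."""
    status = {}
    for name in packages:
        if importlib.util.find_spec(name) is None:
            status[name] = None  # not importable in this environment
            continue
        try:
            status[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Importable, but no installed distribution metadata with this name
            status[name] = "installed (version unknown)"
    return status

print(check_installed(["matplotlib", "seaborn"]))
```

A None value means the corresponding install command still needs to be run in the active environment.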