Unit 6: Natural Language Processing

Natural Language
Natural language is the language that humans use to communicate with each other, such as English, Hindi, Tamil, Bengali, etc.
Natural Language Processing (NLP)
- Natural Language Processing (NLP) is a branch of Artificial Intelligence (AI) that helps computers understand, interpret, and respond to human language, both spoken and written.
- In short, NLP is the part of AI that allows computers to understand and use human language: it helps computers read, listen, and talk the way humans do.
How NLP Works (Step-by-Step):
1. Input Text or Speech – You speak or type something.
2. Processing – The AI breaks it into words and meanings.
3. Understanding – It figures out what you mean.
4. Response Generation – It gives an appropriate answer or action.
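The four steps above can be sketched as tiny Python functions. Everything here (the function names, the single "greeting" rule, the replies) is invented for illustration; real systems use far richer models.

```python
# Toy sketch of the four NLP steps; all rules are made up for illustration.
def process(text):
    """Step 2: break the input into words."""
    return text.lower().split()

def understand(tokens):
    """Step 3: figure out the intent (a single toy rule)."""
    return "greeting" if "hello" in tokens else "unknown"

def respond(intent):
    """Step 4: generate an appropriate answer."""
    if intent == "greeting":
        return "Hello! How can I help you?"
    return "Sorry, I did not understand that."

# Step 1 is the typed input itself:
print(respond(understand(process("Hello there"))))
```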
Features of Natural Languages
| Feature | Explanation (Simple Words) | Example |
|---|---|---|
| 1. Ambiguity | The same word or sentence can have more than one meaning. | “I saw the man with the telescope.” (Who has the telescope?) |
| 2. Context Dependence | The meaning of words depends on the situation or context. | “Bank” can mean a river bank or a money bank. |
| 3. Variety (Rich Vocabulary) | Natural languages have many ways to say the same thing. | “Hi,” “Hello,” “Hey” are all greetings. |
| 4. Evolving Nature | Natural languages change over time; new words are added and meanings shift. | “Selfie,” “emoji,” and “blog” are new words. |
| 5. Grammar and Structure | Each language follows its own rules for sentence formation. | English: Subject + Verb + Object → “I eat mango.” |
| 6. Emotion and Tone | Natural languages can express feelings, humor, and sarcasm. | “That’s great!” (can be genuine or sarcastic). |
| 7. Cultural Influence | Language reflects the culture, habits, and traditions of its speakers. | Certain greetings and idioms are unique to each culture. |
Computer Language
Computer languages are languages used to interact with a computer, such as Python, C++, Java, HTML, etc.
Why is NLP important?
Computers can only process electronic signals in the form of binary language. Natural Language Processing facilitates the conversion of language from its natural form into this digital form.

Applications of Natural Language Processing
| Application | Description (Simple Words) | Example |
|---|---|---|
| 1. Chatbots & Virtual Assistants | NLP helps AI assistants understand spoken commands and respond. | Siri, Alexa, Google Assistant |
| 2. Language Translation | Converts one language into another automatically. | Google Translate |
| 3. Sentiment Analysis | Finds the emotion (positive, negative, neutral) in a message or post. | Analyzing product reviews or tweets |
| 4. Speech Recognition | Converts spoken words into text so computers can understand them. | Voice typing in Google Docs |
| 5. Spam Email Filtering | Detects and blocks unwanted or spam messages. | Gmail spam filter |
| 6. Text Summarization | Creates short summaries of long articles or documents. | News summary apps |
| 7. Autocorrect & Predictive Text | Suggests words or fixes spelling while typing. | Mobile keyboard suggestions |
| 8. Customer Support Systems | Chatbots use NLP to answer customer queries automatically. | Online help chat windows |
Stages of Natural Language Processing (NLP)
The stages of Natural Language Processing (NLP) typically involve the following:

Lexical Analysis
- NLP starts with identifying the structure of the input words.
- It is the process of dividing a large chunk of text into paragraphs, sentences, and words. A lexicon is the collection of words and phrases used in a language.

Syntactic Analysis / Parsing
- It is the process of checking the grammar of sentences and phrases. It forms relationships among words and eliminates grammatically incorrect sentences.
Semantic Analysis
- In this stage, the input text is checked for meaning: every word and phrase is examined for meaningfulness.

Discourse Integration
- It is the process of forming the overall story of the text: every sentence is interpreted in relation to the sentences before and after it.

Pragmatic Analysis
- In this stage, sentences are checked for their relevance in the real world. Pragmatic means practical or logical, i.e., this step requires knowledge of the intent behind a sentence. It may discard the literal meaning obtained after semantic analysis and take the intended meaning instead.


Chatbots
- A chatbot is a computer program designed to simulate human conversation through voice commands, text chats, or both. It can answer questions, troubleshoot customer problems, evaluate and qualify prospects, generate sales leads, and increase sales on an e-commerce site.
Examples:
- Siri (Apple)
- Alexa (Amazon)
- Google Assistant
- Customer help chat on shopping or banking websites
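A minimal keyword-matching chatbot can be sketched in a few lines of Python. The keywords and replies below are invented for illustration; a real customer-support bot would use trained NLP models rather than a hand-written dictionary.

```python
import re

# Toy keyword-to-reply rules (invented for illustration).
RULES = {
    "refund": "Refunds are processed within 7 working days.",
    "delivery": "Orders are usually delivered in 3 to 5 days.",
    "hello": "Hi! How can I help you today?",
}

def chatbot_reply(message):
    # lowercase and keep only alphabetic words, so "delivery?" matches "delivery"
    words = re.findall(r"[a-z]+", message.lower())
    for keyword, reply in RULES.items():
        if keyword in words:
            return reply
    return "Sorry, I can only answer questions about refunds and delivery."

print(chatbot_reply("When is my delivery?"))
```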
Types of Chatbot
- Script bot (rule-based): follows a fixed script of predefined questions and answers; simple to build but limited in what it can handle.
- Smart bot (AI-based): uses NLP and machine learning to understand flexible, free-form input, e.g., Siri, Alexa, and Google Assistant.

Text Processing
- Text processing is the process of allowing computers to read, understand, and work with text data using techniques from Natural Language Processing (NLP).
- In other words, it means teaching a computer to read and work with text the way humans do, such as identifying words, sentences, meanings, or emotions in a message.

Text Normalization
- Text normalization is the process of cleaning and preparing text data so that the computer can easily understand and analyze it.
- It means converting messy or inconsistent text into a standard, consistent format before processing.
Steps of Text Normalization
1. Sentence Segmentation
2. Tokenization
3. Removing Stop words, Special Characters and Numbers
4. Converting Text to a Common Case
5. Stemming
6. Lemmatization
Sentence Segmentation
Under sentence segmentation, the whole corpus is divided into sentences. Each sentence is treated as a separate piece of data, so the whole corpus is reduced to a list of sentences.
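A simple sketch of sentence segmentation using Python's `re` module; splitting on end-of-sentence punctuation is a simplification, since real segmenters must also handle cases like "Dr." or "e.g.".

```python
import re

corpus = "NLP is fun. It has many uses! Do you agree?"
# Split at whitespace that follows a ., ! or ? (lookbehind keeps the punctuation).
sentences = re.split(r"(?<=[.!?])\s+", corpus)
print(sentences)
```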

Tokenization
After segmenting the sentences, each sentence is further divided into tokens. A token is any word, number, or special character occurring in a sentence. Under tokenization, every word, number, and special character is considered separately, and each becomes a separate token.
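Tokenization can be sketched with a regular expression that keeps words, numbers, and each special character as separate tokens (a simplification of what libraries like NLTK do):

```python
import re

sentence = "Wow!!! AI costs $0 to learn."
# \w+ matches a run of letters/digits; [^\w\s] matches each remaining
# symbol on its own, so every punctuation mark becomes its own token.
tokens = re.findall(r"\w+|[^\w\s]", sentence)
print(tokens)
```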

Removing Stop Words, Special Characters and Numbers
- In this step, tokens that are not necessary are removed from the token list. Which words might we not require?
- Stop words are very common words in a language that do not add much meaning to a sentence.
Examples: is, am, are, the, a, an, in, on, at, for, to, etc.
Sentence: “The cat is on the mat.” → after removing stop words: “cat mat”
- Special characters are symbols or punctuation marks that are not useful for text analysis.
Examples: @, #, $, %, !, ?, *, &, (, ), etc.
Sentence: “Wow!!! This movie is awesome :)” → after removing special characters: “Wow This movie is awesome”
- Numbers are sometimes removed when they do not add meaning to the sentence (such as phone numbers or product codes).
Sentence: “I have 2 dogs and 3 cats.” → after removing numbers: “I have dogs and cats”
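The removal steps can be combined into one small Python function; the stop-word list here is just the examples from this section, while real libraries ship much longer lists.

```python
import re

# Toy stop-word list (only the examples given above).
STOP_WORDS = {"is", "am", "are", "the", "a", "an", "in", "on", "at", "for", "to"}

def remove_noise(sentence):
    tokens = sentence.split()
    # strip special characters and digits from each token
    tokens = [re.sub(r"[^A-Za-z]", "", t) for t in tokens]
    # drop empty tokens and stop words
    tokens = [t for t in tokens if t and t.lower() not in STOP_WORDS]
    return " ".join(tokens)

print(remove_noise("The cat is on the mat."))   # -> cat mat
```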
Converting Text to a Common Case
- Converting text to a common case means changing all the letters in a text to either uppercase or lowercase so that the computer treats them as the same word.
- Without this step, computers see “Apple”, “apple”, and “APPLE” as different words.
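This can be demonstrated in one line of Python with the built-in `lower()` method:

```python
words = ["Apple", "apple", "APPLE"]
print(len(set(words)))                     # 3: treated as three different words
lowered = [w.lower() for w in words]
print(len(set(lowered)))                   # 1: all become "apple"
```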
Stemming
Stemming is the process of reducing a word to its base or root form by removing prefixes or suffixes. It means cutting off word endings (like ing, ed, s) so that similar words are grouped together. It focuses on the root form of the word, which need not be a meaningful word itself.
Example
| Original Word | After Stemming |
|---|---|
| playing | play |
| studies | studi |
| running | run |
| jumped | jump |
| easily | easi |
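A toy stemmer with just a few hand-written suffix rules is enough to reproduce the examples above; real stemmers (such as the Porter stemmer in NLTK) use many more rules.

```python
def naive_stem(word):
    # Toy suffix-stripping rules; real stemmers have many more.
    for suffix in ("ing", "ed", "es", "ly", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            stem = word[: -len(suffix)]
            # undo consonant doubling: "running" -> "runn" -> "run"
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word

print([naive_stem(w) for w in ["playing", "studies", "running", "jumped", "easily"]])
```

Note that the outputs `studi` and `easi` are not dictionary words; that is exactly the limitation lemmatization fixes.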
Lemmatization
- Lemmatization is the process of converting a word into its base or dictionary form (lemma) by understanding its meaning and grammar.
- Stemming and lemmatization are alternative processes: the role of both is the same, removal of affixes. The difference is that in lemmatization, the word obtained after affix removal (the lemma) is always a meaningful word. Because lemmatization must ensure the lemma is a real word, it takes longer to execute than stemming.
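The difference can be sketched as a dictionary lookup; the tiny lemma table below is invented for illustration, while a real lemmatizer (such as NLTK's WordNetLemmatizer) consults a full lexicon.

```python
# Toy lemma dictionary (illustrative only); note "studies" -> "study",
# a real dictionary word, where stemming would give "studi".
LEMMAS = {"studies": "study", "went": "go", "better": "good", "running": "run"}

def lemmatize(word):
    return LEMMAS.get(word, word)   # fall back to the word itself if unknown

print(lemmatize("studies"))
```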
Bag of Words
- Bag of Words (BoW) is a method used in Natural Language Processing (NLP) to convert text into numbers so that a computer can understand and analyze it.
- In the bag of words model, we count the occurrences of each word and construct the vocabulary for the corpus.
- The same idea can also be applied to images (using visual features instead of words), where it is used in image recognition.
Key Points:
- BoW changes text into numerical form for computers.
- It ignores grammar and the order of words.
- It focuses only on frequency (how many times each word appears).
- It is used in NLP for:
  - Text classification
  - Sentiment analysis
  - Spam detection
Here is the step-by-step approach to implementing the bag of words algorithm:
1. Text Processing: Collect data and pre-process it.
2. Create a Dictionary: Make a list of all the unique words occurring in the corpus (the vocabulary).
3. Create document vectors: For each document in the corpus, count how many times each word from the unique list occurs.
4. Create document vectors for all the documents.
Let us go through all the steps with an example:
Step 1: Collecting data and pre-processing it.
Document 1: Aman and Avni are stressed
Document 2: Aman went to a therapist
Document 3: Avni went to download a health chatbot
Here are three documents with one sentence each. After text normalization, the text becomes:
Document 1: [aman, and, avni, are, stressed]
Document 2: [aman, went, to, a, therapist]
Document 3: [avni, went, to, download, a, health, chatbot]
Step 2: Create a Dictionary. Go through all the documents and list down every word that occurs in them:
Dictionary (Vocabulary): [aman, and, avni, are, stressed, went, download, health, chatbot, therapist, a, to]
Steps 3 and 4: Create document vectors.

| Word | Document 1 | Document 2 | Document 3 |
|---|---|---|---|
| aman | 1 | 1 | 0 |
| and | 1 | 0 | 0 |
| avni | 1 | 0 | 1 |
| are | 1 | 0 | 0 |
| stressed | 1 | 0 | 0 |
| went | 0 | 1 | 1 |
| download | 0 | 0 | 1 |
| health | 0 | 0 | 1 |
| chatbot | 0 | 0 | 1 |
| therapist | 0 | 1 | 0 |
| a | 0 | 1 | 1 |
| to | 0 | 1 | 1 |
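The whole procedure can be reproduced in a few lines of Python; here normalization is simplified to lowercasing and splitting on spaces.

```python
docs = [
    "Aman and Avni are stressed",
    "Aman went to a therapist",
    "Avni went to download a health chatbot",
]

# Step 1: normalize (simplified to lowercasing + splitting into tokens)
token_lists = [d.lower().split() for d in docs]

# Step 2: build the vocabulary in order of first appearance
vocab = []
for tokens in token_lists:
    for word in tokens:
        if word not in vocab:
            vocab.append(word)

# Steps 3 and 4: one frequency vector per document
vectors = [[tokens.count(word) for word in vocab] for tokens in token_lists]
for v in vectors:
    print(v)
```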
TF-IDF: Term Frequency – Inverse Document Frequency
TF-IDF is a method used in Natural Language Processing (NLP) to find how important a word is in a document (or sentence) compared to all the other documents.
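One common formulation takes TF as the word's raw count in a document and IDF as the log of the total number of documents over the number of documents containing the word (other variants exist). A sketch, reusing the three normalized documents from the Bag of Words example:

```python
import math

def tfidf(word, doc, corpus):
    """doc is a list of tokens; corpus is a list of such token lists."""
    tf = doc.count(word)
    df = sum(1 for d in corpus if word in d)   # documents containing the word
    idf = math.log10(len(corpus) / df)         # rarer word -> larger IDF
    return tf * idf

corpus = [
    ["aman", "and", "avni", "are", "stressed"],
    ["aman", "went", "to", "a", "therapist"],
    ["avni", "went", "to", "download", "a", "health", "chatbot"],
]
# "therapist" appears in only one document, so it scores higher than
# the common word "to", which appears in two:
print(tfidf("therapist", corpus[1], corpus))
print(tfidf("to", corpus[1], corpus))
```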

Applications of TF-IDF

TF-IDF is commonly used in the Natural Language Processing domain. Some of its applications are:

| Application | Description |
|---|---|
| Document Classification | Helps in classifying the type and genre of a document. |
| Topic Modelling | Helps in predicting the topic of a corpus. |
| Information Retrieval Systems | Used to extract the important information out of a corpus. |
| Stop Word Filtering | Helps in removing unnecessary words from a text body. |

| Code NLP | No-Code NLP |
|---|---|
| NLTK package: the Natural Language Toolkit (NLTK) is a package readily available for text processing in Python. It contains functions and modules that can be used for Natural Language Processing. | Orange Data Mining: a machine learning tool for data analysis through Python and visual programming. We can perform operations on data through simple drag-and-drop steps. |
| spaCy: an open-source natural language processing (NLP) library designed for building NLP applications. It offers features such as tokenization, part-of-speech tagging, named entity recognition, dependency parsing, and more. | MonkeyLearn: a text analysis platform that offers NLP tools and machine learning models for text analysis, supporting tasks such as classification, sentiment analysis, and entity recognition. Users can create custom models or use pretrained ones for tasks like social media monitoring and customer feedback analysis. |