w3ajay


Sunday, November 2, 2025

Class 10 Unit 6: Natural Language

 


Natural Language

Natural Language is the language that humans use to communicate with each other — such as English, Hindi, Tamil, Bengali, etc.

Natural Language Processing (NLP)

Ø Natural Language Processing (NLP) is a branch of Artificial Intelligence (AI) that allows computers to understand, interpret, and respond to human language, both spoken and written.

Ø NLP helps computers read, listen, and talk the way humans do.

How NLP Works (Step-by-Step):

  1. Input Text or Speech – You speak or type something.
  2. Processing – The AI breaks it into words and meanings.
  3. Understanding – It figures out what you mean.
  4. Response Generation – It gives an appropriate answer or action.

Features of Natural Languages

1. Ambiguity: The same word or sentence can have more than one meaning. Example: “I saw the man with the telescope.” (Who has the telescope?)

2. Context Dependence: The meaning of a word depends on the situation or context. Example: “bank” can mean a river bank or a money bank.

3. Variety (Rich Vocabulary): Natural languages have many ways to say the same thing. Example: “Hi,” “Hello,” and “Hey” are all greetings.

4. Evolving Nature: Natural languages change over time; new words are added and meanings shift. Example: “selfie,” “emoji,” and “blog” are new words.

5. Grammar and Structure: Each language follows its own rules for sentence formation. Example: English follows Subject + Verb + Object, as in “I eat mango.”

6. Emotion and Tone: Natural languages can express feelings, humour, and sarcasm. Example: “That’s great!” can be genuine or sarcastic.

7. Cultural Influence: Language reflects the culture, habits, and traditions of its speakers. Example: certain greetings and idioms are unique to each culture.

 

 

Computer Language

Computer languages are languages used to interact with a computer, such as Python, C++, Java, HTML, etc.

Why is NLP important?

Computers can only process data in binary form. Natural Language Processing bridges this gap by converting language from its natural form into a digital form that computers can work with.


Applications of Natural Language Processing

1. Chatbots & Virtual Assistants: NLP helps AI assistants understand spoken commands and respond. Example: Siri, Alexa, Google Assistant.

2. Language Translation: Converts one language into another automatically. Example: Google Translate.

3. Sentiment Analysis: Finds the emotion (positive, negative, or neutral) in a message or post. Example: analyzing product reviews or tweets.

4. Speech Recognition: Converts spoken words into text so computers can understand them. Example: voice typing in Google Docs.

5. Spam Email Filtering: Detects and blocks unwanted or spam messages. Example: Gmail spam filter.

6. Text Summarization: Creates short summaries of long articles or documents. Example: news summary apps.

7. Autocorrect & Predictive Text: Suggests words or fixes spelling while typing. Example: mobile keyboard suggestions.

8. Customer Support Systems: Chatbots use NLP to answer customer queries automatically. Example: online help chat windows.

 

Stages of Natural Language Processing (NLP)

The stages of Natural Language Processing (NLP) typically involve the following five steps: Lexical Analysis, Syntactic Analysis (Parsing), Semantic Analysis, Discourse Integration, and Pragmatic Analysis.


Lexical Analysis

Ø NLP starts with identifying the structure of the input words.

Ø It is the process of dividing a large chunk of text into paragraphs, sentences, and words. A lexicon is the collection of words and phrases used in a language.


Syntactic Analysis / Parsing

Ø It is the process of checking the grammar of sentences and phrases. It establishes relationships among words and rejects grammatically incorrect sentences.


Semantic Analysis

Ø In this stage, the input text is now checked for meaning, and every word and phrase is checked for meaningfulness.

Discourse Integration

Ø It is the process of forming the story of the sentence. Every sentence should have a relationship with its preceding and succeeding sentences.

Pragmatic Analysis

Ø In this stage, sentences are checked for their relevance in the real world. Pragmatic means practical or logical; this step requires knowledge of the intent behind a sentence. The literal meaning obtained from semantic analysis may be set aside in favour of the intended meaning.


Chatbots

Ø A chatbot is a computer program designed to simulate human conversation through voice commands, text chats, or both. It can answer questions, troubleshoot customer problems, evaluate and qualify prospects, generate sales leads, and increase sales on an e-commerce site.

Examples:

·         Siri (Apple)

·         Alexa (Amazon)

·         Google Assistant

·         Customer help chat on shopping or banking websites

 

 

Types of Chatbot

Text Processing

Ø Text Processing is the process of allowing computers to read, understand, and work with text data using techniques from Natural Language Processing (NLP).

Ø Text processing means teaching a computer to read and work with text, just like humans do — such as identifying words, sentences, meanings, or emotions in a message.


Text Normalization

Ø Text Normalization is the process of cleaning and preparing text data so that the computer can easily understand and analyze it.

Ø Text normalization means converting messy or mixed-up text into a standard and consistent format before processing it.

Steps of Text Normalization

1. Sentence Segmentation

2. Tokenization

3. Removing Stop Words, Special Characters, and Numbers

4. Converting Text to a Common Case

5. Stemming

6. Lemmatization

Sentence Segmentation

Under sentence segmentation, the whole corpus is divided into sentences. Each sentence is treated as a separate unit of data, so the whole corpus is reduced to sentences.
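A minimal sketch of sentence segmentation in Python, splitting a corpus on sentence-ending punctuation (real NLP libraries handle abbreviations and other edge cases more carefully):

```python
import re

corpus = ("Aman and Avni are stressed. Aman went to a therapist. "
          "Avni went to download a health chatbot.")

# Split wherever a sentence-ending punctuation mark (. ! ?) occurs.
sentences = [s.strip() for s in re.split(r"[.!?]\s*", corpus) if s.strip()]

print(sentences)
# ['Aman and Avni are stressed', 'Aman went to a therapist',
#  'Avni went to download a health chatbot']
```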

 

Tokenization

After segmenting the sentences, each sentence is further divided into tokens. A token is any word, number, or special character occurring in a sentence. Under tokenization, every word, number, and special character is considered separately, and each becomes a separate token.
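A simple way to sketch tokenization is with a regular expression that treats every word and every punctuation mark as its own token:

```python
import re

sentence = "Aman went to a therapist."

# \w+ matches runs of letters/digits; [^\w\s] matches each special character.
tokens = re.findall(r"\w+|[^\w\s]", sentence)

print(tokens)   # ['Aman', 'went', 'to', 'a', 'therapist', '.']
```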

 

Removing Stop words, Special Characters and Numbers

Ø  In this step, the tokens which are not necessary are removed from the token list. What are the possible words which we might not require?

Ø Stop words are very common words in a language that do not add much meaning to a sentence.

Examples: is, am, are, the, a, an, in, on, at, for, to, etc.

Sentence: “The cat is on the mat.”
After removing stop words → “cat mat”

Ø Special characters are symbols or punctuation marks that are not useful for text analysis.

Examples: @, #, $, %, !, ?, *, &, (, ) etc.

Sentence: “Wow!!! This movie is awesome :)”
After removing special characters → “Wow This movie is awesome”

Ø Numbers are sometimes removed when they don’t add meaning to the sentence (like phone numbers or product codes).

Example:
Sentence: “I have 2 dogs and 3 cats.”
After removing numbers → “I have dogs and cats”
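The three removals above (stop words, special characters, numbers) can be sketched together with one filter over a token list; the stop-word set here is a small illustrative sample, not a complete list:

```python
# A small sample stop-word set for illustration.
stop_words = {"is", "am", "are", "the", "a", "an", "in", "on", "at", "for", "to", "and"}

tokens = ["The", "cat", "is", "on", "the", "mat", ".", "I", "have", "2", "dogs", "!"]

cleaned = [t for t in tokens
           if t.lower() not in stop_words   # drop stop words
           and t.isalpha()]                 # drop numbers and special characters

print(cleaned)   # ['cat', 'mat', 'I', 'have', 'dogs']
```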

Converting Text to a Common Case

Ø Converting text to a common case means changing all the letters in a text to either uppercase or lowercase so that the computer treats them as the same word.

Ø Computers see “Apple”, “apple”, and “APPLE” as different words.
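Lower-casing is a one-line operation in Python, and it shows why this step matters: three different spellings collapse into one word:

```python
tokens = ["Apple", "APPLE", "apple", "Mango"]

# Lower-casing makes all three spellings of "apple" identical.
lowered = [t.lower() for t in tokens]

print(lowered)        # ['apple', 'apple', 'apple', 'mango']
print(len(set(lowered)))  # only 2 distinct words remain
```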

Stemming

Stemming is the process of reducing a word to its base or root form by removing prefixes or suffixes.

Stemming means cutting off word endings (like ing, ed, s) so that similar words are grouped together.
It focuses on the root meaning of the word.

Example

Original Word → After Stemming

playing → play

studies → studi

running → run

jumped → jump

easily → easi
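The table above can be reproduced with a toy suffix-stripping stemmer. This is only a sketch of the idea; real systems use proper algorithms such as the Porter stemmer (available in NLTK as `nltk.stem.PorterStemmer`):

```python
def stem(word):
    """Crudely strip common suffixes; may produce non-words (e.g. 'studi')."""
    for suffix in ("ing", "ed", "es", "ly", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            word = word[:-len(suffix)]
            break
    # Undo a doubled final consonant left behind by -ing/-ed ('runn' -> 'run').
    if len(word) > 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word

for w in ["playing", "studies", "running", "jumped", "easily"]:
    print(w, "->", stem(w))
# playing -> play, studies -> studi, running -> run,
# jumped -> jump, easily -> easi
```

Note that "studi" and "easi" are not real words: stemming only chops endings, which is exactly the limitation lemmatization fixes.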

 

Lemmatization

Ø Lemmatization is the process of converting a word into its base or dictionary form (lemma) by understanding its meaning and grammar.

Ø Stemming and lemmatization are alternative processes, as the role of both is the same: removal of affixes. The difference between them is that in lemmatization, the word obtained after affix removal (the lemma) is always a meaningful word. Because lemmatization must ensure the lemma has meaning, it takes longer to execute than stemming.
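To contrast with stemming, lemmatization looks a word up rather than chopping it. A minimal sketch using a tiny hand-made (hypothetical) lemma dictionary; real tools such as NLTK's `WordNetLemmatizer` use a full dictionary plus the word's part of speech:

```python
# Hypothetical lemma dictionary, for illustration only.
lemmas = {"studies": "study", "went": "go", "running": "run", "better": "good"}

def lemmatize(word):
    # Fall back to the word itself when it is not in the dictionary.
    return lemmas.get(word, word)

print(lemmatize("studies"))  # 'study' -- stemming gave the non-word 'studi'
print(lemmatize("went"))     # 'go' -- stemming cannot handle irregular forms
```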

Bag of Words

Ø  Bag of Words (BoW) is a technique used in Natural Language Processing (NLP) to represent text as numbers so that a computer can understand and analyze it.

Ø  In the bag of words, we count the occurrences of each word and construct the vocabulary for the corpus.

Key Points:

Ø  BoW converts text into numerical form for computers.

Ø  It ignores grammar and the order of words.

Ø  It focuses only on word frequency (how many times each word appears).

Ø  It is used in NLP for:

ü  Text classification

ü  Sentiment analysis

ü  Spam detection

 

 

Here is the step-by-step approach to implementing the bag of words algorithm:

1. Text Processing: Collect the data and pre-process it.

2. Create a Dictionary: Make a list of all the unique words occurring in the corpus (the vocabulary).

3. Create document vectors: For each document in the corpus, count how many times each word from the unique list occurs.

4. Repeat step 3 to create document vectors for all the documents.

Let us go through all the steps with an example:

Step 1: Collecting data and pre-processing it.

Document 1: Aman and Avni are stressed

Document 2: Aman went to a therapist

Document 3: Avni went to download a health chatbot

Here are three documents having one sentence each. After text normalisation, the text becomes:

Document 1: [aman, and, avni, are, stressed]

Document 2: [aman, went, to, a, therapist]

Document 3: [avni, went, to, download, a, health, chatbot]

Step 2: Create a Dictionary. Go through all three documents and list down every unique word which occurs in them:

Dictionary (Vocabulary):

[aman, and, avni, are, stressed, went, download, health, chatbot, therapist, a, to]

 

Steps 3 and 4: Create document vectors. For each document, count how many times each word from the dictionary occurs:

Word | Document 1 | Document 2 | Document 3
aman | 1 | 1 | 0
and | 1 | 0 | 0
avni | 1 | 0 | 1
are | 1 | 0 | 0
stressed | 1 | 0 | 0
went | 0 | 1 | 1
download | 0 | 0 | 1
health | 0 | 0 | 1
chatbot | 0 | 0 | 1
therapist | 0 | 1 | 0
a | 0 | 1 | 1
to | 0 | 1 | 1
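The four steps can be sketched in Python. Starting from the three normalized documents of Step 1, the code builds the vocabulary and one count vector per document (the vocabulary order may differ from the listing above; only the counts matter):

```python
docs = [
    ["aman", "and", "avni", "are", "stressed"],
    ["aman", "went", "to", "a", "therapist"],
    ["avni", "went", "to", "download", "a", "health", "chatbot"],
]

# Step 2: vocabulary = every unique word, in first-seen order.
vocab = []
for doc in docs:
    for word in doc:
        if word not in vocab:
            vocab.append(word)

# Steps 3-4: one vector per document, counting each vocabulary word.
vectors = [[doc.count(word) for word in vocab] for doc in docs]

print(vocab)
print(vectors[0])   # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
```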

 

TF-IDF: Term Frequency – Inverse Document Frequency

TF-IDF is a method used in Natural Language Processing (NLP) to find how important a word is in a document (or sentence) compared to all other documents.
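In the commonly taught form, the score of a word W in a document is TFIDF(W) = TF(W) × log(N / DF(W)), where TF(W) is how many times W occurs in the document, N is the total number of documents, and DF(W) is the number of documents containing W (log base 10 is used below; the base varies by convention). A sketch using the three documents from the Bag of Words example:

```python
import math

docs = [
    ["aman", "and", "avni", "are", "stressed"],
    ["aman", "went", "to", "a", "therapist"],
    ["avni", "went", "to", "download", "a", "health", "chatbot"],
]

def tfidf(word, doc, docs):
    tf = doc.count(word)                      # term frequency in this document
    df = sum(1 for d in docs if word in d)    # documents containing the word
    return tf * math.log10(len(docs) / df)

# "and" appears in only one document, so it scores higher than
# "aman", which appears in two.
print(round(tfidf("and", docs[0], docs), 3))    # 0.477
print(round(tfidf("aman", docs[0], docs), 3))   # 0.176
```

Words that occur in every document score log(N/N) = 0, which is why TF-IDF naturally down-weights stop words.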


Applications of TF-IDF

TF-IDF is commonly used in the Natural Language Processing domain. Some of its applications are:

1. Document Classification: Helps in classifying the type and genre of a document.

2. Topic Modelling: Helps in predicting the topic of a corpus.

3. Information Retrieval: Helps extract the important information out of a corpus.

4. Stop Word Filtering: Helps in removing unnecessary words from a text body.


Code NLP

Ø NLTK: The Natural Language Toolkit (NLTK) is a package readily available for text processing in Python. It contains functions and modules which can be used for Natural Language Processing.

Ø spaCy: spaCy is an open-source natural language processing (NLP) library designed for building NLP applications. It offers features such as tokenization, part-of-speech tagging, named entity recognition, dependency parsing, and more.

No-Code NLP

Ø Orange Data Mining: A machine learning tool for data analysis through Python and visual programming. We can perform operations on data through simple drag-and-drop steps.

Ø MonkeyLearn: A text analysis platform that offers NLP tools and machine learning models for text analysis, supporting tasks such as classification, sentiment analysis, and entity recognition. Users can create custom models or use pre-trained ones for tasks like social media monitoring and customer feedback analysis.
