Unit 6: Natural Language Processing

Natural Language
Natural language is the language that humans use to communicate with each other, such as English, Hindi, Tamil, Bengali, etc.
Natural Language Processing (NLP)
- Natural Language Processing (NLP) is a branch of Artificial Intelligence (AI) that helps computers understand, interpret, and respond to human language, both spoken and written.
- In short, NLP is the part of AI that allows computers to understand and use human language: it helps computers read, listen, and talk the way humans do.
How NLP Works (Step-by-Step):
1. Input Text or Speech – You speak or type something.
2. Processing – The AI breaks it into words and meanings.
3. Understanding – It figures out what you mean.
4. Response Generation – It gives an appropriate answer or action.
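The four steps above can be sketched as tiny Python functions. Everything here (the function names, the single "greeting" rule, the replies) is invented for illustration; real systems use far richer models.

```python
# Toy sketch of the four NLP steps; all rules are made up for illustration.
def process(text):
    """Step 2: break the input into words."""
    return text.lower().split()

def understand(tokens):
    """Step 3: figure out the intent (a single toy rule)."""
    return "greeting" if "hello" in tokens else "unknown"

def respond(intent):
    """Step 4: generate an appropriate answer."""
    if intent == "greeting":
        return "Hello! How can I help you?"
    return "Sorry, I did not understand that."

# Step 1 is the typed input itself:
print(respond(understand(process("Hello there"))))
```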
Features of Natural Languages
| Feature | Explanation (Simple Words) | Example |
|---|---|---|
| 1. Ambiguity | The same word or sentence can have more than one meaning. | “I saw the man with the telescope.” (Who has the telescope?) |
| 2. Context Dependence | The meaning of words depends on the situation or context. | “Bank” can mean a river bank or a money bank. |
| 3. Variety (Rich Vocabulary) | Natural languages have many ways to say the same thing. | “Hi,” “Hello,” “Hey” are all greetings. |
| 4. Evolving Nature | Natural languages change over time; new words are added and meanings shift. | “Selfie,” “emoji,” and “blog” are new words. |
| 5. Grammar and Structure | Each language follows its own rules for sentence formation. | English: Subject + Verb + Object → “I eat mango.” |
| 6. Emotion and Tone | Natural languages can express feelings, humor, and sarcasm. | “That’s great!” (can be genuine or sarcastic). |
| 7. Cultural Influence | Language reflects the culture, habits, and traditions of its speakers. | Certain greetings and idioms are unique to each culture. |
Computer Language
Computer languages are languages used to interact with a computer, such as Python, C++, Java, HTML, etc.
Why is NLP important?
Computers can only process electronic signals in the form of binary language. Natural Language Processing facilitates the conversion of language from its natural form into this digital form.

Applications of Natural Language Processing
| Application | Description (Simple Words) | Example |
|---|---|---|
| 1. Chatbots & Virtual Assistants | NLP helps AI assistants understand spoken commands and respond. | Siri, Alexa, Google Assistant |
| 2. Language Translation | Converts one language into another automatically. | Google Translate |
| 3. Sentiment Analysis | Finds the emotion (positive, negative, neutral) in a message or post. | Analyzing product reviews or tweets |
| 4. Speech Recognition | Converts spoken words into text so computers can understand them. | Voice typing in Google Docs |
| 5. Spam Email Filtering | Detects and blocks unwanted or spam messages. | Gmail spam filter |
| 6. Text Summarization | Creates short summaries of long articles or documents. | News summary apps |
| 7. Autocorrect & Predictive Text | Suggests words or fixes spelling while typing. | Mobile keyboard suggestions |
| 8. Customer Support Systems | Chatbots use NLP to answer customer queries automatically. | Online help chat windows |
Stages of Natural Language Processing (NLP)
The stages of Natural Language Processing (NLP) typically involve the following:

Lexical Analysis
- NLP starts with identifying the structure of the input words.
- It is the process of dividing a large chunk of text into paragraphs, sentences, and words. A lexicon is the collection of words and phrases used in a language.

Syntactic Analysis / Parsing
- It is the process of checking the grammar of sentences and phrases. It forms relationships among words and eliminates grammatically incorrect sentences.
Semantic Analysis
- In this stage, the input text is checked for meaning: every word and phrase is examined for meaningfulness.

Discourse Integration
- It is the process of forming the overall story of the text: every sentence is interpreted in relation to the sentences before and after it.

Pragmatic Analysis
- In this stage, sentences are checked for their relevance in the real world. Pragmatic means practical or logical, i.e., this step requires knowledge of the intent behind a sentence. It may discard the literal meaning obtained after semantic analysis and take the intended meaning instead.


Chatbots
- A chatbot is a computer program designed to simulate human conversation through voice commands, text chats, or both. It can answer questions, troubleshoot customer problems, evaluate and qualify prospects, generate sales leads, and increase sales on an e-commerce site.
Examples:
- Siri (Apple)
- Alexa (Amazon)
- Google Assistant
- Customer help chat on shopping or banking websites
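A minimal keyword-matching chatbot can be sketched in a few lines of Python. The keywords and replies below are invented for illustration; a real customer-support bot would use trained NLP models rather than a hand-written dictionary.

```python
import re

# Toy keyword-to-reply rules (invented for illustration).
RULES = {
    "refund": "Refunds are processed within 7 working days.",
    "delivery": "Orders are usually delivered in 3 to 5 days.",
    "hello": "Hi! How can I help you today?",
}

def chatbot_reply(message):
    # lowercase and keep only alphabetic words, so "delivery?" matches "delivery"
    words = re.findall(r"[a-z]+", message.lower())
    for keyword, reply in RULES.items():
        if keyword in words:
            return reply
    return "Sorry, I can only answer questions about refunds and delivery."

print(chatbot_reply("When is my delivery?"))
```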
Types of Chatbot
- Script bot (rule-based): follows a fixed script of predefined questions and answers; simple to build but limited in what it can handle.
- Smart bot (AI-based): uses NLP and machine learning to understand flexible, free-form input, e.g., Siri, Alexa, and Google Assistant.

Text Processing
- Text processing is the process of allowing computers to read, understand, and work with text data using techniques from Natural Language Processing (NLP).
- In other words, it means teaching a computer to read and work with text the way humans do, such as identifying words, sentences, meanings, or emotions in a message.

Text Normalization
- Text normalization is the process of cleaning and preparing text data so that the computer can easily understand and analyze it.
- It means converting messy or inconsistent text into a standard, consistent format before processing.
Steps of Text Normalization
1. Sentence Segmentation
2. Tokenization
3. Removing Stop words, Special Characters and Numbers
4. Converting Text to a Common Case
5. Stemming
6. Lemmatization
Sentence Segmentation
Under sentence segmentation, the whole corpus is divided into sentences. Each sentence is treated as a separate piece of data, so the whole corpus is reduced to a list of sentences.
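A simple sketch of sentence segmentation using Python's `re` module; splitting on end-of-sentence punctuation is a simplification, since real segmenters must also handle cases like "Dr." or "e.g.".

```python
import re

corpus = "NLP is fun. It has many uses! Do you agree?"
# Split at whitespace that follows a ., ! or ? (lookbehind keeps the punctuation).
sentences = re.split(r"(?<=[.!?])\s+", corpus)
print(sentences)
```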

Tokenization
After segmenting the sentences, each sentence is further divided into tokens. A token is any word, number, or special character occurring in a sentence. Under tokenization, every word, number, and special character is considered separately, and each becomes a separate token.
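Tokenization can be sketched with a regular expression that keeps words, numbers, and each special character as separate tokens (a simplification of what libraries like NLTK do):

```python
import re

sentence = "Wow!!! AI costs $0 to learn."
# \w+ matches a run of letters/digits; [^\w\s] matches each remaining
# symbol on its own, so every punctuation mark becomes its own token.
tokens = re.findall(r"\w+|[^\w\s]", sentence)
print(tokens)
```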

Removing Stop Words, Special Characters and Numbers
- In this step, tokens that are not necessary are removed from the token list. Which words might we not require?
- Stop words are very common words in a language that do not add much meaning to a sentence.
Examples: is, am, are, the, a, an, in, on, at, for, to, etc.
Sentence: “The cat is on the mat.” → after removing stop words: “cat mat”
- Special characters are symbols or punctuation marks that are not useful for text analysis.
Examples: @, #, $, %, !, ?, *, &, (, ), etc.
Sentence: “Wow!!! This movie is awesome :)” → after removing special characters: “Wow This movie is awesome”
- Numbers are sometimes removed when they do not add meaning to the sentence (such as phone numbers or product codes).
Sentence: “I have 2 dogs and 3 cats.” → after removing numbers: “I have dogs and cats”
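The removal steps can be combined into one small Python function; the stop-word list here is just the examples from this section, while real libraries ship much longer lists.

```python
import re

# Toy stop-word list (only the examples given above).
STOP_WORDS = {"is", "am", "are", "the", "a", "an", "in", "on", "at", "for", "to"}

def remove_noise(sentence):
    tokens = sentence.split()
    # strip special characters and digits from each token
    tokens = [re.sub(r"[^A-Za-z]", "", t) for t in tokens]
    # drop empty tokens and stop words
    tokens = [t for t in tokens if t and t.lower() not in STOP_WORDS]
    return " ".join(tokens)

print(remove_noise("The cat is on the mat."))   # -> cat mat
```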
Converting Text to a Common Case
- Converting text to a common case means changing all the letters in a text to either uppercase or lowercase so that the computer treats them as the same word.
- Without this step, computers see “Apple”, “apple”, and “APPLE” as different words.
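This can be demonstrated in one line of Python with the built-in `lower()` method:

```python
words = ["Apple", "apple", "APPLE"]
print(len(set(words)))                     # 3: treated as three different words
lowered = [w.lower() for w in words]
print(len(set(lowered)))                   # 1: all become "apple"
```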
Stemming
Stemming is the process of reducing a word to its base or root form by removing prefixes or suffixes. It means cutting off word endings (like ing, ed, s) so that similar words are grouped together. It focuses on the root form of the word, which need not be a meaningful word itself.
Example
| Original Word | After Stemming |
|---|---|
| playing | play |
| studies | studi |
| running | run |
| jumped | jump |
| easily | easi |
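A toy stemmer with just a few hand-written suffix rules is enough to reproduce the examples above; real stemmers (such as the Porter stemmer in NLTK) use many more rules.

```python
def naive_stem(word):
    # Toy suffix-stripping rules; real stemmers have many more.
    for suffix in ("ing", "ed", "es", "ly", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            stem = word[: -len(suffix)]
            # undo consonant doubling: "running" -> "runn" -> "run"
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word

print([naive_stem(w) for w in ["playing", "studies", "running", "jumped", "easily"]])
```

Note that the outputs `studi` and `easi` are not dictionary words; that is exactly the limitation lemmatization fixes.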
Lemmatization
- Lemmatization is the process of converting a word into its base or dictionary form (lemma) by understanding its meaning and grammar.
- Stemming and lemmatization are alternative processes: the role of both is the same, removal of affixes. The difference is that in lemmatization, the word obtained after affix removal (the lemma) is always a meaningful word. Because lemmatization must ensure the lemma is a real word, it takes longer to execute than stemming.
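The difference can be sketched as a dictionary lookup; the tiny lemma table below is invented for illustration, while a real lemmatizer (such as NLTK's WordNetLemmatizer) consults a full lexicon.

```python
# Toy lemma dictionary (illustrative only); note "studies" -> "study",
# a real dictionary word, where stemming would give "studi".
LEMMAS = {"studies": "study", "went": "go", "better": "good", "running": "run"}

def lemmatize(word):
    return LEMMAS.get(word, word)   # fall back to the word itself if unknown

print(lemmatize("studies"))
```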
Bag of Words
- Bag of Words (BoW) is a method used in Natural Language Processing (NLP) to convert text into numbers so that a computer can understand and analyze it.
- In the bag of words model, we count the occurrences of each word and construct the vocabulary for the corpus.
- The same idea can also be applied to images (using visual features instead of words), where it is used in image recognition.
Key Points:
- BoW changes text into numerical form for computers.
- It ignores grammar and the order of words.
- It focuses only on frequency (how many times each word appears).
- It is used in NLP for:
  - Text classification
  - Sentiment analysis
  - Spam detection
Here is the step-by-step approach to implementing the bag of words algorithm:
1. Text Processing: Collect data and pre-process it.
2. Create a Dictionary: Make a list of all the unique words occurring in the corpus (the vocabulary).
3. Create document vectors: For each document in the corpus, count how many times each word from the unique list occurs.
4. Create document vectors for all the documents.
Let us go through all the steps with an example:
Step 1: Collecting data and pre-processing it.
Document 1: Aman and Avni are stressed
Document 2: Aman went to a therapist
Document 3: Avni went to download a health chatbot
Here are three documents with one sentence each. After text normalization, the text becomes:
Document 1: [aman, and, avni, are, stressed]
Document 2: [aman, went, to, a, therapist]
Document 3: [avni, went, to, download, a, health, chatbot]
Step 2: Create a Dictionary. Go through all the documents and list down every word that occurs in them:
Dictionary (Vocabulary): [aman, and, avni, are, stressed, went, download, health, chatbot, therapist, a, to]
Steps 3 and 4: Create document vectors.

| Word | Document 1 | Document 2 | Document 3 |
|---|---|---|---|
| aman | 1 | 1 | 0 |
| and | 1 | 0 | 0 |
| avni | 1 | 0 | 1 |
| are | 1 | 0 | 0 |
| stressed | 1 | 0 | 0 |
| went | 0 | 1 | 1 |
| download | 0 | 0 | 1 |
| health | 0 | 0 | 1 |
| chatbot | 0 | 0 | 1 |
| therapist | 0 | 1 | 0 |
| a | 0 | 1 | 1 |
| to | 0 | 1 | 1 |
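The whole procedure can be reproduced in a few lines of Python; here normalization is simplified to lowercasing and splitting on spaces.

```python
docs = [
    "Aman and Avni are stressed",
    "Aman went to a therapist",
    "Avni went to download a health chatbot",
]

# Step 1: normalize (simplified to lowercasing + splitting into tokens)
token_lists = [d.lower().split() for d in docs]

# Step 2: build the vocabulary in order of first appearance
vocab = []
for tokens in token_lists:
    for word in tokens:
        if word not in vocab:
            vocab.append(word)

# Steps 3 and 4: one frequency vector per document
vectors = [[tokens.count(word) for word in vocab] for tokens in token_lists]
for v in vectors:
    print(v)
```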
TF-IDF: Term Frequency – Inverse Document Frequency
TF-IDF is a method used in Natural Language Processing (NLP) to find how important a word is in a document (or sentence) compared to all the other documents.
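One common formulation takes TF as the word's raw count in a document and IDF as the log of the total number of documents over the number of documents containing the word (other variants exist). A sketch, reusing the three normalized documents from the Bag of Words example:

```python
import math

def tfidf(word, doc, corpus):
    """doc is a list of tokens; corpus is a list of such token lists."""
    tf = doc.count(word)
    df = sum(1 for d in corpus if word in d)   # documents containing the word
    idf = math.log10(len(corpus) / df)         # rarer word -> larger IDF
    return tf * idf

corpus = [
    ["aman", "and", "avni", "are", "stressed"],
    ["aman", "went", "to", "a", "therapist"],
    ["avni", "went", "to", "download", "a", "health", "chatbot"],
]
# "therapist" appears in only one document, so it scores higher than
# the common word "to", which appears in two:
print(tfidf("therapist", corpus[1], corpus))
print(tfidf("to", corpus[1], corpus))
```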

Applications of TF-IDF

TF-IDF is commonly used in the Natural Language Processing domain. Some of its applications are:

| Application | Description |
|---|---|
| Document Classification | Helps in classifying the type and genre of a document. |
| Topic Modelling | Helps in predicting the topic of a corpus. |
| Information Retrieval Systems | Used to extract the important information out of a corpus. |
| Stop Word Filtering | Helps in removing unnecessary words from a text body. |

| Code NLP | No-Code NLP |
|---|---|
| NLTK package: the Natural Language Toolkit (NLTK) is a package readily available for text processing in Python. It contains functions and modules that can be used for Natural Language Processing. | Orange Data Mining: a machine learning tool for data analysis through Python and visual programming. We can perform operations on data through simple drag-and-drop steps. |
| spaCy: an open-source natural language processing (NLP) library designed for building NLP applications. It offers features such as tokenization, part-of-speech tagging, named entity recognition, dependency parsing, and more. | MonkeyLearn: a text analysis platform that offers NLP tools and machine learning models for text analysis, supporting tasks such as classification, sentiment analysis, and entity recognition. Users can create custom models or use pretrained ones for tasks like social media monitoring and customer feedback analysis. |