Project Based Learning

This page contains the NCERT Informatics Practices class 12 chapter 7 Project Based Learning from the book Informatics Practices. You can find the solutions for the chapter 7 of NCERT class 12 Informatics Practices, for the Project Related Details in this page. So is the case if you are looking for NCERT class 12 Informatics Practices related topic Project Based Learning project solutions.

Buy Class 12 Informatics Practices Books NOW!

PROJECTS

Description

Murugan plans to launch an online shopping platform— ‘APPAREL EASY’. He plans to have two broad categories of merchandise—Men, Women. Under both the categories— Clothing, Footwear and Accessories will be the sub-categories. Also, on his shopping platform, he is planning to launch two mega events— Festive Sale (a month before Diwali to Christmas), End of Season Sale (February and August). Murugan also wants to keep a record of his monthly revenue generation sales and category wise sales, with special focus on mega events. A record should also be kept on discounts being offered by the manufacturers, payment sites or any discount offered as a promotional campaign by the APPAREL EASY portal.

Specification

The details of the Men and Women apparels should be stored in a data file with fields as Apparel Code, Name, Category, Size, Price, Customer Name, Payment Mode, Discount Code, etc.

If the Category is Men, then apparels can be Men’s Trousers, Men’s Shirt, Men’s Jeans, Men’s T-shirt. If the Category is Women, then the apparels can be Skirts, Top, Pants, Jeans, Kurta, etc.

If the Payment Mode is Credit or Debit Card, then the Credit card number, name, CVV and validity should be entered.

If the Payment Mode is Cash on Delivery(COD), then no details to be asked.

Randomly select the merchandise to be put on sale. The selected merchandise should not be more than 70 per cent of the total merchandise.

The discount code can be either FEST (for Festive) or EOS (for End of Season). The discounts for FEST will be 10 per cent and for EOS will be 15 per cent.

You need to visualise the data structure, keeping all the requirements of Murugan in mind, and then implement it using Python Pandas. Thereafter, you need to design a software to store details of the merchandise to put them online for sale. At the same time, records of customers visiting the e-commerce site and the number of customers placing the order also have to be maintained. The data collected should be plotted appropriately to help Murugan make decisions for future marketing and promotion strategies.

1) What I planned to build (as a student)

Objectives

•

Store merchandise details in a data file (CSV) with fields like Apparel Code, Name, Category, Size, Price, Customer Name, Payment Mode, Discount Code etc.

•

Maintain records of:

•

customers visiting the site

•

customers placing orders

•

Compute:

•

monthly revenue generation

•

category-wise sales

•

special focus on mega events (Festive Sale, End of Season Sale)

•

Use Python Pandas to implement data handling and plot appropriately for decisions.

2) Data structure (CSV files I used)

I created 3 simple CSV files:

merchandise.csv (Master list of items)

Columns:

•

ApparelCode, Name, GenderCategory, SubCategory, Size, Price

visits.csv (Customer visits record)

Columns:

•

VisitID, VisitDate, CustomerName, GenderViewed, Device, DidOrder (Y/N)

orders.csv (Orders placed)

Columns (as per requirements + a few useful ones):

•

OrderID, OrderDate, Month, Event, ApparelCode, Name, GenderCategory, SubCategory, Size, Price, PaymentMode, DiscountCode, DiscountPercent, FinalAmount

Discount rules:

•

FEST = 10%

•

EOS = 15%

Payment Mode rule:

•

If Credit/Debit Card → card details to be entered

•

If COD → no details asked

(In my project file, I stored only a masked card like XXXX-XXXX-XXXX-1234 for safety.)

3) Sample data I prepared (presented as a student)

A) Sample merchandise.csv (10 rows shown)

MTS101

Men’s T-shirt

Men

Clothing

799

MSH201

Men’s Shirt

Men

Clothing

1,299

MJE301

Men’s Jeans

Men

Clothing

1,899

MTR401

Men’s Trousers

Men

Clothing

1,599

MFW501

Men’s Sneakers

Men

Footwear

2,499

MAC601

Men’s Wallet

Men

Accessories

999

WKT101

Women’s Kurta

Women

Clothing

1,499

WTP201

Women’s Top

Women

Clothing

899

WFW301

Women’s Heels

Women

Footwear

2,199

WAC401

Women’s Handbag

Women

Accessories

1,999

B) Random merchandise selection for sale (≤ 70%)

Suppose total merchandise in my file = 20 items.

Maximum allowed on sale = 70% of 20 = 14 items.

I randomly selected 14 items for the sale list (valid).

C) Sample visits.csv (8 rows shown)

2025-02-02

Riya

Women

Mobile

2025-02-02

Aman

Men

Laptop

2025-02-05

Neha

Women

Mobile

2025-08-10

Kabir

Men

Mobile

2025-08-11

Sara

Women

Laptop

2025-11-20

Rahul

Men

Mobile

2025-12-05

Ananya

Women

Mobile

2025-12-06

Mohit

Men

Laptop

From this, I can calculate:

•

Total visitors

•

Orders placed

•

Conversion rate = orders / visitors

D) Sample orders.csv (10 rows shown)

101

2025-02-02

Feb

EOS

WKT101

Women

Clothing

1,499

COD

EOS

1,274.15

102

2025-02-05

Feb

EOS

MFW501

Men

Footwear

2,499

Card

EOS

2,124.15

103

2025-08-10

Aug

EOS

WAC401

Women

Accessories

1,999

Card

EOS

1,699.15

104

2025-08-11

Aug

EOS

MSH201

Men

Clothing

1,299

COD

EOS

1,104.15

105

2025-11-20

Nov

FEST

MJE301

Men

Clothing

1,899

Card

FEST

1,709.10

106

2025-11-21

Nov

FEST

WTP201

Women

Clothing

899

COD

FEST

809.10

107

2025-12-05

Dec

FEST

WFW301

Women

Footwear

2,199

Card

FEST

1,979.10

108

2025-12-06

Dec

FEST

MAC601

Men

Accessories

999

COD

FEST

899.10

109

2025-03-12

Mar

None

MTS101

Men

Clothing

799

COD

None

799.00

110

2025-04-18

Apr

None

WKT101

Women

Clothing

1,499

Card

None

1,499.00

4) What analysis I performed using Pandas

A) Monthly revenue generation

I grouped orders by Month and summed FinalAmount.

Sample output (from my sample orders above):

Feb

3,398.30

Aug

2,803.30

Nov

2,518.20

Dec

2,878.20

Mar

799.00

Apr

1,499.00

B) Category-wise sales (Men/Women + Clothing/Footwear/Accessories)

I grouped by:

•

GenderCategory

•

SubCategory

and summed FinalAmount.

Men

Clothing

3,312.25

Men

Footwear

2,124.15

Men

Accessories

899.10

Women

Clothing

3,582.25

Women

Footwear

1,979.10

Women

Accessories

1,699.15

C) Mega event focus (Festive Sale vs End of Season Sale)

Grouped by Event and summed FinalAmount.

EOS

6,201.60

FEST

5,396.40

None

2,298.00

D) Discounts record (FEST/EOS/None)

Grouped by DiscountCode.

Sample output:

EOS

(1499×0.15 + 2499×0.15 + 1999×0.15 + 1299×0.15) = 1,050.00

FEST

(1899×0.10 + 899×0.10 + 2199×0.10 + 999×0.10) = 599.60

None

(This helps Murugan track which sale gives more discount and which gives more revenue.)

5) Python (Pandas) code

import pandas as pd
import matplotlib.pyplot as plt

mer = pd.read_csv("merchandise.csv")
vis = pd.read_csv("visits.csv")
ord = pd.read_csv("orders.csv")

# 1) Monthly revenue generation
monthly_rev = ord.groupby("Month")["FinalAmount"].sum().sort_index()
print(monthly_rev)

# 2) Category wise sales
cat_sales = ord.groupby(["GenderCategory","SubCategory"])["FinalAmount"].sum()
print(cat_sales)

# 3) Event-wise revenue (mega events focus)
event_rev = ord.groupby("Event")["FinalAmount"].sum()
print(event_rev)

# 4) Conversion rate (customers visiting vs placing order)
total_visits = len(vis)
total_orders = (vis["DidOrder"] == "Y").sum()
conversion_rate = (total_orders / total_visits) * 100
print("Conversion Rate:", round(conversion_rate, 2), "%")

# Plots (as asked: “plotted appropriately”)
monthly_rev.plot(kind="bar")
plt.title("Monthly Revenue Generation")
plt.ylabel("Revenue (₹)")
plt.show()

event_rev.plot(kind="bar")
plt.title("Revenue During Mega Events")
plt.ylabel("Revenue (₹)")
plt.show()

6) Conclusion

From my analysis, I can clearly see:

•

Which month earns the highest revenue (monthly revenue generation).

•

Which category (Men/Women) and sub-category (Clothing/Footwear/Accessories) sells more.

•

How much revenue comes during mega events like Festive Sale and End of Season Sale.

•

How discounts (FEST 10%, EOS 15%) affect revenue and sales.

Bar Chart depicting the APPAREL EASY: Monthly Revenue Generation (Sample Data)

Bar Chart depicting the APPAREL EASY: Revenue by Mega Events (Sample Data)

Bar Chart depicting the APPAREL EASY: Category-wise Sales (Gender + Sub-category)

Bar Chart depicting the APPAREL EASY: Customers Visiting vs Customers Placing Orders

Description

Realising the importance of Reduce, Reuse and Recycle, the Bookworm club every year organises a Book Donation Camp. The Book Donation camp collects books and notebooks. The volunteers assess the condition of the books and categorise them as Fit, Needs mending, or Unfit. The unfit books’ pages are used to create paper bags and envelopes. The other categories of books are resold at half the price. They accept notebooks that have pages left in them. The pages are torn from the notebooks, and are attractively bound to create a new notebook and sold. They create a variety of recycled objects and sell them. They want to create a software for this purpose and store details about the camp. To be able to efficiently store, retrieve and visualise data, they need to implement the following using Pandas.

Specification

The details of collections are stored in a CSV file with column headings as Item category, Item ID, Item name, Item type, Condition.

If the Item Category is Book, then the Item Type can be either Academic or Non Academic, and Item Id shall be prefixed with a ‘B’. In case of Academic, class shall be entered.

If the Item Category is Notebook then the Item can be Single line, Four Line, Five Line, and Item Id shall be prefixed with an ‘N’.

Condition can only be Fit, Needs Mending or Unfit. After the items are refurbished, the data are stored in another CSV file containing the following column headings: Item id, Item name, Item Category, Quantity, Price. Item Category can be Paper bags, Notebook, Books. In case of books Class is also to be entered.

Another CSV file to store orders is created that stores Item Category, Item name, Quantity and Price. In case of an order, the refurbished CSV shall update the quantity.

To ensure effective decision making, it is required that different data are plotted using appropriate plots to show sales, items refurbished, and items collected.

1) Project idea and purpose

I am a volunteer in the Bookworm club. Every year we organise a Book Donation Camp to support Reduce, Reuse and Recycle. In this camp, we collect books and notebooks. Then volunteers check the condition of each book and categorise it as Fit, Needs mending, or Unfit. The unfit books’ pages are used to create paper bags and envelopes. The other books (Fit / Needs mending) are resold at half the price. For notebooks that still have usable pages, we tear the pages and attractively bind them to make a new notebook and sell it.

To store, retrieve, and visualise this data efficiently, we have to implement the work using Python Pandas.

2) What the book asks us to create (as a student)

According to the project specification, we need 3 CSV files:

A) Collections CSV (raw collection data)

Column headings must be: Item category, Item ID, Item name, Item type, Condition.

Rules:

•

If Item Category = Book:

•

Item Type = Academic or Non Academic

•

Item ID must start with “B”

•

If Academic, class shall be entered

•

If Item Category = Notebook:

•

Item type can be Single line / Four Line / Five Line

•

Item ID must start with “N”

•

Condition can only be Fit / Needs Mending / Unfit

B) Refurbished CSV (after refurbishing)

After refurbishment, store data in another CSV with headings:

Item id, Item name, Item Category, Quantity, Price.

•

Item Category here can be Paper bags, Notebook, Books

•

For books, Class is also to be entered

C) Orders CSV (orders placed by customers)

Another CSV for orders must store:

Item Category, Item name, Quantity, Price.

And when an order happens, the refurbished CSV shall update the quantity.

Plots/Diagrams (important!)

We must plot data using appropriate plots to show:

•

sales

•

items refurbished

•

items collected

3) My sample data (as a student)

File 1: collections.csv (sample)

Book

B001

Mathematics (Class 10)

Academic

Fit

Book

B002

English Literature

Non Academic

Needs mending

Book

B003

Science (Class 9)

Academic

Unfit

Notebook

N001

Classwork Notebook

Single line

Fit

Notebook

N002

Rough Notebook

Four Line

Fit

Notebook

N003

Project Notebook

Five Line

Needs mending

Book

B004

Social Science (Class 8)

Academic

Fit

Book

B005

Short Stories

Non Academic

Unfit

Note (student assumption): The “class shall be entered” for academic books, so I wrote it in the Item name like “(Class 10)”. (In a real project, we can also add a separate Class column.)

File 2: refurbished.csv (sample)

(After refurbishment: books for resale, rebound notebooks, and paper bags/envelopes from unfit pages)

B001

Mathematics (Class 10)

Books

120

B002

English Literature

Books

B004

Social Science (Class 8)

Books

100

N001

Rebound Single line Notebook

Notebook

N002

Rebound Four Line Notebook

Notebook

PB01

Paper Bags (from unfit pages)

Paper bags

ENV1

Envelopes (from unfit pages)

Paper bags

Note (student assumption): Prices are just sample values. For books, I assumed “half price” selling as mentioned in the description.

File 3: orders.csv (sample)

Books

Mathematics (Class 10)

120

Notebook

Rebound Single line Notebook

Paper bags

Paper Bags (from unfit pages)

Books

Social Science (Class 8)

100

Notebook

Rebound Four Line Notebook

Paper bags

Envelopes (from unfit pages)

4) Outputs I generated from the sample data

A) Items collected

Collected (count) by item category

•

Book = 5

•

Notebook = 3

Collected (count) by condition

•

Fit = 4

•

Needs mending = 2

•

Unfit = 2

B) Items refurbished (stock available)

Refurbished stock (total quantity) by category

•

Paper bags = 50

•

Notebook = 11

•

Books = 9

C) Sales summary (Revenue = Quantity × Price)

Sales (total revenue) by category

•

Books = ₹ 320

•

Notebook = ₹ 125

•

Paper bags = ₹ 55

5) Output diagrams (as required in Project 2)

As per the requirement, I plotted items collected, items refurbished, and sales using bar charts.

The diagrams I generated are:

Items Collected (Count) by Item Category

Items Refurbished (Total Quantity) by Category

Sales (Total Revenue) by Category

(Extra helpful plot) Collected Items by Condition (Fit / Needs mending / Unfit)

6) Python (Pandas) code I used

import pandas as pd
import matplotlib.pyplot as plt

# ---------- Step 1: Read CSV files ----------
collections = pd.read_csv("collections.csv")
refurbished = pd.read_csv("refurbished.csv")
orders = pd.read_csv("orders.csv")

# ---------- Step 2: Basic validation (as per specification) ----------
allowed_conditions = {"Fit", "Needs mending", "Needs Mending", "Unfit"}

# Condition validation
bad_conditions = collections[~collections["Condition"].isin(allowed_conditions)]
if not bad_conditions.empty:
    print("Invalid conditions found:\n", bad_conditions)

# ID prefix validation
bad_book_ids = collections[(collections["Item category"] == "Book") & (~collections["Item ID"].astype(str).str.startswith("B"))]
bad_notebook_ids = collections[(collections["Item category"] == "Notebook") & (~collections["Item ID"].astype(str).str.startswith("N"))]
if not bad_book_ids.empty:
    print("Invalid Book IDs:\n", bad_book_ids)
if not bad_notebook_ids.empty:
    print("Invalid Notebook IDs:\n", bad_notebook_ids)

# ---------- Step 3: Items collected (counts) ----------
collected_by_category = collections.groupby("Item category")["Item ID"].count()
collected_by_condition = collections.groupby("Condition")["Item ID"].count()

print("Collected by category:\n", collected_by_category)
print("Collected by condition:\n", collected_by_condition)

# ---------- Step 4: Items refurbished (quantity) ----------
refurbished_qty = refurbished.groupby("Item Category")["Quantity"].sum()
print("Refurbished quantity by category:\n", refurbished_qty)

# ---------- Step 5: Sales (revenue) ----------
orders["Revenue"] = orders["Quantity"] * orders["Price"]
sales_by_category = orders.groupby("Item Category")["Revenue"].sum()
print("Sales (revenue) by category:\n", sales_by_category)

# ---------- Step 6: Update refurbished stock after orders ----------
# We update by matching (Item Category + Item name)
# (In real software, Item id would be best for matching.)
stock = refurbished.copy()
stock_key = stock["Item Category"].astype(str) + "||" + stock["Item name"].astype(str)
order_key = orders["Item Category"].astype(str) + "||" + orders["Item name"].astype(str)

stock["key"] = stock_key
orders["key"] = order_key

ordered_qty = orders.groupby("key")["Quantity"].sum().reset_index()

# Merge ordered qty into stock and subtract
stock = stock.merge(ordered_qty, on="key", how="left", suffixes=("", "_ordered"))
stock["Quantity_ordered"] = stock["Quantity_ordered"].fillna(0).astype(int)
stock["Quantity"] = stock["Quantity"] - stock["Quantity_ordered"]

# Ensure quantity does not go negative
stock.loc[stock["Quantity"] < 0, "Quantity"] = 0

# Save updated stock
stock.drop(columns=["key", "Quantity_ordered"]).to_csv("refurbished_updated.csv", index=False)

# ---------- Step 7: Plots (as required) ----------
plt.figure()
collected_by_category.plot(kind="bar")
plt.title("Items Collected (Count) by Item Category")
plt.ylabel("Count")
plt.tight_layout()
plt.show()

plt.figure()
refurbished_qty.plot(kind="bar")
plt.title("Items Refurbished (Total Quantity) by Category")
plt.ylabel("Quantity")
plt.tight_layout()
plt.show()

plt.figure()
sales_by_category.plot(kind="bar")
plt.title("Sales (Total Revenue) by Category")
plt.ylabel("Revenue (₹)")
plt.tight_layout()
plt.show()

# Optional: by condition
plt.figure()
collected_by_condition.plot(kind="bar")
plt.title("Collected Items by Condition")
plt.ylabel("Count")
plt.tight_layout()
plt.show()

7) Conclusion

This project helps the Bookworm club run the Book Donation Camp in a systematic way. We can store collection data, refurbish items, take orders, and automatically update quantity after each order. Most importantly, the plots for sales, items refurbished, and items collected help in effective decision making for the next camp.

Bar Chart depicting the Books Donation Camp: Items Collected (Count) by Item Category

Bar Chart depicting the Books Donation Camp: Items Refurbished (Total Quantity) by Category

Bar Chart depicting the Books Donation Camp: Sales (Total Revenue) by Category

Bar Chart depicting the Books Donation Camp: Collected Items by Condition

Description

With the Internet revolution everyone today is now connected. Teenagers spend a good amount of time on social networking sites, and it plays a vital role in their behaviour. It is considered that excessive use of social networking sites has sometimes a serious impact on the mental health of individuals. A well-crafted survey questionnaire can help in exploring and finding many facts.

Specifications

Create a survey questionnaire using any of the freely available online tools (such as google forms) and store the responses in a CSV file.

Prepare some data analysis questions that you expect them to answer

Import the CSV file in Pandas DataFrame

Perform statistical computation such as mean, median, etc., with respect to the identified questions

Visualise the findings of the survey using appropriate charts.

1) Aim of the project

Because of the Internet revolution, most of us are connected all the time. Teenagers spend a good amount of time on social networking sites, and it can affect our behaviour. It is also considered that excessive use can sometimes have an impact on mental health. So, a well-crafted survey questionnaire can help us explore facts and patterns.

As per the project description, we must:

Create a survey questionnaire using an online tool (like Google Forms) and store responses in a CSV

Prepare data analysis questions

Import CSV into Pandas DataFrame

Do statistical computation like mean, median

Visualise findings using appropriate charts

2) Google Survey (Google Forms) details

Form Title

“Survey: Effect of Social Networking Sites on Behaviour of Teenagers”

Form Description (what I wrote in the form)

“This is a short survey to understand how social networking sites affect teenagers’ behaviour, study routine, and mental health. The survey is anonymous and only for academic project work.”

Settings I selected (recommended)

✅

Collect responses (default)

✅

Limit to 1 response (optional, if school emails exist)

✅

Edit after submit = OFF (to keep data consistent)

✅

Show progress bar = ON (optional)

✅

Required questions = ON for key questions

Sections (simple structure)

Section A:

Basic Information

Section B:

Social Media Usage

Section C:

Behaviour / Study / Well-being

Section D:

Optional Comments

Questions included (with types + options)

Section A: Basic Information

1. Age (Short answer, response validation = number 12–19)

2. Class (Dropdown: 8, 9, 10, 11, 12)

Section B: Social Networking Sites Usage

3. How much time do you spend daily on social networking sites? (Multiple choice)

•

0–1 hour

•

1–2 hours

•

2–3 hours

•

3–4 hours

•

4–5 hours

•

5+ hours

4. Which social networking site do you use the most? (Multiple choice)

•

WhatsApp / Instagram / YouTube / Snapchat / Facebook / X (Twitter) / Other

5. Main purpose of using social networking sites (Multiple choice)

•

Chatting, Entertainment, Learning, Study groups, News, Socialising, Other

Section C: Behaviour / Study / Mental health

6. Do you use social networking sites just before sleeping? (Yes/No)

7. How many hours do you sleep per night (average)? (Short answer, number 3–10)

8. Mood after using social networking sites (Linear scale 1–5)

•

1 = Very bad, 5 = Very good

9. Concentration while studying (Linear scale 1–5)

•

1 = Very low, 5 = Very high

10. How often do you feel anxiety/stress due to social networking? (Multiple choice)

•

Never, Rarely, Sometimes, Often

11. Physical activity per day (minutes) (Short answer, number 0–180)

Section D: Optional comments

12. One positive and one negative effect you have noticed (Paragraph)

How I got the CSV file from Google Forms

•

Open the form → Responses tab

•

Click Download responses (.csv) (or open in Sheets → File → Download → CSV)

3) Sample responses (data I used)

I created a sample dataset of 30 students (like a real class survey) and used it for Pandas analysis + charts (you can see the “Project III – Sample Survey Responses” table shown above in chat).

Main columns in my CSV/DataFrame:

•

Age, Class

•

DailyHours (numeric hours/day)

•

MainPlatform, Purpose

•

SleepHours

•

MoodScore (1–5), ConcentrationScore (1–5)

•

AnxietyFrequency (Never/Rarely/Sometimes/Often)

•

ScreenBeforeBed (Yes/No)

•

PhysicalActivityMin

4) Data analysis questions (what I expected to answer)

These are the “analysis questions” based on the survey (as required):

What is the average (mean) and median daily time spent on social networking sites?

Which platform is used the most?

Do students who spend more time tend to sleep less?

Compare mood and concentration between:

•

High usage group (≥ 3 hours/day)

•

Low usage group (< 3 hours/day)

How is anxiety/stress frequency distributed?

5) My results (from the sample data)

A) Mean / Median (statistical computation)

•

Mean DailyHours = 2.75 hours/day

•

Median DailyHours = 2.7 hours/day

•

Mean SleepHours = 6.98 hours/night

•

Median SleepHours = 7.0 hours/night

•

Mean MoodScore = 3.20 / 5

•

Mean ConcentrationScore = 3.03 / 5

B) Platform most used (counts)

•

Instagram: 10

•

WhatsApp: 9

•

YouTube: 8

•

Snapchat: 3

C) High vs Low usage comparison (mean values)

High (≥3h/day)

4.01

6.12

2.31

2.23

Low (<3h/day)

1.79

7.64

3.88

3.65

Observation (student conclusion): In this sample, the high-usage group sleeps less and has lower mood + concentration on average.

D) Anxiety frequency (counts)

•

Never: 7

•

Rarely: 7

•

Sometimes: 8

•

Often: 8

6) Charts/Graphs I generated (as required)

I generated and displayed these charts (using Pandas + Matplotlib):

Daily hours distribution (bar chart)

Most used platform (bar chart)

Daily hours vs sleep hours (scatter plot)

Mood score by usage group (bar chart of averages)

Anxiety frequency distribution (bar chart)

7) Python program used for analysis + graphs

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# -----------------------------
# Step 1: Read CSV into DataFrame
# -----------------------------
# Example filename exported from Google Forms:
df = pd.read_csv("social_media_survey_responses.csv")

# (Optional) If DailyHours is stored as ranges like "2–3 hours",
# convert it into numeric hours (approx midpoint) before analysis.

# -----------------------------
# Step 2: Statistical computations (mean, median, etc.)
# -----------------------------
print("Mean DailyHours:", df["DailyHours"].mean())
print("Median DailyHours:", df["DailyHours"].median())
print("Mean SleepHours:", df["SleepHours"].mean())
print("Median SleepHours:", df["SleepHours"].median())

# -----------------------------
# Step 3: Prepare analysis groups
# -----------------------------
df["UsageGroup"] = np.where(df["DailyHours"] >= 3, "High (>=3h/day)", "Low (<3h/day)")
group_stats = df.groupby("UsageGroup")[["DailyHours","SleepHours","MoodScore","ConcentrationScore"]].mean()
print("\nHigh vs Low usage comparison:\n", group_stats)

# -----------------------------
# Step 4: Visualise findings using charts
# -----------------------------

# Chart 1: Daily hours distribution (bins)
bins = [0, 1, 2, 3, 4, 5, 6]
labels = ["0-1", "1-2", "2-3", "3-4", "4-5", "5-6"]
df["HoursBin"] = pd.cut(df["DailyHours"], bins=bins, labels=labels, include_lowest=True, right=False)
hours_dist = df["HoursBin"].value_counts().reindex(labels).fillna(0)

plt.figure()
hours_dist.plot(kind="bar")
plt.title("Daily Time on Social Networking Sites (Hours/Day)")
plt.xlabel("Hours (binned)")
plt.ylabel("Number of students")
plt.tight_layout()
plt.show()

# Chart 2: Most used platform
plt.figure()
df["MainPlatform"].value_counts().plot(kind="bar")
plt.title("Most Used Social Networking Platform")
plt.xlabel("Platform")
plt.ylabel("Number of students")
plt.tight_layout()
plt.show()

# Chart 3: Daily hours vs sleep hours (scatter)
plt.figure()
plt.scatter(df["DailyHours"], df["SleepHours"])
plt.title("Daily Social Media Hours vs Sleep Hours")
plt.xlabel("Daily social media time (hours)")
plt.ylabel("Sleep per night (hours)")
plt.tight_layout()
plt.show()

# Chart 4: Mood by usage group
plt.figure()
df.groupby("UsageGroup")["MoodScore"].mean().plot(kind="bar")
plt.title("Average Mood Score by Usage Group")
plt.xlabel("Usage group")
plt.ylabel("Average Mood Score (1-5)")
plt.tight_layout()
plt.show()

# Chart 5: Anxiety frequency distribution
order = ["Never","Rarely","Sometimes","Often"]
plt.figure()
df["AnxietyFrequency"].value_counts().reindex(order).fillna(0).plot(kind="bar")
plt.title("Anxiety Frequency (Self-reported)")
plt.xlabel("Frequency")
plt.ylabel("Number of students")
plt.tight_layout()
plt.show()

The following is the content of the csv file (social_media_survey_responses.csv) used in the above code:

2.0

Chatting

7.5

Rarely

3.5

Instagram

Entertainment

6.5

Sometimes

Yes

1.0

YouTube

Learning

8.0

Never

4.5

Instagram

Socialising

6.0

Often

Yes

2.5

Study groups

7.0

Rarely

Yes

3.0

YouTube

Entertainment

6.5

Sometimes

Yes

5.0

Snapchat

Socialising

5.5

Often

Yes

1.5

Chatting

8.0

Never

2.0

Instagram

Entertainment

7.0

Sometimes

Yes

0.8

Chatting

8.5

Never

4.0

Instagram

Entertainment

6.0

Often

Yes

3.2

YouTube

Learning

6.8

Sometimes

Yes

1.2

YouTube

Learning

8.2

Never

2.8

Instagram

Socialising

7.0

Sometimes

Yes

3.8

Snapchat

Socialising

6.2

Often

Yes

2.2

Study groups

7.2

Rarely

1.7

YouTube

Learning

7.8

Rarely

4.8

Instagram

Entertainment

5.8

Often

Yes

2.1

Chatting

7.6

Rarely

3.6

Instagram

Socialising

6.1

Sometimes

Yes

1.0

YouTube

Learning

8.0

Never

2.9

Chatting

7.0

Sometimes

Yes

0.5

Chatting

8.7

Never

5.5

Instagram

Entertainment

5.2

Often

Yes

3.1

Snapchat

Socialising

6.6

Sometimes

Yes

2.3

YouTube

Entertainment

7.2

Rarely

Yes

1.4

Study groups

8.1

Never

4.2

Instagram

Socialising

6.0

Often

Yes

2.6

YouTube

Learning

7.1

Rarely

3.9

Instagram

Entertainment

6.3

Often

Yes

Bar Chart depicting the Survey: Daily Time on Social Networking Sites (Hours/Day)

Bar Chart depicting the Survey: Most used Social Networking Platform

Survey: Daily Social Media Hours vs Sleep Hours

Bar Chart depicting the Survey: Average Mood Score by Usage Group

Bar Chart depicting the Survey: Anxiety Frequency (Self-reported)

Description

Open Government Data (OGD) Platform India www.data.gov.in is a platform for supporting open data initiative of Government of India. From this platform, let us consider the dataset “Special Tabulation on Adolescent and youth population classified by various parameters for India, States and Union Territories, 2011”. The dataset was contributed by the Ministry of Home Affairs, Government of India, and released under National Data Sharing and Accessibility Policy (NDSAP). The dataset was published on portal on 07/09/2015.

Statistics of the Data Set:

Number of rows: 12168

Number of columns: 123

Descriptions of some of the columns are given below:

State: Serial numbers given to states

Area Name: Name of the states and union territories

Total/Rural/Urban: Data about the total, rural or urban areas of a state or UT.

Adolescent and youth: Data for different age groups

Total Male: Total number of males

Total Female: Total number of females

SC-M: Total number of males of Scheduled Castes(SC)

SC-F: Total number of females of Scheduled Castes(SC)

ST-M: total number of males of Scheduled Tribes(ST)

ST-F: total number of females of Scheduled Tribes(ST)

Literates-M: total number of literate males

Literates-F: total number of literate females

LiteratesSC-M: total number of literate males of Scheduled Castes(SC)

LiteratesSC-F: total number of literate females of Scheduled Castes(SC)

LiteratesST-M: total number of literate males of Scheduled Tribes(ST)

LiteratesST-F: total number of literate females of Scheduled Tribes(ST)

Illiterates-M: total number of illiterate males

Illiterates-F: total number of illiterate females

IlliteratesSC-M: total number of illiterate males of Scheduled Castes(SC)

IlliteratesSC-F: total number of illiterate females of Scheduled Castes(SC)

IlliteratesST-M: total number of illiterate males of Scheduled Tribes(ST)

IlliteratesST-F: total number of illiterate females of Scheduled Tribes(ST)

MainWorker-M:total number of main worker males

MainWorker-F:total number of main worker females

MainWorkerSC-M: total number of main worker males of Scheduled Castes(SC)

MainWorkerSC-F: total number of main worker females of Scheduled Castes(SC)

MainWorkerST-M: total number of main worker males of Scheduled Tribes(ST)

MainWorkerST-F: total number of main worker females of Scheduled Tribes(ST)

MarginalWorker-M: total number of marginal worker males

MarginalWorker-F: total number of marginal worker females

MarginalWorkerSC-M: total number of marginal worker males of Scheduled Castes(SC)

MarginalWorkerSC-F: total number of marginal worker females of Scheduled Castes(SC)

MarginalWorkerST-M: total number of marginal worker males of Scheduled Tribes(ST)

MarginalWorkerST-F: total number of marginal worker females of Scheduled Tribes(ST)

Specifications

On such a large dataset, various types of questions can be answered by doing different analysis of data. Following is a list of some of the possible queries that can be answered by analysing the dataset:

What is the total population, total male population and total female population aged 10 to 24 in India?

Which State or Union Territory in India has the maximum number of illiterates in the youth ages?

What is the percentage of people working as a marginal worker?

List the top 5 states or union territories which have the maximum population working as a marginal worker.

Compare the sex ratio of urban areas and rural areas using appropriate graph.

Which state has the highest and the lowest percentage of literate Scheduled Tribes and Scheduled Castes?

For each state, compare the no. of female marginal workers with no. of male marginal workers. Use appropriate graphs.

What percentage of Scheduled Tribes lives in urban areas? Draw a pie chart showing the proportion of literate and illiterates scheduled tribes living in urban areas.

What is the state wise ratio of literates vs. illiterates in all age groups?

10.

Which state is home to the maximum no. of ST in India? Which state has the minimum no. of ST in India?

11.

For each state, find the no. of literate females and no. of literate males. Draw a bar graph for the same.
Which state has the highest ratio of literate female vs literate male and which state has the minimum?

A project work can be carried out by taking any 4–5 of the above questions and any other similar questions, and solving them step-by-step, with detailed explanation and documentation. As an example, in the following pages, we will solve the first question. This will give us an idea about how the other questions are to be answered.

Task 1: What is the total population, total male population and total female population aged 10 to 24 in India?

Solution:

Prerequisite: we need to first download the CSV file through the QR code given at the beginning of this chapter.

Step 1:

Read the CSV file in a DataFrame

Step 2:

Check the shape of the DataFrame

Step 3:

View the columns

Step 4:

Filter data

Identify the columns that you wants to use for plotting

Identify the number of rows required for plotting

Step 5:

Create a new DataFrame containing the filtered data

Step 6:

Rename the columns for ease of use

Step 7:

Group data as per the requirement

Step 8:

Plot data as a barchart for the DataFrame obtained in Step 7.

Task 1: What is the total population, total male population and total female population aged 10 to 24 in India?

Step 0: Import required libraries.

import pandas as pd
import matplotlib.pyplot as plt

Step 1: Read the CSV file in a DataFrame

Read the csv file in a DataFrame.

# add correct path on your computer
data = pd.read_csv("PCA_AY_2011_Revised.csv")
df = pd.DataFrame(data)

Step 2: Check the shape of the DataFrame

Check the shape of the DataFrame.

print(df.shape)

Output:

12168 rows and 123 columns.

Step 3: Display the columns.

View the coloumns

print(df.columns.values)

Output:

['Table No.'
'State Code'
'District Code'
'Area Name'
'Total/ Rural/ Urban'
'Adolescent and youth categories'
'Total Population - Persons'
'Total Population - Males'
'Total Population - Females'
'Scheduled Caste - Persons'
'Scheduled Caste - Males'
........................
........................
........................
'Scheduled Tribe Marginal Worker - Household Industry - Males'
'Scheduled Tribe Marginal Worker - Household Industry - Females'
'Scheduled Tribe Marginal Worker - Other Workers - Persons'
'Scheduled Tribe Marginal Worker - Other Workers - Males'
'Scheduled Tribe Marginal Worker - Other Workers - Females']

Note: Output contains 123 column names. Only few are shown here.

Step 4: Filter data

4 (a) Identify columns required for plotting: We consider only these columns:

•

Area Name

•

Total/ Rural/ Urban

•

Adolescent and youth categories

•

Total Population - Persons (and related male/female columns are used later)

4 (b) Rows Choice: We check values in Area Name to decide filtering (example output shows repeated INDIA and then district names).

Code:

print(df['Area Name'])

Output:

0            INDIA
1            INDIA
2            INDIA
3            INDIA
4            INDIA
...................
...................
...................
12163       District - South Andaman (03)
12164       District - South Andaman (03)
12165       District - South Andaman (03)
12166       District - South Andaman (03)
12167       District - South Andaman (03)
Name: Area Name, Length: 12168, dtype: object

Step 5: Create a new DataFrame containing the filtered data.

Suppose, we want to consider data for ‘Area Name’ = ‘INDIA’ only, Therefore, we shall create a new DataFrame df1 containing only the filtered data, using the following syntax:

df.loc[row selection, column selection]

df1 = df.loc[(df["Area Name"] == "INDIA"),
    "Area Name": "Total Population - Females"]

Step 6: Rename the columns for ease of use

Column names are too long, so rename them.

Code:

df1.columns = ["Area",
    "Class",
    "Category",
    "TotalPop",
    "MalePop",
    "FemalePop"]

Step 7: Group data as per the requirement

We want TotalPop, MalePop, FemalePop with respect to Category.

But Category has 6 types (like 10-14, 15-19, 20-24, etc.), so we group by Category and sum. Then we keep only 10-14, 15-19, 20-24 and remove the categories ‘Adolescent (10-19)’, ‘All Ages’, ‘Youth (15-24)’

Code:

d = df1.groupby("Category").sum(numeric_only=True)

# keep only the age groups needed for 10–24
d = d.drop(["Adolescent (10-19)", "All Ages", "Youth (15-24)"], axis=0)

The following data is obtained after Step 7 and this is the data used for graph.

The following is the final grouped data for the required age categories:

| Category |    TotalPop |     MalePop |   FemalePop |
| -------- | ----------- | ----------- | ----------- |
| 10-14    | 26,54,18,424 | 13,88,37,670 | 12,65,80,754 |
| 15-19    | 24,10,52,898 | 12,79,64,792 | 11,30,88,106 |
| 20-24    | 22,28,48,444 | 11,51,69,386 | 10,76,79,058 |

Step 8: Plot data as a barchart

Plot the grouped DataFrame from Step 7 as a bar chart.

Code:

d.plot(kind="bar")
plt.show()

Python program used (complete, clean version)

All the above steps are compiled into a single pythong program below:

import pandas as pd
import matplotlib.pyplot as plt

# Step 1
data = pd.read_csv("PCA_AY_2011_Revised.csv")   # add correct path on your computer
df = pd.DataFrame(data)

# Step 2
print(df.shape)

# Step 3
print(df.columns.values)

# Step 4 (understanding data)
print(df["Area Name"])

# Step 5 (filter India rows + required columns)
df1 = df.loc[(df["Area Name"] == "INDIA"), "Area Name":"Total Population - Females"]

# Step 6 (rename)
df1.columns = ["Area", "Class", "Category", "TotalPop", "MalePop", "FemalePop"]

# Step 7 (group and keep only 10–24 categories)
d = df1.groupby("Category").sum(numeric_only=True)
d = d.drop(["Adolescent (10-19)", "All Ages", "Youth (15-24)"], axis=0)

print(d)

# Step 8 (plot)
d.plot(kind="bar")
plt.xlabel("Adolescent and youth categories")
plt.ylabel("Population (in crores)")
plt.title("Population (10–24 years) in India: Total, Male, Female")
plt.tight_layout()
plt.show()

Bar Chart depicting Population (10–24 years) in India: Total, Male, Female

Task 2: Which State or Union Territory in India has the maximum number of illiterates in the youth ages?

Step 0: Import required libraries

Code:

import pandas as pd

Step 1: Read the data file into a DataFrame

Code:

# add correct path on your computer
df = pd.read_csv("PCA_AY_2011_Revised.csv")

Step 2: Filter only the rows needed for this question

What we need (logic):

•

Only State/UT totals → District Code == 0

•

Only Total (not rural/urban split) → Total/ Rural/ Urban == “Total”

•

Only Youth (15-24) → Adolescent and youth categories == “Youth (15-24)”

•

Exclude overall INDIA total → Area Name != “INDIA”

Code:

df1 = df.loc[
    (df["District Code"] == 0) &
    (df["Area Name"] != "INDIA"),
    (df["Total/ Rural/ Urban"] == "Total") &
    (df["Adolescent and youth categories"] == "Youth (15-24)")
].copy()

Step 3: Compute “Illiterates” for Youth (15–24)

Illiterates (Persons) = Total Population – Persons − Literates – Total – Persons

Code:

df1["Illiterates - Youth - Persons"] = (
    df1["Total Population - Persons"] - df1["Literates - Total - Persons"]
)

Step 4: Find the State/UT with the maximum illiterates

Code:

state_ill = df1.groupby("Area Name")["Illiterates - Youth - Persons"].sum().sort_values(ascending=False)
max_state = state_ill.index[0]
max_value = int(state_ill.iloc[0])

max_state, max_value

Final Answer (from your file)

✅

State/UT with maximum illiterates (Youth 15–24): State – UTTAR PRADESH (09)

✅

Number of illiterates (persons): 7,484,152

Task 3: What is the percentage of people working as a marginal worker?

Step 0: Import required libraries:

Code:

import pandas as pd

Step 1: Step 1: Read the data file in a DataFrame

Code:

df = pd.read_csv("PCA_AY_2011_Revised.csv")

Step 2: Check the shape of the DataFrame

Code:

print(df.shape)

Output:

12168 rows × 123 columns

Step 3: View the columns

View the columns to identify what we need.

Step result (analysis)

For marginal worker, we will use these columns (they exist in the dataset and match the “MarginalWorker” concept listed in the chapter).

•

Area Name

•

Total/ Rural/ Urban

•

Adolescent and youth categories

•

Main Worker – Total – Persons

•

Marginal Worker – Total – Persons

Step 4: Filter data (choose correct rows)

Step explanation

Filter the data by selecting required columns and required rows.

Step result (analysis + code)

Because the question asks for “people” (overall), we take INDIA, Total, All Ages.

df_india = df[
    (df["Area Name"] == "INDIA") &
    (df["Total/ Rural/ Urban"] == "Total") &
    (df["Adolescent and youth categories"] == "All Ages")
][["Main Worker - Total - Persons", "Marginal Worker - Total - Persons"]].copy()

Step 5–7: Compute the required percentage (final calculation)

Step explanation

After filtering, compute the required result from the selected data. (In Task 1 the book groups/sums; here we just compute the percentage from the INDIA totals.)

Step result (calculation + answer)

Data extracted (INDIA, Total, All Ages):

•

Main workers (Persons) = 36,25,65,571

•

Marginal workers (Persons) = 11,93,23,297

Now, “percentage of people working as a marginal worker” (most meaningful in this context) =

\dfrac{\text{Marginal Workers}}{\text{Main Workers} + \text{Marginal Workers}} × 100

main_workers = int(df_india.iloc[0]["Main Worker - Total - Persons"])
marg_workers = int(df_india.iloc[0]["Marginal Worker - Total - Persons"])
total_workers = main_workers + marg_workers

marginal_worker_percent = (marg_workers / total_workers) * 100
print(round(marginal_worker_percent, 2))

✅ Final Answer (INDIA):

Percentage of people (workers) working as marginal workers = 24.76%

import pandas as pd

df = pd.read_csv("PCA_AY_2011_Revised.csv")

# Filter INDIA, Total, All Ages
df_india = df[
    (df["Area Name"] == "INDIA") &
    (df["Total/ Rural/ Urban"] == "Total") &
    (df["Adolescent and youth categories"] == "All Ages")
][["Main Worker - Total - Persons", "Marginal Worker - Total - Persons"]].copy()

main_workers = int(df_india.iloc[0]["Main Worker - Total - Persons"])
marg_workers = int(df_india.iloc[0]["Marginal Worker - Total - Persons"])
total_workers = main_workers + marg_workers

marginal_worker_percent = (marg_workers / total_workers) * 100

print("Main workers (Persons):", main_workers)
print("Marginal workers (Persons):", marg_workers)
print("Total workers (Persons):", total_workers)
print("Marginal worker percentage:", round(marginal_worker_percent, 2), "%")

Task 4: List the top 5 states or union territories which have the maximum population working as a marginal worker.

Step 0: Import required libraries

Code:

import pandas as pd
import matplotlib.pyplot as plt

Step 1: Read the dataset into a DataFrame

Step explanation

Read the file in a DataFrame.

Code:

df = pd.read_csv("PCA_AY_2011_Revised.csv")

Step 2: Filter data (only what’s needed for Task 4)

Step explanation

Filter the data by choosing the correct rows/columns required for the question.

Step result (analysis + code)

To find “maximum population working as marginal worker” across states/UTs, we take:

•

State/UT totals only → District Code == 0

•

Exclude INDIA (because we want states/UTs) → Area Name != "INDIA"

•

Total (not rural/urban split) → Total/ Rural/ Urban == "Total"

•

Overall population → Adolescent and youth categories == "All Ages"

•

Required column → Marginal Worker - Total - Persons

df_state = df[
    (df["District Code"] == 0) &
    (df["Area Name"] != "INDIA") &
    (df["Total/ Rural/ Urban"] == "Total") &
    (df["Adolescent and youth categories"] == "All Ages")
][["Area Name", "Marginal Worker - Total - Persons"]].copy()

Step 3: Sort and select Top 5 (final list)

Step explanation

Group/arrange the data as per the requirement. (Here we just sort to get top 5.)

Step result (answer)

✅ Top 5 States/UTs by Marginal Workers (Persons):

•

State – UTTAR PRADESH (09) — 2,11,79,223

•

State – BIHAR (10) — 1,33,65,376

•

State – WEST BENGAL (19) — 90,69,725

•

State – MADHYA PRADESH (23) — 88,72,014

•

State – RAJASTHAN (08) — 88,28,287

Step 4: Plot a graph (useful for comparison)

Step explanation

Plot the required output using an appropriate graph.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("PCA_AY_2011_Revised.csv")

df_state = df[
    (df["District Code"] == 0) &
    (df["Area Name"] != "INDIA") &
    (df["Total/ Rural/ Urban"] == "Total") &
    (df["Adolescent and youth categories"] == "All Ages")
][["Area Name", "Marginal Worker - Total - Persons"]].copy()

top5 = df_state.sort_values("Marginal Worker - Total - Persons", ascending=False).head(5)

print("Top 5 States/UTs by Marginal Workers (Persons):")
print(top5)

# Bar chart (helpful visual)
plt.figure()
plt.bar(top5["Area Name"], top5["Marginal Worker - Total - Persons"])
plt.title("Top 5 States/UTs: Marginal Workers (All Ages, Total)")
plt.xlabel("State / Union Territory")
plt.ylabel("Marginal Workers (Persons)")
plt.xticks(rotation=60, ha="right")
plt.tight_layout()
plt.show()

Bar Chart depicting the Top 5 States/UTs: Population Working as Marginal Workers (All Ages, Total)

Task 5: Compare the sex ratio of urban areas and rural areas using appropriate graph.

Sex Ratio formula used

Sex Ratio

{= \dfrac{\text{Male Population}}{\text{Female Population}} × 1000}

(= females per 1000 males)

Step 0: Import required libraries

Code:

import pandas as pd
import matplotlib.pyplot as plt

Step 1: Read the dataset in a DataFrame

Code:

df = pd.read_csv("PCA_AY_2011_Revised.csv")

Step 2: Filter only the required rows and columns

What we need (logic)

To compare Urban vs Rural sex ratio (India), we take:

•

Area Name = INDIA

•

Adolescent and youth categories = All Ages

•

Total/ Rural/ Urban = Rural and Urban

•

Columns used: Total Population – Males, Total Population – Females

Code:

df_ru = df[
    (df["Area Name"] == "INDIA") &
    (df["Adolescent and youth categories"] == "All Ages") &
    (df["Total/ Rural/ Urban"].isin(["Rural", "Urban"]))
][["Total/ Rural/ Urban", "Total Population - Males", "Total Population - Females"]].copy()

df_ru.columns = ["Class", "MalePop", "FemalePop"]

Step 3: Compute sex ratio for Rural and Urban

Result (calculation + output data points):

g = df_ru.groupby("Class")[["MalePop", "FemalePop"]].sum()
g["SexRatio"] = (g["FemalePop"] / g["MalePop"]) * 1000
result = g.reset_index()
print(result)

Data points used for comparison

Rural

42,77,81,058

40,59,67,794

949.01

Urban

19,54,89,200

18,16,16,925

929.04

✅ Conclusion (from your dataset):

•

Rural sex ratio: 949.01

•

Urban sex ratio: 929.04

Step 4: Plot the comparison graph (required by the question)

Code:

plt.figure()
plt.bar(result["Class"], result["SexRatio"])
plt.title("Sex Ratio (Females per 1000 Males): Urban vs Rural (INDIA, All Ages)")
plt.xlabel("Area type")
plt.ylabel("Sex Ratio")
plt.tight_layout()
plt.show()

Here is the consolidated code:

# Project IV - Task 5 (Consolidated Code)
# Compare sex ratio of Urban vs Rural areas (INDIA) using a graph

import pandas as pd
import matplotlib.pyplot as plt

# Read the dataset (use the .xlsx-converted version for reliable reading)
df = pd.read_csv("PCA_AY_2011_Revised.csv")

# Filter: INDIA + All Ages + Rural/Urban only
df_ru = df[
    (df["Area Name"] == "INDIA") &
    (df["Adolescent and youth categories"] == "All Ages") &
    (df["Total/ Rural/ Urban"].isin(["Rural", "Urban"]))
][["Total/ Rural/ Urban", "Total Population - Males", "Total Population - Females"]].copy()

# Rename columns for ease
df_ru.columns = ["Class", "MalePop", "FemalePop"]

# Group (safety) and compute Sex Ratio = (Female/Male)*1000
g = df_ru.groupby("Class")[["MalePop", "FemalePop"]].sum()
g["SexRatio"] = (g["FemalePop"] / g["MalePop"]) * 1000

# Final data points
result = g.reset_index()
print("Data points used (Urban vs Rural):")
print(result)

# Plot (required by the question)
plt.figure()
plt.bar(result["Class"], result["SexRatio"])
plt.title("Sex Ratio (Females per 1000 Males): Urban vs Rural (INDIA, All Ages)")
plt.xlabel("Area type")
plt.ylabel("Sex Ratio")
plt.tight_layout()
plt.show()

Bar Chart depicting the Sex Ratio (Females per 1000 Males): Urban vs Rural (INDIA, All Ages)

Task 6: Which state has the highest and the lowest percentage of literate Scheduled Tribes and Scheduled Castes?

Step 0: Import required libraries

Step explanation

Import the required libraries.

Code:

import pandas as pd
import numpy as np

Step 1: Read the dataset into a DataFrame

Step explanation

Read the file into a DataFrame.

Code:

df = pd.read_csv("PCA_AY_2011_Revised.csv")

Step 2: Filter the required rows (State/UT + Total + All Ages)

Step explanation

Filter data by selecting the correct rows and columns needed for the query.

Step result (logic + code)

For this question we need:

•

State/UT totals only → District Code == 0

•

Exclude INDIA → Area Name != “INDIA”

•

Total (not Rural/Urban separately) → Total/ Rural/ Urban == “Total”

•

All Ages → Adolescent and youth categories == “All Ages”

And these columns:

•

Scheduled Tribe – Persons, Literates – Scheduled Tribe – Persons

•

Scheduled Caste – Persons, Literates – Scheduled Caste – Persons

state_df = df[
    (df["District Code"] == 0) &
    (df["Area Name"] != "INDIA") &
    (df["Total/ Rural/ Urban"] == "Total") &
    (df["Adolescent and youth categories"] == "All Ages")
][[
    "Area Name",
    "Scheduled Tribe - Persons",
    "Literates - Scheduled Tribe - Persons",
    "Scheduled Caste - Persons",
    "Literates - Scheduled Caste - Persons"
]].copy()

Step 3: Compute literacy percentage for ST and SC

Step explanation

After filtering, compute the required values from the selected columns.

Step result (formula + code)

\text{Percentage literate} = \left(\dfrac{\text{Literate persons}}{\text{Total persons}}\right) \times 100

state_df["ST_Literate_Pct"] = np.where(
    state_df["Scheduled Tribe - Persons"] > 0,
    (state_df["Literates - Scheduled Tribe - Persons"] / state_df["Scheduled Tribe - Persons"]) * 100,
    np.nan
)

state_df["SC_Literate_Pct"] = np.where(
    state_df["Scheduled Caste - Persons"] > 0,
    (state_df["Literates - Scheduled Caste - Persons"] / state_df["Scheduled Caste - Persons"]) * 100,
    np.nan
)

(If a state/UT has 0 ST or 0 SC population, the percentage is not defined, so we skip it.)

Step 4: Find the highest and lowest state/UT (final answer)

Step explanation

Arrange results to identify the maximum and minimum values.

Step result (code)

st_valid = state_df.dropna(subset=["ST_Literate_Pct"])
sc_valid = state_df.dropna(subset=["SC_Literate_Pct"])

st_high = st_valid.loc[st_valid["ST_Literate_Pct"].idxmax()]
st_low  = st_valid.loc[st_valid["ST_Literate_Pct"].idxmin()]

sc_high = sc_valid.loc[sc_valid["SC_Literate_Pct"].idxmax()]
sc_low  = sc_valid.loc[sc_valid["SC_Literate_Pct"].idxmin()]

Final Answer (from our dataset)

Scheduled Tribes (ST) – Literate %

•

Highest: State – LAKSHADWEEP (31) → 81.11%

(Total ST persons = 61,120; Literate ST persons = 49,576)

•

Lowest: State – JAMMU & KASHMIR (01) → 40.29%

(Total ST persons = 14,93,299; Literate ST persons = 6,01,703)

Scheduled Castes (SC) – Literate %

•

Highest: State – DAMAN & DIU (25) → 83.33%

(Total SC persons = 6,124; Literate SC persons = 5,103)

•

Lowest: State – BIHAR (10) → 38.42%

(Total SC persons = 1,65,67,325; Literate SC persons = 63,65,150)

Consolidated code (Task 6 in one place)

import pandas as pd
import numpy as np

df = pd.read_csv("PCA_AY_2011_Revised.csv")

state_df = df[
    (df["District Code"] == 0) &
    (df["Area Name"] != "INDIA") &
    (df["Total/ Rural/ Urban"] == "Total") &
    (df["Adolescent and youth categories"] == "All Ages")
][[
    "Area Name",
    "Scheduled Tribe - Persons",
    "Literates - Scheduled Tribe - Persons",
    "Scheduled Caste - Persons",
    "Literates - Scheduled Caste - Persons"
]].copy()

state_df["ST_Literate_Pct"] = np.where(
    state_df["Scheduled Tribe - Persons"] > 0,
    (state_df["Literates - Scheduled Tribe - Persons"] / state_df["Scheduled Tribe - Persons"]) * 100,
    np.nan
)

state_df["SC_Literate_Pct"] = np.where(
    state_df["Scheduled Caste - Persons"] > 0,
    (state_df["Literates - Scheduled Caste - Persons"] / state_df["Scheduled Caste - Persons"]) * 100,
    np.nan
)

st_valid = state_df.dropna(subset=["ST_Literate_Pct"])
sc_valid = state_df.dropna(subset=["SC_Literate_Pct"])

st_high = st_valid.loc[st_valid["ST_Literate_Pct"].idxmax()]
st_low  = st_valid.loc[st_valid["ST_Literate_Pct"].idxmin()]
sc_high = sc_valid.loc[sc_valid["SC_Literate_Pct"].idxmax()]
sc_low  = sc_valid.loc[sc_valid["SC_Literate_Pct"].idxmin()]

print("ST Highest:", st_high["Area Name"], round(st_high["ST_Literate_Pct"], 2))
print("ST Lowest :", st_low["Area Name"], round(st_low["ST_Literate_Pct"], 2))
print("SC Highest:", sc_high["Area Name"], round(sc_high["SC_Literate_Pct"], 2))
print("SC Lowest :", sc_low["Area Name"], round(sc_low["SC_Literate_Pct"], 2))

Task 7: For each state, compare the no. of female marginal workers with no. of male marginal workers. Use appropriate graphs.

Step 0: Import required libraries

Result (code):

import pandas as pd
import matplotlib.pyplot as plt

Step 1: Read the dataset into a DataFrame

Result (action/code):

df = pd.read_csv("PCA_AY_2011_Revised.csv")

Step 2: Filter the required rows (State/UT + Total + All Ages)

Result (logic + code):

To compare for each State/UT, we take:

•

State/UT totals: District Code == 0

•

Exclude INDIA total: Area Name != “INDIA”

•

Overall totals: Total/ Rural/ Urban == “Total”

•

All ages: Adolescent and youth categories == “All Ages”

state_df = df[
    (df["District Code"] == 0) &
    (df["Area Name"] != "INDIA") &
    (df["Total/ Rural/ Urban"] == "Total") &
    (df["Adolescent and youth categories"] == "All Ages")
][[
    "Area Name",
    "Marginal Worker - Total - Males",
    "Marginal Worker - Total - Females"
]].copy()

Step 3: Arrange data “state-wise” for comparison

Result (analysis):

I prepared the full state-wise table (this is the data used for the graph).

Quick view (Top 5 by total marginal workers)

(These 5 rows appeared in the output right after the chart.)

•

State – UTTAR PRADESH (09): Male 1,24,26,463, Female 87,52,760

•

State – BIHAR (10): Male 79,51,499, Female 54,13,877

•

State – WEST BENGAL (19): Male 50,37,768, Female 40,31,957

•

State – MADHYA PRADESH (23): Male 37,84,905, Female 50,87,109

•

State – RAJASTHAN (08): Male 30,53,539, Female 57,74,748

Step 4: Plot the comparison graph (required)

Result: ✅ I generated and displayed the graph in the chat:

•

Stacked bar chart for all States/UTs (Male + Female marginal workers per state)

This is appropriate because it lets us compare male and female marginal workers across many states in one view, as asked.

Consolidated Python code (Task 7 in one place)

import pandas as pd
import matplotlib.pyplot as plt

# Read dataset (converted from the uploaded .xls)
df = pd.read_csv("PCA_AY_2011_Revised.csv")

# Filter: State/UT totals, Total, All Ages; exclude INDIA
state_df = df[
    (df["District Code"] == 0) &
    (df["Area Name"] != "INDIA") &
    (df["Total/ Rural/ Urban"] == "Total") &
    (df["Adolescent and youth categories"] == "All Ages")
][[
    "Area Name",
    "Marginal Worker - Total - Males",
    "Marginal Worker - Total - Females"
]].copy()

state_df = state_df.rename(columns={
    "Area Name": "State/UT",
    "Marginal Worker - Total - Males": "MaleMarginalWorkers",
    "Marginal Worker - Total - Females": "FemaleMarginalWorkers"
})

# Sort for cleaner plotting
state_df["TotalMarginalWorkers"] = state_df["MaleMarginalWorkers"] + state_df["FemaleMarginalWorkers"]
state_df = state_df.sort_values("TotalMarginalWorkers", ascending=False).reset_index(drop=True)

# Plot (stacked bar, good for many states/UTs)
plt.figure(figsize=(16, 7))
x = range(len(state_df))
plt.bar(x, state_df["MaleMarginalWorkers"], label="Male marginal workers")
plt.bar(
    x,
    state_df["FemaleMarginalWorkers"],
    bottom=state_df["MaleMarginalWorkers"],
    label="Female marginal workers"
)

plt.title("State/UT-wise Marginal Workers: Male vs Female (All Ages, Total)")
plt.xlabel("State / Union Territory")
plt.ylabel("Number of marginal workers (persons)")
plt.xticks(list(x), state_df["State/UT"], rotation=90)
plt.tight_layout()
plt.show()

# Optional: print top 10 for quick discussion
print(state_df.head(10)[["State/UT","MaleMarginalWorkers","FemaleMarginalWorkers","TotalMarginalWorkers"]])

The following is the data that will be ultimately used by the graph:

State – UTTAR PRADESH (09)

1,24,26,463

87,52,760

2,11,79,223

State – BIHAR (10)

79,51,499

54,13,877

1,33,65,376

State – WEST BENGAL (19)

50,37,768

40,31,957

90,69,725

State – MADHYA PRADESH (23)

37,84,905

50,87,109

88,72,014

State – RAJASTHAN (08)

30,53,539

57,74,748

88,28,287

State – ODISHA (21)

31,08,242

37,25,804

68,34,046

State – ANDHRA PRADESH (28)

27,25,514

36,60,014

63,85,528

State – JHARKHAND (20)

31,90,327

30,89,352

6,279,679

State – MAHARASHTRA (27)

26,27,561

30,37,427

56,64,988

State – TAMIL NADU (33)

24,73,784

24,68,716

49,42,500

State – KARNATAKA (29)

19,20,279

25,55,137

44,75,416

State – GUJARAT (24)

14,33,219

29,69,154

44,02,373

State – CHHATTISGARH (22)

15,36,412

24,02,099

39,38,511

State – ASSAM (18)

15,06,918

17,75,649

32,82,567

State – KERALA (32)

12,71,741

10,17,575

22,89,316

State – HARYANA (06)

9,46,036

9,55,189

19,01,225

State – JAMMU & KASHMIR (01)

8,89,302

7,89,262

16,78,564

State – HIMACHAL PRADESH (02)

6,04,384

8,92,537

14,96,921

State – PUNJAB (03)

8,09,526

6,36,900

14,46,426

State – UTTARAKHAND (05)

4,81,161

5,20,490

10,01,651

State – TRIPURA (16)

1,57,445

2,35,057

3,92,502

State – MANIPUR (14)

1,24,273

2,06,174

3,30,447

State – NCT OF DELHI (07)

1,99,316

80,404

2,79,720

State – MEGHALAYA (17)

1,18,189

1,45,855

2,64,044

State – NAGALAND (13)

1,05,153

1,27,790

2,32,943

State – ARUNACHAL PRADESH (12)

49,164

59,772

1,08,936

State – GOA (30)

62,569

38,626

1,01,195

State – SIKKIM (11)

33,845

43,896

77,741

State – MIZORAM (15)

27,435

44,240

71,675

State – PUDUCHERRY (34)

26,522

18,757

45,279

State – DADRA & NAGAR HAVELI (26)

10,168

16,694

26,862

State – ANDAMAN & NICOBAR ISLANDS (35)

17,270

9,355

26,625

State – CHANDIGARH (04)

10,969

7,238

18,207

State – LAKSHADWEEP (31)

6,181

1,768

7,949

State – DAMAN & DIU (25)

2,820

2,016

4,836

Bar Chart depicting the State/UT-wise Marginal Workers: Male vs Female (All Ages, Total)

Task 8: What percentage of Scheduled Tribes lives in urban areas? Draw a pie chart showing the proportion of literate and illiterates scheduled tribes living in urban areas.

Step 0: Import required libraries

Result (code):

import pandas as pd
import matplotlib.pyplot as plt

Step 1: Read the dataset in a DataFrame

Result (code):

df = pd.read_excel("PCA_AY_2011_Revised.csv")

Step 2: Filter the required rows (INDIA, All Ages, Total + Urban)

We need:

•

Area Name = INDIA

•

Adolescent and youth categories = All Ages

•

Total/ Rural/ Urban = Total and Urban

•

Columns needed: Scheduled Tribe – Persons, Literates – Scheduled Tribe – Persons

Result (code):

df_india = df[
    (df["Area Name"] == "INDIA") &
    (df["Adolescent and youth categories"] == "All Ages") &
    (df["Total/ Rural/ Urban"].isin(["Total", "Urban"]))
][[
    "Total/ Rural/ Urban",
    "Scheduled Tribe - Persons",
    "Literates - Scheduled Tribe - Persons"
]].copy()

df_india.columns = ["Class", "ST_Persons", "ST_Literate"]
df_india["ST_Illiterate"] = df_india["ST_Persons"] - df_india["ST_Literate"]

Step 3: Calculate the percentage of Scheduled Tribes living in urban areas

Formula:

{\text{\% ST in Urban} = \dfrac{\text{Urban ST Persons}}{\text{Total ST Persons}} × 100}

Data (from our file):

•

Total ST Persons (INDIA, Total, All Ages) = 10,45,45,716

•

Urban ST Persons (INDIA, Urban, All Ages) = 1,04,61,872

Result:

{\text{\%} = \dfrac{1,04,61,872}{10,45,45,716} × 100}

✅ Answer (Part 1): 10.01% of Scheduled Tribes live in urban areas.

Step 4: Pie chart for Literate vs Illiterate ST living in urban areas

Task says: Draw a pie chart showing the proportion of literate and illiterates ST living in urban areas.

Urban ST (INDIA, Urban, All Ages):

•

Literate ST (Urban) = 70,03,778

•

Illiterate ST (Urban) = 34,58,094

So the pie chart proportions are:

•

Literate: 66.9%

•

Illiterate: 33.1%

✅ Pie chart has been generated and shown in the chat above.

Result (code):

urban_lit = 7003778
urban_illit = 3458094

plt.figure()
plt.pie(
    [urban_lit, urban_illit],
    labels=["Literate ST (Urban)", "Illiterate ST (Urban)"],
    autopct="%1.1f%%"
)
plt.title("Scheduled Tribes in Urban Areas (INDIA, All Ages): Literate vs Illiterate")
plt.tight_layout()
plt.show()

Consolidated Python code (Task 8 in one place)

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("PCA_AY_2011_Revised.csv")

# Filter: INDIA, All Ages, Total + Urban
df_india = df[
    (df["Area Name"] == "INDIA") &
    (df["Adolescent and youth categories"] == "All Ages") &
    (df["Total/ Rural/ Urban"].isin(["Total", "Urban"]))
][[
    "Total/ Rural/ Urban",
    "Scheduled Tribe - Persons",
    "Literates - Scheduled Tribe - Persons"
]].copy()

df_india.columns = ["Class", "ST_Persons", "ST_Literate"]
df_india["ST_Illiterate"] = df_india["ST_Persons"] - df_india["ST_Literate"]

pivot = df_india.set_index("Class")

# Part 1: % ST living in Urban areas
st_total = float(pivot.loc["Total", "ST_Persons"])
st_urban = float(pivot.loc["Urban", "ST_Persons"])
urban_pct = (st_urban / st_total) * 100

print("Total ST (INDIA):", int(st_total))
print("Urban ST (INDIA):", int(st_urban))
print("% ST living in Urban areas:", round(urban_pct, 2), "%")

# Part 2: Pie chart (Urban ST literate vs illiterate)
urban_lit = float(pivot.loc["Urban", "ST_Literate"])
urban_illit = float(pivot.loc["Urban", "ST_Illiterate"])

plt.figure()
plt.pie([urban_lit, urban_illit],
        labels=["Literate ST (Urban)", "Illiterate ST (Urban)"],
        autopct="%1.1f%%")
plt.title("Scheduled Tribes in Urban Areas (INDIA, All Ages): Literate vs Illiterate")
plt.tight_layout()
plt.show()

Bar Chart depicting the Scheduled Tribes in Urban Areas (INDIA, All Ages): Literate vs Illiterate

Task 9: What is the state wise ratio of literates vs. illiterates in all age groups?

Note: A graph isn’t strictly mandatory (the question doesn’t say “draw”), but because it’s state-wise (many states/UTs), a bar graph makes the comparison much clearer, so I’m including it.

Step 0: Import required libraries (as per the book’s method)

Result (code):

import pandas as pd
import matplotlib.pyplot as plt

Step 1: Read the data into a DataFrame

Result (code):

df = pd.read_csv("PCA_AY_2011_Revised.csv")

Step 2: Filter the required rows (State/UT totals + Total + All Ages)

What we select (student logic):

•

Only State/UT totals (not districts) → `District Code = 0`

•

Only Total (not Rural/Urban separately)

•

Only All Ages

•

Use these columns (dataset has literates/illiterates info):

•

`Total Population – Persons`

•

`Literates – Total – Persons`

Result (code):

state_df = df[
    (df["District Code"] == 0) &
    (df["Area Name"] != "INDIA") &
    (df["Total/ Rural/ Urban"] == "Total") &
    (df["Adolescent and youth categories"] == "All Ages")
][[
    "Area Name", 
    "Total Population - Persons", 
    "Literates - Total - Persons"
]].copy()

Step 3: Compute illiterates and the ratio (Literate vs Illiterate)

Formula used:

•

Illiterates (Persons) = Total Persons − Literates (Persons)

•

Ratio (L vs I) = Literates / Illiterates

Result (code):

state_df["IlliteratePersons"] = state_df["Total Population - Persons"] - state_df["Literates - Total - Persons"]
state_df["Literate_to_Illiterate_Ratio"] = state_df["Literates - Total - Persons"] / state_df["IlliteratePersons"]

Step 4: Arrange state-wise ratios and identify extremes

✅

Highest L:I ratio: Kerala = 5.34

•

Literates: 2,81,35,824

•

Illiterates: 52,70,237

✅

Lowest L:I ratio: Bihar = 1.02

•

Literates: 5,25,04,553

•

Illiterates: 5,15,94,899

(Interpretation: Kerala has ~5.34 literates for every 1 illiterate, while Bihar is close to 1:1.)

Quick view

•

Top 5 ratios: Kerala (5.34), Lakshadweep (4.41), Goa (3.98), Daman & Diu (3.44), Andaman & Nicobar (3.41)

•

Bottom 5 ratios: Jammu & Kashmir (1.29), Rajasthan (1.26), Jharkhand (1.25), Arunachal Pradesh (1.24), Bihar (1.02)

Step 5: Graph (recommended for state-wise comparison)

Data used to generate the graph is as follows:

State – KERALA (32)

3,34,06,061

2,81,35,824

52,70,237

5.34

533.86

State – LAKSHADWEEP (31)

64,473

52,553

11,920

4.41

440.88

State – GOA (30)

14,58,545

11,65,487

2,93,058

3.98

397.70

State – DAMAN & DIU (25)

2,43,247

1,88,406

54,841

3.44

343.55

State – ANDAMAN & NICOBAR ISLANDS (35)

3,80,581

2,94,281

86,300

3.41

341.00

State – MIZORAM (15)

10,97,206

8,48,175

2,49,031

3.41

340.59

State – PUDUCHERRY (34)

12,47,953

9,57,309

2,90,644

3.29

329.38

State – TRIPURA (16)

36,73,917

28,04,783

8,69,134

3.23

322.71

State – CHANDIGARH (04)

10,55,450

8,05,438

2,50,012

3.22

322.16

State – NCT OF DELHI (07)

1,67,87,941

1,27,37,767

40,50,174

3.14

314.50

State – HIMACHAL PRADESH (02)

68,64,602

50,39,736

18,24,866

2.76

276.17

State – SIKKIM (11)

6,10,577

4,44,952

1,65,625

2.69

268.65

State – MAHARASHTRA (27)

11,23,74,333

8,15,54,290

3,08,20,043

2.65

264.61

State – TAMIL NADU (33)

7,21,47,030

5,18,37,507

2,03,09,523

2.55

255.24

State – UTTARAKHAND (05)

1,00,86,292

68,80,953

32,05,339

2.15

214.67

State – GUJARAT (24)

6,04,39,692

4,10,93,358

1,93,46,334

2.12

212.41

State – NAGALAND (13)

19,78,502

13,42,434

6,36,068

2.11

211.05

State – PUNJAB (03)

2,77,43,338

1,87,07,137

90,36,201

2.07

207.02

State – WEST BENGAL (19)

9,12,76,115

6,15,38,281

2,97,37,834

2.07

206.94

State – MANIPUR (14)

28,55,794

19,08,476

9,47,318

2.01

201.46

State – KARNATAKA (29)

6,10,95,297

4,06,47,322

2,04,47,975

1.99

198.78

State – HARYANA (06)

2,53,51,462

1,65,98,988

87,52,474

1.90

189.65

State – DADRA & NAGAR HAVELI (26)

3,43,709

2,23,230

1,20,479

1.85

185.29

State – ODISHA (21)

4,19,74,218

2,67,42,595

1,52,31,623

1,76

175.57

State – ASSAM (18)

3,12,05,576

1,91,77,977

1,20,27,599

1.59

159.45

State – CHHATTISGARH (22)

2,55,45,198

1,53,79,922

1,01,65,276

1.51

151.30

State – MEGHALAYA (17)

29,66,889

17,85,005

11,81,884

1.51

151.03

State – ANDHRA PRADESH (28)

8,45,80,777

5,05,56,760

3,40,24,017

1.49

148.59

State – MADHYA PRADESH (23)

7,26,26,809

4,28,51,169

2,97,75,640

1.44

143.91

State – UTTAR PRADESH (09)

19,98,12,341

11,43,97,555

8,54,14,786

1.34

133.93

State – JAMMU & KASHMIR (01)

1,25,41,302

70,67,233

54,74,069

1.29

129.10

State – RAJASTHAN (08)

6,85,48,437

3,82,75,282

3,02,73,155

1.26

126.43

State – JHARKHAND (20)

3,29,88,134

1,83,28,069

1,46,60,065

1.25

125.02

State – ARUNACHAL PRADESH (12)

13,83,727

7,66,005

6,17,722

1.24

124.00

State – BIHAR (10)

10,40,99,452

5,25,04,553

5,15,94,899

1.02

101.76

The following code is used to generate the graph:

# Sort by ratio for a clean comparison chart
state_df_sorted = state_df.sort_values("Literate_to_Illiterate_Ratio", ascending=False).reset_index(drop=True)

# Bar chart (state-wise ratios)
plt.figure(figsize=(16, 7))
x = range(len(state_df_sorted))
plt.bar(x, state_df_sorted["Literate_to_Illiterate_Ratio"])
plt.title("State/UT-wise Ratio of Literates to Illiterates (All Ages, Total)")
plt.xlabel("State / Union Territory (sorted by ratio)")
plt.ylabel("Literate : Illiterate (L/I) ratio")
plt.xticks(list(x), state_df_sorted["State/UT"], rotation=90)
plt.tight_layout()
plt.show()

Consolidated Code:

# State-wise ratio of literates vs illiterates in all age groups (+ graph)

import pandas as pd
import matplotlib.pyplot as plt

# Step 1: Read dataset
df = pd.read_csv("PCA_AY_2011_Revised.csv")

# Step 2: Filter State/UT totals, Total, All Ages; exclude INDIA
state_df = df[
    (df["District Code"] == 0) &
    (df["Area Name"] != "INDIA") &
    (df["Total/ Rural/ Urban"] == "Total") &
    (df["Adolescent and youth categories"] == "All Ages")
][["Area Name", "Total Population - Persons", "Literates - Total - Persons"]].copy()

# Rename for ease
state_df = state_df.rename(columns={
    "Area Name": "State/UT",
    "Total Population - Persons": "TotalPersons",
    "Literates - Total - Persons": "LiteratePersons"
})

# Step 3: Compute Illiterates + Literate:Illiterate ratio
state_df["IlliteratePersons"] = state_df["TotalPersons"] - state_df["LiteratePersons"]
state_df["Literate_to_Illiterate_Ratio"] = state_df["LiteratePersons"] / state_df["IlliteratePersons"]

# Step 4: Sort for reporting + find highest/lowest
state_df_sorted = state_df.sort_values("Literate_to_Illiterate_Ratio", ascending=False).reset_index(drop=True)

highest = state_df_sorted.iloc[0]
lowest = state_df_sorted.iloc[-1]

print("Highest L:I ratio:", highest["State/UT"], round(highest["Literate_to_Illiterate_Ratio"], 3))
print("  Literates:", int(highest["LiteratePersons"]), "Illiterates:", int(highest["IlliteratePersons"]))

print("Lowest L:I ratio:", lowest["State/UT"], round(lowest["Literate_to_Illiterate_Ratio"], 3))
print("  Literates:", int(lowest["LiteratePersons"]), "Illiterates:", int(lowest["IlliteratePersons"]))

# Step 5: Graph (recommended for state-wise comparison)
plt.figure(figsize=(16, 7))
x = range(len(state_df_sorted))
plt.bar(x, state_df_sorted["Literate_to_Illiterate_Ratio"])
plt.title("State/UT-wise Ratio of Literates to Illiterates (All Ages, Total)")
plt.xlabel("State / Union Territory (sorted by ratio)")
plt.ylabel("Literate : Illiterate (L/I) ratio")
plt.xticks(list(x), state_df_sorted["State/UT"], rotation=90)
plt.tight_layout()
plt.show()

# (Optional) Save output for reuse
state_df_sorted.to_csv("task9_literate_vs_illiterate_ratio_statewise.csv", index=False)
print("Saved: task9_literate_vs_illiterate_ratio_statewise.csv")

Bar Chart depicting the State/UT-wise Ratio of Literates to Illiterates (All Ages, Total)

Task 10: Which state is home to the maximum no. of ST in India? Which state has the minimum no. of ST in India?

This task does not need a graph (it’s just “maximum” and “minimum”).

Step 1: Read the dataset in a DataFrame

Result (code):

import pandas as pd
        
df = pd.read_csv("PCA_AY_2011_Revised.csv")

Step 2: Filter only State/UT totals (Total, All Ages)

Result (code):

st_df = df[
    (df["District Code"] == 0) &
    (df["Area Name"] != "INDIA") &
    (df["Total/ Rural/ Urban"] == "Total") &
    (df["Adolescent and youth categories"] == "All Ages")
][["Area Name", "Scheduled Tribe - Persons"]].copy()

st_df.columns = ["State/UT", "ST_Persons"]

Step 3: Find maximum and minimum ST population

Result (code):

st_sorted = st_df.sort_values("ST_Persons", ascending=False).reset_index(drop=True)

max_state = st_sorted.iloc[0]["State/UT"]
max_value = int(st_sorted.iloc[0]["ST_Persons"])

min_state = st_sorted.iloc[-1]["State/UT"]
min_value = int(st_sorted.iloc[-1]["ST_Persons"])

Final Answer (from your dataset)

✅

Maximum ST population: State – MADHYA PRADESH (23) → 1,53,16,784

✅

Minimum ST population (including 0): State – PUDUCHERRY (34) → 0

Note (student-style): In this dataset, Puducherry has 0 ST population, so it becomes the “minimum”.

If your teacher expects “minimum but not zero”, then:

✅

Minimum ST population (non-zero): State – DAMAN & DIU (25) → 15,363

Consolidated code

import pandas as pd

df = pd.read_csv("PCA_AY_2011_Revised.csv")

st_df = df[
    (df["District Code"] == 0) &
    (df["Area Name"] != "INDIA") &
    (df["Total/ Rural/ Urban"] == "Total") &
    (df["Adolescent and youth categories"] == "All Ages")
][["Area Name", "Scheduled Tribe - Persons"]].copy()

st_df.columns = ["State/UT", "ST_Persons"]

st_sorted = st_df.sort_values("ST_Persons", ascending=False).reset_index(drop=True)

max_state = st_sorted.iloc[0]["State/UT"]
max_value = int(st_sorted.iloc[0]["ST_Persons"])

min_state = st_sorted.iloc[-1]["State/UT"]
min_value = int(st_sorted.iloc[-1]["ST_Persons"])

print("Maximum ST:", max_state, max_value)
print("Minimum ST (including 0):", min_state, min_value)

# Optional: Minimum non-zero ST
nonzero = st_sorted[st_sorted["ST_Persons"] > 0]
if len(nonzero) > 0:
    min_nz_state = nonzero.iloc[-1]["State/UT"]
    min_nz_value = int(nonzero.iloc[-1]["ST_Persons"])
    print("Minimum ST (non-zero):", min_nz_state, min_nz_value)

Tabular Data (for reference)

State – MADHYA PRADESH (23)

1,53,16,784

State – MAHARASHTRA (27)

1,05,10,213

State – ODISHA (21)

95,90,756

State – RAJASTHAN (08)

92,38,534

State – GUJARAT (24)

89,17,174

State – JHARKHAND (20)

86,45,042

State – CHHATTISGARH (22)

78,22,902

State – ANDHRA PRADESH (28)

59,18,073

State – WEST BENGAL (19)

52,96,953

State – KARNATAKA (29)

42,48,987

State – ASSAM (18)

38,84,371

State – MEGHALAYA (17)

25,55,861

State – NAGALAND (13)

17,10,973

State – JAMMU & KASHMIR (01)

14,93,299

State – BIHAR (10)

13,36,573

State – MANIPUR (14)

11,67,422

State – TRIPURA (16)

11,66,813

State – UTTAR PRADESH (09)

11,34,273

State – MIZORAM (15)

10,36,115

State – ARUNACHAL PRADESH (12)

9,51,821

State – TAMIL NADU (33)

7,94,697

State – KERALA (32)

4,84,839

State – HIMACHAL PRADESH (02)

3,92,126

State – UTTARAKHAND (05)

2,91,903

State – SIKKIM (11)

2,06,360

State – DADRA & NAGAR HAVELI (26)

1,78,564

State – GOA (30)

1,49,275

State – LAKSHADWEEP (31)

61,120

State – ANDAMAN & NICOBAR ISLANDS (35)

28,530

State – DAMAN & DIU (25)

15,363

State – PUNJAB (03)

State – CHANDIGARH (04)

State – HARYANA (06)

State – NCT OF DELHI (07)

State – PUDUCHERRY (34)

Task 11: For each state, find the no. of literate females and no. of literate males. Draw a bar graph for the same. Which state has the highest ratio of literate female vs literate male and which state has the minimum?

Step 1: Read the dataset

Result (code):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("PCA_AY_2011_Revised.csv")

Step 2: Filter the data (State/UT totals, Total, All Ages)

What I filter (student logic):

•

State/UT totals only → `District Code = 0`

•

Only “Total” (not Rural/Urban separately)

•

Only “All Ages”

•

Exclude “INDIA” because we need state-wise data

Result (code):

state_lit = df[
    (df["District Code"] == 0) &
    (df["Area Name"] != "INDIA") &
    (df["Total/ Rural/ Urban"] == "Total") &
    (df["Adolescent and youth categories"] == "All Ages")
][["Area Name", "Literates - Total - Males", "Literates - Total - Females"]].copy()

state_lit = state_lit.rename(columns={
    "Area Name": "State/UT",
    "Literates - Total - Males": "LiterateMales",
    "Literates - Total - Females": "LiterateFemales"
})

Step 3: Calculate the ratio (Literate Females vs Literate Males)

Formula used:

{\text{Female-to-Male Literate Ratio} = \dfrac{\text{Literate Females}}{\text{Literate Males}}}

Result (code):

state_lit["Female_to_Male_Literate_Ratio"] = (
    state_lit["LiterateFemales"] / state_lit["LiterateMales"]
)

state_lit_sorted = state_lit.sort_values(
    "Female_to_Male_Literate_Ratio", ascending=False
).reset_index(drop=True)

Step 4: Identify the highest and lowest ratio states

Result (final answer from our dataset):

✅ Highest female:male literate ratio: State – KERALA (32)

•

Literate Females = 1,44,30,921

•

Literate Males = 1,37,04,903

•

Ratio = 1.0530

✅ Lowest female:male literate ratio: State – DAMAN & DIU (25)

•

Literate Females = 63,763

•

Literate Males = 1,24,643

•

Ratio = 0.5116

Step 5: Draw the bar graph (required)

✅ I generated and displayed the bar graph (male vs female literates for each state/UT, sorted by female/male ratio).

Result (graph code):

plt.figure(figsize=(18, 7))
x = range(len(state_lit_sorted))
bar_width = 0.45

plt.bar([i - bar_width / 2 for i in x], state_lit_sorted["LiterateMales"],
        width=bar_width, label="Literate males")
plt.bar([i + bar_width / 2 for i in x], state_lit_sorted["LiterateFemales"],
        width=bar_width, label="Literate females")

plt.title("State/UT-wise Literates (All Ages, Total): Male vs Female")
plt.xlabel("State / Union Territory (sorted by Female/Male ratio)")
plt.ylabel("Number of literates (persons)")
plt.xticks(list(x), state_lit_sorted["State/UT"], rotation=90)
plt.legend()
plt.tight_layout()
plt.show()

Data used for Task 11 (CSV)

Here is the extracted state-wise dataset used for Task 11:

State – KERALA (32)

1,37,04,903

1,44,30,921

1.05

State – MEGHALAYA (17)

9,13,879

8,71,126

0.95

State – MIZORAM (15)

4,38,529

4,09,646

0.93

State – PUDUCHERRY (34)

4,97,378

4,59,931

0.92

State – GOA (30)

6,15,823

5,49,664

0.89

State – LAKSHADWEEP (31)

28,023

24,530

0.88

State – TRIPURA (16)

15,01,369

1,30,34,14

0.87

State – NAGALAND (13)

7,23,957

6,18,477

0.85

State – TAMIL NADU (33)

2,80,40,491

2,37,97,016

0.85

State – MANIPUR (14)

10,39,858

8,68,618

0.84

State – HIMACHAL PRADESH (02)

27,52,590

22,87,146

0.83

State – WEST BENGAL (19)

3,38,18,810

2,77,19,471

0.82

State – ASSAM (18)

1,05,68,639

86,09,338

0.81

State – KARNATAKA (29)

2,25,08,471

1,81,38,851

0.81

State – MAHARASHTRA (27)

4,52,57,584

3,62,96,706

0.80

State – PUNJAB (03)

1,04,36,056

82,71,081

0.79

State – ANDAMAN & NICOBAR ISLANDS (35)

1,64,377

1,29,904

0.79

State – ANDHRA PRADESH (28)

2,82,51,243

2,23,05,517

0.79

State – UTTARAKHAND (05)

38,63,708

30,17,245

0.78

State – ODISHA (21)

1,50,89,681

1,16,52,914

0.77

State – SIKKIM (11)

2,51,269

1,93,683

0.77

State – NCT OF DELHI (07)

71,94,856

55,42,911

0.77

State – GUJARAT (24)

2,34,74,873

1,76,18,485

0.75

State – CHHATTISGARH (22)

88,07,893

65,72,029

0.75

State – ARUNACHAL PRADESH (12)

4,39,868

3,26,137

0.74

State – CHANDIGARH (04)

4,65,346

3,40,092

0.73

State – MADHYA PRADESH (23)

2,51,74,328

1,76,76,841

0.70

State – HARYANA (06)

97,94,067

68,04,921

0.69

State – JHARKHAND (20)

1,08,82,519

74,45,550

0.68

State – UTTAR PRADESH (09)

6,82,34,964

4,61,62,591

0.68

State – BIHAR (10)

3,16,08,023

2,08,96,530

0.66

State – JAMMU & KASHMIR (01)

42,64,671

28,02,562

0.66

State – RAJASTHAN (08)

2,36,88,412

1,45,86,870

0.62

State – DADRA & NAGAR HAVELI (26)

1,42,521

80,709

0.57

State – DAMAN & DIU (25)

1,24,643

63,763

0.51

Consolidated code for Task 11 (one place)

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("PCA_AY_2011_Revised.csv")

state_lit = df[
    (df["District Code"] == 0) &
    (df["Area Name"] != "INDIA") &
    (df["Total/ Rural/ Urban"] == "Total") &
    (df["Adolescent and youth categories"] == "All Ages")
][["Area Name", "Literates - Total - Males", "Literates - Total - Females"]].copy()

state_lit = state_lit.rename(columns={
    "Area Name": "State/UT",
    "Literates - Total - Males": "LiterateMales",
    "Literates - Total - Females": "LiterateFemales"
})

state_lit["Female_to_Male_Literate_Ratio"] = (
    state_lit["LiterateFemales"] / state_lit["LiterateMales"]
)

state_lit_sorted = state_lit.sort_values(
    "Female_to_Male_Literate_Ratio", ascending=False
).reset_index(drop=True)

highest = state_lit_sorted.iloc[0]
lowest = state_lit_sorted.iloc[-1]

print("Highest ratio:", highest["State/UT"],
      "Females:", int(highest["LiterateFemales"]),
      "Males:", int(highest["LiterateMales"]),
      "Ratio:", round(float(highest["Female_to_Male_Literate_Ratio"]), 4))

print("Lowest ratio:", lowest["State/UT"],
      "Females:", int(lowest["LiterateFemales"]),
      "Males:", int(lowest["LiterateMales"]),
      "Ratio:", round(float(lowest["Female_to_Male_Literate_Ratio"]), 4))

plt.figure(figsize=(18, 7))
x = range(len(state_lit_sorted))
bar_width = 0.45

plt.bar([i - bar_width / 2 for i in x], state_lit_sorted["LiterateMales"],
        width=bar_width, label="Literate males")
plt.bar([i + bar_width / 2 for i in x], state_lit_sorted["LiterateFemales"],
        width=bar_width, label="Literate females")

plt.title("State/UT-wise Literates (All Ages, Total): Male vs Female")
plt.xlabel("State / Union Territory (sorted by Female/Male ratio)")
plt.ylabel("Number of literates (persons)")
plt.xticks(list(x), state_lit_sorted["State/UT"], rotation=90)
plt.legend()
plt.tight_layout()
plt.show()

state_lit_sorted.to_csv("task11_statewise_literate_male_female.csv", index=False)
print("Saved: task11_statewise_literate_male_female.csv")

State/UT-wise Literates (All Ages, Total): Male vs Female