This page contains solutions for Chapter 2, Data Handling Using Pandas – I, of the NCERT Class 12 Informatics Practices textbook. It covers the Short Answer Questions, Long Answer Questions and Projects/Assignments Questions for the chapter.
EXERCISE
1. What is a Series and how is it different from a 1-D array, a list and a dictionary?
A Series in Pandas is a one-dimensional array that can contain a sequence of values of any data type. These values are associated with data labels called indices. A Series can be seen as a column in a spreadsheet where the index is on the left and the data value is on the right.
The differences between a Series and other data structures like a 1-D array, a list, and a dictionary are as follows:
| Characteristic | Series | 1-D Array | List | Dictionary |
|---|---|---|---|---|
| Indexing | Supports custom labeled indexing as well as access by position. | Accessed by integer position only. | Accessed by integer position only. | Accessed by key; does not have integer-position indexing. |
| Data Alignment | Automatically aligns data based on labels during operations. | No automatic data alignment; operations work by position. | No concept of data alignment. | No automatic data alignment. |
| Handling of Missing Values | Represents missing values as NaN. Labels without a match in an operation produce NaN. | A floating-point array can store NaN, but there is no label-based notion of missing data. | Can store None or NaN as ordinary elements; no special handling of missing values. | A missing key raises KeyError; no built-in handling of missing values. |
| Flexibility | More flexible due to labeled indices. Allows for complex operations and data manipulations. | Less flexible; mainly for numerical operations. | Flexible but less functional than a Series. | Very flexible in terms of data storage but lacks the numerical computation capabilities of a Series. |
| Data Type Homogeneity | Can hold values of any data type; the whole Series has one dtype (mixed values are stored as object). | Usually homogeneous (elements of the same type). | Can contain different data types. | Values can be of different types. Keys are usually of one type. |
These differences reflect the characteristics of a Series described in the NCERT chapter, compared with the general characteristics of 1-D arrays, lists, and dictionaries.
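The alignment behaviour described in the comparison can be seen in a minimal sketch. The names and marks below are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical marks; the two Series share only some labels.
term1 = pd.Series([80, 65, 90], index=["asha", "ravi", "meena"])
term2 = pd.Series([70, 85], index=["ravi", "asha"])

# Series: values are matched by label, and unmatched labels give NaN.
total = term1 + term2
print(total)

# NumPy arrays: values are combined purely by position.
by_position = np.array([80, 65]) + np.array([70, 85])
print(by_position)

# Lists: "+" concatenates instead of adding element-wise.
joined = [80, 65] + [70, 85]
print(joined)
```

Here "meena" has no partner in term2, so the sum for that label is NaN, while the array addition never looks at labels at all.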
2. What is a DataFrame and how is it different from a 2-D array?
A DataFrame in Pandas is a two-dimensional labeled data structure, resembling a table in SQL or a spreadsheet. It consists of rows and columns, with each column potentially having a different type of value (numeric, string, boolean, etc.). DataFrames provide an intuitive way to organize and manipulate tabular data where both the rows and columns can be labeled.
Differences between DataFrame and 2-D Array:
| Characteristic | DataFrame | 2-D Array |
|---|---|---|
| Dimensionality | Two-dimensional, with rows and columns. | Two-dimensional, but often considered as a collection of 1-D arrays. |
| Data Types | Each column can have a different data type. | Usually requires homogeneous data types (all elements of the same type). |
| Indexing | Supports custom row and column labels. | Accessed by integer positions; does not support custom labels. |
| Flexibility | More flexible for data manipulation; can perform complex operations like merging and joining. | Primarily used for mathematical operations and transformations. |
| Missing Values | Can handle missing values (NaN). | Typically does not handle missing values natively. |
| Use Case | Ideal for data analysis and manipulation in tabular format. | Commonly used in scientific computing, engineering, and other numerical analyses. |
| Memory Usage | Potentially higher memory usage due to labeled indices and diverse data types. | Generally more memory-efficient, especially for large arrays of homogeneous data. |
This comparison highlights the fundamental differences between Pandas DataFrames and 2-D arrays, demonstrating the specialized use cases and capabilities of each.
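As a rough illustration of the data-type and indexing rows of this comparison, the sketch below builds the same hypothetical records as a DataFrame and as a NumPy 2-D array. The record values and the labels r1 and r2 are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical records used only for illustration.
df = pd.DataFrame(
    {"name": ["Asha", "Ravi"], "marks": [91.5, 78.0], "passed": [True, True]},
    index=["r1", "r2"],
)
print(df.dtypes)               # one data type per column
print(df.loc["r2", "marks"])   # access by row label and column label

# The same values as a 2-D array must share a single dtype (object here).
arr = np.array([["Asha", 91.5, True], ["Ravi", 78.0, True]], dtype=object)
print(arr.dtype)
print(arr[1, 1])               # access by integer position only
```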
3. How are DataFrames related to Series?
Relation of DataFrame to Series:
1. Creation of DataFrame from Series: A DataFrame can be created using one or more Series. When a single Series is used, the DataFrame will have the same number of rows as the elements in the Series, but only one column. If multiple Series are used, each Series becomes a row in the DataFrame, with the Series’ labels turning into column names.
2. Columns as Series: Each column in a DataFrame can be considered as a Series. When a DataFrame is created from a Dictionary of Series, every column in the DataFrame is essentially a Series. The row labels of the DataFrame are a union of all Series indexes used in its creation.
3. Handling Different Labels: When creating a DataFrame from multiple Series that do not have the same set of labels, the DataFrame will include all distinct labels as columns. If a particular Series lacks a value for a label, NaN is inserted in the DataFrame for that column.
4. Versatility in Data Types: Like Series, DataFrames can handle various data types, allowing for a mix of numeric, string, boolean, etc., in different columns.
In summary, the relationship between DataFrames and Series is significant, with Series acting as the building blocks for DataFrames. A DataFrame can be viewed as a collection of Series aligned together to form a table-like structure.
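The points above can be sketched with two small Series whose labels only partly overlap. The Series s1 and s2 and the column names first and second are hypothetical.

```python
import pandas as pd

# Hypothetical Series with partly different labels.
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20], index=["a", "d"])

# List of Series: each Series becomes a row; its labels become columns.
by_rows = pd.DataFrame([s1, s2])
print(by_rows)

# Dictionary of Series: each Series becomes a column; its labels become rows.
by_cols = pd.DataFrame({"first": s1, "second": s2})
print(by_cols)

# A single column taken back out of the DataFrame is a Series again.
print(type(by_cols["first"]))
```

In both results the labels form the union a, b, c, d, and positions for which a Series has no value are filled with NaN.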
4. What do you understand by the size of (i) a Series, (ii) a DataFrame?
Size of a Series: The size of a Series refers to the number of elements it contains. Each element in a Series is associated with an index, which can be either a default numeric value starting from zero or user-defined labels. The size is essentially the count of these elements. For example, the Series pd.Series([10, 20, 30, 40, 50]) has 5 elements, so its size is 5.
Size of a DataFrame: The size of a DataFrame is determined by its shape, which includes the number of rows and columns. Each column in a DataFrame can be considered as a Series, and the DataFrame itself is a collection of these Series aligned together. The size, therefore, can be understood as the total number of cells, obtained by multiplying the number of rows by the number of columns. For example, the DataFrame pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}) has 3 rows and 2 columns, so its size is 3 (rows) x 2 (columns) = 6 cells.
5. Create the following Series and do the specified operations:
a) EngAlph, having 26 elements with the alphabets as values and default index values.
b) Vowels, having 5 elements with index labels ‘a’, ‘e’, ‘i’, ‘o’ and ‘u’ and all the five values set to zero. Check if it is an empty series.
c) Friends, from a dictionary having roll numbers of five of your friends as data and their first name as keys.
d) MTseries, an empty Series. Check if it is an empty series.
e) MonthDays, from a numpy array having the number of days in the 12 months of a year. The labels should be the month numbers from 1 to 12.
5 a) To create the series EngAlph, having 26 elements with the alphabets as values and default index values:
>>> import string
>>> import pandas as pd
>>> EngAlph = pd.Series(list(string.ascii_uppercase))
>>> EngAlph
0 A
1 B
2 C
...
23 X
24 Y
25 Z
dtype: object
5 b) To create the series Vowels, having 5 elements with index labels ‘a’, ‘e’, ‘i’, ‘o’ and ‘u’ and all the five values set to zero and to check if it is an empty series:
>>> import pandas as pd
>>> Vowels = pd.Series([0, 0, 0, 0, 0], index=['a', 'e', 'i', 'o', 'u'])
>>> Vowels
a 0
e 0
i 0
o 0
u 0
dtype: int64
>>> Vowels.empty
False
5 c) To create the series Friends, from a dictionary having roll numbers of five of your friends as data and their first name as keys:
>>> import pandas as pd
>>> Friends = pd.Series({'Rahul': 101, 'Priya': 102, 'Amit': 103, 'Anjali': 104, 'Vikas': 105})
>>> Friends
Rahul 101
Priya 102
Amit 103
Anjali 104
Vikas 105
dtype: int64
5 d) To create the series MTseries, an empty Series and to Check if it is an empty series:
>>> import pandas as pd
>>> MTseries = pd.Series(dtype='float64')
>>> MTseries
Series([], dtype: float64)
>>> MTseries.empty
True
5 e) To create the series MonthDays, from a numpy array having the number of days in the 12 months of a year with the labels being the month numbers from 1 to 12:
>>> import numpy as np
>>> import pandas as pd
>>> MonthDays = pd.Series(np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]), index=range(1, 13))
>>> MonthDays
1 31
2 28
3 31
...
12 31
dtype: int64
Note: If you are running all of the above commands in one session, you need to run import pandas as pd only once. You can skip it for subsequent commands.
6. Using the Series created in Question 5, write commands for the following:
a) Set all the values of Vowels to 10 and display the Series.
b) Divide all values of Vowels by 2 and display the Series.
c) Create another series Vowels1 having 5 elements with index labels ‘a’, ‘e’, ‘i’, ‘o’ and ‘u’ having values [2,5,6,3,8] respectively.
d) Add Vowels and Vowels1 and assign the result to Vowels3.
e) Subtract, Multiply and Divide Vowels by Vowels1.
f) Alter the labels of Vowels1 to [‘A’, ‘E’, ‘I’, ‘O’, ‘U’].
6. a) Using the Series created in Question 5, write commands to set all the values of Vowels to 10 and display the Series.
>>> Vowels[:] = 10
>>> Vowels
a 10
e 10
i 10
o 10
u 10
dtype: int64
6. b) Using the Series created in Question 5, write commands to divide all values of Vowels by 2 and display the Series.
>>> Vowels = Vowels / 2
>>> Vowels
a 5.0
e 5.0
i 5.0
o 5.0
u 5.0
dtype: float64
6. c) Using the Series created in Question 5, write commands to create another series Vowels1 having 5 elements with index labels ‘a’, ‘e’, ‘i’, ‘o’ and ‘u’ having values [2,5,6,3,8] respectively.
>>> Vowels1 = pd.Series([2, 5, 6, 3, 8], index=['a', 'e', 'i', 'o', 'u'])
>>> Vowels1
a 2
e 5
i 6
o 3
u 8
dtype: int64
6. d) Using the Series created in Question 5, write commands to add Vowels and Vowels1 and assign the result to Vowels3.
>>> Vowels3 = Vowels + Vowels1
>>> Vowels3
a 7.0
e 10.0
i 11.0
o 8.0
u 13.0
dtype: float64
6. e) Using the Series created in Question 5, write commands to subtract, Multiply and Divide Vowels by Vowels1.
# Subtract
>>> Vowels - Vowels1
a 3.0
e 0.0
i -1.0
o 2.0
u -3.0
dtype: float64
# Multiply
>>> Vowels * Vowels1
a 10.0
e 25.0
i 30.0
o 15.0
u 40.0
dtype: float64
# Divide
>>> Vowels / Vowels1
a 2.5
e 1.0
i 0.833333
o 1.666667
u 0.625
dtype: float64
6. f) Using the Series created in Question 5, write commands to alter the labels of Vowels1 to [‘A’, ‘E’, ‘I’, ‘O’, ‘U’].
>>> Vowels1.index = ['A', 'E', 'I', 'O', 'U']
>>> Vowels1
A 2
E 5
I 6
O 3
U 8
dtype: int64
7. Using the Series created in Question 5, write commands for the following:
a) Find the dimensions, size and values of the Series EngAlph, Vowels, Friends, MTseries, MonthDays.
b) Rename the Series MTseries as SeriesEmpty.
c) Name the index of the Series MonthDays as monthno and that of Series Friends as Fname.
d) Display the 3rd and 2nd value of the Series Friends, in that order.
e) Display the alphabets ‘e’ to ‘p’ from the Series EngAlph.
f) Display the first 10 values in the Series EngAlph.
g) Display the last 10 values in the Series EngAlph.
h) Display the MTseries.
7. a) Using the Series created in Question 5, write commands to find the dimensions, size and values of the Series EngAlph, Vowels, Friends, MTseries, MonthDays.
# For EngAlph
>>> EngAlph.shape
(26,)
>>> EngAlph.size
26
>>> EngAlph.values
array(['A', 'B', 'C', ..., 'X', 'Y', 'Z'], dtype=object)
# For Vowels (values shown as they are after step 6 a, where all values were set to 10)
>>> Vowels.shape
(5,)
>>> Vowels.size
5
>>> Vowels.values
array([10, 10, 10, 10, 10], dtype=int64)
# For Friends
>>> Friends.shape
(5,)
>>> Friends.size
5
>>> Friends.values
array([101, 102, 103, 104, 105], dtype=int64)
# For MTseries
>>> MTseries.shape
(0,)
>>> MTseries.size
0
>>> MTseries.values
array([], dtype=float64)
# For MonthDays
>>> MonthDays.shape
(12,)
>>> MonthDays.size
12
>>> MonthDays.values
array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31], dtype=int64)
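The attributes used above can be compared side by side in a short sketch. It reuses the small Series and DataFrame from Question 4; the variable names s and df are assumptions made for the example.

```python
import pandas as pd

# The small examples from Question 4.
s = pd.Series([10, 20, 30, 40, 50])
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

print(s.ndim, s.shape, s.size)     # 1 (5,) 5
print(df.ndim, df.shape, df.size)  # 2 (3, 2) 6

# For a DataFrame, size equals rows multiplied by columns.
rows, cols = df.shape
print(rows * cols == df.size)      # True
```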
7. b) Using the Series created in Question 5, write commands to rename the Series MTseries as SeriesEmpty.
>>> SeriesEmpty = MTseries.rename("SeriesEmpty")
>>> SeriesEmpty
Series([], Name: SeriesEmpty, dtype: float64)
7. c) Using the Series created in Question 5, write commands to name the index of the Series MonthDays as monthno and that of Series Friends as Fname.
>>> MonthDays.index.name = 'monthno'
>>> Friends.index.name = 'Fname'
7. d) Using the Series created in Question 5, write commands to display the 3rd and 2nd value of the Series Friends, in that order.
>>> Friends.iloc[[2, 1]]
Fname
Amit 103
Priya 102
dtype: int64
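Selection by position and selection by label can both be written explicitly, which avoids ambiguity when the labels are strings. The sketch below uses a hypothetical Series named rolls that mirrors the shape of Friends.

```python
import pandas as pd

# Hypothetical names and roll numbers, mirroring the shape of Friends.
rolls = pd.Series({"Asha": 1, "Ravi": 2, "Meena": 3, "Kabir": 4})

# iloc selects by integer position (0-based): 3rd value, then 2nd value.
by_position = rolls.iloc[[2, 1]]
print(by_position)

# loc selects by label and returns the same rows here.
by_label = rolls.loc[["Meena", "Ravi"]]
print(by_label)
```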
7. e) Using the Series created in Question 5, write commands to display the alphabets ‘e’ to ‘p’ from the Series EngAlph.
>>> EngAlph[4:16]
4 E
5 F
6 G
7 H
8 I
9 J
10 K
11 L
12 M
13 N
14 O
15 P
dtype: object
7. f) Using the Series created in Question 5, write commands to display the first 10 values in the Series EngAlph.
>>> EngAlph.head(10)
0 A
1 B
2 C
3 D
4 E
5 F
6 G
7 H
8 I
9 J
dtype: object
7. g) Using the Series created in Question 5, write commands to display the last 10 values in the Series EngAlph.
>>> EngAlph.tail(10)
16 Q
17 R
18 S
19 T
20 U
21 V
22 W
23 X
24 Y
25 Z
dtype: object
7. h) Using the Series created in Question 5, write commands to display the MTseries.
>>> MTseries
Series([], dtype: float64)
8. Using the Series created in Question 5, write commands for the following:
a) Display the names of the months 3 through 7 from the Series MonthDays.
b) Display the Series MonthDays in reverse order.
8. a) Using the Series created in Question 5, write commands to display the names of the months 3 through 7 from the Series MonthDays:
>>> MonthDays.loc[3:7]
3 31
4 30
5 31
6 30
7 31
dtype: int64
8. b) Using the Series created in Question 5, write commands to display the Series MonthDays in reverse order:
>>> MonthDays[::-1]
12 31
11 30
10 31
9 30
8 31
7 31
6 30
5 31
4 30
3 31
2 28
1 31
dtype: int64
9. Create the following DataFrame Sales containing year wise sales figures for five sales persons in INR. Use the years as column labels, and sales person names as row labels.
| | 2014 | 2015 | 2016 | 2017 |
|---|---|---|---|---|
| Madhu | 100.5 | 12000 | 20000 | 50000 |
| Kusum | 150.8 | 18000 | 50000 | 60000 |
| Kinshuk | 200.9 | 22000 | 70000 | 70000 |
| Ankit | 30000 | 30000 | 100000 | 80000 |
| Shruti | 40000 | 45000 | 125000 | 90000 |
>>> import pandas as pd
# Creating the DataFrame with row labels
>>> sales_data = {
...     '2014': [100.5, 150.8, 200.9, 30000, 40000],
...     '2015': [12000, 18000, 22000, 30000, 45000],
...     '2016': [20000, 50000, 70000, 100000, 125000],
...     '2017': [50000, 60000, 70000, 80000, 90000]
... }
>>> Sales = pd.DataFrame(sales_data, index=['Madhu', 'Kusum', 'Kinshuk', 'Ankit', 'Shruti'])
# Displaying the DataFrame
>>> Sales
2014 2015 2016 2017
Madhu 100.5 12000 20000 50000
Kusum 150.8 18000 50000 60000
Kinshuk 200.9 22000 70000 70000
Ankit 30000 30000 100000 80000
Shruti 40000 45000 125000 90000
10. Use the DataFrame created in Question 9 above to do the following:
a) Display the row labels of Sales.
b) Display the column labels of Sales.
c) Display the data types of each column of Sales.
d) Display the dimensions, shape, size and values of Sales.
e) Display the last two rows of Sales.
f) Display the first two columns of Sales.
g) Create a dictionary using the following data. Use this dictionary to create a DataFrame Sales2.
| | 2018 |
|---|---|
| Madhu | 160000 |
| Kusum | 110000 |
| Kinshuk | 500000 |
| Ankit | 340000 |
| Shruti | 900000 |
h) Check if Sales2 is empty or it contains data.
10. a) Use the DataFrame created in Question 9 above to display the row labels of Sales.
>>> Sales.index
Index(['Madhu', 'Kusum', 'Kinshuk', 'Ankit', 'Shruti'], dtype='object')
10. b) Use the DataFrame created in Question 9 above to display the column labels of Sales.
>>> Sales.columns
Index(['2014', '2015', '2016', '2017'], dtype='object')
10. c) Use the DataFrame created in Question 9 above to display the data types of each column of Sales.
>>> Sales.dtypes
2014 float64
2015 int64
2016 int64
2017 int64
dtype: object
10. d) Use the DataFrame created in Question 9 above to display the dimensions, shape, size and values of Sales:
>>> Sales.ndim
2
>>> Sales.shape
(5, 4)
>>> Sales.size
20
>>> Sales.values
[[ 100.5 12000 20000 50000]
[ 150.8 18000 50000 60000]
[ 200.9 22000 70000 70000]
[ 30000 30000 100000 80000]
[ 40000 45000 125000 90000]]
10. e) Use the DataFrame created in Question 9 above to display the last two rows of Sales:
>>> Sales.tail(2)
2014 2015 2016 2017
Ankit 30000 30000 100000 80000
Shruti 40000 45000 125000 90000
10. f) Use the DataFrame created in Question 9 above to display the first two columns of Sales.
>>> Sales[['2014', '2015']]
2014 2015
Madhu 100.5 12000
Kusum 150.8 18000
Kinshuk 200.9 22000
Ankit 30000 30000
Shruti 40000 45000
10. g) Create a dictionary using the following data. Use this dictionary to create a DataFrame Sales2.
| | 2018 |
|---|---|
| Madhu | 160000 |
| Kusum | 110000 |
| Kinshuk | 500000 |
| Ankit | 340000 |
| Shruti | 900000 |
>>> sales_data_2018 = {'2018': [160000, 110000, 500000, 340000, 900000]}
>>> Sales2 = pd.DataFrame(sales_data_2018, index=['Madhu', 'Kusum', 'Kinshuk', 'Ankit', 'Shruti'])
>>> Sales2
2018
Madhu 160000
Kusum 110000
Kinshuk 500000
Ankit 340000
Shruti 900000
10. h) Use the DataFrame created in Question 9 above to check if Sales2 is empty or it contains data:
>>> Sales2.empty
False
11. Use the DataFrame created in Question 9 above to do the following:
a) Append the DataFrame Sales2 to the DataFrame Sales.
b) Change the DataFrame Sales such that it becomes its transpose.
c) Display the sales made by all sales persons in the year 2017.
d) Display the sales made by Madhu and Ankit in the year 2017 and 2018.
e) Display the sales made by Shruti 2016.
f) Add data to Sales for salesman Sumeet where the sales made are [196.2, 37800, 52000, 78438, 38852] in the years [2014, 2015, 2016, 2017, 2018] respectively.
g) Delete the data for the year 2014 from the DataFrame Sales.
h) Delete the data for sales man Kinshuk from the DataFrame Sales.
i) Change the name of the salesperson Ankit to Vivaan and Madhu to Shailesh.
j) Update the sale made by Shailesh in 2018 to 100000.
k) Write the values of DataFrame Sales to a comma separated file SalesFigures.csv on the disk. Do not write the row labels and column labels.
l) Read the data in the file SalesFigures.csv into a DataFrame SalesRetrieved and Display it. Now update the row labels and column labels of SalesRetrieved to be the same as that of Sales.
11. a) Use the DataFrame created in Question 9 above to append the DataFrame Sales2 to the DataFrame Sales:
# Append Sales2 to Sales
>>> Sales = pd.concat([Sales, Sales2], axis=1)
# Print the modified Sales DataFrame
>>> Sales
2014 2015 2016 2017 2018
Madhu 100.5 12000 20000 50000 160000
Kusum 150.8 18000 50000 60000 110000
Kinshuk 200.9 22000 70000 70000 500000
Ankit 30000 30000 100000 80000 340000
Shruti 40000 45000 125000 90000 900000
11. b) Use the DataFrame created in Question 9 above to change the DataFrame Sales such that it becomes its transpose:
# Transpose the Sales DataFrame
>>> Sales = Sales.T
# Print the transposed Sales DataFrame
>>> Sales
Madhu Kusum Kinshuk Ankit Shruti
2014 100.5 150.8 200.9 30000 40000
2015 12000 18000 22000 30000 45000
2016 20000 50000 70000 100000 125000
2017 50000 60000 70000 80000 90000
2018 160000 110000 500000 340000 900000
11. c) Use the DataFrame created in Question 9 above to display the sales made by all sales persons in the year 2017:
>>> Sales.loc['2017']
Madhu 50000.0
Kusum 60000.0
Kinshuk 70000.0
Ankit 80000.0
Shruti 90000.0
Name: 2017, dtype: float64
11. d) Use the DataFrame created in Question 9 above to display the sales made by Madhu and Ankit in the year 2017 and 2018:
>>> Sales.loc[['2017', '2018'], ['Madhu', 'Ankit']]
Madhu Ankit
2017 50000 80000
2018 160000 340000
11. e) Use the DataFrame created in Question 9 above to display the sales made by Shruti 2016:
>>> Sales.loc['2016', 'Shruti']
125000
11. f) Use the DataFrame created in Question 9 above to add data to Sales for salesman Sumeet where the sales made are [196.2, 37800, 52000, 78438, 38852] in the years [2014, 2015, 2016, 2017, 2018] respectively:
>>> Sales['Sumeet'] = [196.2, 37800, 52000, 78438, 38852]
>>> Sales
Madhu Kusum Kinshuk Ankit Shruti Sumeet
2014 100.5 150.8 200.9 30000 40000 196.2
2015 12000 18000 22000 30000 45000 37800
2016 20000 50000 70000 100000 125000 52000
2017 50000 60000 70000 80000 90000 78438
2018 160000 110000 500000 340000 900000 38852
11. g) Use the DataFrame created in Question 9 above to delete the data for the year 2014 from the DataFrame Sales:
>>> Sales.drop('2014', inplace=True)
>>> Sales
Madhu Kusum Kinshuk Ankit Shruti Sumeet
2015 12000 18000 22000 30000 45000 37800
2016 20000 50000 70000 100000 125000 52000
2017 50000 60000 70000 80000 90000 78438
2018 160000 110000 500000 340000 900000 38852
11. h) Use the DataFrame created in Question 9 above to delete the data for sales man Kinshuk from the DataFrame Sales:
>>> Sales.drop('Kinshuk', axis=1, inplace=True)
>>> Sales
Madhu Kusum Ankit Shruti Sumeet
2015 12000 18000 30000 45000 37800
2016 20000 50000 100000 125000 52000
2017 50000 60000 80000 90000 78438
2018 160000 110000 340000 900000 38852
11. i) Use the DataFrame created in Question 9 above to change the name of the salesperson Ankit to Vivaan and Madhu to Shailesh:
>>> Sales.rename(columns={'Ankit': 'Vivaan', 'Madhu': 'Shailesh'}, inplace=True)
>>> Sales
Shailesh Kusum Vivaan Shruti Sumeet
2015 12000 18000 30000 45000 37800
2016 20000 50000 100000 125000 52000
2017 50000 60000 80000 90000 78438
2018 160000 110000 340000 900000 38852
11. j) Use the DataFrame created in Question 9 above to update the sale made by Shailesh in 2018 to 100000:
>>> Sales.loc['2018', 'Shailesh'] = 100000
>>> Sales
Shailesh Kusum Vivaan Shruti Sumeet
2015 12000 18000 30000 45000 37800
2016 20000 50000 100000 125000 52000
2017 50000 60000 80000 90000 78438
2018 100000 110000 340000 900000 38852
11. k) Use the DataFrame created in Question 9 above to write the values of DataFrame Sales to a comma separated file SalesFigures.csv on the disk. Do not write the row labels and column labels:
>>> Sales.to_csv('C:/NCERT/SalesFigures.csv', header=False, index=False)
Note: The above answer assumes that you are using the Windows operating system. On a different operating system, use a path that is valid there. For instance, on Linux you might use a path such as '/home/user_name/NCERT/SalesFigures.csv' for the file name. The rest of the command remains the same.
11. l) Use the DataFrame created in Question 9 above to read the data in the file SalesFigures.csv into a DataFrame SalesRetrieved and display it. Now update the row labels and column labels of SalesRetrieved to be the same as those of Sales.
# Read data from SalesFigures.csv into a DataFrame
>>> SalesRetrieved = pd.read_csv('C:/NCERT/SalesFigures.csv', header=None)
# Display the DataFrame
>>> SalesRetrieved
# Update the row labels and column labels of SalesRetrieved to match Sales
>>> SalesRetrieved.columns = Sales.columns
>>> SalesRetrieved.index = Sales.index
# Display the updated DataFrame
>>> SalesRetrieved
Note: As with writing the file, the path above assumes the Windows operating system. On a different operating system, substitute the path where SalesFigures.csv was saved. The rest of the commands remain the same.
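The write and read steps can also be tried without depending on a particular drive or folder. The sketch below is one possible way to do this, assuming a temporary folder is acceptable; the small DataFrame named table is hypothetical and stands in for Sales.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-by-two table standing in for Sales.
table = pd.DataFrame(
    {"Kusum": [18000, 50000], "Shruti": [45000, 125000]},
    index=["2015", "2016"],
)

with tempfile.TemporaryDirectory() as folder:
    # os.path.join builds a path that is valid on the current system.
    path = os.path.join(folder, "SalesFigures.csv")

    # Write only the values: no column header row, no index column.
    table.to_csv(path, header=False, index=False)

    # header=None tells read_csv that the first line is data, not labels.
    retrieved = pd.read_csv(path, header=None)

print(retrieved)  # default integer labels on both axes

# Restore the original row and column labels.
retrieved.index = table.index
retrieved.columns = table.columns
print(retrieved)
```

Because the file holds values only, the labels have to be reattached after reading, which is why the index and columns are assigned from the original DataFrame.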