Data Handling Using Pandas – I

This page covers Chapter 2, Data Handling Using Pandas – I, of the NCERT Class 12 Informatics Practices textbook. It provides solutions to the chapter's exercise questions, including the Short Answer Questions, Long Answer Questions and Projects/Assignments Questions.


EXERCISE

What is a Series and how is it different from a 1-D array, a list and a dictionary?

A Series in Pandas is a one-dimensional array that can contain a sequence of values of any data type. These values are associated with data labels called indices. A Series can be seen as a column in a spreadsheet where the index is on the left and the data value is on the right.

The differences between a Series and other data structures like a 1-D array, a list, and a dictionary are as follows:

| Characteristic | Series | 1-D Array | List | Dictionary |
| --- | --- | --- | --- | --- |
| Indexing | Supports custom labeled indexing. | Accessed by integer position only. | Indexed by integer position. | Accessed by key; does not have integer-based indexing. |
| Data Alignment | Automatically aligns data based on labels. | No automatic data alignment. | No concept of data alignment. | No automatic data alignment. |
| Handling of Missing Values | Handles NaN values. Labels missing from one operand produce NaN in the result. | No built-in handling of missing values (a float array can store NaN, but nothing fills it in automatically). | No concept of missing values. | No concept of missing values. |
| Flexibility | More flexible due to labeled indices. Allows for complex operations and data manipulations. | Less flexible; mainly for numerical operations. | Flexible but less functional than a Series. | Very flexible in terms of data storage but lacks the numerical computation capabilities of a Series. |
| Data Type Homogeneity | Can hold different data types. | Usually homogeneous (elements of the same type). | Can contain different data types. | Values can be of different types. Keys are usually of one type. |

These differences reflect the characteristics of a Series described in the NCERT chapter, set against the general characteristics of 1-D arrays, lists, and dictionaries.

What is a DataFrame and how is it different from a 2-D array?

A DataFrame in Pandas is a two-dimensional labeled data structure, resembling a table in SQL or a spreadsheet. It consists of rows and columns, with each column potentially having a different type of value (numeric, string, boolean, etc.). DataFrames provide an intuitive way to organize and manipulate tabular data where both the rows and columns can be labeled.

Differences between DataFrame and 2-D Array:

| Characteristic | DataFrame | 2-D Array |
| --- | --- | --- |
| Dimensionality | Two-dimensional, with rows and columns. | Two-dimensional, but often considered as a collection of 1-D arrays. |
| Data Types | Each column can have a different data type. | Usually requires homogeneous data types (all elements of the same type). |
| Indexing | Supports custom row and column labels. | Accessed by integer positions; does not support custom labels. |
| Flexibility | More flexible for data manipulation; can perform complex operations like merging and joining. | Primarily used for mathematical operations and transformations. |
| Missing Values | Can handle missing values (NaN). | Typically does not handle missing values natively. |
| Use Case | Ideal for data analysis and manipulation in tabular format. | Commonly used in scientific computing, engineering, and other numerical analyses. |
| Memory Usage | Potentially higher memory usage due to labeled indices and diverse data types. | Generally more memory-efficient, especially for large arrays of homogeneous data. |

This comparison highlights the fundamental differences between Pandas DataFrames and 2-D arrays, demonstrating the specialized use cases and capabilities of each.
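As a minimal sketch of the data type and indexing rows above, the following compares a small DataFrame with a NumPy 2-D array holding the same numbers. The labels (units, price, pen, book) are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: an integer column and a float column.
df = pd.DataFrame({"units": [3, 5], "price": [2.5, 4.0]}, index=["pen", "book"])
print(df.dtypes)   # each column keeps its own dtype

# The same numbers in a 2-D array share one dtype, so the integers
# are stored as floats.
arr = np.array([[3, 2.5], [5, 4.0]])
print(arr.dtype)

# DataFrame cells are addressed by row and column labels,
# array cells by integer positions.
by_label = df.loc["book", "units"]
by_position = arr[1, 0]
```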

How are DataFrames related to Series?

Relation of DataFrame to Series:

Creation of DataFrame from Series: A DataFrame can be created using one or more Series. When a single Series is used, the DataFrame will have the same number of rows as the elements in the Series, but only one column. If multiple Series are passed as a list, each Series becomes a row in the DataFrame, with the Series’ labels turning into column names.

Columns as Series: Each column in a DataFrame can be considered as a Series. When a DataFrame is created from a Dictionary of Series, every column in the DataFrame is essentially a Series. The row labels of the DataFrame are a union of all Series indexes used in its creation.

Handling Different Labels: When creating a DataFrame from multiple Series that do not have the same set of labels, the DataFrame will include all distinct labels as columns. If a particular Series lacks a value for a label, NaN is inserted in the DataFrame for that column.

Versatility in Data Types: Like Series, DataFrames can handle various data types, allowing for a mix of numeric, string, boolean, etc., in different columns.

In summary, the relationship between DataFrames and Series is significant, with Series acting as the building blocks for DataFrames. A DataFrame can be viewed as a collection of Series aligned together to form a table-like structure.
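The two ways of building a DataFrame from Series described above can be sketched as follows. The Series s1 and s2 and the column names first and second are hypothetical.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20], index=["a", "b"])

# A list of Series: each Series becomes a row and its labels become column names.
by_rows = pd.DataFrame([s1, s2])

# A dictionary of Series: each Series becomes a column and the row labels
# are the union of the Series indexes.
by_cols = pd.DataFrame({"first": s1, "second": s2})

# s2 has no label "c", so that cell is NaN in both layouts.
print(by_rows)
print(by_cols)

# Selecting a single column returns a Series.
col = by_cols["first"]
```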

What do you understand by the size of (i) a Series, (ii) a DataFrame?

Size of a Series: The size of a Series refers to the number of elements it contains. Each element in a Series is associated with an index, which can be either a default numeric value starting from zero or a user-defined label. The size is essentially the count of these elements. For example, consider the Series pd.Series([10, 20, 30, 40, 50]). This Series has 5 elements. Therefore, its size is 5.

Size of a DataFrame: The size of a DataFrame is determined by its shape, which includes the number of rows and columns. Each column in a DataFrame can be considered as a Series, and the DataFrame itself is a collection of these Series aligned together. The size, therefore, can be understood as the total number of cells, obtained by multiplying the number of rows by the number of columns. Example: Suppose we have a DataFrame created as pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}). This DataFrame has 3 rows and 2 columns. So, its size is 3 (rows) x 2 (columns) = 6 cells.
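Both examples can be checked with the size and shape attributes, as in this short sketch:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

print(s.size)    # number of elements in the Series: 5
print(df.shape)  # (rows, columns): (3, 2)
print(df.size)   # rows x columns: 6
```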

Create the following Series and do the specified operations:

EngAlph, having 26 elements with the alphabets as values and default index values.

Vowels, having 5 elements with index labels ‘a’, ‘e’, ‘i’, ‘o’ and ‘u’ and all the five values set to zero. Check if it is an empty series.

Friends, from a dictionary having roll numbers of five of your friends as data and their first name as keys.

MTseries, an empty Series. Check if it is an empty series.

MonthDays, from a numpy array having the number of days in the 12 months of a year. The labels should be the month numbers from 1 to 12.

5 a) To create the series EngAlph, having 26 elements with the alphabets as values and default index values:

>>> import string
>>> import pandas as pd
>>> EngAlph = pd.Series(list(string.ascii_uppercase))
>>> EngAlph
0   A
1   B
2   C
...
23  X
24  Y
25  Z
dtype: object

5 b) To create the series Vowels, having 5 elements with index labels ‘a’, ‘e’, ‘i’, ‘o’ and ‘u’ and all the five values set to zero and to check if it is an empty series:

>>> import pandas as pd
>>> Vowels = pd.Series([0, 0, 0, 0, 0], index=['a', 'e', 'i', 'o', 'u'])
>>> Vowels
a   0
e   0
i   0
o   0
u   0
dtype: int64
>>> Vowels.empty
False

5 c) To create the series Friends, from a dictionary having roll numbers of five of your friends as data and their first name as keys:

>>> import pandas as pd
>>> Friends = pd.Series({'Rahul': 101, 'Priya': 102, 'Amit': 103, 'Anjali': 104, 'Vikas': 105})
>>> Friends
Rahul   101
Priya   102
Amit    103
Anjali  104
Vikas   105    
dtype: int64

5 d) To create the series MTseries, an empty Series, and to check if it is an empty series:

>>> import pandas as pd
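# dtype is given explicitly because the default dtype of an empty Series differs between pandas versions
>>> MTseries = pd.Series(dtype='float64')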
>>> MTseries
Series([], dtype: float64)
>>> MTseries.empty
True

5 e) To create the series MonthDays, from a numpy array having the number of days in the 12 months of a year with the labels being the month numbers from 1 to 12:

>>> import numpy as np
>>> import pandas as pd
>>> days = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
>>> MonthDays = pd.Series(days, index=range(1, 13))
>>> MonthDays
1     31
2     28
3     31
...
12    31
dtype: int64

Note: If you’re running all the above commands in one session, you need import pandas as pd only once. You can skip it for the subsequent commands.

Using the Series created in Question 5, write commands for the following:

Set all the values of Vowels to 10 and display the Series.

Divide all values of Vowels by 2 and display the Series.

Create another series Vowels1 having 5 elements with index labels ‘a’, ‘e’, ‘i’, ‘o’ and ‘u’ having values [2,5,6,3,8] respectively.

Add Vowels and Vowels1 and assign the result to Vowels3.

Subtract, Multiply and Divide Vowels by Vowels1.

Alter the labels of Vowels1 to [‘A’, ‘E’, ‘I’, ‘O’, ‘U’].

6 a) Using the Series created in Question 5, write commands to set all the values of Vowels to 10 and display the Series.

>>> Vowels[:] = 10
>>> Vowels
a    10
e    10
i    10
o    10
u    10
dtype: int64

6. b) Using the Series created in Question 5, write commands to divide all values of Vowels by 2 and display the Series.

>>> Vowels = Vowels / 2
>>> Vowels
a    5.0
e    5.0
i    5.0
o    5.0
u    5.0
dtype: float64

6. c) Using the Series created in Question 5, write commands to create another series Vowels1 having 5 elements with index labels ‘a’, ‘e’, ‘i’, ‘o’ and ‘u’ having values [2,5,6,3,8] respectively.

>>> Vowels1 = pd.Series([2, 5, 6, 3, 8], index=['a', 'e', 'i', 'o', 'u'])
>>> Vowels1
a    2
e    5
i    6
o    3
u    8
dtype: int64

6. d) Using the Series created in Question 5, write commands to add Vowels and Vowels1 and assign the result to Vowels3.

>>> Vowels3 = Vowels + Vowels1
>>> Vowels3
a     7.0
e    10.0
i    11.0
o     8.0
u    13.0
dtype: float64

6. e) Using the Series created in Question 5, write commands to subtract, Multiply and Divide Vowels by Vowels1.

# Subtract
>>> Vowels - Vowels1
a    3.0
e    0.0
i   -1.0
o    2.0
u   -3.0
dtype: float64
# Multiply
>>> Vowels * Vowels1
a    10.0
e    25.0
i    30.0
o    15.0
u    40.0
dtype: float64
# Divide
>>> Vowels / Vowels1
a    2.500000
e    1.000000
i    0.833333
o    1.666667
u    0.625000
dtype: float64

6. f) Using the Series created in Question 5, write commands to alter the labels of Vowels1 to [‘A’, ‘E’, ‘I’, ‘O’, ‘U’].

>>> Vowels1.index = ['A', 'E', 'I', 'O', 'U']
>>> Vowels1
A    2
E    5
I    6
O    3
U    8
dtype: int64

Using the Series created in Question 5, write commands for the following:

Find the dimensions, size and values of the Series EngAlph, Vowels, Friends, MTseries, MonthDays.

Rename the Series MTseries as SeriesEmpty.

Name the index of the Series MonthDays as monthno and that of Series Friends as Fname.

Display the 3rd and 2nd value of the Series Friends, in that order.

Display the alphabets ‘e’ to ‘p’ from the Series EngAlph.

Display the first 10 values in the Series EngAlph.

Display the last 10 values in the Series EngAlph.

Display the MTseries.

7. a) Using the Series created in Question 5, write commands to find the dimensions, size and values of the Series EngAlph, Vowels, Friends, MTseries, MonthDays.

# For EngAlph
>>> EngAlph.shape
(26,)
>>> EngAlph.size
26
>>> EngAlph.values
array(['A', 'B', 'C', ..., 'X', 'Y', 'Z'], dtype=object)

# For Vowels (as created in Question 5, before the changes made in Question 6)
>>> Vowels.shape
(5,)
>>> Vowels.size
5
>>> Vowels.values
array([0, 0, 0, 0, 0], dtype=int64)

# For Friends
>>> Friends.shape
(5,)
>>> Friends.size
5
>>> Friends.values
array([101, 102, 103, 104, 105], dtype=int64)

# For MTseries
>>> MTseries.shape
(0,)
>>> MTseries.size
0
>>> MTseries.values
array([], dtype=float64)

# For MonthDays
>>> MonthDays.shape
(12,)
>>> MonthDays.size
12
>>> MonthDays.values
array([31, 28, 31, 30, ..., 31], dtype=int64)

7. b) Using the Series created in Question 5, write commands to rename the Series MTseries as SeriesEmpty.

>>> SeriesEmpty = MTseries.rename("SeriesEmpty")
>>> SeriesEmpty
Series([], Name: SeriesEmpty, dtype: float64)

7. c) Using the Series created in Question 5, write commands to name the index of the Series MonthDays as monthno and that of Series Friends as Fname.

>>> MonthDays.index.name = 'monthno'
>>> Friends.index.name = 'Fname'

7. d) Using the Series created in Question 5, write commands to display the 3rd and 2nd value of the Series Friends, in that order.

>>> Friends.iloc[[2, 1]]
Fname
Amit      103
Priya     102
dtype: int64

7. e) Using the Series created in Question 5, write commands to display the alphabets ‘e’ to ‘p’ from the Series EngAlph.

>>> EngAlph[4:16]
4     E
5     F
6     G
7     H
8     I
9     J
10    K
11    L
12    M
13    N
14    O
15    P
dtype: object

7. f) Using the Series created in Question 5, write commands to display the first 10 values in the Series EngAlph.

>>> EngAlph.head(10)
0    A
1    B
2    C
3    D
4    E
5    F
6    G
7    H
8    I
9    J
dtype: object

7. g) Using the Series created in Question 5, write commands to display the last 10 values in the Series EngAlph.

>>> EngAlph.tail(10)
16    Q
17    R
18    S
19    T
20    U
21    V
22    W
23    X
24    Y
25    Z
dtype: object

7. h) Using the Series created in Question 5, write commands to display the MTseries.

>>> MTseries
Series([], dtype: float64)

Using the Series created in Question 5, write commands for the following:

Display the names of the months 3 through 7 from the Series MonthDays.

Display the Series MonthDays in reverse order.

8. a) Using the Series created in Question 5, write commands to display the names of the months 3 through 7 from the Series MonthDays:

>>> MonthDays[2:7]
3    31
4    30
5    31
6    30
7    31
dtype: int64
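Because the labels of MonthDays are themselves integers, it can help to state explicitly whether a slice refers to labels or to positions. The sketch below rebuilds the Series under a hypothetical name, month_days, and selects months 3 through 7 in both ways.

```python
import numpy as np
import pandas as pd

days = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
month_days = pd.Series(days, index=range(1, 13))

# .loc slices by label and includes the end label.
by_label = month_days.loc[3:7]

# .iloc slices by position and excludes the end position.
by_position = month_days.iloc[2:7]

print(by_label)
print(by_position)
```

Both selections return the entries labelled 3 to 7.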

8. b) Using the Series created in Question 5, write commands to display the Series MonthDays in reverse order:

>>> MonthDays[::-1]
12    31
11    30
10    31
9     30
8     31
7     31
6     30
5     31
4     30
3     31
2     28
1     31
dtype: int64

Create the following DataFrame Sales containing year wise sales figures for five sales persons in INR. Use the years as column labels, and sales person names as row labels.

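| | 2014 | 2015 | 2016 | 2017 |
| --- | --- | --- | --- | --- |
| Madhu | 100.5 | 12000 | 20000 | 50000 |
| Kusum | 150.8 | 18000 | 50000 | 60000 |
| Kinshuk | 200.9 | 22000 | 70000 | 70000 |
| Ankit | 30000 | 30000 | 100000 | 80000 |
| Shruti | 40000 | 45000 | 125000 | 90000 |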

>>> import pandas as pd
# Creating the DataFrame with row labels
>>> sales_data = {
        '2014': [100.5, 150.8, 200.9, 30000, 40000],
        '2015': [12000, 18000, 22000, 30000, 45000],
        '2016': [20000, 50000, 70000, 100000, 125000],
        '2017': [50000, 60000, 70000, 80000, 90000]
    }
>>> Sales = pd.DataFrame(sales_data, index=['Madhu', 'Kusum', 'Kinshuk', 'Ankit', 'Shruti'])

# Displaying the DataFrame
>>> Sales
        2014   2015    2016    2017
Madhu   100.5  12000  20000  50000
Kusum   150.8  18000  50000  60000
Kinshuk 200.9  22000  70000  70000
Ankit   30000  30000 100000  80000
Shruti  40000  45000 125000  90000

10.

Use the DataFrame created in Question 9 above to do the following:

Display the row labels of Sales.

Display the column labels of Sales.

Display the data types of each column of Sales.

Display the dimensions, shape, size and values of Sales.

Display the last two rows of Sales.

Display the first two columns of Sales.

Create a dictionary using the following data. Use this dictionary to create a DataFrame Sales2.

| | 2018 |
| --- | --- |
| Madhu | 160000 |
| Kusum | 110000 |
| Kinshuk | 500000 |
| Ankit | 340000 |
| Shruti | 900000 |

Check if Sales2 is empty or it contains data.

10. a) Use the DataFrame created in Question 9 above to display the row labels of Sales.

>>> Sales.index
Index(['Madhu', 'Kusum', 'Kinshuk', 'Ankit', 'Shruti'], dtype='object')

10. b) Use the DataFrame created in Question 9 above to display the column labels of Sales.

>>> Sales.columns
Index(['2014', '2015', '2016', '2017'], dtype='object')

10. c) Use the DataFrame created in Question 9 above to display the data types of each column of Sales.

>>> Sales.dtypes
2014    float64
2015      int64
2016      int64
2017      int64
dtype: object

10. d) Use the DataFrame created in Question 9 above to display the dimensions, shape, size and values of Sales:

>>> Sales.ndim
2
>>> Sales.shape
(5, 4)
>>> Sales.size
20
>>> Sales.values
   [[   100.5  12000   20000   50000]
    [   150.8  18000   50000   60000]
    [   200.9  22000   70000   70000]
    [ 30000    30000  100000   80000]
    [ 40000    45000  125000   90000]]

10. e) Use the DataFrame created in Question 9 above to display the last two rows of Sales:

>>> Sales.tail(2)
        2014   2015    2016   2017
Ankit  30000  30000  100000  80000
Shruti 40000  45000  125000  90000

10. f) Use the DataFrame created in Question 9 above to display the first two columns of Sales.

>>> Sales[['2014', '2015']]
         2014   2015
Madhu   100.5  12000
Kusum   150.8  18000
Kinshuk 200.9  22000
Ankit   30000  30000
Shruti  40000  45000

10. g) Use the DataFrame created in Question 9 above to create a dictionary using the following data. Use this dictionary to create a DataFrame Sales2.

| | 2018 |
| --- | --- |
| Madhu | 160000 |
| Kusum | 110000 |
| Kinshuk | 500000 |
| Ankit | 340000 |
| Shruti | 900000 |

>>> sales_data_2018 = {'2018': [160000, 110000, 500000, 340000, 900000]}
>>> Sales2 = pd.DataFrame(sales_data_2018, index=['Madhu', 'Kusum', 'Kinshuk', 'Ankit', 'Shruti'])
>>> Sales2
          2018
Madhu   160000
Kusum   110000
Kinshuk 500000
Ankit   340000
Shruti  900000

10. h) Use the DataFrame created in Question 9 above to check if Sales2 is empty or it contains data:

>>> Sales2.empty
False

11.

Use the DataFrame created in Question 9 above to do the following:

Append the DataFrame Sales2 to the DataFrame Sales.

Change the DataFrame Sales such that it becomes its transpose.

Display the sales made by all sales persons in the year 2017.

Display the sales made by Madhu and Ankit in the year 2017 and 2018.

Display the sales made by Shruti in 2016.

Add data to Sales for salesman Sumeet where the sales made are [196.2, 37800, 52000, 78438, 38852] in the years [2014, 2015, 2016, 2017, 2018] respectively.

Delete the data for the year 2014 from the DataFrame Sales.

Delete the data for sales man Kinshuk from the DataFrame Sales.

Change the name of the salesperson Ankit to Vivaan and Madhu to Shailesh.

Update the sale made by Shailesh in 2018 to 100000.

Write the values of DataFrame Sales to a comma separated file SalesFigures.csv on the disk. Do not write the row labels and column labels.

Read the data in the file SalesFigures.csv into a DataFrame SalesRetrieved and Display it. Now update the row labels and column labels of SalesRetrieved to be the same as that of Sales.

11. a) Use the DataFrame created in Question 9 above to append the DataFrame Sales2 to the DataFrame Sales:

# Append Sales2 to Sales as a new column
>>> Sales = pd.concat([Sales, Sales2], axis=1)
# Print the modified Sales DataFrame
>>> Sales
         2014   2015    2016   2017    2018
Madhu   100.5  12000   20000  50000  160000
Kusum   150.8  18000   50000  60000  110000
Kinshuk 200.9  22000   70000  70000  500000
Ankit   30000  30000  100000  80000  340000
Shruti  40000  45000  125000  90000  900000
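The answer uses pd.concat because the older DataFrame.append method is no longer available from pandas 2.0 onward. With axis=1, the frames are placed side by side and rows are matched by label. The sketch below uses two hypothetical one-column frames, left and right, whose row labels are in different orders.

```python
import pandas as pd

# Hypothetical two-person example with the row labels in different orders.
left = pd.DataFrame({"2017": [50000, 60000]}, index=["Madhu", "Kusum"])
right = pd.DataFrame({"2018": [110000, 160000]}, index=["Kusum", "Madhu"])

# axis=1 joins column-wise and matches rows by label, not by position.
combined = pd.concat([left, right], axis=1)
print(combined)
```

Each salesperson's 2018 figure lands in the correct row even though right lists the names in a different order.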

11. b) Use the DataFrame created in Question 9 above to change the DataFrame Sales such that it becomes its transpose:

# Transpose the Sales DataFrame
>>> Sales = Sales.T
# Print the transposed Sales DataFrame
>>> Sales
       Madhu   Kusum  Kinshuk  Ankit  Shruti
2014   100.5   150.8    200.9  30000   40000
2015   12000   18000    22000  30000   45000
2016   20000   50000    70000 100000  125000
2017   50000   60000    70000  80000   90000
2018  160000  110000   500000 340000  900000
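One detail worth knowing: when a DataFrame with both integer and float columns is transposed, every new column mixes values from both kinds of column, so pandas stores all of them as float64 and displays whole numbers with a trailing .0. A minimal sketch, using a hypothetical two-row cut of Sales named small:

```python
import pandas as pd

# A reduced version of Sales: one float column and one integer column.
small = pd.DataFrame(
    {"2014": [100.5, 30000], "2015": [12000, 30000]},
    index=["Madhu", "Ankit"],
)
print(small.dtypes)      # 2014 is float64, 2015 is int64

# After transposing, every column holds floats.
flipped = small.T
print(flipped.dtypes)
print(flipped.loc["2015"])
```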

11. c) Use the DataFrame created in Question 9 above to display the sales made by all sales persons in the year 2017:

>>> Sales.loc['2017']
Madhu      50000.0
Kusum      60000.0
Kinshuk    70000.0
Ankit      80000.0
Shruti     90000.0
Name: 2017, dtype: float64

11. d) Use the DataFrame created in Question 9 above to display the sales made by Madhu and Ankit in the year 2017 and 2018:

>>> Sales.loc[['2017', '2018'], ['Madhu', 'Ankit']]
       Madhu   Ankit
2017   50000   80000
2018  160000  340000

11. e) Use the DataFrame created in Question 9 above to display the sales made by Shruti in 2016:

>>> Sales.loc['2016', 'Shruti']
125000

11. f) Use the DataFrame created in Question 9 above to add data to Sales for salesman Sumeet where the sales made are [196.2, 37800, 52000, 78438, 38852] in the years [2014, 2015, 2016, 2017, 2018] respectively:

>>> Sales['Sumeet'] = [196.2, 37800, 52000, 78438, 38852]
>>> Sales
       Madhu   Kusum  Kinshuk  Ankit  Shruti  Sumeet
2014    100.5   150.8    200.9  30000   40000   196.2
2015   12000   18000    22000  30000   45000  37800
2016   20000   50000    70000 100000  125000  52000
2017   50000   60000    70000  80000   90000  78438
2018  160000  110000   500000 340000  900000  38852

11. g) Use the DataFrame created in Question 9 above to delete the data for the year 2014 from the DataFrame Sales:

>>> Sales.drop('2014', inplace=True)
>>> Sales
       Madhu   Kusum  Kinshuk  Ankit  Shruti Sumeet
2015   12000   18000    22000  30000   45000  37800
2016   20000   50000    70000 100000  125000  52000
2017   50000   60000    70000  80000   90000  78438
2018  160000  110000   500000 340000  900000  38852

11. h) Use the DataFrame created in Question 9 above to delete the data for sales man Kinshuk from the DataFrame Sales:

>>> Sales.drop('Kinshuk', axis=1, inplace=True)
>>> Sales
       Madhu   Kusum  Ankit  Shruti  Sumeet
2015   12000   18000  30000   45000   37800
2016   20000   50000 100000  125000   52000
2017   50000   60000  80000   90000   78438
2018  160000  110000 340000  900000   38852

11. i) Use the DataFrame created in Question 9 above to change the name of the salesperson Ankit to Vivaan and Madhu to Shailesh:

>>> Sales.rename(columns={'Ankit': 'Vivaan', 'Madhu': 'Shailesh'}, inplace=True)
>>> Sales
        Shailesh  Kusum  Vivaan  Shruti  Sumeet
2015   12000     18000  30000   45000   37800
2016   20000     50000 100000  125000   52000
2017   50000     60000  80000   90000   78438
2018  160000    110000 340000  900000   38852

11. j) Use the DataFrame created in Question 9 above to update the sale made by Shailesh in 2018 to 100000:

>>> Sales.loc['2018', 'Shailesh'] = 100000
>>> Sales
         Shailesh     Kusum Vivaan  Shruti  Sumeet
2015        12000     18000  30000   45000   37800
2016        20000     50000 100000  125000   52000
2017        50000     60000  80000   90000   78438
2018       100000    110000 340000  900000   38852

11. k) Use the DataFrame created in Question 9 above to write the values of DataFrame Sales to a comma separated file SalesFigures.csv on the disk. Do not write the row labels and column labels:

>>> Sales.to_csv('C:/NCERT/SalesFigures.csv', header=False, index=False)

Note: The path above assumes that you’re using the Windows operating system. On a different operating system, use a path that is valid there. For instance, on Linux it might be '/home/user_name/NCERT/SalesFigures.csv'. The rest of the command remains the same.

11. l) Use the DataFrame created in Question 9 above to read the data in the file SalesFigures.csv into a DataFrame SalesRetrieved and Display it. Now update the row labels and column labels of SalesRetrieved to be the same as that of Sales.

# Read data from SalesFigures.csv into a DataFrame
>>> SalesRetrieved = pd.read_csv('C:/NCERT/SalesFigures.csv', header=None)

# Display the DataFrame
>>> SalesRetrieved

# Update the row labels and column labels of SalesRetrieved to match Sales
>>> SalesRetrieved.columns = Sales.columns
>>> SalesRetrieved.index = Sales.index

# Display the updated DataFrame
>>> SalesRetrieved
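The write and read steps can be tried together without depending on a fixed folder. The sketch below is an illustration under stated assumptions: it uses a hypothetical two-column stand-in for Sales and writes to a temporary directory instead of C:/NCERT.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for Sales, kept small for illustration.
sales = pd.DataFrame(
    {"Shailesh": [12000, 20000], "Kusum": [18000, 50000]},
    index=["2015", "2016"],
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "SalesFigures.csv")
    # Write only the values: no header row and no index column.
    sales.to_csv(path, header=False, index=False)
    # header=None tells read_csv that the first line is data, not column names.
    retrieved = pd.read_csv(path, header=None)

# The labels were not saved, so they are copied back from the original frame.
retrieved.columns = sales.columns
retrieved.index = sales.index
print(retrieved)
```

Without header=None, read_csv would treat the first row of figures as column names and that row would be missing from the data.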