This page contains solutions to the case study in Chapter 6, Introduction to NumPy, of the NCERT Class 11 Informatics Practices textbook. If you are looking for the exercise solutions for this chapter, you can find them at Exercise Solutions.
Case Study Solutions
We have already learnt that a data set (or dataset) is a collection of data. Usually a data set corresponds to the contents of a database table, or a statistical data matrix, where every column of the table represents a particular variable, and each row corresponds to a member or an item etc. A data set lists values for each of the variables, such as height and weight of a student, for each row (item) of the data set. Open data refers to information released in a publicly accessible repository.
The Iris flower data set is an example of open data. It is also called Fisher’s Iris data set, as it was introduced by the British statistician and biologist Ronald Fisher in 1936. The Iris data set consists of 50 samples from each of three species of the Iris flower (Iris setosa, Iris virginica and Iris versicolor). Four features were measured for each sample: the length and the width of the sepals and petals, in centimeters. Based on the combination of these four features, Fisher developed a model to distinguish the species from one another. The full data set is freely available on the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/iris.
We shall use the following smaller section of this data set, which has 30 rows (10 rows for each of the three species). We shall include a column for the species number, which, as the rows below show, has the value 1 for Iris setosa, 2 for Iris versicolor and 3 for Iris virginica.
Sepal Length  Sepal Width  Petal Length  Petal Width  Iris             Species No
5.1           3.5          1.4           0.2          Iris-setosa      1
4.9           3            1.4           0.2          Iris-setosa      1
4.7           3.2          1.3           0.2          Iris-setosa      1
4.6           3.1          1.5           0.2          Iris-setosa      1
5             3.6          1.4           0.2          Iris-setosa      1
5.4           3.9          1.7           0.4          Iris-setosa      1
4.6           3.4          1.4           0.3          Iris-setosa      1
5             3.4          1.5           0.2          Iris-setosa      1
4.4           2.9          1.4           0.2          Iris-setosa      1
4.9           3.1          1.5           0.1          Iris-setosa      1
5.5           2.6          4.4           1.2          Iris-versicolor  2
6.1           3            4.6           1.4          Iris-versicolor  2
5.8           2.6          4             1.2          Iris-versicolor  2
5             2.3          3.3           1            Iris-versicolor  2
5.6           2.7          4.2           1.3          Iris-versicolor  2
5.7           3            4.2           1.2          Iris-versicolor  2
5.7           2.9          4.2           1.3          Iris-versicolor  2
6.2           2.9          4.3           1.3          Iris-versicolor  2
5.1           2.5          3             1.1          Iris-versicolor  2
5.7           2.8          4.1           1.3          Iris-versicolor  2
6.9           3.1          5.4           2.1          Iris-virginica   3
6.7           3.1          5.6           2.4          Iris-virginica   3
6.9           3.1          5.1           2.3          Iris-virginica   3
5.8           2.7          5.1           1.9          Iris-virginica   3
6.8           3.2          5.9           2.3          Iris-virginica   3
6.7           3.3          5.7           2.5          Iris-virginica   3
6.7           3            5.2           2.3          Iris-virginica   3
6.3           2.5          5             1.9          Iris-virginica   3
6.5           3            5.2           2            Iris-virginica   3
6.2           3.4          5.4           2.3          Iris-virginica   3
You may type this using any text editor (Notepad, gEdit or any other) in the way as shown below and store the file with a name called Iris.txt. (In case you wish to work with the entire dataset you could download a .csv file for the same from the Internet and save it as Iris.txt). The headers are:
sepal length, sepal width, petal length, petal width, iris, Species No
5.1, 3.5, 1.4, 0.2, Iris-setosa, 1
4.9, 3, 1.4, 0.2, Iris-setosa, 1
4.7, 3.2, 1.3, 0.2, Iris-setosa, 1
4.6, 3.1, 1.5, 0.2, Iris-setosa, 1
5, 3.6, 1.4, 0.2, Iris-setosa, 1
5.4, 3.9, 1.7, 0.4, Iris-setosa, 1
4.6, 3.4, 1.4, 0.3, Iris-setosa, 1
5, 3.4, 1.5, 0.2, Iris-setosa, 1
4.4, 2.9, 1.4, 0.2, Iris-setosa, 1
4.9, 3.1, 1.5, 0.1, Iris-setosa, 1
5.5, 2.6, 4.4, 1.2, Iris-versicolor, 2
6.1, 3, 4.6, 1.4, Iris-versicolor, 2
5.8, 2.6, 4, 1.2, Iris-versicolor, 2
5, 2.3, 3.3, 1, Iris-versicolor, 2
5.6, 2.7, 4.2, 1.3, Iris-versicolor, 2
5.7, 3, 4.2, 1.2, Iris-versicolor, 2
5.7, 2.9, 4.2, 1.3, Iris-versicolor, 2
6.2, 2.9, 4.3, 1.3, Iris-versicolor, 2
5.1, 2.5, 3, 1.1, Iris-versicolor, 2
5.7, 2.8, 4.1, 1.3, Iris-versicolor, 2
6.9, 3.1, 5.4, 2.1, Iris-virginica, 3
6.7, 3.1, 5.6, 2.4, Iris-virginica, 3
6.9, 3.1, 5.1, 2.3, Iris-virginica, 3
5.8, 2.7, 5.1, 1.9, Iris-virginica, 3
6.8, 3.2, 5.9, 2.3, Iris-virginica, 3
6.7, 3.3, 5.7, 2.5, Iris-virginica, 3
6.7, 3, 5.2, 2.3, Iris-virginica, 3
6.3, 2.5, 5, 1.9, Iris-virginica, 3
6.5, 3, 5.2, 2, Iris-virginica, 3
6.2, 3.4, 5.4, 2.3, Iris-virginica, 3
1. Load the data in the file Iris.txt in a 2-D array called iris.
Note: All the following questions require numpy to be imported. Importing it once per session (usually while answering question 1) is enough, but if you start directly from a later question, import numpy before you execute any commands.
>>> import numpy as np
>>> iris = np.genfromtxt('C:/NCERT/Iris.txt', skip_header=1, delimiter=',', dtype=float)
Note: The column at index 4 (the fifth column) holds text, but a numpy array can only hold data of one type. Do we get an error? No: genfromtxt stores nan for every value it cannot convert to float. Print the iris variable to check this yourself.
2. Drop the column whose index = 4 from the array iris.
>>> # Note that we have 30 rows of data in Iris.txt
>>> # The following statement means that
>>> # starting from row index 0 to 30 (30 is not included)
>>> # Consider all the rows
>>> # and consider the columns with indices 0, 1, 2, 3, 5
>>> # drop column 4
>>> iris = iris[0:30, [0, 1, 2, 3, 5]]
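Questions 1 and 2 can also be tried without a file on disk. The following is a minimal sketch, not the textbook's method: it assumes an in-memory string (read through io.StringIO) in place of C:/NCERT/Iris.txt, uses only the first row of each species, and drops the text column with np.delete instead of listing the columns to keep. The names sample_text, sample and trimmed are illustrative.

```python
import io
import numpy as np

# Header line plus the first row of each species from Iris.txt.
sample_text = """sepal length, sepal width, petal length, petal width, iris, Species No
5.1, 3.5, 1.4, 0.2, Iris-setosa, 1
5.5, 2.6, 4.4, 1.2, Iris-versicolor, 2
6.9, 3.1, 5.4, 2.1, Iris-virginica, 3
"""

# io.StringIO stands in for the file on disk (assumption for this sketch).
sample = np.genfromtxt(io.StringIO(sample_text), skip_header=1,
                       delimiter=',', dtype=float)

# The text column (index 4) cannot be converted to float, so it holds nan.
text_column_is_nan = np.isnan(sample[:, 4]).all()

# np.delete returns a new array without column 4 (axis=1 means columns).
trimmed = np.delete(sample, 4, axis=1)

print(text_column_is_nan)   # True
print(trimmed.shape)        # (3, 5)
```

Both approaches leave the original column order intact; np.delete only saves you from spelling out the indices 0, 1, 2, 3, 5.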
3. Display the shape, dimensions and size of iris.
>>> iris.shape
(30, 5)
>>> iris.ndim
2
>>> iris.size
150
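The three attributes are related: ndim is the length of the shape tuple, and size is the product of its entries. A small sketch, assuming an array of zeros (named demo here) as a stand-in for iris:

```python
import numpy as np

# Stand-in for iris after column 4 is dropped: 30 rows, 5 columns of zeros.
demo = np.zeros((30, 5))

rows, cols = demo.shape                 # shape is a tuple: (rows, columns)
print(demo.ndim == len(demo.shape))     # True: ndim counts the axes
print(demo.size == rows * cols)         # True: size counts all elements
```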
4. Split iris into three 2-D arrays, each array for a different species. Call them iris1, iris2, iris3.
>>> # Split into three arrays,
>>> # each array for a different species
>>> iris1, iris2, iris3 = np.split(iris, [10,20], axis=0)
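np.split with [10, 20] cuts by position, so it relies on the rows being grouped by species. As a hedged alternative, the sketch below selects rows by the species number in the last column, which would also work if the rows were shuffled. It uses two rows per species from the table; the names rows and by_label are illustrative.

```python
import numpy as np

# Two rows per species from the table (text column already dropped).
rows = np.array([
    [5.1, 3.5, 1.4, 0.2, 1],
    [4.9, 3.0, 1.4, 0.2, 1],
    [5.5, 2.6, 4.4, 1.2, 2],
    [6.1, 3.0, 4.6, 1.4, 2],
    [6.9, 3.1, 5.4, 2.1, 3],
    [6.7, 3.1, 5.6, 2.4, 3],
])

# Positional split: cut before row 2 and before row 4.
a, b, c = np.split(rows, [2, 4], axis=0)

# Label-based split: keep the rows whose last column equals the species number.
by_label = [rows[rows[:, 4] == n] for n in (1, 2, 3)]

# Both approaches give the same three arrays for grouped data.
same = all(np.array_equal(x, y) for x, y in zip((a, b, c), by_label))
print(same)   # True
```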
5. Print the three arrays iris1, iris2, iris3.
>>> # Print the three arrays
>>> iris1
array([[5.1, 3.5, 1.4, 0.2, 1. ],
[4.9, 3. , 1.4, 0.2, 1. ],
[4.7, 3.2, 1.3, 0.2, 1. ],
[4.6, 3.1, 1.5, 0.2, 1. ],
[5. , 3.6, 1.4, 0.2, 1. ],
[5.4, 3.9, 1.7, 0.4, 1. ],
[4.6, 3.4, 1.4, 0.3, 1. ],
[5. , 3.4, 1.5, 0.2, 1. ],
[4.4, 2.9, 1.4, 0.2, 1. ],
[4.9, 3.1, 1.5, 0.1, 1. ]])
>>> iris2
array([[5.5, 2.6, 4.4, 1.2, 2. ],
[6.1, 3. , 4.6, 1.4, 2. ],
[5.8, 2.6, 4. , 1.2, 2. ],
[5. , 2.3, 3.3, 1. , 2. ],
[5.6, 2.7, 4.2, 1.3, 2. ],
[5.7, 3. , 4.2, 1.2, 2. ],
[5.7, 2.9, 4.2, 1.3, 2. ],
[6.2, 2.9, 4.3, 1.3, 2. ],
[5.1, 2.5, 3. , 1.1, 2. ],
[5.7, 2.8, 4.1, 1.3, 2. ]])
>>> iris3
array([[6.9, 3.1, 5.4, 2.1, 3. ],
[6.7, 3.1, 5.6, 2.4, 3. ],
[6.9, 3.1, 5.1, 2.3, 3. ],
[5.8, 2.7, 5.1, 1.9, 3. ],
[6.8, 3.2, 5.9, 2.3, 3. ],
[6.7, 3.3, 5.7, 2.5, 3. ],
[6.7, 3. , 5.2, 2.3, 3. ],
[6.3, 2.5, 5. , 1.9, 3. ],
[6.5, 3. , 5.2, 2. , 3. ],
[6.2, 3.4, 5.4, 2.3, 3. ]])
6. Create a 1-D array header having elements “sepal length”, “sepal width”, “petal length”, “petal width”, “Species No” in that order.
>>> header = np.array(["sepal length", "sepal width", "petal length", "petal width", "Species No"])
7. Display the array header.
>>> print(header)
['sepal length' 'sepal width' 'petal length' 'petal width' 'Species No']
8. Find the max, min, mean and standard deviation for the columns of iris and store the results in the arrays iris_max, iris_min, iris_avg, iris_std, iris_var respectively. The results must be rounded to not more than two decimal places.
>>> # Stats for array iris
>>> # Finds the max of the data for sepal length, sepal width, petal length, petal width, Species No
>>> iris_max = iris.max(axis=0)
>>> iris_max
array([6.9, 3.9, 5.9, 2.5, 3. ])
>>> # Finds the min of the data for sepal length, sepal
>>> # width, petal length, petal width, Species No
>>> iris_min = iris.min(axis=0)
>>> iris_min
array([4.4, 2.3, 1.3, 0.1, 1. ])
>>> # Finds the mean of the data for sepal length, sepal
>>> # width, petal length, petal width, Species No
>>> iris_avg = iris.mean(axis=0).round(2)
>>> iris_avg
array([5.68, 3.03, 3.61, 1.22, 2. ])
>>> # Finds the standard deviation of the data for sepal
>>> # length, sepal width, petal length, petal width,
>>> # Species No
>>> iris_std = iris.std(axis=0).round(2)
>>> iris_std
array([0.76, 0.35, 1.65, 0.82, 0.82])
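The question also names iris_var, which the session above does not compute. On the full array it could be obtained as iris.var(axis=0).round(2). The sketch below shows the idea on the first five Iris setosa rows only (an assumption to keep the example short), and checks that the variance is the square of the standard deviation. Both var and std use the population formula (ddof=0) by default.

```python
import numpy as np

# First five Iris setosa rows from the table (four measurements only).
sample = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
])

sample_var = sample.var(axis=0)   # population variance per column
sample_std = sample.std(axis=0)   # population standard deviation per column

# The variance is the square of the standard deviation.
print(np.allclose(sample_var, sample_std ** 2))   # True
print(sample_var.round(2))
```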
9. Similarly find the max, min, mean and standard deviation for the columns of iris1, iris2 and iris3 and store the results in the arrays with appropriate names.
>>> iris1_max = iris1.max(axis=0)
>>> iris1_max
array([5.4, 3.9, 1.7, 0.4, 1. ])
>>> iris2_max = iris2.max(axis=0)
>>> iris2_max
array([6.2, 3. , 4.6, 1.4, 2. ])
>>> iris3_max = iris3.max(axis=0)
>>> iris3_max
array([6.9, 3.4, 5.9, 2.5, 3. ])
>>> iris1_min = iris1.min(axis=0)
>>> iris1_min
array([4.4, 2.9, 1.3, 0.1, 1. ])
>>> iris2_min = iris2.min(axis=0)
>>> iris2_min
array([5. , 2.3, 3. , 1. , 2. ])
>>> iris3_min = iris3.min(axis=0)
>>> iris3_min
array([5.8, 2.5, 5. , 1.9, 3. ])
>>> iris1_avg = iris1.mean(axis=0).round(2)
>>> iris1_avg
array([4.86, 3.31, 1.45, 0.22, 1. ])
>>> iris2_avg = iris2.mean(axis=0).round(2)
>>> iris2_avg
array([5.64, 2.73, 4.03, 1.23, 2. ])
>>> iris3_avg = iris3.mean(axis=0).round(2)
>>> iris3_avg
array([6.55, 3.04, 5.36, 2.2 , 3. ])
>>> iris1_std = iris1.std(axis=0).round(2)
>>> iris1_std
array([0.28, 0.29, 0.1 , 0.07, 0. ])
>>> iris2_std = iris2.std(axis=0).round(2)
>>> iris2_std
array([0.36, 0.22, 0.47, 0.11, 0. ])
>>> iris3_std = iris3.std(axis=0).round(2)
>>> iris3_std
array([0.34, 0.25, 0.28, 0.2 , 0. ])
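Writing twelve near-identical statements invites copy-paste mistakes. One possible way to avoid that, sketched below, is to loop over the species arrays and collect the statistics in a dictionary. The dictionary layout and the names species_arrays and stats are assumptions of this sketch; the arrays hold only two rows per species from the table.

```python
import numpy as np

# Two rows per species from the table, stand-ins for iris1, iris2, iris3.
species_arrays = {
    "iris1": np.array([[5.1, 3.5, 1.4, 0.2, 1], [4.9, 3.0, 1.4, 0.2, 1]]),
    "iris2": np.array([[5.5, 2.6, 4.4, 1.2, 2], [6.1, 3.0, 4.6, 1.4, 2]]),
    "iris3": np.array([[6.9, 3.1, 5.4, 2.1, 3], [6.7, 3.1, 5.6, 2.4, 3]]),
}

# stats[name][statistic] is a 1-D array with one value per column.
stats = {}
for name, arr in species_arrays.items():
    stats[name] = {
        "max": arr.max(axis=0),
        "min": arr.min(axis=0),
        "avg": arr.mean(axis=0).round(2),
        "std": arr.std(axis=0).round(2),
    }

print(stats["iris1"]["max"])
print(stats["iris3"]["min"])
```

With this layout, stats["iris2"]["avg"] plays the role of iris2_avg in the session above.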
10. Check the minimum value for sepal length, sepal width, petal length and petal width of the three species in comparison to the minimum value of sepal length, sepal width, petal length and petal width for the data set as a whole and fill the table below with True if the species value is greater than the dataset value and False otherwise.
                Iris setosa   Iris virginica   Iris versicolor
sepal length
sepal width
petal length
petal width
>>> # min sepal length of each species
>>> # Vs the min sepal length in the data set
>>> iris1_min[0] > iris_min[0] #sepal length
False
>>> iris2_min[0] > iris_min[0]
True
>>> iris3_min[0] > iris_min[0]
True
>>> iris1_min[1] > iris_min[1] #sepal width
True
>>> iris2_min[1] > iris_min[1]
False
>>> iris3_min[1] > iris_min[1]
True
>>> iris1_min[2] > iris_min[2] #petal length
False
>>> iris2_min[2] > iris_min[2]
True
>>> iris3_min[2] > iris_min[2]
True
>>> iris1_min[3] > iris_min[3] #petal width
False
>>> iris2_min[3] > iris_min[3]
True
>>> iris3_min[3] > iris_min[3]
True
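The following is the filled-in table. In the data, species number 2 is Iris versicolor (iris2) and species number 3 is Iris virginica (iris3):
                Iris setosa (iris1)   Iris virginica (iris3)   Iris versicolor (iris2)
sepal length    False                 True                     True
sepal width     True                  True                     False
petal length    False                 True                     True
petal width     False                 True                     True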
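The twelve comparisons can also be made in one step with array comparison. This is a sketch rather than the textbook's approach: the minimum values are copied from the outputs of questions 8 and 9, and the names species_min and greater are illustrative.

```python
import numpy as np

# Minimum values copied from the outputs of questions 8 and 9.
iris_min = np.array([4.4, 2.3, 1.3, 0.1, 1.0])
iris1_min = np.array([4.4, 2.9, 1.3, 0.1, 1.0])   # Iris setosa
iris2_min = np.array([5.0, 2.3, 3.0, 1.0, 2.0])   # Iris versicolor
iris3_min = np.array([5.8, 2.5, 5.0, 1.9, 3.0])   # Iris virginica

# Stack the species minima as rows and drop the species-number column,
# then compare every row with the data set minima (broadcasting).
species_min = np.vstack([iris1_min, iris2_min, iris3_min])[:, :4]
greater = species_min > iris_min[:4]

# Transpose so that rows are features and columns are iris1, iris2, iris3.
print(greater.T)
# [[False  True  True]
#  [ True False  True]
#  [False  True  True]
#  [False  True  True]]
```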
11. Compare Iris setosa’s average sepal width to that of Iris virginica.
>>> # Compare Iris setosa (iris1) with Iris virginica (iris3, species number 3 in the data)
>>> iris1_avg[1] > iris3_avg[1] #sepal width
True
12. Compare Iris setosa’s average petal length to that of Iris virginica.
>>> iris1_avg[2] > iris3_avg[2] #petal length (iris3 holds Iris virginica)
False
13. Compare Iris setosa’s average petal width to that of Iris virginica.
>>> iris1_avg[3] > iris3_avg[3] #petal width (iris3 holds Iris virginica)
False
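Questions 11 to 13 can be answered together by slicing the three columns of interest. In this sketch the means are copied from the outputs of question 9, with iris3_avg taken as Iris virginica because the data rows label species number 3 that way. The names setosa_larger and difference are illustrative.

```python
import numpy as np

# Column means copied from the outputs of question 9.
iris1_avg = np.array([4.86, 3.31, 1.45, 0.22, 1.0])   # Iris setosa
iris3_avg = np.array([6.55, 3.04, 5.36, 2.2, 3.0])    # Iris virginica

# Columns 1 to 3 are sepal width, petal length and petal width.
setosa_larger = iris1_avg[1:4] > iris3_avg[1:4]
difference = (iris1_avg[1:4] - iris3_avg[1:4]).round(2)

print(setosa_larger)   # [ True False False]
print(difference)      # setosa mean minus virginica mean, per feature
```

The difference array shows how far apart the means are, which a bare True or False does not.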
14. Save the array iris_avg in a comma separated file named IrisMeanValues.txt on the hard disk.
>>> np.savetxt('C:/NCERT/IrisMeanValues.txt', iris_avg, delimiter = ',')
15. Save the arrays iris_max, iris_avg, iris_min in a comma separated file named IrisStat.txt on the hard disk.
>>> np.savetxt('C:/NCERT/IrisStat.txt', (iris_max, iris_avg, iris_min), delimiter=',')
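By default np.savetxt writes each number in scientific notation (format '%.18e'). The sketch below shows one way to check the saved file and keep it readable: it assumes a temporary folder in place of C:/NCERT, passes fmt='%.2f' to write two decimals, and reads the file back with np.loadtxt. The array values are copied from the outputs of question 8.

```python
import os
import tempfile
import numpy as np

# Values copied from the outputs of question 8.
iris_max = np.array([6.9, 3.9, 5.9, 2.5, 3.0])
iris_avg = np.array([5.68, 3.03, 3.61, 1.22, 2.0])
iris_min = np.array([4.4, 2.3, 1.3, 0.1, 1.0])

# A temporary folder replaces C:/NCERT so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "IrisStat.txt")
    # The tuple of three 1-D arrays is saved as three rows of five values.
    np.savetxt(path, (iris_max, iris_avg, iris_min), delimiter=',', fmt='%.2f')
    restored = np.loadtxt(path, delimiter=',')

print(restored.shape)                       # (3, 5)
print(np.allclose(restored[1], iris_avg))   # True
```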