Understanding Data

This page contains the solutions to the Exercise of Chapter 5, Understanding Data, from the NCERT Class 11 Informatics Practices textbook.
Exercise
1. Identify data required to be maintained to perform the following services:
a)
Declare exam results and print e-certificates
b)
Register participants in an exhibition and issue biometric ID cards
c)
To search for an image by a search engine
d)
To book an OPD appointment with a hospital in a specific department
The data required to be maintained to perform the given services is as follows:
S.No. | Service | Data Required
a) | Declare exam results and print e-certificates | List of student names and roll numbers; marks obtained by each student in different subjects; pass marks and maximum marks for each subject; certificate template design and format; student photographs
b) | Register participants in an exhibition and issue biometric ID cards | Participant's name, contact details and address, and whether the participant is a VIP visitor; exhibition details such as location, date and time; biometric information such as fingerprint or facial recognition; ID card format and design
c) | To search for an image by a search engine | An index of all images available for search; tags and labels associated with each image; search queries entered by the user; user's search history and preferences; information about the user's device and location
d) | To book an OPD appointment with a hospital in a specific department | Patient's personal details such as name, age and contact details; patient's medical history, current condition and diagnosis; doctor's availability and schedule; hospital's department, location and contact details; appointment booking confirmation and reminders

2. A school having 500 students wants to identify beneficiaries of the merit-cum-means scholarship, achieving more than 75% for two consecutive years and having family income less than 5 lakh per annum.
Briefly describe data processing steps to be taken by the school to prepare the list of beneficiaries.
To prepare the list of students who are eligible for the merit-cum means scholarship, the school would need to perform the following data processing steps:
i.
Collect and maintain data on all students’ academic performance for the last two years, including their grades and attendance records.
ii.
Collect and maintain data on each student's family income, which may involve requesting income-related documents from parents/guardians.
iii.
Identify students who have achieved more than 75% for two consecutive years and have a family income less than 5 lakh per annum.
iv.
Compile a list of eligible students and verify the accuracy of the data.
v.
Notify eligible students and their parents/guardians about the scholarship, and request any additional documentation or information as required.
vi.
Prepare the necessary paperwork for disbursing the scholarship, including creating a payment schedule and verifying payment information.
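The identification step (iii) can be sketched in a few lines of Python. This is only an illustration: the student records below are hypothetical, and income is assumed to be stored in rupees, so 5 lakh is written as 500000.

```python
# Hypothetical records: (roll_no, percent_year1, percent_year2, family_income_in_rupees)
students = [
    (1, 82.0, 78.5, 350_000),
    (2, 91.0, 74.0, 200_000),
    (3, 76.0, 80.0, 650_000),
    (4, 88.5, 90.0, 480_000),
]

MIN_PERCENT = 75.0    # must score more than this in both years
MAX_INCOME = 500_000  # family income must be less than 5 lakh per annum


def is_eligible(record):
    """Apply both scholarship conditions to one student record."""
    _, year1, year2, income = record
    return year1 > MIN_PERCENT and year2 > MIN_PERCENT and income < MAX_INCOME


# Keep only the roll numbers of students who satisfy every condition.
beneficiaries = [record[0] for record in students if is_eligible(record)]
print(beneficiaries)  # [1, 4]
```

Student 2 is excluded for scoring 74% in the second year, and student 3 for a family income above the limit.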
3. A bank ‘xyz’ wants to know about its popularity among the residents of a city ‘ABC’ on the basis of number of bank accounts each family has and the average monthly account balance of each person. Briefly describe the steps to be taken for collecting data and what results can be checked through processing of the collected data.
To collect data on the bank’s popularity among the residents of the city, the following steps can be taken:
1.
Sampling: A representative sample of the population in the city can be selected randomly to collect the required data.
2.
Data collection: The data can be collected by conducting surveys or questionnaires among the selected sample. The surveys can include questions about the number of bank accounts each family has in the bank ‘xyz’ and the average monthly account balance of each person.
3.
Data entry: The collected data can be entered into a database or spreadsheet for further processing.
4.
Data processing: The collected data can be analyzed using statistical tools and techniques to identify trends and patterns. The bank can check the following results from the processing of the collected data:
The percentage of families in the city who have accounts with bank ‘xyz’.
The average number of bank accounts per family in the city.
The average monthly account balance of each person with bank ‘xyz’.
The distribution of accounts and balances across different age groups, income levels, and other demographic factors.
The growth rate of accounts and balances over time.
Based on the results, the bank can make strategic decisions to improve its popularity among the residents of the city. For example, if the bank finds that it has a low percentage of accounts among families in the city, it can launch promotional campaigns to attract more customers. If the bank finds that the average monthly balance of each person is low, it can introduce new savings products or introduce additional benefits to the account holders to encourage customers to save more.
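As a rough sketch of the processing step, the first few results listed above could be computed as follows. The survey sample is hypothetical, and the field names `accounts_with_xyz` and `balances` are assumptions made for this illustration only.

```python
from statistics import mean

# Hypothetical survey sample: one entry per family.
# accounts_with_xyz: number of accounts the family holds with bank 'xyz'
# balances: average monthly balance of each account holder in that family
survey = [
    {"family": "F1", "accounts_with_xyz": 2, "balances": [12000, 8000]},
    {"family": "F2", "accounts_with_xyz": 0, "balances": []},
    {"family": "F3", "accounts_with_xyz": 1, "balances": [25000]},
    {"family": "F4", "accounts_with_xyz": 3, "balances": [5000, 7000, 9000]},
]

# Percentage of surveyed families that hold at least one account with the bank.
families_with_account = [f for f in survey if f["accounts_with_xyz"] > 0]
percent_families = 100 * len(families_with_account) / len(survey)

# Average number of accounts per family (families with no account count as 0).
avg_accounts_per_family = mean(f["accounts_with_xyz"] for f in survey)

# Average monthly balance per person, pooled over all account holders.
all_balances = [b for f in survey for b in f["balances"]]
avg_balance_per_person = mean(all_balances)

print(percent_families, avg_accounts_per_family, avg_balance_per_person)
```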
4. Identify type of data being collected/generated in the following scenarios:
a)
Recording a video
b)
Marking attendance by teacher
c)
Writing tweets
d)
Filling an application form online
The following is the type of data collected/generated in the given scenarios:
S.No. | Scenario | Type of data collected/generated
a) | Recording a video | Audio/Visual or Multimedia Data
b) | Marking attendance by teacher | Student attendance data
c) | Writing tweets | Social media text data
d) | Filling an application form online | User Input which is text/image/document data
5. Consider the temperature (in Celsius) of 7 days of a week as 34, 34, 27, 28, 27, 34, 34. Identify the appropriate statistical technique to be used to calculate the following:
a)
Find the average temperature.
b)
Find the temperature Range of that week.
c)
Find the standard deviation temperature.
a. Finding the average temperature:
To find the average temperature, we can use the measure of central tendency ‘mean’, also known as ‘average’.
Mean can be found using the formula
Mean
{= \dfrac{\text{Sum of Values}}{\text{No. of Values}}}
{= \dfrac{34 + 34 + 27 + 28 + 27 + 34 + 34}{7}}
{= \dfrac{218}{7}}
≅ 31.14°C
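The same calculation can be checked with a short Python sketch, using the seven temperatures given in the question:

```python
temperatures = [34, 34, 27, 28, 27, 34, 34]

# Mean = sum of values / number of values
mean_temperature = sum(temperatures) / len(temperatures)
print(round(mean_temperature, 2))  # 31.14
```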
b. Finding the temperature Range of that week:
To find the temperature range of that week, we need to use the measure of variability/dispersion ‘Range’, which is defined as the difference between the maximum and minimum values of the data. So,
Temperature Range
= Maximum Temperature – Minimum Temperature
= 34°C – 27°C
= 7°C
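In Python, the range can be obtained from the built-in `max` and `min` functions, as in this small sketch:

```python
temperatures = [34, 34, 27, 28, 27, 34, 34]

# Range = maximum value - minimum value
temperature_range = max(temperatures) - min(temperatures)
print(temperature_range)  # 7
```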
c. Finding the standard deviation of the temperature:
The standard deviation can be calculated as follows:
Temperature (x) in °C | {x - \bar{x}} | {(x - \bar{x})^2}
34 | 2.86 | 8.1796
34 | 2.86 | 8.1796
27 | -4.14 | 17.1396
28 | -3.14 | 9.8596
27 | -4.14 | 17.1396
34 | 2.86 | 8.1796
34 | 2.86 | 8.1796
{n = 7}
{\bar{x} = 31.14°\text{C}}
{\sum(x - \bar{x})^2 = 76.8572}
Now, the Standard Deviation σ can be calculated as follows:
σ
{= \sqrt{\dfrac{\sum(x - \bar{x})^2}{n}}}
{= \sqrt{\dfrac{76.8572}{7}}}
{= \sqrt{10.9796}}
≅ 3.31°C
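The calculation above divides by n, which is the population standard deviation. A minimal Python sketch that repeats the steps by hand and cross-checks them against `statistics.pstdev` from the standard library:

```python
from math import sqrt
from statistics import pstdev

temperatures = [34, 34, 27, 28, 27, 34, 34]
n = len(temperatures)
mean_temperature = sum(temperatures) / n

# Squared deviation of each value from the mean, as in the table above.
squared_deviations = [(x - mean_temperature) ** 2 for x in temperatures]

# Population standard deviation: square root of the mean squared deviation.
sigma_manual = sqrt(sum(squared_deviations) / n)
sigma_library = pstdev(temperatures)

print(round(sigma_manual, 2), round(sigma_library, 2))  # 3.31 3.31
```

Note that `statistics.stdev` divides by n - 1 instead (the sample standard deviation) and would give a slightly larger value.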
6. A school teacher wants to analyse results. Identify the appropriate statistical technique to be used along with its justification for the following cases:
a)
Teacher wants to compare performance in terms of division secured by students in Class XII A and Class XII B where each class strength is same.
b)
Teacher has conducted five unit tests for that class in months July to November and wants to compare the class performance in these five months.
a)
To compare the performance in terms of division secured by students in Class XII A and Class XII B where each class strength is the same, the appropriate statistical technique would be the mean of the division secured. This is because the mean provides the average value of the data and can help in comparing the central tendencies of the two classes.
b)
In this case also, the appropriate statistical technique is the mean of the class performance in each of the five months, as it provides the average value of the data and can help in comparing the central tendencies of the class performance over the five months.
Along with the mean, the teacher can also use the Standard Deviation to see which class has the minimum deviation from the mean. A lower standard deviation indicates that the students' marks are close to the mean value. A higher standard deviation indicates that more students are securing either far lower or far higher marks than the average.
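To illustrate why the standard deviation is useful alongside the mean, here is a small sketch with made-up marks for two classes. The numbers are hypothetical and chosen so that both classes have the same mean but a different spread.

```python
from statistics import mean, pstdev

# Hypothetical marks for five students in each class.
class_a = [72, 65, 80, 58, 90]
class_b = [70, 71, 74, 73, 77]

mean_a, mean_b = mean(class_a), mean(class_b)
spread_a, spread_b = pstdev(class_a), pstdev(class_b)

print(mean_a, mean_b)                        # both means are 73
print(round(spread_a, 2), round(spread_b, 2))

# The class with the smaller standard deviation performed more consistently.
more_consistent = "Class A" if spread_a < spread_b else "Class B"
print(more_consistent)  # Class B
```

The means alone cannot tell the two classes apart here; the standard deviation shows that the marks in Class B are clustered much more tightly around the average.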
7. Suppose annual day of your school is to be celebrated. The school has decided to felicitate those parents of the students studying in classes XI and XII, who are the alumni of the same school. In this context, answer the following questions:
a)
Which statistical technique should be used to find out the number of students whose both parents are alumni of this school?
b)
How varied are the age of parents of the students of that school?
a)
For finding out the number of students whose both parents are alumni of the school, the appropriate statistical technique would be to calculate the mode of the number of alumni parents. This would give the most commonly occurring number of parents who are alumni of the school.
b)
To determine the variability of the age of parents of the students of that school, the appropriate statistical technique would be to calculate the standard deviation of the ages. This would give an idea of how spread out the ages are from the average age.
Additional Explanation for case (a): The mode is the statistical measure that identifies the most frequent observation or value in a dataset. In this case, we can assume that the number of students whose both parents are alumni of the same school will be relatively small compared to the total number of students in class XI and XII. Therefore, we can expect that there will be only few students whose both parents are alumni.
Using the mode to find the number of such students is appropriate because it allows us to identify the most frequently occurring value in the dataset. In this case, the most frequently occurring value will be "0" because most of the students' parents will not be alumni of the same school. The mode will help us to identify the number of students whose both parents are alumni of the same school as the count of observations with a value of "2" (i.e., both parents are alumni), assuming that there are not many outliers or unusual values in the dataset.
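A small sketch of this idea, assuming each student is recorded as the number of parents (0, 1 or 2) who are alumni. The list below is hypothetical data for ten students.

```python
from collections import Counter
from statistics import mode

# Hypothetical data: number of alumni parents for each of ten students.
alumni_parents = [0, 0, 1, 2, 0, 1, 0, 2, 0, 0]

# Frequency of each value (how many students have 0, 1 or 2 alumni parents).
frequency = Counter(alumni_parents)

most_common_value = mode(alumni_parents)  # the most frequent value
students_with_both = frequency[2]         # students whose both parents are alumni

print(most_common_value, students_with_both)  # 0 2
```

The frequency count is what actually answers the question; the mode only reports which value occurs most often.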
8. For the annual day celebrations, the teacher is looking for an anchor in a class of 42 students. The teacher would make selection of an anchor on the basis of singing skill, writing skill, as well as monitoring skill.
a)
Which mode of data collection should be used?
b)
How would you represent the skill of students as data?
a)
The mode of data collection that can be used in this case is direct observation or self-reporting. Direct observation can be done by the teacher or any other expert who can judge the students’ skills in singing, writing and monitoring. Self-reporting can be done by asking the students to rate their own skills in these areas.
b)
The skills of the students can be represented as quantitative data. The singing and writing skills can be represented on a scale of 1 to 10, where 1 represents the lowest level of skill and 10 represents the highest level of skill. The monitoring skill can be represented on a scale of 1 to 5, where 1 represents poor monitoring skills and 5 represents excellent monitoring skills. The data for each student can be recorded in a table or a spreadsheet with columns for name, singing skill, writing skill, and monitoring skill. This data can then be used to compare the students and select the anchor for the annual day celebrations.
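One possible way to hold and compare such records is sketched below. The student names and scores are hypothetical, and adding the skills after rescaling each to the range 0 to 1 is only one assumed way of combining them; the teacher could equally weight the skills differently.

```python
# Hypothetical records; singing and writing are on a 1-10 scale, monitoring on 1-5.
students = [
    {"name": "Student A", "singing": 8, "writing": 6, "monitoring": 4},
    {"name": "Student B", "singing": 6, "writing": 9, "monitoring": 3},
    {"name": "Student C", "singing": 9, "writing": 7, "monitoring": 5},
]


def combined_score(student):
    """Rescale each skill to 0-1 so the different scales are comparable, then add."""
    return (student["singing"] / 10
            + student["writing"] / 10
            + student["monitoring"] / 5)


# The student with the highest combined score is the suggested anchor.
best = max(students, key=combined_score)
print(best["name"])  # Student C
```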
9. Differentiate between structured and unstructured data giving one example.
The following is the differentiation between structured and unstructured data.
Characteristic | Structured Data | Unstructured Data
Definition | Data that is organised and can be recorded in a well-defined format. | Data that is neither organised nor has a pre-defined format. Because of this, metadata is sometimes needed to describe it.
Organization | Organized in a tabular, hierarchical or networked format. | Not organized in any specific format.
Access | Easy to access and query using SQL or another programming language. | Difficult to access and query due to lack of structure.
Scalability | Limited scalability due to pre-defined schema. | Highly scalable due to lack of structure.
Analysis | Suitable for quantitative analysis and statistical processing. | Suitable for qualitative analysis and natural language processing.
Storage | Requires less storage space as data is organized and compressed. | Requires more storage space as data is unorganized and uncompressed.
Processing | Efficient for machine processing and automation. | Inefficient for machine processing and requires human intervention.
Accuracy | Highly accurate due to predefined structure and validation rules. | Less accurate due to lack of structure and potential for errors.
Security | Easier to secure as data is organized and access can be controlled at a granular level. | Difficult to secure as data is unstructured and access is more difficult to control.
Examples | Relational databases, spreadsheets, XML files etc. | Emails, web pages, social media posts, images, audio and video files etc.
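The difference in access can be illustrated with a short sketch. The names and marks are invented for the example: the same information is stored once as structured CSV text with named columns, and once as a free-form sentence.

```python
import csv
import io
import re

# Structured: every record has the same named fields.
structured = "roll_no,name,marks\n1,Asha,88\n2,Ravi,92\n"
rows = list(csv.DictReader(io.StringIO(structured)))
topper = max(rows, key=lambda row: int(row["marks"]))
print(topper["name"])  # Ravi

# Unstructured: no fixed fields, so the values must be dug out of the text.
unstructured = "Ravi did really well this term, scoring 92, while Asha got 88."
numbers = [int(n) for n in re.findall(r"\d+", unstructured)]
print(numbers)  # [92, 88]
```

With the structured data, the `marks` column can be queried directly. With the sentence, the pattern match finds the numbers but does not say which number belongs to which student without further text processing.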
10. The principal of a school wants to do following analysis on the basis of food items procured and sold in the canteen:
a)
Compare the purchase and sale price of fruit juice and biscuits.
b)
Compare sales of fruit juice, biscuits and samosa.
c)
Variation in sale price of fruit juices of different companies for same quantity (in ml).
Create an appropriate dataset for these items (fruit juice, biscuits, samosa) by listing their purchase price and sale price. Apply basic statistical techniques to make the comparisons.
The following is the dataset for these items (fruit juice, biscuits, samosa), in which their purchase price and sale price are listed:
Food Item | Company | Purchase Price (per unit) | Sale Price (per unit) | No. of Units Sold
Fruit Juice | Fresh | 20 | 30 | 120
Fruit Juice | Tasty | 12 | 22 | 150
Fruit Juice | Pure | 8 | 18 | 100
Biscuits | Britannia | 5 | 10 | 300
Biscuits | Parle | 4 | 9 | 250
Biscuits | Sunfeast | 6 | 12 | 200
Samosa | Homemade | 15 | 25 | 80
Samosa | Haldiram | 18 | 28 | 120
Samosa | Bikanervala | 20 | 30 | 100
a) To Compare the purchase and sale price of fruit juice and biscuits:
To compare the purchase and sale price of fruit juice and biscuits, we need to consider their mean or average.
Average Purchase Price of Fruit Juice
{= \dfrac{(20 × 120) + (12 × 150) + (8 × 100)}{120 + 150 + 100}}
{= \dfrac{2,400 + 1,800 + 800}{370}}
{= \dfrac{5,000}{370}}
= 13.51
Average Sale Price of Fruit Juice
{= \dfrac{(30 × 120) + (22 × 150) + (18 × 100)}{120 + 150 + 100}}
{= \dfrac{3,600 + 3,300 + 1,800}{370}}
{= \dfrac{8,700}{370}}
= 23.51
Average Purchase Price of Biscuits
{= \dfrac{(5 × 300) + (4 × 250) + (6 × 200)}{300 + 250 + 200}}
{= \dfrac{1,500 + 1,000 + 1,200}{750}}
{= \dfrac{3,700}{750}}
= 4.93
Average Sale Price of Biscuits
{= \dfrac{(10 × 300) + (9 × 250) + (12 × 200)}{300 + 250 + 200}}
{= \dfrac{3,000 + 2,250 + 2,400}{750}}
{= \dfrac{7,650}{750}}
= 10.2
From the above, we see that the average sale price is more than the average purchase price in case of both fruit juices and biscuits. So, we can conclude that the school canteen is able to make profits in both the cases (of selling fruit juices and biscuits).
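The averages above are weighted by the number of units sold. A minimal Python sketch of the same calculation, using the fruit juice and biscuit rows from the dataset (the helper name `weighted_average` is just a label chosen for this example):

```python
# Rows from the dataset: (purchase_price, sale_price, units_sold)
fruit_juice = [(20, 30, 120), (12, 22, 150), (8, 18, 100)]
biscuits = [(5, 10, 300), (4, 9, 250), (6, 12, 200)]


def weighted_average(rows, price_index):
    """Average price per unit, weighting each price by the units sold."""
    total_value = sum(row[price_index] * row[2] for row in rows)
    total_units = sum(row[2] for row in rows)
    return total_value / total_units


juice_purchase = round(weighted_average(fruit_juice, 0), 2)
juice_sale = round(weighted_average(fruit_juice, 1), 2)
biscuit_purchase = round(weighted_average(biscuits, 0), 2)
biscuit_sale = round(weighted_average(biscuits, 1), 2)

print(juice_purchase, juice_sale)      # 13.51 23.51
print(biscuit_purchase, biscuit_sale)  # 4.93 10.2
```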
b) To compare the sales of fruit juice, biscuits and samosa:
The following is the total number of units sold for each of the food items:
Fruit Juice
= 120 + 150 + 100
= 370
Biscuits
= 300 + 250 + 200
= 750
Samosa
= 80 + 120 + 100
= 300
The highest of these three totals is 750, which is for biscuits. So, we can conclude that the canteen sells more biscuits than any other food item.
Average Purchase Price of Samosa
{= \dfrac{(15 × 80) + (18 × 120) + (20 × 100)}{80 + 120 + 100}}
{= \dfrac{1,200 + 2,160 + 2,000}{300}}
{= \dfrac{5,360}{300}}
= 17.87
Average Sale Price of Samosa
{= \dfrac{(25 × 80) + (28 × 120) + (30 × 100)}{80 + 120 + 100}}
{= \dfrac{2,000 + 3,360 + 3,000}{300}}
{= \dfrac{8,360}{300}}
= 27.87
The following will be the average profit margin on each of these items.
Fruit Juice
= 23.51 – 13.51
= 10
Biscuits
= 10.20 – 4.93
= 5.27
Samosa
= 27.87 – 17.87
= 10
Thus, the profit margin on the sale of Fruit Juice and Samosa is equal and highest and the profit margin on the sale of Biscuits is lowest.
The following is the total profit in each of the categories:
Item | Profit Margin | No. of Items Sold | Profit
Fruit Juice | 10 | 370 | 3,700.00
Biscuits | 5.27 | 750 | 3,952.50
Samosa | 10 | 300 | 3,000.00
Comparing the total profits, we see that the sale of biscuits has earned the most profit and the sale of samosa the least.
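The totals for part (b) can be cross-checked with a short sketch that works row by row from the dataset. Computed this way, without rounding the margin first, the biscuit profit comes to 3,950, slightly below the 3,952.50 obtained from the rounded margin of 5.27; the ranking of the three items is unaffected.

```python
# Rows from the dataset: (purchase_price, sale_price, units_sold)
dataset = {
    "Fruit Juice": [(20, 30, 120), (12, 22, 150), (8, 18, 100)],
    "Biscuits": [(5, 10, 300), (4, 9, 250), (6, 12, 200)],
    "Samosa": [(15, 25, 80), (18, 28, 120), (20, 30, 100)],
}

# Total units sold per food item.
units_sold = {item: sum(units for _, _, units in rows)
              for item, rows in dataset.items()}

# Total profit per food item: (sale - purchase) * units, summed over companies.
total_profit = {item: sum((sale - purchase) * units for purchase, sale, units in rows)
                for item, rows in dataset.items()}

best_seller = max(units_sold, key=units_sold.get)
most_profitable = max(total_profit, key=total_profit.get)
least_profitable = min(total_profit, key=total_profit.get)

print(units_sold)
print(total_profit)
print(best_seller, most_profitable, least_profitable)
```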
c) Variation in sale price of fruit juices of different companies for same quantity (in ml):
From the dataset, the sale prices of fruit juice are 30 (Fresh), 22 (Tasty) and 18 (Pure). The Range of the sale prices is as follows:
Range
= 30 – 18
= 12
Thus, the variation of the sale price has a range of 12.
The following is the standard deviation of the sale price of fruit juice from various companies.
Company | Sale Price (x) | {x - \bar{x}} | {(x - \bar{x})^2}
Fresh | 30 | 6.67 | 44.4889
Tasty | 22 | -1.33 | 1.7689
Pure | 18 | -5.33 | 28.4089
{n = 3}
{\bar{x} = \dfrac{30 + 22 + 18}{3} = 23.33}
{\sum{(x - \bar{x})^2} = 74.6667}
Now, the Standard Deviation σ can be calculated as follows
σ
{= \sqrt{\dfrac{\sum{(x - \bar{x})^2}}{n}}}
{= \sqrt{\dfrac{74.6667}{3}}}
{= \sqrt{24.89}}
≅ 4.99
Thus, the standard deviation of the sale price of fruit juices from different companies is 4.99.