# Correlation

This page contains solutions to the exercises in Chapter 6, Correlation, of the NCERT Class 11 textbook Statistics for Economics. It covers the Short Answer Questions, Long Answer Questions and Projects/Assignments Questions for the chapter.
EXERCISES
1.
The unit of correlation coefficient between height in feet and weight in kgs is
(i) kg/feet
(ii) percentage
(iii) non-existent ✔
Explanation: The correlation coefficient, denoted as {r}, measures the strength and direction of a linear relationship between two variables, in this case, height in feet and weight in kilograms. We know that the correlation coefficient has no unit. It is a pure number, implying that units of measurement are not part of {r}. Therefore, the correlation coefficient between height and weight, regardless of their individual units (feet for height and kilograms for weight), is a dimensionless value. Hence, the correct answer is (iii) non-existent.
2.
The range of simple correlation coefficient is
(i) 0 to infinity
(ii) minus one to plus one ✔
(iii) minus infinity to infinity
Answer: (ii) minus one to plus one
Explanation: The simple correlation coefficient, often represented as {r}, is a measure used to indicate the strength and direction of a linear relationship between two variables. By definition, the range of the simple correlation coefficient is between minus one and plus one, denoted as {-1 ≤ r ≤ 1}. This range signifies that the correlation coefficient can take any value within this interval, where -1 indicates a perfect negative linear relationship, +1 indicates a perfect positive linear relationship, and values closer to 0 indicate a weaker linear relationship. Therefore, the correct answer is (ii) minus one to plus one.
3.
If {r_{xy}} is positive the relation between {X} and {Y} is of the type
(i) When {Y} increases {X} increases ✔
(ii) When {Y} decreases {X} increases
(iii) When {Y} increases {X} does not change
Answer: (i) When Y increases X increases
Explanation: When the correlation coefficient {r_{xy}} is positive, it indicates a direct or positive relationship between the two variables. This means that as one variable increases, the other variable also tends to increase. Conversely, as one variable decreases, the other tends to decrease as well. Therefore, the correct answer is (i) When {Y} increases {X} increases.
4.
If {r_{xy} = 0} the variable {X} and {Y} are
(i) linearly related
(ii) not linearly related ✔
(iii) independent
Explanation: When {r_{xy} = 0}, it indicates that there is no linear relationship between {X} and {Y}. This means that changes in one variable do not predict or correspond to changes in the other variable in a linear manner. However, it’s important to note that a zero correlation does not necessarily imply that the variables are completely independent in all respects; they may still have a non-linear relationship or be related through another variable. Thus, the correct answer is (ii) not linearly related.
5.
Of the following three measures which can measure any type of relationship
(i) Karl Pearson’s coefficient of correlation
(ii) Spearman’s rank correlation
(iii) Scatter diagram ✔
Explanation: Among the given options:
1. Karl Pearson’s coefficient of correlation is specifically designed for measuring the strength and direction of a linear relationship between two variables. It does not effectively measure non-linear relationships.
2. Spearman’s rank correlation is used for measuring the strength and direction of a monotonic relationship between two ranked variables. While it’s more flexible than Karl Pearson’s coefficient, it still focuses on a specific type of relationship (monotonic).
3. A scatter diagram, on the other hand, is a graphical representation that can display a wide range of relationships between two variables, including linear, non-linear, and more complex patterns. It doesn’t provide a numerical measure like the other two, but it’s the most versatile in visually representing any type of relationship.
Therefore, the correct answer is (iii) Scatter diagram, as it can visually represent any type of relationship between two variables, whether linear, non-linear, or otherwise.
6.
If precisely measured data are available the simple correlation coefficient is
(i) more accurate than rank correlation coefficient ✔
(ii) less accurate than rank correlation coefficient
(iii) as accurate as the rank correlation coefficient
Answer: (i) more accurate than rank correlation coefficient
Explanation: We know that the simple correlation coefficient (often referred to as Karl Pearson’s coefficient of correlation) and Spearman’s rank correlation coefficient are used for different purposes and types of data.
When data are precisely measured, the simple correlation coefficient is generally more accurate than the rank correlation coefficient. This is because the simple correlation coefficient utilizes the exact values of the data, thereby providing a more accurate measure of the strength and direction of the linear relationship between two variables.
In contrast, Spearman’s rank correlation coefficient is used when the data are not measured precisely or when dealing with ordinal data. It ranks the data and then measures the strength and direction of the relationship based on these ranks. While useful in its context, it does not use the actual data values, making it less accurate than the simple correlation coefficient when precise data are available.
Hence, the correct answer is (i) more accurate than rank correlation coefficient.
7.
Why is {r} preferred to covariance as a measure of association?
The preference for using the correlation coefficient {r} over covariance as a measure of association between two variables can be understood through the following points:
1. Standardization: The correlation coefficient {r} is a standardized measure. Unlike covariance, which depends on the units of X and Y, {r} is dimensionless/has no units. This standardization allows for a consistent and comparable measure across different datasets and variables.
2. Range of Values: {r} has a fixed range from -1 to 1, making it easier to interpret. A value of -1 indicates a perfect negative linear relationship, +1 indicates a perfect positive linear relationship, and 0 implies no linear relationship. Covariance, on the other hand, can take any value, making its interpretation less intuitive.
3. Relative Strength of Association: {r} indicates not only the direction but also the relative strength of the linear relationship between two variables. Covariance only indicates the direction of the relationship (positive or negative) but not the strength.
4. Comparison Across Different Variables: Because of its standardized nature, {r} allows for the comparison of the strength of association between different pairs of variables, which is not possible with covariance due to its dependence on the scale of the variables.
5. Wider Applicability: The correlation coefficient {r} is more widely applicable and understood in various fields like economics, statistics, and social sciences for assessing linear relationships.
These points highlight why {r} is often preferred over covariance as a measure of association.
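The standardization point can be checked numerically. The sketch below uses made-up heights and weights (hypothetical values, not taken from the textbook) and two small helper functions written only for this illustration. It expresses the same heights in feet and in centimeters and compares how covariance and {r} respond.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance: mean of the products of deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def pearson_r(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data, for illustration only.
height_ft = [5.0, 5.5, 5.8, 6.0, 6.2]
weight_kg = [52, 60, 68, 72, 80]
height_cm = [h * 30.48 for h in height_ft]  # same heights, different unit

cov_ft = covariance(height_ft, weight_kg)
cov_cm = covariance(height_cm, weight_kg)
r_ft = pearson_r(height_ft, weight_kg)
r_cm = pearson_r(height_cm, weight_kg)

print(f"covariance: {cov_ft:.3f} (feet) vs {cov_cm:.3f} (cm)")
print(f"r:          {r_ft:.4f} (feet) vs {r_cm:.4f} (cm)")
```

The covariance is multiplied by 30.48 when the unit changes, while {r} stays the same, which is why {r} can be compared across datasets and covariance cannot.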
8.
Can {r} lie outside the –1 and 1 range depending on the type of data?
No, the correlation coefficient {r} cannot lie outside the –1 to 1 range regardless of the type of data, due to the following reasons:
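Fixed Range: The value of the correlation coefficient {r} always lies within the range of -1 to +1. This is because the covariance of two variables can never exceed, in absolute value, the product of their standard deviations, so the ratio that defines {r} cannot exceed 1 in magnitude. The property holds irrespective of the type or nature of the data being analyzed.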
Interpretation of Values:
A value of -1 indicates a perfect negative linear relationship between two variables.
A value of +1 indicates a perfect positive linear relationship.
A value of 0 implies no linear relationship.
Calculation Error: If, in any statistical exercise, the calculated value of {r} falls outside this range, it usually indicates an error in the calculation process rather than a property of the data.
Applicable to All Data Types: This range of -1 to +1 applies to all types of data where {r} is used, whether the data are interval, ratio, or ordinal in nature.
Therefore, regardless of the type of data, {r} cannot exceed the limits of -1 and +1.
9.
Does correlation imply causation?
No, correlation does not imply causation. The reasons are as follows:
Correlation is Indicative, Not Conclusive: Correlation measures the strength and direction of a linear relationship between two variables, but it does not establish a cause-and-effect relationship between them.
Possibility of Spurious Correlations: There can be instances where two variables appear to be correlated, but their relationship is due to coincidence, or because they are both influenced by a third variable.
Need for Further Analysis: To establish causation, more in-depth analysis and experimentation are required. Correlation can be a starting point for such analysis but should not be used to draw conclusions about causality.
Example of Non-Causal Relationships: In economics, many correlated variables may not have a causal relationship. For instance, an increase in ice cream sales and an increase in drowning incidents might be correlated due to the season (summer) but eating ice cream doesn’t cause drowning.
Therefore, while correlation is a useful statistical tool to identify patterns and relationships, it should not be interpreted as evidence of causation.
10.
When is rank correlation more precise than simple correlation coefficient?
Rank correlation is more precise than the simple correlation coefficient in certain scenarios as specified below:
1. Non-Quantitative Data: Rank correlation is more suitable for ordinal data or non-quantitative data where ranking is possible but precise measurement is not. This includes scenarios where data are qualitative in nature, such as preferences, grades, or levels of satisfaction.
2. Non-Normal Distribution: If the data do not follow a normal distribution, rank correlation can provide a more accurate measure of association than the simple correlation coefficient, which assumes a linear relationship and is most effective with normally distributed data.
3. Presence of Outliers: Rank correlation is less affected by outliers compared to the simple correlation coefficient. This is because rank correlation depends on the order of values, not their exact magnitude.
4. Non-Linear Relationships: When the relationship between variables is monotonic but not linear, rank correlation can capture the strength and direction of the relationship more effectively than the simple correlation coefficient.
5. Small Sample Sizes: For small datasets, rank correlation can sometimes provide a more reliable measure of the relationship between variables.
Example for Better Understanding: Consider a scenario where a psychology researcher is studying the relationship between stress levels (measured as ‘Low’, ‘Medium’, ‘High’) and the number of hours spent on leisure activities per week. Since stress levels are qualitative data and may not follow a normal distribution, using rank correlation would be more precise. This method would accurately reflect the monotonic relationship between stress levels and leisure hours without being affected by the non-quantitative nature of the stress level data.
In summary, rank correlation is more precise than the simple correlation coefficient in situations involving non-quantitative data, non-normal distributions, outliers, non-linear relationships, and small sample sizes.
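The point about outliers can be illustrated with a small sketch. The numbers below are hypothetical, and the helper functions are written only for this example. The rank formula used is {1 - 6∑D^2 / (n(n^2 - 1))}, which is valid only when there are no tied ranks.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Simple (Karl Pearson) correlation from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank 1 for the smallest value; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Rank correlation from the squared rank differences (no ties)."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical data: the two Y series differ only in the last value.
x = [1, 2, 3, 4, 5, 6]
y_clean = [2, 3, 5, 4, 6, 7]
y_outlier = [2, 3, 5, 4, 6, 100]

r_clean, r_outlier = pearson_r(x, y_clean), pearson_r(x, y_outlier)
rho_clean, rho_outlier = spearman_rho(x, y_clean), spearman_rho(x, y_outlier)

print(f"Pearson:  {r_clean:.3f} -> {r_outlier:.3f}")
print(f"Spearman: {rho_clean:.3f} -> {rho_outlier:.3f}")
```

Replacing 7 with 100 does not change the order of the values, so the rank correlation is unchanged, while the simple correlation coefficient drops noticeably.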
11.
Does zero correlation mean independence?
No, a zero correlation does not necessarily mean that two variables are independent due to the following reasons:
Zero Correlation: A zero correlation, or {r = 0}, indicates that there is no linear relationship between the two variables. It means that changes in one variable do not predict changes in the other variable in a linear manner.
Independence: Independence is a stronger condition than zero correlation. Two variables are independent if the occurrence or the value of one variable does not affect the occurrence or value of another variable in any way.
Non-Linear Relationships: Even if two variables have zero linear correlation, they may still have some form of non-linear relationship. This means that they could be dependent in ways not captured by the correlation coefficient.
Other Factors: Sometimes, variables may appear to be independent (having zero correlation) due to the presence of other underlying variables affecting their relationship, or due to the nature of the data collected.
In summary, while zero correlation indicates a lack of linear relationship, it does not guarantee that the variables are completely independent of each other.
12.
Can simple correlation coefficient measure any type of relationship?
No, the simple correlation coefficient, commonly known as Karl Pearson’s coefficient of correlation, cannot measure any type of relationship due to the following reasons:
Linear Relationships Only: The simple correlation coefficient is designed to measure the strength and direction of a linear relationship between two variables. It is not effective in identifying or measuring non-linear relationships.
Quantitative Data: It is most suitable for variables that are quantitative and continuous. For qualitative or ordinal data, other types of correlation coefficients, like Spearman’s rank correlation, are more appropriate.
No Causation: The coefficient only measures correlation, not causation. It cannot be used to infer a cause-and-effect relationship between the variables.
Sensitivity to Outliers: This measure can be affected by outliers in the data, which can skew the results and lead to misleading interpretations.
In summary, while the simple correlation coefficient is a useful tool for measuring linear relationships between quantitative variables, it has limitations and cannot measure every type of relationship.
13.
Collect the price of five vegetables from your local market every day for a week. Calculate their correlation coefficients. Interpret the result.
Here is a sample dataset representing the prices (in ₹ per kg) of five vegetables (Tomato, Potato, Onion, Cabbage, and Carrot) for each day of a week:
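| Day | Tomato | Potato | Onion | Cabbage | Carrot |
|---|---|---|---|---|---|
| 1 | 101 | 73 | 86 | 96 | 43 |
| 2 | 98 | 75 | 86 | 98 | 47 |
| 3 | 97 | 72 | 85 | 98 | 46 |
| 4 | 101 | 75 | 87 | 95 | 44 |
| 5 | 98 | 72 | 83 | 98 | 45 |
| 6 | 101 | 76 | 83 | 95 | 47 |
| 7 | 97 | 75 | 85 | 99 | 43 |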
The correlation between the prices of Potatoes and Tomatoes can be calculated as follows:
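| Day | Price/kg of Tomatoes {(X)} | Price/kg of Potatoes {(Y)} | {XY} | {X^2} | {Y^2} |
|---|---|---|---|---|---|
| 1 | 101 | 73 | 7373 | 10201 | 5329 |
| 2 | 98 | 75 | 7350 | 9604 | 5625 |
| 3 | 97 | 72 | 6984 | 9409 | 5184 |
| 4 | 101 | 75 | 7575 | 10201 | 5625 |
| 5 | 98 | 72 | 7056 | 9604 | 5184 |
| 6 | 101 | 76 | 7676 | 10201 | 5776 |
| 7 | 97 | 75 | 7275 | 9409 | 5625 |
| Total | {∑X = 693} | {∑Y = 518} | {∑XY = 51289} | {∑X^2 = 68629} | {∑Y^2 = 38348} |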
Now, calculating the correlation coefficient {r} using the formula:
{r}
{= \dfrac{∑XY - \dfrac{(∑X)(∑Y)}{N}}{\sqrt{∑X^2 - \dfrac{(∑X)^2}{N}} × \sqrt{∑Y^2 - \dfrac{(∑Y)^2}{N}}}}
{= \dfrac{51289 - \dfrac{693 × 518}{7}}{\sqrt{68629 - \dfrac{693^2}{7}} \times \sqrt{38348 - \dfrac{518^2}{7}}}}
{= \dfrac{51289 - 51282}{\sqrt{68629 - 68607} × \sqrt{38348 - 38332}}}
{= \dfrac{7}{\sqrt{22} × \sqrt{16}}}
{= \dfrac{7}{4.69 × 4}}
{= \dfrac{7}{18.76}}
≈ 0.373
The calculated correlation coefficient is approximately 0.373, indicating a weak positive linear relationship between the prices of tomatoes and prices of potatoes for this specific dataset.
Similarly, the correlation coefficient between the other pairs of vegetables can be calculated. This is left as an exercise for the learner.
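For readers who want to check their hand calculations, here is a minimal sketch that applies the same shortcut formula to every pair of vegetables in the sample table. The function name `pearson_r` and the dictionary layout are choices made for this illustration only.

```python
from itertools import combinations
from math import sqrt

def pearson_r(xs, ys):
    """Shortcut formula based on the totals of X, Y, XY, X^2 and Y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt(sum_x2 - sum_x ** 2 / n) * sqrt(sum_y2 - sum_y ** 2 / n)
    return numerator / denominator

# Daily prices (₹ per kg) copied from the sample table, Day 1 to Day 7.
prices = {
    "Tomato":  [101, 98, 97, 101, 98, 101, 97],
    "Potato":  [73, 75, 72, 75, 72, 76, 75],
    "Onion":   [86, 86, 85, 87, 83, 83, 85],
    "Cabbage": [96, 98, 98, 95, 98, 95, 99],
    "Carrot":  [43, 47, 46, 44, 45, 47, 43],
}

# One coefficient per unordered pair of vegetables (10 pairs in all).
pair_r = {
    (a, b): pearson_r(prices[a], prices[b])
    for a, b in combinations(prices, 2)
}

for (a, b), r in pair_r.items():
    print(f"{a:8s} vs {b:8s}: r = {r:.3f}")
```

The Tomato and Potato pair should reproduce the value of about 0.373 worked out by hand; the remaining nine values can be compared with your own calculations.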
14.
Measure the height of your classmates. Ask them the height of their benchmate. Calculate the correlation coefficient of these two variables. Interpret the result.
Here’s the table with the necessary computed values for calculating the correlation coefficient, using heights in inches:
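| S.No. | Classmate Height {X} | Benchmate Height {Y} | {XY} | {X^2} | {Y^2} |
|---|---|---|---|---|---|
| 1 | 67 | 60 | 4020 | 4489 | 3600 |
| 2 | 62 | 74 | 4588 | 3844 | 5476 |
| 3 | 72 | 67 | 4824 | 5184 | 4489 |
| 4 | 64 | 70 | 4480 | 4096 | 4900 |
| 5 | 74 | 62 | 4588 | 5476 | 3844 |
| 6 | 64 | 66 | 4224 | 4096 | 4356 |
| 7 | 68 | 65 | 4420 | 4624 | 4225 |
| 8 | 71 | 67 | 4757 | 5041 | 4489 |
| 9 | 71 | 63 | 4473 | 5041 | 3969 |
| 10 | 67 | 66 | 4422 | 4489 | 4356 |
| Total | {∑X = 680} | {∑Y = 660} | {∑XY = 44796} | {∑X^2 = 46380} | {∑Y^2 = 43704} |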
Now, calculating the correlation coefficient {r} using the formula:
{r}
{= \dfrac{∑XY - \dfrac{(∑X)(∑Y)}{N}}{\sqrt{∑X^2 - \dfrac{(∑X)^2}{N}} × \sqrt{∑Y^2 - \dfrac{(∑Y)^2}{N}}}}
{= \dfrac{44796 - \dfrac{680 × 660}{10}}{\sqrt{46380 - \dfrac{680^2}{10}} \times \sqrt{43704 - \dfrac{660^2}{10}}}}
{= \dfrac{44796 - 44880}{\sqrt{46380 - 46240} × \sqrt{43704 - 43560}}}
{= \dfrac{-84}{\sqrt{140} × \sqrt{144}}}
{= \dfrac{-84}{11.83 × 12}}
{= \dfrac{-84}{141.96}}
≈ -0.59
The calculated correlation coefficient is approximately -0.59, indicating a moderate negative linear relationship between the heights of classmates and their benchmates in inches.
Note to students/learners: The above value of {r} is not universal. It depends on the data. So, if you actually measure the height of your classmates and benchmates, the value of {r} that you get might be totally different.
💡Note: Take the height measurements in inches. If you measure in centimeters, the calculations involve bigger numbers, although the value of {r} stays the same.
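The effect of the unit on the calculation can be checked with a short sketch using the sample heights from the table. The assumed means of 68 and 66 are arbitrary choices made for this illustration; any other values would work equally well.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Sample heights in inches, copied from the table.
classmate_in = [67, 62, 72, 64, 74, 64, 68, 71, 71, 67]
benchmate_in = [60, 74, 67, 70, 62, 66, 65, 67, 63, 66]
r_inches = pearson_r(classmate_in, benchmate_in)

# The same heights in centimeters (1 inch = 2.54 cm): bigger numbers, same r.
classmate_cm = [2.54 * h for h in classmate_in]
benchmate_cm = [2.54 * h for h in benchmate_in]
r_cm = pearson_r(classmate_cm, benchmate_cm)

# Deviations from assumed means keep the numbers small and leave r unchanged.
u = [h - 68 for h in classmate_in]
v = [h - 66 for h in benchmate_in]
r_shifted = pearson_r(u, v)

print(f"{r_inches:.4f} {r_cm:.4f} {r_shifted:.4f}")
```

All three values agree because {r} is unaffected by a change of origin or scale. The choice of unit therefore affects only the size of the intermediate numbers, not the result.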
15.
List some variables where accurate measurement is difficult.
While many variables can be measured accurately, there are some where accurate measurement is quite challenging.
Variables with Difficult Accurate Measurement
1. Psychological Traits: Measuring psychological traits like intelligence, stress levels, or happiness is challenging. These are abstract concepts and can vary greatly from person to person.
2. Economic Well-being: Accurately measuring the economic well-being of an individual or a household is complex due to the various factors involved, such as income, wealth, living conditions, and personal perceptions.
3. Quality of Life: This is a broad concept that includes physical health, family, education, employment, wealth, safety, security, freedom, religious beliefs, and the environment. Each aspect contributes differently to different individuals.
4. Social Status: The social status of a person is influenced by various subjective factors like occupation, education, family background, and community perception, making it hard to measure accurately.
5. Cultural Impact: The impact of culture on behavior or attitudes is difficult to quantify due to its abstract nature and the wide variations across different cultures.
6. Customer Satisfaction: While businesses often try to measure customer satisfaction, it can be elusive due to differing expectations, experiences, and personal standards among customers.
7. Environmental Quality: Measuring the quality of the environment, including air and water quality, biodiversity, and ecosystem health, involves complex factors and interactions that are challenging to quantify accurately.
8. Happiness and Well-being: These are subjective and influenced by a wide array of factors, from health and income to relationships and personal beliefs.
Conclusion: The measurement of these variables often relies on surveys, questionnaires, and subjective self-reporting, which can introduce biases and inaccuracies. As a student, understanding these challenges helps in appreciating the complexities involved in the field of economics and statistics.
16.
Interpret the values of {r} as 1, –1 and 0.
The correlation coefficient, often represented as {r}, measures the strength and direction of a linear relationship between two variables. The values of {r} range from -1 to 1, and interpreting these values is crucial in understanding the nature of the relationship between the variables. Here’s how we interpret {r} when it is 1, -1, and 0:
Interpretation of Correlation Coefficient Values
1. {r = 1:}
This indicates a perfect positive linear relationship between the two variables.
As one variable increases, the other increases by a constant amount per unit, so all the points lie on an upward-sloping straight line.
An example could be the relationship between a person’s height in centimeters and the same height measured in inches.
2. {r = -1:}
This signifies a perfect negative linear relationship.
As one variable increases, the other decreases by a constant amount per unit, so all the points lie on a downward-sloping straight line.
For instance, consider the amount spent out of a fixed budget and the amount remaining: every additional rupee spent reduces the balance by exactly one rupee. (The speed of a vehicle and the time it takes to cover a fixed distance also move in opposite directions, but that relationship is inverse rather than linear, so {r} is negative without being exactly -1.)
3. {r = 0:}
This value indicates no linear relationship between the two variables.
Changes in one variable do not predict changes in the other in a linear manner, although a non-linear relationship may still exist.
An example could be the relationship between a person’s shoe size and their favorite food; there’s no linear correlation between these two variables.
Conclusion:
Understanding these interpretations is key to analyzing data in economics and statistics. It helps in making inferences about how one variable might change in response to another and in identifying relationships that are perfectly positive, perfectly negative, or not linear at all.
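The three cases can be reproduced with tiny datasets. The sketch below is only an illustration: the numbers are hypothetical and `pearson_r` is a helper written for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# r = 1: the same heights in inches and in centimeters.
height_in = [60, 64, 68, 72]
height_cm = [2.54 * h for h in height_in]

# r = -1: amount spent and amount left from a fixed budget of 1000.
spent = [100, 250, 400, 700]
remaining = [1000 - s for s in spent]

# r = 0: no linear trend (the points form a U shape).
a = [1, 2, 3, 4]
b = [2, 1, 1, 2]

r_pos = pearson_r(height_in, height_cm)
r_neg = pearson_r(spent, remaining)
r_zero = pearson_r(a, b)

print(round(r_pos, 6), round(r_neg, 6), round(r_zero, 6))
```

The last dataset gives {r = 0} even though its points follow a clear U-shaped pattern, which is a reminder that zero correlation rules out only a linear relationship.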

17.
Why does rank correlation coefficient differ from Pearsonian correlation coefficient?
The rank correlation coefficient differs from the Pearsonian correlation coefficient with respect to the following factors:
| Factor | Rank Correlation Coefficient | Pearsonian Correlation Coefficient |
|---|---|---|
| Definition | Measures the correlation based on the ranks of data. | Measures the linear relationship between two variables. |
| Appropriate Data Type | Used for ordinal data or when only ranks are available. | Suitable for interval or ratio scale data. |
| Usage Scenario | Useful for qualitative data, non-linear relationships, or when the exact numbers are not critical. | Best suited for quantitative data with a linear relationship. |
| Sensitivity to Outliers | Less sensitive to outliers. | More sensitive to outliers. |
| Computation Method | Based on the difference in ranks of observations. | Involves actual data values, using mean and standard deviation. |
| Assumptions | Does not assume a linear relationship. | Assumes a linear relationship between variables. |
This table highlights how each method is suited to different types of data and scenarios, providing us with versatile tools for statistical analysis.
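One way to see why the two coefficients can differ is to compute both on the same data. In the sketch below (hypothetical data, helper names chosen for this illustration), the rank correlation is obtained by applying the Pearsonian formula to the ranks instead of the actual values; with no tied values this gives the same result as the rank-difference formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearsonian correlation from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank 1 for the smallest value; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Hypothetical data: Y always rises with X, but along a curve (Y = X cubed).
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

pearson = pearson_r(x, y)                # uses the actual values
spearman = pearson_r(ranks(x), ranks(y))  # uses only the ranks

print(f"Pearson r = {pearson:.3f}, rank correlation = {spearman:.3f}")
```

The ranks of {X} and {Y} agree exactly, so the rank correlation is 1, while the Pearsonian coefficient is below 1 because the points do not lie on a straight line.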
18.
Calculate the correlation coefficient between the heights of fathers in inches {(X)} and their sons {(Y)}
| {X} | 65 | 66 | 57 | 67 | 68 | 69 | 70 | 72 |
|---|---|---|---|---|---|---|---|---|
| {Y} | 67 | 56 | 65 | 68 | 72 | 72 | 69 | 71 |
Here’s the table with the computed values for the correlation coefficient calculation between the heights of fathers {(X)} and their sons {(Y)}, along with the calculation of {r}:
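| S.No. | Father Height {(X)} | Son Height {(Y)} | {XY} | {X^2} | {Y^2} |
|---|---|---|---|---|---|
| 1 | 65 | 67 | 4355 | 4225 | 4489 |
| 2 | 66 | 56 | 3696 | 4356 | 3136 |
| 3 | 57 | 65 | 3705 | 3249 | 4225 |
| 4 | 67 | 68 | 4556 | 4489 | 4624 |
| 5 | 68 | 72 | 4896 | 4624 | 5184 |
| 6 | 69 | 72 | 4968 | 4761 | 5184 |
| 7 | 70 | 69 | 4830 | 4900 | 4761 |
| 8 | 72 | 71 | 5112 | 5184 | 5041 |
| Total | {∑X = 534} | {∑Y = 540} | {∑XY = 36118} | {∑X^2 = 35788} | {∑Y^2 = 36644} |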
Calculating the correlation coefficient {r} using the formula:
{r}
{= \dfrac{∑XY - \dfrac{(∑X)(∑Y)}{N}}{\sqrt{∑X^2 - \dfrac{(∑X)^2}{N}} × \sqrt{∑Y^2 - \dfrac{(∑Y)^2}{N}}}}
{= \dfrac{36118 - \dfrac{534 × 540}{8}}{\sqrt{35788 - \dfrac{534^2}{8}} × \sqrt{36644 - \dfrac{540^2}{8}}}}
{= \dfrac{36118 - \dfrac{288360}{8}}{\sqrt{35788 - \dfrac{285156}{8}} × \sqrt{36644 - \dfrac{291600}{8}}}}
{= \dfrac{36118 - 36045}{\sqrt{35788 - 35644.5} × \sqrt{36644 - 36450}}}
{= \dfrac{73}{\sqrt{143.5} × \sqrt{194}}}
{= \dfrac{73}{11.98 × 13.93}}
{= \dfrac{73}{166.85}}
≈ 0.44
The calculated correlation coefficient is approximately 0.44, indicating a moderate positive linear relationship between the heights of fathers and their sons. This suggests that there is a tendency for taller fathers to have taller sons, but the relationship is not very strong.
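The column totals and the final value can be re-checked with a few lines of code. This is only a verification sketch of the shortcut formula; the variable names are chosen for this illustration.

```python
from math import sqrt

fathers = [65, 66, 57, 67, 68, 69, 70, 72]  # X, heights in inches
sons = [67, 56, 65, 68, 72, 72, 69, 71]     # Y, heights in inches
n = len(fathers)

# Column totals of the calculation table.
sum_x = sum(fathers)
sum_y = sum(sons)
sum_xy = sum(x * y for x, y in zip(fathers, sons))
sum_x2 = sum(x * x for x in fathers)
sum_y2 = sum(y * y for y in sons)

# Shortcut formula for the correlation coefficient.
numerator = sum_xy - sum_x * sum_y / n
denominator = sqrt(sum_x2 - sum_x ** 2 / n) * sqrt(sum_y2 - sum_y ** 2 / n)
r = numerator / denominator

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 8 534 540 36118 35788 36644
print(round(r, 2))                              # 0.44
```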
19.
Calculate the correlation coefficient between {X} and {Y} and comment on their relationship:
| {X} | -3 | -2 | -1 | 1 | 2 | 3 |
|---|---|---|---|---|---|---|
| {Y} | 9 | 4 | 1 | 1 | 4 | 9 |
Here’s the calculation table for the correlation coefficient between {X} and {Y}, along with the calculation of {r}:
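| S.No. | {X} | {Y} | {XY} | {X^2} | {Y^2} |
|---|---|---|---|---|---|
| 1 | -3 | 9 | -27 | 9 | 81 |
| 2 | -2 | 4 | -8 | 4 | 16 |
| 3 | -1 | 1 | -1 | 1 | 1 |
| 4 | 1 | 1 | 1 | 1 | 1 |
| 5 | 2 | 4 | 8 | 4 | 16 |
| 6 | 3 | 9 | 27 | 9 | 81 |
| Total | {∑X = 0} | {∑Y = 28} | {∑XY = 0} | {∑X^2 = 28} | {∑Y^2 = 196} |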
Calculating the correlation coefficient {r} using the formula:
{r}
{= \dfrac{∑XY - \dfrac{(∑X)(∑Y)}{N}}{\sqrt{∑X^2 - \dfrac{(∑X)^2}{N}} × \sqrt{∑Y^2 - \dfrac{(∑Y)^2}{N}}}}
{= \dfrac{0 - \dfrac{0 × 28}{6}}{\sqrt{28 - \dfrac{0^2}{6}} × \sqrt{196 - \dfrac{28^2}{6}}}}
{= \dfrac{0 - 0}{\sqrt{28 - 0} × \sqrt{196 - \dfrac{28^2}{6}}}}
{= \dfrac{0}{\sqrt{28} × \sqrt{196 - \dfrac{28^2}{6}}}}
= 0
The calculated correlation coefficient is 0, indicating that there is no linear relationship between {X} and {Y} in this dataset. This does not mean that the variables are independent: every pair in the table satisfies {Y = X^2}, so {Y} is completely determined by {X}. The relationship is non-linear and symmetric about {X = 0}, which is why {r} fails to detect it.
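A zero value of {r} rules out only a linear relationship. The short check below, a sketch using the data of this question, computes {r} with the shortcut formula and then tests whether each {Y} equals the square of the corresponding {X}.

```python
from math import sqrt

x = [-3, -2, -1, 1, 2, 3]
y = [9, 4, 1, 1, 4, 9]
n = len(x)

# Shortcut formula, using the same totals as the calculation table.
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)
r = (sum_xy - sum_x * sum_y / n) / (
    sqrt(sum_x2 - sum_x ** 2 / n) * sqrt(sum_y2 - sum_y ** 2 / n)
)

# Is Y exactly determined by X through a non-linear rule?
y_is_x_squared = all(b == a ** 2 for a, b in zip(x, y))

print(r, y_is_x_squared)
```

Here {r} comes out as zero while `y_is_x_squared` is true, so the two variables are uncorrelated and yet fully dependent.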
20.
Calculate the correlation coefficient between {X} and {Y} and comment on their relationship
| {X} | 1 | 3 | 4 | 5 | 7 | 8 |
|---|---|---|---|---|---|---|
| {Y} | 2 | 6 | 8 | 10 | 14 | 16 |
Here’s the calculation table for the correlation coefficient between {X} and {Y}, along with the calculation of {r}:
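| S.No. | {X} | {Y} | {XY} | {X^2} | {Y^2} |
|---|---|---|---|---|---|
| 1 | 1 | 2 | 2 | 1 | 4 |
| 2 | 3 | 6 | 18 | 9 | 36 |
| 3 | 4 | 8 | 32 | 16 | 64 |
| 4 | 5 | 10 | 50 | 25 | 100 |
| 5 | 7 | 14 | 98 | 49 | 196 |
| 6 | 8 | 16 | 128 | 64 | 256 |
| Total | {∑X = 28} | {∑Y = 56} | {∑XY = 328} | {∑X^2 = 164} | {∑Y^2 = 656} |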
Calculating the correlation coefficient {r} using the formula:
{r}
{= \dfrac{∑XY - \dfrac{(∑X)(∑Y)}{N}}{\sqrt{∑X^2 - \dfrac{(∑X)^2}{N}} × \sqrt{∑Y^2 - \dfrac{(∑Y)^2}{N}}}}
{= \dfrac{328 - \dfrac{28 × 56}{6}}{\sqrt{164 - \dfrac{28^2}{6}} × \sqrt{656 - \dfrac{56^2}{6}}}}
{= \dfrac{328 - \dfrac{1568}{6}}{\sqrt{164 - \dfrac{784}{6}} × \sqrt{656 - \dfrac{3136}{6}}}}
{= \dfrac{\dfrac{200}{3}}{\sqrt{\dfrac{100}{3}} × \sqrt{\dfrac{400}{3}}}}
{= \dfrac{\dfrac{200}{3}}{\sqrt{\dfrac{40000}{9}}}}
{= \dfrac{\dfrac{200}{3}}{\dfrac{200}{3}}}
= 1
The calculated correlation coefficient is 1, indicating a perfect positive linear relationship between {X} and {Y}. In this dataset {Y = 2X} for every pair, so as {X} increases, {Y} increases in exact proportion.