Relationships Between Variables

Instructions

Please answer each of the following questions in order. There are two questions in total.

You may write your answers neatly by hand, type them, or use a mix of the two. You will need to submit your assignment as a PDF file. You can use your phone to scan written work and convert it to a PDF.

You may use any software or technology to answer the questions below. You may also work by hand. I recommend

(but do not require) that you use Microsoft Excel.

Your assignment must be submitted on Moodle. Emailed submissions will not be accepted.

Questions

1. Is there a relationship between gender and preferred ice cream flavour? A study surveyed 200 high

school students and collected their scores on various tests, as well as their gender and preferred ice

cream flavour out of three options given: chocolate, strawberry, and vanilla. The results are

summarized in the table below.

          Chocolate   Strawberry   Vanilla   Total
Female           32           29        48     109
Male             15           29        47      91
Total            47           58        95     200

a. Create a table of sample proportions. It is recommended that you check that the table total

equals 100%.

          Chocolate   Strawberry   Vanilla   Total
Female
Male
Total                                         100%

b. What proportion of female students preferred chocolate ice cream? How does this compare

to the proportion of male students who preferred chocolate?

c. Fill out the table of expected values for the chi-squared statistic. Retain 3 decimal places of

accuracy.

          Chocolate   Strawberry   Vanilla   Total
Female                                         109
Male                                            91
Total            47           58        95     200

d. What is the value of the chi-squared statistic? Which cell (combination of gender and flavour) contributed the most to this sum?

e. State the appropriate hypotheses for this chi-squared test.

f. Find the appropriate degrees of freedom and calculate the p-value. At the 𝛼 = 0.05 level,

do you find that there is evidence of a relationship? Give a conclusion that would make

sense to a non-statistician.
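
As a rough illustration of the calculations in parts (c) to (f), the following Python sketch works through a hypothetical 2×3 table. The counts are made up for illustration and are not the survey data; the variable names are likewise assumptions. It uses the standard formulas: expected count = (row total × column total) / grand total, each cell contributes (observed − expected)² / expected, and the degrees of freedom are (rows − 1) × (columns − 1). The p-value line relies on the fact that, for exactly 2 degrees of freedom, the upper-tail probability is exp(−χ²/2); for other degrees of freedom you would use a table or software (for example, Excel's CHISQ.DIST.RT).

```python
import math

# Hypothetical 2x3 table of observed counts (NOT the assignment data).
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Each cell contributes (observed - expected)^2 / expected to the statistic.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi_squared = sum(sum(row) for row in contributions)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed form valid only when df == 2; otherwise use a table or software.
p_value = math.exp(-chi_squared / 2) if df == 2 else None

print("expected counts:", expected)
print("cell contributions:", contributions)
print("chi-squared:", round(chi_squared, 3), "df:", df, "p-value:", p_value)
```

Printing the individual contributions makes it easy to see which cell adds the most to the sum, which is the comparison part (d) asks for.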

2. For this question, use the cereal data set.

a. Create a scatterplot of the data, plotting “calories” on the horizontal (𝑥) axis and “fibre” on

the vertical (𝑦) axis. Make sure that you adhere to best practices for graphs: descriptive title,

axis labels including units, and sensible axis scale.

b. Comment on the strength and direction of this relationship. Calculate the correlation

coefficient 𝑟 (you can use =CORREL in Excel). How does the value of this coefficient affirm

what you said about the strength and direction of the relationship?

c. Consider the data point at (120, 6). Without calculating the residual or the regression line,

do you expect the residual for this data point to be positive or negative? Why?

d. Calculate the equation of the regression line as follows.

i. Find 𝑥̅ and 𝑦̅.

ii. Find 𝑠𝑥 and 𝑠𝑦.

iii. Find the slope using the equation 𝑏 = 𝑟 (𝑠𝑦 / 𝑠𝑥). (You found 𝑟 in part (b).)

iv. Find the intercept using the equation 𝑎 = 𝑦̅ − 𝑏𝑥̅.

v. State the equation in the form 𝑦̂ = 𝑎 + 𝑏𝑥.
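
Steps (i) to (v) can be sketched in Python as follows. This is only an illustration under stated assumptions: the five data pairs are hypothetical (they are not the cereal data set), the variable names are invented for the example, and the standard deviations are sample standard deviations (n − 1 in the denominator). The last lines compute a residual as observed value minus predicted value for one of the hypothetical points.

```python
import statistics

# Hypothetical paired data (NOT the cereal data set).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Step i: sample means.
x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Step ii: sample standard deviations (n - 1 denominator).
s_x = statistics.stdev(x)
s_y = statistics.stdev(y)

# Correlation coefficient r, computed from its definition.
n = len(x)
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

# Step iii: slope b = r * (s_y / s_x).
b = r * s_y / s_x

# Step iv: intercept a = y_bar - b * x_bar.
a = y_bar - b * x_bar

# Step v: the fitted line is y_hat = a + b * x.
print("y_hat =", round(a, 3), "+", round(b, 3), "* x")

# Residual for one hypothetical point: observed y minus predicted y.
x0, y0 = 2, 4
residual = y0 - (a + b * x0)
print("residual at (2, 4):", round(residual, 3))
```

A positive residual means the point lies above the regression line, and a negative residual means it lies below.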

e. Going back to the data point at (120, 6), find the value of the residual and confirm your