Chapter 8 Lab 8

8.1 Pre-class activity

8.1.1 Discrete or Continuous Datasets

This is a short tutorial with just a couple of quick questions on how the level of measurement can alter the way you tackle probability - i.e. whether the data is discrete or continuous. The four level of measurements are categorical (also called nominal), ordinal, interval and ratio. Discrete data only uses categories or whole numbers and is therefore either nominal or ordinal data. Continuous data can take any value, e.g. 9.00 or 9.999999999, and so is either interval or ratio data.

8.1.1.1 Quickfire Questions

Discrete data can only take integer values (whole numbers). For example, the number of participants in an experiment would be discrete - we can’t have half a participant! Discrete variables can also be further broken down into nominal/categorical and ordinal variables. Watch this video to help you understand the differences a little more and then try the following questions (https://youtu.be/6IdJ1aPFDCs).

Fill in the blanks in the below sentences using the words: ordinal, nominal, categorical.

  • data is based on a set of categories but the ordering doesn’t matter (e.g. left or right handed. This is sometimes referred to as data. For example, you could separate participants according to left or right handedness or by education level.

  • data is a set of ordered categories; you know which is the top/best and which is the worst/lowest, but not the difference between categories. For example, you could ask participants to rate the attractiveness of different faces based on a 5-item Likert scale (very unattractive, unattractive, neutral, attractive, very attractive).

Continuous data on the other hand can take any value. For example, we can measure age on a continuous scale (e.g. we can have an age of 26.55 years), also reaction time or the distance you travel to university every day.

In summary:

*Continuous data can be broken into interval or ratio data.

*Interval data is data which comes in the form of a numerical value where the difference between points is standardised and meaningful. For example temperature, the difference in temperature between 10-20 degrees is the same as the difference in temperature between 20-30 degrees.

*Ratio data is very like interval but has a true zero point. With our interval temperature example above, we have been experiencing negative temperatures (-1,-2 degrees) in Glasgow but with ratio data you don’t see negative values such as these i.e. you can’t be -10 cm tall.

When you read journal articles or when you are working with data in the lab, it is really good practice to take a minute or two to figure out the type of variables you are reading about and/or working with.

We have introduced these concepts in a little more depth as we are moving onto lab 8 and normal distribution probability so this will help with understanding this concept in relation to binomial probability and the different types of data involved.

Probability Recap Binomial distributions are based on scenarios where the outcome can only be 1 of 2 options e.g. heads or tails whereas normal distributions are based on continuous data e.g. height or age where you can score anywhere along a continuum. For example, you could describe someone as 21.5 years old and 176cm tall.

8.1.2 Normal distribution

A normal distribution looks like this:

Fig. 1 Normal Distribution of height

Fig. 1 Normal Distribution of height

8.2 Inclass activity

8.2.1 Normal Distribution Probability

In lab 7 and the pre lab for this lab you recapped and expanded on your understanding of probability, including a number of binom functions as well as some more basic ideas on probability and data types. You will need these skills to complete the following so please make sure you have carried out the pre class on types of data before attempting this. Remember to follow the instructions and if you get stuck at any point to ask the staff member and GTAs to talk you through it.

Staff will talk you through the mean and standard deviations in relation to a normal distribution. A normal distribution looks something like this:

Fig. 1 Normal Distribution of height

Fig. 1 Normal Distribution of height

With the binomial distribution in lab 7 we plotted the number of successes, heads or tails, we achieved with a certain number of throws so we use a bar plot e.g. how many heads we got between 1 and 10 tosses of the coin. This time with our height example above, we are plotting the number of people of each height along a continuum between two points, ~140cm to ~200cm. For this, we use a histogram which is the plot above.

8.2.2 Working with the Normal Distribution

We won’t ask you to create a normal distribution as it is more complicated than the binomial distribution you estimated in lab 7. Unlike coin flips, the outcome in the normal distribution is not just 50/50. Just as with the binomial distribution (and other distributions) there are functions that allow us to estimate the normal distribution and to ask questions about the distribution. These are:

dnorm()-the density function

pnorm()-the probability or distribution function

qnorm()-the quantile function

They work in a similar way to their binomial counterparts. If you are unsure about how a function works you can call the help on it by typing in the console, for example, ?dnorm or ?dnorm(), but the brackets aren’t essential for the help.

In this lab we will focus on pnorm() and qnorm(). You will have become acquainted with these terms in the pre lab materials but let’s quickly revist them together before working our way through some other examples.

So lets start using some code to calculate the probabilities of people being a certain height. The average height of a 16-24 year old Scotsman is 176.2 centimetres, with a standard deviation of 6.748. The average height of a 16-24 year old Scotswoman is 163.8 cm, with a standard deviation of 6.931. (Data from the [Scottish Health Survey, 2008] (http://www.scotland.gov.uk/Resource/Doc/286063/0087158.pdf)

8.2.3 pnorm() - the probability or distribution function

  1. What is the probability of meeting a 16-24 y.o. Scotswoman who is taller than the average 16-24 y.o. Scotsman?
  1. Fiona is a very tall Scotswoman (181.12cm) in the 16-24 y.o. range who refuses to date men who are shorter than her. What proportion of all 16-24 Scotsmen would she be willing to date?
  1. Out of 100,000 16-24 y.o. Scotswomen, how many would you predict would meet the height eligibility requirement to join the Royal Navy (at least 151.5cm tall)?

8.2.4 qnorm() - the quartile function

Let’s use our height example to work with qnorm:

  1. How tall would a 16-24 y.o. Scotsman have to be in order to be in the top 5% of the height distribution for Scotsmen in his age group?