Chapter 2 Lab 2

This is the first lab that the students will be introduced to R. The important point here is that R is introduced as a tool which will support their open science practices such as replicability and follows on from discussion in lab 1 about working with data effectively. We present the large messy data set in lab 1 to highlight the nature of work that students need to start thinking about before introducing R as the solution in lab 2.

2.1 Pre-class activities

2.1.1 Activity 1: Data Management

  • Watch Intro to RStudio video
  • Remember these key points as you work through the data activities on this course
  • RStudio is a tool that enables you to become confident and competent working with data
  • This is essential as a psychologist as you need to know that the data that findings are based on are reliable and valid. Imagine delivering a treatment that was based on research where the analyst was not confident or competent?

See this tweet from Dr Dale Barr on how applicable these skills will be for you in your future careers…

These are skills that you will use beyond our undergraduate course and that are desireable to employers. Just ask our previous students….

Think of a skill you nailed the first time you did it. Not so easy? You will learn to work with data by repeating the essential basic skills over and over. It’s new and will take time. Give yourself credit for trying and don’t stress if it doesn’t work right away. Use the resources we give, that are being provided online among the open science community, and ask questions in class and on the forums. We are here to help!

2.1.2 Activity 2: Setting the working directory.

What this means is that we tell R where the files we need are located. Think of it just like when you have different subjects, and you have seperate folders for each topic e.g. biology, history and so on. When working on R, it’s useful to have all the data sets and files you need in one folder. To set the working directory press session -> set working directory -> choose directory and then select the folder where the data sets we are working on are saved, and save this file in the same folder as well. In other words- make sure your data sets and scripts are all in the same folder. Follow the pre-lab video to help you with this too. In the labs, we recommend that you create a folder for Psychology labs with sub-folders within for each lab in your M: drive. This is your personal area on the University network that is safe and secure so is much better than flashdrives or desktops.

2.1.3 Important information on our homework worksheets

So that we can see your progress in data management skills we are using a worksheet format for the homework only called R Markdown (abbreviated as Rmd) which is a great way to create dynamic documents with embedded chunks of code. These documents are self-contained and fully reproducible which makes it very easy to share. This is an important part of your open science training as one of the reasons we are using RStudio is that it enables us to share open and reproducible information. The reason we are using these worksheets in this format is so that you are given a task and then can fill in the required code.

For more information about R Markdown feel free to have a look at their main webpage sometime http://rmarkdown.rstudio.com. The key advantage of R Markdown is that it allows you to write code into a document, along with regular text, and then ‘knit’ it using the package knitr to create your document as either a webpage (HTML), a PDF, or Word document (.docx). Before submitting your homework file check that it knits OK to html then submit the .Rmd file, not the knitted html version

You won’t be creating webpages or other document types (unless you want to!) just yet but we want to tell you a little more about the format of these worksheets so you know how and why we are using them. It enables you to enter code after class for assessment. We can then use a program to mark these and return feedback to you. This is right in line with our ethos of open and reproducible science enabling us to show you a skill, let you try it, and then enable you to show us how good you are at it!

There are just a couple of important rules we need you to follow to make sure this all runs smoothly.

  1. These worksheets need to you fill in your answers and not change any other information. For example, if we ask you to replace NULL with your answer, only write in the code you are giving as your answer and nothing else. To illustrate -

Task 1 read in your data

The task above is to read in the data file we are using for this task - the correct answer is data <- read_csv(data.csv). You would replace the NULL with:

Solution to Task 1

This means that we can look for your code and if it is in the format we expect to see it in, we can give you the marks! If you decide to get all creative on us then we can’t give you the marks as ‘my_lab_Nov_2018.csv’ isn’t the filename we have given to you to use. So don’t change the file, variable or data frame names as we need these to be consistent.

  1. We will look for your answers within the boxes which start and end with ``` and have {r task name} in them e.g.

```{r tidyverse, messages=FALSE}

```

These are called code chunks and are the part of the worksheet that we can read and pick out your answers. If you change these in any way we can’t read your answer and therefore we can’t give you marks. You can see in the example above that the code chunk (the grey zone), starts and ends with these back ticks (usually found on top left corner of the keyboard). This code chunk has the ticks and text which makes it the part of the worksheet that will contain code. The {r tidyverse} part tells us which task it is (e.g., loading in tidyverse) and therefore what we should be looking for and what we can give marks for - loading in the package called tidyverse in the example above. If this changes then it won’t be read properly, so will impact on your grade.

2.1.4 Take home message

The easiest way to use our worksheets is to think of them as fill-in-the-blanks and keep the file names and names used in the worksheet the same. If you are unsure about anything then use the forums on Moodle and Slack to ask any questions and come along to the practice sessions. We have also made a short video about R Markdown and why we use it for our lab work which you can find in the prep section of lab 2 on Moodle. There is an R Markdown cheatsheet which will support you playing around with R Markdown which you can find here https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf.

2.2 In-class activities

Part of becoming a psychologist is asking questions and gathering data to enable you to answer these questions effectively. It is very important that we understand all aspects of the research process such as experimental design, ethics, data management and visualisation.

In this class, you will be learning how to develop reproducible scripts. This means scripts that completely and transparently perform some analysis from start to finish in a way that yields the same result for different people using the same software on different computers. And transparency is a key value of science, as embodied in the “trust but verify” motto. When you do things reproducibly, others can understand and check your work. This benefits science, but there is a selfish reason, too: the most important person who will benefit from a reproducible script is your future self. When you return to an analysis after two weeks of vacation, you will thank your earlier self for doing things in a transparent, reproducible way, as you can easily pick up right where you left off. This is relevant within the context of open science which is a big debate in the scientific community at the moment. Some classic psychological experiments have been found to not be replicable. See the links below for some discussion on this topic:

Study delivers bleak verdict on validity of psychology experiment results

Low replicability in psychological science

As part of your skill development, it is very important that you work with data so that you can become confident and competent in your management and analysis of data. In the labs, we will work with real data that has been shared by other researchers. When we are working with data we will use software called RStudio . Information about this software can be found here RStudio. R Studio is available on the computers in the lab but is free to download onto your own devices. Guidance on how to do this can be found RStudio download. Remember to watch the prep video on RStudio for a tour around the interface with Heather.

2.2.1 Getting our data ready to work with

Today in the lab we will be working on loading libraries (opening the apps in the prep video) required to work with our data and then loading our data into RStudio before getting it organised into a sensible format that relates to our research question.

OK, so what is our first step? Yup, thats right, we need to tell RStudio what packages we want to use and where to find all the files we need before we pull them in ready for work.

2.2.2 Activity 1: Loading in the package

2.2.4 Activity 3: Join the files together

2.2.5 Activity 4: Pull out variables of interest

Well done! You have started on your journey to become a confident and competent member of the open scientific community! To show us how competent you are you should now complete the assessment for this lab which follows the same instructions as this in-class activity but asks you to work with different variables. Always us the lab prep materials as well as what you do in class to help you complete the class assessments! Success through practice!