1 Welcome and Introducing a Tidy Workflow

Today we are going make sure everyone is set-up with the same software and R packages that will be used across this workshop series. Then, we will work through “tidy” principles for project organisation (including how to set-up an R Project), data management, and coding in R. These are skills that are critical to support open science and facilitate effective and productive collaborations.

A handout of the slides from the presentation for this workshop can be found in the attached PDF:

Download workshop-1-presentation-handout.pdf

1.1 Let’s Get Set Up!

This workshop relies on using software that requires a bit of preparation at the beginning to make sure we are all on the same page. You may have all or some of these installed, but the person next to you may not. Let’s take some time to run through a checklist step-by-step of what needs to be installed. Please be patient as we work to get everyone ready to start this workshop series.

  1. You will need software to work with spreadsheets. I’d recommend Excel, which MtA offers to everyone as part of Office365 available on the Mount Allison website. Sheets for MacOs or Google Sheets are probably fine too if you are more comfortable with these, but our team will have less capacity to help you.

  2. Install the latest version of R for your operating system.

  3. Install R Studio Desktop for your operating system. There are many ways to use R and you might have your own preference. But, RStudio is very user-friendly and will be using it for this workshop series.

  4. Depending on your operating system (Windows), it is likely that you will also need to install RTools to compile new R packages. Please install it now.

  5. The last software we need to install is LaTex, which facilitates the creation of PDF documents in R Markdown. There are a myriad of options, including MiKTeX, MacTeX, and TeX Live. I suggest that you use TinyTeX, which you can install with the R package tinytex (Xie 2024c) by running this code in the Console:

install.packages('tinytex')
tinytex::install_tinytex()

You may run into issues, depending on how your operating system is set-up. The TinyTeX website has instructions for each operating system. It may be possible that you will need to manually install one of types of software that facilitates LaTex too. If that seems to be the case, you can access these for install at this website.

  1. After installation of all these programs, etc. it is very important to restart your computer. A life tip - a new program won’t be initiated to work on your computer if you don’t restart it after it is installed! This is a critical part of the process.

  2. After your computer restarts, now please open R Studio. We are going to be making use of some R packages repeatedly. So, let’s just install them now, using the code below:

install.packages(c('tidyverse', 'rmarkdown', 'ggplot2', 
                   'scales', 'lazyWeave'))


Please note, that we will do our best to help with ‘debugging’ if things are not working correctly on your computer. But, you might have issues you need to work on during your own time. Here are a few steps that you can take to help:

  • Read the documentation. If it is for a package or piece of software be sure to read and follow the instructions carefully.

  • Google it. Google is your best friend. Google the package or piece of software, google the specific error message, and find those message board threads where people go to complain and find answers to their problems (likely GitHub and Stack Overflow). Go and find a solution for yourself!


We will also be making extensive use of R markdown in these workshops. Here are some resources specifically about R markdown, that are always helpful to keep on hand while coding:


1.2 Using TIDY Practices in your Open Science Projects

Now that we are all set-up, we are going to focus on why organisation is important at multiple scales of your project. Although it is likely not the most exciting or complex topic, ensuring your research project files, code, and data are in a consistent, expected format facilitates effective use of your time and productive collaborations. Tidy, organised projects are the foundation of reproducible science. This is why we are starting with this topic because it is the basis on which we will build the rest of our skills in this workshop series.

There are also many online resources that are helpful on these topics, and here is a list of a few of them:

1.2.1 Plan for the Workshop

Today’s workshop is going to be a mix of lecture and activities. You have already downloaded the slides for the lecture, and after I introduce the concept of “tidy projects” we will start working on some activities. These are outlined below (on this website), and also on the presentation slides and within an editable Rmarkdown document. We will set-up that R markdown document together and I encourage you to use that file to take notes and work on the activities.

1.2.2 Tidy Projects: Activities and Discussion

First, let’s make our own R project to work on the activities that will be happening during this workshop series. So, together, let’s open R Studio. Then, we will create new R project and save it somewhere safe on our computers. Last, let’s download the Rmarkdown file below and save it within the R project folder we just created:

Download workshop-1-Rmarkdown-worksheet.Rmd


Awesome! Let’s open up that R markdown document together within our R project. You can see there are notes from today’s workshop in this document. I will walk you through how to knit the document, and view it in a nicely formatted version. You will see there are places for you to fill in your own notes on the activities are going to do today. Feel free to fill it in as we go!

Now, let’s discuss as a group the concept of tidy projects a little.

  1. Have a think and list all stakeholders that may conceivably have an interest in the outcomes of your research. That is, who will possibly benefit from your robust, reproducible science, in what way, and why?

  2. Have a think & jot down down one potential positive consequences and one potential negative consequences of conducting open, reproducible research. Then we will chat about them as a group.

Next, please take a bit of time to create a tidy project template for a real or hypothetical research project you are working on. Take some time to think about the common ‘ingredients’ of your project because that will help you create a flexible, generic project structure. Make sure to include a description for your files and let’s make use of R markdown’s list format to show any nesting of folders or files.

1.2.3 Tidy Code: Activity about Using the R package stylr

Install the R package styler. Refer to the code chunk above where we installed R packages. Please add a new code chunk below, and install styler and load it into the R environment by adding a line of code that says library(styler). If you need help, please put your RED POST-IT on top of your laptop, and try to Google solutions until one of us arrives to help.

Run the chunk of R code below. It works! But it is hard to read and doesn’t follow our ‘tidy code’ ideals.

x <- rnorm(15);mean(x);hist(x)

The R package styler’s default style transformation is the tidyverse_style(), which is what we will use in this workshop. Use it to fix the below R code chunk. To do this, select the text you want to fix. Then, click on the button titled “Addins” within R Studio. In the dropdown menu, click on “Style selection”. Run the chunk of R code again.

The same thing happens in R Studio’s Console! It is just much easier to read and understand.

1.2.4 Tidy Data: Activity about Making Messy Data Tidy

We will be working with a very messy dataset that is data from part of a long-term project on the effects of rodents and ants on the plant community, and has been running for almost 40 years and used in over 100 publications. The rodents are sampled on a series of 24 plots, with different experimental manipulations controlling which rodents are allowed to access which plots. We’ll be working with a subset of the data which has been ‘messed up’ a bit for the purposes of this workshop. The ‘mess’, though artificial in the context of this dataset, is of the sort that I regularly come into contact (and you will to), so it’s very much real in that sense.

Please download your own copy of the dataset before making any changes:

Download workshop-1-messy-survey.xlsx


The data file contains:

  • Three tabs: containing data from samples collected in 2013, 2014, and 2015, respectively
  • Date Collected: date of collection of the sample
  • Species: species identifier
  • Plot: replicate plot identifier
  • Weight: weight of the captured specimen in grams
  • Sex: sex of the captured specimen

The mission here is simple, but will also be a bit challenging. Focus all of your tidy skills on the catastrophe that is workshop-1-messy-survey.xlsx to parse it into its cleanest, tidiest, and most useful self.

You can complete this by a mix of adjustments by hand and also in R. How you do it is up to you! But, because as we are making direct changes to the raw data, be sure to note down every change you make (especially if you are doing this by hand rather than using code in R). You can do this below (with a combination of R code within an R chunk, if you choose that option) or in a separate text file. It is up to you!

We will chat about what you have done as a group before we end the workshop. Thanks for your attention today!

In your R markdown file, document all the changes you made to the data file. This can be a bullet-point list, description, chunks of annotated R code, or a mixture of all three.