Part 1
Tips on R
Data visualization
Analyse a real data set
Impact of bleaching on coral cover in Mo’orea
Programming language
Special language for statistical analyses and visualizations

Interface to R
Provides a layout and functions that make it easier and more efficient to use R
Keeps data, scripts, and plots in one place
Can be moved and shared easily
File → New Project…




data: All data used for your analysis. Keep a folder inside it with all raw data that you do not touch
scripts: All scripts for your analysis. You can keep them organised with numbers, e.g.
1_data_exploration.qmd
2_plots.qmd
plots: Plots generated during your analysis
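The folder layout described above can also be created from the R console. The following is a minimal sketch, not part of the original slides: the project location `project_dir` and the sub-folder name `data/raw` are assumptions chosen for illustration.

```r
# Hypothetical project location; inside a real RStudio project you would
# use paths relative to the project folder instead of tempdir().
project_dir <- file.path(tempdir(), "my_project")

# "data/raw" is an assumed name for the folder holding untouched raw data.
folders <- c("data/raw", "scripts", "plots")

# Create each folder (and its parent folders) if it does not exist yet
for (folder in folders) {
  dir.create(file.path(project_dir, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now exist inside the project
list.dirs(project_dir, full.names = FALSE, recursive = TRUE)
```

Creating the folders by hand in the RStudio Files pane gives the same result.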
Mix of text, R code, and output (plots, tables, …)
Can be exported to HTML, PDF
For presentations, manuscripts, websites, etc.
File → Quarto Document… → Create Empty Document
Insert R code with the insert-chunk button in the editor toolbar, or with the keyboard shortcut Option-Command-I on Mac or Control-Alt-I on Windows

Task 1.1
Create a project
Create the folders scripts, data, and plots
Create an empty Quarto Document
Try writing some text and some simple R code such as print("Hello")
tidyverse package
ggplot2 package
Workflow
aes()
geom_...()
Plots can be saved as a variable
plot_iris <- ggplot(data = iris,
                    aes(x = Sepal.Length, y = Petal.Length,
                        colour = Species)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Sepal length in cm", y = "Petal length in cm",
       title = "Iris") +
  facet_grid(~Species) +
  theme_minimal() +
  theme(legend.position = "none")
# to show it
plot_iris
ggsave()
ggsave(filename = "plot_iris.pdf",            # choose file name and file format (.png, .svg, .jpg, etc.)
       plot = plot_iris,                      # choose the plot that should be saved
       width = 17, height = 8, units = "cm",  # choose the size of the saved plot
       scale = 1,                             # rescale the whole plot; a smaller number makes all elements look larger
       path = "../plots")                     # location where the plot should be saved
palmerpenguins is another R example data set with data on three penguin species
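The tasks further down refer to a data frame called `dat_penguins`. A minimal sketch of how it could be created, assuming the `palmerpenguins` package is installed and that `dat_penguins` is simply a copy of its `penguins` data set (the slides do not show this step):

```r
# Assumption: the package was installed beforehand with
# install.packages("palmerpenguins")
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  # Copy the example data into the name used in the tasks
  dat_penguins <- palmerpenguins::penguins

  # Inspect the columns (species, sex, body_mass_g, year, ...)
  str(dat_penguins)
}
```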


Task 1.2
Make a similar plot
iris %>%
  mutate(petal_length_mm = Petal.Length * 10) %>%  # create a new column with petal length in mm
  select(-Petal.Length) %>%                        # remove the Petal.Length column
  filter(Species == "virginica") %>%               # keep only rows of the species virginica
  arrange(petal_length_mm) %>%                     # sort by petal_length_mm (ascending)
  head(10)                                         # show the first 10 rows

| Sepal.Length | Sepal.Width | Petal.Width | Species | petal_length_mm |
|---|---|---|---|---|
| 4.9 | 2.5 | 1.7 | virginica | 45 |
| 6.2 | 2.8 | 1.8 | virginica | 48 |
| 6.0 | 3.0 | 1.8 | virginica | 48 |
| 5.6 | 2.8 | 2.0 | virginica | 49 |
| 6.3 | 2.7 | 1.8 | virginica | 49 |
| 6.1 | 3.0 | 1.8 | virginica | 49 |
| 5.7 | 2.5 | 2.0 | virginica | 50 |
| 6.0 | 2.2 | 1.5 | virginica | 50 |
| 6.3 | 2.5 | 1.9 | virginica | 50 |
| 5.8 | 2.7 | 1.9 | virginica | 51 |
dplyr package
| Species | mean_petal_length | mean_petal_width |
|---|---|---|
| setosa | 1.462 | 0.246 |
| versicolor | 4.260 | 1.326 |
| virginica | 5.552 | 2.026 |
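The code that produced the table above is not preserved in this text. A minimal sketch of how such per-species means can be calculated, assuming dplyr's `group_by()` and `summarise()` were used; the object name `iris_means` is chosen here for illustration:

```r
library(dplyr)

iris_means <- iris %>%
  group_by(Species) %>%                               # one group per species
  summarise(mean_petal_length = mean(Petal.Length),   # mean per group
            mean_petal_width  = mean(Petal.Width))

iris_means
```

`summarise()` returns one row per group, so the result has three rows, one for each species.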
Task 1.3
Calculate the mean, standard deviation, and number of observations (n) of body_mass_g for the different species and sexes in dat_penguins.
| species | sex | mean_body_mass_g | sd_body_mass_g | n |
|---|---|---|---|---|
| Adelie | female | 3368.836 | 269.3801 | 73 |
| Adelie | male | 4043.493 | 346.8116 | 73 |
| Adelie | NA | NA | NA | 6 |
| Chinstrap | female | 3527.206 | 285.3339 | 34 |
| Chinstrap | male | 3938.971 | 362.1376 | 34 |
| Gentoo | female | 4679.741 | 281.5783 | 58 |
| Gentoo | male | 5484.836 | 313.1586 | 61 |
| Gentoo | NA | NA | NA | 5 |
| Species | mean_petal_width | sd_petal_width |
|---|---|---|
| setosa | 0.246 | 0.1053856 |
| versicolor | 1.326 | 0.1977527 |
| virginica | 2.026 | 0.2746501 |
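The plotting code that follows uses a summary data frame called `irisS`, whose definition is not preserved in this text. A minimal sketch that reproduces the table above, assuming it was built with `group_by()` and `summarise()`:

```r
library(dplyr)

irisS <- iris %>%
  group_by(Species) %>%                              # one group per species
  summarise(mean_petal_width = mean(Petal.Width),    # group mean
            sd_petal_width   = sd(Petal.Width))      # group standard deviation

irisS
```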
ggplot(data = irisS, aes(x = Species, colour = Species)) +          # set up ggplot with the summary data and the columns used in all layers
  geom_point(data = iris, aes(y = Petal.Width),                     # take the raw data for the point layer
             position = position_jitter()) +                        # jitter points to reduce overlap (height = 0 would jitter along the x axis only)
  geom_errorbar(aes(ymin = mean_petal_width - sd_petal_width,       # take the summary data for the error bars
                    ymax = mean_petal_width + sd_petal_width),
                width = 0.2) +
  geom_point(data = irisS, aes(y = mean_petal_width),               # plot the mean on top
             shape = 21, size = 3, fill = "white") +                # with a larger point
  theme_classic() +
  theme(legend.position = "none")
Easy to read
| year | best_film | best_soundtrack |
|---|---|---|
| 2015 | Birdman | The Grand Budapest Hotel |
| 2016 | Spotlight | The Hateful Eight |
| 2017 | Moonlight | La La Land |
| 2018 | The Shape of Water | The Shape of Water |
| 2019 | Green Book | Black Panther |
| 2020 | Parasite | Joker |
| 2021 | Nomadland | Soul |
| 2022 | CODA | Dune |
| 2023 | Everything Everywhere All at Once | All Quiet on the Western Front |
| 2024 | Oppenheimer | Oppenheimer |
| 2025 | Anora | The Brutalist |
Easy to use in R
| year | category | winner |
|---|---|---|
| 2015 | best_film | Birdman |
| 2015 | best_soundtrack | The Grand Budapest Hotel |
| 2016 | best_film | Spotlight |
| 2016 | best_soundtrack | The Hateful Eight |
| ... | ... | ... |
| ... | ... | ... |
| ... | ... | ... |
| 2024 | best_film | Oppenheimer |
| 2024 | best_soundtrack | Oppenheimer |
| 2025 | best_film | Anora |
| 2025 | best_soundtrack | The Brutalist |
df_oscars_W %>%                                           # take the data in wide format
  pivot_longer(cols = c("best_film", "best_soundtrack"),  # columns to pivot: their names become the entries of the new name column
                                                          # columns not selected (here `year`) are kept and repeated for each new row
               names_to = "category",                     # name of the new column that holds the former column names
               values_to = "winner")                      # name of the new column that holds the values
| year | category | winner |
|---|---|---|
| 2015 | best_film | Birdman |
| 2015 | best_soundtrack | The Grand Budapest Hotel |
| 2016 | best_film | Spotlight |
| 2016 | best_soundtrack | The Hateful Eight |
| ... | ... | ... |
| ... | ... | ... |
| ... | ... | ... |
| ... | ... | ... |
| 2024 | best_film | Oppenheimer |
| 2024 | best_soundtrack | Oppenheimer |
| 2025 | best_film | Anora |
| 2025 | best_soundtrack | The Brutalist |
| year | best_film | best_soundtrack |
|---|---|---|
| 2015 | Birdman | The Grand Budapest Hotel |
| 2016 | Spotlight | The Hateful Eight |
| 2017 | Moonlight | La La Land |
| 2018 | The Shape of Water | The Shape of Water |
| 2019 | Green Book | Black Panther |
| 2020 | Parasite | Joker |
| 2021 | Nomadland | Soul |
| 2022 | CODA | Dune |
| 2023 | Everything Everywhere All at Once | All Quiet on the Western Front |
| 2024 | Oppenheimer | Oppenheimer |
| 2025 | Anora | The Brutalist |
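The two tables above show the same data in long and in wide format. The following sketch rebuilds the first three rows of the wide table, converts it to long format, and then back to wide format. It assumes tidyr's `pivot_wider()` for the second step, which is not shown in this text; the object names `df_oscars_L` and `df_oscars_W2` are chosen here for illustration.

```r
library(dplyr)   # provides the %>% pipe
library(tidyr)   # pivot_longer(), pivot_wider()
library(tibble)  # tribble()

# First three rows of the wide table
df_oscars_W <- tribble(
  ~year, ~best_film,  ~best_soundtrack,
  2015,  "Birdman",   "The Grand Budapest Hotel",
  2016,  "Spotlight", "The Hateful Eight",
  2017,  "Moonlight", "La La Land"
)

# Wide -> long: one row per year and category
df_oscars_L <- df_oscars_W %>%
  pivot_longer(cols = c("best_film", "best_soundtrack"),
               names_to = "category",
               values_to = "winner")

# Long -> wide: one column per category again
df_oscars_W2 <- df_oscars_L %>%
  pivot_wider(names_from = category,   # column whose entries become column names
              values_from = winner)    # column whose entries fill the new columns

df_oscars_W2
```

The long version has two rows per year, and pivoting it back gives the original three columns.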
Task 1.4
Take the dat_penguins data and reproduce the following table
| species | 2007 | 2008 | 2009 |
|---|---|---|---|
| Adelie | NA | 3742.000 | 3664.904 |
| Chinstrap | 3694.231 | 3800.000 | 3725.000 |
| Gentoo | 5070.588 | 5019.565 | NA |
Bonus
Why are there NA values and how can you avoid them?
When preparing data in Excel, don’t merge cells, don’t use empty cells for formatting, and don’t use color to carry information
Column names should not contain spaces
Start file names with the date (format yyyy_mm_dd) for chronological sorting
Avoid spaces in file names
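The naming tips above can also be applied from within R. A minimal sketch using base R only; the data frame `dat_raw`, its values, and the file name suffix `coral_cover.csv` are made up for illustration:

```r
# Hypothetical data frame whose column names contain spaces
dat_raw <- data.frame(`coral cover` = c(35, 12),
                      `site name`   = c("A", "B"),
                      check.names = FALSE)

# Replace spaces in the column names with underscores
names(dat_raw) <- gsub(" ", "_", names(dat_raw))
names(dat_raw)

# Build a file name that starts with today's date in yyyy_mm_dd format
file_name <- paste0(format(Sys.Date(), "%Y_%m_%d"), "_coral_cover.csv")
file_name
```

Because the date starts with the year, sorting the files by name also sorts them chronologically.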