One of the most useful benefits of using R is that you can write your own functions to automate repetitive tasks.
Let’s say I had a few hundred lactation curves I wanted to plot individually. I couldn’t facet them all, so I would need to plot them 1 at a time. But if I ever need to repeat something more than 3 times, it deserves to be in a function.
This time, we will plot each individual animal on its own graph. First, let’s plot the raw data so we can see which individual animals we might want to focus on.
```{r}
p_raw <- six_SLN %>%
  ggplot(aes(x = Milking, y = six_SLN_conc, colour = Parity)) +
  geom_point(size = 3) +
  geom_line(aes(group = Cow_ID), linewidth = 1) +
  scale_x_continuous(breaks = seq(from = 0, to = 14, by = 1)) +
  scale_colour_viridis_d(begin = 0.2, end = 0.8) +
  coord_cartesian(ylim = c(NA, 500)) +
  xlab("Milking Number") +
  ylab("6′-Sialyllactose (6′SLN) concentration (μg/mL)")

p_raw
```
It seems there are a couple of individuals with quite high concentrations, and we might want to take a closer look. One option is to use interactive plots (see below), or we might want to plot one cow at a time.
Let’s filter the data down to one cow this time, and then re-plot. Notice that all of this code is identical to the code above; we just added a filter step.
```{r}
six_SLN %>%
  filter(Cow_ID == 5716) %>%
  ggplot(aes(x = Milking, y = six_SLN_conc, colour = Parity)) +
  geom_point(size = 3) +
  geom_line(aes(group = Cow_ID), linewidth = 1) +
  scale_x_continuous(breaks = seq(from = 0, to = 14, by = 1)) +
  scale_colour_viridis_d(begin = 0.2, end = 0.8) +
  coord_cartesian(ylim = c(NA, 500)) +
  xlab("Milking Number") +
  ylab("6′-Sialyllactose (6′SLN) concentration (μg/mL)")
```
Let’s turn the ggplot part into a function to make it easier for us to re-use:
```{r}
# Create the function: it takes a data frame as input and plots it.
# The function is saved in the environment so we can re-use it.
f_plot_individual <- function(df_in) {
  p <- df_in %>%
    ggplot(aes(x = Milking, y = six_SLN_conc, colour = Parity)) +
    geom_point(size = 3) +
    geom_line(aes(group = Cow_ID), linewidth = 1) +
    scale_x_continuous(breaks = seq(from = 0, to = 14, by = 1)) +
    scale_colour_viridis_d(begin = 0.2, end = 0.8) +
    coord_cartesian(ylim = c(NA, 500)) +
    xlab("Milking Number") +
    ylab("6′-Sialyllactose (6′SLN) concentration (μg/mL)")
  return(p)
}

# Subset 1 cow of data to plot
subset_data_test <- six_SLN %>%
  filter(Cow_ID == 5716)

# Execute the function
f_plot_individual(df_in = subset_data_test)
```
Now we can give any data frame to our function to plot, even the whole data frame, for example with `f_plot_individual(df_in = six_SLN)`.
Now we are going to go back to a similar idea to the one we saw in split-apply-combine. This time we will split our data frame into a list, and tell R to iterate through the list of small data frames, running our function on each one.
```{r}
# Split our data frame into a list of small data frames, one per cow
list_of_dfs <- six_SLN %>%
  group_by(Cow_ID) %>%
  group_split()
```
Let’s check the structure of our list. If we look at the first element in our list, it is a data frame containing a single Cow_ID.
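A minimal sketch of checking the list and then iterating over it might look like the chunk below. So that it runs on its own, it uses a small made-up data frame (`toy_SLN`, with invented values) in place of `six_SLN` and a shortened stand-in for the plotting function (`f_plot_toy`). These names, and the choice of `purrr::map()` for the iteration, are assumptions for illustration; base R's `lapply()` would work in much the same way.

```{r}
library(dplyr)
library(ggplot2)
library(purrr)

# Made-up example data (NOT real measurements): 3 cows x 4 milkings
toy_SLN <- data.frame(
  Cow_ID = rep(c(101, 102, 103), each = 4),
  Milking = rep(1:4, times = 3),
  Parity = factor(rep(c("1", "2", "3"), each = 4)),
  six_SLN_conc = c(300, 220, 150, 90, 410, 280, 160, 100, 250, 190, 120, 80)
)

# Shortened, hypothetical stand-in for f_plot_individual()
f_plot_toy <- function(df_in) {
  ggplot(df_in, aes(x = Milking, y = six_SLN_conc, colour = Parity)) +
    geom_point(size = 3) +
    geom_line(aes(group = Cow_ID), linewidth = 1)
}

# Split: one small data frame per cow
list_of_toy_dfs <- toy_SLN %>%
  group_by(Cow_ID) %>%
  group_split()

# Check the structure of the list
length(list_of_toy_dfs)              # how many small data frames we have
list_of_toy_dfs[[1]]                 # the first element is one cow's data
unique(list_of_toy_dfs[[1]]$Cow_ID)  # it should contain a single Cow_ID

# Apply: run the plotting function on every element of the list
list_of_plots <- map(list_of_toy_dfs, f_plot_toy)

# Look at the first plot
list_of_plots[[1]]
```

With the real data, the same pattern would be `map(list_of_dfs, f_plot_individual)`, which returns a list holding one plot per cow.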
Making interactive plots is easy now that you’re used to using R Notebooks (which actually produce a .html file that can accommodate interactive plots in your output). This is made even easier because we only need 1 simple function: ggplotly(), from the plotly package.
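As a minimal sketch, assuming the `plotly` package is installed, a ggplot object can be passed straight to `ggplotly()`. The toy data frame below is made up so that the chunk runs on its own; in the notebook you would pass an existing plot such as `p_raw` instead.

```{r}
library(ggplot2)
library(plotly)

# Made-up example data (NOT real measurements): 2 cows x 4 milkings
toy_SLN <- data.frame(
  Cow_ID = rep(c(101, 102), each = 4),
  Milking = rep(1:4, times = 2),
  Parity = factor(rep(c("1", "2"), each = 4)),
  six_SLN_conc = c(300, 220, 150, 90, 410, 280, 160, 100)
)

# An ordinary static ggplot
p_toy <- ggplot(toy_SLN, aes(x = Milking, y = six_SLN_conc, colour = Parity)) +
  geom_point(size = 3) +
  geom_line(aes(group = Cow_ID), linewidth = 1)

# Convert the static plot into an interactive one
p_interactive <- ggplotly(p_toy)
p_interactive
```

Hovering over a point in the rendered output shows its values, which is handy for identifying the high individuals seen in the raw plot.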
A great package for heatmaps is pheatmap. However, it isn’t as well documented. See: https://davetang.org/muse/2018/05/15/making-a-heatmap-in-r-with-the-pheatmap-package/
It’s particularly useful for larger datasets and exploratory work. This is one example from some gene expression work:
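For a rough sense of the call itself, here is a minimal sketch on a small simulated matrix (random numbers, not real expression data). The object names and the row and column labels are made up for illustration, and the arguments shown are just one reasonable starting point.

```{r}
library(pheatmap)

set.seed(1)

# Simulated values (NOT real expression data): 10 "genes" x 6 "samples"
toy_mat <- matrix(
  rnorm(60),
  nrow = 10,
  ncol = 6,
  dimnames = list(paste0("gene_", 1:10), paste0("sample_", 1:6))
)

# Scale each row, then cluster both rows and columns
toy_heatmap <- pheatmap(
  toy_mat,
  scale = "row",
  cluster_rows = TRUE,
  cluster_cols = TRUE
)
```

Scaling by row puts each gene on a comparable scale, so the colours show relative differences between samples rather than absolute levels.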