As previously mentioned, one of the most useful and time-saving features of R is the ability to manipulate your data without touching your original spreadsheet. Manipulating data within the R environment allows us to generate entirely new datasets from our raw data without modifying the original document. This means no more multi-sheet Excel workbooks and no more opening Excel to generate a new column; this can all be done in R. This is really beneficial when collaborating with other researchers, group members or supervisors, since all we need to send is the raw data sheet and the R document. They can follow the code to see what is happening and run everything directly from the original data.
Manipulation of data can be done with the base R language (everything we have done so far) or with packages from the tidyverse collection, such as “dplyr” and “tidyr”. The tidyverse simplifies the language of coding and offers powerful tools for data manipulation and graphing.
Make sure you have loaded tidyverse with the library() command before attempting any functions.
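As a quick reminder, loading the package might look like the sketch below. The commented-out install line is an assumption about your setup and is only needed if tidyverse has not been installed on your machine yet.

```r
# Install once per machine (uncomment if tidyverse is not yet installed)
# install.packages("tidyverse")

# Load at the start of every R session; this attaches dplyr, tidyr and others
library(tidyverse)
```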
For the rest of the course, many function arguments will be left out. If you want to learn about other customisation options for your code, or are lost at any point, use the “Help” tab in RStudio or type “?” followed by the name of the function, e.g. ?rename.
Otherwise, the internet is an awesome resource for R help.
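A few ways of looking up a function from the console are sketched here, using rename from dplyr as the example. It assumes tidyverse is installed.

```r
library(tidyverse)

# Open the help page for a function by name
?rename

# Equivalent long form, naming the package explicitly
help("rename", package = "dplyr")

# Show a function's arguments without opening the help page
args(rename)
```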
Rename columns in a data frame while keeping the data intact
Create new columns using mathematical or logical calculations of other columns
Filter out data based on logical criteria, e.g. remove any values < 10
Select specific columns from one dataset to create another or remove columns from your dataset
Join multiple datasets together based on similar or different values of a column. e.g. join environmental and species data based on site name
Remove objects or datasets from the R environment
Use summary functions such as mean and standard deviation on a column
Link multiple tidyverse functions together in a single processing step
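To give a feel for the tasks listed above, here is a minimal sketch using dplyr functions. The two small data frames (species and environment) and all of their column names are made up for illustration and are not part of the course data.

```r
library(tidyverse)

# Hypothetical raw data: species counts and environmental readings per site
species <- data.frame(
  Site    = c("A", "B", "C", "D"),
  count   = c(4, 12, 25, 8),
  area_m2 = c(2, 4, 5, 2)
)
environment <- data.frame(
  site   = c("A", "B", "C", "D"),
  temp_c = c(14.2, 15.1, 13.8, 16.0)
)

# Rename a column (new_name = old_name); the data are unchanged
species <- rename(species, site = Site)

# Create a new column from a calculation on other columns
species <- mutate(species, density = count / area_m2)

# Keep only rows that meet a logical criterion (drops counts < 10)
abundant <- filter(species, count >= 10)

# Select specific columns to build a smaller dataset
densities <- select(species, site, density)

# Join two datasets on a shared column
combined <- left_join(species, environment, by = "site")

# Apply summary functions to a column
count_summary <- summarise(species,
                           mean_count = mean(count),
                           sd_count   = sd(count))

# Link several steps together with the pipe
site_summary <- species %>%
  filter(count >= 10) %>%
  left_join(environment, by = "site") %>%
  select(site, density, temp_c)

# Remove an object from the R environment
rm(densities)
```

Note that none of these steps touch the original spreadsheet; each result is stored as a new object (or overwrites an object) inside R only.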