Conducting a two-factor ANOVA is pretty straightforward.
weeds.aov2 <- aov(flowers ~ species + soil, data = weeds) # two-factor anova (without interaction)
summary(weeds.aov2)
## Df Sum Sq Mean Sq F value Pr(>F)
## species 2 2369 1184.3 9.272 0.000436 ***
## soil 1 239 238.5 1.867 0.178720
## Residuals 44 5620 127.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This example constructs an ANOVA with two factors, but does not include the interaction term. To include the interaction term, simply replace the + sign with an asterisk (*).
weeds.aov2 <- aov(flowers ~ species * soil, data = weeds) # two-factor anova (with interaction)
summary(weeds.aov2)
## Df Sum Sq Mean Sq F value Pr(>F)
## species 2 2369 1184.3 9.102 0.00052 ***
## soil 1 239 238.5 1.833 0.18301
## species:soil 2 155 77.5 0.596 0.55574
## Residuals 42 5465 130.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The asterisk tells the formula to cross the two factors: species * soil is shorthand for species + soil + species:soil. The output therefore reports each factor independently (the main effects) as well as the interaction term.
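To see what the asterisk expands to, you can inspect the terms R derives from a formula. This is a small sketch using base R's terms(); it needs only the formula, not the weeds data.

```r
# Terms generated by the crossed formula (no data required)
crossed <- attr(terms(flowers ~ species * soil), "term.labels")
print(crossed)

# The same model written out in full
explicit <- attr(terms(flowers ~ species + soil + species:soil), "term.labels")

# Both formulas describe the same set of terms
identical(crossed, explicit)
```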
Don’t forget to check your assumptions
Everything stays the same for assumptions except the following modifications to Bartlett’s and Levene’s Tests.
bartlett.test(flowers ~ interaction(species, soil), data = weeds) # wrap the factors in interaction() so each species-by-soil combination is treated as its own group
##
## Bartlett test of homogeneity of variances
##
## data: flowers by interaction(species, soil)
## Bartlett's K-squared = 5.3304, df = 5, p-value = 0.3769
leveneTest(flowers ~ species * soil, data = weeds) # from the car package; same syntax as the model formula
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 5 0.81 0.5492
## 42
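The df = 5 in both outputs reflects the number of groups being compared: interaction() combines the two factors into a single factor with one level per combination, and a test across k groups has k - 1 degrees of freedom. A minimal sketch with made-up factor levels (the labels below are hypothetical, not the real levels in weeds):

```r
# Hypothetical factors: 3 species x 2 soil types (labels are invented)
species <- factor(rep(c("A", "B", "C"), each = 4))
soil    <- factor(rep(c("clay", "sand"), times = 6))

# interaction() builds one factor with a level per species-soil combination
cell <- interaction(species, soil)
levels(cell)
n_groups <- nlevels(cell)   # 3 * 2 groups

# Degrees of freedom for a test across these groups
n_groups - 1
```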
There are two methods to transform your response (Y) variable for an analysis:
1. Use mutate() to create a new column; or
2. Apply the transformation directly in the model formula.
For this example, we will log-transform the flowers column in the weeds dataset.
NOTE: This transformation makes no sense here, as the data are already normal. It is just an example!
## Mutate Option ##
weeds <- mutate(weeds, logflowers = log(flowers)) # create new column called "logflowers"
## Formula option ##
weeds.aov.log <- aov(log(flowers) ~ species * soil, data = weeds) # log(flowers) as our Y variable tells the anova to use a log transformed response.
summary(weeds.aov.log)
## Df Sum Sq Mean Sq F value Pr(>F)
## species 2 2.842 1.4211 11.158 0.00013 ***
## soil 1 0.239 0.2387 1.874 0.17831
## species:soil 2 0.247 0.1234 0.969 0.38792
## Residuals 42 5.349 0.1274
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
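Both options fit the same model. The sketch below checks this on R's built-in ToothGrowth data, used here only as a stand-in for weeds; it creates the new column with base R's transform() instead of mutate() so that it runs without dplyr.

```r
# ToothGrowth ships with R; it stands in for the weeds data here
tg <- ToothGrowth
tg$dose <- factor(tg$dose)   # treat dose as a factor

# New-column option (base R equivalent of the mutate() call)
tg <- transform(tg, loglen = log(len))
fit_column <- aov(loglen ~ supp * dose, data = tg)

# Formula option
fit_formula <- aov(log(len) ~ supp * dose, data = tg)

# The residuals from the two fits should match
all.equal(residuals(fit_column), residuals(fit_formula))
```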
If you are testing assumptions, you must run aov() (or whichever analysis you are using) again on the transformed response and then extract the residuals from that new model.
shapiro.test(log(weeds.aov$residuals)) #### DO NOT DO THIS!! ####
##
## Shapiro-Wilk normality test
##
## data: log(weeds.aov$residuals)
## W = 0.95759, p-value = 0.4422
shapiro.test(weeds.aov.log$residuals) # Do this! #
##
## Shapiro-Wilk normality test
##
## data: weeds.aov.log$residuals
## W = 0.97792, p-value = 0.4951
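See how the two results differ? Transforming the residuals of the original model is not the same as refitting the model on the transformed response. The same applies to square root (sqrt) or square/cubic transformations (^2, ^3).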
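One reason the first call is wrong: residuals are centred on zero, so some are negative, and log() of a negative number is NaN. shapiro.test() then drops those values and tests only what is left. A sketch on the built-in ToothGrowth data (a stand-in, not the weeds data):

```r
# ToothGrowth ships with R; it stands in for the weeds data here
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

fit <- aov(len ~ supp * dose, data = tg)
res <- residuals(fit)

# Logging the residuals produces NaN for every negative residual
logged <- suppressWarnings(log(res))
n_negative <- sum(res < 0)
n_nan <- sum(is.nan(logged))
c(negative = n_negative, nan = n_nan)

# Refit on the transformed response instead, then extract residuals
fit_log <- aov(log(len) ~ supp * dose, data = tg)
shapiro.test(residuals(fit_log))
```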
Construct a two-factor ANOVA (with interaction) on the Insecticide dataset and answer the following:
1. Is the data normal?
2. What is the p-value for the Bartlett’s test?
3. Without transforming to normalise, what is the p-value for the interaction term?