Home
The main headings are tabs on the top bar.
This tab has a file loader button and a button for selecting sample data.
- Load any properly formatted CSV file, or load one of four sample datasets from the agricolae package.
- Display the data as a segmented table.
- Only continuous variables and factors are supported.
- Data with missing values is not supported.
- Show the code used to load the data.
These are the types of data we will support for analyses:
- A continuous independent variable and a continuous dependent variable (simple linear regression).
- A continuous dependent variable + one independent factor variable from a completely randomized design (CRD). example
- A continuous dependent variable + two independent factor variables from a completely randomized design (CRD). example
- A continuous dependent variable + one "treatment" factor variable + one block factor variable from a randomized complete block design (RCBD). example
- A continuous dependent variable + two "treatment" factor variables + one block factor variable from a randomized complete block design (RCBD). example
- A continuous dependent variable + two "treatment" factor variables + one plot replication factor variable from a split plot completely randomized design. example
- A continuous dependent variable + two "treatment" factor variables + one block factor variable from a split plot randomized complete block design. example
- (MIXED EFFECTS MODEL) A continuous dependent variable + one "treatment" factor variable + one block factor + one RANDOM factor variable (e.g. location or year) (RCBD for mixed-effects ANOVA). concept example
- (MIXED EFFECTS MODEL) A continuous dependent variable + one "treatment" factor variable + one block factor + two RANDOM factor variables (e.g. location and year) (RCBD for mixed-effects ANOVA). concept example
Custom data:
my.data <- read.csv('path/to/data.csv')
Sample data:
library(agricolae) # load "agricolae" package for the sample data
data("plots") # either plots, corn, cotton, etc
my.data <- plots
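The data rules listed earlier (only continuous variables and factors, no missing values) could be enforced right after loading. A minimal base R sketch, assuming a hypothetical helper `check.data()` that is not part of the app: it stops on any missing value, coerces the selected columns to factors, and rejects any other non-numeric column (for example character columns, which `read.csv()` produces by default in R 4.0 and later).

```r
# Hypothetical helper (not part of the app): enforce the data rules above.
check.data <- function(my.data, factor.cols = character(0)) {
  # Missing values are not supported, so fail early.
  if (anyNA(my.data)) {
    stop("Data with missing values is not supported.")
  }
  # Coerce the selected treatment/block columns to factors.
  for (col in factor.cols) {
    my.data[[col]] <- as.factor(my.data[[col]])
  }
  # Every column must now be numeric (continuous) or a factor.
  ok <- vapply(my.data, function(v) is.numeric(v) || is.factor(v), logical(1))
  if (!all(ok)) {
    stop("Unsupported column type(s): ",
         paste(names(my.data)[!ok], collapse = ", "))
  }
  my.data
}

# Example: a small data frame standing in for the read.csv() result.
my.data <- data.frame(Y = c(4.1, 5.0, 3.8, 4.6, 4.9, 4.2),
                      X = c(1, 2, 3, 1, 2, 3))
my.data <- check.data(my.data, factor.cols = "X")
str(my.data)
```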
Each sub-heading here represents a section in the side panel.
The user will select the type of experiment design they used from a drop-down:
- Continuous variables (Linear Regression, LR)
- One Treatment Completely Randomized Design (CRD1)
- Two Treatment Completely Randomized Design (CRD2)
- One Treatment Randomized Complete Block Design (RCBD1)
- Two Treatment Randomized Complete Block Design (RCBD2)
- Two Treatment Split Plot Completely Randomized Design (SPCRD2)
- Two Treatment Split Plot Randomized Complete Block Design (SPRCBD2)
- One Treatment Randomized Complete Block Design with One Random Effect (RCBDM1)
- One Treatment Randomized Complete Block Design with Two Random Effects (RCBDM2)
This is the same for all data types: the user selects a single dependent
variable. A list of all the variables available in my.data is shown and the
user selects one. This variable should always be continuous, but there is no
check that it is; the correct choice is up to the user.
The user can select any one of the remaining columns in the data set from a list. This column should be a continuous variable and there is no check for this.
This will create a base formula like:
Y ~ X
The user can select one variable from the remaining columns for the treatment. This should be a factor (it will be coerced into one).
This will create a base formula like:
Y ~ X
The user can select two variables from the remaining columns for the treatments. These should both be factors (they will be coerced into factors).
This will create a base formula like:
Y ~ X + Z + X:Z
The user can select one variable from the remaining columns for the treatment and select one variable for the block. Both should be factors (they will be coerced into factors if not).
This will create a base formula like:
Y ~ X + BLK
The user can select two variables from the remaining columns for the treatments and select one variable for the block. All should be factors (they will be coerced into factors if not).
This will create a base formula like:
Y ~ X + Z + X:Z + BLK
There are three drop downs for variable selection: main plot treatment (X), sub plot treatment (Z), and replication (R). Each of these variables will be coerced into factors.
This will create a base formula like:
Y ~ X + Z + X:Z + Error(X:R)
There are three drop downs for factor variable selection: main plot treatment (X), sub plot treatment (Z), block (BLK). Each of these variables will be coerced into factors.
This will create a base formula like:
Y ~ X + Z + X:Z + BLK + Error(X:BLK)
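The mapping from drop-down selections to the base formulas above is mechanical. A minimal base R sketch, assuming a hypothetical helper `build.formula()` that takes the design code and the selected column names as strings; the mixed-effects designs are left out because their formulas are still being settled.

```r
# Hypothetical helper: build the base formula for each fixed-effect design.
# x = treatment (or main plot), z = second treatment (or subplot),
# blk = block, r = replication. All arguments are column names as strings.
build.formula <- function(design, y, x, z = NULL, blk = NULL, r = NULL) {
  rhs <- switch(design,
    LR      = x,
    CRD1    = x,
    CRD2    = paste(x, z, paste0(x, ":", z), sep = " + "),
    RCBD1   = paste(x, blk, sep = " + "),
    RCBD2   = paste(x, z, paste0(x, ":", z), blk, sep = " + "),
    SPCRD2  = paste(x, z, paste0(x, ":", z),
                    paste0("Error(", x, ":", r, ")"), sep = " + "),
    SPRCBD2 = paste(x, z, paste0(x, ":", z), blk,
                    paste0("Error(", x, ":", blk, ")"), sep = " + "),
    # Unnamed last argument acts as the default branch.
    stop("Unknown or unsupported design: ", design)
  )
  as.formula(paste(y, "~", rhs))
}

build.formula("RCBD2", "Y", "X", z = "Z", blk = "BLK")
build.formula("SPCRD2", "Y", "X", z = "Z", r = "R")
```

Building the right-hand side as a string and converting once with `as.formula()` keeps the term order exactly as written above, which makes the displayed code match what the user sees in this wiki.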
The user can select one variable from the remaining columns for the treatment, one variable for the block, and one variable (W) as a random effect. All should be factors (they will be coerced into factors if not).
This will create a base formula like:
Y ~ X + (1|W/BLK)
The user can select two variables from the remaining columns for the treatments, one variable for the block, and two variables (V and W) as random effects. All should be factors (they will be coerced into factors if not).
This will create a base formula like:
Y ~ X + Z + X:Z + BLK + (1|V/W/BLK)
TODO : Make sure this formula is correct.
This will allow the user to apply a transformation to the dependent variable in
the model. A drop-down selects between None, log10, sqrt, and power. Choosing a
transformation adds a new column to my.data holding the transformed values and
adjusts the formulas used in the analysis to:
Y.pow ~ ...
or
Y.log10 ~ ...
or
Y.sqrt ~ ...
For log10 and sqrt the transformation is simply:
my.data$Y.log10 <- log10(my.data$Y)
and:
my.data$Y.sqrt <- sqrt(my.data$Y)
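Adding the column and swapping it into the left-hand side of the formula can be done together. A minimal sketch, assuming a hypothetical helper `add.transform()` (the power option is handled by the exponent code that follows and is left out here):

```r
# Hypothetical helper: add the transformed response column and replace the
# left-hand side of the model formula, leaving the right-hand side untouched.
add.transform <- function(my.data, f, method = c("none", "log10", "sqrt")) {
  method <- match.arg(method)
  if (method == "none") {
    return(list(data = my.data, formula = f))
  }
  y <- deparse(f[[2]])                   # dependent variable, e.g. "Y"
  new.y <- paste(y, method, sep = ".")   # e.g. "Y.log10" or "Y.sqrt"
  my.data[[new.y]] <- switch(method,
                             log10 = log10(my.data[[y]]),
                             sqrt  = sqrt(my.data[[y]]))
  f[[2]] <- as.name(new.y)               # swap the left-hand side only
  list(data = my.data, formula = f)
}

my.data <- data.frame(Y = c(1, 10, 100, 1000),
                      X = factor(c("a", "a", "b", "b")))
res <- add.transform(my.data, Y ~ X, "log10")
res$formula
res$data$Y.log10
```

Replacing only `f[[2]]` (rather than rebuilding the formula with `update()`) keeps terms such as `Error(X:R)` in their original position.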
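For the power transformation, the exponent is computed automatically. Within each treatment group (or treatment combination) the log10 of the variance is regressed on the log10 of the mean; if the variance grows like mean^b, raising Y to the power 1 - b/2 approximately stabilizes the variance. The code is: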
# For one independent variable.
mean.data <- aggregate(Y ~ A, data = my.data, function(x)
c(logmean=log10(mean(x)), logvar=log10(var(x))))
# For two independent variables.
mean.data <- aggregate(Y ~ A + B, data = my.data, function(x)
c(logmean=log10(mean(x)), logvar=log10(var(x))))
power.fit <- lm(logvar ~ logmean, data = as.data.frame(mean.data$Y))
power <- 1 - summary(power.fit)$coefficients[2, 1] / 2
my.data$Y.pow <- my.data$Y^power
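To see what the exponent does, here is the same computation on synthetic data (an assumption for illustration only) built so that, within each level of A, the variance equals the mean, as with Poisson-like counts. The slope b is then 1, so the exponent is 1 - 1/2 = 0.5 and the power transformation reduces to a square root. The sketch reads the slope with `coef()` instead of `summary()` only because this made-up data fits exactly and `summary()` would warn about a perfect fit.

```r
# Fixed deviations with mean 0, rescaled so that var(d) == 1.
d <- c(-1.5, -0.5, 0.5, 1.5)
d <- d / sd(d)
group.means <- c(4, 9, 16, 25)

# Group with mean m gets values m + sqrt(m) * d: mean m, variance m.
my.data <- data.frame(
  A = factor(rep(seq_along(group.means), each = length(d))),
  Y = as.vector(sapply(group.means, function(m) m + sqrt(m) * d))
)

mean.data <- aggregate(Y ~ A, data = my.data, function(x)
  c(logmean = log10(mean(x)), logvar = log10(var(x))))
power.fit <- lm(logvar ~ logmean, data = as.data.frame(mean.data$Y))
b <- coef(power.fit)[["logmean"]]  # slope of log variance on log mean
power <- 1 - b / 2
power
my.data$Y.pow <- my.data$Y^power
```

By the same reasoning, data whose standard deviation is proportional to the mean gives b = 2 and an exponent of 0, the case usually handled with a log transformation instead.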
The user presses the "Run Analysis" button, which then displays the code needed to run the analyses and the assumption tests. The results from running the analysis code are displayed in the main window, i.e. text results interspersed with graphs.
The linear regression fits the model, shows the fit summary, shows the results of the Shapiro-Wilk test, and makes three plots: residuals vs fitted, Q-Q, and a scatter plot with the best fit line.
The code produced follows this form:
fit <- lm(formula = Y ~ X, data = my.data)
summary(fit)
shapiro.test(residuals(fit))
plot(fit, which = c(1, 2))
plot(Y ~ X, data = my.data)
abline(fit)
The CRD1 analysis runs a one-way ANOVA, shows the ANOVA table, shows the results of two assumption tests (Shapiro-Wilk, Levene), and plots residuals vs fitted, a Q-Q plot, and a box plot showing the effect of the levels of the independent variable on the dependent variable.
The code produced follows this form:
fit <- aov(formula = Y ~ X, data = my.data)
summary(fit)
boxplot(Y ~ X, data = my.data, main = "Effect of X on Y",
xlab = "X", ylab = "Y")
plot(fit, which = c(1, 2))
shapiro.test(residuals(fit))
library('car')
leveneTest(Y ~ X, data = my.data)
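To try the CRD1 steps without uploading a file, R's built-in `PlantGrowth` data (response `weight`, one factor `group` with three levels) can stand in for `my.data`. The sketch below also shows what `car::leveneTest()` computes with its default `center = median`: a one-way ANOVA on the absolute deviations of each observation from its group median (the Brown-Forsythe variant). This base R version is offered only as a cross-check for when `car` is not installed.

```r
my.data <- PlantGrowth            # built-in example data, stands in for a CSV
names(my.data) <- c("Y", "X")     # rename to the generic names used above
fit <- aov(Y ~ X, data = my.data)
summary(fit)
shapiro.test(residuals(fit))

# Levene's test centred on the median, using base R only.
abs.dev <- abs(my.data$Y - ave(my.data$Y, my.data$X, FUN = median))
levene <- anova(lm(abs.dev ~ my.data$X))
levene.F <- levene[["F value"]][1]
levene.p <- levene[["Pr(>F)"]][1]
c(F = levene.F, p = levene.p)
```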
The CRD2 analysis runs a two-way ANOVA, shows the ANOVA table, shows the results of two assumption tests (Shapiro-Wilk, Levene), and plots residuals vs fitted, a Q-Q plot, box plots showing the effect of the levels of each independent variable on the dependent variable, and interaction plots.
The code produced follows this form:
fit <- aov(formula = Y ~ X + Z + X:Z, data = my.data)
summary(fit)
boxplot(Y ~ X, data = my.data, main = "Effect of X on Y",
xlab = "X", ylab = "Y")
boxplot(Y ~ Z, data = my.data, main = "Effect of Z on Y",
xlab = "Z", ylab = "Y")
plot(fit, which = c(1, 2))
shapiro.test(residuals(fit))
library('car')
leveneTest(Y ~ X, data = my.data)
leveneTest(Y ~ Z, data = my.data)
my.data$YP.SQ <- predict(fit)^2
library('HH')
intxplot(Y ~ X, groups = Z, data = my.data, se = TRUE,
ylim = range(my.data$Y), offset.scale = 500)
intxplot(Y ~ Z, groups = X, data = my.data, se = TRUE,
ylim = range(my.data$Y), offset.scale = 500)
A one-way ANOVA with a block term is fit to the RCBD1 data and three assumption tests are run (Shapiro-Wilk, Levene, and Tukey's one-degree-of-freedom test for non-additivity). The ANOVA table is shown for the fit and three plots are produced: residuals vs fitted, Q-Q, and a box plot showing the effect of the treatment levels on the dependent variable.
fit <- aov(formula = Y ~ X + BLK, data = my.data)
summary(fit)
plot(fit, which = c(1, 2))
boxplot(Y ~ X, data = my.data, main = "Effect of X on Y",
xlab = "X", ylab = "Y")
shapiro.test(residuals(fit))
library('car')
leveneTest(Y ~ X, data = my.data)
my.data$YP.SQ <- predict(fit)^2
tukey.one.dof.mod <- lm(formula = Y ~ X + BLK + YP.SQ, data = my.data)
summary(tukey.one.dof.mod)
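The `YP.SQ` step is Tukey's one-degree-of-freedom test for non-additivity: the squared fitted values of the additive model are added as one extra covariate, and a significant coefficient on `YP.SQ` suggests a treatment-by-block interaction that the additive RCBD model ignores. A runnable sketch on R's built-in `npk` data, where `block` plays BLK and the nitrogen factor `N` stands in for the treatment X (this choice of columns is only for illustration):

```r
my.data <- npk[, c("yield", "N", "block")]
names(my.data) <- c("Y", "X", "BLK")

fit <- aov(Y ~ X + BLK, data = my.data)       # additive RCBD model
my.data$YP.SQ <- predict(fit)^2               # squared fitted values
tukey.fit <- lm(Y ~ X + BLK + YP.SQ, data = my.data)

# The t test on the YP.SQ coefficient is the 1-df non-additivity test.
tukey.p <- summary(tukey.fit)$coefficients["YP.SQ", "Pr(>|t|)"]
tukey.p
```

The test needs fitted values that are not constant within treatment cells; with a full-interaction model and no block term, `YP.SQ` would be collinear with the cell means and `lm()` would report its coefficient as `NA`.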
A two-way factorial ANOVA with a block term is fit to the RCBD2 data and three assumption tests are run (Shapiro-Wilk, Levene, Tukey). The ANOVA table is shown for the fit and six plots are produced: residuals vs fitted, Q-Q, two box plots showing the effect of the levels of each treatment on the dependent variable, and two interaction plots.
fit <- aov(formula = Y ~ BLK + X + Z + X:Z, data = my.data)
summary(fit)
boxplot(Y ~ X, data = my.data, main = "Effect of X on Y",
xlab = "X", ylab = "Y")
boxplot(Y ~ Z, data = my.data, main = "Effect of Z on Y",
xlab = "Z", ylab = "Y")
plot(fit, which = c(1, 2))
shapiro.test(residuals(fit))
library('car')
leveneTest(Y ~ X, data = my.data)
leveneTest(Y ~ Z, data = my.data)
my.data$YP.SQ <- predict(fit)^2
tukey.one.df.fit <- lm(formula = Y ~ BLK + X + Z + X:Z + YP.SQ,
data = my.data)
summary(tukey.one.df.fit)
library('HH')
intxplot(Y ~ X, groups = Z, data = my.data, se = TRUE,
ylim = range(my.data$Y), offset.scale = 500)
intxplot(Y ~ Z, groups = X, data = my.data, se = TRUE,
ylim = range(my.data$Y), offset.scale = 500)
The split plot CRD analysis runs a split-plot ANOVA on the two treatments (the main plot treatment is tested against the main plot error), shows the ANOVA table, shows the results of three assumption tests (Shapiro-Wilk, Levene, Tukey), and plots residuals vs fitted, a Q-Q plot, and two interaction plots.
fit <- aov(formula = Y ~ A + B + A:B + Error(A:R), data = my.data)
summary(fit)
fit.no.error <- aov(formula = Y ~ A + B + A:B, data = my.data)
plot(fit.no.error, which = c(1, 2))
shapiro.test(residuals(fit.no.error))
library('car')
leveneTest(Y ~ A, data = my.data)
leveneTest(Y ~ B, data = my.data)
my.data$YP.SQ <- predict(fit.no.error)^2
tukey.one.df.fit <- lm(formula = Y ~ A + B + A:B + YP.SQ, data = my.data)
summary(tukey.one.df.fit)
library('HH')
intxplot(Y ~ A, groups = B, data = my.data, se = TRUE,
ylim = range(my.data$Y), offset.scale = 500)
intxplot(Y ~ B, groups = A, data = my.data, se = TRUE,
ylim = range(my.data$Y), offset.scale = 500)
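The `Error(A:R)` term is what makes this a split-plot analysis: `aov()` splits the ANOVA into a main plot stratum, where the main plot treatment is tested against main-plot-to-main-plot variation, and a within (subplot) stratum, where the subplot treatment and the interaction are tested. A small sketch on a made-up balanced layout (random numbers, for illustration only) using the X, Z, R names from the formula section:

```r
set.seed(42)
# Hypothetical layout: 3 main plot treatments (X), each on 3 replicate
# main plots (R), with every main plot split into 2 subplots (Z).
my.data <- expand.grid(Z = factor(1:2), R = factor(1:3), X = factor(1:3))
my.data$Y <- rnorm(nrow(my.data), mean = 10)

fit <- aov(Y ~ X + Z + X:Z + Error(X:R), data = my.data)
s <- summary(fit)
names(s)   # one ANOVA table per error stratum
s
```

Because the residuals differ by stratum, `residuals(fit)` is not directly usable for the assumption checks, which is why the code above refits the model without the `Error()` term for the diagnostic plots and tests.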
A split-plot ANOVA with a block term is fit to the split plot RCBD data and three assumption tests are run (Shapiro-Wilk, Levene, Tukey). The ANOVA table is shown for the fit and four plots are produced: residuals vs fitted, Q-Q, and two interaction plots.
fit <- aov(formula = Y ~ BLK + A + B + A:B + Error(A:BLK), data = my.data)
summary(fit)
fit.no.error <- aov(formula = Y ~ BLK + A + B + A:B, data = my.data)
plot(fit.no.error, which = c(1, 2))
shapiro.test(residuals(fit.no.error))
library('car')
leveneTest(Y ~ A, data = my.data)
leveneTest(Y ~ B, data = my.data)
my.data$YP.SQ <- predict(fit.no.error)^2
tukey.one.df.fit <- lm(formula = Y ~ BLK + A + B + A:B + YP.SQ, data = my.data)
summary(tukey.one.df.fit)
library('HH')
intxplot(Y ~ A, groups = B, data = my.data, se = TRUE,
ylim = range(my.data$Y), offset.scale = 500)
intxplot(Y ~ B, groups = A, data = my.data, se = TRUE,
ylim = range(my.data$Y), offset.scale = 500)
TODO : Waiting on the complete example.
TODO : Waiting on the complete example.
The post hoc analysis lets the user run Fisher's Least Significant Difference (LSD) tests on any significant independent (treatment) variables to determine which of their levels differ significantly from each other. Whether LSD tests are run for individual variables or for interactions of variables depends on:
- Whether there are one or two independent variables.
- With two variables, whether the interaction is significant.
- Whether all, some, or none of the independent variables are significant.
- Whether or not it is a split plot design.
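For reference, in a balanced design with r replicates per level, Fisher's LSD compares every pair of level means against one threshold, LSD = t(1 - alpha/2, df_error) * sqrt(2 * MSE / r), where MSE and df_error come from the ANOVA. A base R sketch on the built-in `PlantGrowth` data (whose `group` effect is significant at alpha = 0.05), offered only to show the arithmetic; agricolae's `LSD.test()` handles unbalanced data and letter groupings on top of this.

```r
fit <- aov(weight ~ group, data = PlantGrowth)
mse <- deviance(fit) / df.residual(fit)          # residual mean square
r <- unique(as.vector(table(PlantGrowth$group))) # replicates per level
stopifnot(length(r) == 1)                        # formula assumes balance
alpha <- 0.05
lsd <- qt(1 - alpha / 2, df.residual(fit)) * sqrt(2 * mse / r)

# Compare every pair of level means against the single LSD threshold.
means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
pairs <- combn(names(means), 2)
diffs <- abs(means[pairs[1, ]] - means[pairs[2, ]])
names(diffs) <- paste(pairs[1, ], pairs[2, ], sep = " - ")
lsd
data.frame(difference = round(diffs, 3), significant = diffs > lsd)
```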
This pseudo-code shows what needs to happen to decide how to run the LSD tests:
if LR
    do not run LSD test
    do not make LSD bar chart
else if CRD1 or RCBD1
    if independent variable is significant
        run LSD test
        make LSD bar chart
    else
        do not run LSD test
        do not make LSD bar chart
else if CRD2 or RCBD2
    if interaction is significant
        run a single LSD test with both variables combined (e.g., a 3 x 3
        factorial gives 9 combined treatment levels, i.e. 36 pairwise
        comparisons) [This may actually be incorrect. I will add it to the issues.]
        make LSD bar chart for the combined variables
    else (interaction is not significant)
        if both independent variables are significant
            run LSD test on each significant variable
            make LSD bar chart for each significant variable
        else if only one independent variable is significant
            run LSD test on the significant variable
            make LSD bar chart for the significant variable
        else (no independent variables are significant)
            do not run LSD test
            do not make LSD bar chart
else if SPCRD2 or SPRCBD2
    if interaction is significant
        for each level of main plot
            do LSD test on subplot treatment
        for each level of subplot
            do LSD test on main plot treatment
    else (interaction is not significant)
        if subplot treatment is significant
            LSD of subplot
        else
            do not do LSD
        if main plot treatment is significant
            LSD of main plot
        else
            do not do LSD
    TODO : Compare between subplot levels across main plot levels
else if RCBDM1 or RCBDM2
    TODO : TBD
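The decision rules in the pseudo-code could be expressed as a small function that the app calls after the ANOVA. A sketch, assuming a hypothetical helper `lsd.plan()` that receives the p-values as a named vector with entries "X", "Z" and "X:Z" and returns labels describing which LSD tests to run; the mixed-effects branch stops because it is still TBD.

```r
# Hypothetical helper: translate the pseudo-code above into R.
# p: named numeric vector of ANOVA p-values ("X", and for two-factor
# designs also "Z" and "X:Z"). Returns the LSD tests to run.
lsd.plan <- function(design, p, alpha = 0.05) {
  sig <- function(term) isTRUE(p[[term]] < alpha)
  if (design == "LR") {
    return(character(0))                       # no LSD for regression
  }
  if (design %in% c("CRD1", "RCBD1")) {
    return(if (sig("X")) "X" else character(0))
  }
  if (design %in% c("CRD2", "RCBD2")) {
    if (sig("X:Z")) return("X:Z")               # one test, combined levels
    return(Filter(sig, c("X", "Z")))            # each significant main effect
  }
  if (design %in% c("SPCRD2", "SPRCBD2")) {
    if (sig("X:Z")) {
      return(c("Z within each level of X", "X within each level of Z"))
    }
    return(Filter(sig, c("Z", "X")))            # subplot first, then main plot
  }
  stop("LSD plan not defined yet for design: ", design)
}

lsd.plan("RCBD2", c(X = 0.01, Z = 0.03, `X:Z` = 0.40))
```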
For any LSD results, a bar graph should be created showing the results graphically. The code for a bar graph from single-variable LSD results looks like:
library('agricolae')
lsd.results <- LSD.test(fit, "X")
# In current agricolae releases, $groups has the treatment levels as row
# names and the LSD letters in a column named "groups".
lsd.groups <- data.frame(X = rownames(lsd.results$groups),
                         groups = lsd.results$groups$groups)
library('Rmisc')
summary.stats <- summarySE(data = my.data, measurevar = "Y", groupvars = "X")
merged.table <- merge(summary.stats, lsd.groups, by = "X")
library('ggplot2')
ggplot(merged.table, aes(x = X, y = Y)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymax = Y + se, ymin = Y - se), width = 0.2) +
  geom_text(aes(label = groups, y = Y + se), vjust = -0.5) +
  labs(x = "X", y = "Y")
and from one with two variables:
library('agricolae')
lsd.results <- LSD.test(fit, c("X", "Z"), console = TRUE)
# Combined treatment names are the levels joined with ":" (e.g. "x1:z2").
lsd.groups <- data.frame(trt = rownames(lsd.results$groups),
                         groups = lsd.results$groups$groups)
library('Rmisc')
summary.stats <- summarySE(data = my.data, measurevar = "Y",
                           groupvars = c("X", "Z"))
summary.stats$trt <- paste(summary.stats$X, summary.stats$Z, sep = ":")
merged.table <- merge(summary.stats, lsd.groups, by = "trt")
library('ggplot2')
ggplot(merged.table, aes(x = trt, y = Y)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymax = Y + se, ymin = Y - se), width = 0.2) +
  geom_text(aes(label = groups, y = Y + se), vjust = -0.5) +
  labs(x = "X:Z", y = "Y")
NOTE : This ggplot code should be switched out for some "simpler" basic R code if it will be shown to the user.
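One possible base R replacement for the ggplot bar chart, sketched with a hypothetical helper `lsd.barplot()` that takes the group means, standard errors and LSD letters in the same level order (for example the Y, se and groups columns of the merged table). The usage below uses the built-in `PlantGrowth` data with letters typed in by hand for illustration; the y axis starts at 0, so it assumes positive means.

```r
# Hypothetical helper: bars for means, +/- 1 SE error bars, LSD letters on top.
lsd.barplot <- function(means, se, groups, xlab = "X", ylab = "Y") {
  top <- means + se
  mids <- barplot(means, ylim = c(0, max(top) * 1.15),
                  xlab = xlab, ylab = ylab)
  # Flat-headed error bars drawn in both directions from the mean.
  arrows(mids, means - se, mids, top, angle = 90, code = 3, length = 0.05)
  text(mids, top, labels = groups, pos = 3)   # letters just above each bar
  invisible(mids)
}

means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
se <- tapply(PlantGrowth$weight, PlantGrowth$group,
             function(x) sd(x) / sqrt(length(x)))
mids <- lsd.barplot(means, se, groups = c("ab", "b", "a"),
                    xlab = "group", ylab = "weight")
```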
A button to download a pdf report that includes R code and graphs.
Image and link to USAID plus a disclaimer.