Jason K. Moore edited this page Aug 26, 2015 · 40 revisions

The main headings are tabs on the top bar.

Tab 1: Load Data

This tab has a file loader button and a button for selecting sample data.

  • Load any properly formatted csv file or load one of four datasets from agricolae.
  • Display the data as a segmented table.
  • Only continuous variables and factors are supported.
  • Data with missing values is not supported.
  • Show the code used to load the data.
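The restrictions in the list above could be checked right after loading. The following is only a sketch: the function name `check_data` is hypothetical, and treating character columns as acceptable (because they can later be coerced into factors) is an assumption, not something this page specifies.

```r
# Hypothetical helper: returns a character vector of problems found in the
# loaded data frame (empty if the data looks usable).
check_data <- function(data) {
  problems <- character(0)
  # Data with missing values is not supported.
  if (any(is.na(data))) {
    problems <- c(problems, "data contains missing values")
  }
  # Only continuous variables and factors are supported. Character columns
  # are accepted here on the assumption that they will be coerced to factors.
  supported <- vapply(data, function(col) {
    is.numeric(col) || is.factor(col) || is.character(col)
  }, logical(1))
  if (!all(supported)) {
    problems <- c(problems,
                  paste("unsupported column type:", names(data)[!supported]))
  }
  problems
}
```

An empty result means the data passed both checks; otherwise each element describes one problem that could be shown to the user.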

Types of Data

These are the types of data we will support for analyses:

  1. A continuous dependent variable and a continuous independent variable. (simple linear regression)
  2. A continuous dependent variable + one independent factor variable from a completely randomized experiment design (CRD).
  3. A continuous dependent variable + two independent factor variables from a completely randomized experiment design (CRD).
  4. A continuous dependent variable + one independent "treatment" factor variable + one independent factor block variable from a randomized complete block design (RCBD).
  5. A continuous dependent variable + two independent "treatment" factor variables + one independent factor block variable from a randomized complete block design (RCBD).
  6. A continuous dependent variable + two "treatment" factor variables + one plot rep factor variable from a split plot completely randomized design.
  7. A continuous dependent variable + two "treatment" factor variables + one block factor variable from a split plot randomized complete block design.
  8. (MIXED EFFECT MODEL) A continuous dependent variable + one independent factor variable + one factor block + one RANDOM factor variable (e.g. location or year) (RCBD for mixed-effects ANOVA).
  9. (MIXED EFFECT MODEL) A continuous dependent variable + one independent factor variable + one factor block + two RANDOM factor variables (e.g. location and year) (RCBD for mixed-effects ANOVA).

Example Code

Custom data:

my.data <- read.csv('path/to/data.csv')

Sample data:

library(agricolae)  # load "agricolae" package for the sample data
data("plots")  # one of the agricolae sample datasets, e.g. plots, corn, cotton
my.data <- plots

Tab 2: Analysis

Each sub-heading here represents a section in the side panel.

Side Panel Step 1: Experiment Design

The user will select from a drop down the type of experiment design they used:

  1. Continuous variables (Linear Regression, LR)
  2. One Treatment Completely Randomized Design (CRD1)
  3. Two Treatment Completely Randomized Design (CRD2)
  4. One Treatment Randomized Complete Block Design (RCBD1)
  5. Two Treatment Randomized Complete Block Design (RCBD2)
  6. Two Treatment Split Plot Completely Randomized Design (SPCRD2)
  7. Two Treatment Split Plot Randomized Complete Block Design (SPRCBD2)
  8. One Treatment Randomized Complete Block Design with One Random Effect (RCBDM1)
  9. One Treatment Randomized Complete Block Design with Two Random Effects (RCBDM2)

Side Panel Step 2: Select the dependent variable

This is the same for all data types, i.e. the user selects a single dependent variable. A list of all the variables available in my.data is shown and the user selects one. This variable should always be continuous, but there will be no check that it is; the correct choice is up to the user.

Side Panel Step 3: Select the independent variable(s)

LR

The user can select any one of the remaining columns in the data set from a list. This column should be a continuous variable, but there is no check for this.

This will create a base formula like:

Y ~ X

CRD1

The user can select one variable from the remaining columns for the treatment. This should be a factor (it will be coerced into one).

This will create a base formula like:

Y ~ X
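The coercion into factors mentioned for the treatment column could look like the sketch below. The helper name `coerce_to_factors` is hypothetical and only illustrates the idea with base R.

```r
# Hypothetical helper: convert the selected columns of a data frame to
# factors, leaving all other columns untouched.
coerce_to_factors <- function(data, columns) {
  data[columns] <- lapply(data[columns], as.factor)
  data
}
```

For example, `my.data <- coerce_to_factors(my.data, "X")` would turn a numeric treatment code stored in a column named X into a factor before the model is fit.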

CRD2

The user can select two variables from the remaining columns for the treatments. These should both be factors (they will be coerced into factors).

This will create a base formula like:

Y ~ X + Z + X:Z

RCBD1

The user can select one variable from the remaining columns for the treatment and select one variable for the block. Both should be factors (they will be coerced into factors if not).

This will create a base formula like:

Y ~ X + BLK

RCBD2

The user can select two variables from the remaining columns for the treatments and select one variable for the block. All should be factors (they will be coerced into factors if not).

This will create a base formula like:

Y ~ X + Z + X:Z + BLK

SPCRD2

There are three drop downs for variable selection: main plot treatment (X), sub plot treatment (Z), and replication (R). Each of these variables will be coerced into factors.

This will create a base formula like:

Y ~ X + Z + X:Z + Error(X:R)

SPRCBD2

There are three drop downs for factor variable selection: main plot treatment (X), sub plot treatment (Z), block (BLK). Each of these variables will be coerced into factors.

This will create a base formula like:

Y ~ X + Z + X:Z + BLK + Error(X:BLK)

RCBDM1

The user can select one variable from the remaining columns for the treatment, one variable for the block, and one variable as a random effect. All should be factors (they will be coerced into factors if not).

This will create a base formula like:

Y ~ X + (1|W/BLK)

RCBDM2

The user can select two variables from the remaining columns for the treatments, one variable for the block, and two variables as random effects. All should be factors (they will be coerced into factors if not).

This will create a base formula like:

Y ~ X + Z + X:Z + BLK + (1|V/W/BLK)

TODO : Make sure this formula is correct.
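The base formulas in this step could be assembled from the selected design and column names. The sketch below is an assumption about how that might be done, not the app's actual code: `build_formula` and its argument names are hypothetical, and the two mixed-effects designs are left out because their formulas are still unconfirmed.

```r
# Hypothetical helper: build the base model formula for a design code.
# y, x, z, blk and rep are column names given as strings.
build_formula <- function(design, y, x, z = NULL, blk = NULL, rep = NULL) {
  # Stop early if a column required by the design was not selected.
  need <- function(...) {
    if (any(vapply(list(...), is.null, logical(1)))) {
      stop("missing variable selection for design ", design)
    }
  }
  rhs <- switch(design,
    LR = ,
    CRD1 = x,
    CRD2 = { need(z); c(x, z, paste(x, z, sep = ":")) },
    RCBD1 = { need(blk); c(x, blk) },
    RCBD2 = { need(z, blk); c(x, z, paste(x, z, sep = ":"), blk) },
    SPCRD2 = {
      need(z, rep)
      c(x, z, paste(x, z, sep = ":"), sprintf("Error(%s:%s)", x, rep))
    },
    SPRCBD2 = {
      need(z, blk)
      c(x, z, paste(x, z, sep = ":"), blk, sprintf("Error(%s:%s)", x, blk))
    },
    stop("unsupported design: ", design))
  as.formula(paste(y, "~", paste(rhs, collapse = " + ")))
}

build_formula("SPCRD2", "Y", "X", z = "Z", rep = "R")
# Y ~ X + Z + X:Z + Error(X:R)
```

Building the right-hand side as a character vector keeps each design's terms in one place, so a new design only needs one more branch in the switch.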

Side Panel Step 4: Transformations

This will allow the user to apply a transformation to the dependent variable in the model. There will be a drop down to select between None, log10, sqrt, and power. Selecting one of the three transformations adds a new column to my.data and changes the left-hand side of the formulas used in the analysis to:

Y.pow ~ ...

or

Y.log10 ~ ...

or

Y.sqrt ~ ...

For log10 and sqrt the transformation is simply:

my.data$Y.log10 <- log10(my.data$Y)

and:

my.data$Y.sqrt <- sqrt(my.data$Y)

For the power transformation, an exponent is automatically computed with the following code:

# Compute log10 of the mean and variance of Y within each treatment group.
# For one independent variable:
mean.data <- aggregate(Y ~ X, data = my.data, function(x)
                       c(logmean = log10(mean(x)), logvar = log10(var(x))))
# For two independent variables, use this call instead:
# mean.data <- aggregate(Y ~ X + Z, data = my.data, function(x)
#                        c(logmean = log10(mean(x)), logvar = log10(var(x))))
# Regress log variance on log mean; the exponent is 1 - slope / 2.
power.fit <- lm(logvar ~ logmean, data = as.data.frame(mean.data$Y))
power <- 1 - summary(power.fit)$coefficients[2, 1] / 2
my.data$Y.pow <- my.data$Y^power
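The drop down choice, the new column, and the adjusted response name could be tied together in one helper. This is a sketch under assumptions: `apply_transformation` and its arguments are hypothetical names, and the demo values are made up purely for illustration.

```r
# Hypothetical helper: add the transformed column to the data and return the
# name of the column to use on the left-hand side of the formula.
# dep is the dependent variable name; indep holds the treatment column names
# (only used for the power transformation).
apply_transformation <- function(data, dep, indep,
                                 method = c("none", "log10", "sqrt", "power")) {
  method <- match.arg(method)
  if (method == "none") {
    return(list(data = data, response = dep))
  }
  y <- data[[dep]]
  transformed <- switch(method,
    log10 = log10(y),
    sqrt = sqrt(y),
    power = {
      # Slope of log10(variance) on log10(mean) across treatment groups.
      groups <- interaction(data[indep], drop = TRUE)
      log.mean <- log10(as.vector(tapply(y, groups, mean)))
      log.var <- log10(as.vector(tapply(y, groups, var)))
      slope <- coef(lm(log.var ~ log.mean))[[2]]
      y^(1 - slope / 2)
    })
  suffix <- c(log10 = "log10", sqrt = "sqrt", power = "pow")[[method]]
  new.col <- paste(dep, suffix, sep = ".")
  data[[new.col]] <- transformed
  list(data = data, response = new.col)
}

# Made-up demo data: three treatment levels with four observations each.
demo <- data.frame(
  X = factor(rep(c("a", "b", "c"), each = 4)),
  Y = c(1.1, 0.9, 1.2, 1.0, 4.2, 3.6, 4.5, 3.9, 9.5, 8.1, 10.4, 9.0))
result <- apply_transformation(demo, dep = "Y", indep = "X", method = "power")
result$response  # "Y.pow"
```

Returning the new response name alongside the data lets the caller rebuild the formula without knowing which transformation was chosen.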

Side Panel Step 5: Run Analysis

The user presses the "Run Analysis" button which then displays the code needed to run the analyses and the assumptions tests. The results from running the analyses code will be displayed in the main window, i.e. text results interspersed with graphs.

LR

The linear regression will fit the model, show the fit summary, show the results of the Shapiro-Wilk test, and make three plots: residuals vs fitted, Q-Q, and a scatter plot with the best fit line.

The code produced follows this form:

fit <- lm(formula = Y ~ X, data = my.data)
summary(fit)
shapiro.test(residuals(fit))
plot(fit, which = c(1, 2))
plot(Y ~ X, data = my.data)
abline(fit)

CRD1

The CRD analysis will run a one-way ANOVA, show the ANOVA table, show the results of two assumptions tests (Shapiro-Wilk, Levene), and plot a box plot showing the effect of the levels of the independent variable on the dependent variable.

The code produced follows this form:

fit <- aov(formula = Y ~ X, data = my.data)
summary(fit)
boxplot(Y ~ X, data = my.data, main = "Effect of X on Y",
        xlab = "X", ylab = "Y")
plot(fit, which = c(1, 2))
shapiro.test(residuals(fit))
library('car')
leveneTest(Y ~ X, data = my.data)

CRD2

The CRD analysis will run a two-way ANOVA, show the ANOVA table, show the results of two assumptions tests (Shapiro-Wilk, Levene), and plot box plots showing the effect of the levels of each independent variable on the dependent variable, along with interaction plots.

The code produced follows this form:

fit <- aov(formula = Y ~ X + Z + X:Z, data = my.data)
summary(fit)
boxplot(Y ~ X, data = my.data, main = "Effect of X on Y",
        xlab = "X", ylab = "Y")
boxplot(Y ~ Z, data = my.data, main = "Effect of Z on Y",
        xlab = "Z", ylab = "Y")
plot(fit, which = c(1, 2))
shapiro.test(residuals(fit))
library('car')
leveneTest(Y ~ X, data = my.data)
leveneTest(Y ~ Z, data = my.data)
library('HH')
intxplot(Y ~ X, groups = Z, data = my.data, se = TRUE,
         ylim = range(my.data$Y), offset.scale = 500)
intxplot(Y ~ Z, groups = X, data = my.data, se = TRUE,
         ylim = range(my.data$Y), offset.scale = 500)

RCBD1

A one-way ANOVA is fit to the RCBD data and three assumptions tests are run (Shapiro-Wilk, Levene, Tukey). The ANOVA table is shown for the fit and three plots are produced: residuals vs fitted, Q-Q, and a box plot showing the effect of the treatment levels on the dependent variable.

fit <- aov(formula = Y ~ X + BLK, data = my.data)
summary(fit)
plot(fit, which = c(1, 2))
boxplot(Y ~ X, data = my.data, main = "Effect of X on Y",
        xlab = "X", ylab = "Y")
shapiro.test(residuals(fit))
library('car')
leveneTest(Y ~ X, data = my.data)
my.data$YP.SQ <- predict(fit)^2
tukey.one.dof.mod <- lm(formula = Y ~ X + BLK + YP.SQ, data = my.data)
summary(tukey.one.dof.mod)
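The Tukey check above hinges on the coefficient of the squared predicted values. The sketch below shows one way to pull out its p-value, using a simulated RCBD data set; the data, the object names, and the 0.05 threshold are all assumptions made for illustration.

```r
# Simulated RCBD data: 4 treatment levels in each of 5 blocks.
set.seed(42)
rcbd <- expand.grid(X = factor(1:4), BLK = factor(1:5))
rcbd$Y <- 10 + as.integer(rcbd$X) + 0.5 * as.integer(rcbd$BLK) +
  rnorm(nrow(rcbd), sd = 0.3)

# Additive fit, then refit with the squared predictions as an extra term.
fit <- aov(Y ~ X + BLK, data = rcbd)
rcbd$YP.SQ <- predict(fit)^2
tukey.fit <- lm(Y ~ X + BLK + YP.SQ, data = rcbd)

# p-value of the single-degree-of-freedom nonadditivity term.
tukey.p <- coef(summary(tukey.fit))["YP.SQ", "Pr(>|t|)"]
# A small p-value suggests the treatment and block effects are not additive.
nonadditive <- tukey.p < 0.05
```

Reading the p-value from the coefficient table avoids having to scan the printed summary by eye.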

RCBD2

A three-way ANOVA is fit to the RCBD data and three assumptions tests are run (Shapiro-Wilk, Levene, Tukey). The ANOVA table is shown for the fit and six plots are produced: residuals vs fitted, Q-Q, two box plots showing the effect of the treatment levels on the dependent variable, and two interaction plots.

fit <- aov(formula = Y ~ BLK + X + Z + X:Z, data = my.data)
summary(fit)
boxplot(Y ~ X, data = my.data, main = "Effect of X on Y",
        xlab = "X", ylab = "Y")
boxplot(Y ~ Z, data = my.data, main = "Effect of Z on Y",
        xlab = "Z", ylab = "Y")
plot(fit, which = c(1, 2))
shapiro.test(residuals(fit))
library('car')
leveneTest(Y ~ X, data = my.data)
leveneTest(Y ~ Z, data = my.data)
my.data$YP.SQ <- predict(fit)^2
tukey.one.df.fit <- lm(formula = Y ~ BLK + X + Z + X:Z + YP.SQ,
                         data = my.data)
summary(tukey.one.df.fit)
library('HH')
intxplot(Y ~ X, groups = Z, data = my.data, se = TRUE,
         ylim = range(my.data$Y), offset.scale = 500)
intxplot(Y ~ Z, groups = X, data = my.data, se = TRUE,
         ylim = range(my.data$Y), offset.scale = 500)

SPCRD2

The Split Plot CRD analysis will run a two-way ANOVA, show the ANOVA table, show the results of three assumptions tests (Shapiro-Wilk, Levene, Tukey), and plot the residuals vs fitted, Q-Q, and two interaction plots.

fit <- aov(formula = Y ~ X + Z + X:Z + Error(X:R), data = my.data)
summary(fit)
# Refit without the Error() term; this fit is used for the diagnostics below.
fit.no.error <- aov(formula = Y ~ X + Z + X:Z, data = my.data)
plot(fit.no.error, which = c(1, 2))
shapiro.test(residuals(fit.no.error))
library('car')
leveneTest(Y ~ X, data = my.data)
leveneTest(Y ~ Z, data = my.data)
my.data$YP.SQ <- predict(fit.no.error)^2
tukey.one.df.fit <- lm(formula = Y ~ X + Z + X:Z + YP.SQ, data = my.data)
summary(tukey.one.df.fit)
library('HH')
intxplot(Y ~ X, groups = Z, data = my.data, se = TRUE,
         ylim = range(my.data$Y), offset.scale = 500)
intxplot(Y ~ Z, groups = X, data = my.data, se = TRUE,
         ylim = range(my.data$Y), offset.scale = 500)

SPRCBD2

A three-way ANOVA with a main plot error term is fit to the split plot RCBD data and three assumptions tests are run (Shapiro-Wilk, Levene, Tukey). The ANOVA table is shown for the fit and four plots are produced: residuals vs fitted, Q-Q, and two interaction plots.

fit <- aov(formula = Y ~ BLK + X + Z + X:Z + Error(X:BLK), data = my.data)
summary(fit)
# Refit without the Error() term; this fit is used for the diagnostics below.
fit.no.error <- aov(formula = Y ~ BLK + X + Z + X:Z, data = my.data)
plot(fit.no.error, which = c(1, 2))
shapiro.test(residuals(fit.no.error))
library('car')
leveneTest(Y ~ X, data = my.data)
leveneTest(Y ~ Z, data = my.data)
my.data$YP.SQ <- predict(fit.no.error)^2
tukey.one.df.fit <- lm(formula = Y ~ BLK + X + Z + X:Z + YP.SQ,
                       data = my.data)
summary(tukey.one.df.fit)
library('HH')
intxplot(Y ~ X, groups = Z, data = my.data, se = TRUE,
         ylim = range(my.data$Y), offset.scale = 500)
intxplot(Y ~ Z, groups = X, data = my.data, se = TRUE,
         ylim = range(my.data$Y), offset.scale = 500)

RCBDM1

TODO : Waiting on the complete example.

RCBDM2

TODO : Waiting on the complete example.

Tab 3: Post Hoc Analysis

The post hoc analysis allows the user to run Least Significant Difference (LSD) tests on any significant independent variables to determine whether their levels differ significantly from each other. Whether LSD tests are run for different variables or different interactions of variables depends on the following:

  1. If there are one or two independent variables.
  2. If there are two variables, whether the interaction is significant.
  3. If all or some of the independent variables are significant.
  4. If it is a split plot design or not.
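Checking the conditions in this list requires the p-value of each term in the ANOVA table. The sketch below is one way to get them from an aov fit that has no Error() term; the helper name `anova_p_values` is hypothetical, and the built-in warpbreaks data set is used only as a stand-in for my.data.

```r
# Hypothetical helper: named vector of p-values, one per model term.
anova_p_values <- function(fit) {
  tab <- summary(fit)[[1]]
  p <- tab[["Pr(>F)"]]
  # Row names of the ANOVA table are padded with spaces, so trim them.
  names(p) <- trimws(rownames(tab))
  p[names(p) != "Residuals"]
}

# Stand-in example with two factors and their interaction.
demo.fit <- aov(breaks ~ wool + tension + wool:tension, data = warpbreaks)
p.values <- anova_p_values(demo.fit)
significant <- p.values < 0.05
```

The resulting named logical vector can be looked up by term name, for example `significant["wool:tension"]` for the interaction.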

This pseudo-code shows what needs to happen to decide how to run the LSD tests:

if LR
  do not run LSD test
  do not make LSD bar chart
else if CRD1 or RCBD1
  if independent variable is significant
    run LSD test
    make LSD bar chart
  else
    do not run LSD test
    do not make LSD bar chart
else if CRD2 or RCBD2
  if interaction is significant
    run single LSD test with both variables combined (e.g., if it's a 3 x 3
    factorial, then there will be 9 comparisons) [NOTE: This may actually be
    incorrect. I will add to issues]
    make LSD bar chart for the combined variables
  else interaction is not significant
    if both independent variables are significant
      run LSD test on each significant variable
      make LSD bar chart for each significant variable
    else if one independent variable is significant
      run LSD test on the significant variable
      make LSD bar chart for the significant variable
    else no independent variables are significant
      do not run LSD test
      do not make LSD bar chart
else if SPCRD2 or SPRCBD2
  if interaction is significant
    for each level of main plot
      do LSD test on subplot treatment
    for each level of subplot
      do LSD test on main plot treatment
  else interaction is not significant
    if subplot treatment is significant
      LSD of subplot
    else
      do not do LSD
    if main plot treatment is significant
      LSD of main plot
    else
      do not do LSD
  TODO : Compare between subplot levels across main plot levels
else if RCBDM1 or RCBDM2
  TODO : TBD
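The decision rules in the pseudo-code could be expressed as a small R function. This is a sketch, not the app's implementation: `plan_lsd_tests` is a hypothetical name, the terms are assumed to be called X (main plot or first treatment), Z (subplot or second treatment), and X:Z, and the 0.05 threshold is an assumption.

```r
# Hypothetical helper: decide which LSD tests to run.
# p.values is a named numeric vector with entries such as "X", "Z", "X:Z".
# Returns a character vector describing the tests (empty means no LSD test).
plan_lsd_tests <- function(design, p.values, alpha = 0.05) {
  # A term counts as significant only if its p-value is present and < alpha.
  sig <- function(term) isTRUE(unname(p.values[term]) < alpha)
  if (design == "LR") {
    return(character(0))
  }
  if (design %in% c("CRD1", "RCBD1")) {
    return(if (sig("X")) "X" else character(0))
  }
  if (design %in% c("CRD2", "RCBD2")) {
    if (sig("X:Z")) {
      return("X:Z")
    }
    return(Filter(sig, c("X", "Z")))
  }
  if (design %in% c("SPCRD2", "SPRCBD2")) {
    if (sig("X:Z")) {
      return(c("Z within each level of X", "X within each level of Z"))
    }
    return(Filter(sig, c("X", "Z")))
  }
  stop("LSD rules are not yet defined for design: ", design)
}

plan_lsd_tests("RCBD2", c(X = 0.01, Z = 0.20, "X:Z" = 0.40))
# "X"
```

Each returned entry would correspond to one LSD test and one bar chart; the mixed-effects designs raise an error because their rules are still to be determined.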

For any LSD result, a bar graph should be created to show the result graphically. The code for a bar graph from the LSD results for a single variable looks like:

library('agricolae')
lsd.results <- LSD.test(fit, "X")
library('Rmisc')
summary.stats <- summarySE(data = my.data, "Y", groupvars = "X")
# Assumes lsd.results$groups has the columns trt, means, and M; these names
# may differ between agricolae versions.
merged.table <- merge(summary.stats, lsd.results$groups,
                      by.x = "X", by.y = "trt")
library('ggplot2')
ggplot(merged.table, aes(x = X, y = means)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymax = means + se, ymin = means - se)) +
  geom_text(aes(label = M, y = means + se / 1.8), vjust = -2.5) +
  labs(x = "X", y = "Y")

and from one with two variables:

library('agricolae')
lsd.results <- LSD.test(fit, c("X", "Z"), console = TRUE)
library('Rmisc')
summary.stats <- summarySE(data = my.data, "Y",
                           groupvars = c("X", "Z"))
# Build a combined treatment label such as "x1:z1" to match the LSD output.
summary.stats$trt <- apply(summary.stats[ , c("X", "Z")], 1, paste,
                           collapse = ":")
# Assumes lsd.results$groups has the columns trt, means, and M; these names
# may differ between agricolae versions.
merged.table <- merge(summary.stats, lsd.results$groups, by = "trt")
library('ggplot2')
ggplot(merged.table, aes(x = trt, y = means)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymax = means + se, ymin = means - se)) +
  geom_text(aes(label = M, y = means + se / 1.8), vjust = -2.5) +
  labs(x = "X:Z", y = "Y")

NOTE : This ggplot code should be switched out for some "simpler" basic R code if it will be shown to the user.
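One possible base R replacement is sketched below. It is only an illustration: `plot_lsd_bars` is a hypothetical helper, it assumes non-negative means, and the numbers in the example call are made up rather than taken from any LSD result.

```r
# Hypothetical helper: bar chart of group means with standard error bars and
# LSD grouping letters, using only base R graphics.
plot_lsd_bars <- function(labels, means, se, groups, xlab = "X", ylab = "Y") {
  # Leave headroom above the tallest error bar for the letters.
  top <- max(means + se) * 1.2
  mids <- barplot(means, names.arg = labels, ylim = c(0, top),
                  xlab = xlab, ylab = ylab)
  # Error bars drawn as double-headed arrows with flat heads.
  arrows(mids, means - se, mids, means + se,
         angle = 90, code = 3, length = 0.05)
  # Grouping letters placed just above each error bar.
  text(mids, means + se, labels = groups, pos = 3)
  invisible(as.vector(mids))
}

# Made-up values for three treatment levels.
plot_lsd_bars(labels = c("a", "b", "c"),
              means = c(4.1, 5.3, 6.0),
              se = c(0.20, 0.30, 0.25),
              groups = c("b", "a", "a"))
```

The function returns the bar midpoints invisibly, so further annotations can be added at the same x positions if needed.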

Tab 4: Downloads

A button to download a pdf report that includes R code and graphs.

Tab 5: About

Image and link to USAID plus a disclaimer.