Using hypr for linear regression

Daniel J. Schad & Maximilian M. Rabe

Oct 9th, 2019

Background

hypr is a package for easy translation between experimental (null) hypotheses, hypothesis matrices and contrast matrices, as used for coding factor contrasts in linear regression models. The package can be used to derive contrasts from hypotheses and vice versa. The first step is to define the hypotheses. This step is independent of the package per se and requires some theoretical background knowledge in null hypothesis significance testing (NHST). This vignette shows two examples of deriving contrasts and using them for statistical analyses.

For a general introduction to hypr, see the hypr-intro vignette:

vignette("hypr-intro", package = "hypr")

Simulated dataset

For the examples in this vignette, we are using a simulated dataset with one factor X with four levels X1, X2, X3, and X4:

set.seed(123)
M <- c(mu1 = 10, mu2 = 20, mu3 = 10, mu4 = 40) # condition means
N <- 5
SD <- 10
simdat <- do.call(rbind, lapply(names(M), function(x) {
  data.frame(X = x, DV = as.numeric(MASS::mvrnorm(N, unname(M[x]), SD^2, empirical = TRUE)))
}))
simdat$X <- factor(simdat$X)
simdat$id <- 1:nrow(simdat)
simdat
##      X         DV id
## 1  mu1  0.7025204  1
## 2  mu1  4.7751377  2
## 3  mu1 26.8323215  3
## 4  mu1  8.4826319  4
## 5  mu1  9.2073885  5
## 6  mu2 35.1216132  6
## 7  mu2 24.3424125  7
## 8  mu2  9.5079228  8
## 9  mu2 14.4775279  9
## 10 mu2 16.5505235 10
## 11 mu3 24.3273310 11
## 12 mu3 10.8118074 12
## 13 mu3 11.4523075 13
## 14 mu3  6.9158660 14
## 15 mu3 -3.5073119 15
## 16 mu4 51.8888864 16
## 17 mu4 42.7533446 17
## 18 mu4 25.2877490 18
## 19 mu4 44.1955804 19
## 20 mu4 35.8744396 20

Example: Treatment contrasts

Assume we would like to test three treatments against a baseline. In a typical treatment contrast, we typically test whether any of the treatment conditions \(\mu_2\), \(\mu_3\) or \(\mu_4\) is significantly different from the baseline condition \(\mu_1\). Including the baseline intercept (testing the baseline against zero), this allows us to generate four null hypotheses:

\[\begin{align} H_{0_1}:& \; \mu_1 = 0 \\ H_{0_2}:& \; \mu_2 = \mu_1 \\ H_{0_3}:& \; \mu_3 = \mu_1 \\ H_{0_4}:& \; \mu_4 = \mu_1 \end{align}\]

The hypr() function accepts any set of such equations as comma-separated arguments:

trtC <- hypr(mu1~0, mu2~mu1, mu3~mu1, mu4~mu1)

When calling this function, a hypr object named trtC is generated which contains all four hypotheses from above as well as the hypothesis and contrast matrices derived from those. We can display a summary like any other object in R:

trtC
## hypr object containing 4 null hypotheses:
## H0.1: 0 = mu1        (Intercept)
## H0.2: 0 = mu2 - mu1
## H0.3: 0 = mu3 - mu1
## H0.4: 0 = mu4 - mu1
## 
## Call:
## hypr(~mu1, ~mu2 - mu1, ~mu3 - mu1, ~mu4 - mu1, levels = c("mu1", 
## "mu2", "mu3", "mu4"))
## 
## Hypothesis matrix (transposed):
##     [,1] [,2] [,3] [,4]
## mu1  1   -1   -1   -1  
## mu2  0    1    0    0  
## mu3  0    0    1    0  
## mu4  0    0    0    1  
## 
## Contrast matrix:
##     [,1] [,2] [,3] [,4]
## mu1 1    0    0    0   
## mu2 1    1    0    0   
## mu3 1    0    1    0   
## mu4 1    0    0    1

We can use this object to set the factor contrasts of X in the simdat dataframe:

contrasts(simdat$X) <- contr.hypothesis(trtC)
contrasts(simdat$X)
##     [,1] [,2] [,3]
## mu1 0    0    0   
## mu2 1    0    0   
## mu3 0    1    0   
## mu4 0    0    1
round(coef(summary(lm(DV ~ X, data=simdat))), 3)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)       10      4.472   2.236    0.040
## X1                10      6.325   1.581    0.133
## X2                 0      6.325   0.000    1.000
## X3                30      6.325   4.743    0.000

The linear regression returns the expected estimates: The intercept is the baseline condition and the three main effects are the differences between the baseline and the three conditions.

Example: Sum contrast coding

A sum contrast, such as used for ANOVA, with four levels could generate the following null hypotheses:

\[\begin{align} H_{0_1}:& \; \mu_1 = \frac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4} \\ H_{0_2}:& \; \mu_2 = \frac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4} \\ H_{0_3}:& \; \mu_3 = \frac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4} \end{align}\]

We rewrite them into hypr:

sumC <- hypr(mu1 ~ (mu1+mu2+mu3+mu4)/4, mu2 ~ (mu1+mu2+mu3+mu4)/4, mu3 ~ (mu1+mu2+mu3+mu4)/4)
sumC
## hypr object containing 3 null hypotheses:
## H0.1: 0 = (3*mu1 - mu2 - mu3 - mu4)/4
## H0.2: 0 = (3*mu2 - mu1 - mu3 - mu4)/4
## H0.3: 0 = (3*mu3 - mu1 - mu2 - mu4)/4
## 
## Call:
## hypr(~3/4 * mu1 - 1/4 * mu2 - 1/4 * mu3 - 1/4 * mu4, ~3/4 * mu2 - 
##     1/4 * mu1 - 1/4 * mu3 - 1/4 * mu4, ~3/4 * mu3 - 1/4 * mu1 - 
##     1/4 * mu2 - 1/4 * mu4, levels = c("mu1", "mu2", "mu3", "mu4"
## ))
## 
## Hypothesis matrix (transposed):
##     [,1] [,2] [,3]
## mu1  3/4 -1/4 -1/4
## mu2 -1/4  3/4 -1/4
## mu3 -1/4 -1/4  3/4
## mu4 -1/4 -1/4 -1/4
## 
## Contrast matrix:
##     [,1] [,2] [,3]
## mu1  1    0    0  
## mu2  0    1    0  
## mu3  0    0    1  
## mu4 -1   -1   -1

We next assign the contrast matrix to the factor X:

contrasts(simdat$X) <- contr.hypothesis(sumC)
contrasts(simdat$X)
##     [,1] [,2] [,3]
## mu1  1    0    0  
## mu2  0    1    0  
## mu3  0    0    1  
## mu4 -1   -1   -1

Without creating the intermediate hypr object, you can also set the contrasts directly like this:

contrasts(simdat$X) <- contr.hypothesis(
  mu1 ~ (mu1+mu2+mu3+mu4)/4, 
  mu2 ~ (mu1+mu2+mu3+mu4)/4, 
  mu3 ~ (mu1+mu2+mu3+mu4)/4
)
contrasts(simdat$X)
##     [,1] [,2] [,3]
## mu1  1    0    0  
## mu2  0    1    0  
## mu3  0    0    1  
## mu4 -1   -1   -1

Finally, we run the linear regression:

round(coef(summary(lm(DV ~ X, data=simdat))),3)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)       20      2.236   8.944     0.00
## X1               -10      3.873  -2.582     0.02
## X2                 0      3.873   0.000     1.00
## X3               -10      3.873  -2.582     0.02