How to do a weighted T-test in R?

I have df1:

PopDens Score1 Group
93.53455 17.985288 B
137.13861 10.549394 A
35.98619 13.392857 A
89.69800 8.644537 B
16.27796 29.591635 A
25.33346 21.081301 F
89.69800 2.644537 C
46.27796 29.591635 A
25.33346 5.081301 B
36.27796 29.591635 A
1.33346 9.081301 B

I would like to perform a t-test between groups A and B, looking at the difference in mean Score1.

However, I want to weight the analysis so that rows with a larger PopDens have a stronger weight in the analysis. For example, I don't want the final row to have as much weight in the analysis as the second row because the population densities are very different.

How is this done?

2 Answers

Below is a short summary of my thoughts and a quick search. I have never used a weighted t-test before, only weights in linear regression.

There is no clear definition for what would make a weighted t-test. The issue lies with how to use weights in estimating the error because that is the basis of your t-test. You can check out this discussion and maybe this paper on weights in linear regression.

So your data:

df = structure(list(PopDens = c(93.53455, 137.13861, 35.98619, 89.698,
16.27796, 25.33346, 89.698, 46.27796, 25.33346, 36.27796, 1.33346
), Score1 = c(17.985288, 10.549394, 13.392857, 8.644537, 29.591635,
21.081301, 2.644537, 29.591635, 5.081301, 29.591635, 9.081301
), Group = structure(c(2L, 1L, 1L, 2L, 1L, 4L, 3L, 1L, 2L, 1L,
2L), .Label = c("A", "B", "C", "F"), class = "factor")), class = "data.frame", row.names = c(NA,
-11L))

We subset on only A and B:

df = subset(df,Group %in% c("A","B"))

And we can compare the results of a t-test and lm:

coefficients(summary(lm(Score1~ Group,data=df)))
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept)  22.54343   3.653195  6.170881 0.0004580837
GroupB      -12.34532   5.479793 -2.252882 0.0589470215

t.test(df$Score1[df$Group=="B"],df$Score1[df$Group=="A"])

        Welch Two Sample t-test

data:  df$Score1[df$Group == "B"] and df$Score1[df$Group == "A"]
t = -2.404, df = 6.463, p-value = 0.05007
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -24.695931765   0.005282865
sample estimates:
mean of x mean of y
 10.19811  22.54343

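The regression gives a p-value of 0.0589470215 for the difference of B from A, and the t-test gives 0.05007, so the two are not far apart. The small gap comes from the variance assumption: lm pools the variance across both groups, while t.test defaults to the Welch version, which does not assume equal variances.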
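As a quick check on the comparison above, here is a minimal sketch showing that the unweighted regression matches a pooled-variance t-test, while the default Welch test differs slightly. The data frame name `ab` and the variables `p_lm`, `p_pooled` and `p_welch` are made up for this illustration; the values are the A and B rows from the question.

```r
# A and B rows from the question (PopDens is not needed for the unweighted case)
ab <- data.frame(
  Score1 = c(17.985288, 10.549394, 13.392857, 8.644537, 29.591635,
             29.591635, 5.081301, 29.591635, 9.081301),
  Group  = factor(c("B", "A", "A", "B", "A", "A", "B", "A", "B"))
)

# p-value for the B vs A difference from the linear model (pooled variance)
fit  <- lm(Score1 ~ Group, data = ab)
p_lm <- coefficients(summary(fit))["GroupB", "Pr(>|t|)"]

# Student's t-test with pooled variance, and the default Welch t-test
p_pooled <- t.test(Score1 ~ Group, data = ab, var.equal = TRUE)$p.value
p_welch  <- t.test(Score1 ~ Group, data = ab)$p.value

print(c(lm = p_lm, pooled = p_pooled, welch = p_welch))
```

The `lm` and pooled values should agree, which is why a weighted regression is a natural way to extend the two-sample t-test to weights.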

Now for a weighted linear regression:

coefficients(summary(lm(Score1~ Group,data=df,weights=PopDens)))
             Estimate Std. Error    t value   Pr(>|t|)
(Intercept) 17.845885   3.780246  4.7208269 0.00215547
GroupB      -5.466244   5.727617 -0.9543663 0.37168503

You can see that the coefficients are estimated differently: the estimates are pulled towards the samples with higher weights.
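To make the effect of the weights concrete, here is a small sketch showing that the weighted regression coefficients are the PopDens-weighted group means. The names `ab`, `wfit` and `wm` are made up for this illustration; the data are the A and B rows from the question.

```r
# A and B rows from the question, with PopDens as the weight
ab <- data.frame(
  PopDens = c(93.53455, 137.13861, 35.98619, 89.698, 16.27796,
              46.27796, 25.33346, 36.27796, 1.33346),
  Score1  = c(17.985288, 10.549394, 13.392857, 8.644537, 29.591635,
              29.591635, 5.081301, 29.591635, 9.081301),
  Group   = factor(c("B", "A", "A", "B", "A", "A", "B", "A", "B"))
)

# Weighted least squares fit, as in the answer
wfit <- lm(Score1 ~ Group, data = ab, weights = PopDens)

# Weighted mean of Score1 within each group
wm <- sapply(split(ab, ab$Group),
             function(g) weighted.mean(g$Score1, g$PopDens))

# Intercept = weighted mean of A; GroupB = weighted mean of B minus that of A
print(coef(wfit))
print(c(A = wm[["A"]], B_minus_A = wm[["B"]] - wm[["A"]]))
```

The point estimates are therefore uncontroversial; the open question is only how the weights should enter the standard error.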

For the weighted t-test offered in package weights:

library(weights)
wtd.t.test(x=df$Score1[df$Group=="A"],y=df$Score1[df$Group=="B"],
           weight=df$Score1[df$Group=="A"],weighty=df$Score1[df$Group=="B"],samedata=FALSE)
$test
[1] "Two Sample Weighted T-Test (Welch)"

$coefficients
   t.value         df    p.value
2.90701563 6.97938063 0.02283172

$additional
Difference     Mean.x     Mean.y   Std. Err
 13.468496  25.884728  12.416232   4.633101

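Note that the call above passes Score1 itself as the weights, so this output reflects weighting by Score1; to weight by population density, pass df$PopDens[df$Group=="A"] as weight and df$PopDens[df$Group=="B"] as weighty. The weights in this function appear to be treated as frequency weights, but I am not sure. If you prefer this approach, it is worth reading the source code in detail, since the documentation does not explain well how the standard errors etc. are calculated.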
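For the question as asked, the weights would be PopDens rather than Score1. A hedged sketch of that call follows, assuming the weights package is installed; the names `ab`, `a`, `b` and `res` are made up here, and no output is shown because I have not run it against a specific package version.

```r
# A and B rows from the question
ab <- data.frame(
  PopDens = c(93.53455, 137.13861, 35.98619, 89.698, 16.27796,
              46.27796, 25.33346, 36.27796, 1.33346),
  Score1  = c(17.985288, 10.549394, 13.392857, 8.644537, 29.591635,
              29.591635, 5.081301, 29.591635, 9.081301),
  Group   = c("B", "A", "A", "B", "A", "A", "B", "A", "B")
)
a <- ab[ab$Group == "A", ]
b <- ab[ab$Group == "B", ]

if (requireNamespace("weights", quietly = TRUE)) {
  # Same call as above, but weighting each group by its PopDens values
  res <- weights::wtd.t.test(x = a$Score1, y = b$Score1,
                             weight = a$PopDens, weighty = b$PopDens,
                             samedata = FALSE)
  print(res)
} else {
  message("Package 'weights' is not installed; skipping wtd.t.test().")
}
```

The reported group means should match the weighted means from the weighted regression, while the standard error and p-value may differ because the two methods treat the weights differently.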


If you have more than 2 groups, you could also do a weighted ANOVA with:

# aov() is in the stats package, which is attached by default
aov(Score1 ~ Group, data = df1, weights = PopDens)
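To see how this relates to the two-group case, here is a small sketch, restricted to groups A and B, showing that the weighted ANOVA F statistic equals the square of the t statistic from the weighted regression. The names `ab`, `t_val` and `f_val` are made up for this illustration.

```r
# A and B rows from the question
ab <- data.frame(
  PopDens = c(93.53455, 137.13861, 35.98619, 89.698, 16.27796,
              46.27796, 25.33346, 36.27796, 1.33346),
  Score1  = c(17.985288, 10.549394, 13.392857, 8.644537, 29.591635,
              29.591635, 5.081301, 29.591635, 9.081301),
  Group   = factor(c("B", "A", "A", "B", "A", "A", "B", "A", "B"))
)

# t statistic for the group effect from the weighted regression
wfit  <- lm(Score1 ~ Group, data = ab, weights = PopDens)
t_val <- coefficients(summary(wfit))["GroupB", "t value"]

# F statistic for the group effect from the weighted ANOVA
waov  <- aov(Score1 ~ Group, data = ab, weights = PopDens)
f_val <- summary(waov)[[1]][1, "F value"]

print(c(t_squared = t_val^2, F = f_val))
```

With only two groups the two approaches give the same test, so the weighted ANOVA mainly adds something once a third group enters the comparison.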
