R语言 交换假设检验
简单地说,R语言中的交换假设检验是比较两组数值的一种方式。递变假设检验是一种替代方法。
- 独立双样本t检验
- Mann-Whitney U又称Wilcoxon Rank-Sum测试
让我们在R编程中实现这个测试。
为什么使用混合假设检验
- 样本量小。
- 不符合假设(参数方法)。
- 除了比较平均数和中位数的经典方法外,还要测试其他东西。
- 难以估计测试统计量的SE。
混合假设检验步骤
- 指定一个假说
- 选择测试数据(例如:平均数、中位数,等等)。
- 确定测试数据的分布
- 将测试值转换为P值
注:
P值=测试值大于观察到的测试值的排列组合数/排列组合数。
R语言 的实现
- 数据集: 鸡肉饮食数据。该数据集是 “R数据包 “中 “chickwts “数据的一个子集 。在这里 下载该数据集 。
- 假设: 鸡 的 重量与饮食类型无关。
测试-统计学
- 检验统计#1: 两种饮食的平均体重 之差的绝对值 **| Y 1 – Y2 | 。 **这与 独立的双侧 双样本t检验 的测试统计量相同 。
- 检验统计 #2: 两种饮食的中位数重量差异的绝对值 **|中位数 1- 中位数2 | **。
# R program to illustrate
# Permutation Hypothesis Test
# load the data set
d <- read.table(file = "ChickData.csv",
header = T, sep = ",")
# print the dataset
print(d)
# check the names
names(d)
levels(dfeed)
# how many observations in each diet?
table(dfeed)
# let's look at a boxplot of weight gain by those 2 diets
boxplot(dweight~dfeed, las = 1,
ylab = "weight (g)",
xlab = "feed",
main = "Weight by Feed")
# calculate the difference in sample MEANS
mean(dweight[dfeed == "casein"]) # mean for casein
mean(dweight[dfeed == "meatmeal"]) # mean for meatmeal
# lets calculate the absolute diff in means
test.stat1 <- abs(mean(dweight[dfeed == "casein"]) -
mean(dweight[dfeed == "meatmeal"]))
test.stat1
# calculate the difference in sample MEDIANS
median(dweight[dfeed == "casein"]) # median for casein
median(dweight[dfeed == "meatmeal"]) # median for meatmeal
# lets calculate the absolute diff in medians
test.stat2 <- abs(median(dweight[dfeed == "casein"]) -
median(dweight[dfeed == "meatmeal"]))
test.stat2
# Permutation Test
# for reproducability of results
set.seed(1979)
# the number of observations to sample
n <- length(dfeed)
# the number of permutation samples to take
P <- 100000
# the variable we will resample from
variable <- dweight
# initialize a matrix to store the permutation data
PermSamples <- matrix(0, nrow = n, ncol = P)
# each column is a permutation sample of data
# now, get those permutation samples, using a loop
# let's take a moment to discuss what that code is doing
for(i in 1:P)
{
PermSamples[, i] <- sample(variable,
size = n,
replace = FALSE)
}
# we can take a quick look at the first 5 columns of PermSamples
PermSamples[, 1:5]
# initialize vectors to store all of the Test-stats
Perm.test.stat1 <- Perm.test.stat2 <- rep(0, P)
# loop thru, and calculate the test-stats
for (i in 1:P)
{
# calculate the perm-test-stat1 and save it
Perm.test.stat1[i] <- abs(mean(PermSamples[dfeed == "casein",i]) -
mean(PermSamples[dfeed == "meatmeal",i]))
# calculate the perm-test-stat2 and save it
Perm.test.stat2[i] <- abs(median(PermSamples[dfeed == "casein",i]) -
median(PermSamples[dfeed == "meatmeal",i]))
}
# before going too far with this,
# let's remind ourselves of
# the TEST STATS
test.stat1; test.stat2
# and, take a look at the first 15
# permutation-TEST STATS for 1 and 2
round(Perm.test.stat1[1:15], 1)
round(Perm.test.stat2[1:15], 1)
# and, let's calculate the permutation p-value
# notice how we can ask R a true/false question
(Perm.test.stat1 >= test.stat1)[1:15]
# and if we ask for the mean of all of those,
# it treats 0 = FALSE, 1 = TRUE
mean((Perm.test.stat1 >= test.stat1)[1:15])
# Calculate the p-value, for all P = 100,000
mean(Perm.test.stat1 >= test.stat1)
# and, let's calculate the p-value for
# option 2 of the test statistic (abs diff in medians)
mean(Perm.test.stat2 >= test.stat2)
输出
> print(d)
weight feed
1 325 meatmeal
2 257 meatmeal
3 303 meatmeal
4 315 meatmeal
5 380 meatmeal
6 153 meatmeal
7 263 meatmeal
8 242 meatmeal
9 206 meatmeal
10 344 meatmeal
11 258 meatmeal
12 368 casein
13 390 casein
14 379 casein
15 260 casein
16 404 casein
17 318 casein
18 352 casein
19 359 casein
20 216 casein
21 222 casein
22 283 casein
23 332 casein
> names(d)
[1] "weight" "feed"
> levels(dfeed)
[1] "casein" "meatmeal"
> table(dfeed)
casein meatmeal
12 11
> mean(dweight[dfeed == "casein"]) # mean for casein
[1] 323.5833
> mean(dweight[dfeed == "meatmeal"]) # mean for meatmeal
[1] 276.9091
> test.stat1
[1] 46.67424
> median(dweight[dfeed == "casein"]) # median for casein
[1] 342
> median(dweight[dfeed == "meatmeal"]) # median for meatmeal
[1] 263
> test.stat2
[1] 79
> PermSamples[, 1:5]
[,1] [,2] [,3] [,4] [,5]
[1,] 379 283 380 352 206
[2,] 380 303 258 260 380
[3,] 257 206 379 380 153
[4,] 283 242 222 404 359
[5,] 222 260 325 258 258
[6,] 315 352 153 379 263
[7,] 352 263 263 325 325
[8,] 153 325 315 359 216
[9,] 368 379 344 242 260
[10,] 344 258 368 368 257
[11,] 359 257 206 257 315
[12,] 206 153 404 222 303
[13,] 404 344 303 390 390
[14,] 325 318 318 303 352
[15,] 242 404 332 263 404
[16,] 390 380 257 206 379
[17,] 260 332 216 315 318
[18,] 303 359 352 344 368
[19,] 263 222 242 283 222
[20,] 332 368 260 332 344
[21,] 318 315 283 318 283
[22,] 216 390 390 153 332
[23,] 258 216 359 216 242
> test.stat1; test.stat2
[1] 46.67424
[1] 79
> round(Perm.test.stat1[1:15], 1)
[1] 17.1 32.4 17.6 47.1 56.1 28.9 31.0 40.8 6.8 13.8 9.1 46.5 28.9 50.9 32.7
> round(Perm.test.stat2[1:15], 1)
[1] 61.0 75.0 4.5 59.0 78.0 17.0 62.0 38.5 4.5 16.0 23.0 60.5 63.5 75.0 37.0
> (Perm.test.stat1 >= test.stat1)[1:15]
[1] FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
> mean((Perm.test.stat1 >= test.stat1)[1:15])
[1] 0.2
> mean(Perm.test.stat1 >= test.stat1)
[1] 0.09959
> mean(Perm.test.stat2 >= test.stat2)
[1] 0.05407