Введение в дисперсионный анализ
tomato <-
data.frame(weight=
c(1.5, 1.9, 1.3, 1.5, 2.4, 1.5, # water
1.5, 1.2, 1.2, 2.1, 2.9, 1.6, # nutrient
1.9, 1.6, 0.8, 1.15, 0.9, 1.6), # nutrient+24D
trt = rep(c("Water", "Nutrient", "Nutrient+24D"),
c(6, 6, 6)))
boxplot(weight~as.factor(trt),data=tomato)
summary(aov(weight ~ trt, data = tomato))
Df Sum Sq Mean Sq F value Pr(>F)
trt 2 0.627 0.3135 1.202 0.328
Residuals 15 3.912 0.2608
anova (lm(weight ~ trt, data = tomato))
Analysis of Variance Table
Response: weight
Df Sum Sq Mean Sq F value Pr(>F)
trt 2 0.6269 0.31347 1.2019 0.328
Residuals 15 3.9121 0.26081
Проведенный анализ с использованием стандартного контраста treatmrnt рассматривает уровни фактора Nutrient и Nutrient+24D как независимые, что не позволяет вычленить влияние 24D. Предложим контраст, который позволяет это сделать – отметим, что результат дисперсионного анализа не меняется, меняются только оценки влияния факторов (мы разделили Nutrient и 24D, оценив вклад последнего).
levels(tomato$trt)
[1] "Nutrient" "Nutrient+24D" "Water"
ct =matrix(c(1,0, 1,1, 0,0),byrow=T,ncol=2)
rownames (ct)<-c("Nutrient","Nutrient+24D","Water")
colnames (ct)<-c("Nutrient","24D")
Ct
Nutrient 24D
Nutrient 1 0
Nutrient+24D 1 1
Water 0 0
p.lm1<-lm(weight ~ trt, data = tomato)
Summary(p.lm1)
Call:
lm(formula = weight ~ trt, data = tomato)
Residuals:
Min 1Q Median 3Q Max
-0.5500 -0.3500 -0.1792 0.2750 1.1500
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.75000 0.20849 8.394 4.74e-07 ***
trtNutrient+24D -0.42500 0.29485 -1.441 0.170
trtWater -0.06667 0.29485 -0.226 0.824
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5107 on 15 degrees of freedom
|
|
Multiple R-squared: 0.1381, Adjusted R-squared: 0.02321
F-statistic: 1.202 on 2 and 15 DF, p-value: 0.328
Anova (p.lm1)
Analysis of Variance Table
Response: weight
Df Sum Sq Mean Sq F value Pr(>F)
trt 2 0.6269 0.31347 1.2019 0.328
Residuals 15 3.9121 0.26081
p.lm2<-lm(weight ~ trt, data = tomato,contrasts=list(trt=ct))
Summary(p.lm2)
Call:
lm(formula = weight ~ trt, data = tomato, contrasts = list(trt = ct))
Residuals:
Min 1Q Median 3Q Max
-0.5500 -0.3500 -0.1792 0.2750 1.1500
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.68333 0.20849 8.074 7.69e-07 ***
trtNutrient 0.06667 0.29485 0.226 0.824
trt24D -0.42500 0.29485 -1.441 0.170
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5107 on 15 degrees of freedom
Multiple R-squared: 0.1381, Adjusted R-squared: 0.02321
F-statistic: 1.202 on 2 and 15 DF, p-value: 0.328
Anova (p.lm2)
Analysis of Variance Table
Response: weight
Df Sum Sq Mean Sq F value Pr(>F)
trt 2 0.6269 0.31347 1.2019 0.328
Residuals 15 3.9121 0.26081
Data(warpbreaks)
Head(warpbreaks)
breaks wool tension
1 26 A L
2 30 A L
3 54 A L
4 25 A L
summary(aov(breaks ~ wool + tension, data = warpbreaks))
Df Sum Sq Mean Sq F value Pr(>F)
wool 1 451 450.7 3.339 0.07361 .
tension 2 2034 1017.1 7.537 0.00138 **
Residuals 50 6748 135.0
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova (lm(breaks ~ wool + tension, data = warpbreaks))
Analysis of Variance Table
Response: breaks
Df Sum Sq Mean Sq F value Pr(>F)
wool 1 450.7 450.67 3.3393 0.073614 .
tension 2 2034.3 1017.13 7.5367 0.001378 **
Residuals 50 6747.9 134.96
|
|
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova (lm(breaks ~ wool * tension, data = warpbreaks))
Analysis of Variance Table
Response: breaks
Df Sum Sq Mean Sq F value Pr(>F)
wool 1 450.7 450.67 3.7653 0.0582130 .
tension 2 2034.3 1017.13 8.4980 0.0006926 ***
wool:tension 2 1002.8 501.39 4.1891 0.0210442 *
Residuals 48 5745.1 119.69
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
plot.design (breaks ~ wool + tension, data = warpbreaks)
Оценка корреляции двух случайных величин
Набор данных Dreissena_Conchophthirus.dat
'data.frame': 476 obs. of 5 variables:
$ Month:Factor w/ 3 levels "July","May","September": 2 2 2 2 2 2 2 2 2 2 ...
$ Lake:Factor w/ 3 levels "Batorino","Myastro",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Site: Factor w/ 9 levels "S1","S2","S3",..: 3 3 3 3 3 3 3 3 3 3 ...
$ ZMlength: num 14.9 14 13 14 12 14 12 19 16.5 18 ...
$ CAnumber: int 36 30 331 110 4 171 31 887 525 497 ...
cor.test (dat$CAnumber, dat$ZMlength)
Pearson's product-moment correlation
data: dat$CAnumber and dat$ZMlength
t = 11.496, df = 474, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.3935877 0.5343949
sample estimates:
cor
0.466946
summary(lm(CAnumber~ZMlength,data=dat))
Call:
lm(formula = CAnumber ~ ZMlength, data = dat)
Residuals:
Min 1Q Median 3Q Max
-1452.9 -383.6 -92.1 159.9 9379.7
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -665.700 119.022 -5.593 3.77e-08 ***
ZMlength 79.540 6.919 11.496 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 829.5 on 474 degrees of freedom
|
|
Multiple R-squared: 0.218, Adjusted R-squared: 0.2164
F-statistic: 132.2 on 1 and 474 DF, p-value: < 2.2e-16
0.466946^2
[1] 0.2180386
plot(log(dat$ZMlength),log(1+dat$CAnumber),pch=20,cex=1.5)
cor.test (log(dat$CAnumber+1), log(dat$ZMlength))
Pearson's product-moment correlation
data: log(dat$CAnumber + 1) and log(dat$ZMlength)
t = 21.517, df = 474, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.6543961 0.7456953
sample estimates:
cor
0.7029297
cor.test (dat$CAnumber, dat$ZMlength, method = "spearman")
Spearman's rank correlation rho
data: dat$CAnumber and dat$ZMlength
S = 6574100, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.6342627
Warning message:
In cor.test.default(dat$CAnumber, dat$ZMlength, method = "spearman") :
Cannot compute exact p-value with ties
cor.test (dat$CAnumber, dat$ZMlength, method = "kendall")
Kendall's rank correlation tau
data: dat$CAnumber and dat$ZMlength
z = 14.931, p-value < 2.2e-16
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.4655232
Критерий хи -квадрат
mice <- matrix(c(13, 44, 25, 29), nrow = 2, byrow = TRUE)
Mice
[,1] [,2]
[1,] 13 44
[2,] 25 29
Chisq.test (mice)
Pearson's Chi-squared test with Yates' continuity correction
data: mice
X-squared = 5.7923, df = 1, p-value = 0.0161
light <- c(12, 40, 45)
dark <- c(87, 34, 75)
very.dark <- c(3, 8, 2)
color.data <- matrix(c(light, dark, very.dark), nrow = 3,
+ dimnames = list(c("Pop1", "Pop2", "Pop3"),
+ c("Light", "Dark", "Very dark")))
Color.data
Light Dark Very dark
Pop1 12 87 3
|
|
Pop2 40 34 8
Pop3 45 75 2
Chisq.test (color.data)
Pearson's Chi-squared test
data: color.data
X-squared = 43.434, df = 4, p-value = 8.411e-09
Warning message:
In chisq.test(color.data) : Chi-squared approximation may be incorrect
Задание
Набор данных «Ирис» ( iris. data) - стандартный тестовый набор данных, предложенный Сэром Фишером (Sir Ronald Aylmer Fisher) в 1936 году пример для иллюстрации дискриминантного анализа. Иногда упоминается как набор данных Андерсона (Anderson), который собрал его изучая географическую изменчивость ирисов в Gaspé Peninsula [2]. Набор данных содержит по 50 записей для трех видов ириса (Iris setosa, Iris virginica and Iris versicolor), в каждой записи – 4 числа: Sepal Length (длина чашелистика), Sepal Width (ширина чашелистика), Petal Length (длина лепестка) и Petal Width (ширина лепестка).
Распределение значений по видам показано ниже (виды показаны разным цветом):
Основные численные характеристики распределения приведены ниже:
Ниже приведены гистограммы численных характеристик по видам ириса.
Дата добавления: 2018-11-24; просмотров: 888; Мы поможем в написании вашей работы! |
Мы поможем в написании ваших работ!