Введение в дисперсионный анализ

⇐ ПредыдущаяСтр 2 из 3Следующая ⇒

tomato <-

data.frame(weight=

c(1.5, 1.9, 1.3, 1.5, 2.4, 1.5, # water

1.5, 1.2, 1.2, 2.1, 2.9, 1.6, # nutrient

1.9, 1.6, 0.8, 1.15, 0.9, 1.6), # nutrient+24D

trt = rep(c("Water", "Nutrient", "Nutrient+24D"),

c(6, 6, 6)))

boxplot(weight~as.factor(trt),data=tomato)

summary(aov(weight ~ trt, data = tomato))

Df Sum Sq Mean Sq F value Pr(>F)

trt 2 0.627 0.3135 1.202 0.328

Residuals 15 3.912 0.2608

anova (lm(weight ~ trt, data = tomato))

Analysis of Variance Table

Response: weight

Df Sum Sq Mean Sq F value Pr(>F)

trt 2 0.6269 0.31347 1.2019 0.328

Residuals 15 3.9121 0.26081

Проведенный анализ с использованием стандартного контраста treatmrnt рассматривает уровни фактора Nutrient и Nutrient+24D как независимые, что не позволяет вычленить влияние 24D. Предложим контраст, который позволяет это сделать – отметим, что результат дисперсионного анализа не меняется, меняются только оценки влияния факторов (мы разделили Nutrient и 24D, оценив вклад последнего).

levels(tomato$trt)

[1] "Nutrient" "Nutrient+24D" "Water"

ct =matrix(c(1,0, 1,1, 0,0),byrow=T,ncol=2)

rownames (ct)<-c("Nutrient","Nutrient+24D","Water")

colnames (ct)<-c("Nutrient","24D")

Nutrient 24D

Nutrient 1 0

Nutrient+24D 1 1

Water 0 0

p.lm1<-lm(weight ~ trt, data = tomato)

Summary(p.lm1)

Call:

lm(formula = weight ~ trt, data = tomato)

Residuals:

Min 1Q Median 3Q Max

-0.5500 -0.3500 -0.1792 0.2750 1.1500

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 1.75000 0.20849 8.394 4.74e-07 ***

trtNutrient+24D -0.42500 0.29485 -1.441 0.170

trtWater -0.06667 0.29485 -0.226 0.824

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5107 on 15 degrees of freedom

Multiple R-squared: 0.1381, Adjusted R-squared: 0.02321

F-statistic: 1.202 on 2 and 15 DF, p-value: 0.328

Anova (p.lm1)

Analysis of Variance Table

Response: weight

Df Sum Sq Mean Sq F value Pr(>F)

trt 2 0.6269 0.31347 1.2019 0.328

Residuals 15 3.9121 0.26081

p.lm2<-lm(weight ~ trt, data = tomato,contrasts=list(trt=ct))

Summary(p.lm2)

Call:

lm(formula = weight ~ trt, data = tomato, contrasts = list(trt = ct))

Residuals:

Min 1Q Median 3Q Max

-0.5500 -0.3500 -0.1792 0.2750 1.1500

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 1.68333 0.20849 8.074 7.69e-07 ***

trtNutrient 0.06667 0.29485 0.226 0.824

trt24D -0.42500 0.29485 -1.441 0.170

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5107 on 15 degrees of freedom

Multiple R-squared: 0.1381, Adjusted R-squared: 0.02321

F-statistic: 1.202 on 2 and 15 DF, p-value: 0.328

Anova (p.lm2)

Analysis of Variance Table

Response: weight

Df Sum Sq Mean Sq F value Pr(>F)

trt 2 0.6269 0.31347 1.2019 0.328

Residuals 15 3.9121 0.26081

Data(warpbreaks)

Head(warpbreaks)

breaks wool tension

1 26 A L

2 30 A L

3 54 A L

4 25 A L

summary(aov(breaks ~ wool + tension, data = warpbreaks))

Df Sum Sq Mean Sq F value Pr(>F)

wool 1 451 450.7 3.339 0.07361 .

tension 2 2034 1017.1 7.537 0.00138 **

Residuals 50 6748 135.0

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

anova (lm(breaks ~ wool + tension, data = warpbreaks))

Analysis of Variance Table

Response: breaks

Df Sum Sq Mean Sq F value Pr(>F)

wool 1 450.7 450.67 3.3393 0.073614 .

tension 2 2034.3 1017.13 7.5367 0.001378 **

Residuals 50 6747.9 134.96

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

anova (lm(breaks ~ wool * tension, data = warpbreaks))

Analysis of Variance Table

Response: breaks

Df Sum Sq Mean Sq F value Pr(>F)

wool 1 450.7 450.67 3.7653 0.0582130 .

tension 2 2034.3 1017.13 8.4980 0.0006926 ***

wool:tension 2 1002.8 501.39 4.1891 0.0210442 *

Residuals 48 5745.1 119.69

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

plot.design (breaks ~ wool + tension, data = warpbreaks)

Оценка корреляции двух случайных величин

Набор данных Dreissena_Conchophthirus.dat

'data.frame': 476 obs. of 5 variables:

$ Month:Factor w/ 3 levels "July","May","September": 2 2 2 2 2 2 2 2 2 2 ...

$ Lake:Factor w/ 3 levels "Batorino","Myastro",..: 1 1 1 1 1 1 1 1 1 1 ...

$ Site: Factor w/ 9 levels "S1","S2","S3",..: 3 3 3 3 3 3 3 3 3 3 ...

$ ZMlength: num 14.9 14 13 14 12 14 12 19 16.5 18 ...

$ CAnumber: int 36 30 331 110 4 171 31 887 525 497 ...

cor.test (dat$CAnumber, dat$ZMlength)

Pearson's product-moment correlation

data: dat$CAnumber and dat$ZMlength

t = 11.496, df = 474, p-value < 2.2e-16

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

0.3935877 0.5343949

sample estimates:

cor

0.466946

summary(lm(CAnumber~ZMlength,data=dat))

Call:

lm(formula = CAnumber ~ ZMlength, data = dat)

Residuals:

Min 1Q Median 3Q Max

-1452.9 -383.6 -92.1 159.9 9379.7

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -665.700 119.022 -5.593 3.77e-08 ***

ZMlength 79.540 6.919 11.496 < 2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 829.5 on 474 degrees of freedom

Multiple R-squared: 0.218, Adjusted R-squared: 0.2164

F-statistic: 132.2 on 1 and 474 DF, p-value: < 2.2e-16

0.466946^2

[1] 0.2180386

plot(log(dat$ZMlength),log(1+dat$CAnumber),pch=20,cex=1.5)

cor.test (log(dat$CAnumber+1), log(dat$ZMlength))

Pearson's product-moment correlation

data: log(dat$CAnumber + 1) and log(dat$ZMlength)

t = 21.517, df = 474, p-value < 2.2e-16

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

0.6543961 0.7456953

sample estimates:

cor

0.7029297

cor.test (dat$CAnumber, dat$ZMlength, method = "spearman")

Spearman's rank correlation rho

data: dat$CAnumber and dat$ZMlength

S = 6574100, p-value < 2.2e-16

alternative hypothesis: true rho is not equal to 0

sample estimates:

rho

0.6342627

Warning message:

In cor.test.default(dat$CAnumber, dat$ZMlength, method = "spearman") :

Cannot compute exact p-value with ties

cor.test (dat$CAnumber, dat$ZMlength, method = "kendall")

Kendall's rank correlation tau

data: dat$CAnumber and dat$ZMlength

z = 14.931, p-value < 2.2e-16

alternative hypothesis: true tau is not equal to 0

sample estimates:

tau

0.4655232

Критерий хи -квадрат

mice <- matrix(c(13, 44, 25, 29), nrow = 2, byrow = TRUE)

Mice

[,1] [,2]

[1,] 13 44

[2,] 25 29

Chisq.test (mice)

Pearson's Chi-squared test with Yates' continuity correction

data: mice

X-squared = 5.7923, df = 1, p-value = 0.0161

light <- c(12, 40, 45)

dark <- c(87, 34, 75)

very.dark <- c(3, 8, 2)

color.data <- matrix(c(light, dark, very.dark), nrow = 3,

+ dimnames = list(c("Pop1", "Pop2", "Pop3"),

+ c("Light", "Dark", "Very dark")))

Color.data

Light Dark Very dark

Pop1 12 87 3

Pop2 40 34 8

Pop3 45 75 2

Chisq.test (color.data)

Pearson's Chi-squared test

data: color.data

X-squared = 43.434, df = 4, p-value = 8.411e-09

Warning message:

In chisq.test(color.data) : Chi-squared approximation may be incorrect

Задание

Набор данных «Ирис» ( iris. data) - стандартный тестовый набор данных, предложенный Сэром Фишером (Sir Ronald Aylmer Fisher) в 1936 году пример для иллюстрации дискриминантного анализа. Иногда упоминается как набор данных Андерсона (Anderson), который собрал его изучая географическую изменчивость ирисов в Gaspé Peninsula [2]. Набор данных содержит по 50 записей для трех видов ириса (Iris setosa, Iris virginica and Iris versicolor), в каждой записи – 4 числа: Sepal Length (длина чашелистика), Sepal Width (ширина чашелистика), Petal Length (длина лепестка) и Petal Width (ширина лепестка).

Распределение значений по видам показано ниже (виды показаны разным цветом):

Основные численные характеристики распределения приведены ниже:

Ниже приведены гистограммы численных характеристик по видам ириса.

Дата добавления: 2018-11-24; просмотров: 888; Мы поможем в написании вашей работы!

Поделиться с друзьями:

⇐ Предыдущая 123 Следующая ⇒

Мы поможем в написании ваших работ!