In the R statistical software, regression analysis is very easy to carry out. The main types of regression include:
1. Linear regression: lm(), glm(family = gaussian)
2. Logistic regression: glm(family = binomial)
3. Poisson regression: glm(family = poisson)
4. Multinomial (unordered) logistic regression: nnet::multinom()
5. Ordinal (ordered) logistic regression: MASS::polr()
Once you have decided on the function and the model formula, the corresponding regression can be fitted.
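As a minimal sketch, assuming a data frame mydata with an outcome y and predictors x1 and x2 (all placeholder names, not from this post), the call patterns look like this:

fit.lm    <- lm(y ~ x1 + x2, data = mydata)                      # linear regression
fit.gau   <- glm(y ~ x1 + x2, family = gaussian, data = mydata)  # same model via glm()
fit.logit <- glm(y ~ x1 + x2, family = binomial, data = mydata)  # logistic regression
fit.pois  <- glm(y ~ x1 + x2, family = poisson, data = mydata)   # Poisson regression
fit.multi <- nnet::multinom(y ~ x1 + x2, data = mydata)          # multinomial (y must be a factor)
fit.ord   <- MASS::polr(y ~ x1 + x2, data = mydata)              # ordinal (y must be an ordered factor)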
In addition, a key step in regression analysis is handling dummy variables. In fact, as long as the variable types are set correctly in the data set, e.g. count data as int (numeric) and categorical data as factor, R will create the dummy variables automatically when fitting the main regression models. For example:
Suppose a data set has four variables (num, brand, female, age), all read in as int (numeric) at first. After converting the two variables brand and female to factor (categorical), fitting a logistic regression with female as the dependent variable gives the output below.
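One way to do the conversion (a sketch; the original post does not show this step explicitly) is:

example_logistic_regression$brand  <- as.factor(example_logistic_regression$brand)
example_logistic_regression$female <- as.factor(example_logistic_regression$female)
str(example_logistic_regression)   # brand and female should now show as Factor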
> log.fit<-glm(female~brand+age,family = binomial,data = example_logistic_regression)
> summary(log.fit)

Call:
glm(formula = female ~ brand + age, family = binomial, data = example_logistic_regression)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5523  -1.3217   0.8738   0.9375   1.1586

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.08843    1.17784   0.924  0.35544
brand3       0.46076    0.22489   2.049  0.04048 *
brand2       0.55677    0.19261   2.891  0.00384 **
age         -0.02747    0.03712  -0.740  0.45928
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 965.47  on 734  degrees of freedom
Residual deviance: 956.86  on 731  degrees of freedom
AIC: 964.86

Number of Fisher Scoring iterations: 4
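The adjusted odds ratios in the table below can also be obtained directly from the fitted model. The log.or object itself looks like output from logistic.display() in the epiDisplay (formerly epicalc) package, but the post does not show how it was created, so that call is only an assumption:

exp(coef(log.fit))        # exponentiated coefficients = adjusted odds ratios
exp(confint(log.fit))     # 95% confidence intervals on the OR scale
# Assumed origin of log.or (not shown in the original post):
# log.or <- epiDisplay::logistic.display(log.fit)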
> log.or

Logistic regression predicting female : 1 vs 0

                  crude OR(95%CI)    adj. OR(95%CI)    P(Wald's test)  P(LR-test)
brand: ref.=1                                                          0.014
   3              1.47 (0.99,2.16)   1.59 (1.02,2.46)  0.04
   2              1.68 (1.17,2.42)   1.75 (1.2,2.55)   0.004

age (cont. var.)  1.01 (0.94,1.07)   0.97 (0.9,1.05)   0.459           0.459

Log-likelihood = -478.4285
No. of observations = 735
AIC value = 964.8569
Of course, as far as I know, there are some regression analyses in R where dummy variables are not created automatically; in those cases you can consult other articles, or create the dummies by hand as sketched below.
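For example, model.matrix() can expand a factor into 0/1 dummy columns manually (a sketch using the same brand variable; the column names assume factor levels 1, 2 and 3):

dummies <- model.matrix(~ brand, data = example_logistic_regression)[, -1]  # drop the intercept column
head(dummies)   # brand2 and brand3 dummy columns, with brand 1 as the reference level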