An Epidemiologist + Health Statistician in Macao CHINA: Weighting complex survey samples in R

Sunday, 27 December 2015

library(survey) # install the survey package first (install.packages("survey")), then load it into R

load(url("http://knutur.at/wsmt/R/RData/small.RData")) # example data set used below (loads the data frame small)

# Step 1: create an unweighted survey design object (every respondent starts with the same weight)

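small.svy.unweighted <- svydesign(ids = ~1, data = small) # ids = ~1: no clusters; with no weights supplied, svydesign() warns and assumes equal probabilities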

# Step 2: rake the unweighted design to produce a weighted design

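# First we need the population marginal distributions, i.e. the proportion of the population in each category of every weighting variable; these are usually available from routinely published official statistics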

# Use the rake() function from the survey package to compute the weights

# Here we rake on sex and education level (variables sex and edu)

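# Population margins as counts: each proportion is multiplied by the sample size nrow(small), so every margin sums to the sample size
# Column names (sex, edu) and category labels must match the variables in small; Freq holds the counts
sex.dist <- data.frame(sex = c("M", "F"), Freq = nrow(small) * c(0.45, 0.55))   # 45% male, 55% female

edu.dist <- data.frame(edu = c("Lo", "Mid", "Hi"), Freq = nrow(small) * c(0.30, 0.50, 0.20))   # 30% low, 50% middle, 20% high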

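# rake() adjusts the weights iteratively until the weighted sex and edu distributions both match their population margins
small.svy.rake <- rake(design = small.svy.unweighted,
                       sample.margins = list(~sex, ~edu),
                       population.margins = list(sex.dist, edu.dist))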
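To see what rake() is doing, here is a minimal base-R sketch of raking (iterative proportional fitting): the weights are scaled to match one margin, then the next, cycling until they stop changing. It assumes a small simulated data frame samp in place of small and reuses the target proportions above; samp and rake_weights() are hypothetical illustrations, not part of the survey package, and survey::rake() works on the design object with its own convergence settings.

```r
# Hypothetical example: simulated data in place of `small`;
# target proportions as in the post (sex 45/55, edu 30/50/20).
set.seed(1)
n <- 200
samp <- data.frame(
  sex = sample(c("M", "F"), n, replace = TRUE, prob = c(0.6, 0.4)),
  edu = sample(c("Lo", "Mid", "Hi"), n, replace = TRUE, prob = c(0.2, 0.4, 0.4))
)

# Population margins as counts scaled to the sample size
margins <- list(
  sex = c(M = 0.45, F = 0.55) * n,
  edu = c(Lo = 0.30, Mid = 0.50, Hi = 0.20) * n
)

# rake_weights() is a hypothetical helper, not a survey-package function.
rake_weights <- function(data, margins, max.iter = 100, tol = 1e-10) {
  w <- rep(1, nrow(data))                          # start from equal weights
  for (iter in seq_len(max.iter)) {
    w.old <- w
    for (v in names(margins)) {
      idx <- as.character(data[[v]])
      current <- vapply(split(w, idx), sum, numeric(1))  # weighted count per category
      target <- margins[[v]]
      # scale each case so this variable's weighted counts hit the targets
      w <- w * as.numeric(target[idx] / current[idx])
    }
    if (max(abs(w - w.old)) < tol) break           # weights have settled
  }
  w
}

w <- rake_weights(samp, margins)
tapply(w, samp$sex, sum)   # F ~ 110, M ~ 90
tapply(w, samp$edu, sum)   # Hi ~ 40, Lo ~ 60, Mid ~ 100
mean(w)                    # ~ 1, because every margin sums to n
```

Because each margin was scaled to the sample size, the raked weights sum to n and average 1, which is the scale on which trimming bounds such as 0.3 and 3 are usually read.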

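# Step 3: trim the weights. If any weight is too small or too large, a rule of thumb is to bound it with a lower limit of 0.3 and an upper limit of 3 (relative to a mean weight of 1, since the margins above were scaled to the sample size)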

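# Redistributing the trimmed amount to the other cases can push some of them back outside the limits; strict = TRUE repeats the trimming until every weight lies within the bounds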

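summary(weights(small.svy.rake))   # inspect the raked weights before trimming

small.svy.rake.trim <- trimWeights(small.svy.rake, lower = 0.3, upper = 3, strict = TRUE)
summary(weights(small.svy.rake.trim))   # all weights should now lie within [0.3, 3]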
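The trimming step can be sketched in base R: clip the weights to [lower, upper], spread the amount removed (or added) over the cases not sitting on a bound so the total is unchanged, and, when strict is TRUE, repeat until every weight is within the limits. trim_weights() and the example weight vectors are hypothetical; this sketch shares the trimmings in proportion to each remaining weight, which may differ from how survey::trimWeights() apportions them, so the numbers need not match it exactly.

```r
# Weight trimming with redistribution (base-R sketch).
# trim_weights() is a hypothetical helper; survey::trimWeights() may
# apportion the trimmed amount differently.
trim_weights <- function(w, lower = 0.3, upper = 3, strict = TRUE,
                         max.iter = 100) {
  for (iter in seq_len(max.iter)) {
    clipped <- pmin(pmax(w, lower), upper)     # pull weights into [lower, upper]
    excess <- sum(w) - sum(clipped)            # amount cut (+) or added (-)
    free <- clipped > lower & clipped < upper  # cases not sitting on a bound
    if (excess != 0) {
      if (!any(free)) stop("no untrimmed weights left to absorb the trimmings")
      # share the excess among free cases in proportion to their weights
      clipped[free] <- clipped[free] + excess * clipped[free] / sum(clipped[free])
    }
    w <- clipped
    # strict = TRUE: repeat in case redistribution pushed weights out of bounds
    if (!strict || all(w >= lower & w <= upper)) break
  }
  w
}

w.raw <- c(0.1, 0.2, rep(1, 16), 4, 5)   # hypothetical raked weights
w.trim <- trim_weights(w.raw, lower = 0.3, upper = 3)
range(w.trim)              # 0.3 3
sum(w.trim) - sum(w.raw)   # ~ 0: the total weight is preserved

# strict matters when redistribution overshoots the upper bound
trim_weights(c(6, 2.5, 2.5, 1), strict = FALSE)  # 3.00 3.75 3.75 1.50
trim_weights(c(6, 2.5, 2.5, 1), strict = TRUE)   # 3 3 3 3
```

In the last two calls a single pass leaves two weights at 3.75, which is exactly the situation strict = TRUE guards against.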

Reference: http://knutur.at/wsmt/cs-materials.pdf
