Survey sampling
Computer class No. 7 (Imbi Traat)
1 Auxiliary information in sampling stage
Auxiliary information in the sampling stage consists of variables whose values are known for all population elements before the sampling takes place.
With auxiliary information one can design a sampling procedure which gives more precise estimates, is less costly or makes administration of the survey simpler.
Nowadays there are registers for various populations (people, enterprises, taxpayers, cattle, etc.) that contain plenty of auxiliary information. It is often possible to draw auxiliary information from different registers and include it in the frame for the current problem.
For StatVillage there is a Housing Register available which contains information on the construction year of houses. We merge this information into our frame.
Save the Housing Register in your Sampling catalogue and look at it in SAS.INSIGHT. The register also contains Permanent Random Numbers, which are used for sample coordination in surveys.
Merge the information from the Housing Register into the frame of StatVillage with the following code in the Programme Editor:
/* merging auxiliary information into the frame */
data sampling.frameaux;
  merge sampling.frame sampling.housereg;
  by block unit;
  drop RANUNI;   /* this is unnecessary information for us */
run;
Look at the frame including auxiliary information in SAS.INSIGHT.
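A MERGE with a BY statement assumes that both data sets are already sorted by BLOCK and UNIT. If you are not sure that this holds for your files, a minimal sketch (assuming the data set names used above) is to sort both before running the merge:

```sas
/* sort both inputs by the merge keys before the BY-merge */
proc sort data=sampling.frame;
  by block unit;
run;

proc sort data=sampling.housereg;
  by block unit;
run;
```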
2 Stratification
With the information now available it is possible to stratify the population.
Stratification helps to increase the precision of estimates if the study variables are more homogeneous within strata. For example, if it is important to measure INCOME, one should stratify the population into high- and low-income earners. For StatVillage it is known that richer people live in the North.
Let us decide to have 2 strata (North = 1, South = 2). Create a stratum variable in the frame, e.g. in the following way:
/* stratum variable by block */
data sampling.framestr;
  set sampling.frameaux;
  if block le 20 then str = 1;
  else str = 2;
run;
2.1 Sample allocation problem
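Let the overall sample size n be given, and let it be the same as you had for SI-sampling (n = 97 in my case). We have stratum sizes $N_1$ and $N_2$ in the frame, with $N = N_1 + N_2$.
For proportional sample allocation we should have
$n_h = n \, N_h / N, \quad h = 1, 2,$
rounded to integers so that $n_1 + n_2 = n$.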
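The proportional allocation can also be computed directly from the frame. The following is only a sketch: it assumes the stratified frame sampling.framestr created above and n = 97; the data set names work.strsize and work.alloc and the variable nh_prop are made up for illustration.

```sas
/* stratum sizes: COUNT = N_h, PERCENT = 100 * N_h / N */
proc freq data=sampling.framestr noprint;
  tables str / out=work.strsize;
run;

/* proportional allocation n_h = n * N_h / N */
data work.alloc;
  set work.strsize;
  n = 97;                               /* replace with your own sample size */
  nh_prop = round(n * percent / 100);   /* rounded to an integer */
run;

proc print data=work.alloc;
run;
```

Because of rounding, the allocated sizes may not add up exactly to n; adjust one stratum by hand if necessary.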

In stratified sampling one has to select independent samples in the strata according to a pre-specified design. For example, SI-sampling in strata can be performed in SAS.INSIGHT in the following way.
Open the frame with the stratification variable in SAS.INSIGHT (framestr).
Generate uniform random numbers (Edit -> variables -> other -> RANUNI).
Sort the file by the variables STR and RANUNI (menu Triangle -> sort -> first place STR in the Y-field, then place RANUNI, OK).
The first n1 units in stratum 1 and the first n2 units in stratum 2 are the sampled units. It remains to take the file with the corresponding addresses and get the responses from these addresses.
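The same steps can be sketched in the Programme Editor instead of SAS.INSIGHT. This is an illustration, not the procedure of the class: the macro variables n1 and n2 hold placeholder values that you must replace with your own allocation, and the data set work.strsample and the helper variables u and k are made-up names.

```sas
%let n1 = 40;   /* placeholder: your sample size in stratum 1 */
%let n2 = 57;   /* placeholder: your sample size in stratum 2 */

/* attach a uniform random number to every frame unit */
data work.strsample;
  set sampling.framestr;
  u = ranuni(0);        /* seed 0: the seed is taken from the system clock */
run;

/* random order within each stratum */
proc sort data=work.strsample;
  by str u;
run;

/* keep the first n1 units of stratum 1 and the first n2 of stratum 2 */
data work.strsample;
  set work.strsample;
  by str;
  if first.str then k = 0;
  k + 1;                /* running count within the stratum */
  if (str = 1 and k <= &n1) or (str = 2 and k <= &n2);
  drop k u;
run;
```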
We know that SI-sampling allocates the sample approximately proportionally over all population subgroups. Let us check this in your SI-sample. Create a stratum indicator in your data set sampling.statvil2(?). Use the same programme as for the frame, but change the file names in the programme. Look at the result in SAS.INSIGHT. Then calculate moments for some variable, e.g. TOTINCH, in these strata (Analyze -> Distribution, with the str-variable as grouping variable).
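As an alternative to the menus, the check can be sketched in code. The sketch assumes that your SI-sample is stored in sampling.statvil2 (adjust the name to your own file) and that it contains BLOCK and TOTINCH; work.sample_str is a made-up name.

```sas
/* stratum indicator in the SI-sample, same rule as in the frame */
data work.sample_str;
  set sampling.statvil2;
  if block le 20 then str = 1;
  else str = 2;
run;

/* sample size, mean and standard deviation of TOTINCH in each stratum */
proc means data=work.sample_str n mean std;
  class str;
  var totinch;
run;
```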
Table 1. Moments of TOTINCH in stratum 1

Table 2. Moments of TOTINCH in stratum 2

Check the sample sizes (the allocation should be approximately proportional)! The means are very different in the two strata, which shows that this stratification collects similar units within strata and that the strata differ from each other; consequently, stratification helps to increase the precision of estimates for variables correlated with TOTINCH. The standard deviations of the variable are not equal in the two strata, so proportional allocation, though good, is not optimal. The Neyman allocation based on TOTINCH would have been:
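$n_h = n \, \dfrac{N_h S_h}{\sum_{k} N_k S_k},$
where $N_h$ is the number of population units in stratum $h$ and $S_h$ is the standard deviation of TOTINCH inside stratum $h$.
This allocation gives the minimal variance for the estimated total of TOTINCH, and is also good for variables correlated with TOTINCH.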
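A sketch of the Neyman calculation in code follows. The stratum sizes and standard deviations in the DATALINES block are invented placeholders, not StatVillage values; replace them with your own stratum sizes and with the standard deviations from your tables. All data set and variable names are made up for illustration.

```sas
/* one line per stratum: stratum, N_h, S_h (placeholder numbers only) */
data work.neyman_in;
  input str Nh Sh;
  ns = Nh * Sh;                 /* N_h * S_h */
  datalines;
1 100 30000
2 200 15000
;
run;

/* sum of N_h * S_h over the strata */
proc means data=work.neyman_in noprint;
  var ns;
  output out=work.tot sum=ns_total;
run;

/* Neyman allocation n_h = n * N_h * S_h / sum(N_k * S_k) */
data work.neyman;
  if _n_ = 1 then set work.tot(keep=ns_total);
  set work.neyman_in;
  nh_neyman = 97 * ns / ns_total;   /* n = 97; use your own sample size */
run;

proc print data=work.neyman;
run;
```

In practice the true $S_h$ are unknown, so the sample standard deviations are used in their place.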
Insert in your Report a section on regional stratification of StatVillage with three strata. Calculate the proportional allocation for your sample size. Calculate the Neyman allocation for your strata based on your most important study variable. Calculate the estimator of the total and its variance estimator as if your SI-sample were a stratified SI-sample, using your realised sample sizes in the strata and your most important study variable. (What we do here is, in fact, poststratification, except that this variance estimator underestimates the poststratification variance estimator.) Compare the precision of your estimator under SI-sampling and under stratified SI-sampling.
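For reference, the estimators needed for this task can be written in standard notation. The symbols are introduced here and are not from the handout: $H$ is the number of strata, $N_h$ and $n_h$ are the population and sample sizes in stratum $h$, and $\bar{y}_h$ and $s_h^2$ are the sample mean and sample variance of the study variable in stratum $h$.

```latex
% stratified SI-sampling
\[
\hat{t}_{st} = \sum_{h=1}^{H} N_h \bar{y}_h, \qquad
\hat{V}(\hat{t}_{st}) = \sum_{h=1}^{H} N_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^2}{n_h},
\qquad
s_h^2 = \frac{1}{n_h - 1} \sum_{k \in s_h} (y_k - \bar{y}_h)^2 .
\]
% SI-sampling, for comparison
\[
\hat{t}_{SI} = N \bar{y}, \qquad
\hat{V}(\hat{t}_{SI}) = N^2 \left(1 - \frac{n}{N}\right) \frac{s^2}{n} .
\]
```

Comparing the two variance estimates (or the corresponding standard errors) for the same study variable shows the gain from stratification.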
3 Pps sampling, Poisson sample from StatVillage
If the study variable is approximately proportional to some auxiliary variable (size variable) then it is beneficial to sample with probabilities proportional to this auxiliary variable. Estimators based on the study variable will have smaller variances.
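The reason can be seen from the Horvitz-Thompson estimator. The following sketch uses notation that does not appear in the handout: $y_k$ is the study variable, $x_k$ the size variable, $\pi_k$ the inclusion probability, $n$ the (expected) sample size, $n_s$ the realised sample size, and $U$ the population.

```latex
\[
\hat{t}_y = \sum_{k \in s} \frac{y_k}{\pi_k}, \qquad
\pi_k = \frac{n\, x_k}{t_x}, \qquad t_x = \sum_{k \in U} x_k .
\]
% if y_k = c x_k exactly, every sampled unit contributes the same amount
\[
\frac{y_k}{\pi_k} = \frac{c\, t_x}{n}
\quad \Longrightarrow \quad
\hat{t}_y = n_s \, \frac{c\, t_x}{n} = \frac{n_s}{n}\, t_y .
\]
```

Under exact proportionality a fixed-size design ($n_s = n$) would reproduce the total $t_y$ without error. Under a Poisson design the sample size $n_s$ is random, so some variance remains even in this ideal case.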
Remark: When there are many study variables, one usually cannot assume that all of them are proportional to one auxiliary variable. Then pps-sampling is not a good method.
Suppose we want to estimate the income TOTINC very precisely. We believe that income is proportional to the age of the house, BUILTH. We find the expected sampling counts (inclusion probabilities for WOR-designs) proportional to BUILTH. Let the sample size be n = 100 (for the Poisson design this is the expected sample size).
/* inclusion probabilities PR based on size variable BUILTH */
data sampling.pps;
  set sampling.frameaux;
  link = 1;                  /* constant key, used to merge the total back to every record */
run;
proc summary data=sampling.pps nway;
  class link;
  var builth;                /* lists variables for which summary statistics are needed */
  output out=sdata sum=bu;   /* dataset sdata includes variable bu which is the sum of builth */
run;
proc print data=sdata;
run;
data sampling.pps;
  merge sampling.pps sdata(keep=link bu);
  by link;
  pr = 100*builth / bu;      /* n = 100; pr is a probability and must not exceed 1 */
  drop link bu;
run;
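A quick check of the result, as a sketch: the inclusion probabilities should add up to the expected sample size (100 here), and each of them should lie between 0 and 1.

```sas
/* sum of pr should be 100; min and max should lie in [0, 1] */
proc means data=sampling.pps sum min max;
  var pr;
run;
```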
Use Poisson pps-sampling and generate the sample:
* Poisson sample;
data sampling.pps;
  set sampling.pps;
  I = ranbin(0,1,pr);   /* one Bernoulli trial with probability pr; seed 0 uses the system clock */
run;
Look at the file pps in SAS.INSIGHT. Order the file by I; the units marked by 1 are the sampled units. Check whether the inclusion probabilities pr are greater among the sampled units than among the non-sampled ones! Delete the non-sampled units. Reorder by BLOCK and UNIT. What is your realized sample size? Delete the variables Builth, Tenurh, I. Save the file as sampling.pois.
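The same checks and the final file can also be produced in code. This is a sketch of an alternative to the SAS.INSIGHT steps, using only the data sets and variables named above; it assumes that sampling.pps is still in BLOCK and UNIT order.

```sas
/* realized sample size: frequency of I = 1 */
proc freq data=sampling.pps;
  tables I;
run;

/* mean inclusion probability among sampled (I=1) and non-sampled (I=0) units */
proc means data=sampling.pps n mean;
  class I;
  var pr;
run;

/* keep the sampled units only and drop the variables no longer needed */
data sampling.pois;
  set sampling.pps;
  if I = 1;
  drop builth tenurh I;
run;
```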
Extract the data of the sampled units. Open Mini Village in Netscape:
http://www.amstat.org/publications/jse/v5n2/schwarz.supp/
Arrange your SAS-file with the selected units and Mini Village on the same screen. Mark the selected units and extract the data. Go through all the steps until you have a SAS-file with the extracted data.
We continue with this file next time!