3 STRATIFIED SIMPLE RANDOM SAMPLING
• Suppose the population is partitioned into disjoint sets of sampling units called strata.
If a sample is selected within each stratum, then this sampling procedure is known as
stratified sampling.
• If the samples are selected independently across strata, then
(i) the estimator of $t$ or $\bar{y}_U$ can be found by combining stratum sample sums or
means using appropriate weights;
(ii) the variances of estimators associated with the individual strata can be summed
to obtain the variance of an estimator associated with the whole population. (Given
independence, the variance of a sum equals the sum of the individual variances.)
• (ii) implies that only within-stratum variances contribute to the variance of an estimator.
Thus, the basic motivating principle behind using stratification to produce an estimator
with small variance is to partition the population so that units within each stratum are as
similar as possible. This is known as the stratification principle.
• In ecological studies, it is common to stratify a geographical region into subregions that are
similar with respect to a known variable such as elevation, animal habitat type, vegetation
types, etc. because it is suspected that the y-values may vary greatly across strata while
they will tend to be similar within each stratum. Analogously, when sampling people, it
is common to stratify on variables such as gender, age groups, income levels, education
levels, marital status, etc.
• Sometimes strata are formed based on sampling convenience. For example, suppose a
large study region appears to be homogeneous (that is, there are no spatial patterns) and
is stratified based on the geographical proximity of sampling units. Taking a stratified
sample ensures the sample is spread throughout the study region. It may not, however,
lead to any significant reduction in the variance of an estimator.
• However, if the y-values are spatially correlated (y-values tend to be similar for neighboring
units), geographically determined strata can improve estimation of population parameters.
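The points above can be checked by brute force on a very small population. The following is a minimal sketch, not part of the original notes: the y-values, the stratum labels, and the sample sizes are hypothetical, and the weights N_h/N used to combine the stratum sample means are the usual choice for estimating the population mean, assumed here rather than taken from the text. The code enumerates every possible sample under both designs and compares the exact variances of the two estimators.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical toy population (not from the notes): y-values are similar
# within each stratum and very different between strata.
strata = {"low": [1, 2, 3, 4], "high": [11, 12, 13, 14]}
n_h = {"low": 2, "high": 2}  # assumed number of units sampled per stratum

population = [y for ys in strata.values() for y in ys]
N = len(population)
n = sum(n_h.values())


def exact_mean(values):
    """Exact arithmetic mean, returned as a Fraction."""
    values = list(values)
    return Fraction(sum(values), len(values))


def design_variance(estimates):
    """Variance of an estimator over a list of equally likely samples."""
    center = exact_mean(estimates)
    return exact_mean((e - center) ** 2 for e in estimates)


# Unstratified simple random sampling: every subset of size n is equally likely.
srs_estimates = [exact_mean(s) for s in combinations(population, n)]

# Stratified sampling: stratum h contributes (N_h / N) * (its sample mean).
# Independence across strata means any sample from one stratum can be
# paired with any sample from the other.
contributions = {
    h: [Fraction(len(ys), N) * exact_mean(s) for s in combinations(ys, n_h[h])]
    for h, ys in strata.items()
}
strat_estimates = [sum(parts) for parts in product(*contributions.values())]

var_srs = design_variance(srs_estimates)
var_strat = design_variance(strat_estimates)
sum_of_stratum_variances = sum(design_variance(c) for c in contributions.values())

print("population mean:          ", exact_mean(population))
print("mean of SRS estimates:    ", exact_mean(srs_estimates))
print("mean of stratified est.:  ", exact_mean(strat_estimates))
print("variance under SRS:       ", var_srs)
print("variance under stratified:", var_strat)
print("sum of stratum variances: ", sum_of_stratum_variances)
```

In this toy case both estimators average to the population mean, the stratified variance equals the sum of the two stratum-level variances, as in (ii), and it is much smaller than the unstratified variance because all of the between-stratum variation has been removed.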
Notation:
$H$ = the number of strata
$N_h$ = number of population units in stratum $h$, $h = 1, 2, \ldots, H$
$N = \sum_{h=1}^{H} N_h$ = the number of units in the population
$n_h$ = number of sampled units in stratum $h$, $h = 1, 2, \ldots, H$
$n = \sum_{h=1}^{H} n_h$ = the total number of units sampled
$y_{hj}$ = the y-value associated with unit $j$ in stratum $h$
$\bar{y}_h$ = the sample mean for stratum $h$
$t_h = \sum_{j=1}^{N_h} y_{hj}$ = stratum $h$ total
$t = \sum_{h=1}^{H} \sum_{j=1}^{N_h} y_{hj} = \sum_{h=1}^{H} t_h$ = the population total
$\bar{y}_{hU} = \dfrac{t_h}{N_h}$ = stratum $h$ mean
$\bar{y}_U = \dfrac{1}{N} \sum_{h=1}^{H} \sum_{j=1}^{N_h} y_{hj} = \dfrac{t}{N}$ = the population mean
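To make the notation concrete, here is a small sketch that computes each population quantity for a made-up population. The y-values and the variable names (`y`, `N_h`, `t_h`, `ybar_hU`, and so on) are hypothetical and chosen only to mirror the symbols above.

```python
import math

# Hypothetical population: stratum label h -> list of y-values y_hj.
y = {
    1: [3, 5, 4],
    2: [10, 12],
    3: [7, 8, 9, 6],
}

H = len(y)                                      # number of strata
N_h = {h: len(ys) for h, ys in y.items()}       # population units per stratum
N = sum(N_h.values())                           # units in the population

t_h = {h: sum(ys) for h, ys in y.items()}       # stratum totals
t = sum(t_h.values())                           # population total

ybar_hU = {h: t_h[h] / N_h[h] for h in y}       # stratum means
ybar_U = t / N                                  # population mean

# The population mean is also the N_h/N-weighted average of the stratum means.
weighted = sum(N_h[h] / N * ybar_hU[h] for h in y)

print("H =", H, " N_h =", N_h, " N =", N)
print("t_h =", t_h, " t =", t)
print("ybar_hU =", ybar_hU)
print("ybar_U =", ybar_U, " weighted average of stratum means =", weighted)
```

The last line reflects the identity $\bar{y}_U = \sum_{h=1}^{H} (N_h/N)\,\bar{y}_{hU}$, which follows directly from $t = \sum_{h=1}^{H} t_h$ and $t_h = N_h \bar{y}_{hU}$.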