Padilla, Alberto
Working Paper
An unbiased estimator of the variance of simple
random sampling using mixed random-systematic
sampling
Working Papers, No. 2009-13
Provided in Cooperation with:
Bank of Mexico, Mexico City
Suggested Citation: Padilla, Alberto (2009) : An unbiased estimator of the variance of simple random
sampling using mixed random-systematic sampling, Working Papers, No. 2009-13, Banco de
México, Ciudad de México
This Version is available at:
https://hdl.handle.net/10419/83776
Banco de México
Documentos de Investigación
Banco de México
Working Papers
N° 2009-13
An Unbiased Estimator of the Variance of Simple
Random Sampling Using Mixed Random-Systematic
Sampling
Alberto Padilla
Banco de México
November, 2009
The Working Papers series of Banco de México disseminates preliminary results of economic
research conducted at Banco de México in order to promote the exchange and debate of ideas. The
views and conclusions presented in the Working Papers are exclusively the responsibility of the
authors and do not necessarily reflect those of Banco de México.
Abstract
Systematic sampling is a commonly used technique due to its simplicity and ease of implementation.
The drawback of this simplicity is that it is not possible to estimate the design
variance without bias. There are several ways to circumvent this problem. One method is
to suppose that the variable of interest has a random order in the population, so the sample
variance of simple random sampling without replacement is used. By means of a mixed
random-systematic sample, an unbiased estimator of the population variance for simple
random sampling is proposed without model assumptions. Some examples are given.
Keywords: Variance estimator; Systematic sampling; Simple random sampling; Random
order.
JEL Classification: C80, C83.
* The author would like to thank Ignacio Méndez from IIMAS-UNAM, research seminar participants at
Banco de México and two reviewers from Banco de México for their useful comments and suggestions.
Dirección General de Emisión. Email: [email protected]
1. Introduction
Systematic sampling is a commonly used technique due to its simplicity and
operational convenience. The main disadvantage is the non-existence of a design
unbiased variance estimate of the sample mean with a single systematic sample. Several
approaches have been proposed to overcome this difficulty. One of them treats the
systematic sample as if it were drawn from a population in random order, so the
formula of the variance estimator of the mean under simple random sampling without
replacement, hereinafter srswor, applies, Cochran (1986). In another approach, a model
is used for the variable of interest and, consequently, a specific formula for the
estimator of the model variance has to be obtained. From the design perspective of a
survey, one can also apply a random permutation to the elements of the population
before the sample is drawn. With this method the variance estimator $\hat{v}_{srswor}(\hat{\bar{y}})$ is used,
although this procedure is not feasible in many surveys. Another class of methods
supplements the systematic sample with another systematic sample or a simple random
sample. For a thorough discussion of these strategies see Wolter (1985) or Chaudhuri &
Stenger (2005). In one of these methods a simple random sample is selected first, and in
the remaining population a systematic sample is extracted, Leu & Tsui (1996) and
Huang (2004). Other systematic sampling methods, called ‘Markov sampling’, have
been proposed, see Sampath & Uthayakumaran (1998) and the references cited therein.
Unfortunately, these methods cannot be applied to a population containing a large
number of elements, and the population size has to be a multiple of the sample size. In
Sampath & Uthayakumaran (1998), for example, the sample size must be even. These
are very stringent conditions in large surveys, and the methods have not been used extensively in
applied work. The methods mentioned above and their merits have been examined in
detail in the literature and shall not be reviewed here.
A mixed random-systematic sampling method is proposed in which the population
mean and the variance of the mean, under srswor, are unbiasedly estimated by the sample
mean and a simple expression for the variance (see footnote 1). This last expression can be used
without assuming that the sample was drawn from a population in random order or that a
random permutation was applied to the population before the sample was
extracted. This keeps practitioners from falling into PISE, an acronym coined by Valliant (2007),
which stands for ‘pretend it’s something else’. It is worth mentioning that, compared to
systematic sampling and similar methods, no gain in efficiency is expected with the
proposed method, since it coincides with the population mean and variance of a srswor.
A fair comparison of the proposed method is with the estimator of the variance between
elements used under the random order approach in systematic sampling.
The article is organized as follows. Definitions, notation and a brief overview of
finite population sampling are given in Section 2. Standard practices regarding the
estimation of the design variance under systematic sampling are reviewed in Section 3.
In that section, expressions for the bias and relative bias of the estimator of the variance
between elements under the random order approach are given. To the best of the author's
knowledge, these expressions have not appeared previously in the literature. Section 4
contains the sampling procedure and an example. The estimators for the population
mean and the variance $v_{srswor}(\hat{\bar{y}})$ are presented in Section 5. Finally, the method is illustrated
with numerical examples in Section 6.
2. Finite population sampling
There are two types of surveys: descriptive and analytical. The former refers to the
estimation of quantities such as totals, means, proportions and ratios, while the latter refers to
the use of models based on the results of a survey. The formulas developed in this paper
are of the descriptive type.
1 This is an extended version of an article presented by the author in Puebla, Padilla (2009).
In this article it is assumed that all variability stems from sampling error, so any
errors caused by faulty measurement, non-response and other nonsampling sources are
ignored. It is also supposed that the design is noninformative. An informative design is
one in which the probability of selection of the elements in the sample depends
explicitly on the values of the study variables. Noninformativeness is an
assumption made in almost all practical survey work, although it is not usually mentioned in books or
articles.
It is also assumed that a frame exists from which a sample will be drawn.
2.1 Notation, population and sample
Let U denote a finite population of N elements labeled k=1,…,N, 1<N. It is
customary to represent the finite population by its label k as: U={1,2,…,k,…,N}.
Moreover, there is a one to one correspondence between the labels of U and the labels
of the frame.
The variable under study will be represented by $y$, and $y_k$ will be the value of $y$
for the kth population element, $k \in U$.
The sample will be denoted by s, a subset of U of size n, 1<n<N, and will be
represented by a column vector $I = (I_1,\ldots,I_k,\ldots,I_N) \in \{0,1\}^N$. In this case, $I_k$ is an
indicator random variable and it is equal to 1 if the kth element is in the sample and 0
otherwise. It is worth mentioning that this indicator variable is the random element in
finite population sampling and $y_k$ is a number. So, the density function induced by the
design is discrete. This approach is also known in the literature as design-based
sampling.
2.2 Estimation
The objective is to estimate a function t that depends on the $y_k$,
$t = t(y_1,\ldots,y_k,\ldots,y_N)$. For example, a total is written as $y_U = \sum_{k=1}^{N} y_k$. Since we are
interested in estimating a total, from the design-based approach, it is customary to use
the Horvitz-Thompson estimator, HTE, Horvitz & Thompson (1952). This estimator has
the following expression:
$$\hat{y}_U = \sum_{k=1}^{N} I_k \frac{y_k}{\pi_k} = \sum_{k=1}^{n} \frac{y_k}{\pi_k}, \quad \text{with } \pi_k > 0.$$
In this formula, $\pi_k = P(I_k = 1)$ is the first-order inclusion probability. For variance computation and
estimation it is also necessary to determine the second-order inclusion probabilities,
$\pi_{kl} = P(I_k I_l = 1)$.
The variance of a HTE is
$$v(\hat{y}_U) = \sum\sum_{U} c(I_k, I_l)\,\check{y}_k \check{y}_l = \sum\sum_{U} (\pi_{kl} - \pi_k \pi_l)\,\check{y}_k \check{y}_l, \quad \check{y}_k = y_k/\pi_k.$$
An unbiased estimator of this variance is, provided that $\pi_{kl} > 0$:
$$\hat{v}(\hat{y}_U) = \sum\sum_{s} \hat{c}(I_k, I_l)\,\check{y}_k \check{y}_l = \sum\sum_{s} \frac{\pi_{kl} - \pi_k \pi_l}{\pi_{kl}} \frac{y_k}{\pi_k} \frac{y_l}{\pi_l}.$$
In these expressions, $c(I_k, I_l)$ and $\hat{c}(I_k, I_l)$ denote the population and estimated
covariances, respectively, between the sample indicator variables.
Remark 2.2.1: It is worth mentioning that in finite population sampling the first two
moments are well defined for designs used in practice, so this fact is not restated
in the rest of the article.
Remark 2.2.2: Estimation in finite populations can also be carried out under a different
approach, known in the literature as the model-based approach, in which it is supposed that the
finite population is drawn from an infinite population (superpopulation), see Valliant et
al. (2000). The design-based and model-based methods can be used together in what is
called combined sampling, see Brewer (2002).
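The Horvitz-Thompson formulas of Section 2.2 can be illustrated with a minimal Python sketch. It assumes a srswor design, for which $\pi_k = n/N$ and $\pi_{kl} = n(n-1)/(N(N-1))$ for $k \neq l$; the function names, the sample values and the population size N=20 are hypothetical and chosen only for illustration. Under this assumption the general variance estimator should reduce to the familiar closed form $N^2(1-n/N)s^2/n$ for a total.

```python
import math
from itertools import product
from statistics import variance

def ht_total(sample_y, pi):
    """Horvitz-Thompson estimate of a total with a common inclusion probability pi."""
    return sum(y / pi for y in sample_y)

def ht_variance_estimate(sample_y, pi, pi_joint):
    """Double sum over the sample of (pi_kl - pi_k*pi_l)/pi_kl * (y_k/pi_k) * (y_l/pi_l).

    For k == l the joint probability pi_kk equals pi_k.
    """
    n = len(sample_y)
    total = 0.0
    for k, l in product(range(n), repeat=2):
        pi_kl = pi if k == l else pi_joint
        total += (pi_kl - pi * pi) / pi_kl * (sample_y[k] / pi) * (sample_y[l] / pi)
    return total

# Hypothetical srswor sample of n = 4 values from a population of N = 20.
N = 20
sample_y = [3.0, 7.0, 8.0, 12.0]
n = len(sample_y)
pi = n / N                               # first-order inclusion probability
pi_joint = n * (n - 1) / (N * (N - 1))   # second-order inclusion probability

total_hat = ht_total(sample_y, pi)
v_hat = ht_variance_estimate(sample_y, pi, pi_joint)
closed_form = N ** 2 * (1 - n / N) * variance(sample_y) / n
print(total_hat, v_hat, closed_form)
```

The double loop is quadratic in the sample size, which is acceptable for a sketch; for equal-probability designs the closed form is what one would use in practice.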
3. Standard practices in systematic sampling
As mentioned in the introduction, there is no design-unbiased
estimator of the variance of the sample mean with a single systematic sample, so in
practice the following strategies, among others, are used.
3.1 During the design stage of a survey
D1) Supplement the systematic sample with another sample.
D2) Apply a random permutation to the elements of the population before the
sample is extracted, so under all possible permutations of the population, the
expectation of the design variance is the same as the variance under srswor. This
result is due to Madow & Madow (1944).
Remark 3.1.1: A comparison of the efficiency of some designs of the D1 type, can
be found in Zinger (1980), Cochran (1986) and Wolter (1985).
3.2 Model for the structure of the variable of interest
Postulate a model for the structure of the variable under study before extracting the
systematic sample and construct the variance estimator under this model. In this case,
two models are routinely employed:
$M_{sc}$) Serial correlation: in some settings, there is evidence of similarities between
neighboring elements in the population with respect to the variable of interest, and
this similarity diminishes as two elements are farther apart from each other.
$M_{ro}$) Random order model in an infinite population: the finite population is
considered as a random sample from an infinite population (superpopulation). If the
variates $y_i$, $i = 1,\ldots,N$, are drawn from a superpopulation in which $E_M(y_i) = \mu$,
$E_M(y_i - \mu)^2 = \sigma_i^2$ and $E_M((y_i - \mu)(y_j - \mu)) = 0$, $i \neq j$, it is known as a population in
random order. In these expressions, $E_M$ refers to expectation under the assumed
model. The result of this is, see Cochran (1986), that
$$E_M(v_{sys}(\hat{\bar{y}})) = E_M(v_{srswor}(\hat{\bar{y}})),$$
where sys refers to systematic sampling. Under this model, it is assumed that there
is no relationship between the variable under study and the order of the elements in
the frame, so one treats a systematic sample from a list, sorted in a specific order, as
if the list were randomly ordered.
Remark 3.2.1: A comparison of the efficiency of models $M_{sc}$ and $M_{ro}$ can be found
in Wolter (1985) and Chaudhuri & Stenger (2005).
3.3 Bias of the random order approach ($M_{ro}$)
Under the $M_{ro}$ approach, the estimator of the variance of the mean under simple
random sampling, $\hat{v}_{srswor}(\hat{\bar{y}}) = (1 - n/N)\,\hat{s}^2_{sys}/n$, is used. In this expression, $\hat{s}^2_{sys}$
stands for the variance between elements of the systematic sample. This is a reasonable
strategy whenever there is information about the random order of the elements in the
population. The problem is that it is easy to fall into PISE and work with a biased
estimator of the variance, or to routinely apply the simple random sampling estimator without
having enough information about the ordering of the elements in the population. To
assess this approach, the bias and relative bias of the variance
estimator $\hat{s}^2_{sys}$ are obtained in the following theorem. Suppose that $k = N/n$ is an integer and let
$$\hat{s}^2_{sys,i} = \sum_{j=1}^{n} (y_{ij} - \hat{\bar{y}}_i)^2/(n-1), \quad \hat{\bar{y}}_i = \sum_{j=1}^{n} y_{ij}/n,$$
$$S^2_U = \sum_{j=1}^{N} (y_j - \bar{y}_U)^2/(N-1), \quad \bar{y}_U = \sum_{j=1}^{N} y_j/N,$$
and
$$\rho = \frac{2\sum_{l=1}^{k}\sum_{i=1}^{n-1}\sum_{j>i}^{n} (y_{li} - \bar{y}_U)(y_{lj} - \bar{y}_U)}{(n-1)(N-1)S^2_U},$$
where $\rho$ is the intraclass correlation coefficient, Cochran (1986).
Theorem 1: Under systematic sampling the expected value of the estimator $\hat{s}^2_{sys,i}$ is
$$E(\hat{s}^2_{sys,i}) = \frac{N-1}{N}(1-\rho)\,S^2_U.$$
Corollary 1.1: The relative bias of the estimator $\hat{s}^2_{sys,i}$ is
$$\frac{N-1}{N}(1-\rho) - 1.$$
Corollary 1.2: $E(\hat{s}^2_{sys,i})$ is a linear decreasing function of ρ, which achieves its
maximum at $\rho = -1/(n-1)$ and its minimum at $\rho = 1$, and $E(\hat{s}^2_{sys,i}) = S^2_U$ whenever
$\rho = -1/(N-1)$. The maximum and minimum values of $E(\hat{s}^2_{sys,i})$ are
$S^2_U \frac{n}{n-1}\frac{N-1}{N}$ and zero, respectively.
Corollary 1.3: The expected value and relative bias of the estimator $\hat{s}^2_{sys,i}$ can also be
expressed as $(1-\delta)S^2_U$ and $-\delta$, where $\delta$ is the measure of homogeneity proposed by
Särndal et al. (1992).
Proof: see the Appendix.
Remark 3.3.1: It can be seen from Corollary 1.2 that $E(\hat{s}^2_{sys,i})$ overestimates $S^2_U$
for $\rho \in [-1/(n-1), -1/(N-1))$.
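As a numerical illustration of Theorem 1, the following Python sketch computes the average within-sample variance over all k systematic samples, the intraclass correlation from the pairwise cross-product definition of Section 3.3, and the value $\frac{N-1}{N}(1-\rho)S^2_U$. The function name `theorem1_check` is hypothetical. The two orderings used here are assumptions of this sketch: a linear trend in the values 1,…,100, and an ordering in which each systematic sample collects ten consecutive values.

```python
from itertools import combinations
from math import isclose
from statistics import mean, variance

def theorem1_check(samples):
    """samples: the k possible systematic samples, each a list of n values.

    Returns (average of the k sample variances, rho, (N-1)/N * (1-rho) * S2_U).
    """
    n = len(samples[0])
    population = [y for s in samples for y in s]
    N = len(population)
    y_bar = mean(population)
    S2_U = variance(population)          # divisor N - 1
    expected_s2 = mean(variance(s) for s in samples)
    # Cross-products of deviations over all pairs within each systematic sample.
    cross = sum((a - y_bar) * (b - y_bar)
                for s in samples for a, b in combinations(s, 2))
    rho = 2 * cross / ((n - 1) * (N - 1) * S2_U)
    predicted = (N - 1) / N * (1 - rho) * S2_U
    return expected_s2, rho, predicted

# Linear trend: sample i holds the values i, i+10, ..., i+90.
samples_trend = [list(range(i, 101, 10)) for i in range(1, 11)]
# Homogeneous samples: sample i holds ten consecutive values.
samples_blocks = [list(range(10 * i + 1, 10 * i + 11)) for i in range(10)]

e_trend, rho_trend, p_trend = theorem1_check(samples_trend)
e_blocks, rho_blocks, p_blocks = theorem1_check(samples_blocks)
print(e_trend, rho_trend, p_trend)
print(e_blocks, rho_blocks, p_blocks)
```

In both cases the first and third returned values should agree up to floating-point error, which is what the theorem states.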
4. Design
4.1 Definition of mixed random-systematic sampling
Following the design-based approach, we consider a population U with N elements,
$y_k$, $k = 1,\ldots,N$. From this population a sample of size n, 1<n<N, is drawn by means of a
mixed random-systematic sample, mrss. That is, a srswor of size 1 is first selected
from the elements of U and then m elements, $m \geq 2$, are drawn from the N-1 remaining
elements of U using circular systematic sampling, Murthy & Rao (1988), so that n = m+1. For brevity,
this method shall be denoted by mrss(1,m). The number of samples under this design is
$N(N-1)$.
Remark 4.1.1: When $(N-1)/m$ is an integer, circular and linear systematic sampling
coincide, Murthy & Rao (1988), so the systematic sample can also be extracted by the
latter method. In this case there are repeated circular systematic samples; nonetheless,
the point estimators of the mean and element variance, which are built in the next
section, continue to be unbiased after suppressing the repeated samples.
Remark 4.1.2: The number of samples under a mrss(1,m) design, after eliminating
repeated systematic samples, is $N(N-1)/m$ if $(N-1)/m$ is an integer and $N(N-1)$
otherwise. For further details see Murthy & Rao (1988).
4.2 Circular systematic sampling
In order to obtain a circular systematic sample, css, of size m, 1<m<M, from a
population with M elements (here M = N-1), one proceeds as follows:
Step 1: compute $k_m = (N-1)/m$; if $k_m$ is not an integer, round it to the nearest integer,
Step 2: select a random integer between 1 and M, say r; this is the first element in the
css,
Step 3: determine the next numbers in the css, $r + jk_m$, for $j \in \{1,\ldots,m-1\}$. If
$r + jk_m > M$, consider the list as circular and assign the numbers until the sample size is
achieved.
Remark 4.2.1: This procedure can be easily implemented in a spreadsheet or in the R
system.
Example 1: let U be a population of size N=7 and suppose a sample of size n=3 is to
be drawn using a mrss(1,2). In this case m=2 and there are 7(7-1)=42 samples. The
indices for the possible samples are:
Table 1
(1,2,5)  (2,1,5)  (3,1,5)  (4,1,5)  (5,1,4)  (6,1,4)  (7,1,4)
(1,3,6)  (2,3,6)  (3,2,6)  (4,2,6)  (5,2,6)  (6,2,5)  (7,2,5)
(1,4,7)  (2,4,7)  (3,4,7)  (4,3,7)  (5,3,7)  (6,3,7)  (7,3,6)
(1,5,2)  (2,5,1)  (3,5,1)  (4,5,1)  (5,4,1)  (6,4,1)  (7,4,1)
(1,6,3)  (2,6,3)  (3,6,2)  (4,6,2)  (5,6,2)  (6,5,2)  (7,5,2)
(1,7,4)  (2,7,4)  (3,7,4)  (4,7,3)  (5,7,3)  (6,7,3)  (7,6,3)
The first number in each entry refers to the srswor selection and the following two
correspond to the systematic sample.
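The selection steps of Section 4.2 and the enumeration in Example 1 can be sketched in Python as follows. The function names are hypothetical, positions in the remaining list are 0-based internally, and the interval is rounded with Python's built-in `round`, which rounds ties to the nearest even integer; that tie rule is an assumption of this sketch and not something the text specifies.

```python
import random

def circular_systematic_sample(labels, m, start):
    """Circular systematic sample of size m from `labels`, beginning at the
    0-based position `start` and wrapping around the end of the list."""
    M = len(labels)
    k = round(M / m)                      # sampling interval k_m
    return [labels[(start + j * k) % M] for j in range(m)]

def enumerate_mrss(N, m):
    """All N*(N-1) mrss(1, m) samples over labels 1..N, as tuples whose first
    entry is the srswor unit and whose remaining entries are the css units."""
    samples = []
    for r in range(1, N + 1):
        remaining = [u for u in range(1, N + 1) if u != r]
        for start in range(len(remaining)):
            samples.append((r, *circular_systematic_sample(remaining, m, start)))
    return samples

def draw_mrss(N, m, rng=random):
    """Draw one mrss(1, m) sample at random."""
    r = rng.randint(1, N)                 # srswor of size 1
    remaining = [u for u in range(1, N + 1) if u != r]
    start = rng.randrange(len(remaining)) # random start for the css
    return (r, *circular_systematic_sample(remaining, m, start))

all_samples = enumerate_mrss(7, 2)
print(len(all_samples))
print(all_samples[:6])
print(draw_mrss(7, 2))
```

With N=7 and m=2 the enumeration should list the same 42 index triples as Table 1, column by column.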
5. Point estimators
As noted by Huang (2004), in mixed random-systematic sampling the HTE
$\hat{\bar{y}} = \frac{1}{N}\sum_{k=1}^{n} y_k/\pi_k$, $\pi_k > 0$, can be used to estimate the population mean, provided that
N is known. To compute this estimator, we only need to determine the first-order
inclusion probabilities.
Theorem 2: Under mrss(1,m), the first-order inclusion probabilities, $\pi_k$, are equal to
$n/N$, for all $k = 1,\ldots,N$.
Proof: see the Appendix.
Corollary: For an mrss(1,m) design, the HTE is the usual sample mean.
Proof: it follows immediately by substituting $\pi_k = n/N$ in the expression of the HTE of
the mean.
Remark 5.1: The mrss(1,m) estimator of the mean can also be written as a weighted
sum, $\hat{\bar{y}}_{r,s} = \frac{1}{n} y_r + \frac{m}{n}\hat{\bar{y}}_s$. The first term of the sum refers to the
value of $y$ obtained by srswor, while the second one involves the sample mean of the
systematic sample. This is also known as a Zinger estimator, Ruiz-Espejo (1997).
Remark 5.2: The mrss(1,m) estimator of the mean is unbiased because it is a HTE.
The most important result of this article is expressed in the next theorem.
Theorem 3: Under mrss(1,m), an unbiased estimator of the population variance
between elements, $S^2_U = \sum_{k=1}^{N} (y_k - \bar{y}_U)^2/(N-1)$, is:
$$\hat{s}^2_{r,s} = \frac{\sum_{k=1}^{m} (y_r - y_{s,k})^2}{2m},$$
where $y_r$ is the value of the variable selected by srswor, $y_{s,k}$ are the values of the
elements selected by the circular systematic sample and $\bar{y}_U$ is the population mean.
Proof: see the Appendix.
Corollary: Under mrss(1,m), an unbiased estimator of the variance of the mean under
srswor, $v_{srswor}(\hat{\bar{y}})$, is given by the following expression,
$$\hat{v}_{srswor}(\hat{\bar{y}}_{r,s}) = \frac{(1 - n/N)}{n}\,\hat{s}^2_{r,s}.$$
Proof: immediate from the property of expectations, $E(cX) = cE(X)$, where
$c = (1 - n/N)/n$.
Remark 5.3: There is no assumption about random order in the population and there
is no need to apply a permutation before the sample is drawn. To put this
briefly, the mrss(1,m) design provides a simple expression for variance estimation
without pretending it is something else, Valliant (2007).
Remark 5.4: In the expression $\hat{v}_{srswor}(\hat{\bar{y}}_{r,s})$ one can use a sample size m to estimate
it.
Remark 5.5: Zinger (1980) proposed an unbiased estimator of the variance between
elements using partially systematic sampling, in which one first selects a systematic
sample and then a srswor from the remaining population. Unfortunately, the formula
proposed by Zinger is quite complex.
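Theorems 2 and 3 can be checked by brute force on a small population. The Python sketch below enumerates every mrss(1,2) sample for N=7, counts how often each unit is included, and averages the estimator $\hat{s}^2_{r,s}$ over all samples. The y values are hypothetical, the function names are illustrative, and the sketch assumes all N(N-1) samples are equally likely and that the interval is an integer.

```python
from collections import Counter
from math import isclose
from statistics import mean, variance

def enumerate_mrss(N, m):
    """All N*(N-1) mrss(1, m) samples as (srswor unit, tuple of css units)."""
    samples = []
    for r in range(1, N + 1):
        remaining = [u for u in range(1, N + 1) if u != r]
        M = len(remaining)
        k = round(M / m)                  # sampling interval
        for start in range(M):
            css = tuple(remaining[(start + j * k) % M] for j in range(m))
            samples.append((r, css))
    return samples

def s2_rs(y_r, y_s):
    """Estimator of Theorem 3: sum of (y_r - y_{s,k})^2 divided by 2m."""
    return sum((y_r - v) ** 2 for v in y_s) / (2 * len(y_s))

# Hypothetical study variable for a population of N = 7 units.
y = {1: 12.0, 2: 7.0, 3: 3.0, 4: 15.0, 5: 9.0, 6: 4.0, 7: 20.0}
N, m = 7, 2
n = m + 1

samples = enumerate_mrss(N, m)
counts = Counter(u for r, css in samples for u in (r, *css))
inclusion = {u: counts[u] / len(samples) for u in y}   # empirical pi_k

mean_s2 = mean(s2_rs(y[r], [y[u] for u in css]) for r, css in samples)
S2_U = variance(list(y.values()))                       # divisor N - 1
print(inclusion)
print(mean_s2, S2_U)
```

If the theorems hold, every empirical inclusion probability should equal n/N = 3/7 and the average of the estimator should match $S^2_U$ up to rounding error.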
6. Numerical example
Example 2: let U be the population of Example 3.4.2, pages 80-82, Särndal et al.
(1992). This population has N=100 elements and the variable y takes the values 1,
2,…,100. Using systematic sampling with n=10 there are N/n=10 samples, and the
population mean $\bar{y}_U$ and variance between elements $S^2_U$ are 50.5 and 841.67,
respectively. As simple random sampling does not take into account the ordering of the
population, the variance of the mean estimator under this design is
$v_{srswor}(\hat{\bar{y}}) = (1 - n/N)\,S^2_U/n = 75.75$. In Tables 2 to 5 there are four orderings of the same
population which have different values of the intraclass correlation coefficient. For each
ordering and for all samples under systematic sampling, we present the values of the
sample mean, $\hat{\bar{y}}_{sys}$, the estimator of the variance between elements, $\hat{s}^2_{sys}$, and the
estimator of the variance of the sample mean under the random order assumption,
$\hat{v}_{ro}(\hat{\bar{y}}_{sys})$. Under the random order assumption, the estimators $\hat{s}^2_{sys}$ and $\hat{v}_{ro}$ for every systematic
sample were computed using the following expressions:
$$\hat{s}^2_{sys,i} = \sum_{k=1}^{10} (y_k - \hat{\bar{y}}_{sys,i})^2/(10-1)$$
and
$$\hat{v}_{ro}(\hat{\bar{y}}_{sys,j}) = (1 - 10/100)\,\hat{s}^2_{sys,j}/10.$$
The labels s-1, s-2,…,s-10 correspond to the results of sample 1 to sample 10. The last column has the
expected values of the sample means and variances, $E(\hat{\bar{y}}_{sys})$, $E(\hat{s}^2_{sys})$ and $E(\hat{v}_{ro})$,
respectively.
Table 2
Population A: perfect linear trend in the values $y_k$, roh = -0.10.
                         s-1     s-2     s-3     s-4     s-5     s-6     s-7     s-8     s-9     s-10    Expected
$\hat{\bar{y}}_{sys}$    46.0    47.0    48.0    49.0    50.0    51.0    52.0    53.0    54.0    55.0    50.5
$\hat{s}^2_{sys}$        916.7   916.7   916.7   916.7   916.7   916.7   916.7   916.7   916.7   916.7   916.7
$\hat{v}_{ro}$           82.5    82.5    82.5    82.5    82.5    82.5    82.5    82.5    82.5    82.5    82.5
Table 3
Population B: a minimal variance ordering for systematic sampling, roh = -0.11.
                         s-1     s-2     s-3     s-4     s-5     s-6     s-7     s-8     s-9     s-10    Expected
$\hat{\bar{y}}_{sys}$    50.5    50.5    50.5    50.5    50.0    50.5    50.5    50.5    50.5    50.5    50.5
$\hat{s}^2_{sys}$        989.2   969.2   951.4   935.8   922.5   911.4   902.5   895.8   891.4   889.2   925.8
$\hat{v}_{ro}$           89.0    87.2    85.6    84.2    83.0    82.0    81.2    80.6    80.2    80.0    83.3
Table 4
Population C: a large positive roh value, roh = 0.989.
                         s-1     s-2     s-3     s-4     s-5     s-6     s-7     s-8     s-9     s-10    Expected
$\hat{\bar{y}}_{sys}$    5.5     15.5    25.5    35.5    45.5    55.5    65.5    75.5    85.5    95.5    50.5
$\hat{s}^2_{sys}$        9.2     9.2     9.2     9.2     9.2     9.2     9.2     9.2     9.2     9.2     9.2
$\hat{v}_{ro}$           0.83    0.83    0.83    0.83    0.83    0.83    0.83    0.83    0.83    0.83    0.83
Table 5
Population D: a random ordering, roh = -0.015.
                         s-1     s-2     s-3     s-4     s-5     s-6     s-7     s-8     s-9     s-10    Expected
$\hat{\bar{y}}_{sys}$    44.3    34.8    40.7    61.2    48.8    59.5    47.6    58.7    58.4    51.0    50.5
$\hat{s}^2_{sys}$        720.9   420.0   1014.7  948.2   494.4   948.7   1222.5  522.7   780.5   1388.4  846.1
$\hat{v}_{ro}$           64.9    37.8    91.3    85.3    44.5    85.4    110.0   47.0    70.2    125.0   76.1
In order to compare the strategy of estimating the variance
between elements assuming random ordering of the population in systematic sampling
with mixed random-systematic sampling, a mrss(1,9) was used for populations A to D.
In this case, there are 100(100-1)=9,900 possible samples under mixed random-systematic
sampling. For each population, the 9,900 samples were generated and the
coefficient of variation of the estimator of the variance between elements, $\hat{s}^2_{r,s}$, was computed to assess
the performance of the estimator of the variance.
Table 6
                                                                 A        B        C        D
Population mean $\bar{y}_U$                                      50.5     50.5     50.5     50.5
$S^2_U$                                                          841.7    841.7    841.7    841.7
$v_{srswor}(\hat{\bar{y}})$                                      75.75    75.75    75.75    75.75
Systematic sampling:
Intraclass correlation                                           -0.10    -0.11    0.989    -0.015
Random order estimator $\hat{s}^2_{sys}$                         916.7    925.8    9.2      846.1
Relative bias ($\hat{s}^2_{sys}$)                                8.9%     10.0%    98.9%    0.5%
Variance estimator $\hat{v}_{ro}(\hat{\bar{y}}_{sys})$           82.5     83.3     0.83     76.1
Coefficient of variation ($\hat{\bar{y}}_{sys}$)                 6.0%     0%       60.0%    17.7%
Coefficient of variation ($\hat{s}^2_{sys}$)                     0%       3.7%     0%       37.7%
Mixed random-systematic sampling:
Variance estimator $\hat{v}_{srswor}(\hat{\bar{y}}_{r,s})$       75.75    75.75    75.75    75.75
Coefficient of variation ($\hat{\bar{y}}_{r,s}$)                 7.7%     7.7%     7.7%     21.3%
Coefficient of variation ($\hat{s}^2_{r,s}$)                     46.0%    46.3%    46.6%    60.9%
In Table 6, the letters at the top of each column correspond to the populations from
Tables 2 to 5. Comparing the variance estimators $\hat{v}_{ro}(\hat{\bar{y}}_{sys})$, $\hat{v}_{srswor}(\hat{\bar{y}}_{r,s})$ and the
coefficients of variation of the estimators of the population mean and variance between
elements for both designs, we can see that the estimators under the random order
assumption used in systematic sampling behave erratically and depend heavily on the
order of the population. Mixed random-systematic sampling performs well for
populations A through C; nevertheless, for population D the sampling distributions of
$\hat{\bar{y}}_{r,s}$ and $\hat{s}^2_{r,s}$ have more variation than their counterparts in systematic sampling. This is
due to the presence of influential observations in the distribution of $\hat{s}^2_{r,s}$.
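The row "Variance estimator" of the mixed random-systematic block in Table 6 can be reproduced, at least for orderings that are easy to write down, with the short Python sketch below. It enumerates all 9,900 mrss(1,9) samples and averages $\hat{v}_{srswor}(\hat{\bar{y}}_{r,s})$. The two frame orderings are assumptions of this sketch: `pop_a` is the linear trend 1,…,100 and `pop_c` is an ordering in which every tenth position collects ten consecutive values; the function name is hypothetical.

```python
from math import isclose
from statistics import mean

def mrss_variance_estimates(y, m):
    """Variance estimates (1 - n/N)/n * s2_{r,s} for every mrss(1, m) sample.

    y holds the population values in frame order; n = m + 1.
    """
    N = len(y)
    n = m + 1
    M = N - 1
    k = round(M / m)                          # css interval in the remaining list
    estimates = []
    for r in range(N):
        remaining = y[:r] + y[r + 1:]         # frame without the srswor unit
        for start in range(M):
            ss = sum((y[r] - remaining[(start + j * k) % M]) ** 2
                     for j in range(m))
            s2 = ss / (2 * m)                 # estimator of Theorem 3
            estimates.append((1 - n / N) / n * s2)
    return estimates

pop_a = list(range(1, 101))                                  # linear trend
pop_c = [10 * (p % 10) + p // 10 + 1 for p in range(100)]    # homogeneous samples

est_a = mrss_variance_estimates(pop_a, 9)
est_c = mrss_variance_estimates(pop_c, 9)
print(len(est_a), mean(est_a), mean(est_c))
```

Because the estimator is design unbiased, the average should not depend on the ordering of the frame; only the spread of the individual estimates changes between orderings.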
7. Summary
By means of a mixed random-systematic sample, an unbiased estimator of the population
variance for simple random sampling without replacement has been proposed. It was shown
that there is no need to suppose random ordering of the population or to apply a
permutation before a systematic sample is drawn in order to use the proposed estimator of
the population variance between elements. It was also shown that the bias and relative bias
of the estimator of the variance between elements under systematic sampling with the
assumption of random ordering of the population depend on the intraclass correlation
coefficient.
Appendix
Proof of Theorem 1:
Suppose that $N = nk$, $1 < n < N$, and k and n are integers.
Note that the variation between elements in the population can be decomposed as:
$$\sum_{i=1}^{N} (y_i - \bar{y}_U)^2 = \sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \hat{\bar{y}}_i)^2 + n\sum_{i=1}^{k} (\hat{\bar{y}}_i - \bar{y}_U)^2.$$
This is the decomposition of the total variation into the variation within systematic samples
and the variation between systematic samples, as is done in the standard one-way analysis
of variance, and can be expressed as:
$$SST = SSW + SSB.$$
Here, SS represents sums of squares; T, total; W, within; and B, between. The proof consists
in computing the expectation of the sample variance between elements of the systematic
sample, $\hat{s}^2_{sys,i} = \sum_{j=1}^{n} (y_{ij} - \hat{\bar{y}}_i)^2/(n-1)$:
$$E(\hat{s}^2_{sys,i}) = \frac{1}{k}\sum_{i=1}^{k} \hat{s}^2_{sys,i} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n} y_{ij}^2 - n\sum_{i=1}^{k} \hat{\bar{y}}_i^2}{k(n-1)} = \frac{\sum_{i=1}^{N} y_i^2 - n\sum_{i=1}^{k} \hat{\bar{y}}_i^2}{k(n-1)}.$$
Adding and subtracting $N\bar{y}_U^2 = kn\bar{y}_U^2$ in the last expression and noting that $k(n-1) = N-k$,
$$E(\hat{s}^2_{sys,i}) = \frac{N-1}{N-k}S^2_U - \frac{n}{N-k}\sum_{i=1}^{k} (\hat{\bar{y}}_i - \bar{y}_U)^2 = \frac{N-1}{N-k}S^2_U - \frac{1}{N-k}(SST - SSW).$$
Recalling that $SST = (N-1)S^2_U$, we have
$$E(\hat{s}^2_{sys,i}) = \frac{SSW}{N-k}.$$
This is the intra-sample variance proposed by Särndal et al. (1992, p. 79). These authors also
showed that $\rho = 1 - \frac{n}{n-1}\frac{SSW}{SST}$. Solving this equation for SSW, substituting into
$E(\hat{s}^2_{sys,i})$ and using the fact that $N = kn$, the result follows.
Proof of Corollary 1.1:
It follows immediately by simplifying $\frac{E(\hat{s}^2_{sys,i}) - S^2_U}{S^2_U}$, provided that $S^2_U > 0$.
Proof of Corollary 1.2:
Recall that in the design-based approach, N and $S^2_U$ are constants, so the expression
$E(\hat{s}^2_{sys,i})$ is linear in ρ.
As has been shown elsewhere, see for example Kish (1965), the minimum value of ρ is
$-1/(n-1)$ and the maximum is 1. Substitute these values in $E(\hat{s}^2_{sys,i})$ to obtain the maximum
and minimum values. On the other hand, solving $E(\hat{s}^2_{sys,i}) - S^2_U = 0$ for ρ implies that
$\rho = -1/(N-1)$.
Proof of Corollary 1.3:
Särndal et al. (1992, p. 79) showed that $\delta = 1 - \frac{N-1}{N-k}\frac{SSW}{SST}$. Solving this equation for SSW
we have that $SSW = (N-k)(1-\delta)S^2_U$. Substituting this expression into the formula for the
intra-sample variance, the result follows from the expected value $E(\hat{s}^2_{sys,i})$.
The formula for the relative bias in terms of the measure of homogeneity is obtained by
computing $\frac{E(\hat{s}^2_{sys,i}) - S^2_U}{S^2_U}$ in terms of $\delta$.
Proof of Theorem 2:
Case 1: $(N-1)/m$ is an integer.
The first element in the sample is selected with probability $1/N$, and an element is included
in the circular systematic sample with probability $\frac{N-1}{N}\cdot\frac{m}{N-1}$. The factor
$(N-1)/N$ corresponds to those elements of the population not selected in the srswor of size
1, and $m/(N-1)$ is the probability of inclusion of an element under css, see Murthy & Rao
(1988). It follows that for $k = 1,\ldots,N$,
$$\pi_k = \frac{1}{N} + \frac{N-1}{N}\cdot\frac{m}{N-1} = \frac{1}{N} + \frac{n-1}{N} = \frac{n}{N}.$$
Case 2: $(N-1)/m$ is not an integer.
The proof is the same, since the first-order inclusion probability of an element under css is
$m/(N-1)$ and the result follows.
Proof of Theorem 3:
Case 1: N-1 even and eliminating duplicated systematic samples.
Let ns denote the number of possible samples under an mrss(1,m) design.
$$E(\hat{s}^2_{r,s}) = \frac{m}{N(N-1)}\sum_{j=1}^{ns}\frac{\sum_{k=1}^{m} (y_{r,j} - y_{s,k,j})^2}{2m}.$$
Note that for every random selection between 1 and N, say k, there are (N-1)/m systematic
samples and all elements of population U, except the k-th element, appear once (for
brevity, these (N-1)/m possible samples will be called a kth-block). After
some algebra, a kth-block has the following form:
$$\frac{(N-1)y_k^2 + y_1^2 + \ldots + y_{k-1}^2 + y_{k+1}^2 + \ldots + y_N^2 - 2y_k\sum_{i=1}^{k-1} y_i - 2y_k\sum_{i=k+1}^{N} y_i}{2m}.$$
The sum of the kth-blocks from 1 to N is equal to:
$$\frac{1}{2m}\Big[(N-1)(y_1^2 + \ldots + y_N^2) + (N-1)(y_1^2 + \ldots + y_N^2) - 4\sum_{i=1}^{N-1}\sum_{j>i}^{N} y_i y_j\Big].$$
We substitute this value in the expectation of the sample element variance:
$$E(\hat{s}^2_{r,s}) = \frac{(N-1)\sum_{k=1}^{N} y_k^2 - 2\sum_{i=1}^{N-1}\sum_{j>i}^{N} y_i y_j}{N(N-1)}.$$
Using the identity $\big(\sum_{k=1}^{N} y_k\big)^2 = \sum_{k=1}^{N} y_k^2 + 2\sum_{i=1}^{N-1}\sum_{j>i}^{N} y_i y_j$, the last expression turns out to
be:
$$E(\hat{s}^2_{r,s}) = \frac{(N-1)\sum_{k=1}^{N} y_k^2 + \sum_{k=1}^{N} y_k^2 - \big(\sum_{k=1}^{N} y_k\big)^2}{N(N-1)} = \frac{\sum_{k=1}^{N} y_k^2 - N\bar{y}_U^2}{N-1} = S^2_U,$$
which completes the proof.
Case 2: N-1 odd.
Note that for every random selection between 1 and N, say k, there are (N-1) systematic
samples and all elements of population U, except element k, appear m times (for
brevity, these (N-1) possible samples will be called a kth-block). After some
algebra, a kth-block has the following form:
$$\frac{(N-1)m\,y_k^2 + m(y_1^2 + \ldots + y_{k-1}^2 + y_{k+1}^2 + \ldots + y_N^2) - 2m\,y_k\sum_{i=1}^{k-1} y_i - 2m\,y_k\sum_{i=k+1}^{N} y_i}{2m}.$$
The sum of the kth-blocks from 1 to N is equal to:
$$\frac{1}{2m}\Big[(N-1)m(y_1^2 + \ldots + y_N^2) + (N-1)m(y_1^2 + \ldots + y_N^2) - 4m\sum_{i=1}^{N-1}\sum_{j>i}^{N} y_i y_j\Big].$$
Using the same identity for the square of a sum as in the previous case and replacing this
value in the expectation of the sample element variance, in which each of the N(N-1) samples
has probability $1/(N(N-1))$, the result follows.
Case 3: N-1 even and without eliminating duplicated systematic samples.
Same proof as case 2.
References
Brewer, K. (2002) Combined Survey Sampling Inference: Weighing of Basu's elephant,
London: Arnold.
Chaudhuri, A. & Stenger, H. (2005) Survey Sampling: theory and methods, 2nd ed.,
Chapman & Hall/CRC.
Cochran, W. (1986) Técnicas de Muestreo, Ed. CECSA, México.
Horvitz, D.G. & Thompson, D.J. (1952) A generalization of sampling without replacement
from a finite universe, Journal of the American Statistical Association, Vol. 47, No. 260,
pp. 663-685.
Huang, K. (2004) Mixed random systematic sampling designs, Metrika, 59, pp. 1-11.
Kish, L. (1965) Survey sampling, John Wiley & Sons, New York.
Leu, C. & Tsui, K. (1996) New partially systematic sampling, Statistica Sinica, 6, pp. 617-
630.
Madow, W. G. & Madow, L. H. (1944) On the theory of systematic sampling, I, Annals of
Mathematical Statistics, 15, pp. 1-24.
Murthy, M.N. & Rao, T.J. (1988) Systematic Sampling, Chapter 7 in Handbook of Statistics
6: Sampling, ed. by C.R. Rao, Amsterdam: North Holland.
Padilla, Terán, A. M. “Un estimador insesgado de la varianza del muestreo aleatorio
simple usando un diseño mixto aleatorio-sistemático”. Memorias electrónicas en extenso de
la 2ª Semana Internacional de la Estadística y la Probabilidad, Puebla de Zaragoza, Puebla,
México. Julio 2009, CD ISBN: 978-607-487-035-0.
Ruiz-Espejo, M. (1997) Uniqueness of the Zinger strategy with estimable variance: Rana-
Singh estimator, Sankhya, Volume 59, Series B, pp. 76-83.
Sampath, S. & Uthayakumaran, N. (1998) Markov systematic sampling, Biometrical
Journal, Vol. 40, Issue 7, pp. 883-895.
Särndal, C.E., Swensson, B. & Wretman, J.H. (1992) Model Assisted Survey Sampling,
Springer-Verlag, New York.
Tillé, Y. (2006) Sampling Algorithms, Springer-Verlag, New York.
Valliant, R., An Overview of the Pros and Cons of Linearization versus Replication in
Establishment Surveys, 2007 International Conference on Establishment Surveys, CD-
ROM, Alexandria, VA: American Statistical Association: 929-940.
Valliant, R., Dorfman, A. and Royall, R. (2000) Finite Population Sampling and Inference:
a prediction approach, John Wiley and Sons, New York.
Wolter, K.M. (1985) Introduction to Variance Estimation, Springer-Verlag, New York.
Zinger, A. (1980) Variance estimation in partially systematic sampling, Journal of the
American Statistical Association, Vol. 75, No. 369, pp. 206-211.