International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 378

ISSN 2229-5518

Generalized Estimators of Population Median using Auxiliary Information

H.S. Jhajj1 and H. K. Bhangu2

Abstract

To estimate the population median, we propose two estimators based on linear transformations that use the known median of an auxiliary variable. Expressions for their biases, their mean squared errors and the minimum values of those mean squared errors are obtained. The proposed estimators are shown to be always more efficient than the ratio estimator of Kuk and Mak (1989) and equally efficient to the other estimators that Kuk and Mak (1989) derived from a different approach. The proposed estimators are also compared with each other with respect to their biases. The results are illustrated by a simulation study.

Keywords: Median estimation, Auxiliary variable, Mean squared errors, Bias, Simple random sampling, Population Median, Sample Median.

1. Introduction

In survey sampling, statisticians have given more attention to the estimation of the population mean, total, variance, etc., but the median is regarded as a more appropriate measure of location than the mean when the distribution of a variable such as income or expenditure is highly skewed. In such situations, it is necessary to estimate the median. Early work by Gross (1980), Sedransk and Meyer (1978) and Smith and Sedransk (1983) considered the problem of estimating the median using the study variable Y alone.

1. Prof. & Head, Department of Statistics, Punjabi University, Patiala-147002, India. Email: drhsjhajj@yahoo.co.in

2. Department of Community Medicine, SGRDIMSAR, Amritsar-143501, India. Email: harpreet3182@gmail.com


Kuk and Mak (1989) were the first to estimate the median of the study variable Y using the values of an auxiliary variable X, highly correlated with Y, for the units in the sample together with its known median $M_X$ for the whole population. The estimation of the median has also been discussed by various authors such as Chambers and Dunstan (1986), Rao et al. (1990), Mak and Kuk (1993), Rueda et al. (2001), Arcos et al. (2005), Garcia and Cebrian (2001), Meeden (1995) and Singh et al. (2007).
Using the known value of the population median $M_X$ of the auxiliary variable X, Kuk and Mak (1989) suggested an estimator of the population median $M_Y$ of the study variable Y under simple random sampling, similar to the ratio estimator of the population mean, as

$$\hat{M}_{YR} = \hat{M}_Y \, \frac{M_X}{\hat{M}_X} \tag{1.1}$$

where $\hat{M}_Y$ and $\hat{M}_X$ are the estimators of $M_Y$ and $M_X$ respectively, based on a simple random sample of size n drawn from the population.
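As a small illustration of (1.1), the ratio-type estimator can be computed directly from the two sample medians. The sketch below is only a reading aid: the function name and the toy data are hypothetical and do not come from the paper.

```python
import statistics

def ratio_median_estimator(y_sample, x_sample, m_x):
    """Ratio-type estimator (1.1): sample median of Y times M_X / sample median of X."""
    m_y_hat = statistics.median(y_sample)
    m_x_hat = statistics.median(x_sample)
    if m_x_hat == 0:
        raise ValueError("sample median of X is zero; the ratio is undefined")
    return m_y_hat * m_x / m_x_hat

# Hypothetical toy sample (not data from the paper); m_x plays the role of the known M_X.
y = [12.0, 15.0, 9.0, 20.0, 14.0]
x = [6.0, 7.0, 5.0, 10.0, 8.0]
estimate = ratio_median_estimator(y, x, m_x=7.5)
```

Here the sample median of X (7.0) lies below the assumed population median (7.5), so the sample median of Y is scaled upwards.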
Let 𝑌𝑖 and 𝑋𝑖 denote the values on the ith unit of the population i = 1, 2, 3, …, N for the study variable Y and auxiliary variable X respectively and corresponding small letters denote the
values in the sample.
Suppose that Y(1), Y(2), …, Y(n) are the values of Y on the sample units in ascending order. Further, let t be an integer such that Y(t) ≤ $M_Y$ ≤ Y(t+1) and let p = t/n be the proportion of Y values in the sample that are less than or equal to the median value $M_Y$, an unknown population parameter. If $\hat{p}$ is a predictor of p, the sample median $\hat{M}_Y$ can be written in terms of quantiles as $\hat{Q}_Y(\hat{p})$, where $\hat{p} = 0.5$.


Kuk and Mak (1989) define a matrix of proportions (Pij, (i,j=1,2 )) of units in the population
as

	X ≤ MX	X > MX	Total
Y ≤ MY	P11	P12	P1.
Y > MY	P21	P22	P2.
Total	P.1	P.2	1

where, for instance, $P_{11}$ denotes the proportion of units in the population with $Y \le M_Y$ and $X \le M_X$. In practice, the $P_{ij}$ are usually unknown but can be estimated by $p_{ij}$ based on a similar cross-classification of the sample. Thus $p_{11}$, for instance, represents the proportion of units in the sample with $Y \le M_Y$ and $X \le M_X$. For estimating the population median $M_Y$ of the study variable Y, Kuk and Mak (1989) also proposed two other estimators, the position estimator $\hat{M}_{YP}$ and the stratification estimator $\hat{M}_{YS}$, derived from a different approach:

$$\hat{M}_{YP} = \hat{Q}_Y(p_1) \tag{1.2}$$

where

$$p_1 = \frac{2}{n}\left\{ n_X\, p_{11} + (n - n_X)\left(\tfrac{1}{2} - p_{11}\right) \right\}$$

and $n_X$ is the number of units in the sample with $X \le M_X$, and

$$\hat{M}_{YS} = \inf\{\, y : \hat{F}_Y(y) > 1/2 \,\} \tag{1.3}$$

where $\hat{F}_Y(y) \cong \tfrac{1}{2}\{\hat{F}_{Y1}(y) + \hat{F}_{Y2}(y)\}$ and, for any value of y, $\hat{F}_{Y1}(y)$ is the proportion among those units in the sample with $X \le M_X$ that have Y values less than or equal to y.


Similarly, $\hat{F}_{Y2}(y)$ is the corresponding proportion among those units in the sample with $X > M_X$.
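The position and stratification estimators in (1.2) and (1.3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Kuk and Mak (1989): the sample quantile is taken as the smallest order statistic whose empirical distribution value reaches p, the value of p11 is passed in by the caller, and all function names and data are hypothetical.

```python
import math

def ecdf(values, y):
    """Proportion of `values` that are <= y."""
    return sum(v <= y for v in values) / len(values)

def position_estimator(y_sample, x_sample, m_x, p11):
    """Sketch of (1.2): sample quantile of Y at level p1."""
    n = len(y_sample)
    n_x = sum(x <= m_x for x in x_sample)          # units with X <= M_X
    p1 = (2.0 / n) * (n_x * p11 + (n - n_x) * (0.5 - p11))
    # Assumed quantile convention: smallest order statistic with ECDF >= p1.
    k = math.ceil(p1 * n - 1e-9)                   # small tolerance for rounding
    k = min(max(k, 1), n)
    return sorted(y_sample)[k - 1]

def stratification_estimator(y_sample, x_sample, m_x):
    """Sketch of (1.3): inf{y : (F_Y1(y) + F_Y2(y)) / 2 > 1/2}."""
    low = [y for y, x in zip(y_sample, x_sample) if x <= m_x]
    high = [y for y, x in zip(y_sample, x_sample) if x > m_x]
    if not low or not high:
        raise ValueError("both groups (X <= M_X and X > M_X) must be non-empty")
    # The averaged ECDF is a step function that jumps only at sample values.
    for y in sorted(y_sample):
        if 0.5 * (ecdf(low, y) + ecdf(high, y)) > 0.5:
            return y
    raise RuntimeError("unreachable: the averaged ECDF equals 1 at the maximum")

# Hypothetical toy sample (not data from the paper).
y = [2.0, 9.0, 4.0, 7.0, 5.0, 11.0]
x = [1.0, 5.0, 2.0, 4.0, 3.0, 6.0]
m_yp = position_estimator(y, x, m_x=3.5, p11=0.5)
m_ys = stratification_estimator(y, x, m_x=3.5)
```

With strict inequality in (1.3), the stratification estimate is the first sample value at which the averaged distribution function exceeds one half.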
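Defining

$$e_0 = \frac{\hat{M}_Y}{M_Y} - 1, \qquad e_1 = \frac{\hat{M}_X}{M_X} - 1$$

such that $E(e_k) \cong 0$ and $|e_k| < 1$ for k = 0, 1, and using the results of Kuk and Mak (1989), we have, up to the first order of approximation and with f = n/N denoting the sampling fraction,

$$E(e_0^2) = (1 - f)(4n)^{-1}\,[M_Y f_Y(M_Y)]^{-2} \tag{1.4}$$

$$E(e_1^2) = (1 - f)(4n)^{-1}\,[M_X f_X(M_X)]^{-2} \tag{1.5}$$

$$E(e_0 e_1) = (1 - f)(4n)^{-1}\,[4P_{11}(X,Y) - 1]\,[M_X M_Y f_X(M_X) f_Y(M_Y)]^{-1} \tag{1.6}$$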
where it is assumed that, as N → ∞, the distribution of the bivariate variable (X, Y) approaches a continuous distribution with marginal densities $f_X(x)$ and $f_Y(y)$ for X and Y respectively. This assumption holds in particular under a superpopulation model framework, treating the values of (X, Y) in the population as a realization of N independent observations from a continuous distribution. We also assume that $f_X(x)$ and $f_Y(y)$ are positive.

2. The Proposed Estimators and their results

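When the median $M_X$ of the auxiliary variable X is known, we propose the following estimators of the population median, obtained by linear transformation under the simple random sampling design:

$$\hat{M}_{H1} = \frac{\hat{M}_Y}{\hat{M}_X}\left[\hat{M}_X + \alpha\left(M_X - \hat{M}_X\right)\right] \tag{2.1}$$

$$\hat{M}_{H2} = \frac{\hat{M}_Y}{M_X}\left[M_X + v\left(\hat{M}'_X - M_X\right)\right] \tag{2.2}$$

where $\hat{M}'_X = \dfrac{N M_X - n \hat{M}_X}{N - n}$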


where 0 ≤ 𝛼 ≤ 1 , 0 ≤ 𝑣 ≤ 1
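The two proposed estimators translate into a few lines of code. The sketch below is illustrative only: the function names, the population size and the toy data are hypothetical, and the constants α and v are simply supplied by the caller.

```python
import statistics

def m_h1(y_sample, x_sample, m_x, alpha):
    """Estimator (2.1): (M_Y_hat / M_X_hat) * [M_X_hat + alpha * (M_X - M_X_hat)]."""
    m_y_hat = statistics.median(y_sample)
    m_x_hat = statistics.median(x_sample)
    return (m_y_hat / m_x_hat) * (m_x_hat + alpha * (m_x - m_x_hat))

def m_h2(y_sample, x_sample, m_x, v, population_size):
    """Estimator (2.2): (M_Y_hat / M_X) * [M_X + v * (M_X_prime - M_X)]."""
    n = len(y_sample)
    m_y_hat = statistics.median(y_sample)
    m_x_hat = statistics.median(x_sample)
    # M_X_prime = (N * M_X - n * M_X_hat) / (N - n)
    m_x_prime = (population_size * m_x - n * m_x_hat) / (population_size - n)
    return (m_y_hat / m_x) * (m_x + v * (m_x_prime - m_x))

# Hypothetical toy sample (not data from the paper).
y = [12.0, 15.0, 9.0, 20.0, 14.0]
x = [6.0, 7.0, 5.0, 10.0, 8.0]
h1_as_median = m_h1(y, x, m_x=7.5, alpha=0.0)   # alpha = 0 gives the sample median
h1_as_ratio = m_h1(y, x, m_x=7.5, alpha=1.0)    # alpha = 1 gives the ratio estimator (1.1)
h2_value = m_h2(y, x, m_x=7.5, v=1.0, population_size=25)
```

The two boundary choices α = 0 and α = 1 recover the sample median and the ratio estimator respectively, which is a convenient sanity check on (2.1).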
We assume that the sample size is large enough that terms in the $e_i$'s of degree higher than two are negligible in the expansions of the estimators $\hat{M}_{H1}$ and $\hat{M}_{H2}$ when obtaining their biases and mean squared errors.
Using results (1.4) - (1.6), the biases and MSEs of $\hat{M}_{H1}$ and $\hat{M}_{H2}$, up to the first order of approximation, are
$$\mathrm{Bias}(\hat{M}_{H1}) = M_Y\,\alpha\,(1 - f)(4n)^{-1}\left[\{M_X f_X(M_X)\}^{-2} - \{4P_{11}(X,Y) - 1\}\{M_X M_Y f_X(M_X) f_Y(M_Y)\}^{-1}\right] \tag{2.3}$$

$$\mathrm{Bias}(\hat{M}_{H2}) = -M_Y\,\frac{nv}{N - n}\,(1 - f)(4n)^{-1}\{4P_{11}(X,Y) - 1\}\{M_X M_Y f_X(M_X) f_Y(M_Y)\}^{-1} \tag{2.4}$$

$$\mathrm{MSE}(\hat{M}_{H1}) = (1 - f)(4n)^{-1}\left[\{f_Y(M_Y)\}^{-2} + \alpha\left(\frac{M_Y}{M_X}\right)^{2}\{f_X(M_X)\}^{-2}(\alpha - 2C)\right] \tag{2.5}$$

where

$$C = \frac{[4P_{11}(X,Y) - 1]\,M_X f_X(M_X)}{M_Y f_Y(M_Y)} \tag{2.6}$$

From (2.5), we note that the MSE of $\hat{M}_{H1}$ decreases as the value of α decreases, provided C ≤ α/2.
Similarly, up to the first order of approximation, we get

$$\mathrm{MSE}(\hat{M}_{H2}) = (1 - f)(4n)^{-1}\left[\{f_Y(M_Y)\}^{-2} + \theta\left(\frac{M_Y}{M_X}\right)^{2}\{f_X(M_X)\}^{-2}\,v\,(\theta v - 2C)\right] \tag{2.7}$$

where $\theta = \dfrac{n}{N - n}$.

From (2.7), we note that the MSE of $\hat{M}_{H2}$ decreases as the value of v decreases, provided C ≤ θv/2.
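The first-order results (2.3) to (2.7) can be traced back to the expansions of the two estimators in terms of $e_0$ and $e_1$. The following is a brief sketch of that step, filled in here as a reading aid under the stated assumption $|e_k| < 1$; it is not quoted from the original derivation.

```latex
\begin{align*}
\hat{M}_{H1} &= M_Y(1+e_0)\bigl[1+\alpha\{(1+e_1)^{-1}-1\}\bigr]
  \approx M_Y\bigl[1+e_0-\alpha e_1+\alpha e_1^{2}-\alpha e_0 e_1\bigr],\\
\hat{M}'_X-M_X &= -\frac{n}{N-n}\bigl(\hat{M}_X-M_X\bigr) = -\theta M_X e_1,\\
\hat{M}_{H2} &= M_Y(1+e_0)\bigl[1-\theta v\,e_1\bigr]
  = M_Y\bigl[1+e_0-\theta v\,e_1-\theta v\,e_0 e_1\bigr].
\end{align*}
```

Taking expectations and using (1.4) to (1.6) gives the biases (2.3) and (2.4). Squaring $\hat{M}_{H1} - M_Y$ and $\hat{M}_{H2} - M_Y$ and keeping terms up to the second degree gives $M_Y^2 E(e_0 - \alpha e_1)^2$ and $M_Y^2 E(e_0 - \theta v e_1)^2$, which reduce to (2.5) and (2.7).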


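The MSE of $\hat{M}_{H1}$ is minimized for

$$\alpha = \frac{[4P_{11}(X,Y) - 1]\,M_X f_X(M_X)}{M_Y f_Y(M_Y)} = C \tag{2.8}$$

and its minimum value is given by

$$\mathrm{MSE}_{min}(\hat{M}_{H1}) = (1 - f)(4n)^{-1}\{f_Y(M_Y)\}^{-2}\left\{1 - (4P_{11}(X,Y) - 1)^{2}\right\} \tag{2.9}$$

The bias of the optimum estimator $\hat{M}_{H1}$ is given by

$$\mathrm{Bias}(\hat{M}_{H1}) = (1 - f)(4n)^{-1}\{4P_{11}(X,Y) - 1\}\left[\{M_X f_X(M_X) f_Y(M_Y)\}^{-1} - \{4P_{11}(X,Y) - 1\}\,M_Y^{-1}\{f_Y(M_Y)\}^{-2}\right] \tag{2.10}$$

Similarly, the MSE of $\hat{M}_{H2}$ is minimized for

$$v = \frac{(N - n)\,[4P_{11}(X,Y) - 1]\,M_X f_X(M_X)}{n\,M_Y f_Y(M_Y)} = \frac{(N - n)}{n}\,C \tag{2.11}$$

and its minimum value is given by

$$\mathrm{MSE}_{min}(\hat{M}_{H2}) = (1 - f)(4n)^{-1}\{f_Y(M_Y)\}^{-2}\left[1 - \{4P_{11}(X,Y) - 1\}^{2}\right] = \mathrm{MSE}_{min}(\hat{M}_{H1}) \tag{2.12}$$

and the bias of the optimum estimator $\hat{M}_{H2}$ is given by

$$\mathrm{Bias}(\hat{M}_{H2}) = -(1 - f)(4n)^{-1}\{4P_{11}(X,Y) - 1\}^{2}\,M_Y^{-1}\{f_Y(M_Y)\}^{-2} \tag{2.13}$$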
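The optimum values in (2.8) and (2.11) and the common minimum in (2.9) and (2.12) can be checked numerically by evaluating the first-order formulas. The sketch below does this for one set of inputs; every numerical input is hypothetical, f is taken as n/N, and the function names are introduced here only for illustration.

```python
import math

def scale(n, N):
    """Common factor (1 - f) / (4n), with f = n / N assumed to be the sampling fraction."""
    return (1.0 - n / N) / (4.0 * n)

def c_constant(p11, m_x, m_y, fx, fy):
    """C from (2.6); fx and fy are the densities at the medians."""
    return (4.0 * p11 - 1.0) * m_x * fx / (m_y * fy)

def mse_h1(alpha, n, N, p11, m_x, m_y, fx, fy):
    """First-order MSE (2.5)."""
    c = c_constant(p11, m_x, m_y, fx, fy)
    return scale(n, N) * (fy ** -2 + alpha * (m_y / m_x) ** 2 * fx ** -2 * (alpha - 2.0 * c))

def mse_h2(v, n, N, p11, m_x, m_y, fx, fy):
    """First-order MSE (2.7), with theta = n / (N - n)."""
    c = c_constant(p11, m_x, m_y, fx, fy)
    theta = n / (N - n)
    return scale(n, N) * (fy ** -2 + theta * (m_y / m_x) ** 2 * fx ** -2 * v * (theta * v - 2.0 * c))

def mse_min(n, N, p11, fy):
    """Minimum MSE (2.9) = (2.12)."""
    return scale(n, N) * fy ** -2 * (1.0 - (4.0 * p11 - 1.0) ** 2)

# Hypothetical inputs chosen only to exercise the formulas.
params = dict(n=60, N=100, p11=0.35, m_x=3.0, m_y=4.0, fx=0.40, fy=0.13)
c = c_constant(0.35, 3.0, 4.0, 0.40, 0.13)
theta = 60 / (100 - 60)
at_alpha_opt = mse_h1(c, **params)          # alpha = C, as in (2.8)
at_v_opt = mse_h2(c / theta, **params)      # v = (N - n) C / n, as in (2.11)
floor = mse_min(60, 100, 0.35, 0.13)
```

Both `at_alpha_opt` and `at_v_opt` agree with `floor` up to rounding, which mirrors the equality stated in (2.12).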
Using expressions (2.10) and (2.13), we have

$$\frac{|\mathrm{Bias}(\hat{M}_{H1})|}{|\mathrm{Bias}(\hat{M}_{H2})|} = \left|\frac{M_Y f_Y(M_Y)}{[4P_{11}(X,Y) - 1]\,M_X f_X(M_X)} - 1\right| \tag{2.14}$$


Expression (2.14) shows that the bias of $\hat{M}_{H2}$ is smaller than the bias of $\hat{M}_{H1}$ if

$$\rho_c < \frac{1}{2}\,\frac{M_Y f_Y(M_Y)}{M_X f_X(M_X)} \tag{2.15}$$

where $\rho_c = 4P_{11}(X,Y) - 1$, the correlation coefficient between the variables X and Y, goes from -1 to 1 as $P_{11}(X,Y)$ increases from 0 to 1/2, which implies that $\rho_c$ is negative for $P_{11}(X,Y) \in [0, 1/4)$ and positive for $P_{11}(X,Y) \in (1/4, 1/2]$.

Note: The value of C has been observed to remain fairly stable in repeated surveys. The value of C may therefore often be approximately known from previous data, past experience, a pilot survey or otherwise, and information about the range of possible values of C may be available in practical situations.

Using knowledge of C together with the known value of the population median $M_X$ of the auxiliary variable X, we can construct from (2.1) and (2.2) efficient estimators of the population median $M_Y$ of the study variable Y.

3. Comparison

To compare the proposed estimators with $\hat{M}_{YR}$ given by Kuk and Mak (1989) and the usual sample median $\hat{M}_Y$, we first write the MSEs of the estimators $\hat{M}_{YR}$ and $\hat{M}_Y$ of the population median, up to the first order of approximation, as

$$\mathrm{MSE}(\hat{M}_{YR}) = (1 - f)(4n)^{-1}\left[\{f_Y(M_Y)\}^{-2} + \left(\frac{M_Y}{M_X}\right)^{2}\{f_X(M_X)\}^{-2} - 2\{4P_{11}(X,Y) - 1\}\left(\frac{M_Y}{M_X}\right)\{f_X(M_X) f_Y(M_Y)\}^{-1}\right] \tag{3.1}$$


$$\mathrm{MSE}(\hat{M}_Y) = (1 - f)(4n)^{-1}\{f_Y(M_Y)\}^{-2} \tag{3.2}$$

Using (2.9) and (3.1), we have

$$\mathrm{MSE}(\hat{M}_{YR}) - \mathrm{MSE}_{min}(\hat{M}_{H1}) = (1 - f)(4n)^{-1}\left[\frac{M_Y}{M_X}\{f_X(M_X)\}^{-1} - \{f_Y(M_Y)\}^{-1}\{4P_{11}(X,Y) - 1\}\right]^{2} \ge 0 \tag{3.3}$$

which is always true. Similarly, using (2.9) and (3.2), we have

$$\mathrm{MSE}(\hat{M}_Y) - \mathrm{MSE}_{min}(\hat{M}_{H1}) = (1 - f)(4n)^{-1}\{f_Y(M_Y)\}^{-2}\{4P_{11}(X,Y) - 1\}^{2} \ge 0 \tag{3.4}$$

which is always true.
From (3.3) and (3.4), we note that the optimum estimator $\hat{M}_{H1}$ is always more efficient than the estimator $\hat{M}_{YR}$ defined by Kuk and Mak (1989) and the usual sample median $\hat{M}_Y$, but it is equally efficient to the other two estimators proposed by Kuk and Mak (1989).

4. Numerical Illustration

To get a rough idea of the efficiencies of the proposed estimators relative to the existing ones, a simulation study was carried out in R, in which we drew 1,000,000 repeated samples from a bivariate normal population for different values of the correlation coefficient and different sample sizes, with medians $M_Y = 4$, $M_X = 3$, means $\mu_X = 4$, $\mu_Y = 3$ and standard deviations $\sigma_Y = 3$, $\sigma_X = 1$. The numerical results are given in Tables 4.1 and 4.2.
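A much smaller version of such a simulation can be sketched as follows. It is not the authors' R code: it is written in Python, uses 4,000 replications, and assumes a finite population of 2,000 units with samples of size 25 drawn without replacement. The location parameters are taken as 3 for X and 4 for Y (for a normal distribution the mean and median coincide), `rho` is the ordinary correlation of the normal model, and the constant α is a plug-in value of C computed from two standard normal-model facts that are not stated in the paper: $4P_{11} - 1 = (2/\pi)\arcsin\rho$, and the density at the median equals $1/(\sigma\sqrt{2\pi})$.

```python
import math
import random
import statistics

def simulate(rho, n, pop_size=2000, reps=4000, seed=1):
    """Empirical MSEs of the sample median, ratio estimator (1.1) and H1 (2.1)."""
    rng = random.Random(seed)
    mu_x, mu_y, sd_x, sd_y = 3.0, 4.0, 1.0, 3.0   # assumed reading of the stated setup
    pop = []
    for _ in range(pop_size):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mu_x + sd_x * z1
        y = mu_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pop.append((x, y))
    m_x = statistics.median(x for x, _ in pop)    # known population median of X
    m_y = statistics.median(y for _, y in pop)    # target population median of Y

    # Plug-in value of C under the normal model (assumption, see the text above).
    rho_c = (2.0 / math.pi) * math.asin(rho)
    fx = 1.0 / (sd_x * math.sqrt(2.0 * math.pi))
    fy = 1.0 / (sd_y * math.sqrt(2.0 * math.pi))
    alpha = rho_c * m_x * fx / (m_y * fy)

    sq_err = {"median": 0.0, "ratio": 0.0, "H1": 0.0}
    for _ in range(reps):
        sample = rng.sample(pop, n)               # simple random sample without replacement
        mx_hat = statistics.median(x for x, _ in sample)
        my_hat = statistics.median(y for _, y in sample)
        estimates = {
            "median": my_hat,
            "ratio": my_hat * m_x / mx_hat,
            "H1": (my_hat / mx_hat) * (mx_hat + alpha * (m_x - mx_hat)),
        }
        for name, value in estimates.items():
            sq_err[name] += (value - m_y) ** 2
    return {name: total / reps for name, total in sq_err.items()}

mse = simulate(rho=0.5, n=25)
```

With these settings the plug-in α is about 0.75, inside the stated range for α. The absolute MSE values depend on the assumed population size, sample size and seed, so they are not comparable with the entries of Table 4.2.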


Table 4.1 Biases of different estimators

Correlation coefficient ($\rho_c$)	Sample size (n)	Bias($\hat{M}_{H1}$)	Bias($\hat{M}_{H2}$)	Bias($\hat{M}_Y$)	Bias($\hat{M}_{YR}$)	Bias($\hat{M}_{YP}$)	Bias($\hat{M}_{YS}$)
0.3	3	0.049	0.045	1.89	0.154	0.151	0.152
0.3	5	0.032	0.029	1.15	0.094	0.093	0.093
0.3	7	0.022	0.019	0.84	0.066	0.058	0.057
0.3	9	0.018	0.015	0.67	0.053	0.049	0.049
0.5	3	0.055	0.051	1.89	0.097	0.088	0.087
0.5	5	0.037	0.032	1.15	0.062	0.058	0.057
0.5	7	0.026	0.023	0.84	0.044	0.036	0.034
0.5	9	0.022	0.018	0.67	0.037	0.030	0.030
0.7	3	0.025	0.022	1.89	0.029	0.028	0.027
0.7	5	0.021	0.019	1.15	0.026	0.026	0.026
0.7	7	0.017	0.013	0.84	0.021	0.019	0.017
0.7	9	0.015	0.012	0.67	0.017	0.016	0.014
0.9	3	0.058	0.046	1.89	0.059	0.812	0.811
0.9	5	0.036	0.01	1.15	0.029	0.486	0.484
0.9	7	0.021	0.011	0.84	0.017	0.355	0.358
0.9	9	0.014	0.005	0.67	0.011	0.273	0.269


Table 4.2 Comparison of efficiencies of estimators

Correlation coefficient ($\rho_c$)	Sample size (n)	MSE($\hat{M}_Y$)	MSE($\hat{M}_{YR}$)	MSE($\hat{M}_{H1}$)	MSE($\hat{M}_{H2}$)	Rel. eff. ($\hat{M}_Y$)	Rel. eff. ($\hat{M}_{YR}$)	Rel. eff. ($\hat{M}_{H1}$)
0.3	3	1.89	3.516	1.88	1.88	100	53.75	100.5
0.3	5	1.15	2.124	1.13	1.13	100	54.14	101.8
0.3	7	0.84	1.491	0.82	0.82	100	56.33	102.4
0.3	9	0.67	1.122	0.65	0.65	100	59.82	103.1
0.5	3	1.89	2.114	1.71	1.71	100	89.40	110.5
0.5	5	1.15	1.258	1.01	1.01	100	91.41	113.8
0.5	7	0.84	0.901	0.72	0.72	100	93.22	116.7
0.5	9	0.67	0.698	0.56	0.56	100	95.98	119.6
0.7	3	1.89	1.457	1.39	1.39	100	129.7	136.0
0.7	5	1.15	0.881	0.82	0.82	100	130.5	140.2
0.7	7	0.84	0.639	0.59	0.59	100	131.4	142.4
0.7	9	0.67	0.505	0.46	0.46	100	132.7	145.7
0.9	3	1.89	0.835	0.81	0.81	100	226.4	233.3
0.9	5	1.15	0.501	0.48	0.48	100	229.5	239.6
0.9	7	0.84	0.365	0.35	0.35	100	230.1	240.0
0.9	9	0.67	0.287	0.27	0.27	100	233.4	248.2
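The relative-efficiency columns of Table 4.2 are consistent with the percentage ratio of the MSE of the sample median to the MSE of the estimator in question. That reading is inferred from the tabulated numbers rather than stated in the text, and the helper name below is hypothetical; the sketch reproduces the first row.

```python
def relative_efficiency(mse_reference, mse_estimator):
    """Relative efficiency in percent: 100 * MSE(reference) / MSE(estimator)."""
    return 100.0 * mse_reference / mse_estimator

# First row of Table 4.2 (rho_c = 0.3, n = 3).
re_ratio = relative_efficiency(1.89, 3.516)   # ratio estimator vs sample median
re_h1 = relative_efficiency(1.89, 1.88)       # proposed estimator vs sample median
```

Values above 100 indicate an estimator with a smaller MSE than the sample median.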

From Table 4.1, we see that the proposed estimators generally have lower bias than the three estimators proposed by Kuk and Mak (1989) and the usual sample median; the exceptions are at $\rho_c = 0.9$ with n = 5, 7 and 9, where the ratio estimator $\hat{M}_{YR}$ has a smaller bias than $\hat{M}_{H1}$. The bias of the estimator $\hat{M}_{H2}$ is also lower than that of the estimator $\hat{M}_{H1}$. From Table 4.2, it is clear that the proposed estimators always have higher efficiency than the ratio estimator defined by Kuk and Mak (1989) and the usual sample median.

5. Conclusion

The theoretical study shows that the proposed estimators are always more efficient than the ratio estimator defined by Kuk and Mak (1989) as well as the usual sample median in all situations. It has also been shown that the estimators $\hat{M}_{H1}$ and $\hat{M}_{H2}$ are equally efficient, but the bias of $\hat{M}_{H2}$, taken up to the first order of approximation, is smaller than that of $\hat{M}_{H1}$ when condition (2.15) holds. The biases of the proposed estimators are less than those of the estimators proposed by Kuk and Mak (1989). It is also shown that efficient estimators can be constructed by choosing the values of α and v in the proposed estimators corresponding to given values of C or a range of C. The simulation results in Table 4.2 also show that the proposed estimators are always more efficient than the ratio estimator defined by Kuk and Mak (1989) as well as the usual sample median, and Table 4.1 shows that the bias of the proposed estimators is generally less than that of the estimators proposed by Kuk and Mak (1989) and the usual sample median.

References

[1] A. Arcos, M. Rueda, M.D. Martinez, S. Gonzalez and Y. Roman, "Incorporating the auxiliary information available in variance estimation", Applied Mathematics and Computation, vol. 160, pp. 387-399, 2005.
[2] R.L. Chambers and R. Dunstan, "Estimating distribution functions from survey data", Biometrika, vol. 73, pp. 597-604, 1986.
[3] S.T. Gross, "Median estimation in sample surveys", Proc. Surv. Res. Meth. Sect. Amer. Statist. Ass., pp. 181-184, 1980.
[4] A.Y.C. Kuk and T.K. Mak, "Median estimation in the presence of auxiliary information", J.R. Statist. Soc. B, vol. 51, no. 2, pp. 261-269, 1989.
[5] T.K. Mak and A.Y.C. Kuk, "A new method for estimating finite population quantiles using auxiliary information", The Canadian Journal of Statistics, vol. 21, pp. 29-38, 1993.
[6] G. Meeden, "Median estimation using auxiliary information", Survey Methodology, vol. 21, pp. 71-77, 1995.
[7] J.N.K. Rao, J.G. Kovar and H.J. Mantel, "On estimating distribution functions and quantiles from survey data using auxiliary information", Biometrika, vol. 77, pp. 365-375, 1990.
[8] M. Rueda and A. Arcos, "On estimating the median from survey data using multiple auxiliary information", Metrika, vol. 54, pp. 59-76, 2001.
[9] J. Sedransk and J. Meyer, "Confidence intervals for the quantiles of a finite population: simple random sampling and stratified simple random sampling", J.R. Statist. Soc. B, vol. 40, pp. 239-252, 1978.
[10] B.W. Silverman, Density Estimation for Statistics and Data Analysis. London: Chapman and Hall, 1986.
[11] S. Singh, H.P. Singh and L.N. Upadhyaya, "Chain ratio and regression type estimators for median estimation in survey sampling", Statistical Papers, vol. 48, no. 1, pp. 23-46, 2007.
[12] P. Smith and J. Sedransk, "Lower bounds for confidence coefficients for confidence intervals for finite population quantiles", Commun. Statist. Theor. Meth., vol. 12, pp. 1329-1344, 1983.


IJSER © 2013 http://www.ijser.org