International Journal of Scientific & Engineering Research, Volume 6, Issue 2, February-2015 13

ISSN 2229-5518

A Study on Retrieval Models and Query Expansion using PRF

Rekha Vaidyanathan, Sujoy Das, Namita Srivastava

Abstract— This article compares state-of-the-art retrieval models and reports how query expansion enhances retrieval effectiveness. Five state-of-the-art retrieval models (parametric and non-parametric) and three query expansion (QE) techniques, Bo1, Bo2 and KL, are selected and presented. A comparative study of the retrieval models, namely TF_IDF, DLH, DPH, I(n)L2 and PL2, enhanced with these QE models is carried out experimentally on the FIRE 2011 Adhoc data. This is an initial study carried out to understand how the performance of these approaches varies across multiple languages (English and Hindi). Furthermore, we explore the optimal parameter settings for the parametric models in the case of Short, Normal and Long queries. Results show that I(n)L2 performed well for the Hindi dataset, and BM25 and PL2 gave the best MAP for the English dataset. We use Terrier, the Information Retrieval framework, for indexing, retrieval and evaluation. The models used for comparison are Terrier’s DFR-based weighting models.

Index Terms—pseudo relevance feedback, retrieval models, query expansion, FIRE, Terrier


1 INTRODUCTION

The purpose of retrieval models is to retrieve and rank the set of relevant documents based on the user’s query. Boolean models, statistical models like the Vector Space Model, Probabilistic models and Language models have been developed for Information Retrieval. The Vector Space and probabilistic retrieval models give significantly better results in terms of precision and recall than the exact match or Boolean retrieval models [1].
From the naïve method of classifying documents as matching or otherwise, the next generation of retrieval models introduced term weighting schemes in which documents are ranked by their degree of relevance. Each document is given a score based on the words it contains pertaining to a given topic and is ranked accordingly. These term weighting schemes can be parametric or non-parametric. Almost all term weighting models use the term frequency (tf), the number of times a term t appears in a document d, as the basis for calculating the score [2]. tf.idf is the most commonly used term weighting scheme, where the inverse document frequency (idf), introduced by Karen Sparck Jones, computes the term specificity. The formula for idf is given by,

$$ idf_t = \log\left(\frac{N}{df_t}\right) \qquad (1) $$

• N is the total number of documents in Collection C,
• df_t is the document frequency for term t,
• tf combined with idf gives high weights to rare terms and low weights to more frequent ones.
The statistical models usually use different weighting models for ranking the documents. These term weighting methods on documents are based on the query input by the user. Most of the time, the collection that is to be searched contains both relevant and irrelevant information. IR faces the two-sided problem of searchers not being able to frame the most suitable query for their information need, and of lacking information about the collection used for retrieval [3]. This led to the development of the concept called Relevance Feedback, where [4]:
1. User submits a query
2. System retrieves an initial set of results
3. User marks this set as relevant or irrelevant
4. Based on the user’s feedback, the system retrieves a better set of results.
Manually skimming through the initial set of documents and marking them as relevant or irrelevant is a tedious task. Pseudo Relevance Feedback (PRF), or blind relevance feedback, automates this marking and assumes that the top k ranked documents of the initially retrieved results are relevant. Terms related to the search query are selected from these documents to improve the query representation with the help of query expansion [5]. The process of adding more significant and contextually similar words to the original query is called query expansion. Most often, queries contain terms that do not match the indexed terms, leading to lower retrieval accuracy [15]. This problem is addressed by relevance feedback, an automatic process of query reformulation, where important words are chosen from previously retrieved documents that are relevant to the query [16]. Thus the basic idea behind query expansion is to augment the query with related terms like synonyms, plurals, modifiers, category keywords etc. to improve the retrieval accuracy [6]. Many techniques have been proposed by researchers for query expansion. For our study, we select three QE models, namely Bo1, Bo2 and KL. A comparative study on the retrieval effectiveness of state-of-the-art retrieval models on two different languages is introduced in this paper. Also, we investigate the effectiveness of applying query expansion to improve the retrieval accuracy. The study is mainly done to understand which baseline works most effectively on multiple languages of the FIRE Adhoc 2011 test collection. We also try to understand the effectiveness of the optimal QE model and how it improves the retrieval accuracy. Terrier™ is used as the Information Retrieval framework for all our experiments [17].
The paper is organized as follows: the weighting models selected for retrieval and QE in our experiments are discussed in detail in Sections 2 and 3. Section 4 showcases the experimental results of the different retrieval models and the effect of query expansion on them. Section 5 contains concluding remarks.
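As a small illustration of the tf.idf weighting and the idf formula in equation (1) of this introduction, the following sketch scores a toy collection. The function names, the tokenised toy documents and the choice to give unseen terms a weight of zero are assumptions made for the example; they are not taken from Terrier or from the experiments reported here.

```python
import math
from collections import Counter


def idf(term, docs):
    """idf_t = log(N / df_t), following equation (1); docs is a list of token lists."""
    n_docs = len(docs)
    df = sum(1 for doc in docs if term in doc)
    if df == 0:
        return 0.0  # assumption: a term absent from the collection contributes nothing
    return math.log(n_docs / df)


def tf_idf_scores(query, docs):
    """Score each document by summing tf * idf over the query terms."""
    scores = []
    for doc in docs:
        counts = Counter(doc)  # within-document term frequencies
        scores.append(sum(counts[t] * idf(t, docs) for t in query))
    return scores


# Hypothetical toy collection, for illustration only.
docs = [
    ["swine", "flu", "vaccine", "india"],
    ["flu", "season", "news"],
    ["cricket", "match", "india"],
]
scores = tf_idf_scores(["swine", "vaccine"], docs)
```

Here only the first document contains the rare query terms, so it is the only one with a non-zero score, which is the behaviour the bullet on rare versus frequent terms describes.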


2 WEIGHTING MODELS

Several weighting models have been proposed by researchers in IR. In this paper we discuss a few parametric and non-parametric models. The parametric models involve tuning a hyper-parameter for length normalization. Since each query behaves differently, the optimal parameter setting is different for each of them. Through our experiments, we find the optimal values for the normalization parameters by selecting the one that gives the highest MAP. The models discussed are the parametric models BM25, I(n)L2 and PL2 and the parameter-free models DPH and DLH. These models are based on Terrier’s Divergence from Randomness (DFR) models [7].

2.1 Parametric Models

2.1.1 OKAPI’s BM25

BM25 is one of the best known term-weighting schemes derived from the probabilistic model. BM25 is a family of scoring functions, and BM stands for Best Match. It takes into account three components, namely the term frequency, the inverse document frequency and the length of the document [8]. In this method, each document D is scored against a query q by the formula:

$$ w_t = tf_d \cdot \frac{A}{B} \qquad (2) $$

where,

$$ A = \log\frac{N - n + 0.5}{n + 0.5} \qquad (3) $$

$$ B = k_1 \left( (1 - b) + b \cdot \frac{dl}{avdl} \right) + tf_d \qquad (4) $$

• w_t is the relevance weight assigned to a document due to query term t,
• tf_d is the number of times t occurs in the document,
• N is the total number of documents, n is the number of documents containing at least one occurrence of t, dl is the length of the document and avdl is the average document length,
• k_1 is the term-frequency influence parameter, 1.0 ≤ k_1 ≤ 2.0,
• b is the normalization parameter for document length, 0.0 ≤ b ≤ 1.0. b can be set to zero if the document length need not be considered.

2.1.2 I(n)L2

An Inverse document Frequency model with Laplace after-effect and normalization 2. The scoring function is given by:

$$ w_t = \frac{1}{tfn + 1} \cdot \left( tfn \cdot \log_2 \frac{N + 1}{N_t + 0.5} \right) \qquad (5) $$

where tfn is the normalized term frequency given by the formula:

$$ tfn = tf \cdot \log_2 \left( 1 + c \cdot \frac{avg\_l}{l} \right) \qquad (6) $$

• c is the term frequency normalization parameter,
• l is the document length, which corresponds to the number of tokens in a document, and
• avg_l is the average document length in the collection.

2.1.3 PL2

A Poisson model with Laplace after-effect and normalization 2. PL2 is one of the Divergence from Randomness weighting models [9]. The scoring function of PL2 is given by:

$$ w_t = \frac{1}{tfn + 1} \cdot (A + B + C) \qquad (7) $$

where,

$$ A = tfn \cdot \log_2 \frac{tfn}{\lambda} \qquad (8) $$

$$ B = \left( \lambda + \frac{1}{12 \cdot tfn} - tfn \right) \cdot \log_2 e \qquad (9) $$

$$ C = 0.5 \cdot \log_2 (2\pi \cdot tfn) \qquad (10) $$

• tfn is the normalized term frequency as explained in (6),
• λ is the mean and variance of a Poisson distribution.

2.2 Parameter Free Models

2.2.1 DLH: hyper-geometric DFR model

This is a DFR model based on the hyper-geometric distribution of tf. For a workable weighting function, the hyper-geometric function is reduced to a binomial distribution with non-uniform term priors [10]. It is a parameter-free model and there is no need for expensive training. This model assumes that the occurrences of a query term in a document are samples from the whole collection instead of from the document [11]. The scoring function is given by:

$$ Score(d, Q) = \sum_{t \in Q} qtw \cdot \frac{1}{tf + 0.5} \cdot \left( \log_2 \left( \frac{tf \cdot avg\_l}{l} \cdot \frac{N}{F} \right) + 0.5 \cdot \log_2 \left( 2\pi \cdot tf \cdot \left( 1 - \frac{tf}{l} \right) \right) \right) \qquad (11) $$

where,
• F is given by tf/l, the within-document frequency,
• l is the document length in tokens,
• avg_l is the average document length in the collection,
• tf is the term frequency in the collection.

2.2.2 DPH

DPH is a parameter-free scoring technique derived from the Divergence from Randomness model [12]. The scoring function is given by [19]:

$$ Score(d, Q) = \sum_{t \in Q} qtw \cdot \frac{(1 - F)^2}{tf + 1} \cdot \left( tf \cdot \log_2 \left( tf \cdot \frac{avg\_l}{l} \cdot \frac{N}{TF} \right) + 0.5 \cdot \log_2 \left( 2\pi \cdot tf \cdot (1 - F) \right) \right) \qquad (12) $$

DPH, like DLH, is a parameter-free model.
• qtw = qtf / qtf_max, where qtf is the query term frequency and qtf_max is the maximum query term frequency among all query terms,
• N is the total number of documents,
• avg_l is the average document length in the collection,
• TF is the term frequency in the collection.
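To make the parametric weighting functions of Section 2.1 concrete, the following is a minimal sketch of equations (2) to (10) as plain functions. It is not Terrier's implementation. The function and argument names are assumptions, the defaults k1 = 1.2 and b = 0.75 are merely common choices inside the ranges stated above, and the Poisson mean λ is passed in by the caller because the text does not say how it is estimated.

```python
import math


def bm25_weight(tf_d, n_docs, n_t, dl, avdl, k1=1.2, b=0.75):
    """Equations (2)-(4): w_t = tf_d * A / B."""
    a = math.log((n_docs - n_t + 0.5) / (n_t + 0.5))
    b_term = k1 * ((1 - b) + b * (dl / avdl)) + tf_d
    return tf_d * a / b_term


def normalised_tf(tf, l, avg_l, c=1.0):
    """Equation (6), normalization 2: tfn = tf * log2(1 + c * avg_l / l)."""
    return tf * math.log2(1 + c * (avg_l / l))


def inl2_weight(tf, n_docs, n_t, l, avg_l, c=1.0):
    """Equation (5): I(n)L2 weight of one term in one document."""
    tfn = normalised_tf(tf, l, avg_l, c)
    return (1 / (tfn + 1)) * tfn * math.log2((n_docs + 1) / (n_t + 0.5))


def pl2_weight(tf, lam, l, avg_l, c=1.0):
    """Equations (7)-(10): PL2 weight; lam is the Poisson mean (assumed given)."""
    tfn = normalised_tf(tf, l, avg_l, c)
    a = tfn * math.log2(tfn / lam)
    b = (lam + 1 / (12 * tfn) - tfn) * math.log2(math.e)
    c_term = 0.5 * math.log2(2 * math.pi * tfn)
    return (1 / (tfn + 1)) * (a + b + c_term)


# Hypothetical statistics: a term occurring 3 times in a document of average length.
w_bm25 = bm25_weight(tf_d=3, n_docs=1000, n_t=10, dl=100, avdl=100)
w_inl2 = inl2_weight(tf=3, n_docs=1000, n_t=10, l=100, avg_l=100)
w_pl2 = pl2_weight(tf=3, lam=0.05, l=100, avg_l=100)
```

With c = 1 and a document of exactly average length, tfn equals tf, which makes the effect of the normalization parameter easy to check: a longer document lowers tfn, and a larger c raises it.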


3 QUERY EXPANSION MODELS

For our experiments, we use Terrier’s DFR-based term weighting models, namely Bo1, Bo2 and KL. Terrier employs a Divergence from Randomness based QE mechanism which is a generalization of Rocchio’s method [18]. In the first step, the weights of the terms from the top ranked documents are calculated. The DFR model calculates the informativeness of a term by the divergence of its distribution in the top ranked documents from a random distribution [10]. The most informative terms are then extracted and merged with the original query to form an expanded one. The weighting schemes for the three models are given in Sections 3.1, 3.2 and 3.3 [13].

3.1 Kullback Leibler

The scoring function is given by,

$$ w(t) = P_x \cdot \log_2 \frac{P_x}{P_c} \qquad (13) $$

• P_x = tf_x / l_x,
• tf_x is the frequency of the query term in the top-ranked documents,
• l_x is the sum of the lengths of the exp_doc top-ranked documents, where exp_doc is a parameter of the query expansion methodology.

3.2 Bo1

This model is based on Bose-Einstein statistics, and the weight of the term t in the top ranked documents (rank ranging from 3 to 10) is given by [14]:

$$ w(t) = tf_x \cdot \log_2 \frac{1 + P_n}{P_n} + \log_2 (1 + P_n) \qquad (14) $$

• tf_x is the frequency of the query term in the top-ranked documents,
• P_n is given by F/N, where F is the term frequency in the collection and N is the number of documents in the collection.

3.3 Bo2

The scoring function of Bo2 is given by:

$$ w(t) = tf_x \cdot \log_2 \frac{1 + P_f}{P_f} + \log_2 (1 + P_f) \qquad (15) $$

• tf_x is the frequency of the query term in the top-ranked documents,
• P_f = (tf_x · l_x) / token_c, where l_x is the sum of the lengths of the exp_doc top-ranked documents and exp_doc is a parameter of the query expansion methodology,
• F is the term frequency of the query term in the whole collection,
• token_c is the total number of tokens in the whole collection.

4 EXPERIMENTS AND RESULTS

Our experiments are performed with the Terrier Information Retrieval framework. It provides indexing, retrieval and evaluation for English and non-English documents. For the evaluation of the various retrieval models and the performance of the QE models on them, we use both English and Hindi collections. These are provided by the Forum for Information Retrieval Evaluation (FIRE), and the dataset conforms to the TREC-style format. 100 topics were chosen, 50 for each language. They are numbered from 126 to 175 in the FIRE 2011 Adhoc English and Hindi datasets. The corpus is encoded in UTF-8 format and the tags are as follows:

<title>Swine flu vaccine</title>

<desc>Indigenous vaccine made in India for swine flu prevention</desc>

<narr>Relevant documents should contain information related to making indigenous swine flu vaccines in India, the vaccine's use on humans and animals, arrangements that are in place to prevent scarcity / unavailability of the vaccine, and the vaccine's role in saving lives.</narr>

</top></topics>

Each FIRE topic consists of three fields: title, description and narration. All three types of queries were used to understand the impact of query length [2]. We evaluate performance on this dataset for (i) different retrieval models (parametric and parameter-free) and (ii) the retrieval models enhanced with query expansion. We experiment with Short Queries (title field only), Normal Queries (description field) and Long Queries (title + description + narration).
Evaluation is done for TF_IDF, PL2, I(n)L2, DLH, DPH and BM25 to study the impact of short, normal and long queries. The optimum values for the parameters b and c have been set manually. The value that gives the highest MAP is considered the optimum.

4.1 Experiments with Short Queries

We tested the 6 models, of which DLH and DPH are parameter free. The MAP, R-Precision (precision at rank R, where R is the number of relevant documents for the topic), and Precision at 10 and 20 documents are reported in Table 1.
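The evaluation measures used throughout this section, and the manual selection of the parameter value with the highest MAP, can be sketched as follows. This is an illustrative re-statement of the standard definitions, not the evaluation code shipped with Terrier. The function names, the dictionary layout of runs and relevance judgements, and the toy rankings are all assumptions made for the example.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant (P@k)."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0


def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks where relevant documents occur."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs, qrels):
    """runs: {topic: ranked doc ids}; qrels: {topic: set of relevant doc ids}."""
    return sum(average_precision(runs[t], qrels[t]) for t in qrels) / len(qrels)


def best_parameter(candidates, run_for, qrels):
    """Return the candidate value whose runs give the highest MAP."""
    return max(candidates, key=lambda v: mean_average_precision(run_for(v), qrels))


# Hypothetical rankings for one topic under two values of b (toy data).
qrels = {"126": {"d1", "d3"}}
runs_by_b = {
    0.25: {"126": ["d1", "d3", "d2"]},
    0.75: {"126": ["d2", "d1", "d3"]},
}
best_b = best_parameter([0.25, 0.75], runs_by_b.get, qrels)
```

In this toy case b = 0.25 ranks both relevant documents first and is therefore selected. The same sweep, run over all 50 topics of a collection, corresponds to the manual tuning of b and c described above.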


• Results from Hindi_Short_Query (Table 1): The highest MAP was obtained for the PL2 model, with the term frequency normalization parameter c set manually to 4.0 for the optimum result. R-Precision was also highest for PL2, but Precision at 10 and 20 documents was better for the DPH model. Thus, in terms of MAP, the PL2 model with c set to 4.0 gave the best results for Hindi short queries, followed by I(n)L2 and DPH.

TABLE 1

MAP FOR HINDI SHORT QUERIES

              TF_IDF   DLH      DPH      PL2 (c=4.0)   BM25 (b=0.25)   InL2 (c=1.0)
MAP           0.2241   0.2295   0.2433   0.2467        0.2214          0.2454
R-Precision   0.2504   0.2613   0.2772   0.2848        0.2576          0.2775
P@10          0.3580   0.3640   0.4040   0.3980        0.3820          0.3920
P@20          0.3140   0.3290   0.3670   0.3600        0.3450          0.3550

• Results from English_Short_Query (Table 2): The highest MAP was obtained for PL2 at c=5.0 and BM25 at b=0.25. R-Precision is marginally better for DPH than for PL2 and BM25. P@10 was best for PL2 and P@20 for DPH. The results are thus distributed across models, but overall we can conclude that both PL2 and BM25 worked well for English short queries.

TABLE 2

MAP FOR ENGLISH SHORT QUERIES

              TF_IDF   DLH      DPH      PL2 (c=5.0)   BM25 (b=0.25)   InL2 (c=2.0)
MAP           0.2965   0.2881   0.3102   0.3138        0.3139          0.2977
R-Precision   0.3208   0.3148   0.3303   0.3273        0.3258          0.3136
P@10          0.4200   0.3980   0.4560   0.4640        0.4440          0.4440
P@20          0.3820   0.3740   0.4060   0.4030        0.4050          0.3900

4.2 Experiments with Normal Queries

For Normal queries, only the description field was considered, with the number of words ranging from 7 to 10.
• Results for Hindi_Normal_Query (Table 3): The highest MAP is obtained for the I(n)L2 model with c set to 0.75. The remaining values, R-Precision, P@10 and P@20, are also highest for this model.

TABLE 3

MAP FOR HINDI NORMAL QUERIES

              TF_IDF   DLH      DPH      PL2 (c=2.0)   BM25 (b=0.25)   InL2 (c=0.75)
MAP           0.2498   0.2529   0.24     0.25          0.2072          0.2651

• Results for English_Normal_Query (Table 4): The highest MAP is obtained for BM25 at b=0.5. Precision at 10 documents was best for DPH. Precision at 20 documents was best for PL2 with c=2.0.

TABLE 4

MAP FOR ENGLISH NORMAL QUERIES

              TF_IDF   DLH      DPH      PL2 (c=2.0)   BM25 (b=0.5)    InL2 (c=0.75)
MAP           0.3694   0.3586   0.3765   0.377         0.3816          0.3696
R-Precision   0.3888   0.379    0.3893   0.3912        0.4008          0.388
P@10          0.5160   0.5120   0.5360   0.5260        0.5200          0.5060
P@20          0.4640   0.4580   0.4740   0.4780        0.4740          0.4640

4.3 Experiments with Long Queries

Long queries comprise (i) the title field, (ii) the description field and (iii) the narration field. The average query length is 20-30 words.
• Results for Hindi Long Queries (Table 5): The highest MAP is obtained for BM25 with b=0.75. The remaining values, R-Precision, P@10 and P@20, are also highest for this model.

TABLE 5

MAP FOR HINDI LONG QUERIES

              TF_IDF   DLH      DPH      PL2 (c=2.0)   BM25 (b=0.75)   InL2 (c=0.5)
MAP           0.2267   0.2047   0.2158   0.2126        0.3506          0.2402
R-Precision   0.26     0.2384   0.2534   0.2487        0.3695          0.2711
P@10          0.3600   0.3300   0.3500   0.3420        0.4820          0.3740
P@20          0.3360   0.2960   0.3110   0.3010        0.4590          0.3450

• Results for English Long Queries (Table 6): The highest MAP is obtained for the InL2 model with c set to 0.5. However, PL2 at c=1.0 fared well for P@10 and P@20.

TABLE 6

MAP FOR ENGLISH LONG QUERIES

              TF_IDF   DLH      DPH      PL2 (c=1.0)   BM25 (b=0.25)   InL2 (c=0.5)
MAP           0.3463   0.3313   0.33     0.3488        0.1554          0.3539
R-Precision   0.3667   0.3561   0.3546   0.3722        0.1906          0.3741
P@10          0.4900   0.4800   0.4900   0.5040        0.2960          0.4980
P@20          0.4550   0.4370   0.4460   0.4740        0.2540          0.4690
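Before turning to the query expansion runs, the pseudo relevance feedback step of Section 3 can be sketched in a few lines: weight every term of the top-ranked documents, keep the most informative ones and merge them with the original query. This is a simplified illustration under stated assumptions, not Terrier's QE code. The function names, the arguments exp_docs and exp_terms, the toy collection statistics and the treatment of P_c as a caller-supplied probability (the text does not define it) are all hypothetical.

```python
import math
from collections import Counter


def bo1_weight(tf_x, coll_freq, n_docs):
    """Bo1: tf_x * log2((1 + P_n) / P_n) + log2(1 + P_n), with P_n = F / N."""
    p_n = coll_freq / n_docs
    return tf_x * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)


def kl_weight(tf_x, l_x, p_c):
    """KL: P_x * log2(P_x / P_c), with P_x = tf_x / l_x; p_c is supplied by the caller."""
    p_x = tf_x / l_x
    return p_x * math.log2(p_x / p_c)


def expand_query(query, ranked_docs, coll_freq, n_docs, exp_docs=10, exp_terms=10):
    """Append the exp_terms highest-weighted terms of the top exp_docs documents."""
    # tf_x: frequency of each term in the pseudo-relevant (top-ranked) documents.
    tf_x = Counter(t for doc in ranked_docs[:exp_docs] for t in doc)
    weights = {t: bo1_weight(f, coll_freq[t], n_docs) for t, f in tf_x.items()}
    best = sorted(weights, key=weights.get, reverse=True)[:exp_terms]
    return list(query) + [t for t in best if t not in query]


# Hypothetical first-pass ranking and collection statistics (toy data).
ranked_docs = [
    ["swine", "flu", "vaccine", "h1n1"],
    ["flu", "vaccine", "h1n1", "dose"],
]
coll_freq = {"swine": 50, "flu": 5000, "vaccine": 200, "h1n1": 20, "dose": 300}
expanded = expand_query(["swine", "flu"], ranked_docs, coll_freq,
                        n_docs=10000, exp_docs=2, exp_terms=2)
```

In this toy setting the rare but repeated terms "h1n1" and "vaccine" receive the highest Bo1 weights and are added to the query, while the frequent term "flu" is weighted lowest. Swapping bo1_weight for another weighting function changes only the ranking of candidate terms, which is the sense in which the QE models are compared in the experiments.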


4.4 Experiments for Query Expansion with Bo1, Bo2 and KL models on short queries

We enhance the retrieval models explained here using query expansion on short queries and test whether the results are significantly different. 1000 documents are retrieved initially with each of the retrieval models listed in Tables 7 and 8. Each model is then enhanced with the Bo1, Bo2 and KL models and checked for improvement in MAP. For QE, the top 10 documents are used and 10 words are used for expansion of the “title” query, which is short. Tables 7 and 8 show the results obtained after query expansion with 10 terms from the top 10 documents.

QE on Hindi Short Queries: Results indicate that the highest MAP was obtained for I(n)L2 enhanced with the Bo1 model, with c set to 1.0. The improvement over the baseline is 24%. Among the baselines, PL2 with c=4 and InL2 with c=1 gave the best MAP for initial retrieval, with 0.2467 and 0.2454 respectively. The highest delta, +34%, was obtained for BM25 (b=0.25) enhanced with Bo1. We can also observe that query expansion using Bo2 weighting hurt the MAP drastically and deteriorated the results for DLH (-18%), PL2 (-4.8%) and BM25 (-80%).

TABLE 7

QE ON HINDI SHORT QUERIES

Model             MAP (%improvement)  R-Precision  P@10     P@20
TF_IDF            0.2241              0.2504       0.3580   0.3140
TF_IDF_Bo1        0.2785 (+24%)       0.2924       0.418    0.372
TF_IDF_Bo2        0.2415 (+7%)        0.267        0.354    0.332
TF_IDF_KL         0.2768 (+23%)       0.2921       0.404    0.368
DLH               0.2295              0.2613       0.3640   0.3290
DLH_Bo1           0.2919 (+27%)       0.3097       0.4240   0.3860
DLH_Bo2           0.1861 (-18%)       0.2145       0.3240   0.2750
DLH_KL            0.2912 (+27%)       0.3039       0.4260   0.3850
DPH               0.2433              0.2772       0.4040   0.3670
DPH_Bo1           0.3004 (+23%)       0.3181       0.434    0.367
DPH_Bo2           0.2467 (+1.39%)     0.2706       0.392    0.349
DPH_KL            0.3 (+23%)          0.3155       0.436    0.391
PL2_C4.0          0.2467              0.2848       0.3980   0.3600
PL2_Bo1           0.3033 (+22%)       0.3253       0.43     0.398
PL2_Bo2           0.2348 (-4.8%)      0.2746       0.382    0.342
PL2_KL            0.3021 (+22.45%)    0.3218       0.436    0.399
BM25_b0.25        0.2214              0.2576       0.3820   0.3450
BM25_b0.25_Bo1    0.2976 (+34%)       0.3045       0.432    0.398
BM25_b0.25_Bo2    0.0426 (-80%)       0.0599       0.0918   0.0776
BM25_b0.25_KL     0.2973 (+34%)       0.3094       0.436    0.396
InL2_c1.0         0.2454              0.2775       0.392    0.355
InL2_c1.0_Bo1     0.3056 (+24%)       0.3169       0.432    0.418
InL2_c1.0_Bo2     0.2609 (+6.3%)      0.2887       0.392    0.362
InL2_c1.0_KL      0.3034 (+23.6%)     0.3191       0.43     0.417

QE on English Short Queries: Results on the English data show that the highest MAP was obtained for PL2 (c=5.0) enhanced with Bo1, with an 18% improvement over the baseline. The highest MAP among the baselines was given by PL2 and BM25 (b=0.25). The highest delta was obtained for TF_IDF (+23%) with Bo2 and InL2 (+23.9%) with c=2.0. The KL and Bo1 models performed almost identically in all cases.

TABLE 8

QE ON ENGLISH SHORT QUERIES

Model             MAP (%improvement)  R-Precision  P@10     P@20
TF_IDF            0.2965              0.3208       0.4200   0.3820
TF_IDF_Bo1        0.3548 (+19.66%)    0.3661       0.4580   0.4090
TF_IDF_Bo2        0.3652 (+23%)       0.3721       0.4880   0.4350
TF_IDF_KL         0.3543 (+19%)       0.3694       0.4500   0.4040
DLH               0.2882              0.3148       0.3980   0.3740
DLH_Bo1           0.3473 (+20.5%)     0.3666       0.4460   0.3990
DLH_Bo2           0.3423 (+18.77%)    0.3553       0.4420   0.4070
DLH_KL            0.3452 (+19.77%)    0.3641       0.4480   0.4000
DPH               0.3102              0.3303       0.4560   0.4060
DPH_Bo1           0.3723 (+20%)       0.3778       0.4900   0.4360
DPH_Bo2           0.3712 (+19.66%)    0.3771       0.5020   0.4460
DPH_KL            0.3703 (+19.37%)    0.3764       0.4960   0.4360
PL2_C5.0          0.3138              0.3273       0.4640   0.4030
PL2_Bo1           0.3724 (+18.6%)     0.3766       0.4880   0.4430
PL2_Bo2           0.3686 (+17.46%)    0.3699       0.4920   0.4260
PL2_KL            0.3712 (+18.29%)    0.3771       0.5040   0.4340
BM25_b0.25        0.3139              0.3258       0.4440   0.4050
BM25_b0.25_Bo1    0.3651 (+16.3%)     0.3715       0.4860   0.4330
BM25_b0.25_Bo2    0.3633 (+15.7%)     0.371        0.4900   0.4240
BM25_b0.25_KL     0.3629 (+15.6%)     0.3682       0.4700   0.4220
InL2_c2.0         0.2977              0.3136       0.4440   0.3900
InL2_c1.0_Bo1     0.3629 (+21%)       0.3626       0.4760   0.4190
InL2_c1.0_Bo2     0.369 (+23.9%)      0.3684       0.4860   0.4300
InL2_c1.0_KL      0.3627 (+21.8%)     0.3625       0.4760   0.4270

5 CONCLUSION

In our experiment we conducted a study on the effectiveness shown by QE. This is an initial step towards identifying a baseline for our future experiments, which involve finding a term weighting strategy for Query Expansion using PRF. We also investigated the improvement shown by state-of-the-art QE models on the FIRE collection. The results of our study show that there is a relation between retrieval effectiveness and query expansion, as mentioned by previous researchers. Also, Query Expansion improved the MAP of the retrieval by 18-20% for the FIRE 2011 collection.
For the Hindi dataset, the Bo1 model gave the best results, whereas all three models performed similarly for the English dataset. It is also observed that the b parameter in BM25 was optimum at 0.25 for both collections in the case of short queries.
For the Hindi dataset, InL2 performed better with different c values for short and normal queries. Though PL2 gave the best result for short queries, the result is not significantly different from that of InL2. Applying QE to PL2 did not show much improvement and, in the case of Bo2, it hurt the MAP by 4.8%. In fact, Bo2 consistently failed to improve the MAP in any significant manner for the Hindi dataset. Hence we can support the suggestion that it is not only the quality of the top ranked documents but also the quality of the reweighting of the query terms that improves the retrieval effectiveness [11]. In the case of the English dataset, BM25 at b=0.25 and PL2 gave the highest MAP during initial retrieval for short and normal queries. MAP was improved by 17-18% using Bo1, Bo2 and KL for the English dataset.
It is observed that the drawback of the parametric models is that they require parameter tuning, and in the case of automatic query expansion, setting the parameter automatically would in itself be a research problem. Our future study aims at formulating a QE model that will find the optimal values for parameters, if any, automatically and yield better results than the state-of-the-art models. We would also like to consider the length of the query while reformulating it, as this can reduce the iterations during retrieval. Thus our future work aims at integrating both these aspects effectively and giving improved results for retrieval.

ACKNOWLEDGMENT

The authors sincerely thank FIRE for providing test collections, especially on Indian languages, and for the relevance judgement files. The authors would also like to thank the Terrier team for the Terrier Retrieval Engine, which was used for indexing, retrieval and evaluation of the experiments on different languages.

REFERENCES

[1] Turtle, H. R., & Croft, W. B. (1992). A comparison of text retrieval models. The Computer Journal, 35(3), 279-290.

[2] He, Ben, and Iadh Ounis. "Term frequency normalisation tuning for BM25 and DFR models." Advances in Information Retrieval. Springer Berlin Heidelberg, 2005. 200-214.

[3] Ruthven, Ian, and Mounia Lalmas. "A survey on the use of relevance feedback for information access systems." The Knowledge Engineering Review 18.02 (2003): 95-145.

[4] Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Vol. 1. Cambridge: Cambridge University Press, 2008.

[5] Lv, Yuanhua, and ChengXiang Zhai. "Positional relevance model for pseudo-relevance feedback." Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2010.

[6] Vaidyanathan, Rekha, Sujoy Das, and Namita Srivastava. "Query Expansion Based on Equi-Width and Equi-Frequency Partition." Multilingual Information Access in South Asian Languages. Springer Berlin Heidelberg, 2013. 13-22.

[7] Plachouras, Vassilis, Ben He, and Iadh Ounis. "University of Glasgow at TREC 2004: Experiments in Web, Robust, and Terabyte Tracks with Terrier." TREC. 2004.

[8] Hawking, David, Trystan Upstill, and Nick Craswell. "Toward better weighting of anchors." Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2004.

[9] G. Amati and C. J. van Rijsbergen. Probabilistic models of Information Retrieval based on measuring the divergence from randomness. In ACM Transactions on Information Systems (TOIS), volume 20 (4), pages 357-389, 2002.

[10] Lu, Sha, Ben He, and Jungang Xu. "Hyper-geometric Model for Information Retrieval Revisited." Information Retrieval Technology. Springer Berlin Heidelberg, 2013. 62-73.

[11] He, Ben, and Iadh Ounis. "Combining fields for query expansion and adaptive query expansion." Information Processing & Management 43.5 (2007): 1294-1307.

[12] G. Amati, E. Ambrosi, M. Bianchi, C. Gaibisso, and G. Gambosi. FUB, IASI-CNR and University of Tor Vergata at TREC 2007 Blog Track. In Proceedings of TREC 2007.

[13] Plachouras, Vassilis, Ben He, and Iadh Ounis. "University of Glasgow at TREC 2004: Experiments in Web, Robust, and Terabyte Tracks with Terrier." TREC. 2004.

[14] Macdonald, C., He, B., Plachouras, V., & Ounis, I. (2005). University of Glasgow at TREC 2005: Experiments in Terabyte and Enterprise Tracks with Terrier. In Proceedings of the 14th Text Retrieval Conference (TREC 2005). Gaithersburg, MD.

[15] Harman, Donna. "Relevance Feedback and Other Query Modification Techniques." (1992): 241-263.

[16] Salton, Gerard, and Chris Buckley. "Improving retrieval performance by relevance feedback." Readings in Information Retrieval 24.5 (1997).

[17] Ounis, I., Amati, G., Plachouras, V., He, B., Macdonald, C., & Lioma, C. (2006, August). “Terrier: A high performance and scalable information retrieval platform”. In Proceedings of the OSIR Workshop (pp. 18-25).

[18] Rocchio, J. (1971). Relevance feedback in Information Retrieval. In G. Salton (Ed.), The SMART retrieval system: Experiments in automatic document processing (pp. 313–323). Prentice-Hall Englewood Cliffs.

[19] McCreadie, R., Macdonald, C., Ounis, I., Peng, J., & Santos, R. L. (2009). University of Glasgow at TREC 2009: Experiments with Terrier. GLASGOW UNIV (UNITED KINGDOM).