International Journal of Scientific & Engineering Research, Volume 6, Issue 2, February-2015 13
ISSN 2229-5518
A Study on Retrieval Models and Query
Expansion using PRF
Rekha Vaidyanathan, Sujoy Das, Namita Srivastava
Abstract— This article compares state-of-the-art retrieval models and reports how query expansion enhances retrieval effectiveness. Five state-of-the-art retrieval models (parametric and non-parametric) and three query expansion techniques, Bo1, Bo2 and KL, are selected and presented. A comparative study of the retrieval models, namely TF_IDF, DLH, DPH, I(n)L2 and PL2, enhanced with the mentioned QE models, is shown experimentally using FIRE 2011 Adhoc data. This is an initial study carried out to understand how the performance of these approaches varies across multiple languages (English and Hindi). Furthermore, we explore the optimal parameter settings for the parametric models in case of Short, Normal and Long queries. Results show that I(n)L2 performed well for the Hindi dataset, while BM25 and PL2 gave the best MAP for the English dataset. We use Terrier, the Information Retrieval framework, for indexing, retrieval and evaluation. The models used for comparison are Terrier's DFR-based weighting models.
Index Terms—pseudo relevance feedback, retrieval models, query expansion, FIRE, Terrier
The purpose of retrieval models is to retrieve and rank the set of relevant documents based on the user's query. Boolean models, statistical models like the Vector Space Model, Probabilistic models and Language models have been developed for Information Retrieval. The Vector Space and the probabilistic retrieval models give significantly better results in terms of precision and recall than the exact match or Boolean retrieval models [1].
From the naïve method of classifying documents as matching or otherwise, the next generation of retrieval models moved to term weighting schemes in which documents are ranked by their degree of relevance. Each document is given a score based on the words it contains pertaining to a given topic and is ranked accordingly. These term weighting schemes can be parametric or non-parametric. Almost all term weighting models use the term frequency (tf), the number of times a term t appears in a document d, as the basis for calculating the score [2]. tf.idf is the most commonly used term weighting scheme, where the inverse document frequency (idf), introduced by Karen Sparck Jones, measures the term specificity. The formula for idf is given by
$$idf_t = \log(N / df_t) \tag{1}$$
• $N$ is the total number of documents in collection C,
• $df_t$ is the document frequency for term t,
• tf combined with idf gives high weights to rare terms and low weights to more frequent ones.
The statistical models usually use different weighting models for ranking the documents. These term weighting methods on documents are based on the query input by the user. Most of the time, the collection that is to be searched contains both relevant and irrelevant information. IR faces the two-sided problem of searchers not being able to frame the most suitable query for their information need and also lacking information about the collection used for retrieval [3]. This led to the development of the concept called Relevance Feedback, where [4]:
1. User submits a query
2. System retrieves an initial set of results
3. User marks this set as relevant or irrelevant
4. Based on the user's feedback, system retrieves a better set of results.
Manually skimming through the initial set of documents and marking them as relevant or irrelevant is a tedious task. Pseudo Relevance Feedback, or blind Relevance Feedback, automates this marking step by assuming that the top k ranked documents of the initially retrieved results are relevant. Terms related to the search query are selected from these documents to improve the query representation with the help of query expansion [5]. The process of adding more significant and contextually similar words to the original query is called query expansion. Most often, queries contain terms that may not match the indexed terms, leading to lower accuracy in the retrieval process [15]. This problem is addressed by relevance feedback, an automatic process of query reformulation, where important words are chosen from previously retrieved documents that are relevant to the query [16]. Thus the basic idea behind query expansion is to augment the query with related terms like synonyms, plurals, modifiers, category keywords etc. to improve the retrieval accuracy [6]. Many techniques have been proposed by researchers for query expansion. For our study, we select three QE models, namely Bo1, Bo2 and KL.
A comparative study on the retrieval effectiveness of state-of-the-art retrieval models on two different languages is introduced in this paper. Also, we investigate the effectiveness of applying query expansion to improve the retrieval accuracy. The study is mainly done to understand which baseline works most effectively on multiple languages of the FIRE Adhoc 2011 Test collection. We also try to understand the effectiveness of the optimal QE model and how it improves the retrieval accuracy. Terrier™ is used as the Information Retrieval framework for all our experiments [17].
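The pseudo relevance feedback loop outlined in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not Terrier's implementation: documents are ranked with a plain tf.idf sum built on Eq. (1), expansion terms are picked with the Bo1 weight of Section 3.2, and the toy collection, the function names and the settings `k` and `n_terms` are all made up for the example.

```python
import math
from collections import Counter


def idf(term, docs):
    """Eq. (1): idf_t = log(N / df_t); 0.0 if the term occurs in no document."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df) if df else 0.0


def rank(query, docs):
    """Return document indices ordered by the sum of tf * idf over the query terms."""
    scored = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        scored.append((sum(tf[t] * idf(t, docs) for t in query), i))
    # Highest score first; ties are broken by document index.
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


def bo1_weight(tfx, F, N):
    """Bo1 weight of Section 3.2, with Pn = F / N."""
    pn = F / N
    return tfx * math.log2((1 + pn) / pn) + math.log2(1 + pn)


def expand(query, docs, k=2, n_terms=2):
    """Blind feedback: treat the top-k documents as relevant and add n_terms terms."""
    top = rank(query, docs)[:k]
    tfx = Counter(t for i in top for t in docs[i])   # frequency in the top-k documents
    coll = Counter(t for doc in docs for t in doc)   # frequency in the whole collection
    candidates = [t for t in tfx if t not in query]
    candidates.sort(key=lambda t: (-bo1_weight(tfx[t], coll[t], len(docs)), t))
    return list(query) + candidates[:n_terms]


# Toy collection (made up for illustration), already tokenised.
docs = [
    ["swine", "flu", "vaccine", "india"],
    ["flu", "vaccine", "trial", "india"],
    ["cricket", "match", "india"],
    ["stock", "market", "news"],
]
query = ["swine", "flu"]
expanded = expand(query, docs)
print(expanded)              # expanded query
print(rank(expanded, docs))  # second-pass ranking with the expanded query
```

The second call to `rank` plays the role of step 4 of relevance feedback, with the user's judgement replaced by the assumption that the top `k` documents are relevant.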
The paper is organized as follows:
The weighting models selected for retrieval and QE in our experiments are discussed in detail in Sections 2 and 3. Section 4 showcases the experimental results of the different retrieval models and the effect of query expansion on them. Section 5 contains concluding remarks.
IJSER © 2015 http://www.ijser.org
Several weighting models have been proposed by many researchers in IR. In this paper we discuss a few parametric and non-parametric models. The parametric models involve hyper-parameter tuning for length normalization. Since each query behaves differently, the optimal parameter setting is different for each of them. Through our experiments, we find the optimal values for the normalization parameters by selecting the one that gives the highest MAP. We discuss the parametric models BM25, I(n)L2 and PL2 and the parameter-free models DPH and DLH. These models are based on Terrier's Divergence from Randomness (DFR) models [7].
2.1.1 OKAPI's BM25
BM25 is one of the best known term-weighting schemes derived from the probabilistic model. BM25 is a family of scoring functions and BM stands for Best Match. It takes into account three components, namely the term frequency, the inverse document frequency and the length of the document [8]. In this method, each document D is scored against a query q by the formula:
$$w_t = tf_d \cdot (A / B) \tag{2}$$
where,
$$A = \log[(N - n + 0.5) / (n + 0.5)] \tag{3}$$
$$B = k_1 \cdot ((1 - b) + b \cdot (dl / avdl)) + tf_d \tag{4}$$
• $w_t$ is the relevance weight assigned to a document due to query term t,
• $tf_d$ is the number of times t occurs in the document,
• $N$ is the total number of documents, $n$ is the number of documents containing at least one occurrence of t, $dl$ is the length of the document and $avdl$ is the average document length,
• $k_1$ is the term-frequency influence parameter, $1.0 \le k_1 \le 2.0$,
• $b$ is the normalization parameter for document length, $0.0 \le b \le 1.0$. b can be set to zero if the document length need not be considered.
2.1.2 I(n)L2
An Inverse Document Frequency model with Laplace after-effect and normalization 2. The scoring function is given by:
$$w_t = \frac{1}{tfn + 1} \cdot \left( tfn \cdot \log_2 \frac{N + 1}{N_t + 0.5} \right) \tag{5}$$
where tfn is the normalized term frequency given by the formula:
$$tfn = tf \cdot \log_2 [1 + c \cdot (avg\_l / l)] \tag{6}$$
• $c$ is the term frequency normalization parameter,
• $l$ is the document length, which corresponds to the number of tokens in a document, and
• $avg\_l$ is the average document length in the collection.
2.1.3 PL2
A Poisson model with Laplace after-effect and normalization 2. PL2 is one of the Divergence from Randomness weighting models [9]. The scoring function of PL2 is given by:
$$w_t = \frac{1}{tfn + 1} \cdot (A + B + C) \tag{7}$$
where,
$$A = tfn \cdot \log_2 (tfn / \lambda) \tag{8}$$
$$B = \left( \lambda + \frac{1}{12 \cdot tfn} - tfn \right) \cdot \log_2 e \tag{9}$$
$$C = 0.5 \cdot \log_2 (2\pi \cdot tfn) \tag{10}$$
• $tfn$ is the normalized term frequency as explained in (6).
• $\lambda$ is the mean and variance of a Poisson distribution.
2.2.1 DLH: DLH hypergeometric DFR Model
This is a DFR model based on the hypergeometric distribution of tf. For a workable weighting function, the hypergeometric function is reduced to a binomial distribution with non-uniform term priors [10]. It is a parameter-free model and there is no need for expensive training. This model assumes that the occurrences of a query term in a document are samples from the whole collection instead of from the document [11]. The scoring function is given by:
$$Score(d, Q) = \sum_{t \in Q} qtw \cdot \frac{1}{tf + 0.5} \cdot \left( \log_2 \left[ \frac{tf \cdot avg\_l}{l} \cdot \frac{N}{F} \right] + 0.5 \cdot \log_2 \left( 2\pi \, tf \left( 1 - \frac{tf}{l} \right) \right) \right) \tag{11}$$
where,
• $F$, given by tf/l, is the within-document frequency
• $l$ is the document length in tokens
• $avg\_l$ is the average document length in the collection
• tf is the term frequency in the collection.
2.2.2 DPH
DPH is a parameter-free scoring technique which is derived from the Divergence from Randomness model [12]. The scoring function is given by [19]:
$$Score(d, Q) = \sum_{t \in Q} qtw \cdot \frac{(1 - F)^2}{tf + 1} \cdot \left( tf \cdot \log_2 \left( tf \cdot \frac{avg\_l}{l} \cdot \frac{N}{TF} \right) + 0.5 \cdot \log_2 \left( 2\pi \cdot tf \cdot (1 - F) \right) \right) \tag{12}$$
DPH, like DLH, is a parameter-free model.
• $qtw = qtf / qtf_{max}$, where qtf is the query term frequency and $qtf_{max}$ is the maximum query term frequency among all query terms.
• $N$ is the total number of documents
• $avg\_l$ is the average document length in the collection
• tf is the term frequency in the collection.
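The parametric weights of Eqs. (2) to (10) translate almost directly into code. The sketch below is only an illustration of those formulas, not Terrier's implementation. Its assumptions: the function and argument names are invented here, the logarithm in Eq. (3) is taken as the natural logarithm because the text does not state its base, the defaults `k1=1.2`, `b=0.75` and `c=1.0` are placeholders rather than the tuned values reported later, and `lam` (the Poisson mean of PL2) is passed in by the caller because the text does not say how it is estimated.

```python
import math


def tfn(tf, doc_len, avg_len, c=1.0):
    """Normalisation 2, Eq. (6): tfn = tf * log2(1 + c * avg_l / l)."""
    return tf * math.log2(1.0 + c * avg_len / doc_len)


def bm25_weight(tf_d, n_docs, n_t, dl, avdl, k1=1.2, b=0.75):
    """Eqs. (2)-(4): w_t = tf_d * A / B."""
    a = math.log((n_docs - n_t + 0.5) / (n_t + 0.5))    # Eq. (3), natural log assumed
    big_b = k1 * ((1.0 - b) + b * dl / avdl) + tf_d      # Eq. (4)
    return tf_d * a / big_b


def inl2_weight(tf, n_docs, n_t, dl, avdl, c=1.0):
    """Eq. (5): inverse document frequency model with normalisation 2."""
    x = tfn(tf, dl, avdl, c)
    return x * math.log2((n_docs + 1.0) / (n_t + 0.5)) / (x + 1.0)


def pl2_weight(tf, lam, dl, avdl, c=1.0):
    """Eqs. (7)-(10): Poisson model with normalisation 2; lam is the Poisson mean."""
    x = tfn(tf, dl, avdl, c)
    a = x * math.log2(x / lam)                                   # Eq. (8)
    big_b = (lam + 1.0 / (12.0 * x) - x) * math.log2(math.e)     # Eq. (9)
    big_c = 0.5 * math.log2(2.0 * math.pi * x)                   # Eq. (10)
    return (a + big_b + big_c) / (x + 1.0)


# A term occurring twice in a document of average length, in 10 of 100 documents.
print(bm25_weight(2, 100, 10, 100, 100))
print(inl2_weight(2, 100, 10, 100, 100))
print(pl2_weight(2, 0.1, 100, 100))
```

With `b=0` the document length drops out of `bm25_weight`, which matches the remark that b can be set to zero when length need not be considered.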
For our experiments, we use Terrier's DFR-based term weighting models, namely Bo1, Bo2 and KL. Terrier employs a Divergence from Randomness based QE mechanism which is a generalization of Rocchio's method [18]. In the first step, the weights of the terms from the top ranked documents are calculated. The DFR model measures the informativeness of a term by the divergence of its distribution in the top ranked documents from a random distribution [10]. The most informative terms are then extracted and merged with the original query to form an expanded one. The weighting schemes for the three models are given in Sections 3.1, 3.2 and 3.3 [13].
3.1 Kullback Leibler
The scoring function is given by
$$w(t) = P_x \cdot \log_2 (P_x / P_c) \tag{13}$$
• $P_x = tf_x / l_x$;
• $tf_x$ is the frequency of the query term in the top-ranked documents.
• $l_x$ is the sum of the lengths of the exp_doc top ranked documents, where exp_doc is a parameter of the query expansion methodology.
3.2 Bo1
This model is based on Bose-Einstein statistics, and the weight of the term t in the top ranked documents (rank ranging from 3 to 10) is given by [14]:
$$w(t) = tf_x \cdot \log_2 \frac{1 + P_n}{P_n} + \log_2 (1 + P_n) \tag{14}$$
• $tf_x$ is the frequency of the query term in the top-ranked documents
• $P_n$ is given by F/N, where F is the term frequency in the collection and N is the number of documents in the collection.
3.3 Bo2
The scoring function of Bo2 is given by:
$$w(t) = tf_x \cdot \log_2 \frac{1 + P_f}{P_f} + \log_2 (1 + P_f) \tag{15}$$
• $tf_x$ is the frequency of the query term in the top-ranked documents
• $P_f = (tf_x \cdot l_x) / token_c$, where $l_x$ is the sum of the lengths of the exp_doc top ranked documents and exp_doc is a parameter of the query expansion methodology.
• $F$ is the term frequency of the query term in the whole collection.
• $token_c$ is the total number of tokens in the whole collection.
Our experiments are performed with the Terrier Information Retrieval framework. It provides indexing, retrieval and evaluation for English and non-English documents. For the evaluation of the various retrieval models and the performance of the QE models on them, we use both English and Hindi collections. These are provided by the Forum for Information Retrieval Evaluation (FIRE) and the dataset conforms to the TREC-style format. 100 topics were chosen, 50 for each language. They are numbered from 126 to 175 for the Adhoc English and Hindi 2011 dataset. The corpus is encoded in UTF-8 format and the tags are as follows:
<topics>
<top>
<num>126</num>
<title>Swine flu vaccine</title>
<desc>Indigenous vaccine made in India for swine flu prevention</desc>
<narr>Relevant documents should contain information related to making indigenous swine flu vaccines in India, the vaccine's use on humans and animals, arrangements that are in place to prevent scarcity / unavailability of the vaccine, and the vaccine's role in saving lives.</narr>
</top>
</topics>
Each FIRE topic consists of three fields: title, description and narration. All three types of queries were tried in order to understand the impact of query length [2]. We evaluate the performance on this dataset of (i) different retrieval models (parametric and parameter-free) and (ii) the retrieval models enhanced using query expansion. We experiment with Short Queries (title field only), Normal Queries (description field) and Long Queries (title + description + narration).
Evaluation is done for TF_IDF, PL2, I(n)L2, DLH, DPH and BM25 to study the impact of short, normal and long queries. The optimum values for the parameters b and c have been set manually. The value that gives the highest MAP is considered optimum.
We tested the 6 models, of which DLH and DPH are parameter free. The MAP, R Precision (precision at rank R, where R is the number of relevant documents for the topic) and Precision at 10 and 20 documents are reported in Table 1.
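For reference, the reported measures can be computed from a ranked list and a set of relevant document identifiers as sketched below. This is an illustration using the standard definitions of these measures, not the evaluation code used in the experiments; the function names and the toy document identifiers are assumptions.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant (P@k)."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0


def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant document retrieved."""
    hits, total = 0, 0.0
    for position, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over topics; runs is a list of (ranked, relevant) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two toy topics with made-up document identifiers.
topic_a = (["d1", "d2", "d3", "d4"], {"d1", "d3"})
topic_b = (["d2", "d1"], {"d1"})
print(average_precision(*topic_a))
print(mean_average_precision([topic_a, topic_b]))
```

For the first toy topic the relevant documents sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2.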
• Results from Hindi_Short_Query (Table 1): The highest MAP was obtained for the PL2 model, where the term frequency normalization parameter c was set manually to 4.0 for the optimum result. R Precision was also highest for PL2, but Precision at 10 and 20 documents was better for the DPH model. Thus, if we consider the MAP, the PL2 model with c set to 4.0 gave the best results for Hindi short queries, followed by I(n)L2 and DPH.
TABLE 1
MAP FOR HINDI SHORT QUERIES
| | TF_IDF | DLH | DPH | PL2 (c=4.0) | BM25 (b=0.25) | InL2 (c=1.0) |
| MAP | 0.2241 | 0.2295 | 0.2433 | 0.2467 | 0.2214 | 0.2454 |
| R Precision | 0.2504 | 0.2613 | 0.2772 | 0.2848 | 0.2576 | 0.2775 |
| P@10 | 0.3580 | 0.3640 | 0.4040 | 0.3980 | 0.3820 | 0.3920 |
| P@20 | 0.3140 | 0.3290 | 0.3670 | 0.3600 | 0.3450 | 0.3550 |
• Results from English_Short_Query (Table 2): The highest MAP was obtained for PL2 at c=5.0 and BM25 at b=0.25. R Precision is marginally better for DPH compared to PL2 and BM25. The best P@10 was obtained for PL2 and the best P@20 for DPH. Thus the results are spread across models, but overall we can conclude that both PL2 and BM25 retrieval worked well for English short queries.
TABLE 2
MAP FOR ENGLISH SHORT QUERIES
| | TF_IDF | DLH | DPH | PL2 (c=5.0) | BM25 (b=0.25) | InL2 (c=2.0) |
| MAP | 0.2965 | 0.2881 | 0.3102 | 0.3138 | 0.3139 | 0.2977 |
| R Precision | 0.3208 | 0.3148 | 0.3303 | 0.3273 | 0.3258 | 0.3136 |
| P@10 | 0.4200 | 0.3980 | 0.4560 | 0.4640 | 0.4440 | 0.4440 |
| P@20 | 0.3820 | 0.3740 | 0.4060 | 0.4030 | 0.4050 | 0.3900 |
For Normal queries, only the description field was considered, with the number of words ranging from 7 to 10.
• Results for Hindi_Normal_Query (Table 3): The highest MAP is obtained for the I(n)L2 model with c at 0.75. The remaining values, R Precision, P@10 and P@20, are also highest for this model.
TABLE 3
MAP FOR HINDI NORMAL QUERIES
| | TF_IDF | DLH | DPH | PL2 (c=2.0) | BM25 (b=0.25) | InL2 (c=0.75) |
| MAP | 0.2498 | 0.2529 | 0.24 | 0.25 | 0.2072 | 0.2651 |
• Results for English_Normal_Query (Table 4): The highest MAP is obtained for BM25 at b=0.5. The best Precision at 10 documents was obtained for DPH. Precision at 20 was best for PL2 with c=2.0.
TABLE 4
MAP FOR ENGLISH NORMAL QUERIES
| | TF_IDF | DLH | DPH | PL2 (c=2.0) | BM25 (b=0.5) | InL2 (c=0.75) |
| MAP | 0.3694 | 0.3586 | 0.3765 | 0.377 | 0.3816 | 0.3696 |
| R Precision | 0.3888 | 0.379 | 0.3893 | 0.3912 | 0.4008 | 0.388 |
| P@10 | 0.5160 | 0.5120 | 0.5360 | 0.5260 | 0.5200 | 0.5060 |
| P@20 | 0.4640 | 0.4580 | 0.4740 | 0.4780 | 0.4740 | 0.4640 |
Long queries comprise (i) the title field, (ii) the description field and (iii) the narration field. The average query length is 20-30 words.
• Results for Hindi Long Queries (Table 5): The highest MAP is obtained for BM25 with b=0.75. The remaining values, R Precision, P@10 and P@20, are also higher with this model.
TABLE 5
MAP FOR HINDI LONG QUERIES
| | TF_IDF | DLH | DPH | PL2 (c=2.0) | BM25 (b=0.75) | InL2 (c=0.5) |
| MAP | 0.2267 | 0.2047 | 0.2158 | 0.2126 | 0.3506 | 0.2402 |
| R Precision | 0.26 | 0.2384 | 0.2534 | 0.2487 | 0.3695 | 0.2711 |
| P@10 | 0.3600 | 0.3300 | 0.3500 | 0.3420 | 0.4820 | 0.3740 |
| P@20 | 0.3360 | 0.2960 | 0.3110 | 0.3010 | 0.4590 | 0.3450 |
• Results for English Long Queries (Table 6): The highest MAP is obtained for the InL2 model with c at 0.5. However, PL2 at c=1.0 fared well for P@10 and P@20.
TABLE 6
MAP FOR ENGLISH LONG QUERIES
| | TF_IDF | DLH | DPH | PL2 (c=1.0) | BM25 (b=0.25) | InL2 (c=0.5) |
| MAP | 0.3463 | 0.3313 | 0.33 | 0.3488 | 0.1554 | 0.3539 |
| R Precision | 0.3667 | 0.3561 | 0.3546 | 0.3722 | 0.1906 | 0.3741 |
| P@10 | 0.4900 | 0.4800 | 0.4900 | 0.5040 | 0.2960 | 0.4980 |
| P@20 | 0.4550 | 0.4370 | 0.4460 | 0.4740 | 0.2540 | 0.4690 |
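The values of b and c shown in the table headers were chosen manually as the settings that gave the highest MAP. That selection amounts to a small sweep over candidate values, sketched below under stated assumptions: `best_setting` and `evaluate` are hypothetical names, and `synthetic_map` is a made-up stand-in for a full retrieval and evaluation run, not measured data.

```python
def best_setting(candidates, evaluate):
    """Return (value, score) of the candidate with the highest score.

    Ties keep the earliest candidate.
    """
    best_value, best_score = None, float("-inf")
    for value in candidates:
        score = evaluate(value)
        if score > best_score:
            best_value, best_score = value, score
    return best_value, best_score


def synthetic_map(b):
    # Stand-in for running retrieval and evaluation; peaks at b = 0.25 by construction.
    return 0.30 - (b - 0.25) ** 2


b_grid = [0.0, 0.25, 0.5, 0.75, 1.0]
print(best_setting(b_grid, synthetic_map))
```

In a real run `evaluate` would index nothing new; it would only re-run retrieval with the given parameter value and return the MAP over all topics.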
We enhance the retrieval models described here using query expansion on short queries and test whether the results are significantly different. 1000 documents are retrieved initially using each of the retrieval models listed in Tables 7 and 8. Each of the models is enhanced with the Bo1, Bo2 and KL models and is checked for improvement in MAP. For QE, the top 10 documents are used and 10 words are used for expansion with the "title" query, which is short. Tables 7 and 8 show the results obtained after query expansion with 10 terms from the top 10 documents.
24%. Among the baselines, PL2 with c=4 and InL2 with c=1 gave the best MAP for initial retrieval, with 0.2467 and 0.2454 respectively. The highest delta, +34%, was obtained for BM25 (b=0.25) enhanced with Bo1. We can also observe that query expansion using Bo2 weighting hurt the MAP drastically and deteriorated the results for DLH (-18%), PL2 (-4.8%) and BM25 (-80%).
TABLE 7
QE ON HINDI SHORT QUERIES
| Model | MAP (% improvement) | R Precision | P@10 | P@20 |
| TF_IDF | 0.2241 | 0.2504 | 0.3580 | 0.3140 |
| TF_IDF_Bo1 | 0.2785 (+24%) | 0.2924 | 0.418 | 0.372 |
| TF_IDF_Bo2 | 0.2415 (+7%) | 0.267 | 0.354 | 0.332 |
| TF_IDF_KL | 0.2768 (+23%) | 0.2921 | 0.404 | 0.368 |
| DLH | 0.2295 | 0.2613 | 0.3640 | 0.3290 |
| DLH_Bo1 | 0.2919 (+27%) | 0.3097 | 0.4240 | 0.3860 |
| DLH_Bo2 | 0.1861 (-18%) | 0.2145 | 0.3240 | 0.2750 |
| DLH_KL | 0.2912 (+27%) | 0.3039 | 0.4260 | 0.3850 |
| DPH | 0.2433 | 0.2772 | 0.4040 | 0.3670 |
| DPH_Bo1 | 0.3004 (+23%) | 0.3181 | 0.434 | 0.367 |
| DPH_Bo2 | 0.2467 (+1.39%) | 0.2706 | 0.392 | 0.349 |
| DPH_KL | 0.3 (+23%) | 0.3155 | 0.436 | 0.391 |
| PL2_C4.0 | 0.2467 | 0.2848 | 0.3980 | 0.3600 |
| PL2_Bo1 | 0.3033 (+22%) | 0.3253 | 0.43 | 0.398 |
| PL2_Bo2 | 0.2348 (-4.8%) | 0.2746 | 0.382 | 0.342 |
| PL2_KL | 0.3021 (+22.45%) | 0.3218 | 0.436 | 0.399 |
| BM25_b0.25 | 0.2214 | 0.2576 | 0.3820 | 0.3450 |
| BM25_b0.25_Bo1 | 0.2976 (+34%) | 0.3045 | 0.432 | 0.398 |
| BM25_b0.25_Bo2 | 0.0426 (-80%) | 0.0599 | 0.0918 | 0.0776 |
| BM25_b0.25_KL | 0.2973 (+34%) | 0.3094 | 0.436 | 0.396 |
| InL2_c1.0 | 0.2454 | 0.2775 | 0.392 | 0.355 |
| InL2_c1.0_Bo1 | 0.3056 (+24%) | 0.3169 | 0.432 | 0.418 |
| InL2_c1.0_Bo2 | 0.2609 (+6.3%) | 0.2887 | 0.392 | 0.362 |
| InL2_c1.0_KL | 0.3034 (+23.6%) | 0.3191 | 0.43 | 0.417 |
TABLE 8
QE ON ENGLISH SHORT QUERIES
| Model | MAP (% improvement) | R Precision | P@10 | P@20 |
| TF_IDF | 0.2965 | 0.3208 | 0.4200 | 0.3820 |
| TF_IDF_Bo1 | 0.3548 (+19.66%) | 0.3661 | 0.4580 | 0.4090 |
| TF_IDF_Bo2 | 0.3652 (+23%) | 0.3721 | 0.4880 | 0.4350 |
| TF_IDF_KL | 0.3543 (+19%) | 0.3694 | 0.4500 | 0.4040 |
| DLH | 0.2882 | 0.3148 | 0.3980 | 0.3740 |
| DLH_Bo1 | 0.3473 (+20.5%) | 0.3666 | 0.4460 | 0.3990 |
| DLH_Bo2 | 0.3423 (+18.77%) | 0.3553 | 0.4420 | 0.4070 |
| DLH_KL | 0.3452 (+19.77%) | 0.3641 | 0.4480 | 0.4000 |
| DPH | 0.3102 | 0.3303 | 0.4560 | 0.4060 |
| DPH_Bo1 | 0.3723 (+20%) | 0.3778 | 0.4900 | 0.4360 |
| DPH_Bo2 | 0.3712 (+19.66%) | 0.3771 | 0.5020 | 0.4460 |
| DPH_KL | 0.3703 (+19.37%) | 0.3764 | 0.4960 | 0.4360 |
| PL2_C5.0 | 0.3138 | 0.3273 | 0.4640 | 0.4030 |
| PL2_Bo1 | 0.3724 (+18.6%) | 0.3766 | 0.4880 | 0.4430 |
| PL2_Bo2 | 0.3686 (+17.46%) | 0.3699 | 0.4920 | 0.4260 |
| PL2_KL | 0.3712 (+18.29%) | 0.3771 | 0.5040 | 0.4340 |
| BM25_b0.25 | 0.3139 | 0.3258 | 0.4440 | 0.4050 |
| BM25_b0.25_Bo1 | 0.3651 (+16.3%) | 0.3715 | 0.4860 | 0.4330 |
| BM25_b0.25_Bo2 | 0.3633 (+15.7%) | 0.371 | 0.4900 | 0.4240 |
| BM25_b0.25_KL | 0.3629 (+15.6%) | 0.3682 | 0.4700 | 0.4220 |
| InL2_c2.0 | 0.2977 | 0.3136 | 0.4440 | 0.3900 |
| InL2_c1.0_Bo1 | 0.3629 (+21%) | 0.3626 | 0.4760 | 0.4190 |
| InL2_c1.0_Bo2 | 0.369 (+23.9%) | 0.3684 | 0.4860 | 0.4300 |
| InL2_c1.0_KL | 0.3627 (+21.8%) | 0.3625 | 0.4760 | 0.4270 |
5 CONCLUSION
In our experiment we conducted a study on the effectiveness shown by QE. This is an initial step towards identifying a baseline for our future experiments, which involve finding a term weighting strategy for Query Expansion using PRF. We also investigated the improvement shown by state-of-the-art QE models on the FIRE Collection. The results of our study show that there is a relation between the retrieval effectiveness and query expansion, as mentioned by previous researchers. Also, Query Expansion has improved the MAP of the retrieval by 18-20% for the FIRE 2011 Collection.
For the Hindi dataset, the Bo1 model gave the best results, whereas all three models performed similarly for the English dataset. It is
also observed that the b parameter in BM25 was optimum at
0.25 for both the Collections, in case of short queries.
For the Hindi dataset, InL2 performed better with different c values for short and normal queries. Though PL2 gave the best result for short queries, the result is not very significantly different from InL2. Applying QE on PL2 did not show much improvement and, in the case of Bo2, it hurt the MAP by 4.8%. In fact, Bo2 consistently did not improve the MAP in any significant manner for the Hindi dataset. Hence we can support the suggestion that it is not only the quality of the top ranked documents but also the quality of the reweighting of the query terms that improves the retrieval effectiveness [11]. In the case of the English dataset, BM25 at b=0.25 and PL2 gave the highest MAP during initial retrieval for short and normal queries. MAP was improved by 17-18% using Bo1, Bo2 and KL for the English dataset.
It is observed that the drawback of the parametric models is that they require parameter tuning; in the case of automatic query expansion, setting the parameters automatically would in itself be a research problem. Our future study aims at formulating a QE model that will find the optimal values for parameters, if any, automatically and yield better results compared to the state-of-the-art models. We would also like to consider the length of the query while reformulating it, as this can reduce the iterations during retrieval. Thus our future work aims at integrating both these aspects effectively and giving improved results for retrieval.
The authors sincerely thank FIRE for providing test collections, especially on Indian languages, and for the relevance judgement files. The authors would also like to thank the Terrier team for the Terrier Retrieval Engine, which was used for indexing, retrieval and evaluation of the experiments on different languages.
[1] Turtle, H. R., & Croft, W. B. (1992). A comparison of text retrieval models. The computer journal, 35(3), 279-290.
[2] He, Ben, and Iadh Ounis. "Term frequency normalisation tuning for BM25 and DFR models." Advances in Information Retrieval. Springer Berlin Heidelberg, 2005. 200-214.
[3] Ruthven, Ian, and Mounia Lalmas. "A survey on the use of relevance feedback for information access systems." The Knowledge Engineering Review 18.02 (2003): 95-145.
[4] Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to information retrieval. Vol. 1. Cambridge: Cambridge University Press, 2008.
[5] Lv, Yuanhua, and ChengXiang Zhai. "Positional relevance model for pseudo-relevance feedback." Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010.
[6] Vaidyanathan, Rekha, Sujoy Das, and Namita Srivastava. "Query Expansion Based on Equi-Width and Equi-Frequency Partition." Multilingual Information Access in South Asian Languages. Springer Berlin Heidelberg, 2013. 13-22.
[7] Plachouras, Vassilis, Ben He, and Iadh Ounis. "University of Glasgow at TREC 2004: Experiments in Web, Robust, and Terabyte Tracks with Terrier." TREC. 2004.
[8] Hawking, David, Trystan Upstill, and Nick Craswell. "Toward better weighting of anchors." Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 2004.
[9] G. Amati and C. J. van Rijsbergen. Probabilistic models of Information Retrieval based on measuring the divergence from randomness. In ACM Transactions on Information Systems (TOIS), volume 20 (4), pages 357-389, 2002.
[10] Lu, Sha, Ben He, and Jungang Xu. "Hyper-geometric Model for Information Retrieval Revisited." Information Retrieval Technology. Springer Berlin Heidelberg, 2013. 62-73.
[11] He, Ben, and Iadh Ounis. "Combining fields for query expansion and adaptive query expansion." Information Processing & Management 43.5 (2007): 1294-1307.
[12] G. Amati, E. Ambrosi, M. Bianchi, C. Gaibisso, and G. Gambosi. FUB, IASI-CNR and University of Tor Vergata at TREC 2007 Blog Track. In Proceedings of TREC 2007.
[13] Plachouras, Vassilis, Ben He, and Iadh Ounis. "University of Glasgow at TREC 2004: Experiments in Web, Robust, and Terabyte Tracks with Terrier." TREC. 2004.
[14] Macdonald, C., He, B., Plachouras, V., & Ounis, I. (2005). University of Glasgow at TREC 2005: Experiments in terabyte and enterprise tracks with Terrier. In Proceedings of the 14th Text Retrieval Conference (TREC 2005). Gaithersburg, MD.
[15] Harman, Donna. "Relevance Feedback and Other Query Modification Techniques." (1992): 241-263.
[16] Salton, Gerard, and Chris Buckley. "Improving retrieval performance by relevance feedback." Readings in information retrieval 24.5 (1997).
[17] Ounis, I., Amati, G., Plachouras, V., He, B., Macdonald, C., & Lioma, C. (2006, August). "Terrier: A high performance and scalable information retrieval platform". In Proceedings of the OSIR Workshop (pp. 18-25).
[18] Rocchio, J. (1971). Relevance feedback in Information Retrieval. In G. Salton (Ed.), The SMART retrieval system: Experiments in automatic document processing (pp. 313–323). Prentice-Hall Englewood Cliffs.
[19] McCreadie, R., Macdonald, C., Ounis, I., Peng, J., & Santos, R. L. (2009). University of Glasgow at TREC 2009: Experiments with Terrier. GLASGOW UNIV (UNITED KINGDOM).