International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February-2014 981
ISSN 2229-5518
Detection Inflection s-shaped model Using SPRT
based on Order Statistics
1
Associate Professor, Dept. of Computer Science & Engg., Acharya Nagarjuna University
Nagarjuna Nagar, Guntur, Andhra Pradesh, India.
Asst Professor, Dept. of IT, VRSEC, Vijayawada Andhra Pradesh, India
Sangeetha18.yalamanchili@gmail.com
ABSTRACT- To assess the software reliability by statistical means yields efficient results. In this paper, for an effective monitoring of failure process we have opted Sequential Probability Ratio Test (SPRT) over the time between every rth failure (r is a natural number >=2) instead of inter-failure times. This paper projects a controlling framework based on order statistics of the cumulative quantity between observations of time domain failure data using mean value function of Inflection S-Shaped Model. The two unknown parameters can be estimated using the Maximum Likelihood Estimation (MLE).
Order statistics deals with properties and
applications of ordered random variables and functions of these variables. The use of order statistics is significant when inter failure time is less or failures are frequent. Let A denote a continuous random variable with probability density function (pdf), f(a) and cumulative distribution function (cdf), F(a), and let (A1 , A2
, …, Ak) denote a random sample of size k drawn on A. The original sample observations may be unordered with respect to magnitude. A transformation is required to produce a corresponding ordered sample. Let (A(1) , A(2) ,
…, A(k)) denote the ordered random sample
such that A(1) < A(2) < … < A(k); then (A(1), A(2), …, A(k)) are collectively known as the order statistics derived from the parent A. The various distributional characteristics can be known from Bala Krishnan and Cohen(1991).
The sequential probability ratio test (SPRT) was
developed by A. Wald at Columbia University in 1943. Due to its usefulness in development work on military and naval equipment it was classified as “Restricted‟ by the Espionage Act
[6]. A big advantage of sequential tests is that they require fewer observations (time) on the average than fixed sample size tests. SPRTs are widely used for statistical quality control in manufacturing processes. An SPRT for homogeneous Poisson processes is described below.
Let {N (t), t≥ 0} be a homogeneous Poisson process with rate ‘λ’. In our case, N(t)=number of failures up to time ‘t’ and ‘λ’ is the failure rate (failures per unit time ). Suppose that we put a system on test (for example a software system, where testing is done according to a usage profile and no faults are corrected) and that we want to estimate its failure rate ‘λ ’. We cann ot expect to estimate ‘λ’ precisely. But we want to reject the system with a high probability if our data suggest that the failure rate is larger than λ1 and accept it with a high probability, if
it’s smaller than λ0 (0 < λ0 < λ1 ). As always
with statically tests, there is some risk to get the
wrong answers. So we have to specify two (small) numbers ‘α’ and ‘β’, where ‘α’ is the probability of falsely rejecting the system. That
is rejecting the system even if 𝜆 ≤ 𝜆0 . This is
the “producer’s” risk. β is the probability of
IJSER © 2014 http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February-2014 982
ISSN 2229-5518
falsely accepting the system. That is accepting the system even if λ ≥ 1. This is the “consumer’s” risk. With specified choices of λ0 and λ1 such that 0 < λ0 < λ1, the probability of finding N(t) failures in the time span (0,t ) with λ1 , λ0 as the failure rates are respectively given by
𝑒 −𝜆1 [𝜆1𝑡]𝑁(𝑡)
above decision processes is to reject the system as unreliable if N(t) falls for the first time above the line
𝑁𝑢 (𝑡) = 𝑎. 𝑡 + 𝑏2 (2.4)
to accept the system to be reliable if N(t) falls
for the first time below the line
𝑃1 =
(2.1)
𝑁(𝑡)!
𝑁 (𝑡) = 𝑎. 𝑡 − 𝑏
(2.5)
𝐿 1
𝑃0 =
𝑒 −𝜆0𝑡[𝜆0𝑡]𝑁(𝑡)
(2.2)
𝑁(𝑡)!
To continue the test with one more observation on [t, N(t)] as the random graph of [t, N(t)] is between the two linear boundaries given by
The ratio 𝑃1
𝑃0
at any time‘t’ is considered at a
equations (2.4) and (2.5)
measure of deciding the truth towards 𝜆0 or 𝜆1 , given a sequence of time instants say 𝑡1 < 𝑡2 <
𝑡3 <……….<𝑡𝑘 and the ccorresponding
Where 𝛼 = 𝜆1−𝜆0
log�𝜆1�
𝜆0
1−𝛼
realizations
N(𝑡1 ), 𝑁(𝑡2 ), ……..N(𝑡𝑘 ) of N(t). simplification
𝑏1
log
= 𝛽
log�𝜆1�
𝜆0
of 𝑝1
gives
log�1−𝛽
𝑝0
𝑝1
= exp (𝜆
𝑁(𝑡)
− 𝜆 )𝑡 + � 1 �
(2.3)
𝑏2 =
𝛼
log�𝜆1�
𝜆0
𝑝0
0 1 𝜆0
The parameters 𝛼, 𝛽, 𝜆0 , 𝜆1 can be chosen in
The decision rule of SPRT is to decide in favor
of 𝜆1 , in favor of 𝜆0 or to continue by observing
several ways. One way suggested by [5] is
the number of failures at a later time than 't'
𝜆0 =
𝜆.log(𝑝)
, 𝜆1 = 𝑞.
𝜆.log 𝑞
where 𝑞 =
𝜆1
according as 𝑝1
𝑝0
is greater than or equal to a
𝑞 −1
𝑞−1
𝜆0
constant say A, less than or equal to a constant
say B or in between the constants A and B. That
is, we decide the given software product as unreliable, reliable or continue the test process with one more observation in failure data, according as
𝑝1 ≥ A
𝑝0
𝑝1 ≤ 𝐵
𝑝0
B< 𝑝1 <A
𝑝0
The approximate values of the constants A and
B are taken as
A≅ 1−𝛽 , B≅ 𝛽
𝛼 1−𝛼
Where 𝛼 and ′𝛽′ are the risk probabilities as
defined earlier. A simplified version of the
If 𝜆0 and 𝜆1 are chosen in this way, the slope of
NU (t) and NL (t) equals λ. The other two ways
of choosing 𝜆0 and 𝜆1 are from past projects (for
a comparison of the projects) and from part of
the data to compare the reliability of different
functional areas (components).
III. ILLUSTRATING THE MLE METHOD.
Based on the inter failure data given in Data set
#1 & Data Set#2, we demonstrate the software failures process through failure control chart. We used cumulative time between failures data for software reliability monitoring. The use of cumulative quality is a different and new approach, which is of particular advantage in reliability. ‘a’ and ‘b’ are Maximum Likely hood Estimates (MLEs) of parameters ‘a’ and ‘b’ and the values can be computed using iterative
IJSER © 2014 http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February-2014 983
ISSN 2229-5518
method for the given cumulative time between failures data.
𝑏𝑛−1 = 𝑏𝑛 −
𝑔(𝑏𝑛 )
𝑔′(𝑏𝑛 )
, which is substituted in
The probability density function of a
two-parameter inflection S-shaped model has the form:
finding ‘a’. Where g (b) and g’(b) are expressed
as follows.
Taking the partial derivative w.r.t ‘b’ and equating to ‘0’.
f (t) = 𝑏𝑒
−𝑏𝑡
(1+𝛽 )
−�𝑏𝑡 �
−�𝑏𝑡 �
(1+𝛽𝑒 −𝑏𝑡)2
𝑛
g(b) =
+ ∑𝑛
( −𝑡
𝑡𝑖 𝑒
+
(𝑟−1) + 𝑖
𝑖 𝑟(+1)𝛽
The corresponding cumulative distribution
𝑏 𝑖=0 𝑖
−𝑏𝑡𝑛
1−𝑒 −(𝑏𝑡𝑖)
1+𝛽𝑒 −(𝑏𝑡𝑖) )
function is:
- 𝑛𝑟 𝑡𝑛𝑒
(1+ 𝛽 )
F (t) = = 1
(1- 𝑒−𝑏𝑡 )
� 1−𝑒 −𝑏𝑡𝑛 � �1+𝛽 𝑒 −𝑏𝑡𝑛 �
(3.2)
1+𝛽𝑒 −𝑏𝑡
Mean value function of the model is
Again partially differentiating w.r.t ‘b’ and
equating to 0
g’(b)=- 𝑛 ∑𝑛
(−𝑡 2 𝑒−(𝑏𝑡𝑖 ) )[ 𝑟−1
]+nr𝑡
2(1+β)
m (t) = 𝑎
1+𝛽𝑒 −𝑏𝑡
(1- 𝑒−𝑏𝑡 )
𝑏2
𝑖=0 𝑖
(1−𝑒 −𝑏𝑡𝑖 )2 𝑛
For rth order statistics, the mean value function is expressed as
𝑒 −(𝑏𝑡𝑛) (�1−𝑒 −(𝑏𝑡𝑛) �+ 𝑒 −(𝑏𝑡𝑛) �1+𝛽𝑒 −(𝑏𝑡𝑛) )�
[ ]
(1−𝑒 −𝑏𝑡𝑛 )2 (1−𝛽 𝑒 −𝑏𝑡𝑛 )2
m r (t) = ( 𝑎(1−𝑒
−𝑏𝑡
) )r
(3.3)
Iterative Newton-Raph son method is used to
1+𝛽𝑒 −𝑏𝑡
The failure intensity function of r th order is
given as: λ r (t) = [ m r (t) ]
To estimate ‘a’ and ‘b’, for a sample of n units, first obtain the likelihood function:
The likelihood function
Solve the equations (3.1), (3.2), (3.3) in order to
get the approximated values of a & b for the given failure data sets.
L = 𝑒−𝑚𝑟 (𝑡𝑛) ∏𝑛
λ𝑟
𝑖=1 (𝑡𝑖 )
Take the natural logarithm on both sides, The
Log Likelihood function is given as (Pham,
2006):
The Poisson process we know that the expected value of N(t) = λ t called the average number of failures experienced in time 't' .This is also called the mean value function of the Poisson process. On the other hand if we consider a
Log L = ∑𝑛
𝑚𝑟 (𝑡𝑛 )=∑𝑛
log[λ𝑟 (𝑡𝑖 ) ] –
log(𝑎𝑟 𝑟 𝑏𝑒 −�𝑏𝑡𝑖�
𝑟−1
( 1−𝑒 −�𝑏𝑡𝑖 �)𝑟−1 (1+𝛽))
Poisson process with a general function (not necessarily linear) m (t) as its mean value function the probability equation of a such a process is
𝑖=1
(1+𝛽𝑒 −𝑏𝑡𝑖) )𝑟+1
𝑟 −(𝑏𝑡𝑛) 𝑟
[ 𝑚(𝑡)]
−𝑚(𝑡)
- 𝑎 [1−𝑒 ] )
P[𝑁(𝑡) = 𝑌] =
. 𝑒
𝑦!
, 𝑦 = 0,1,2, − − −
(1+𝛽𝑒 −𝑏𝑡𝑛) )𝑟
−𝑏𝑡𝑛
Depending on the forms of m (t) we get various
𝑎𝑟 = n (
1+𝛽𝑒
r (3.1)
Poisson processes called NHPP for our model
1−𝑒 −(𝑏𝑡𝑛 ) )
The parameter ‘a’ is estimated by taking the
partial derivative w.r.t ‘a’ and equating to ‘0’.
the mean value function is Inflection S-Shaped:
1−𝑒 −𝑏𝑡
The parameter ‘b’ is estimated by iterative
Newton Rap son Method using
m (t) =a�
1+𝑒 −𝑏𝑡
we may write
� where a>0, b>0, t>0
IJSER © 2014 http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February-2014 984
ISSN 2229-5518
𝑒 −𝑚1(𝑡).�𝑚1(𝑡) �
Rejection region:
𝑝1 =
𝑁(𝑡)!
𝑁(𝑡)
N(𝑡) ≥
log�1−𝛽�+𝑎�
𝛼
𝑒−𝑏0𝑡 − 𝑒−𝑏1𝑡 +𝛽�𝑒−𝑏0𝑡−𝑒−𝑏1𝑡�
�
�1+𝛽𝑒−𝑏1𝑡��1+𝛽𝑒−𝑏0𝑡�
(4.5)
𝑒 −𝑚0(𝑡).�𝑚0(𝑡)�
log�� 1−𝑒−𝑏1𝑡 ��1+𝛽𝑒−𝑏0𝑡��
𝑝0 =
𝑁(𝑡)!
1+𝛽𝑒−𝑏1𝑡
1−𝑒−𝑏0𝑡
Where 𝑚1 (𝑡), 𝑚0 (𝑡) , are values of the mean value
function at specified sets of its parameters indicating
reliable software and unreliable software
Continuation region:
−𝑏0𝑡
log( )+𝑎�
− 𝑒−𝑏1𝑡
+𝛽�𝑒−𝑏0𝑡
−𝑒−𝑏1𝑡��
�
respectively. For instance the model we have been considering their m(t) function, contains a pair of
1−𝛼
�1+𝛽𝑒−𝑏1𝑡��1+𝛽𝑒−𝑏0𝑡�
<
log�� 1−𝑒−𝑏1𝑡 ��1+𝛽𝑒−𝑏0𝑡��
parameters a, b with ‘a’ as a multiplier. Also a, b are
positive. Let 𝑝0, 𝑝1 , be values of the NHPP at two
specifications of b say 𝑏0 , 𝑏1 (𝑏0 <
1+𝛽𝑒−𝑏1𝑡
1−𝑒−𝑏0𝑡
𝑏1 ) respectively. It can be shown that for our model
m(t) at b1 is greater than that at b0. Symbolically
log 1−𝛽�+𝑎�
𝛼
N (t) <
𝑒−𝑏0𝑡 − 𝑒−𝑏1𝑡 +𝛽�𝑒−𝑏0𝑡−𝑒−𝑏1𝑡�
�
�1+𝛽𝑒−𝑏1𝑡��1+𝛽𝑒−𝑏0𝑡�
1−𝑒−𝑏1𝑡
1+𝛽𝑒−𝑏0𝑡
m0 (t)<m1 (t). Then the SPRT procedure is as follows:
Accept the system to be reliable 𝑝1 ≤ 𝐵
𝑝0
log�� �� ��
1+𝛽𝑒−𝑏1𝑡 1−𝑒−𝑏0𝑡
(4.6)
It may be noted that in the above two models the
i.e. 𝑒
−𝑚1(𝑡)
.[𝑚1(𝑡)]𝑁(𝑡)
≤ 𝐵
decision rules are exclusively based on the
𝑒 −𝑚0(𝑡).[𝑚0(𝑡)]𝑁(𝑡)
strength of the sequential procedure (𝛼, 𝛽) and
the values of the respective mean value
i.e. N(t)≤
log� 𝛽 �+𝑚 (𝑡)−𝑚 (𝑡)
1−𝛼
(4.1)
functions namely 𝑚0
(𝑡). 𝑚1 (𝑡) . If the mean
𝑙𝑜𝑔𝑚1(𝑡)−𝑙𝑜𝑔 𝑚0(𝑡)
Decide the system to be unreliable and reject if
𝑝1 ≥ 𝐴
𝑝0
value function is linear in ‘t’ passing through
origin, that is, m (t) = λ t the decision rules become decision lines as described by [5]. In that sense equations (4.1), (4.2), (4.3) can be
log 1−𝛽
�+𝑚1(𝑡)−𝑚0(𝑡)
regarded as generalizations to the decision
i.e. N(t)≥ 𝛼
𝑙𝑜𝑔𝑚1 (𝑡)−𝑙𝑜𝑔𝑚0(𝑡)
(4.2)
procedure of [5].The applications of these results for live software failure data are presented with
Continue the test procedure as long as
analysis in Section 5.
log� 𝛽 �+𝑚1 (𝑡)−𝑚0(𝑡)
1−𝛽
log�
�+𝑚1(𝑡)−𝑚0(𝑡)
1−𝛼 < 𝑁(t)< 𝛼
𝑙𝑜𝑔𝑚1(𝑡)−𝑙𝑜𝑔𝑚0(𝑡)
𝑙𝑜𝑔𝑚1 (𝑡)−𝑙𝑜𝑔𝑚0(𝑡)
(4.3)
Substituting the appropriate expressions of the respective mean value functions – m (t) of Inflection S-Shaped we get the decision rules and is given in followings lines
−𝑏𝑡
We see that the developed SPRT methodology is for a software failure data which is of the form [t, N(t)] where N(t) is the observed number of failures of software system or its sub system in
‘t’ units of time. In this section we evaluate the
m (t)=𝛼 �1−𝑒
1+𝑒 −𝑏𝑡
� where a>0, b>0 and t>0
decision rules based on the considered mean value functions for two different data sets of the
Acceptance regions:
−𝑏0𝑡
log( )+𝑎�
− 𝑒−𝑏1𝑡
+𝛽�𝑒−𝑏0𝑡
−𝑒−𝑏1𝑡��
�
above form, borrowed from [2] [7] [8]. Based on the estimates of the parameter ‘b’ in each mean value function, we have chosen the
N(𝑡) ≤
1−𝛼
�1+𝛽𝑒−𝑏1𝑡��1+𝛽𝑒−𝑏0𝑡�
log�� 1−𝑒−𝑏1𝑡 ��1+𝛽𝑒−𝑏0𝑡��
specifications of b0=b-δ, b1=b+ δ equidistant on
either side of estimate of b obtained through a
1+𝛽𝑒−𝑏1𝑡
1−𝑒−𝑏0𝑡
(4.4)
Data Set to apply SPRT such that b0 < b < b1.
Assuming the value of δ =0.000002, the choices
IJSER © 2014 http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February-2014 985
ISSN 2229-5518
are given in the following table.
The data sets were listed in "DATA" directory
Containing 45 industry project failure data sets
in the Handbook of Software Reliability
Engineering (Lyu, 1996).
Table 5.1: Data Set #1
In this section we evaluate the decision rules based on the given mean value function for two different data sets. Based on the estimated value of the parameter ‘b’, we have chosen the specifications of b0, b1 that are to be equidistant such that b0< b < b1. The choices are given in the following table.
Table 5.3: Specifications of b0, b1 for Data set # I
Order | Estimat e of a | Estimate of b | b0 | b1 |
4th | 2.49733 | 0.000005 | 0.000003 | 0.000007 |
5th | 1.99883 | 0.000007 | 0.000005 | 0.000009 |
Table 5.4: Specifications of b0, b1 for Data Set # II
Table 5.2: Data Set #2
Using the selected b0, b1 and m0 (t), m1 (t) we have calculated the decision rules given by Equations (4.4), (4.5), sequentially at each ‘t’ of the data sets taking the strength ( α, β ) as (0.05,0.05). These are presented for the model in Table 5.8.
Table 5.4: 4th order statistics for table 5.1 data set # 1
IJSER © 2014 http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February-2014 986
ISSN 2229-5518
FNO | CFT | FNO | CFT | FNO | CFT |
1 | 1557 | 13 | 7564 | 23 | 34077 |
2 | 1639 | 14 | 7612 | 24 | 35422 |
3 | 1973 | 15 | 8496 | 25 | 37476 |
4 | 2183 | 16 | 9356 | 26 | 39336 |
5 | 2714 | 17 | 10662 | 27 | 47688 |
6 | 3455 | 18 | 12523 | 28 | 50119 |
7 | 5045 | 19 | 13656 | 29 | 58707 |
8 | 5087 | 20 | 24480 | 30 | 69259 |
9 | 5222 | 19 | 13656 | 31 | 78723 |
10 | 5608 | 20 | 24480 | 32 | 88694 |
11 | 6599 | 21 | 26136 | ||
12 | 7042 | 22 | 31174 |
Table 5.5: 5th order statistics for table 5.1 data set # 1
Table 5.6: 4th order statistics for table 5.2 data set # 2
Table 5.7: 5th order statistics for table 5.2 data set # 2
Table 5.8: SPRT Analysis for Inflection S-Shaped model
In this paper we have monitored two failure live data sets using SPRT. We are greatly succeeded in applying SPRT analysis over order statistic approach. We have observed that through order statistic approach we can have an early decision about acceptance/rejection of the software system being tested.
[1]. Balakrishnan. N., Clifford Cohen. A., (1991). “Order
Statistics and Inference”, Academic Press, INC. page 13.
[2]. Macgregor, J.F., Kourti, T., (1995). “Statistical process control of multivariate processes”. Control Engineering
Practice Volume 3, Issue 3, 403-414.
[3]. Musa, J.D., Iannino, A., Okumoto, k., (1987). “Software
Reliability: Measurement Prediction Application”.
McGraw-Hill, New York.
[4]. Ohba, M., (1984). “Software reliability analysis model”.
IBMJ. Res. Develop. 28. 428-443.
[5]. Pham. H., (2006). “System software reliability”, Springer.
[6]. Swapna S. Gokhale and Kishore S.Trivedi, (1998). “Log- Logistic Software Reliability Growth Model”. The 3rd
IEEE International Symposium on High-Assurance Systems
Engineering. IEEE Computer Society.
[7]. Lyu, M. R., (1996). “Handbook of Software Reliability
Engineering”. McGraw-Hill publishing, ISBN 0-07-039400-
8.
[8]. Oakland. J., (2008). “Statistical Process Control”, Sixth
Edition, Butterworth-Heinemann, Elsevier.
[9]. Malaiya, Y. K., Sumitsur, Karunanithi, N. and Sun, Y. (1990). “Implementation considerations for software
reliability”, 8th Annual Software Reliability Symposium, June 1.
Data set | Ord er | T | N(t) | Acceptan ce Region(≤) | Rejection Region (≥) | Decision |
1 | 4 | 1557 | 1 | -3.469260 | 3.504097 | Rejected |
1 | 4 | 1639 | 2 | -3.468960 | 3.505623 | Rejected |
1 | 4 | 1973 | 3 | -3.467736 | 3.511835 | Rejected |
1 | 4 | 2183 | 4 | -3.466971 | 3.515739 | Rejected |
1 | 5 | 1579 | 1 | -5.013463 | 5.054164 | Rejected |
1 | 5 | 1738 | 2 | -5.013897 | 5.058674 | Rejected |
1 | 5 | 2030 | 3 | -5.014706 | 5.066956 | Rejected |
IJSER © 2014 http://www.ijser.org