International Journal of Scientific & Engineering Research Volume 2, Issue 4, April-2011 1

ISSN 2229-5518

Application of Reliability Analysis:

A Technical Survey

Dr. Anju Khandelwal

Abstract— The objective of this paper is to present a survey of recent high-quality research dealing with reliability in different fields of engineering and the physical sciences. The paper covers several important areas of reliability in which significant research efforts are being made all over the world. The survey provides insight into past, current and future trends of reliability in different fields of Engineering, Technology and medical sciences, with applications to specific problems.

Index Terms— CCN, Coherent Systems, Distributed Computing Systems, Grid Computing, Nanotechnology, Network Reliability, Reliability.

—————————— • ——————————

1 INTRODUCTION

Traditional system-reliability measures include reliability, availability, and interval availability. Reliability is the probability that a system operates without interruption during an interval of interest under specified conditions. Reliability can be extended to include several levels of system performance. A first performance-oriented extension of reliability is to replace a single acceptable level of operation by a set of performance levels. This approach is used for evaluating network performance and reliability. The performance level is based on metrics derived from an application-dependent performance model. For example, the performance level might be the rate of job completion, the response time, or the number of jobs completed in a given time interval. Availability is the probability that the system is in an operational state at the time of interest. Availability can be computed by summing the state probabilities of the operational states. Reliability is the probability that the system stays in an operational state throughout an interval. In a system without repair, reliability and availability are easily related. In a system with repair, if any repair transitions that leave failed states are deleted, making failure states absorbing, reliability can be computed using the same methods as availability. Interval availability is the fraction of time the system spends in an operational state during an interval of interest. The mean interval availability can be computed by determining the mean time the system spends in operational states; it is a cumulative measure that depends on the cumulative amount of time spent in each state.
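For a single repairable unit with constant failure rate λ and constant repair rate μ, these three measures have closed forms. The sketch below (an illustrative example, not taken from the surveyed papers) computes the point availability A(t), the steady-state availability, and the mean interval availability over [0, t]:

```python
import math

def availabilities(lam, mu, t):
    """Point availability A(t), steady-state availability, and mean interval
    availability over [0, t] for one unit with failure rate lam and repair
    rate mu (standard two-state Markov model)."""
    s = lam + mu
    steady = mu / s                                 # long-run availability
    point = steady + (lam / s) * math.exp(-s * t)   # A(t)
    # mean interval availability: (1/t) * integral of A(u) du over [0, t]
    interval = steady + (lam / s) * (1 - math.exp(-s * t)) / (s * t)
    return point, steady, interval
```

Because A(t) decays toward the steady-state value from above, the mean interval availability always lies between the two.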

For example, how can traditional reliability assessment techniques determine the dependability of a manned space vehicle designed to explore Mars, given that humanity has yet to venture that far into space? How can one determine the reliability of a nuclear weapon, given that the world has in place test-ban treaties and international agreements? And, finally, how can one decide which artificial heart to place into a patient, given that neither has ever been inside a human before? To resolve this dilemma, reliability must be 1) reinterpreted, and then 2) quantified. Using the scientific method, researchers use evidence to determine the probability of success or failure; reliability can therefore be viewed as a probability. The redefined concept of reliability incorporates auxiliary sources of data, such as expert knowledge, corporate memory, and mathematical modeling and simulation. By combining both types of data, reliability assessment is ready to enter the 21st century. Thus, reliability is a quantified measure of uncertainty about a particular type of event (or events).

2 APPLICATIONS OF RELIABILITY IN DIFFERENT FIELDS

Reliability is a charged word guaranteed to get attention at its mere mention. Bringing with it a host of connotations, reliability, and in particular its appraisal, faces a critical dilemma at the dawn of a new century. Traditional reliability assessment consists of various real-world assessments driven by the scientific method; i.e., conducting extensive real-world tests over long time periods (often years) enabled scientists to determine a product's reliability under a host of specific conditions. In this 21st century, humanity's technological advances walk hand in hand with myriad testing constraints, such as political and societal principles, economic and time considerations, and gaps in scientific and technological knowledge. Because of these constraints, the accuracy and efficiency of traditional methods of reliability assessment become much more questionable. Applications are an important part of research: any theory has importance only if it is useful and applicable. Many researchers are busy these days applying concepts of reliability in various fields of engineering and the sciences. Some important applications are given here.

IJSER © 2011 http://www.ijser.org

Network analysis is also an important approach to modeling real-world systems. System reliability and system unreliability are two related performance indices useful for measuring the quality level of a supply-demand system. For a binary-state network without flow, the system unreliability is the probability that the system cannot connect the source and the sink. Extending to a limited-flow network in the single-commodity case, the arc capacities are stochastic and the system capacity (i.e., the maximum flow) is not a fixed number; the probability that the upper bound of the system capacity equals a given level can be computed in terms of upper boundary points. An upper boundary point is a maximal system state in which the system fulfills the demand. In his paper, Yi-Kuei Lin [3] discusses the multicommodity limited-flow network (MLFN), in which multiple commodities are transmitted through unreliable nodes and arcs. Here the system capacity cannot simply be treated as the maximal sum of the commodities, because each commodity consumes capacity differently. Lin therefore defines the system capacity as a demand vector if the system fulfills at most that demand vector. The main problem of the paper is to measure the quality level of an MLFN. For this he proposes a new performance index, the probability that the upper bound of the system capacity equals the demand vector subject to a budget constraint, to evaluate the quality level of an MLFN. A branch-and-bound algorithm based on minimal cuts is also presented to generate all upper boundary points in order to compute the performance index.
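In the single-commodity case, the quantity behind such indices can be computed for small networks by brute force: enumerate every combination of stochastic arc capacities, evaluate the maximum flow of each combination, and sum the probabilities of the combinations meeting the demand. The sketch below does exactly that (the three-arc network, its capacity distributions, and the demand are invented for illustration; Lin's upper-boundary-point algorithms exist precisely to avoid this exhaustive enumeration):

```python
from collections import deque
from itertools import product

def max_flow(n, cap, s, t):
    """Edmonds-Karp maximum flow on an n-node capacity matrix."""
    cap = [row[:] for row in cap]
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:        # BFS for a shortest augmenting path
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow
        aug, v = float("inf"), t            # bottleneck along the path
        while v != s:
            aug = min(aug, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                       # push flow along the path
            cap[parent[v]][v] -= aug
            cap[v][parent[v]] += aug
            v = parent[v]
        flow += aug

def p_meets_demand(n, arcs, s, t, demand):
    """arcs: list of (u, v, {capacity: probability}).
    Exact P(max s-t flow >= demand) by enumerating all arc-capacity states."""
    total = 0.0
    for combo in product(*[list(a[2].items()) for a in arcs]):
        prob = 1.0
        cap = [[0] * n for _ in range(n)]
        for (u, v, _), (c, pc) in zip(arcs, combo):
            cap[u][v] += c
            prob *= pc
        if max_flow(n, cap, s, t) >= demand:
            total += prob
    return total
```

The number of states grows as the product of the per-arc state counts, which is why boundary-point and branch-and-bound methods matter for realistic networks.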

In a computer network there are several reliability problems. The probabilistic events of interest are:

* Terminal-pair connectivity

* Tree (broadcast) connectivity

* Multi-terminal connectivity

These reliability problems depend on the network topology, the distribution of resources, the operating environment, and the probability of failures of computing nodes and communication links. The computation of the reliability measures for these events requires the enumeration of all simple paths between the chosen set of nodes. The complexity of these problems, therefore, increases very rapidly with network size and topological connectivity, so the reliability analysis of a computer communication network (CCN) can quickly become computationally expensive.
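For the terminal-pair event, the exact probability can be obtained on small networks by enumerating every up/down state of the links and testing connectivity — the brute-force baseline whose exponential cost motivates the path-enumeration and reduction methods discussed here. A minimal sketch (the bridge network and the uniform link reliability of 0.9 are invented for illustration):

```python
from itertools import product

def terminal_pair_reliability(nodes, links, s, t):
    """Exact s-t reliability: enumerate all 2^m up/down states of the links
    (links: list of (u, v, p_up)) and sum probabilities of connected states."""
    total = 0.0
    for state in product((True, False), repeat=len(links)):
        prob = 1.0
        adj = {n: [] for n in nodes}
        for up, (u, v, p) in zip(state, links):
            prob *= p if up else 1.0 - p
            if up:
                adj[u].append(v)
                adj[v].append(u)
        seen, stack = {s}, [s]
        while stack:                      # DFS over the working links
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if t in seen:
            total += prob
    return total
```

For the classic five-link bridge network with equal link reliability p, this matches the known closed form 2p² + 2p³ − 5p⁴ + 2p⁵.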


GRID computing is a newly developed technology for complex systems with large-scale resource sharing, wide-area communication, and multi-institutional collaboration, and it attracts much attention. Many experts believe that grid technologies will offer a second chance to fulfill the promises of the internet. The real, specific problem that underlies the grid concept is coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations. Grid technology allows effective distribution of computational tasks among the different resources present in the grid.

In his paper, Yuan-Shun Dai [6] studies a grid service reliability optimization problem in which failure correlation on a common communication channel is taken into account. In this reliability optimization problem, the assessment of grid service reliability and performance is a critical component in obtaining the objective function. However, due to the size and complexity of grid service systems, the existing models for distributed systems cannot be directly applied. Thus, a virtual tree structure model was developed as the basis of the optimization model, and a genetic algorithm was adapted to solve this type of optimization problem for grid task partition and distribution. The author studied a case considering different numbers of resources; the genetic algorithm proved to be effective in accommodating various conditions, including limited or insufficient resources.
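A genetic algorithm for this kind of task-distribution problem can be sketched as follows. The chromosome encoding (one resource index per subtask), the toy fitness function (product of the reliabilities of the resources actually used), and all parameter values are invented for illustration and are much simpler than the virtual-tree model of [6]:

```python
import random

def grid_ga(n_tasks, resources, pop=30, gens=40, seed=1):
    """Toy GA for assigning n_tasks subtasks to resources.
    resources: list of per-resource success probabilities.
    Fitness: product of reliabilities of the resources actually used."""
    rnd = random.Random(seed)

    def fitness(ch):
        r = 1.0
        for res in set(ch):
            r *= resources[res]
        return r

    population = [[rnd.randrange(len(resources)) for _ in range(n_tasks)]
                  for _ in range(pop)]
    best = max(population, key=fitness)
    for _ in range(gens):
        nxt = [best[:]]                        # elitism: keep the best
        while len(nxt) < pop:
            a, b = rnd.sample(population, 2)   # two random parents
            cut = rnd.randrange(1, n_tasks)
            child = a[:cut] + b[cut:]          # one-point crossover
            if rnd.random() < 0.2:             # mutation
                child[rnd.randrange(n_tasks)] = rnd.randrange(len(resources))
            nxt.append(child)
        population = nxt
        best = max(population, key=fitness)
    return best, fitness(best)
```

With this toy fitness the optimum concentrates all subtasks on the most reliable resource; real grid models trade that off against capacity and performance constraints, which is what makes the optimization nontrivial.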

—Comparison of different resource management alternatives (subtask assignment to different resources),

—Making decisions aimed at service performance improvement based on comparison of different grid structure alternatives, and

—Estimating the effect of reliability and performance variation of grid elements on service reliability and performance.

In their paper, Pandey and Upadhayay [9] consider drawing inference about the reliability of a one-component system whose failure mechanism is simple stress-strength. The Bayes estimator of system reliability is obtained from data consisting of random samples from the stress and strength distributions, assuming each one is Weibull. The Bayes estimators of the four unknown shape and scale parameters of the stress and strength distributions are also considered, and these estimators are used in estimating the system reliability. The priors of the parameters of the stress and strength distributions are assumed to be independent. The Bayes credibility interval of the scale and shape parameters is derived using the joint posterior of the parameters.
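The underlying quantity is R = P(strength > stress). Setting the Bayesian machinery aside, R can be approximated by plain Monte Carlo sampling from the two Weibull distributions; the sketch below (all parameter values invented) exploits the fact that for a common shape parameter k, R has the closed form aᵏ/(aᵏ + bᵏ) with strength scale a and stress scale b, against which the estimate can be checked:

```python
import random

def stress_strength_reliability(shape, scale_strength, scale_stress,
                                n=200000, seed=11):
    """Monte Carlo estimate of R = P(strength > stress), with strength and
    stress both Weibull sharing a common shape parameter."""
    rnd = random.Random(seed)
    wins = 0
    for _ in range(n):
        x = rnd.weibullvariate(scale_strength, shape)  # strength draw
        y = rnd.weibullvariate(scale_stress, shape)    # stress draw
        if x > y:
            wins += 1
    return wins / n
```

With shape 2, strength scale 2, and stress scale 1, the exact value is 2²/(2² + 1²) = 0.8, so the simulation gives a quick sanity check on any estimator of R.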

Distributed Systems (DS) have become increasingly popular in recent years. The advent of VLSI technology and low-cost microprocessors has made distributed computing economically practical. Distributed systems can provide appreciable advantages, including high performance, high reliability, resource sharing, and extensibility. The potential reliability improvement of a distributed system is possible because of program and data-file redundancies. To evaluate the reliability of a distributed system with a given distribution of programs and data files, it is important to obtain a global reliability measure.

Design of experiments is a useful tool for improving the quality and reliability of products. Designed experiments are widely used in industry for quality improvement. A designed experiment can also be used to efficiently search over a large space of factors affecting the product's performance and to identify their optimal settings in order to improve reliability. Several case studies are available in the literature.


In their paper, V. Roshan Joseph and I-Tang Yu [11] classify the factors affecting the product into two groups, which leads to a Brownian motion model for the degradation characteristic. A simple optimization procedure for finding the best control factor setting is developed using an integrated loss function. In general, reliability improvement experiments are more difficult to conduct than quality improvement experiments, mainly because of the difficulty of obtaining the data. Reliability can be defined as quality over time, and therefore in reliability improvement experiments we need to study the performance of the product over time, as opposed to just measuring the quality at a fixed point in time. Two types of data are usually gathered in reliability experiments: lifetime data and degradation data. Lifetime data give information about the time to failure of the product, while degradation data track the gradual decay of a performance characteristic over time.
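A Brownian (Wiener) degradation model treats failure as the first crossing of a threshold by the degradation path, so reliability at time t is the probability that the path has stayed below the threshold. The sketch below estimates that probability by simulating discretized paths; the drift, diffusion, and threshold values are invented, and the time discretization slightly overestimates reliability because crossings between grid points are missed:

```python
import math
import random

def wiener_reliability(drift, sigma, threshold, t,
                       n_paths=5000, n_steps=100, seed=7):
    """Monte Carlo estimate of R(t) = P(degradation stays below threshold up
    to time t) for the Wiener model W(u) = drift*u + sigma*B(u)."""
    rnd = random.Random(seed)
    dt = t / n_steps
    surviving = 0
    for _ in range(n_paths):
        w = 0.0
        for _ in range(n_steps):
            w += drift * dt + sigma * math.sqrt(dt) * rnd.gauss(0.0, 1.0)
            if w >= threshold:      # first passage: the unit has failed
                break
        else:
            surviving += 1          # path never reached the threshold
    return surviving / n_paths
```

Under this model the time to failure follows an inverse Gaussian distribution, which is why Wiener degradation models connect naturally to lifetime data.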

In their paper, V. Rajendra Prasad and Way Kuo [12] present a search method based on

• lexicographic order and

• an upper bound on the objective function

for solving redundancy allocation problems in coherent systems. Such problems generally belong to the class of nonlinear integer programming problems with separable constraints and nondecreasing functions. For illustration, three types of problems are solved using this method. A majority of problems concerning system reliability optimization are nonlinear programming problems involving integer variables. The solution methods for such problems can be categorized into:

i) Exact methods based on dynamic programming, implicit enumeration, and branch-and-bound techniques,

ii) Approximate methods based on linear and nonlinear programming techniques, and

iii) Heuristic methods, which yield reasonably good solutions with little computation.

Each category has both advantages and disadvantages. Due to the tremendous increase in available computing power, the exact solution deserves attention from researchers.

To derive an exact solution for a reliability optimization problem, dynamic programming can be used only for some particular structures of the objective function and constraints. It is not useful for reliability optimization of a general system, and its utility decreases with the number of constraints.
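For small instances an exact answer is available by direct enumeration, which also makes the structure of the redundancy allocation problem concrete: choose a redundancy level per subsystem of a series system to maximize system reliability under a budget. The instance below (component reliabilities, costs, budget, and the cap on redundancy levels) is invented for illustration; lexicographic search with an upper bound, as in [12], prunes most of this space:

```python
from itertools import product

def best_allocation(rel, cost, budget, max_red=3):
    """Exhaustive redundancy allocation for a series system: subsystem i gets
    n_i parallel copies of a component with reliability rel[i] and cost
    cost[i]; maximize prod(1 - (1 - rel[i])**n_i) subject to the budget."""
    best, best_r = None, -1.0
    for ns in product(range(1, max_red + 1), repeat=len(rel)):
        if sum(n * c for n, c in zip(ns, cost)) > budget:
            continue                      # violates the budget constraint
        r = 1.0
        for n, p in zip(ns, rel):
            r *= 1.0 - (1.0 - p) ** n     # parallel redundancy per subsystem
        if r > best_r:
            best, best_r = ns, r
    return best, best_r
```

The objective is nondecreasing in each n_i with separable constraints, exactly the structure the lexicographic-order method exploits.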

The reliability of a distributed computing system is the probability that a distributed program which runs on multiple processing elements and needs to communicate with other processing elements for remote data files will be executed successfully. This reliability varies according to (1) the topology of the distributed computing system, (2) the reliability of the communication links, (3) the distribution of data files and programs among processing elements, and (4) the data files required to execute a program. Thus, the problem of analyzing the reliability of a distributed computing system is more complicated than the K-terminal reliability problem. In their paper, Lin and Chen [13] describe several reduction methods for computing the reliability of distributed computing systems. These reduction methods can dramatically reduce the size of a distributed computing system and therefore speed up the reliability computation.

The reliability of a distributed computing system depends on the reliability of its communication links and nodes and on the distribution of its resources, such as programs and data files. Many algorithms have been proposed for computing the reliability of distributed computing systems, but they have been applied mostly to systems with perfect nodes. In real problems, however, nodes as well as links may fail. Min-Sheng Lin et al. [14, 15] propose two new algorithms for computing the reliability of a distributed computing system with imperfect nodes. Algorithm I is based on a symbolic approach that includes two passes of computation. Algorithm II employs a general factoring technique on both nodes and edges. Comparisons between the two algorithms show their usefulness for computing the reliability of large distributed computing systems.
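The factoring technique underlying Algorithm II conditions on the state of one element at a time: R(G) = pₑ·R(G | e up) + (1 − pₑ)·R(G | e down). A minimal two-terminal version with perfect nodes can be sketched as below (the bridge network and link reliabilities are invented; the published algorithms add reliability-preserving reductions and also factor on node states):

```python
def factoring_reliability(nodes, edges, s, t):
    """Two-terminal reliability by edge factoring.
    edges: list of (u, v, p). Recursion stops early once s-t are connected
    by edges already conditioned to be up."""
    def connected(up):
        adj = {n: [] for n in nodes}
        for u, v, _ in up:
            adj[u].append(v)
            adj[v].append(u)
        seen, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return t in seen

    def rec(pending, up):
        if connected(up):        # s and t already joined: reliability 1
            return 1.0
        if not pending:          # every edge decided, still disconnected
            return 0.0
        e, rest = pending[0], pending[1:]
        # condition on edge e: up with prob e[2], down otherwise
        return e[2] * rec(rest, up + [e]) + (1 - e[2]) * rec(rest, up)

    return rec(edges, [])
```

The early-termination check is the simplest reliability-preserving shortcut; series-parallel and degree-2 reductions shrink the recursion far more aggressively.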

In their paper, Raghavendra et al. [16] observe that the reliability of a distributed processing system is an important design parameter that can be described in terms of the reliability of processing elements and communication links, and also of the redundancy of programs and data files. The traditional terminal-pair reliability does not capture the redundancy of programs and files in a distributed system. Two reliability measures are therefore introduced: distributed program reliability, which describes the probability of successful execution of a program requiring the cooperation of several computers, and distributed system reliability, which is the probability that all the specified distributed programs for the system are operational. These two reliability measures can be extended to incorporate the effects of user sites on reliability. An efficient approach based on graph traversal is developed to evaluate the proposed reliability measures.

In their paper, Deng-Jyi Chen et al. [17] present an algorithm for computing the reliability of distributed computing systems (DCS). The algorithm, called the Fast Reliability Evaluation Algorithm, is based on the factoring theorem and employs several reliability-preserving reduction techniques. The effect of file distributions, program distributions, and various topologies on the reliability of the DCS is studied in brief using the Fast Reliability Evaluation Algorithm. Compared with existing algorithms on various network topologies, file distributions, and program distributions, the proposed algorithm is much more economical in both time and space.

In their paper, Chiu et al. [18] describe how a distributed system provides a cost-effective means of enhancing a computer system's performance in areas such as throughput, fault tolerance, and reliability optimization. Consequently, the reliability optimization of a distributed system has become a critical issue. K-terminal reliability is defined as the probability that a specified set K of nodes is connected in a distributed system. The K-terminal reliability optimization problem with an order constraint (the number of nodes in the K-terminal) is to select a K-terminal of nodes in a distributed system such that the K-terminal reliability is maximal and possesses sufficient order. This is evidently an NP-hard problem. The paper presents a heuristic method to reduce the computational time and the absolute error from the exact solution. The method is based not only on a simple way to compute each node's weight and each link's weight, but also on an effective objective function to evaluate the weight of node sets. Before appending a node to the currently selected set, instead of computing the weight of all links and all nodes of each set, only the weight of a node adjacent to the currently selected set, and of the links between that node and the set, are accumulated. The proposed algorithm then uses the maximum weight to find an adequate node and assigns it to the currently selected set in a sequential manner until the order constraint on the K-terminal is satisfied. Reliability computation is performed only once, thereby saving much time, and the absolute error of the proposed algorithm from the exact solution is very small.
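The greedy growth step described above can be sketched as follows. The weight bookkeeping here is a simplified stand-in (node and link weights are taken as given numbers, and the scoring of candidates is invented), not the exact method of [18]:

```python
def select_k_terminal(order, node_w, link_w, seed):
    """Greedy K-terminal selection: grow the set from `seed`; each step adds
    the adjacent node whose own weight plus the weights of its links into the
    current set is maximal, until the order constraint is met.
    node_w: {node: weight}; link_w: {(u, v): weight} (undirected)."""
    chosen = {seed}
    while len(chosen) < order:
        best, best_w = None, -1.0
        for n in node_w:
            if n in chosen:
                continue
            # accumulate only links joining candidate n to the current set
            w_links = sum(w for (a, b), w in link_w.items()
                          if (a in chosen and b == n) or (b in chosen and a == n))
            if w_links == 0:
                continue        # candidate not adjacent to the current set
            if node_w[n] + w_links > best_w:
                best, best_w = n, node_w[n] + w_links
        if best is None:
            break               # no adjacent candidate left
        chosen.add(best)
    return chosen
```

Only the candidate's incremental weight is examined at each step, mirroring the paper's point that whole-set weights need not be recomputed.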

Distributed Computing Systems (DCS) have become a major trend in computer system design today because of their high speed and reliable performance. Reliability is an important performance parameter in DCS design. In the reliability analysis of a DCS, the K-Node Reliability (KNR) is defined as the probability that all nodes in K (a subset of all processing elements) are connected. In their papers, Ruey-Shun Chen et al. [19, 20] propose a simple, easily programmed heuristic method for obtaining the optimal design of a DCS in terms of maximizing reliability subject to a capacity constraint. The first half of the work presents a heuristic algorithm which selects an optimal set of K nodes that maximizes the KNR in a DCS subject to the capacity constraint. The second half describes a new approach that uses a K-tree disjoint reduction method to speed up the KNR evaluation. Comparing with existing algorithms on various DCS topologies, they found that the proposed algorithm obtains a near-optimal design much more efficiently, in terms of both execution time and space, than an exact and exhaustive method for a large DCS.

In their paper, Ruey-Shun Chen et al. [21] present a simple, easily programmed exact method for obtaining the optimal design of a distributed computing system in terms of maximizing reliability subject to memory capacity constraints. They assume that a given amount of resources is available for linking the distributed computing system. The method is based on the partial order relation. To speed up the procedure, some rules are proposed to indicate conditions under which certain vectors in the numerical ordering that do not satisfy the capacity constraints can be skipped over. Simulation results show that the proposed algorithm requires less time and space than the exhaustive method.


Developments and improvements in information and communication technologies in recent years have resulted in increased capacities and a higher concentration of traffic in telecommunication networks. Operating failures in such high-capacity networks can affect the quality of service of a large number of consumers. Consequently, the careful planning of a network's infrastructure and the detailed analysis of its reliability become increasingly important toward ensuring that consumers obtain the best possible service. One of the most basic, useful approaches to network reliability analysis is to represent the network as an undirected graph with unreliable links. The reliability of the network is usually defined as the probability that certain nodes in the graph are connected by functioning links. Dirk Kroese et al. [24] discuss network reliability optimization in the context of network planning, where the objective is to maximize the network's reliability subject to a fixed budget. They develop a number of simulation techniques to address the network reliability estimation problem.
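The simplest such simulation technique is crude Monte Carlo: sample a state for every link, check whether the target nodes are connected, and average over many samples. The sketch below estimates terminal-pair reliability this way (the bridge network and link reliability are invented for the example; the cross-entropy method of [24] and related variance-reduction techniques are far more efficient when the network is highly reliable):

```python
import random

def mc_reliability(nodes, links, s, t, n_samples=50000, seed=3):
    """Crude Monte Carlo terminal-pair reliability estimate.
    links: list of (u, v, p_up); each sample draws every link up/down."""
    rnd = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        adj = {n: [] for n in nodes}
        for u, v, p in links:
            if rnd.random() < p:          # the link is up in this sample
                adj[u].append(v)
                adj[v].append(u)
        seen, stack = {s}, [s]
        while stack:                      # DFS over sampled working links
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if t in seen:
            hits += 1
    return hits / n_samples
```

The estimator's relative error blows up as unreliability becomes rare, which is precisely the regime that motivates importance-sampling schemes such as cross-entropy.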

3 CONCLUSION

The behaviour of reliability is much more sensitive to change in different fields of engineering and the sciences. Around twenty-five papers from leading journals in different fields of reliability have been covered in this paper. This paper has reviewed various aspects of reliability research in the fields of Nano-Technology, Computer Communication Networks, Grid Computing Systems, Statistical Moments / Bayes Approaches, Genetic Algorithms, etc. We have broken down our survey into different topics such as Distributed and Coherent Systems, Network Reliability Optimization, and Tele-Communication / Neural Networks.

4 PROPOSED FUTURE WORK

The topic of reliability remains an interesting challenge for future research. One possible direction is to investigate whether reliability ranking can improve the performance of system algorithms. Also, to obtain better reliability in the sense of distributed computing systems and network systems, it is necessary to improve the time complexity of such systems. In nanotechnology, much work is needed, in particular in the nano-reliability field, to ensure product reliability and safety under various use conditions. Some other meta-heuristic approaches besides the above may also be applicable, such as Tabu Search, Hybrid Optimization Techniques, and Ant Colony Optimization. Future prospects also exist in various wireless communication schemes (CORBA: Common Object Request Broker Architecture), which are easily extensible to generic wireless network systems. From the point of view of quality management, treating system reliability as a performance index and conducting sensitivity analysis to improve the most important component (e.g., transmission line, switch, or server) will increase system reliability most significantly. Future research can extend the problem from the single-commodity case to the multicommodity case. Besides, transmission time reduction is a very important issue for an information system; therefore researchers can extend the work to include a time attribute for each component. Reliability can also be extended to hybrid fault-tolerant embedded system architectures in the form of hybrid Recovery Blocks (RB). Future research will also improve automated controller abilities and the human-machine interface, in order to increase the efficiency of human reasoning assistance and to decrease human response time.

ACKNOWLEDGMENT

The author wishes to thank Prof. P. N. Tondon for the support and encouragement he provided during this research work.

REFERENCES

[1] Shuen-Lin Jeng, Jye-Chyi Lu, and Kaibo Wang, “A Review of Reliability Research on Nanotechnology”, IEEE Transactions on Reliability, Vol. 56, No. 3, Pg. 401-410, September 2007.

[2] J. Keller, A. Gollhardt, D. Vogel, and B. Michel, “Nanoscale Deformation Measurements for Reliability Analysis of Sen- sors,” presented at the Proceedings of the SPIE—The Interna- tional Society for Optical Engineering, 2005.

[3] Yi-Kuei Lin, “System Reliability of a Limited-Flow Network in Multicommodity Case”, IEEE Transactions on Reliability, Vol. 56, No. 1, Pg. 17-25, March 2007.


[4] Raghavendra, C. S., Kumar, V. K. P. and Hariri S., “Reliability analysis in Distributed systems”, IEEE Transactions on Com- puters, Volume 37, Issue 3, Pg. 352 – 358, March 1988.

[5] Kin-Sun-Wah and McAlister, D. F., “Reliability Optimization of Computer Communication Networks”, IEEE Transactions on Reliability, Vol. 37, No. 2, Pg. 275-287, December 1998.

[6] Yuan-Shun Dai, “Optimal Resource Allocation for Maximizing Performance and Reliability in Tree-Structured Grid Services”, IEEE Transactions on Reliability, Vol. 56, No. 3, Pg. 444-453, September 2007.

[7] Gregory Levitin, Yuan-Shun Dai, and Hanoch Ben-Haim, “Reliability and Performance of Star Topology Grid Service With Precedence Constraints on Subtask Execution”, IEEE Transactions on Reliability, Vol. 55, No. 3, Pg. 507-515, September 2006.

[8] Gerard L. Reijns and Arjan J. C. Van Gemund, “Reliability Analysis of Hierarchical Systems Using Statistical Moments”, IEEE Transactions On Reliability, Vol. 56, No. 3, Pg 525-533, September 2007.

[9] Pandey, M. and Upadhayay, S. K., “Reliability Estimation in Stress-Strength Models: A Bayes Approach”, IEEE Transactions on Reliability, December 1985.

[10] Chin-Ching Chiu, Chung-Hsien Hsu, and Yi-Shiung Yeh, “A Genetic Algorithm for Reliability-Oriented Task Assignment With k Duplications in Distributed Systems”, IEEE Transactions On Reliability, Vol. 55, No. 1, Pg. 105-117, March 2006.

[11] V. Roshan Joseph and I-Tang Yu, “Reliability Improvement Experiments With Degradation Data”, IEEE Transactions On Reliability, Vol. 55, No. 1, Pg. 149-157, March 2006.

[12] V. Rajendra Prasad and Way Kuo, “Reliability Optimization of Coherent Systems”, IEEE Transactions On Reliability, Vol. 49, No. 3, Pg. 323-330, September 2000.

[13] M.-S. Lin and D.-J. Chen, “General Reduction Methods for the Reliability Analysis of Distributed Computing Systems”, The Computer Journal, Volume 36, Issue 7, Pages 631-644, 1993.

[14] Min-Sheng Lin, Deng-Jyi Chen and Maw-Sheng Horng, “The Reliability Analysis of Distributed Computing Systems with Imperfect Nodes”, The Computer Journal, Volume 42, Issue 2, Pages 129-141, 1999.

[15] Min-Sheng Lin and Deng-Jyi Chen, “General Reduction Methods for the Reliability Analysis of Distributed Computing Systems”, The Computer Journal, ISSN 0010-4620, Vol. 36, No. 7, Pg. 631-644, 1993.

[16] Raghavendra, C. S. and Makam, S. V., “Reliability Modeling and Analysis of Computer Networks”, IEEE Transactions on Reliability, Vol. R-35, No. 2, Pg. 156-160, June 1986.

[17] Deng-Jyi Chen and Min-Sheng Lin, “On Distributed Compu- ting Systems Reliability Analysis Under Program Execution Constraints”, IEEE Transactions on Computers, Volume 43, Is- sue 1, Pages 87-97, 1994.

[18] Chiu, Chin Ching , Yeh, Yi Shiung and Chou, Jue Sam, “An Effective Algorithm for Optimal K-terminal Reliability of Dis- tributed Systems”, Malaysian Journal of Library & Information Science, 6 (2), Pg. 101-118, 2001.

[19] Ruey-Shun Chen, Deng-Jyi Chen and Y. S. Yeh, “A New Heuristic Approach for Reliability Optimization of Distributed Computing Systems subject to Capacity Constraints”, Computers & Mathematics with Applications, Volume 29, Issue 3, Pages 37-47, February 1995.

[20] Ruey-Shun Chen, Deng-Jyi Chen and Y. S. Yeh, “Reliability Optimization on the Design of Distributed Computing Systems”, Proceedings of the Int. Conf. on Computing and Information, Pages 422-437, 1994.

[21] Ruey-Shun Chen, Deng-Jyi Chen and Y. S. Yeh, “Reliability Optimization of Distributed Computing Systems Subject to Capacity Constraints”, Computers & Mathematics with Applications, Volume 29, Issue 4, Pages 93-99, February 1995.

[22] Fulya Altiparmak, Berna Dengiz, and Alice E. Smith, “A General Neural Network Model for Estimating Telecommunications Network Reliability”, IEEE Transactions on Reliability, Vol. 58, No. 1, Pg. 2-9, March 2009.

[23] Debany, W. H. and Varshney, P. K., “Network Reliability Evaluation Using Probability Expressions”, IEEE Transactions on Reliability, Vol. R-35, No. 2, Pg. 161-166, June 1986.

[24] Dirk P. Kroese, Kin-Ping Hui, and Sho Nariai, “Network Reliability Optimization via the Cross-Entropy Method”, IEEE Transactions on Reliability, Vol. 56, No. 2, Pg. 275-287, June 2007.
