International Journal of Scientific & Engineering Research Volume 2, Issue 4, April-2011 1

ISSN 2229-5518

Application of Reliability Analysis: A Technical Survey

Dr. Anju Khandelwal

Abstract— The objective of this paper is to present a survey of recent, high-quality research on reliability in different fields of engineering and the physical sciences. The paper covers several important areas of reliability in which significant research efforts are being made around the world. The survey provides insight into past, current and future trends of reliability in engineering, technology and the medical sciences, with applications to specific problems.

Index Terms— CCN, Coherent Systems, Distributed Computing Systems, Grid Computing, Nanotechnology, Network Reliability, Reliability.


Traditional system-reliability measures include reliability, availability, and interval availability. Reliability is the probability that a system operates without interruption during an interval of interest under specified conditions. Reliability can be extended to include several levels of system performance. A first performance-oriented extension of reliability is to replace a single acceptable level of operation by a set of performance levels; this approach is used for evaluating network performance and reliability. The performance level is based on metrics derived from an application-dependent performance model. For example, the performance level might be the rate of job completion, the response time, or the number of jobs completed in a given time interval. Availability is the probability that the system is in an operational state at the time of interest, and it can be computed by summing the probabilities of the operational states. Reliability, in contrast, requires that the system stay in an operational state throughout the interval. In a system without repair, reliability and availability coincide, because a failed state can never be left. In a system with repair, reliability can be computed with the same methods as availability once all repair transitions leaving failed states are deleted, so that the failure states become absorbing. Interval availability is the fraction of time the system spends in an operational state during an interval of interest. The mean interval availability can be computed by determining the mean time the system spends in operational states; it is therefore a cumulative measure that depends on the cumulative time spent in each state.
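As a minimal sketch of these three measures, assume a single repairable unit modeled as a two-state Markov chain with constant failure rate `lam` and repair rate `mu` (both assumptions for illustration, not taken from the text). Availability is the probability of the up state, reliability follows from deleting the repair transition so that the failed state becomes absorbing, and mean interval availability is the time average of point availability over [0, T].

```python
import math


def steady_state_availability(lam: float, mu: float) -> float:
    """Long-run probability of the up state (two-state Markov model, assumed)."""
    return mu / (lam + mu)


def point_availability(lam: float, mu: float, t: float) -> float:
    """A(t): probability that the unit is up at time t, given it is up at t = 0."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)


def reliability(lam: float, t: float) -> float:
    """R(t): the repair transition is deleted, so the failed state is absorbing."""
    return math.exp(-lam * t)


def mean_interval_availability(lam: float, mu: float, horizon: float) -> float:
    """(1/T) * integral of A(t) over [0, T]: mean fraction of the interval spent up."""
    s = lam + mu
    return mu / s + lam * (1.0 - math.exp(-s * horizon)) / (s * s * horizon)


if __name__ == "__main__":
    lam, mu = 0.01, 0.1  # hypothetical failure and repair rates (per hour)
    print(steady_state_availability(lam, mu))
    print(reliability(lam, 100.0))
    print(mean_interval_availability(lam, mu, 100.0))
    # Without repair (mu = 0), availability and reliability coincide.
    print(point_availability(lam, 0.0, 50.0), reliability(lam, 50.0))
```

With `mu = 0` the point availability reduces to `exp(-lam * t)`, which is why reliability and availability coincide for a system without repair.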
Consider the following questions. How can traditional reliability assessment techniques determine the dependability of a manned space vehicle designed to explore Mars, given that humanity has yet to venture that far into space? How can one determine the reliability of a nuclear weapon, given the test-ban treaties and international agreements in place? And how can one decide which of two artificial hearts to implant in a patient, given that neither has ever been inside a human before? To resolve this dilemma, reliability must first be reinterpreted and then quantified. Following the scientific method, researchers use evidence to determine the probability of success or failure; in this sense, reliability can be seen as an image of probability. The redefined concept of reliability incorporates auxiliary sources of data, such as expert knowledge, corporate memory, and mathematical modeling and simulation, alongside test data. By combining both types of data, reliability assessment is ready to enter the 21st century. Reliability is thus a quantified measure of uncertainty about a particular type of event (or events), expressed as a probability.


Reliability is a charged word, guaranteed to get attention at its mere mention. Bringing with it a host of connotations, reliability, and in particular its appraisal, faces a critical dilemma at the dawn of a new century. Traditional reliability assessment consists of real-world assessments driven by the scientific method: conducting extensive real-world tests over long time periods (often years) enabled scientists to determine a product's reliability under a host of specific conditions. In the 21st century, however, technological advances go hand in hand with myriad testing constraints, such as political and societal principles, economic and time considerations, and gaps in scientific and technological knowledge. Because of these constraints, the accuracy and efficiency of traditional methods of reliability assessment have become much more questionable. Applications are an important part of research: a theory matters to the extent that it is useful and applicable. Many researchers are now applying reliability concepts in various fields of engineering and science. Some important applications are given here:

IJSER © 2011

2.1 Nano-Technology

Nano-reliability measures the ability of a nano-scaled product to perform its intended functionality. At the nano scale, the physical, chemical, and biological properties of materials differ in fundamental and valuable ways from the properties of individual atoms, molecules, or bulk matter, so conventional reliability theories need to be re-examined before they can be applied to nano-engineering. Research on nano-reliability is extremely important because nano-structured components account for a high proportion of costs and serve critical roles in newly designed products. Shuen-Lin Jeng et al. [1] introduce reliability concepts to nanotechnology and present work on identifying the physical failure mechanisms of nano-structured materials and devices during fabrication and operation. Their paper also discusses modeling techniques for the degradation, reliability functions, and failure rates of nano-systems.
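The reliability function and failure rate mentioned above can be illustrated with a Weibull life model, a common choice for failure-mechanism data; the Weibull form and the parameter names `shape` and `scale` are assumptions for illustration, not the specific models of [1]. A shape parameter below 1 gives a decreasing failure rate (early failures), a shape of 1 a constant rate, and a shape above 1 an increasing rate (wear-out).

```python
import math


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """R(t) = exp(-(t/scale)^shape): probability of surviving beyond time t."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Failure rate h(t) = (shape/scale) * (t/scale)^(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def weibull_mttf(shape: float, scale: float) -> float:
    """Mean time to failure: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


if __name__ == "__main__":
    for shape in (0.5, 1.0, 3.0):  # early-failure, random, wear-out regimes
        rates = [weibull_hazard(t, shape, 1000.0) for t in (100.0, 500.0, 900.0)]
        print(shape, [f"{r:.2e}" for r in rates], round(weibull_mttf(shape, 1000.0), 1))
```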

Engineers are required to help increase reliability while maintaining effective production schedules, so as to produce current and future electronics at the lowest possible cost. Without effective quality control, devices that depend on nanotechnology, including transistors, will incur high manufacturing costs, which could disrupt the steady progress described by Moore's law. Nanotechnology can potentially transform civilization, but realizing this potential requires a fundamental understanding of friction at the atomic scale. Furthermore, tribological considerations are expected to be an integral aspect of the design of these systems, and will depend on the training of both existing and future scientists and engineers at the nano scale. As nanotechnology is gradually integrated into new product design, it is important to understand its mechanical and material properties, for the sake of both scientific interest and engineering usefulness. The development of nanotechnology will lead to the introduction of new products to the public, and in the modern era of large-scale manufacturing their reliability issues must be studied and the results incorporated into the design and manufacturing phases. Measuring and evaluating the reliability of nano-devices is therefore an important subject, and new technology is being developed to support this task. As noted by Keller et al. [2], the ongoing miniaturization from MEMS towards NEMS calls for new reliability concepts that use meso-type (micro to nano) or fully nano-mechanical approaches. Experimental verification will be the major method for validating theoretical models and simulation tools, so there is a need for measurement techniques capable of evaluating strain fields with very local (nano-scale) resolution.

2.2 Computer Communication Network

Network analysis is also an important approach to modeling real-world systems. System reliability and system unreliability are two related performance indices that are useful for measuring the quality level of a supply-demand system. For a binary-state network without flow, the system unreliability is the probability that the system cannot connect the source and the sink. In a limited-flow network with a single commodity, the arc capacities are stochastic, so the system capacity (i.e., the maximum flow) is not a fixed number. The system unreliability for level d + 1, that is, the probability that the upper bound of the system capacity equals d, can be computed in terms of upper boundary points, where an upper boundary point is a maximal system state such that the system fulfills the demand. Yi-Kuei Lin [3] discusses a multicommodity limited-flow network (MLFN) in which multiple commodities are transmitted through unreliable nodes and arcs. In this setting, the system capacity cannot simply be treated as the maximal sum of the commodities, because each commodity consumes capacity differently. Lin therefore defines the system capacity as a demand vector such that the system fulfills at most that demand vector. The main problem addressed is how to measure the quality level of an MLFN. For this, he proposes a new performance index, the probability that the upper bound of the system capacity equals the demand vector subject to a budget constraint, and presents a branch-and-bound algorithm based on minimal cuts that generates all upper boundary points in order to compute this index.
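For the single-commodity case, the system reliability for a demand d (the probability that the maximum flow is at least d) can be sketched by brute-force enumeration of arc-capacity states combined with a max-flow routine. This is only an illustration under assumptions (independent arcs, perfect nodes, a small network, hypothetical capacity distributions); it is not Lin's upper-boundary-point or branch-and-bound method, which avoids enumerating every state. The unreliability for level d + 1 is then `1 - system_reliability(..., d + 1)`.

```python
from collections import deque
from itertools import product


def max_flow(n, arcs, caps, s, t):
    """Edmonds-Karp maximum flow on nodes 0..n-1; arcs[i] = (u, v) has capacity caps[i]."""
    residual = [[0] * n for _ in range(n)]
    for (u, v), c in zip(arcs, caps):
        residual[u][v] += c
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return flow
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:  # push the bottleneck amount along the path
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck


def system_reliability(n, arcs, cap_dists, s, t, demand):
    """P(max flow >= demand); cap_dists[i] lists (capacity, probability) for arc i."""
    total = 0.0
    for state in product(*cap_dists):  # every joint capacity state (exponential!)
        prob = 1.0
        for _, p in state:
            prob *= p
        if max_flow(n, arcs, [c for c, _ in state], s, t) >= demand:
            total += prob
    return total


if __name__ == "__main__":
    # Hypothetical network: two parallel routes 0->1->3 and 0->2->3.
    arcs = [(0, 1), (1, 3), (0, 2), (2, 3)]
    dist = [(0, 0.05), (1, 0.15), (2, 0.80)]  # assumed capacity distribution per arc
    for d in (1, 2, 3, 4):
        print(d, round(system_reliability(4, arcs, [dist] * 4, 0, 3, d), 6))
```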
A computer network poses several reliability problems. The probabilistic events of interest are:
* Terminal-pair connectivity
* Tree (broadcast) connectivity
* Multi-terminal connectivity
These reliability problems depend on the network topology, the distribution of resources, the operating environment, and the failure probabilities of computing nodes and communication links. Computing the reliability measures for these events requires enumerating all simple paths between the chosen set of nodes, so the complexity of these problems increases very rapidly with network size and topological connectivity. The reliability analysis of computer communication networks is generally based on Boolean algebra and probability theory. Raghavendra et al. [4] discuss various reliability problems of computer networks, including terminal-pair connectivity, tree connectivity, and multi-terminal connectivity. They also study dynamic computer network reliability by deriving time-dependent expressions for the reliability measures, assuming Markov behavior for failures and repairs. This allows the computation of task- and mission-related measures such as the mean time to first failure (MTFF) and the mean time between failures (MTBF).
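The path-enumeration step described above can be sketched for terminal-pair connectivity: enumerate all simple paths between the two terminals with a depth-first search, then combine the path events by inclusion-exclusion. Perfect nodes and independent link failures are assumed for illustration; the 2^m inclusion-exclusion terms over m paths show concretely why the cost grows so quickly with network size.

```python
from itertools import combinations


def simple_paths(adj, s, t):
    """All simple s-t paths, each returned as a frozenset of edge keys (DFS)."""
    paths = []

    def dfs(u, visited, used):
        if u == t:
            paths.append(frozenset(used))
            return
        for v, key in adj.get(u, []):
            if v not in visited:
                visited.add(v)
                used.append(key)
                dfs(v, visited, used)
                used.pop()
                visited.remove(v)

    dfs(s, {s}, [])
    return paths


def terminal_pair_reliability(edges, s, t):
    """edges maps key -> (u, v, p); undirected links fail independently, nodes are perfect."""
    adj = {}
    for key, (u, v, _) in edges.items():
        adj.setdefault(u, []).append((v, key))
        adj.setdefault(v, []).append((u, key))
    paths = simple_paths(adj, s, t)
    rel = 0.0
    for k in range(1, len(paths) + 1):  # inclusion-exclusion over path events
        sign = 1.0 if k % 2 else -1.0
        for combo in combinations(paths, k):
            prob = 1.0
            for key in frozenset().union(*combo):  # all links of these paths must be up
                prob *= edges[key][2]
            rel += sign * prob
    return rel


if __name__ == "__main__":
    # Bridge network with hypothetical link reliabilities of 0.9.
    bridge = {"a": (0, 1, 0.9), "b": (0, 2, 0.9), "c": (1, 2, 0.9),
              "d": (1, 3, 0.9), "e": (2, 3, 0.9)}
    print(round(terminal_pair_reliability(bridge, 0, 3), 5))
```

The bridge has four simple paths, so 15 inclusion-exclusion terms are summed; a network with 20 paths would already need more than a million.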

A computer communication network (CCN) can be represented by a set of centers and a set of communication links connecting the centers that are up (operational). Mathematically, the network is a graph whose nodes represent centers and whose edges represent links; if the links are simplex, the graph is directed. Various methods have been derived to evaluate the system reliability and terminal reliability of a CCN. These methods usually involve enumerating, directly or indirectly, all the events that lead to successful communication between the computer centers under consideration. However, little has been done on optimizing networks other than for terminal reliability, because as the number of links increases, the number of possible assignments of reliabilities to links grows faster than exponentially. Kiu Sun-wah et al. [5] present both mathematical and heuristic rules for optimizing the system reliability of a CCN with a fixed topology when a set of link reliabilities is given. The techniques can also be used to predict the system reliability of alternative topologies when the topology is not fixed. Evaluating the terminal reliability of a given computer communication network is NP-hard; hence, the problem of assigning reliabilities to the links of a fixed topology so as to optimize system reliability is also NP-hard. The authors therefore develop a heuristic method that assigns the given reliabilities to the links of a topology so that the system reliability of the network is near optimal, providing a way to increase overall reliability.
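To see why the assignment problem explodes, the sketch below finds the best assignment of a given set of link reliabilities to a fixed topology by brute force: every permutation of the reliabilities is tried, and each is scored by exact two-terminal reliability computed over all 2^m link states. This is an exhaustive baseline under assumed conditions (undirected links, perfect nodes, a two-terminal objective, a hypothetical triangle topology), not the heuristic of Kiu Sun-wah et al. [5]; with m links it performs m! * 2^m connectivity checks.

```python
from itertools import permutations, product


def connected(n, up_links, s, t):
    """Depth-first search over the links that are currently up."""
    adj = [[] for _ in range(n)]
    for u, v in up_links:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return t in seen


def two_terminal_reliability(n, links, rels, s, t):
    """Exact reliability by enumerating all 2^m up/down states of the links."""
    total = 0.0
    for state in product((True, False), repeat=len(links)):
        prob, up = 1.0, []
        for link, r, ok in zip(links, rels, state):
            prob *= r if ok else 1.0 - r
            if ok:
                up.append(link)
        if connected(n, up, s, t):
            total += prob
    return total


def best_assignment(n, links, available, s, t):
    """Try every permutation of the available reliabilities (m! candidates)."""
    best_rel, best_perm = -1.0, None
    for perm in permutations(available):
        r = two_terminal_reliability(n, links, perm, s, t)
        if r > best_rel:
            best_rel, best_perm = r, perm
    return best_rel, best_perm


if __name__ == "__main__":
    links = [(0, 1), (1, 2), (0, 2)]  # hypothetical triangle topology
    print(best_assignment(3, links, [0.9, 0.8, 0.7], 0, 2))
```

In this small example the most reliable link ends up on the direct connection between the terminals; for ten links the same search would already need 3,628,800 permutations.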

2.3 Grid Computing System

Grid computing is a recently developed technology for complex systems with large-scale resource sharing, wide-area communication, and multi-institutional collaboration, and it attracts much attention. Many experts believe that grid technologies will offer a second chance to fulfill the promises of the Internet. The real, specific problem underlying the grid concept is coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations. For large-scale distributed systems, grid technology allows computational tasks to be distributed effectively among the different resources present in the grid.

Yuan-Shun Dai [6] describes grid computing systems in which the resource management system (RMS) divides service tasks into execution blocks (EBs) and sends these blocks to different resources. To provide a desired level of service reliability, the RMS can assign the same EB to several independent resources for parallel (redundant) execution. With an optimal schedule for partitioning the service task and distributing it among resources, one can achieve the greatest possible expected service performance (i.e., the least execution time) or reliability. To solve this optimization problem, the author suggests an algorithm based on graph theory, a Bayesian approach, and evolutionary optimization. A virtual tree-structure model is constructed in which failure correlation in a common communication channel is taken into account. In this reliability optimization problem, assessing grid service reliability and performance is a critical step in obtaining the objective function. However, because of the size and complexity of grid service systems, existing models for distributed systems cannot be applied directly. Thus, the virtual tree-structure model serves as the basis of the optimization model, and a genetic algorithm is adapted to solve the grid task partition and distribution problem. The author studies cases with different numbers of resources, and the genetic algorithm proves effective under various conditions, including limited or insufficient resources.
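A minimal sketch of how redundant assignment affects service reliability: the service succeeds when every execution block (EB) is completed by at least one of the resources it was sent to. The sketch assumes independent resource failures, ignores execution time and the common-channel failure correlation modeled in [6], and uses hypothetical names and values; it enumerates all resource states so that a resource shared by several EBs is handled correctly.

```python
from itertools import product


def service_reliability(assignment, resource_rel):
    """
    assignment: EB name -> list of resource ids it is sent to (redundant copies).
    resource_rel: resource id -> probability that the resource completes its work.
    Returns P(every EB is completed by at least one assigned resource).
    """
    resources = sorted(resource_rel)
    total = 0.0
    for state in product((True, False), repeat=len(resources)):
        prob, up = 1.0, set()
        for r, ok in zip(resources, state):
            prob *= resource_rel[r] if ok else 1.0 - resource_rel[r]
            if ok:
                up.add(r)
        if all(any(r in up for r in assigned) for assigned in assignment.values()):
            total += prob
    return total


if __name__ == "__main__":
    rel = {"R1": 0.9, "R2": 0.8, "R3": 0.85}  # hypothetical resource reliabilities
    print(service_reliability({"EB1": ["R1"], "EB2": ["R2"]}, rel))
    print(service_reliability({"EB1": ["R1", "R3"], "EB2": ["R2", "R3"]}, rel))
```

Multiplying per-EB reliabilities would be wrong whenever one resource serves several EBs, which is why the sketch sums over joint resource states instead.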

Gregory Levitin et al. [7] describe grid computing systems with star architectures, in which the resource management system (RMS) divides service tasks into subtasks and sends them to different specialized resources for execution. Some subtasks cannot be executed until they have received input data, which can be the result of other subtasks; this imposes precedence constraints on the order of subtask execution. The sharing of concern here is not primarily file exchange but direct access to computers, software, data, and other resources, as required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering. This sharing is controlled by the RMS. The Open Grid Services Architecture enables the integration of services and resources across distributed, heterogeneous, dynamic virtual organizations, and provides users with a platform for easily requesting grid services. A grid service is expected to execute a certain task under the control of the RMS, and to provide the desired level of service reliability the RMS can assign the same subtask to several independent resources of the same type for parallel execution. To evaluate the quality of service, its reliability and performance indices must be defined.

The authors consider two indices: service reliability (the probability that the service task is accomplished within a specified time) and conditional expected system time. They present a fast numerical algorithm for evaluating these indices for an arbitrary subtask distribution in a grid with a star architecture, taking into account precedence constraints on the sequence of subtask execution. Some helpful practical applications are:

• Comparison of different resource management alternatives (assignment of subtasks to different resources),
• Decisions aimed at improving service performance, based on comparison of different grid structure alternatives, and
• Estimation of the effect of variations in the reliability and performance of grid elements on service reliability and performance.
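The two indices can be sketched by exhaustive enumeration for a small example. The following assumptions are not from [7]: each resource serves one subtask with a fixed execution time and an independent success probability, data-transfer times are ignored, a subtask starts when all of its predecessors finish and completes when its fastest surviving copy finishes, and the subtasks are listed in a valid precedence order. Levitin et al. use a fast numerical algorithm rather than this brute-force enumeration; all names and values are hypothetical.

```python
import math
from itertools import product


def grid_service_indices(order, preds, copies, theta):
    """
    order:  subtask ids listed so that predecessors come first (assumed).
    preds:  subtask -> list of predecessor subtasks.
    copies: subtask -> list of (success probability, execution time), one per resource.
    Returns (service reliability P(T <= theta), conditional expected time E[T | T < inf]).
    """
    flat = [(j, p, tau) for j in order for p, tau in copies[j]]
    reliability = done_prob = done_time = 0.0
    for state in product((True, False), repeat=len(flat)):
        prob = 1.0
        fastest = {j: math.inf for j in order}
        for (j, p, tau), ok in zip(flat, state):
            prob *= p if ok else 1.0 - p
            if ok:
                fastest[j] = min(fastest[j], tau)
        finish = {}
        for j in order:  # a subtask starts once all of its inputs are available
            start = max((finish[i] for i in preds.get(j, [])), default=0.0)
            finish[j] = start + fastest[j]
        total = max(finish.values())
        if total <= theta:
            reliability += prob
        if total < math.inf:  # the task was completed at all
            done_prob += prob
            done_time += prob * total
    expected = done_time / done_prob if done_prob > 0 else math.inf
    return reliability, expected


if __name__ == "__main__":
    # Hypothetical service: subtask B needs the output of A.
    copies = {"A": [(0.9, 5.0)], "B": [(0.8, 3.0), (0.9, 6.0)]}
    print(grid_service_indices(["A", "B"], {"B": ["A"]}, copies, theta=10.0))
```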

2.4 Statistical Moments / Bayes Approach

In many practical engineering circumstances, system reliability analysis is complicated by the fact that the failure time distributions of the constituent subsystems cannot be accurately modeled by standard distributions. Gerard L. Reijns et al. [8] discuss a low-cost, compositional approach that uses the first four statistical moments to characterize the failure time distributions of the constituent components, subsystems, and top-level system. The approach uses Pearson distributions as an intermediate analytical vehicle, in terms of which the constituent failure time distributions are approximated. The analysis technique is presented for k-out-of-n systems with identical subsystems, series systems with different subsystems, and systems exploiting standby redundancy. The technique consistently exhibits very good accuracy (on average, much less than 1 percent error) at very modest computing cost, and the approach is described as independent of system size. In addition, the authors outline numerical implementation details that improve the approach and present a number of example applications from the aerospace domain.
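The moment-based idea can be illustrated by computing the first four moments of a system failure time directly from its survival function, using E[T^j] = integral over t of j t^(j-1) R(t) dt, and then Pearson's coordinates beta1 (squared skewness) and beta2 (kurtosis), which locate a distribution within the Pearson system. This is a sketch under assumptions (numerical integration with Simpson's rule, a truncation point `t_max`, exponential components); it does not reproduce the Pearson-fitting and composition steps of [8].

```python
import math


def k_out_of_n_survival(k, n, component_survival):
    """System survival function for k-out-of-n identical, independent components."""
    def survival(t):
        r = component_survival(t)
        return sum(math.comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))
    return survival


def raw_moments(survival, t_max, steps=20000):
    """E[T^j] for j = 1..4 via Simpson's rule on j * t^(j-1) * R(t) over [0, t_max].

    Assumes R(t) is negligible beyond t_max; steps must be even.
    """
    h = t_max / steps
    moments = []
    for j in range(1, 5):
        def f(t, j=j):
            return j * t ** (j - 1) * survival(t)
        acc = f(0.0) + f(t_max)
        for i in range(1, steps):
            acc += (4.0 if i % 2 else 2.0) * f(i * h)
        moments.append(acc * h / 3.0)
    return moments


def pearson_coordinates(m):
    """Mean, variance, and Pearson's beta1 (squared skewness) and beta2 (kurtosis)."""
    m1, m2, m3, m4 = m
    var = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return m1, var, mu3**2 / var**3, mu4 / var**2


if __name__ == "__main__":
    exp_component = lambda t: math.exp(-t)  # assumed unit-rate exponential components
    system = k_out_of_n_survival(2, 3, exp_component)
    print(pearson_coordinates(raw_moments(system, t_max=40.0)))
```

For a single exponential component the sketch recovers beta1 = 4 and beta2 = 9, and the 2-out-of-3 system mean matches the closed form 5/6 for unit-rate components.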

Pandey et al. [9] provide a Bayes approach to drawing inference about the reliability of a single-component system whose failure mechanism is simple stress-strength. The Bayes estimator of system reliability is obtained from data consisting of random samples from the stress and strength distributions, each assumed to be Weibull. Bayes estimators of the four unknown shape and scale parameters of the stress and strength distributions are also considered and used in estimating the system reliability. The priors of the parameters of the stress and strength distributions are assumed to be independent, and Bayes credibility intervals for the scale and shape parameters are derived from the joint posterior of the parameters.
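In a stress-strength model the reliability is R = P(stress < strength), which for Weibull stress and strength can be evaluated as the integral of f_strength(x) F_stress(x). Under squared-error loss, the Bayes estimate of R is its posterior mean, which can be approximated by averaging R over posterior draws of the four parameters. The sketch below assumes such draws are already available (they are hypothetical here; the priors and sampler of [9] are not reproduced), represents each distribution as a (shape, scale) pair, and assumes shapes of at least 1 so that truncating the integral is harmless.

```python
import math


def weibull_pdf(x, shape, scale):
    z = x / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


def weibull_cdf(x, shape, scale):
    return 1.0 - math.exp(-((x / scale) ** shape))


def stress_strength_reliability(strength, stress, steps=20000):
    """R = P(stress < strength) = integral of f_strength(x) * F_stress(x) (midpoint rule).

    strength and stress are (shape, scale) pairs; the integral is truncated at
    20 * max(scale), which assumes shape parameters of at least 1.
    """
    x_max = 20.0 * max(strength[1], stress[1])
    h = x_max / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += weibull_pdf(x, *strength) * weibull_cdf(x, *stress)
    return total * h


def bayes_estimate(posterior_draws):
    """Posterior mean of R (Bayes estimate under squared-error loss) from parameter draws."""
    values = [stress_strength_reliability(s, y) for s, y in posterior_draws]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical posterior draws of ((strength shape, scale), (stress shape, scale)).
    draws = [((2.0, 2.0), (2.0, 1.0)), ((2.1, 1.9), (1.9, 1.1)), ((1.9, 2.1), (2.0, 0.9))]
    print(round(bayes_estimate(draws), 4))
```

As a check, when both shapes equal beta the integral has the closed form a^beta / (a^beta + b^beta), where a and b are the strength and stress scales; for the first draw this gives 0.8.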

2.5 Genetic Algorithm

Distributed systems (DS) have become increasingly popular in recent years. The advent of VLSI technology and low-cost microprocessors has made distributed computing economically practical. Distributed systems can provide appreciable advantages, including high performance, high reliability, resource sharing, and extensibility; the potential reliability improvement comes from redundancy of programs and data files. To evaluate the reliability of a distributed system with a given distribution of programs and data files, it is important to obtain a global measure that describes the degree of system reliability. Distributed program reliability (DPR) is the probability that a given program can be run successfully and can access all of the files it requires from remote sites in spite of faults among the processing elements and communication links. The second measure, distributed system reliability (DSR), is the probability that all of the programs in the system can be run successfully. A distributed system is a collection of processor-memory pairs connected by communication links, and its reliability can be expressed through DPR and DSR analysis. Computing the reliability of a distributed system is an NP-hard problem, and the distribution of programs and data files affects the system reliability. The reliability-oriented task assignment problem, which is also NP-hard, is to find a task distribution that maximizes program reliability or system reliability. Similarly, in cellular networks, efficient allocation of channels to cells can greatly improve overall network throughput, in terms of the number of calls successfully supported. Chin-Ching Chiu et al. [10] present a genetic algorithm-based reliability-oriented task assignment methodology (GAROTA) for the DTA reliability problem. The algorithm uses a genetic algorithm to select a program and file assignment set that is maximal, or nearly maximal, with respect to system reliability. Their numerical results show that the algorithm obtains the exact solution in most cases, and its computation time appears to be significantly shorter than that of the exhaustive method. The technique also helps readers understand the correlation between task assignment reliability and distributed system topology.
A distributed system is defined as a system involving cooperation among several loosely coupled computers (processing elements) that communicate over a network through links. A distributed program is a program of a distributed system that requires one or more files. For a distributed program to run successfully, the local host must possess the program, the processing elements must possess the required files, and the interconnecting links must be operational.
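The success condition above can be turned into a small DPR and DSR evaluator by enumerating link states and checking whether some host of each program can reach at least one copy of every file it requires. The sketch assumes perfect nodes, independent link failures, and hypothetical names (`hosts`, `file_sites`); it uses brute-force enumeration rather than any of the published algorithms. DSR is computed over the same joint network state, because the programs share links.

```python
from itertools import product


def components(n, up_links):
    """Connected-component label of every node, using union-find over up links."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in up_links:
        parent[find(u)] = find(v)
    return [find(x) for x in range(n)]


def probability(n, links, link_rel, predicate):
    """Sum the probabilities of all link states whose component labelling satisfies predicate."""
    total = 0.0
    for state in product((True, False), repeat=len(links)):
        prob, up = 1.0, []
        for link, r, ok in zip(links, link_rel, state):
            prob *= r if ok else 1.0 - r
            if ok:
                up.append(link)
        if predicate(components(n, up)):
            total += prob
    return total


def program_runs(comp, hosts, file_sites, files):
    """Some host of the program can reach at least one copy of every required file."""
    return any(all(any(comp[site] == comp[h] for site in file_sites[f]) for f in files)
               for h in hosts)


def dpr(n, links, link_rel, hosts, file_sites, files):
    return probability(n, links, link_rel,
                       lambda comp: program_runs(comp, hosts, file_sites, files))


def dsr(n, links, link_rel, programs, file_sites):
    """programs: list of (hosts, required files); all must run in the same network state."""
    return probability(n, links, link_rel,
                       lambda comp: all(program_runs(comp, h, file_sites, f) for h, f in programs))


if __name__ == "__main__":
    links, rel = [(0, 1), (1, 2)], [0.9, 0.8]  # hypothetical chain of three nodes
    sites = {"f1": [1], "f2": [2]}
    print(dpr(3, links, rel, hosts=[0], file_sites=sites, files=["f1", "f2"]))
    print(dsr(3, links, rel, [([0], ["f2"]), ([0], ["f1"])], sites))
```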

2.6 Designed Experiments

Design of experiments is a useful tool for improving the quality and reliability of products. Designed experiments are widely used in industries for quality improvement. A designed experiment can also be used to efficiently search over a large factor space affecting the product’s perfor- mance, and identify their optimal settings in order to im- prove reliability. Several case studies are available in the literature.
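A minimal sketch of the factor-screening idea: in a two-level full factorial design, the main effect of a factor is the mean response at its high level minus the mean response at its low level, and, if interactions are negligible (an assumption), the best setting takes each factor at the level with the better mean. The responses below are hypothetical, generated from a known linear model, for example a log-lifetime for which larger is better.

```python
from itertools import product


def full_factorial(k):
    """All 2^k runs of a two-level design, coded -1 (low) and +1 (high)."""
    return list(product((-1, 1), repeat=k))


def main_effects(design, response):
    """Mean response at +1 minus mean response at -1, for each factor."""
    effects = []
    for i in range(len(design[0])):
        high = [y for run, y in zip(design, response) if run[i] == 1]
        low = [y for run, y in zip(design, response) if run[i] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


def best_setting(effects):
    """Choose the level with the larger mean response (assumes no interactions)."""
    return tuple(1 if e > 0 else -1 for e in effects)


if __name__ == "__main__":
    design = full_factorial(3)
    # Hypothetical log-lifetime responses generated from 10 + 2A - B + 0.5C.
    response = [10 + 2 * a - b + 0.5 * c for a, b, c in design]
    effects = main_effects(design, response)
    print(effects, best_setting(effects))
```

Each main effect equals twice the corresponding model coefficient, because the factor moves from -1 to +1.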

V. Roshan Joseph et al. [11] develop an integrated methodology for quality and reliability improvement when degradation data are available as the response in the experiments. The noise factors affecting the product are classified into two groups, which leads to a Brownian motion model for the degradation characteristic. A simple optimization procedure for finding the best control factor setting is developed using an integrated loss function. In general, reliability improvement experiments are more difficult to conduct than quality improvement experiments, mainly because the data are harder to obtain. Reliability can be defined as quality over time; therefore, reliability improvement experiments must study the performance of the product over time, as opposed to just measuring quality at a fixed point in time. Two types of data are usually gathered in reliability experiments: lifetime data and degradation data. Lifetime data give the time to failure of the product, whereas in degradation data a degradation characteristic is monitored throughout the life of the product. Degradation data thus provide the complete history of the product's performance, in contrast to the single value reported in lifetime data, and therefore contain more information. Reliability improvement and quality improvement have some similarities: generally speaking, improving quality will also improve reliability, but not always. For example, suppose that in printed circuit board (PCB) manufacturing, tin plating is a more stable process than gold plating. In terms of quality, the manufacturer should then prefer tin plating, because a better plated thickness can be achieved with it. During customer usage, however, tin wears out faster than gold, so gold-plated PCBs have higher reliability, and gold plating should be preferred for improving reliability. Thus, a choice that is good for quality need not be good for reliability.
For this reason, one needs a procedure for finding the optimal factor settings that considers both quality and reliability, as well as the interaction between them.
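If the degradation characteristic follows a Brownian motion with drift, X(t) = mu*t + sigma*B(t), and the product is taken to fail when X(t) first reaches a threshold D, the lifetime has an inverse Gaussian distribution and the reliability function has the closed form R(t) = Phi((D - mu t)/(sigma sqrt(t))) - exp(2 mu D / sigma^2) Phi(-(D + mu t)/(sigma sqrt(t))), with mean life D/mu. The drift, diffusion, and threshold values below are hypothetical, and this sketch does not reproduce the noise-factor model or integrated loss function of [11].

```python
import math


def norm_cdf(z):
    """Standard normal CDF via erfc, which stays accurate in the far left tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def wiener_reliability(t, drift, sigma, threshold):
    """P(the path drift*t + sigma*B(t) has not yet reached threshold by time t).

    Assumes drift > 0 and threshold > 0; exp(2*drift*threshold/sigma**2) can overflow
    for very small sigma.
    """
    if t <= 0:
        return 1.0
    s = sigma * math.sqrt(t)
    return (norm_cdf((threshold - drift * t) / s)
            - math.exp(2.0 * drift * threshold / sigma**2)
            * norm_cdf(-(threshold + drift * t) / s))


if __name__ == "__main__":
    # Hypothetical degradation: drift 1 unit/h, diffusion 1, failure threshold 10 units.
    for t in (1.0, 5.0, 10.0, 15.0, 20.0):
        print(t, round(wiener_reliability(t, 1.0, 1.0, 10.0), 4))
```

Because the inverse Gaussian lifetime is right-skewed, R at the mean life D/mu = 10 is somewhat below 0.5 in this example.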

2.7 Distributed and Coherent Systems

A design engineer often tries to improve the reliability of a basic system design to the largest extent possible, subject to constraints such as cost, weight, and volume. System reliability can be improved either by using more reliable components or by providing redundant components. If, for each stage of the system, several components with different reliabilities and costs are available on the market, or redundancy is allowed, then the designer faces a decision problem that can be formulated as a nonlinear integer programming problem. V. Rajendra Prasad et al. [12] present a search method based on:

• lexicographic order and
• an upper bound on the objective function,
for solving redundancy allocation problems in coherent systems. Such problems generally belong to the class of nonlinear integer programming problems with separable constraints and nondecreasing functions. For illustration, three types of problems are solved using their method. A majority of system reliability optimization problems are nonlinear programming problems involving integer variables, and their solution methods can be categorized into:
i) Exact methods based on dynamic programming, implicit enumeration, and the branch-and-bound technique,
ii) Approximate methods based on linear and nonlinear programming techniques,
iii) Heuristic methods, which yield reasonably good solutions with little computation.
Each category has advantages and disadvantages. Owing to the tremendous increase in available computing power, exact solution methods deserve attention from researchers.
To derive an exact solution for a reliability optimization problem, dynamic programming can be used only for some particular structures of the objective function and constraints. It is not useful for the reliability optimization of a general system, and its utility decreases as the number of constraints grows.
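The dynamic-programming case can be sketched for a series system of parallel-redundant stages with a single integer cost constraint, the kind of separable structure for which DP applies: stage i uses x_i >= 1 identical copies of a component with reliability r_i and integer cost c_i, and the product of stage reliabilities is maximized subject to total cost <= budget. The problem data are hypothetical, and a brute-force search is included only to cross-check the DP on a small instance.

```python
from itertools import product
from math import prod


def stage_reliability(r, copies):
    """A stage of identical parallel copies fails only if every copy fails."""
    return 1.0 - (1.0 - r) ** copies


def allocate_dp(stages, budget):
    """stages: list of (reliability, positive integer cost). Returns (best R, copies)."""
    # best[b] = (reliability, copies) of the stages processed so far with cost <= b
    best = [(1.0, [])] * (budget + 1)
    for r, cost in stages:
        new = [(-1.0, None)] * (budget + 1)  # -1 marks an infeasible budget
        for b in range(budget + 1):
            x = 1
            while x * cost <= b:
                prev_rel, prev_copies = best[b - x * cost]
                if prev_rel >= 0.0:
                    cand = prev_rel * stage_reliability(r, x)
                    if cand > new[b][0]:
                        new[b] = (cand, prev_copies + [x])
                x += 1
        best = new
    return best[budget]


def allocate_brute_force(stages, budget):
    """Exhaustive check over all copy counts (small instances only)."""
    ranges = [range(1, budget // cost + 1) for _, cost in stages]
    best = (-1.0, None)
    for copies in product(*ranges):
        if sum(x * c for x, (_, c) in zip(copies, stages)) <= budget:
            rel = prod(stage_reliability(r, x) for (r, _), x in zip(stages, copies))
            if rel > best[0]:
                best = (rel, list(copies))
    return best


if __name__ == "__main__":
    stages = [(0.8, 2), (0.7, 3), (0.9, 1)]  # hypothetical (reliability, cost) pairs
    print(allocate_dp(stages, 12))
    print(allocate_brute_force(stages, 12))
```

The DP works here only because the objective is a product of independent stage terms and there is a single budget; adding a second constraint (weight, say) multiplies the state space, which is the limitation noted above.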
The reliability of a distributed computing system is the probability that a distributed program, which runs on multiple processing elements and must communicate with other processing elements for remote data files, will be executed successfully. This reliability varies with (1) the topology of the distributed computing system, (2) the reliability of the communication links, (3) the distribution of data files and programs among processing elements, and (4) the data files required to execute a program. Thus, analyzing the reliability of a distributed computing system is more complicated than the K-terminal reliability problem. Lin et al. [13] describe several reduction methods for computing the reliability of distributed computing systems. These reduction methods can dramatically reduce the size of a distributed computing system and therefore speed up the reliability computation.
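The simplest reliability-preserving reductions are the classic series and parallel rules: two links in parallel between the same pair of nodes merge into one link with reliability 1 - (1 - p1)(1 - p2), and a non-terminal node touched by exactly two links merges them into one link with reliability p1 * p2. The sketch below applies these rules, plus removal of dangling links, until none applies. It assumes perfect nodes, undirected links without self-loops, and a terminal-to-terminal objective; the reductions of Lin et al. [13] also account for program and file placement, which this sketch does not.

```python
def reduce_network(edges, terminals):
    """Apply two-terminal reliability-preserving reductions until none applies.

    edges: list of (u, v, p) undirected links with independent failures and no
    self-loops; nodes are assumed perfect.
    """
    edges = list(edges)
    changed = True
    while changed:
        changed = False
        merged = {}
        for u, v, p in edges:  # parallel reduction: 1 - (1 - p1)(1 - p2)
            key = (min(u, v), max(u, v))
            if key in merged:
                merged[key] = 1.0 - (1.0 - merged[key]) * (1.0 - p)
                changed = True
            else:
                merged[key] = p
        edges = [(u, v, p) for (u, v), p in merged.items()]
        degree = {}
        for u, v, _ in edges:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        for node, d in degree.items():
            if node in terminals or d > 2:
                continue
            incident = [e for e in edges if node in (e[0], e[1])]
            for e in incident:
                edges.remove(e)
            if d == 2:  # series reduction: p1 * p2
                (u1, v1, p1), (u2, v2, p2) = incident
                a = v1 if u1 == node else u1
                b = v2 if u2 == node else u2
                edges.append((a, b, p1 * p2))
            # d == 1: a dangling link cannot lie on a terminal-to-terminal path
            changed = True
            break
    return edges


if __name__ == "__main__":
    triangle = [(0, 1, 0.9), (1, 2, 0.8), (0, 2, 0.7)]  # hypothetical link reliabilities
    print(reduce_network(triangle, terminals={0, 2}))
```

The triangle collapses to a single link of reliability 0.916, whereas a bridge network admits no series or parallel reduction at all and must be handled by other techniques such as factoring.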
The reliability of a distributed computing system depends on the reliability of its communication links and nodes and on the distribution of its resources, such as programs and data files. Many algorithms have been proposed for computing the reliability of distributed computing systems, but they have mostly been applied to systems with perfect nodes, whereas in real problems nodes as well as links may fail. Min-Sheng Lin et al. [14, 15] propose two new algorithms for computing the reliability of a distributed computing system with imperfect nodes. Algorithm I is based on a symbolic approach with two passes of computation; Algorithm II employs a general factoring technique on both nodes and edges. A comparison of the two algorithms shows their usefulness for computing the reliability of large distributed computing systems.
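The factoring idea can be sketched for two-terminal reliability with imperfect nodes: pick an undecided node or link, condition on it working or failing (R = p * R(works) + (1 - p) * R(fails)), and stop a branch as soon as the terminals are certainly connected or certainly disconnected. This is a generic illustration under stated assumptions (independent failures, undirected links, a fixed pivot order, hypothetical inputs), not a reproduction of Algorithm II of Min-Sheng Lin et al., whose pivot selection and objective may differ.

```python
def factoring_reliability(node_rel, edge_rel, s, t):
    """Two-terminal reliability with imperfect nodes by recursive factoring,
    pivoting on nodes and links alike.

    node_rel: node -> reliability; edge_rel: (u, v) -> reliability (undirected link).
    """
    pivots = [("node", v) for v in node_rel] + [("edge", e) for e in edge_rel]
    rel_of = {("node", v): p for v, p in node_rel.items()}
    rel_of.update({("edge", e): p for e, p in edge_rel.items()})

    def linked(status, optimistic):
        """s-t connectivity using working components (plus undecided ones if optimistic)."""
        def usable(c):
            st = status.get(c)
            return st is True or (optimistic and st is None)
        if not (usable(("node", s)) and usable(("node", t))):
            return False
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            if u == t:
                return True
            for a, b in edge_rel:
                if not usable(("edge", (a, b))):
                    continue
                for x, y in ((a, b), (b, a)):
                    if x == u and y not in seen and usable(("node", y)):
                        seen.add(y)
                        stack.append(y)
        return False

    def factor(status, i):
        if linked(status, optimistic=False):
            return 1.0  # connected whatever the undecided components do
        if not linked(status, optimistic=True):
            return 0.0  # cannot connect even if every undecided component works
        c = pivots[i]
        p = rel_of[c]
        return (p * factor({**status, c: True}, i + 1)
                + (1.0 - p) * factor({**status, c: False}, i + 1))

    return factor({}, 0)


if __name__ == "__main__":
    nodes = {v: 0.95 for v in range(4)}  # hypothetical imperfect nodes
    bridge = {(0, 1): 0.9, (0, 2): 0.9, (1, 2): 0.9, (1, 3): 0.9, (2, 3): 0.9}
    print(round(factoring_reliability(nodes, bridge, 0, 3), 5))
```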


C. S. Raghavendra et al. [16] note that the reliability of a distributed processing system is an important design parameter that can be described in terms of the reliability of the processing elements and communication links, and also of the redundancy of programs and data files. The traditional terminal-pair reliability does not capture the redundancy of programs and files in a distributed system, so two reliability measures are introduced: distributed program reliability, which describes the probability of successful execution of a program requiring the cooperation of several computers, and distributed system reliability, which is the probability that all the specified distributed programs for the system are operational. These two measures can be extended to incorporate the effects of user sites on reliability. An efficient approach based on graph traversal is developed to evaluate the proposed measures.
Deng-Jyi Chen et al. [17] present an algorithm for computing the reliability of distributed computing systems (DCS). The algorithm, called the Fast Reliability Evaluation Algorithm, is based on the factoring theorem and employs several reliability-preserving reduction techniques. The effect of file distributions, program distributions, and various topologies on the reliability of the DCS is briefly studied using the algorithm. Compared with existing algorithms on various network topologies, file distributions, and program distributions, the proposed algorithm is much more economical in both time and space.
In their paper, Chiu et al. [18] observe that a distributed system provides a cost-effective means of enhancing a computer system's performance in areas such as throughput, fault tolerance, and reliability. Consequently, the reliability optimization of a distributed system has become a critical issue. K-terminal reliability is defined as the probability that a specified set, K, of nodes is connected in a distributed system. In the K-terminal reliability optimization problem with an order constraint (the order being the number of nodes in the K-terminal), one selects a K-terminal of nodes in a distributed system such that the K-terminal reliability is maximal and the set has sufficient order. This problem is NP-hard. The paper presents a heuristic method that reduces both the computational time and the absolute error from the exact solution. The method relies on two components: a simple way of computing the weight of each node and each link, and an effective objective function for evaluating the weight of node sets. Before a node is appended to the current selected set, the method does not compute the weight of all links and all nodes of each set. Instead, it accumulates only the weight of a node adjacent to the current selected set and the weights of the links between that node and the current set. The algorithm then uses the maximum weight to find an adequate node and assigns it to the current selected set, proceeding sequentially until the K-terminal order constraint is satisfied. Reliability is computed only once, which saves considerable time, and the absolute error of the proposed algorithm from the exact solution is very small.
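The selection loop described above can be sketched in Python as follows. The exact weight definitions of [18] are not given here, so the sketch makes two hypothetical choices. A node's weight is the sum of the reliabilities of its incident links. A candidate's gain is its node weight plus the reliabilities of its links into the current set. As in the description, only candidates adjacent to the current set are scored, and K-terminal reliability is evaluated once at the end. Here that final evaluation uses exhaustive enumeration, which is feasible only for small networks. All names are illustrative.

```python
from itertools import product


def node_weights(links):
    """Hypothetical node weight: sum of the reliabilities of a node's incident links."""
    w = {}
    for u, v, p in links:
        w[u] = w.get(u, 0.0) + p
        w[v] = w.get(v, 0.0) + p
    return w


def greedy_k_terminal(links, order):
    """Grow a node set one adjacent node at a time until it contains `order` nodes."""
    w = node_weights(links)
    selected = {max(w, key=w.get)}  # seed with the heaviest node
    while len(selected) < order:
        best, best_gain = None, float("-inf")
        for c in w:
            if c in selected:
                continue
            # Only links between candidate c and the current set are accumulated.
            into_set = [p for u, v, p in links
                        if (u == c and v in selected) or (v == c and u in selected)]
            if not into_set:
                continue  # c is not adjacent to the current set
            gain = w[c] + sum(into_set)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            raise ValueError("no adjacent node is left to extend the set")
        selected.add(best)
    return selected


def _k_connected(k_nodes, up_links):
    """Union-find check that the working links join every node in K."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v in up_links:
        parent[find(u)] = find(v)
    return len({find(n) for n in k_nodes}) == 1


def k_terminal_reliability(links, k_nodes):
    """Computed once, after selection, by exhaustive enumeration of link states."""
    total = 0.0
    for states in product((True, False), repeat=len(links)):
        prob, up = 1.0, []
        for (u, v, p), works in zip(links, states):
            prob *= p if works else 1.0 - p
            if works:
                up.append((u, v))
        if _k_connected(k_nodes, up):
            total += prob
    return total


links = [(1, 2, 0.9), (2, 3, 0.8), (1, 3, 0.7), (3, 4, 0.5), (4, 5, 0.6)]
k = greedy_k_terminal(links, order=3)
print(sorted(k), k_terminal_reliability(links, k))
```

On this five-node example, the sketch seeds with node 3 (weight 2.0), then adds node 2 and node 1. The single reliability evaluation for K = {1, 2, 3} gives 0.902. Nodes 4 and 5 offer no alternative route between those three nodes.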

2.8 Heuristic and General Reduction Methods

Distributed Computing Systems (DCS) have become a major trend in computer system design today because of their high speed and reliable performance. Reliability is an important performance parameter in DCS design. In the reliability analysis of a DCS, K-Node Reliability (KNR) is defined as the probability that all nodes in K (a subset of all processing elements) are connected. In their papers, Ruey-Shun Chen et al. [19, 20] propose a simple, easily programmed heuristic method for obtaining the optimal design of a DCS in terms of maximizing reliability subject to a capacity constraint. The first half of the paper presents a heuristic algorithm that selects an optimal set of K-nodes maximizing the KNR of a DCS subject to the capacity constraint. The second half describes a new approach that uses a K-tree disjoint reduction method to speed up KNR evaluation. Comparing it with existing algorithms on various DCS topologies, the authors find that, for a large DCS, the proposed algorithm obtains a sub-optimal design much more efficiently in both execution time and space than an exact and exhaustive method.
In another paper, Ruey-Shun Chen et al. [21] present a simple, easily programmed exact method for obtaining the optimal design of a distributed computing system in terms of maximizing reliability subject to memory capacity constraints. They assume that a given amount of resources is available for linking the distributed computing system. The method is based on a partial order relation. To speed up the procedure, rules are proposed that indicate when certain vectors in the numerical ordering that do not satisfy the capacity constraints can be skipped. Simulation results show that the proposed algorithm requires less time and space than the exhaustive method.
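An exact search of this kind can be illustrated with the Python sketch below, which maximizes KNR over node sets K whose total memory meets a requirement. The capacity model and the pruning rule are assumptions for illustration and are not the rules of [21]. The capacity model gives each node a single memory value and sums it over K. The pruning rule rests on the fact that the event "all nodes in K are connected" can only become less likely when nodes are added to K. Therefore, once a set is feasible, none of its supersets needs to be evaluated. Sets are enumerated in order of increasing size, so every subset of a set is examined before the set itself. KNR is computed by exhaustive enumeration of link states. All names are hypothetical.

```python
from itertools import combinations, product


def _k_connected(k_nodes, up_links):
    """Union-find check that the working links join every node in K."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v in up_links:
        parent[find(u)] = find(v)
    return len({find(n) for n in k_nodes}) == 1


def knr(links, k_nodes):
    """K-node reliability by exhaustive enumeration (perfect nodes, independent links)."""
    total = 0.0
    for states in product((True, False), repeat=len(links)):
        prob, up = 1.0, []
        for (u, v, p), works in zip(links, states):
            prob *= p if works else 1.0 - p
            if works:
                up.append((u, v))
        if _k_connected(k_nodes, up):
            total += prob
    return total


def best_k_set(links, memory, required):
    """Exact maximum of KNR over node sets whose total memory is at least `required`."""
    nodes = sorted(memory)
    minimal_feasible = []
    best_set, best_rel, evaluations = None, -1.0, 0
    for size in range(1, len(nodes) + 1):
        for combo in combinations(nodes, size):
            k = frozenset(combo)
            if any(f <= k for f in minimal_feasible):
                continue  # superset of a feasible set: its KNR cannot be higher
            if sum(memory[n] for n in k) < required:
                continue  # violates the (assumed) memory capacity constraint
            minimal_feasible.append(k)
            evaluations += 1
            rel = knr(links, k)
            if rel > best_rel:
                best_set, best_rel = k, rel
    return best_set, best_rel, evaluations


links = [(1, 2, 0.9), (2, 3, 0.8), (1, 3, 0.7), (3, 4, 0.5), (4, 5, 0.6)]
memory = {1: 4, 2: 3, 3: 5, 4: 6, 5: 2}
print(best_k_set(links, memory, required=8))
```

With these links, node memories, and a requirement of 8, the search computes KNR for only seven minimal feasible sets. It returns K = {2, 3} with KNR 0.926.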

2.9 Tele-Communication / Neural Network

In studies on the design of communications networks, reliability has been defined in a number of ways. Altiparmak et al. [22] consider a probabilistic measure, all-terminal reliability, which is sometimes termed overall network reliability. All-terminal reliability is the probability that a set of operational edges provides communication paths between every pair of nodes. A communications network is typically modeled as a graph with N nodes and L edges, where nodes represent sites (computers) and edges represent communication links. Each node and each edge has an associated probability of failure, and the reliability of the network is the probability that the network is operational.
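Computing all-terminal reliability exactly is #P-complete in general, so estimation is often used for larger networks. As a reference point, the Python sketch below estimates all-terminal reliability by crude Monte Carlo simulation. Each trial samples every edge as working with its reliability, and a union-find structure checks whether the working edges connect all N nodes. The sketch assumes perfectly reliable nodes and independent edge failures; node failures are omitted for brevity. Nodes are numbered 0 to N - 1, and the function names are hypothetical.

```python
import random


def all_connected(n_nodes, up_edges):
    """Union-find check that the working edges connect nodes 0..n_nodes-1."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n_nodes
    for u, v in up_edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components == 1


def mc_all_terminal(n_nodes, edges, samples=100_000, seed=1):
    """Crude Monte Carlo estimate of all-terminal reliability and its standard error."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        up = [(u, v) for u, v, p in edges if rng.random() < p]  # edge works w.p. p
        if all_connected(n_nodes, up):
            hits += 1
    est = hits / samples
    std_err = (est * (1.0 - est) / samples) ** 0.5
    return est, std_err


triangle = [(0, 1, 0.9), (1, 2, 0.9), (0, 2, 0.9)]
print(mc_all_terminal(3, triangle))
```

For a triangle whose three links each have reliability 0.9, the exact value is 0.972. The returned standard error of about 0.0005 quantifies the sampling uncertainty of the estimate. For highly reliable networks, crude Monte Carlo needs very many trials to observe enough failures.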


In their paper, Altiparmak et al. [22] propose a new method, based on an artificial neural network (ANN), to estimate the reliability of networks with identical link reliability. This has two significant advantages. First, a single ANN model can be used for multiple network sizes and topologies. Second, the input information to the ANN is compact, which makes the method tractable even for large networks. The authors apply the approach to networks of widely varying reliability and then consider only highly reliable networks.

2.10 Network Reliability Optimization

Evaluating the reliability of a network is a fundamental problem, with applications in many practical fields such as communication, digital systems, and transportation systems. The physical network is represented by a graph of nodes connected by directed and undirected arcs, and each arc and each node of the graph has an associated failure probability. Given such a graph with known failure probabilities for its elements (arcs and nodes), Debany and Varshney [23] seek the probability that at least one complete simple path exists between a source node and a terminal node. A simple path is one in which no node is visited more than once.
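One textbook way to obtain this probability, sketched in Python below, is to enumerate all simple source-to-terminal paths and combine them by inclusion-exclusion. A path works only if all of its nodes and arcs work, and the network works if at least one path does. This is an illustration and not necessarily the probability-expression technique of [23]. It makes the following assumptions:

- Element failures are independent.
- Every node's reliability is supplied.
- An undirected arc is a single element that can be used in either direction.

The number of inclusion-exclusion terms grows as 2^m for m paths, so the method suits only small graphs. The function names and the dictionary layout are hypothetical.

```python
from itertools import combinations


def simple_paths(arcs, source, terminal):
    """All simple source-to-terminal paths, each as a frozenset of its elements."""
    adj = {}
    for name, (u, v, _p, directed) in arcs.items():
        adj.setdefault(u, []).append((v, name))
        if not directed:  # an undirected arc can be traversed both ways
            adj.setdefault(v, []).append((u, name))
    paths = []

    def dfs(node, visited, elems):
        if node == terminal:
            paths.append(frozenset(elems))
            return
        for nxt, name in adj.get(node, []):
            if nxt not in visited:  # simple path: no node is visited twice
                dfs(nxt, visited | {nxt}, elems | {("node", nxt), ("arc", name)})

    dfs(source, {source}, {("node", source)})
    return paths


def st_reliability(arcs, node_rel, source, terminal):
    """P(some simple path has all its nodes and arcs working), by inclusion-exclusion."""
    rel = {("node", n): p for n, p in node_rel.items()}
    rel.update({("arc", name): a[2] for name, a in arcs.items()})
    paths = simple_paths(arcs, source, terminal)
    total = 0.0
    for r in range(1, len(paths) + 1):
        sign = 1.0 if r % 2 == 1 else -1.0
        for group in combinations(paths, r):
            prob = 1.0
            for elem in frozenset().union(*group):  # shared elements count once
                prob *= rel[elem]
            total += sign * prob
    return total


arcs = {"sa": ("s", "a", 0.9, True), "at": ("a", "t", 0.9, True),
        "sb": ("s", "b", 0.9, True), "bt": ("b", "t", 0.9, True)}
node_rel = {"s": 1.0, "a": 0.9, "b": 0.9, "t": 1.0}
print(st_reliability(arcs, node_rel, "s", "t"))
```

Consider two disjoint directed paths, s-a-t and s-b-t, with perfect end nodes and all other elements at 0.9. Each path works with probability 0.729, and the result is 2(0.729) - 0.729^2 = 0.926559. Adding an undirected arc between a and b raises the number of simple paths to four.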

Developments and improvements in information and communication technologies in recent years have increased capacities and concentrated more traffic in telecommunication networks. Operating failures in such high-capacity networks can affect the quality of service of a large number of consumers. Consequently, careful planning of a network's infrastructure and detailed analysis of its reliability become increasingly important for ensuring that consumers obtain the best possible service. One of the most basic and useful approaches to network reliability analysis is to represent the network as an undirected graph with unreliable links. The reliability of the network is usually defined as the probability that certain nodes in the graph are connected by functioning links. Kroese et al. [24] discuss network reliability optimization in the context of network planning, where the objective is to maximize the network's reliability subject to a fixed budget. They develop a number of simulation techniques to address the network reliability estimation problem.
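As its title indicates, reference [24] tackles this budget-constrained design problem with the cross-entropy (CE) method. The Python sketch below shows the general shape of a CE search for the problem, under simplifying assumptions that are not taken from [24]:

- Each candidate link has a reliability and a cost, and a design is a subset of the candidate links.
- Reliability is all-terminal and is computed exactly by state enumeration, so only tiny networks are feasible, whereas [24] relies on simulation.
- Over-budget designs receive a penalty score of -1.
- The sampling distribution is a vector of independent Bernoulli inclusion probabilities, which is moved toward the top-scoring (elite) designs with smoothing.

Parameter values and names are illustrative.

```python
import random
from itertools import product


def _find(parent, x):
    while parent[x] != x:
        x = parent[x]
    return x


def all_terminal_rel(n_nodes, links):
    """Exact all-terminal reliability of the chosen links (u, v, reliability, cost)."""
    total = 0.0
    for states in product((True, False), repeat=len(links)):
        prob, parent, components = 1.0, list(range(n_nodes)), n_nodes
        for (u, v, p, _cost), works in zip(links, states):
            prob *= p if works else 1.0 - p
            if works:
                ru, rv = _find(parent, u), _find(parent, v)
                if ru != rv:
                    parent[ru] = rv
                    components -= 1
        if components == 1:
            total += prob
    return total


def ce_design(n_nodes, candidates, budget, samples=60, elite_frac=0.1,
              smoothing=0.7, iters=30, seed=0):
    """Cross-entropy search for a within-budget link subset maximizing reliability."""
    rng = random.Random(seed)
    q = [0.5] * len(candidates)  # Bernoulli inclusion probability for each link
    n_elite = max(1, int(elite_frac * samples))
    # Start from the empty design, which is feasible for any budget >= 0.
    best_mask, best_score = [False] * len(candidates), all_terminal_rel(n_nodes, [])
    for _ in range(iters):
        scored = []
        for _ in range(samples):
            mask = [rng.random() < qi for qi in q]
            chosen = [c for c, m in zip(candidates, mask) if m]
            feasible = sum(c[3] for c in chosen) <= budget
            # Over-budget designs get a penalty score below any reliability.
            score = all_terminal_rel(n_nodes, chosen) if feasible else -1.0
            scored.append((score, mask))
            if feasible and score > best_score:
                best_mask, best_score = mask, score
        scored.sort(key=lambda s: s[0], reverse=True)
        elite = [m for _, m in scored[:n_elite]]
        # Move each inclusion probability toward its frequency among elite designs.
        for i in range(len(q)):
            freq = sum(m[i] for m in elite) / n_elite
            q[i] = smoothing * freq + (1.0 - smoothing) * q[i]
    return best_mask, best_score


cands = [(0, 1, 0.9, 3), (1, 2, 0.9, 3), (2, 3, 0.9, 3), (3, 0, 0.9, 3),
         (0, 2, 0.95, 5), (1, 3, 0.95, 5)]
mask, rel = ce_design(4, cands, budget=12)
print(mask, rel)
```

In this example, four nodes are joined by ring links costing 3 each with reliability 0.9, plus two diagonals costing 5 each with reliability 0.95. A budget of 12 admits the full ring, whose all-terminal reliability of 0.9477 is the best achievable. The search returns the best feasible design it sampled. The smoothing factor slows the collapse of the inclusion probabilities toward 0 or 1.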


The behaviour of reliability is highly sensitive to the field in which it is studied, and it differs considerably across engineering and the sciences. Around twenty-five papers from leading journals in different fields of reliability have been covered in this paper. The paper reviews various aspects of reliability research in the fields of nanotechnology, computer communication networks, grid computing systems, statistical moments / Bayes approach, genetic algorithms, etc. We have broken down our survey into different topics such as Distributed and Coherent Systems, Network Reliability Optimization, Tele-Communication / Neural Network, etc.


Reliability remains an interesting challenge for future research. One possible direction is to investigate whether reliability ranking can improve the performance of system algorithms. In addition, finding more reliable designs for distributed computing systems and network systems requires improving the time complexity of the algorithms applied to them. In nanotechnology, much work is needed, particularly in the nano-reliability field, to ensure product reliability and safety under various use conditions. Other meta-heuristic approaches besides those above may also be applicable, such as tabu search, hybrid optimization techniques, and ant colony optimization. Future work is also possible on communication schemes for wireless systems based on CORBA (Common Object Request Broker Architecture), which are easily extensible to generic wireless network systems. From the point of view of quality management, system reliability can be treated as a performance index. A sensitivity analysis can then identify the most important component (e.g. a transmission line, switch or server), and improving that component will increase system reliability most significantly. Future research can extend the problem from the single-commodity case to the multicommodity case. Moreover, reducing transmission time is a very important issue for an information system, so researchers can extend the work to include a time attribute for each component. Reliability analysis can also be extended to hybrid fault-tolerant embedded system architectures in the form of hybrid recovery blocks (RB). Future research could also improve automated controller abilities and the human-machine interface, in order to make human reasoning assistance more efficient and to decrease human response time.
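The sensitivity analysis suggested above can be carried out with a standard component-importance measure. The Python sketch below uses Birnbaum importance, I_i = R(p_i = 1) - R(p_i = 0). For independent components, this equals the partial derivative of system reliability with respect to p_i. The component that maximizes it is therefore the one whose small improvements raise system reliability fastest. The example system and its reliability values are hypothetical: a transmission line in series with two redundant switches and a server. The function names are also illustrative.

```python
from itertools import product


def system_reliability(structure, rel):
    """Exact reliability of a system of independent components by state enumeration."""
    names = list(rel)
    total = 0.0
    for states in product((True, False), repeat=len(names)):
        state = dict(zip(names, states))
        if structure(state):
            prob = 1.0
            for n in names:
                prob *= rel[n] if state[n] else 1.0 - rel[n]
            total += prob
    return total


def birnbaum_importance(structure, rel):
    """I_i = R(system | component i works) - R(system | component i has failed)."""
    return {name: system_reliability(structure, {**rel, name: 1.0})
            - system_reliability(structure, {**rel, name: 0.0})
            for name in rel}


def structure(s):
    # Hypothetical system: line in series with two redundant switches and a server.
    return s["line"] and (s["sw1"] or s["sw2"]) and s["server"]


reliabilities = {"line": 0.99, "sw1": 0.9, "sw2": 0.9, "server": 0.95}
importance = birnbaum_importance(structure, reliabilities)
print(max(importance, key=importance.get), importance)
```

Here the server (importance 0.9801) ranks above the line (0.9405). Each switch ranks far below (0.09405) because its redundant partner masks its failure.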


The author wishes to thank Prof. P. N. Tondon for the support and encouragement he provided during this research work.


[1] Shuen-Lin Jeng, Jye-Chyi Lu, and Kaibo Wang, “A Review of Reliability Research on Nanotechnology”, IEEE Transactions on Reliability, Vol. 56, No. 3, Pg. 401-410, September 2007.

[2] J. Keller, A. Gollhardt, D. Vogel, and B. Michel, “Nanoscale Deformation Measurements for Reliability Analysis of Sensors”, presented at the Proceedings of the SPIE - The International Society for Optical Engineering, 2005.

[3] Yi-Kuei Lin, “System Reliability of a Limited-Flow Network in Multicommodity Case”, IEEE Transactions on Reliability, Vol. 56, No. 1, Pg. 17-25, March 2007.


[4] Raghavendra, C. S., Kumar, V. K. P. and Hariri, S., “Reliability Analysis in Distributed Systems”, IEEE Transactions on Computers, Volume 37, Issue 3, Pg. 352-358, March 1988.

[5] Kin-Sun-Wah and McAlister, D. F., “Reliability Optimization of Computer Communication Network”, IEEE Transactions on Reliability, Vol. 37, No. 2, Pg. 275-287, December 1988.

[6] Yuan-Shun Dai, “Optimal Resource Allocation for Maximizing Performance and Reliability in Tree-Structured Grid Services”, IEEE Transactions on Reliability, Vol. 56, No. 3, Pg. 444-453, September 2007.

[7] Gregory Levitin, Yuan-Shun Dai, and Hanoch Ben-Haim, “Reliability and Performance of Star Topology Grid Service With Precedence Constraints on Subtask Execution”, IEEE Transactions on Reliability, Vol. 55, No. 3, Pg. 507-515, September 2006.

[8] Gerard L. Reijns and Arjan J. C. Van Gemund, “Reliability Analysis of Hierarchical Systems Using Statistical Moments”, IEEE Transactions on Reliability, Vol. 56, No. 3, Pg. 525-533, September 2007.

[9] Pandey, M. and Upadhayay, S. K., “Reliability Estimation in Stress-Strength Models: A Bayes Approach”, IEEE Transactions on Reliability, December 1985.

[10] Chin-Ching Chiu, Chung-Hsien Hsu, and Yi-Shiung Yeh, “A Genetic Algorithm for Reliability-Oriented Task Assignment With k Duplications in Distributed Systems”, IEEE Transactions On Reliability, Vol. 55, No. 1, Pg. 105-117, March 2006.

[11] V. Roshan Joseph and I-Tang Yu, “Reliability Improvement Experiments With Degradation Data”, IEEE Transactions On Reliability, Vol. 55, No. 1, Pg. 149-157, March 2006.

[12] V. Rajendra Prasad and Way Kuo, “Reliability Optimization of Coherent Systems”, IEEE Transactions On Reliability, Vol. 49, No. 3, Pg. 323-330, September 2000.

[13] M.-S. Lin and D.-J. Chen, “General Reduction Methods for the Reliability Analysis of Distributed Computing Systems”, The Computer Journal, Volume 36, Issue 7, Pages 631-644, 1993.

[14] Min-Sheng Lin, Deng-Jyi Chen and Maw-Sheng Horng, “The Reliability Analysis of Distributed Computing Systems with Imperfect Nodes”, The Computer Journal, Volume 42, Issue 2, Pages 129-141, 1999.


[15] CODEN CMPJA6, Vol. 36, No. 7, Pg. 631-644, 1993.

[16] Raghavendra, C. S. and Makam, S. V., “Reliability Modeling and Analysis of Computer Networks”, IEEE Transactions on Reliability, Vol. R-35, No. 2, Pg. 156-160, 1986 June.

[17] Deng-Jyi Chen and Min-Sheng Lin, “On Distributed Computing Systems Reliability Analysis Under Program Execution Constraints”, IEEE Transactions on Computers, Volume 43, Issue 1, Pages 87-97, 1994.

[18] Chiu, Chin Ching, Yeh, Yi Shiung and Chou, Jue Sam, “An Effective Algorithm for Optimal K-terminal Reliability of Distributed Systems”, Malaysian Journal of Library & Information Science, 6 (2), Pg. 101-118, 2001.

[19] Ruey-Shun Chen, Deng-Jyi Chen and Y. S. Yeh, “A New Heuristic Approach for Reliability Optimization of Distributed Computing Systems subject to Capacity Constraints”, Journal of Computers & Mathematics with Applications, Volume 29, Issue 3, Pages 37-47, February 1995.

[20] Ruey-Shun Chen, Deng-Jyi Chen and Y. S. Yeh, “Reliability Optimization on the Design of Distributed Computing Systems”, Proceedings of the Int. Conf. on Computing and Information, Pages 422-437, 1994.

[21] Ruey-Shun Chen, Deng-Jyi Chen and Y. S. Yeh, “Reliability Optimization of Distributed Computing Systems Subject to Capacity Constraints”, Computers & Mathematics with Applications, Volume 29, Issue 4, Pages 93-99, February 1995.

[22] Fulya Altiparmak, Berna Dengiz, and Alice E. Smith, “A General Neural Network Model for Estimating Telecommunications Network Reliability”, IEEE Transactions on Reliability, Vol. 58, No. 1, Pg. 2-9, March 2009.

[23] Debany, W. H. and Varshney, P. K., “Network Reliability Evaluation Using Probability Expressions”, IEEE Transactions on Reliability, Vol. R-35, No. 2, Pg. 161-166, June 1986.

[24] Dirk P. Kroese, Kin-Ping Hui, and Sho Nariai, “Network Reliability Optimization via the Cross-Entropy Method”, IEEE Transactions on Reliability, Vol. 56, No. 2, Pg. 275-287, June 2007.


About the Author: Dr. Anju Khandelwal is an assistant professor and HOD-Dean Academics of S.R.M.S. Women’s College of Engg. & Tech., Bareilly (U.P.), affiliated to G. B. T. U. Lucknow, India. She received her Bachelor's degree in Mathematics and Physics from Bundelkhand University, Jhansi (U.P.), India, in 1996, and her Master's degree in Mathematics with Computer Application from Bundelkhand University, Jhansi (U.P.), India, in 1998. She then joined Bundelkhand University, Jhansi, as a lecturer under the self-finance scheme. She completed her PhD degree in Operations Research at Gurukula Kangri University, Hardwar (Uttaranchal), in 2006. She received a Master's degree in a technical field, M. Tech (Software Engineering), in 2010 from U. P. T. U. Lucknow (U.P.). Her areas of interest include parallel and distributed systems, optimization techniques, CCN and reliability analysis. She is a life member of the IAPS.
