A New Failure Detector to Detect Failures in a Distributed System


Process groups in distributed applications and services rely on failure detectors to detect process failures completely, and as quickly, accurately, and scalably as possible, even when message delivery is unreliable. A failure detector is the component responsible for detecting node failures or crashes in a distributed system; the one described here is implemented as a simulation application. In a purely asynchronous distributed system, it is impossible to distinguish with certainty a crashed process from a very slow one. A failure detector is therefore evaluated on completeness, speed of detection, accuracy, and scalability, including under unreliable message delivery. Unlike earlier failure detectors used to circumvent impossibility results, the heartbeat failure detector is implementable, and its implementation does not use timeouts. Here we introduce a failure detector based on heartbeat messages.
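The abstract does not spell out the detector's algorithm. As a minimal sketch, assuming the timeout-free heartbeat design the abstract alludes to, each process counts the heartbeats it receives from each neighbor and outputs those counters instead of a list of suspects. The class `HeartbeatDetector`, the function `simulate`, the process names, and the round-based lossy-channel model are hypothetical illustrations, not the authors' implementation.

```python
import random


class HeartbeatDetector:
    """Hypothetical timeout-free heartbeat failure detector for one process.

    The detector keeps one counter per neighbor and increments it each time a
    HEARTBEAT from that neighbor arrives. It never declares a process crashed
    and uses no timeouts: it only exposes the counters.
    """

    def __init__(self, pid, neighbors):
        self.pid = pid
        self.neighbors = list(neighbors)
        self.counters = {q: 0 for q in self.neighbors}

    def make_heartbeat(self):
        # Message sent to every neighbor once per round.
        return ("HEARTBEAT", self.pid)

    def on_message(self, msg):
        kind, sender = msg
        if kind == "HEARTBEAT" and sender in self.counters:
            self.counters[sender] += 1

    def output(self):
        # Snapshot of the heartbeat vector; interpretation is left to the caller.
        return dict(self.counters)


def simulate(rounds, crash_at, loss_rate=0.0, seed=0):
    """Round-based simulation over channels that drop each message with
    probability loss_rate. crash_at maps a process id to the round at which
    it crashes (stops sending and receiving)."""
    rng = random.Random(seed)
    pids = ["p1", "p2", "p3"]
    nodes = {p: HeartbeatDetector(p, [q for q in pids if q != p]) for p in pids}

    def crashed(p, r):
        return r >= crash_at.get(p, rounds)

    for r in range(rounds):
        for p, node in nodes.items():
            if crashed(p, r):
                continue  # a crashed process sends nothing
            msg = node.make_heartbeat()
            for q in node.neighbors:
                # Delivered only if the channel does not drop the message
                # and the receiver has not crashed.
                if rng.random() >= loss_rate and not crashed(q, r):
                    nodes[q].on_message(msg)
    return {p: n.output() for p, n in nodes.items()}


print(simulate(10, {"p3": 4})["p1"])  # {'p2': 10, 'p3': 4}
```

With no message loss, `p1` sees the counter for `p2` keep growing while the counter for `p3` freezes at 4 after its crash. Because the detector itself uses no timeout, deciding when a frozen counter should be treated as a crash is left to the application that reads the counters.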