An Efficient Fault-Tolerant Routing Strategy for Tori and Meshes


M. E. Gómez
P. López
J. Duato


In massively parallel computing systems, high-performance interconnection networks
are key to achieving maximum performance.
While routing is one of the most important design
issues in interconnection networks, fault tolerance
is another issue of growing importance in these machines, since
the large amount of hardware increases the probability of failure.
This paper
proposes a mechanism that
provides both scalable routing and fault tolerance for the commercial switches
used to build direct regular topologies, which are the topologies typically used in large machines.
The mechanism is very flexible and requires only simple hardware.
Furthermore, it tolerates a high number of faults with minimal
effect on performance.


