New Encoding/Decoding Methods for Designing Fault-Tolerant Matrix Operations
D.L. Tao, C.R.P. Hartmann, and Yunghsing S.(Sam) Han
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 7, NO. 9, SEPTEMBER 1996, pp. 931-937
Abstract
Algorithm-based fault tolerance(ABFT) can provide a low-cost error protection for array processors and multiprocessor systems. Several techniques have been proposed to design fault-tolerant matrix operations. In this paper, new encoding/decoding methods are proposed for designing fault-tolerant matrix operations. The unique feature of these new methods is that only additions and subtractions are used in encoding/decoding.
The advantage of ABFT is that errors which are caused by permanent or transient failures in the system can be detected/corrected by using a very low overhead and at the original throughput.
During system normal operation, the probability of having more than two faults is negligible. Two new encoding/decoding algorithms to construct error detecting/correcting codes with the minimum Hamming distance 3 and 4 are respectively proposed in this paper.
In this section, we evaluate existing ABFT techniques which are used to perform fault tolerant matrix-vector multiplications. The following check matrix, H with dimension
For single error correction, i.e., the minimum Hamming distance is 3, the check matrix given has the following form:
1) Jou-Abraham’s scheme
let
2) Luk and Park’s scheme
let
3) A similar is proposed in [10].
Minimum Hamming distance 4 code is needed.
The overhead of encoding/decoding as given in the previous subsection is shown in Table 1.
The encoding/decoding methods proposed in this paper can be described by a check matrix
In this subsection, we introduce the Lengthened Hamming Codes (LHC) with the minimum Hamming distance 3. So it can correct all single errors. The Lengthened Hamming Codes (LHC) can be constructed by the check matrix
Procedure I
.Step1: Let denote the number of information (check) digits. For a given k, find the minimum r to satisfy:
Step2:
For the r found in Step1, find the minimum r’, whereStep3:
LetStep4: Let
Step5:
Step6: Let
Step7:
Example1.
Let us construct an
j ![]() |
j ![]() |
Elements in S ![]() |
2 2 3 2 2 2 |
0 0 0 1 1 1 |
1100,1010,1001, 0110,0101,0011, 0111,1011,1101,1110 011-1,101-1,110-1,11-10, 01-11,10-11,1-101,1-110, 0-111,-1011,-11-1,-1110. |
The element in
Elements in S ![]() |
Elements in S ![]() |
-1100,-1010,-1001 0-110,0-101,00-11 |
1-100,10-10,100-1 01-10,010-1,001-1 |
An
By using LHC, r additional check digits will be used. The relationship between k and r and the ratio between k and r in LHC are shown in Table2.
Table 2
The Relationship Between k and r in LHC
k |
5 ~10 |
11~36 |
37~116 |
117~358 |
r |
3 |
4 |
5 |
6 |
|
0.6~0.30 |
0.36~0.11 |
0.14~0.04 |
0.05~0.02 |
In this subsection, we construct the a code with the minimum Hamming distance 4, which can correct all single errors and detect all double errors. The SEC/DED can be constructed by the check matrix .
Procedure II.
Step1:
for a given k, find the smallest r to satisfy the following inequality:Step2: Let S be the set containing all r-digit vectors which have odd weight except three, provide that each vector contains exactly one -1.
Step3: Let
Step4:
Example2.
Let us construct an
The relationship between k and r and the ratio between k and r in this new code are shown in Table3.
Table 3
The Relationship Between k and r in the SEC/DED Code
k |
6~30 |
31~112 |
113~436 |
437~787 |
r |
6 |
7 |
8 |
9 |
|
1~0.20 |
0.26~0.06 |
0.07~0.02 |
0.02~0.01 |
where
Table 4
Overhead Comparisons when =3(single error correction) where r satisfies step1 of procedure I
encoding |
matrix-vector multiplication |
decoding |
total |
||
weighted check-sum |
multiplications |
k ![]() |
2k |
k |
k ![]() |
additions |
2k(k-1) |
2(k-1) |
2(k-1) |
2 k ![]() |
|
LHC |
multiplications |
0 |
r*k |
0 |
r*k |
additions |
c*r*k ![]() |
r(k-1) |
c*r*k |
r(c k |
Table 5
Overhead Comparisons when
encoding |
matrix-vector multiplication |
decoding |
total |
||
weighted check-sum |
multiplications |
2k ![]() |
3k |
2k |
2k ![]() |
additions |
3k(k-1) |
3(k-1) |
3(k-1) |
3 k ![]() |
|
SEC/DED |
multiplications |
0 |
r*k |
0 |
r*k |
additions |
r*k ![]() |
r(k-1) |
r*k |
r( k ![]() |
Created by : Chuang Yuh-Jue
Date : June 3, 1997