Smiley face

Experimental Results


Description

To illustrate how the real execution time is affected by similarity, length and alphabet size of input sequences, we perform some simulation experiments on random data. We implemented some of the algorithms by C++ on Code:Block 12.3 software with GCC 4.8.1 compiler and tested on a computer with 64-bit Windows 7 operating system, CPU clock rate of 3.60GHZ (Intel(R) Core(TM) i7-4790) and 4 GB of RAM.

Here, the experiment demonstration involves three edit distance algorithms, Wagn, Myers and Wu , and involve LCS algorithms HirsA, HirsB, HirsC, Hunt and Alli. In all experiments, we set the cost of each character insertion, deletion and replacement to 1, 1 and 2, respectively, for the edit distance algorithm.

For consistent presentation of our experimental results, we define a 4-tuple T(m, n, |Σ|, alg), where |A| = m , |B| = n, n/m = {1, 2, 5, 10} (the ratio), |Σ|= {2, 4, 20, 64, 256, 1000} and alg is the algorithm name. For example, T(500, 2, *, Myers) is used to illustrate the experimental results of algorithm Myers, with m = 500, n = 500*2 = 600 and |Σ|= {2, 4, 20, 64, 256, 1000}.

The performance of the Alli algorithm is outstanding in general cases, since it is a parallel method with the bit-vector string representation. The Wu algorithm performs very well when the similarity is high. The gap between the Myers and Wu algorithms becomes large when n=m is large, because more insertions are required in Myers with large n=m. HirsC is worse than HirsA and HirsA, since some extra cost is required in the divide-and-conquer process of HirsC. Hunt do not performs well when the similarity is low as we think, since low similarity in our definition is not equivalent to a small number of matched pairs. For example, in our data set, the average difference in the number of matches with Similarity(A,B) = 0.55 and Similarity(A,B) = 0.95 is 4%, where n = m = 100, |Σ| = 4.

The selected algorithms

Name year Author(s) Time complexity Keywords
Edit distance algorithms
Wagn 1974 Wagner and Fischer O(mn) DP
Myers 1986 Myers O(nd) shortest edit script, contour
Wu 1990 Wu et al. O(np) shortest edit script, contour
LCS algorithms
HirsA 1975 Hirschberg O(mn) DP
HirsB 1975 Hirschberg O(mn) linear space
HirsC 1975 Hirschberg O(mn) divide and conquer
Hunt 1977 Hunt and Szymanski O((r+n)log n) match list, binary search
Alli 1986 Allison and Dix O(mn/w) bit string, parallel
The results of single algrithm with different alphabet szie

Each figure shows the execution time of an algrithm with length of string A(m) =500, length of string B(n)=500 and alphabet size = {2,4,20,64,256,1000}.


Wagn algorithm with m=500, n=500 Myers algorithm with m=500, n=500 Wu algorithm with m=500, n=500

The results of single algrithm with different length

Each figure shows the execution time of an algrithm with length of string A(m) =500, length of string B(n)={500,1000,2500,5000} and alphabet size = 1000.

Wagn algorithm with m=500, alphabet size=1000 Myers algorithm with m=500, alphabet size=1000 Wu algorithm with m=500, alphabet size=1000
The comparasion results

Each figure shows the execution time of all algrithms of length of string A(m) =500 and alphabet size = 1000 with one of length of string B(n)={500,1000,2500,5000}.

all algorithms with m=500, n=500, alphabet size=1000 all algorithms with m=500, n=1000, alphabet size=1000 all algorithms with m=500, n=2500, alphabet size=1000 all algorithms with m=500, n=5000, alphabet size=1000