Pseudorandom dataset:
We classify the string data by the lengths of the three strings, the alphabet size, and the similarity. A CSV file named "table_r_m_n_a_s.csv" contains 100 sets of string data, where r is the length of string T, m is the length of string A, n is the length of string B, a is the alphabet size, and s is the similarity of T, A and B in percent. For example, the file "table_1000_200_800_1000_45.csv" contains 100 sets of string data in which T has length 1000, A has length 200, B has length 800, the alphabet size is 1000, and the similarity of T, A and B is 45%.
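The naming scheme above can be decoded mechanically. The following is a minimal sketch, assuming every file name follows the "table_r_m_n_a_s.csv" pattern exactly; the names `TableParams` and `parse_table_name` are illustrative and not part of the dataset.

```python
import re
from typing import NamedTuple


class TableParams(NamedTuple):
    """Parameters encoded in a file name (hypothetical helper type)."""
    r: int  # length of string T
    m: int  # length of string A
    n: int  # length of string B
    a: int  # alphabet size
    s: int  # similarity of T, A and B, in percent


# Assumed pattern: five underscore-separated integers after "table_".
_TABLE_NAME = re.compile(r"^table_(\d+)_(\d+)_(\d+)_(\d+)_(\d+)\.csv$")


def parse_table_name(filename: str) -> TableParams:
    """Extract r, m, n, a and s from a name like table_r_m_n_a_s.csv."""
    match = _TABLE_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return TableParams(*(int(group) for group in match.groups()))


params = parse_table_name("table_1000_200_800_1000_45.csv")
print(params.r, params.m, params.n, params.a, params.s)
```

In the example file name, m + n equals r (200 + 800 = 1000), which can serve as a quick sanity check when iterating over many files.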
Real DNA sequences dataset:
The experimental material consists of 31 sets of real DNA sequences from two yeast species, Saccharomyces cerevisiae and Kluyveromyces waltii. Each file holds a single string. For example, the file "A004_II_9961-38682.txt" holds one string. The three files A004_XXXXX, B004_XXXXX and T004_XXXXX together form one set of real DNA sequences.
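Grouping the files into sets can be sketched as follows. This assumes that every file name starts with the letter A, B or T followed by the set number and an underscore, as in the example above; the function name `group_dna_files` and the B/T file names in the usage example are hypothetical placeholders, not real files from the dataset.

```python
import re
from collections import defaultdict

# Assumed pattern: role letter (A, B or T), set number, underscore, rest, ".txt".
_DNA_NAME = re.compile(r"^([ABT])(\d+)_.+\.txt$")


def group_dna_files(filenames):
    """Map each set number to a {role: file name} dictionary."""
    sets = defaultdict(dict)
    for name in filenames:
        match = _DNA_NAME.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        role, set_id = match.groups()
        sets[set_id][role] = name
    return dict(sets)


# The B and T names below are placeholders for illustration only.
groups = group_dna_files([
    "A004_II_9961-38682.txt",
    "B004_placeholder.txt",
    "T004_placeholder.txt",
])
print(sorted(groups["004"]))
```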
An example row from "table_10_4_6_20_30.csv" is shown below.
Numbers are used to represent the symbols. In this example, string T is "7 8 11 4 18 5 7 8 9 11", string A is "2 14 4 5", string B is "1 5 20 7 4 10", the symbols lie in the range [1, 20], and the similarity of T, A and B is 30%. The leading values -1, -2 and -3 in the row are not part of the strings; they appear to mark the fields for T, A and B respectively.
Index,stringT,stringA,stringB |
---|
39,-1|7|8|11|4|18|5|7|8|9|11,-2|2|14|4|5,-3|1|5|20|7|4|10 |
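A row in this format can be split into its three strings with a few lines of code. The sketch below assumes that fields are separated by commas, that symbols within a field are separated by "|", and that the first value of each string field (-1, -2, -3) is a marker rather than a symbol, as the example suggests; `parse_row` is an illustrative name.

```python
def parse_row(line: str):
    """Split one CSV row into (index, T, A, B), each string a list of ints."""
    index, *fields = line.strip().split(",")
    strings = []
    for field in fields:
        values = [int(value) for value in field.split("|")]
        # Assumption: the first value (-1, -2 or -3) only marks the field.
        strings.append(values[1:])
    t, a, b = strings
    return int(index), t, a, b


row = "39,-1|7|8|11|4|18|5|7|8|9|11,-2|2|14|4|5,-3|1|5|20|7|4|10"
index, t, a, b = parse_row(row)
print(index, len(t), len(a), len(b))  # prints: 39 10 4 6
```

The lengths 10, 4 and 6 match the values of r, m and n in the file name "table_10_4_6_20_30.csv".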
Download | Length of string T (r) | Length of string A (m) | Length of string B (n) | Alphabet size |
---|---|---|---|---|
download | all | all | all | all |
download | Real DNA | Real DNA | Real DNA | Real DNA |
download | 1000 | 100 | 900 | 4 |
download | 1000 | 100 | 900 | 64 |
download | 1000 | 100 | 900 | 1000 |
download | 1000 | 200 | 800 | 4 |
download | 1000 | 200 | 800 | 64 |
download | 1000 | 200 | 800 | 1000 |
download | 1000 | 300 | 700 | 4 |
download | 1000 | 300 | 700 | 64 |
download | 1000 | 300 | 700 | 1000 |
download | 1000 | 400 | 600 | 4 |
download | 1000 | 400 | 600 | 64 |
download | 1000 | 400 | 600 | 1000 |
download | 1000 | 500 | 500 | 4 |
download | 1000 | 500 | 500 | 64 |
download | 1000 | 500 | 500 | 1000 |
download | 2000 | 200 | 1800 | 4 |
download | 2000 | 200 | 1800 | 64 |
download | 2000 | 200 | 1800 | 1000 |
download | 2000 | 400 | 1600 | 4 |
download | 2000 | 400 | 1600 | 64 |
download | 2000 | 400 | 1600 | 1000 |
download | 2000 | 600 | 1400 | 4 |
download | 2000 | 600 | 1400 | 64 |
download | 2000 | 600 | 1400 | 1000 |
download | 2000 | 800 | 1200 | 4 |
download | 2000 | 800 | 1200 | 64 |
download | 2000 | 800 | 1200 | 1000 |
download | 2000 | 1000 | 1000 | 4 |
download | 2000 | 1000 | 1000 | 64 |
download | 2000 | 1000 | 1000 | 1000 |
download | 5000 | 500 | 4500 | 4 |
download | 5000 | 500 | 4500 | 64 |
download | 5000 | 500 | 4500 | 1000 |
download | 5000 | 1000 | 4000 | 4 |
download | 5000 | 1000 | 4000 | 64 |
download | 5000 | 1000 | 4000 | 1000 |
download | 5000 | 1500 | 3500 | 4 |
download | 5000 | 1500 | 3500 | 64 |
download | 5000 | 1500 | 3500 | 1000 |
download | 5000 | 2000 | 3000 | 4 |
download | 5000 | 2000 | 3000 | 64 |
download | 5000 | 2000 | 3000 | 1000 |
download | 5000 | 2500 | 2500 | 4 |
download | 5000 | 2500 | 2500 | 64 |
download | 5000 | 2500 | 2500 | 1000 |
download | 10000 | 1000 | 9000 | 4 |
download | 10000 | 1000 | 9000 | 64 |
download | 10000 | 1000 | 9000 | 1000 |
download | 10000 | 2000 | 8000 | 4 |
download | 10000 | 2000 | 8000 | 64 |
download | 10000 | 2000 | 8000 | 1000 |
download | 10000 | 3000 | 7000 | 4 |
download | 10000 | 3000 | 7000 | 64 |
download | 10000 | 3000 | 7000 | 1000 |
download | 10000 | 4000 | 6000 | 4 |
download | 10000 | 4000 | 6000 | 64 |
download | 10000 | 4000 | 6000 | 1000 |
download | 10000 | 5000 | 5000 | 4 |
download | 10000 | 5000 | 5000 | 64 |
download | 10000 | 5000 | 5000 | 1000 |