
Data Download


Description

Real protein sequence data sets:

The data sets were obtained from the National Center for Biotechnology Information (NCBI) website.

Data format

Each data set is stored in FASTA format.

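The general form of a FASTA record, followed by the record for the human protein sequence "pdb|1HNL|A", is shown below.
Each record begins with a header line that starts with ">" and holds the protein name followed by any other information. In this example, pdb|1HNL|A is the protein name and "Chain A, Human Lysozyme" is the other information. The lines after the header line contain the amino acid sequence of the protein.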


>"name" "other information"
sequence...

>pdb|1HNL|A Chain A, Human Lysozyme
KVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRATNYNAGDRSTDYGIFQINSRYWCNDGKT
PGAVNAAHLSCSALLQDNIADAVACAKRVVRDPQGIRAWVAWRNRCQNRDVRQYVQGCGV
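The record structure described above can be read with a short parser. The following is a minimal Python sketch, not the tool used to build these data sets: the function name parse_fasta is hypothetical, and it assumes that the protein name is the first whitespace-separated token of the header line and that the rest of the header is the other information. Sequence lines are concatenated, so the two sequence lines of the 1HNL example become a single 130-residue string.

```python
from typing import Iterable, Iterator, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, other_information, sequence) for each FASTA record.

    Hypothetical helper. Assumes the name is the first whitespace-separated
    token after '>' and everything after it is the other information.
    """
    name: Optional[str] = None
    info = ""
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between or inside records
        if line.startswith(">"):
            if name is not None:
                # A new header closes the previous record.
                yield name, info, "".join(chunks)
            header = line[1:].split(maxsplit=1)
            name = header[0] if header else ""
            info = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence lines are joined without separators
    if name is not None:
        yield name, info, "".join(chunks)  # emit the last record


if __name__ == "__main__":
    example = """>pdb|1HNL|A Chain A, Human Lysozyme
KVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRATNYNAGDRSTDYGIFQINSRYWCNDGKT
PGAVNAAHLSCSALLQDNIADAVACAKRVVRDPQGIRAWVAWRNRCQNRDVRQYVQGCGV
"""
    for name, info, seq in parse_fasta(example.splitlines()):
        print(name, info, len(seq))  # prints: pdb|1HNL|A Chain A, Human Lysozyme 130
```

Splitting the name on "|" (for example name.split("|")) gives its pipe-separated fields "pdb", "1HNL" and "A".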

The files


Download   Data set             Query size   Database size
download   COVID-19             100          5231
download   EUMAT                100          36398
download   Human1 to Human5     50           3000
download   Each identity pair   1            1
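After downloading a data set, the number of records can be compared with the sizes listed in the table. This Python sketch rests on assumptions that this page does not state: that both sizes are counts of FASTA records, that each download provides separate query and database files, and that the Human1 to Human5 sizes apply to each of the five sets. The names EXPECTED_SIZES, count_records and check_sizes are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (query size, database size) copied from the table above.
# Assumption: both numbers are counts of FASTA records (sequences).
EXPECTED_SIZES: Dict[str, Tuple[int, int]] = {
    "COVID-19": (100, 5231),
    "EUMAT": (100, 36398),
    "Human1 to Human5": (50, 3000),  # assumed to hold for each Human set
    "Each identity pair": (1, 1),
}


def count_records(lines: Iterable[str]) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def check_sizes(data_set: str, query_lines: Iterable[str],
                db_lines: Iterable[str]) -> bool:
    """Return True if the record counts match the table row for data_set."""
    expected = EXPECTED_SIZES[data_set]
    actual = (count_records(query_lines), count_records(db_lines))
    return actual == expected


# Usage with hypothetical file names (not given on this page):
# with open("covid19_query.fasta") as q, open("covid19_db.fasta") as d:
#     print(check_sizes("COVID-19", q, d))
```

Counting header lines is enough here because every FASTA record has exactly one header line, regardless of how many lines its sequence spans.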