With the advent of next-generation sequencing in 2008, the volume of biological data has been growing exponentially. The sequence data are retained in compressed formats by the NCBI Sequence Read Archive (SRA) at the National Institutes of Health (NIH), the EBI Sequence Read Archive (ERA) at the European Bioinformatics Institute (EBI), and the DDBJ Sequence Read Archive (DRA) at the DNA Data Bank of Japan (DDBJ). As of 2021, each institute holds more than 15 petabytes of compressed raw data in the form of SRA files. Storing such a volume of data is a challenge for the institutes.
Only a handful of groups are able to explore data at this scale, as doing so requires very large infrastructure.
The NepSeq algorithm reduces the data volume of a specific type of data, thereby decreasing storage requirements and making downstream analysis extremely rapid.
Please understand that, because of possible intellectual property rights, we are not able to disclose the algorithm in detail.
