MicroRNAs (miRNAs) are a class of small, functional noncoding RNAs. The regulatory role of miRNAs has been studied in a wide variety of biological processes. Evidence that sequence variation contributes to differences in gene regulation has been reported by many groups, including Hu et al. (2008), Jazdzewski et al. (2008), Frazer et al. (2009), Jazdzewski et al. (2009), Tian et al. (2009), Hu et al. (2009), Yang et al. (2009), Mencia et al. (2009) and Sun et al. (2009). Variations, such as single nucleotide polymorphisms (SNPs), in small RNAs can severely impair small RNA function and might result in disease.
There has been some effort to understand and analyze variations in miRNAs. Following the human genome sequencing projects, variants observed at the genomic level were curated.
Several studies have tried to identify variations in small RNAs. Bhartiya et al. (2011) identified 106 SNPs. Leu et al. (2012) identified 594 SNPs in 169 mature miRNAs and 54 in the seed region. Zorc et al. (2012) identified 129 SNPs in the seed region. Gong et al. (2012) identified 757 SNPs in 440 pre-miRNAs. Han et al. (2013) identified a total of 1899 SNPs in 961 pre-miRNAs, 601 SNPs in mature miRNAs and 203 SNPs in the seed region (nucleotides 2-8 from the 5′ end).
NepSeqVar is a collection of all variants of human small RNAs at the transcriptional level. It was generated by analyzing thousands of human small RNA-seq samples and mapping the sequences. Samtools mpileup was used to generate a .vcf file from the mapped file. From this mpileup VCF file we identified up to 24389 variants in mature small RNAs and 41362 variants in pre-small RNAs, which strongly suggests that small RNA sequences are not conserved. This implies that previous studies revealed only the tip of the iceberg.
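The counting step described above, tallying VCF records that fall in mature versus precursor small RNA regions, can be illustrated with a minimal Python sketch. It is not the NepSeqVar pipeline itself: the function names, the toy VCF lines and the interval coordinates are all hypothetical, and the sketch assumes a plain-text, tab-separated VCF in which non-variant sites carry `.` or a symbolic allele such as `<*>` in the ALT column.

```python
from collections import Counter

# Hypothetical toy VCF content (tab-separated); real input would come from the mpileup output.
TOY_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t105\t.\tA\tG\t50\t.\tDP=30",
    "chr1\t130\t.\tC\tT\t40\t.\tDP=12",
    "chr1\t300\t.\tG\t<*>\t0\t.\tDP=5",   # no real alternate allele
    "chr2\t50\t.\tT\tC,A\t60\t.\tDP=20",
]

# Hypothetical annotation: (chrom, start, end), 1-based inclusive like VCF POS.
TOY_REGIONS = {
    "precursor": [("chr1", 100, 180)],
    "mature": [("chr1", 100, 121)],
}

NON_VARIANT_ALLELES = {".", "<*>", "<X>"}


def parse_vcf_sites(lines):
    """Yield (chrom, pos) for every data line with at least one real ALT allele."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines and the column header
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        if any(allele not in NON_VARIANT_ALLELES for allele in alt.split(",")):
            yield chrom, pos


def count_by_region(lines, regions):
    """Count variant sites per region label; a site may count for several labels."""
    counts = Counter()
    for chrom, pos in parse_vcf_sites(lines):
        for label, intervals in regions.items():
            if any(c == chrom and s <= pos <= e for c, s, e in intervals):
                counts[label] += 1
    return counts


if __name__ == "__main__":
    print(dict(count_by_region(TOY_VCF, TOY_REGIONS)))
```

In this sketch a site inside a mature sequence is also counted for its precursor, since the mature interval lies within the precursor interval. Whether the reported totals were counted this way is not stated in the text.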

Fig. 1: Comparison of global human small RNA variants in NepSeqVar with those reported in other studies: Han et al. (2013), Gong et al. (2012), Zorc et al. (2012), Lu et al. (2012) and Bhartiya et al. (2011).
Human Genomic Region | NepSeqVar | NepSeq-AS+ | NepSeq-AS- |
---|---|---|---|
region | 388224 | 230309 | 39147 |
gene | 291524 | 138656 | 23507 |
exon | 212864 | 22095 | 3467 |
mRNA | 133058 | 110508 | 18713 |
match | 120015 | 66763 | 11552 |
lnc_RNA | 75580 | 34784 | 5971 |
primary_transcript | 65024 | 29 | 4 |
miRNA | 64659 | 18 | 4 |
rRNA | 63915 | 102 | 1 |
transcript | 35105 | 27971 | 4831 |
tRNA | 29539 | 3 | 0 |
biological_region | 22195 | 8685 | 1473 |
sequence_feature | 21447 | 4317 | 640 |
enhancer | 19303 | 8076 | 1428 |
pseudogene | 14947 | 2161 | 325 |
CDS | 13980 | 7712 | 1270 |
snoRNA | 11830 | 9 | 1 |
snRNA | 2226 | 3 | 1 |
silencer | 2093 | 330 | 27 |
Y_RNA | 869 | 0 | 0 |
centromere | 845 | 181 | 21 |
transcriptional_cis_regulatory_region | 713 | 100 | 9 |
cDNA_match | 534 | 188 | 37 |
ncRNA | 441 | 1 | 0 |
vault_RNA | 330 | 0 | 0 |
scRNA | 257 | 0 | 0 |
RNase_P_RNA | 191 | 0 | 0 |
origin_of_replication | 157 | 4 | 0 |
RNase_MRP_RNA | 144 | 0 | 0 |
D_loop | 82 | 0 | 0 |
antisense_RNA | 73 | 52 | 5 |
promoter | 61 | 76 | 8 |
non_allelic_homologous_recombination_region | 54 | 44 | 5 |
sequence_alteration_artifact | 38 | 72 | 15 |
meiotic_recombination_region | 26 | 30 | 0 |
mitotic_recombination_region | 15 | 32 | 0 |
DNaseI_hypersensitive_site | 13 | 23 | 3 |
locus_control_region | 12 | 19 | 4 |
telomerase_RNA | 10 | 0 | 0 |
protein_binding_site | 3 | 6 | 2 |
repeat_instability_region | 3 | 6 | 0 |
tandem_repeat | 3 | 6 | 2 |
V_gene_segment | 3 | 2 | 2 |
CAGE_cluster | 2 | 1 | 0 |
replication_regulatory_region | 2 | 0 | 0 |
conserved_region | 1 | 4 | 0 |
enhancer_blocking_element | 1 | 5 | 1 |
microsatellite | 1 | 0 | 0 |
mobile_genetic_element | 1 | 5 | 0 |
nucleotide_motif | 1 | 1 | 0 |
direct_repeat | 0 | 1 | 0 |
imprinting_control_region | 0 | 1 | 0 |
insulator | 0 | 3 | 1 |
matrix_attachment_site | 0 | 1 | 0 |
recombination_feature | 0 | 2 | 1 |
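
The row labels in the table resemble GFF3 feature types. The text does not say how the counts were produced, so the following Python sketch only illustrates one plausible approach: intersecting variant positions with a GFF3 annotation and tallying each overlapping feature type once per variant. The function names, toy annotation and toy variant positions are hypothetical.

```python
from collections import Counter

# Hypothetical toy GFF3 lines (tab-separated): seqid, source, type, start, end, score, strand, phase, attributes.
TOY_GFF3 = [
    "##gff-version 3",
    "chr1\ttoy\tregion\t1\t1000\t.\t+\t.\tID=r1",
    "chr1\ttoy\tgene\t100\t500\t.\t+\t.\tID=g1",
    "chr1\ttoy\texon\t100\t200\t.\t+\t.\tID=e1",
    "chr1\ttoy\texon\t150\t250\t.\t+\t.\tID=e2",
    "chr1\ttoy\tmiRNA\t120\t141\t.\t+\t.\tID=m1",
]

# Hypothetical variant sites as (chrom, 1-based position).
TOY_VARIANTS = [("chr1", 130), ("chr1", 300), ("chr1", 900)]


def parse_gff3(lines):
    """Return a list of (seqid, feature_type, start, end) from GFF3 text lines."""
    features = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        features.append((cols[0], cols[2], int(cols[3]), int(cols[4])))
    return features


def tally_feature_types(variants, features):
    """Count, per feature type, the variants overlapping at least one such feature."""
    counts = Counter()
    for chrom, pos in variants:
        hit_types = {
            ftype
            for seqid, ftype, start, end in features
            if seqid == chrom and start <= pos <= end
        }
        counts.update(hit_types)  # each type counted once per variant
    return counts


if __name__ == "__main__":
    tally = tally_feature_types(TOY_VARIANTS, parse_gff3(TOY_GFF3))
    for ftype, n in sorted(tally.items(), key=lambda item: -item[1]):
        print(ftype, n)
```

Under this counting scheme the rows are not mutually exclusive: a variant inside an exon also counts toward the enclosing gene and region, so the column values should not be expected to sum to the total number of variants.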