NepSeqVar

MicroRNAs (miRNAs) are a class of small, functional noncoding RNAs. Their regulatory role has been studied in a wide variety of biological processes. Evidence that sequence variation in miRNAs contributes to differences in gene regulation has been reported by many groups, including Hu et al. (2008), Jazdzewski et al. (2008), Frazer et al. (2009), Jazdzewski et al. (2009), Tian et al. (2009), Hu et al. (2009), Yang et al. (2009), Mencia et al. (2009) and Sun et al. (2009). Variations such as single nucleotide polymorphisms (SNPs) in small RNAs can severely impair their function and may result in disease.

There have been some efforts to understand and analyze variation in miRNAs. Following the human genome sequencing projects, variants observed at the genomic level were curated.

Several studies have tried to identify variations in small RNAs. Bhartiya et al. (2011) identified 106 SNPs. Lu et al. (2012) identified 594 SNPs in 169 mature miRNAs and 54 in the seed region. Zorc et al. (2012) identified 129 SNPs in the seed region. Gong et al. (2012) identified 757 SNPs in 440 pre-miRNAs. Han et al. (2013) identified a total of 1899 SNPs in 961 pre-miRNAs, 601 SNPs in mature miRNAs and 203 SNPs in the seed region (nucleotides 2-8 from the 5′ end).
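To make the seed-region definition concrete, here is a minimal sketch of how a SNP could be assigned to the seed (nucleotides 2-8 from the 5′ end) of a mature miRNA. The function names, the 1-based inclusive genomic coordinate convention and the example coordinates are assumptions made for illustration; they are not taken from any of the cited studies.

```python
def offset_from_5prime(snp_pos, start, end, strand):
    """Return the 1-based offset of a SNP from the 5' end of a mature miRNA.

    Coordinates are assumed to be 1-based and inclusive on the genome.
    Returns None when the SNP lies outside the mature miRNA.
    """
    if not (start <= snp_pos <= end):
        return None
    if strand == "+":
        return snp_pos - start + 1
    if strand == "-":
        # On the minus strand the 5' end is the larger genomic coordinate.
        return end - snp_pos + 1
    raise ValueError("strand must be '+' or '-'")


def in_seed(snp_pos, start, end, strand, seed=(2, 8)):
    """Return True when the SNP falls in the seed region (default: nt 2-8)."""
    offset = offset_from_5prime(snp_pos, start, end, strand)
    return offset is not None and seed[0] <= offset <= seed[1]


# Hypothetical mature miRNA spanning positions 1000-1021.
print(in_seed(1001, 1000, 1021, "+"))  # offset 2 on the plus strand
print(in_seed(1020, 1000, 1021, "-"))  # offset 2 on the minus strand
```

Handling the strand explicitly matters because the 5′ end of a minus-strand miRNA sits at its highest genomic coordinate.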

NepSeqVar is a collection of variants of human small RNAs observed at the transcript level. It was generated by analyzing thousands of human small RNA-seq samples and mapping their sequences. The mpileup command of Samtools was used to generate a .vcf file from the mapped reads. From the mpileup VCF file we identified up to 24389 variants in mature small RNAs and 41362 variants in pre-small RNAs, which strongly suggests that the sequences of small RNAs are not conserved. This implies that previous studies revealed only the tip of the iceberg.
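As an illustration of the last step, the sketch below counts variant records in VCF text. It is not the NepSeqVar pipeline itself: the function name and the small example VCF are made up for illustration, and the sketch assumes plain tab-separated VCF records in which non-variant sites carry "." or a symbolic allele such as "<*>" in the ALT column.

```python
import io


def count_variants(vcf_lines):
    """Count variant records and single-nucleotide variants in VCF text.

    Returns a (total_variants, snvs) tuple. Header lines and records
    without a real alternate allele are skipped.
    """
    total = 0
    snvs = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]
        # Keep only real alternate alleles (drop "." and symbolic alleles).
        alts = [a for a in fields[4].split(",")
                if a != "." and not a.startswith("<")]
        if not alts:
            continue  # site with no observed variant
        total += 1
        if len(ref) == 1 and all(len(a) == 1 for a in alts):
            snvs += 1
    return total, snvs


# Illustrative records only; these are not NepSeqVar data.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t1001\t.\tA\tG\t50\t.\t.\n"
    "chr1\t1005\t.\tC\tT,G\t40\t.\t.\n"
    "chr2\t2010\t.\tAT\tA\t30\t.\t.\n"
    "chr2\t2011\t.\tG\t.\t20\t.\t.\n"
    "chr2\t2012\t.\tT\t<*>\t0\t.\t.\n"
)

total, snvs = count_variants(io.StringIO(EXAMPLE_VCF))
print(total, snvs)
```

In practice the same function could be applied to an open file handle instead of the in-memory example.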

Fig. 1: Comparison of the number of human small RNA variants in NepSeqVar with the numbers reported by Han et al. (2013), Gong et al. (2012), Zorc et al. (2012), Lu et al. (2012) and Bhartiya et al. (2011).


Human Genomic Region                         NepSeqVar  NepSeq-AS +  NepSeq-AS
region                                       388224     230309       39147
gene                                         291524     138656       23507
exon                                         212864     22095        3467
mRNA                                         133058     110508       18713
match                                        120015     66763        11552
lnc_RNA                                      75580      34784        5971
primary_transcript                           65024      29           4
miRNA                                        64659      18           4
rRNA                                         63915      102          1
transcript                                   35105      27971        4831
tRNA                                         29539      3            0
biological_region                            22195      8685         1473
sequence_feature                             21447      4317         640
enhancer                                     19303      8076         1428
pseudogene                                   14947      2161         325
CDS                                          13980      7712         1270
snoRNA                                       11830      9            1
snRNA                                        2226       3            1
silencer                                     2093       330          27
Y_RNA                                        869        0            0
centromere                                   845        181          21
transcriptional_cis_regulatory_region        713        100          9
cDNA_match                                   534        188          37
ncRNA                                        441        1            0
vault_RNA                                    330        0            0
scRNA                                        257        0            0
RNase_P_RNA                                  191        0            0
origin_of_replication                        157        4            0
RNase_MRP_RNA                                144        0            0
D_loop                                       82         0            0
antisense_RNA                                73         52           5
promoter                                     61         76           8
non_allelic_homologous_recombination_region  54         44           5
sequence_alteration_artifact                 38         72           15
meiotic_recombination_region                 26         30           0
mitotic_recombination_region                 15         32           0
DNaseI_hypersensitive_site                   13         23           3
locus_control_region                         12         19           4
telomerase_RNA                               10         0            0
protein_binding_site                         3          6            2
repeat_instability_region                    3          6            0
tandem_repeat                                3          6            2
V_gene_segment                               3          2            2
CAGE_cluster                                 2          1            0
replication_regulatory_region                2          0            0
conserved_region                             1          4            0
enhancer_blocking_element                    1          5            1
microsatellite                               1          0            0
mobile_genetic_element                       1          5            0
nucleotide_motif                             1          1            0
direct_repeat                                0          1            0
imprinting_control_region                    0          1            0
insulator                                    0          3            1
matrix_attachment_site                       0          1            0
recombination_feature                        0          2            1
Table 1: Total number of small RNA variants observed in human genomic regions
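A per-feature tally like the one in Table 1 could be produced along the following lines. This is a minimal sketch and not the procedure actually used for NepSeqVar: it assumes annotation features given as (type, chromosome, start, end) tuples with GFF3-style 1-based inclusive coordinates, and it counts a variant once for every feature type it overlaps. The function name and the example data are hypothetical.

```python
from collections import Counter


def count_by_feature_type(features, variants):
    """Count variants per annotated feature type.

    features: iterable of (feature_type, chrom, start, end) tuples,
              with 1-based inclusive coordinates (assumed, as in GFF3).
    variants: iterable of (chrom, pos) tuples.
    A variant is counted once per feature type it overlaps, even when it
    overlaps several features of that type.
    """
    features = list(features)
    counts = Counter()
    for chrom, pos in variants:
        hit_types = {
            ftype
            for ftype, fchrom, start, end in features
            if fchrom == chrom and start <= pos <= end
        }
        counts.update(hit_types)
    return counts


# Hypothetical annotation and variants, for illustration only.
example_features = [
    ("gene", "chr1", 100, 500),
    ("exon", "chr1", 100, 200),
    ("exon", "chr1", 150, 250),
    ("miRNA", "chr2", 10, 31),
]
example_variants = [("chr1", 120), ("chr1", 300), ("chr2", 15), ("chr2", 99)]

example_counts = count_by_feature_type(example_features, example_variants)
print(dict(example_counts))
```

Because features nest (an exon lies inside a gene, which lies inside a region), a single variant contributes to several rows, which is why the counts for broader feature types are larger. The linear scan over all features is fine for a sketch; a real annotation would call for an interval index.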