Tag Name | Description |
---|---|
AA, AF, AMR_AF, ASN_AF, AFR_AF, EUR_AF, VT, SNPSOURCE, AC, AN, AVGPOST, ERATE, LDAF, RSQ, THETA, VA | Annotations from VAT |
Ancestral | Determines whether ancestral allele is the same as reference (Ref, Alt, or Neither) |
GERPscore | Gives the GERP score associated with the variant position |
SegDup | Gives the number of segmental duplications associated with the variant position |
1000GPhase1 | YES if variant present in 1000Genomes Phase1 dataset and NO if absent |
1000GPhase1_AF | Global allele frequency |
1000GPhase1_ASN_AF | Allele frequency for samples from ASN population |
1000GPhase1_AFR_AF | Allele frequency for samples from AFR population |
1000GPhase1_EUR_AF | Allele frequency for samples from EUR population |
ESP6500 | YES if allele is present in ESP6500 cohort and N if absent |
ESP6500_AAF | Minor allele frequencies for European Americans, African Americans, All people |
GERPelement | YES if variant/transcript is within a constrained GERP element and NO otherwise |
exoncounts | Number of coding exons lost due to the truncating variant: total number of coding exons in the transcript. For splice site variants the number of exons removed is replaced with NA |

Tag Name | Description |
---|---|
near_start | YES if variant is within the first 5% of the coding sequence (i.e., the beginning of the coding sequence), NO otherwise |
near_stop | YES if variant is within the last 5% of the coding sequence (i.e., towards the end of the coding sequence), NO otherwise |
canonical | YES if the 5’ flanking splice site of the exon containing the LoF variant and the 3’ flanking splice site are both canonical, NO otherwise |
XX/XX | <5’ flanking splice site>/<3’ flanking splice site> of the exon containing the LoF variant |
lofposition | Calculated one-indexed coding sequence position of the premature stop |
nmd | YES if the transcript is predicted to be a candidate for nonsense-mediated decay due to the premature stop, NO if not |
lof_anc | YES if the LoF allele is the same as the ancestral allele |
in_segdups | YES if the region containing the variant has at least two other duplicated regions in the genome, NO otherwise |
disorder_prediction | Percentage of amino acid residues that are disordered in the reference sequence : percentage of disordered residues in the peptide lost due to truncation. If the transcript is not associated with disordered regions, “.” is output |
PF | PFAM protein domains. YES if variant intersects PFAM domain, NO if no intersection and NA if no PFAM annotations exist for the particular transcript |
SSF | SSF SCOP superfamily protein domains. YES if variant intersects SCOP domain, NO if no intersection and NA if no SCOP annotation exists for the particular transcript |
SM | SM Smart protein domains. YES if variant intersects region, NO if no intersection and NA if no Smart domain annotation exists for the particular transcript |
Tmhmm | Transmembrane helix domains. YES if variant intersects region, NO if no intersection and NA if transcript does not contain a predicted transmembrane region |
Sigp | Signal peptide. YES if variant intersects region, NO if no intersection and NA if no regions exist for the particular transcript |
PTM | Post-translational modifications. YES if variant intersects any post-translational modification regions, NO if no intersections and NA if no regions exist for the particular transcript |

Tag Name | Description |
---|---|
XX/XX | <acceptor_site>:<donor_site> splice sites in reference genome |
is_canonical | YES if variant intersecting splice site is canonical, NO if not |
other_canonical | YES if the other splice site in the intron (the one not intersected by the variant) is canonical, NO otherwise |
intron_length | Gives the length of the intron containing the splice site variant |
small_intron | YES if the length of the intron is less than 15 bp, NO otherwise |
in_segdups | YES if the variant containing region is duplicated at least twice in the genome, NO otherwise |
lof_anc | YES if the LoF allele is same as the ancestral allele and NO otherwise |
alternate_acceptor_site | YES if there are potential neighboring splice sites that could be an alternate acceptor splice site and NO otherwise. This is the NAGNAG case. |
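
The splice-related tags above can be illustrated with a minimal sketch. It assumes that "canonical" means a `GT` donor and an `AG` acceptor dinucleotide, and it uses the 15 bp threshold from the `small_intron` row. The function name `splice_flags` and its parameters are hypothetical and are not part of the tool.

```python
# Assumed canonical dinucleotides; the tables do not spell these out for introns.
CANONICAL_DONOR = "GT"
CANONICAL_ACCEPTOR = "AG"
SMALL_INTRON_BP = 15  # threshold taken from the small_intron row


def yes_no(flag: bool) -> str:
    """Render a boolean in the YES/NO style used by the tags."""
    return "YES" if flag else "NO"


def splice_flags(donor: str, acceptor: str, hit_site: str, intron_length: int) -> dict:
    """Derive splice tags for a variant hitting the 'donor' or 'acceptor' site."""
    donor_ok = donor.upper() == CANONICAL_DONOR
    acceptor_ok = acceptor.upper() == CANONICAL_ACCEPTOR
    if hit_site == "donor":
        hit_ok, other_ok = donor_ok, acceptor_ok
    elif hit_site == "acceptor":
        hit_ok, other_ok = acceptor_ok, donor_ok
    else:
        raise ValueError("hit_site must be 'donor' or 'acceptor'")
    return {
        "is_canonical": yes_no(hit_ok),
        "other_canonical": yes_no(other_ok),
        "intron_length": intron_length,
        "small_intron": yes_no(intron_length < SMALL_INTRON_BP),
    }
```

For example, `splice_flags("GC", "AG", "donor", 10)` would report a non-canonical affected site, a canonical other site, and a small intron.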

Column | Field | Description |
---|---|---|
8 | details (VAT Features) | Includes all features from VAT’s snpMapper and indelMapper, such as allele frequencies and variant type. This is the “details” column in the tab-delimited output and the first part of the details section of each transcript in the VCF output. |
9 | gene | gene name |
10 | gene_id | Ensembl gene ID |
11 | partial/full | full if all transcripts of the gene are affected, partial otherwise |
12 | transcript | Ensembl transcript ID |
13 | coding_transcript_length | length of the transcript in nucleotides |
14 | longest_coding_transcript | YES if transcript is the longest transcript of the gene, NO otherwise |
25 | GERP_score | GERP score of the variant position |
26 | GERP_element | GERP constrained element region containing the variant, Rejection score (start,end, RS score) |
27 | percentage_gerp_elements_in_truncated_exons | Percentage of exons removed due to truncation that are in GERP constrained elements |
28 | coding_exons_lost:total_exons | number of coding exons lost due to truncation : total number of coding exons in the transcript |
29 | segmental_duplications | Gives the position of associated segdups as a bracketed list, or a period if none exist |
59 | 1000GPhase1 | Yes if variant present in 1000Genomes Phase1 dataset and No if absent |
60 | 1000GPhase1_AF | Global allele frequency |
61 | 1000GPhase1_ASN_AF | Allele frequency for samples from ASN population |
62 | 1000GPhase1_AFR_AF | Allele frequency for samples from AFR population |
63 | 1000GPhase1_EUR_AF | Allele frequency for samples from EUR population |
64 | ESP6500 | YES if allele is present in ESP6500 cohort and N if absent |
65 | ESP6500_AAF | Minor allele frequencies for European Americans, African Americans, All people |
66 | #_pseudogenes_associated_to_transcript | number of pseudogenes or a period if none |
67 | #_paralogs_associated_to_gene | number of paralogs or a period if none |
68 | dN/dS_(macaque) | ratio of missense to synonymous substitution rates computed from human-macaque ortholog alignments |
69 | dN/dS_(mouse) | ratio of missense to synonymous substitution rates computed from human-mouse ortholog alignments |
70 | shortest_path_to_recessive_gene | shortest path to a recessive gene in protein interaction network, or NA |
71 | recessive_neighbors | gives the number of directly connected recessive gene neighbors in the protein-protein interaction network, or NA |
72 | shortest_path_to_dominant_gene | length of the shortest path to a dominant gene in the protein interaction network, or NA |
73 | dominant_neighbors | gives the number of directly connected dominant gene neighbors in the protein-protein interaction network, or NA |
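
Because the column numbers in this table are one-indexed, a small lookup helper can make the tab-delimited output easier to read programmatically. The sketch below is an illustration only: it assumes each data line is a single tab-separated record, and the helper name `pick_columns` and the choice of columns are hypothetical.

```python
# A few one-indexed column numbers copied from the table above.
COMMON_COLUMNS = {
    9: "gene",
    10: "gene_id",
    11: "partial/full",
    12: "transcript",
    25: "GERP_score",
    60: "1000GPhase1_AF",
}


def pick_columns(line: str, columns: dict = COMMON_COLUMNS) -> dict:
    """Return {field name: raw string value} for the requested columns.

    Column numbers are one-indexed, so column N is fields[N - 1].
    Columns beyond the end of the line are silently skipped.
    """
    fields = line.rstrip("\n").split("\t")
    return {
        name: fields[number - 1]
        for number, name in columns.items()
        if number <= len(fields)
    }
```

Values are returned as raw strings, so placeholders such as `.` or `NA` still need to be handled before any numeric conversion.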

Column | Field | Description |
---|---|---|
15 | is_single_coding_exon | YES if the variant intersects a transcript with only one coding exon, otherwise NO |
16 | variant_position_in_CDS | Gives the one-indexed position of the indel/SNP in the coding sequence |
17 | stop_position_in_CDS | Gives the one-indexed position of the premature stop in the coding sequence (for indels, this is the position of the premature stop caused by the frameshift and will differ from the variant position) |
18 | causes_NMD | YES if the LoF variant-containing transcript is predicted to undergo nonsense-mediated decay, otherwise NO. The prediction is based on the proximity of the premature stop to the last exon-exon junction (50 base pairs by default) |
19 | 5’_flanking_splice_site | Gives the upstream 5’ splice site of the exon that the variant intersects |
20 | 3’_flanking_splice_site | Gives the downstream 3’ splice site of the exon that the variant intersects |
21 | canonical_splice_flank | YES if the 5’ flanking splice site is ‘AG’ and the 3’ flanking splice site is ‘GT’, otherwise NO |
22 | ancestral_allele | Gives the nucleotide at the variant position in the ancestral reference genome |
23 | num_of_lof_flags | Number of LoF flags |
24 | lof_flags | List of LoF Flags: in_segdups if LoF variant in region that has two or more segmental duplications in the human genome, lof_anc if LoF allele is same as the ancestral allele, near_start if variant is in the first 5% of the coding transcript, near_stop if variant is in the last 5% |
30 | disorder_residue/disorder_residue2 | Percentage of disordered residues in the protein / percentage of disordered residues in the peptide lost due to the truncating mutation, or “.” if the transcript has no residues predicted to be in disordered regions |
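
The positional flags and the NMD call in this table can be sketched as follows. This is a rough illustration, not the tool's implementation: the boundary handling (inclusive 5%, strictly beyond 95%) is an assumption, and the NMD check assumes the commonly used rule that a premature stop more than 50 bp upstream of the last exon-exon junction triggers decay. The function names and the `last_junction_pos` parameter are hypothetical.

```python
def position_flags(variant_pos: int, cds_length: int) -> list:
    """Return near_start / near_stop flags for a one-indexed CDS position.

    Integer arithmetic avoids floating-point edge cases at the 5% boundaries.
    """
    if not 1 <= variant_pos <= cds_length:
        raise ValueError("variant_pos must lie within the coding sequence")
    flags = []
    if variant_pos * 100 <= 5 * cds_length:   # assumed: first 5%, inclusive
        flags.append("near_start")
    if variant_pos * 100 > 95 * cds_length:   # assumed: strictly in the last 5%
        flags.append("near_stop")
    return flags


def predicts_nmd(stop_pos: int, last_junction_pos: int, distance: int = 50) -> str:
    """YES if the premature stop lies more than `distance` bp upstream of the
    last exon-exon junction (assumed rule), otherwise NO."""
    return "YES" if last_junction_pos - stop_pos > distance else "NO"
```

Under these assumptions, a stop at position 100 with the last junction at 900 would be called `YES`, while a stop at position 880 would be called `NO`.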

For the following features:

- region_id:count is output if variant intersects feature region
- NO_<feature> is output if variant does not intersect feature regions
- NA_<feature> is output if variant’s transcript has no feature regions

Column | Field | Description |
---|---|---|
31 | PF | Determines whether the variant intersects a PFAM domain |
32 | PFtruncated | Determines whether PFAM domains are lost due to truncation |
33 | SSF | Determines whether the variant intersects a SCOP domain |
34 | SSFtruncated | Determines whether SCOP domains are lost due to truncation |
35 | SM | Determines whether the variant intersects a SMART domain |
36 | SMtruncated | Determines whether SMART domains are lost due to truncation |
37 | Tmhmm | Determines whether the variant intersects a transmembrane segment |
38 | Tmhmmtruncated | Determines whether transmembrane domains are lost due to truncation |
39 | Sigp | Determines whether the variant intersects a signal peptide |
40 | Sigptruncated | Determines whether signal peptide is lost due to truncation |
41 | ACETYLATION | |
42 | ACETYLATIONtruncated | |
43 | DI-METHYLATION | |
44 | DI-METHYLATIONtruncated | |
45 | METHYLATION | |
46 | METHYLATIONtruncated | |
47 | MONO-METHYLATION | Columns 41-58 pertain to various post-translational modification sites |
48 | MONO-METHYLATIONtruncated | |
49 | O-GlcNAc | |
50 | O-GlcNActruncated | |
51 | PHOSPHORYLATION | |
52 | PHOSPHORYLATIONtruncated | |
53 | SUMOYLATION | |
54 | SUMOYLATIONtruncated | |
55 | TRI-METHYLATION | |
56 | TRI-METHYLATIONtruncated | |
57 | UBIQUITINATION | |
58 | UBIQUITINATIONtruncated | |
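
The three value forms used by the feature columns (`region_id:count`, `NO_<feature>`, `NA_<feature>`) can be told apart with a small parser. The sketch below assumes a cell holds a single `region_id:count` entry and that the count is a non-negative integer. The function name and the returned dictionary keys are hypothetical.

```python
def parse_feature_value(value: str, feature: str) -> dict:
    """Classify one feature cell, e.g. for feature 'PF' or 'Tmhmm'."""
    if value == f"NO_{feature}":
        # Transcript has feature regions, but the variant misses them.
        return {"status": "no_intersection"}
    if value == f"NA_{feature}":
        # Transcript has no regions of this feature type at all.
        return {"status": "no_annotation"}
    # Otherwise expect region_id:count; split on the last colon in case
    # the region identifier itself contains one.
    region_id, sep, count = value.rpartition(":")
    if not sep or not region_id or not count.isdigit():
        raise ValueError(f"unrecognised value for {feature}: {value!r}")
    return {"status": "intersects", "region_id": region_id, "count": int(count)}
```

The same function can be applied to the `truncated` columns by passing the full column name as `feature`, assuming they follow the same three forms.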

Column | Field | Description |
---|---|---|
15 | donor | Gives the nucleotide sequence of the donor splice site |
16 | acceptor | Gives the nucleotide sequence of the acceptor splice site |
17 | SNP_in_canonical_site | Determines whether the splice site that the SNP affects is canonical (YES/NO) |
18 | other_splice_site_canonical | Determines whether the other splice site is canonical (YES/NO) |
19 | SNP_location | Determines whether the SNP affects the donor or acceptor |
20 | alt_donor | Gives the nucleotide sequence of the donor splice site after the SNP change has been made |
21 | alt_acceptor | Gives the nucleotide sequence of the acceptor splice site after the SNP change has been made |
22 | nagnag_positions | Lists possible canonical splice sites near the SNP location that could act as alternative splice sites |
23 | intron_length | Gives the length of the intron containing the splice SNP |
24 | num_of_lof_flags | Number of LoF flags |
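
The `alt_donor` and `alt_acceptor` columns describe the splice-site sequence after the SNP has been applied. A minimal sketch of that substitution is shown below. The function name `apply_snp` and the zero-based `offset` parameter are hypothetical, and strand handling is deliberately left out.

```python
def apply_snp(site: str, offset: int, alt: str) -> str:
    """Return the splice-site sequence with the base at `offset` replaced.

    `offset` is zero-based within the site sequence; `alt` is a single base.
    """
    if len(alt) != 1:
        raise ValueError("alt must be a single nucleotide")
    if not 0 <= offset < len(site):
        raise ValueError("offset falls outside the splice site")
    return site[:offset] + alt.upper() + site[offset + 1:]
```

For instance, `apply_snp("GT", 0, "A")` yields `AT`, which would no longer match a `GT` donor.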