All LoF Variants

Tag Name Description
AA, AF, AMR_AF, ASN_AF, AFR_AF, EUR_AF, VT, SNPSOURCE, AC, AN, AVGPOST, ERATE, LDAF, RSQ, THETA, VA Annotations from VAT
Ancestral Indicates whether the ancestral allele matches the reference allele, the alternate allele, or neither (Ref, Alt, or Neither)
GERPscore Gives the GERP score associated with the variant position
SegDup Gives the number of segmental duplications associated with the variant position
1000GPhase1 YES if variant present in 1000Genomes Phase1 dataset and NO if absent
1000GPhase1_AF Global allele frequency
1000GPhase1_ASN_AF Allele frequency for samples from ASN population
1000GPhase1_AFR_AF Allele frequency for samples from AFR population
1000GPhase1_EUR_AF Allele frequency for samples from EUR population
ESP6500 YES if allele is present in ESP6500 cohort and N if absent
ESP6500_AAF Minor allele frequencies for European Americans, African Americans, All people
GERPelement YES if variant/transcript is within a constrained GERP element and NO otherwise
exoncounts Number of coding exons lost due to the truncating variant: total number of coding exons in the transcript. For splice site variants the number of exons removed is replaced with NA

Premature Stop and Frameshift-Causing Indel Variants

Tag Name Description
near_start YES if variant is within the first 5% of the coding sequence (i.e., near the beginning of the coding sequence), NO otherwise
near_stop YES if variant is within the last 5% of the coding sequence (i.e., near the end of the coding sequence), NO otherwise
canonical YES if the 5’ flanking splice site of the exon containing the LoF variant and the 3’ flanking splice site are both canonical, NO otherwise
XX/XX <5’ flanking splice site>/<3’ flanking splice site> of the exon containing the LoF variant
lofposition Calculated one-indexed position of the premature stop in the coding sequence
nmd YES if the transcript is predicted to be a candidate for nonsense mediated decay due to the premature stop, NO if not
lof_anc YES if the LoF allele is same as the ancestral allele
in_segdups YES if the region containing the variant has at least two other duplicated regions in the genome, NO otherwise
disorder_prediction Percentage of amino acid residues that are disordered in the reference sequence : percentage of disordered residues in the peptide lost due to truncation. If the transcript is not associated with disordered regions, “.” is output
PF PFAM protein domains. YES if variant intersects PFAM domain, NO if no intersection and NA if no PFAM annotations exist for the particular transcript
SSF SCOP superfamily protein domains. YES if variant intersects SCOP domain, NO if no intersection and NA if no SCOP annotation exists for the particular transcript
SM SMART protein domains. YES if variant intersects region, NO if no intersection and NA if no SMART domain annotation exists for the particular transcript
Tmhmm Transmembrane helix domains. YES if variant intersects region, NO if no intersection and NA if transcript does not contain a predicted transmembrane region
Sigp Signal peptide. YES if variant intersects region, NO if no intersection and NA if no regions exist for the particular transcript
PTM Post translational modifications. YES if variant intersects any post translational modification regions, NO if no intersections and NA if no regions exist for the particular transcript
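
The near_start and near_stop tags follow directly from the 5% rule given above. A minimal sketch of that rule, assuming the one-indexed position and the coding sequence length are already known; the function name `position_flags` and the inclusive treatment of the boundary are assumptions made for illustration, not taken from the tool:

```python
def position_flags(lof_position: int, cds_length: int, fraction: float = 0.05) -> dict:
    """Return near_start / near_stop as YES/NO for a one-indexed CDS position.

    Hypothetical helper: the 5% cutoff comes from the tag descriptions,
    but whether the boundary itself counts is an assumption here.
    """
    if not 1 <= lof_position <= cds_length:
        raise ValueError("position must lie within the coding sequence")
    relative = lof_position / cds_length
    return {
        "near_start": "YES" if relative <= fraction else "NO",
        "near_stop": "YES" if relative >= 1 - fraction else "NO",
    }


if __name__ == "__main__":
    # A stop at position 10 of a 1000 nt coding sequence lies in the first 5%.
    print(position_flags(10, 1000))
```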

Splice Variants

Tag Name Description
XX/XX <acceptor_site>:<donor_site> splice sites in reference genome
is_canonical YES if variant intersecting splice site is canonical, NO if not
other_canonical YES if the other splice site in the intron (the one not intersected by the variant) is canonical, NO otherwise
intron_length Gives the length of the intron containing the splice site variant
small_intron YES if the intron is shorter than 15 bp, NO otherwise
in_segdups YES if the variant containing region is duplicated at least twice in the genome, NO otherwise
lof_anc YES if the LoF allele is same as the ancestral allele and NO otherwise
alternate_acceptor_site YES if there are potential neighboring splice sites that could be an alternate acceptor splice site and NO otherwise. This is the NAGNAG case.
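
The splice tags above can be illustrated with a short sketch. It assumes the canonical donor and acceptor dinucleotides are GT and AG and uses the 15 bp cutoff stated for small_intron; the function name `splice_flags` and its arguments are hypothetical and only mirror the tag names:

```python
CANONICAL_DONOR = "GT"      # canonical dinucleotide at the donor site
CANONICAL_ACCEPTOR = "AG"   # canonical dinucleotide at the acceptor site
SMALL_INTRON_BP = 15        # cutoff taken from the small_intron description


def splice_flags(donor: str, acceptor: str, variant_site: str, intron_length: int) -> dict:
    """Derive is_canonical, other_canonical and small_intron as YES/NO.

    variant_site names the splice site hit by the variant: "donor" or "acceptor".
    """
    donor_ok = donor.upper() == CANONICAL_DONOR
    acceptor_ok = acceptor.upper() == CANONICAL_ACCEPTOR
    if variant_site == "donor":
        hit, other = donor_ok, acceptor_ok
    elif variant_site == "acceptor":
        hit, other = acceptor_ok, donor_ok
    else:
        raise ValueError("variant_site must be 'donor' or 'acceptor'")

    def yes_no(flag: bool) -> str:
        return "YES" if flag else "NO"

    return {
        "is_canonical": yes_no(hit),
        "other_canonical": yes_no(other),
        "intron_length": intron_length,
        "small_intron": yes_no(intron_length < SMALL_INTRON_BP),
    }


if __name__ == "__main__":
    print(splice_flags("GT", "AG", "donor", 1200))
```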

All LoF Variants

Column Field Description
8 details (VAT Features) includes all features from VAT’s snpMapper and indelMapper, such as allele frequencies and variant type. This is the “details” column in the tab-delimited output and the first part of the details section of each transcript in the VCF output.
9 gene gene name
10 gene_id Ensembl gene ID
11 partial/full full if all transcripts of the gene are affected, partial otherwise
12 transcript Ensembl transcript ID
13 coding_transcript_length length of the transcript in nucleotides
14 longest_coding_transcript YES if transcript is the longest transcript of the gene, NO otherwise
25 GERP_score GERP score of the variant position
26 GERP_element GERP constrained element region containing the variant, Rejection score (start,end, RS score)
27 percentage_gerp_elements_in_truncated_exons Percentage of exons removed due to truncation that are in GERP constrained elements
28 coding_exons_lost:total_exons number of coding exons lost due to truncation : total number of coding exons in the transcript
29 segmental_duplications Gives the position of associated segdups as a bracketed list, or a period if none exist
59 1000GPhase1 Yes if variant present in 1000Genomes Phase1 dataset and No if absent
60 1000GPhase1_AF Global allele frequency
61 1000GPhase1_ASN_AF Allele frequency for samples from ASN population
62 1000GPhase1_AFR_AF Allele frequency for samples from AFR population
63 1000GPhase1_EUR_AF Allele frequency for samples from EUR population
64 ESP6500 YES if allele is present in ESP6500 cohort and N if absent
65 ESP6500_AAF Minor allele frequencies for European Americans, African Americans, All people
66 #_pseudogenes_associated_to_transcript number of pseudogenes or a period if none
67 #_paralogs_associated_to_gene number of paralogs or a period if none
68 dN/dS_(macaque) ratio of missense to synonymous substitution rates computed from human-macaque ortholog alignments
69 dN/dS_(mouse) ratio of missense to synonymous substitution rates computed from human-mouse ortholog alignments
70 shortest_path_to_recessive_gene shortest path to a recessive gene in protein interaction network, or NA
71 recessive_neighbors gives the number of directly connected recessive gene neighbors in the protein-protein interaction network, or NA
72 shortest_path_to_dominant_gene shortest path to a dominant gene in protein interaction network, or NA
73 dominant_neighbors gives the number of directly connected dominant gene neighbors in the protein-protein interaction network, or NA
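
Because the table above identifies fields by one-indexed column number, a reader of the tab-delimited output can select fields by number. The following is a minimal sketch under that assumption; the helper names `pick_columns` and `parse_exon_counts` are hypothetical, and only a few of the columns listed above are included in the mapping:

```python
# One-indexed column numbers copied from the table above (subset only).
ALL_LOF_COLUMNS = {
    8: "details",
    9: "gene",
    10: "gene_id",
    11: "partial/full",
    12: "transcript",
    13: "coding_transcript_length",
    14: "longest_coding_transcript",
    25: "GERP_score",
    28: "coding_exons_lost:total_exons",
}


def pick_columns(line: str, columns: dict = ALL_LOF_COLUMNS) -> dict:
    """Split one tab-delimited line and return the selected fields by name."""
    fields = line.rstrip("\n").split("\t")
    return {
        name: fields[number - 1]          # convert one-indexed to zero-indexed
        for number, name in columns.items()
        if number <= len(fields)
    }


def parse_exon_counts(value: str):
    """Parse 'lost:total'; a lost count of NA (assumed possible) becomes None."""
    lost, total = (part.strip() for part in value.split(":"))
    return (None if lost == "NA" else int(lost), int(total))


if __name__ == "__main__":
    # Build a made-up 73-column line; the values are placeholders, not real output.
    fields = [f"c{i}" for i in range(1, 74)]
    fields[8] = "GENE1"      # column 9: gene
    fields[27] = "3:10"      # column 28: coding_exons_lost:total_exons
    row = pick_columns("\t".join(fields) + "\n")
    print(row["gene"], parse_exon_counts(row["coding_exons_lost:total_exons"]))
```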

Premature Stop and Frameshift-Causing Indel Variants

Column Field Description
15 is_single_coding_exon YES if the variant intersects a transcript with only one coding exon, otherwise NO
16 variant_position_in_CDS Gives the one indexed position of the indel/SNP in the coding sequence
17 stop_position_in_CDS Gives the one indexed position of the premature stop in the coding sequence (note that in the case of indels, this provides the position of the premature stop due to the frameshift and will be different from the variant position)
18 causes_NMD YES if the LoF variant-containing transcript is predicted to undergo nonsense mediated decay, otherwise NO. The prediction is based on the proximity of the premature stop to the last exon-exon junction (default threshold: 50 base pairs)
19 5’_flanking_splice_site Gives the upstream 5’ splice site of the exon that the variant intersects
20 3’_flanking_splice_site Gives the downstream 3’ splice site of the exon that the variant intersects
21 canonical_splice_flank YES if the 5’ flanking splice site is ‘AG’ and the 3’ flanking splice site is ‘GT’, otherwise NO
22 ancestral_allele Gives the nucleotide at the variant position in the ancestral reference genome
23 num_of_lof_flags Number of LoF flags
24 lof_flags List of LoF Flags: in_segdups if LoF variant in region that has two or more segmental duplications in the human genome, lof_anc if LoF allele is same as the ancestral allele, near_start if variant is in the first 5% of the coding transcript, near_stop if variant is in the last 5%
30 disorder_residue/disorder_residue2 Percentage of disordered residues in the protein / percentage of disordered residues in the peptide lost due to the truncating mutation, or “.” if the transcript has no residues predicted to be in disordered regions
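
The causes_NMD column depends on the distance between the premature stop and the last exon-exon junction. The sketch below shows one common reading of such a rule, in which a stop lying more than the threshold upstream of the last junction is predicted to trigger decay. That direction, the handling of single-exon transcripts, and the function name `predicts_nmd` are all assumptions for illustration; only the 50 base pair default comes from the description above.

```python
from typing import Optional


def predicts_nmd(stop_position: int,
                 last_junction_position: Optional[int],
                 threshold: int = 50) -> str:
    """Return YES/NO for a simplified distance-based NMD prediction.

    Both positions are one-indexed coordinates on the same spliced transcript.
    last_junction_position is None when the transcript has a single coding
    exon (no exon-exon junction); this sketch assumes NO in that case.
    """
    if last_junction_position is None:
        return "NO"
    distance_upstream = last_junction_position - stop_position
    return "YES" if distance_upstream > threshold else "NO"


if __name__ == "__main__":
    # Stop 300 nt upstream of the last junction: predicted NMD candidate.
    print(predicts_nmd(100, 400))
```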

Protein Families & Post Translational Modifications

For the following features:

region_id:count is output if variant intersects feature region

NO_<feature> is output if variant does not intersect feature regions

NA_<feature> is output if variant’s transcript has no feature regions

Column Field Description
31 PF Determines whether the variant intersects a PFAM domain
32 PFtruncated Determines whether PFAM domains are lost due to truncation
33 SSF Determines whether the variant intersects a SCOP domain
34 SSFtruncated Determines whether SCOP domains are lost due to truncation
35 SM Determines whether the variant intersects a SMART domain
36 SMtruncated Determines whether SMART domains are lost due to truncation
37 Tmhmm Determines whether the variant intersects a transmembrane segment
38 Tmhmmtruncated Determines whether transmembrane domains are lost due to truncation
39 Sigp Determines whether the variant intersects a signal peptide
40 Sigptruncated Determines whether signal peptide is lost due to truncation
41 ACETYLATION Columns 41-58 pertain to various post-translational modification sites
42 ACETYLATIONtruncated
43 DI-METHYLATION
44 DI-METHYLATIONtruncated
45 METHYLATION
46 METHYLATIONtruncated
47 MONO-METHYLATION
48 MONO-METHYLATIONtruncated
49 O-GlcNAc
50 O-GlcNActruncated
51 PHOSPHORYLATION
52 PHOSPHORYLATIONtruncated
53 SUMOYLATION
54 SUMOYLATIONtruncated
55 TRI-METHYLATION
56 TRI-METHYLATIONtruncated
57 UBIQUITINATION
58 UBIQUITINATIONtruncated
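
The three output forms described for these feature columns (region_id:count, NO_<feature>, NA_<feature>) can be told apart with a small parser. This is a sketch only: it assumes that <feature> is replaced by the column name (for example NO_PF), that count is an integer, and the name `parse_feature` and the example value are made up for illustration.

```python
def parse_feature(value: str, feature: str) -> dict:
    """Classify one feature column value into one of the three documented forms."""
    if value == f"NO_{feature}":
        # Feature regions exist for the transcript, but the variant misses them.
        return {"status": "no_intersection"}
    if value == f"NA_{feature}":
        # The transcript has no regions of this feature type at all.
        return {"status": "no_annotation"}
    region_id, separator, count = value.rpartition(":")
    if not separator:
        raise ValueError(f"unrecognised value for {feature}: {value!r}")
    return {"status": "intersects", "region_id": region_id, "count": int(count)}


if __name__ == "__main__":
    # "PF00069:1" is an illustrative value, not taken from real output.
    print(parse_feature("PF00069:1", "PF"))
    print(parse_feature("NO_PF", "PF"))
```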

Splice SNPs

Column Field Description
15 donor Gives the nucleotide sequence of the donor splice site
16 acceptor Gives the nucleotide sequence of the acceptor splice site
17 SNP_in_canonical_site Determines whether the splice site that the SNP affects is canonical (YES/NO)
18 other_splice_site_canonical Determines whether the other splice site is canonical (YES/NO)
19 SNP_location Determines whether the SNP affects the donor or acceptor
20 alt_donor Gives the nucleotide sequence of the donor splice site after the SNP change has been made
21 alt_acceptor Gives the nucleotide sequence of the acceptor splice site after the SNP change has been made
22 nagnag_positions Gives possible canonical splice sites near the SNP location that could serve as alternative splice sites
23 intron_length Gives the length of the intron containing the splice SNP
24 num_of_lof_flags Number of LoF flags
