Variant Annotation

This page lists the currently used sources for the annotation process.

Genomatix sources

Source Description Version
ElDorado Genomatix Genome Annotation Oct 2016
ElDorado is the Genomatix genome annotation. Information from a variety of sources, together with data generated by Genomatix's proprietary algorithms, is used to build a database of quality-checked data.
Genomatix GmbH

  • Ensembl Human Genome Annotation, Release 86, European Bioinformatics Institute (EBI)
  • NCBI Homo sapiens Annotation (NCBI RefSeq), Release 108, National Center for Biotechnology Information (NCBI)

Genomatix Genomatix Variant Annotation
Genomatix GmbH

MatBase Transcription factor knowledge base 9.3
MatBase is a database containing information on transcription factors and the corresponding weight matrices used by MatInspector to locate potential binding sites of these transcription factors in DNA sequences.
Genomatix GmbH

Cartharius K, Frech K, Grote K, et al. MatInspector and beyond: promoter analysis based on transcription factor binding sites. Bioinformatics. 21(13), 2933-2942 (2005).
Quandt K, Frech K, Karas H, Wingender E, Werner T. MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Research. 23(23), 4878-4884 (1995).

LitInspector Genomatix Literature and Pathway Mining Nov 2016
LitInspector performs large scale text mining on more than 20 million PubMed entries.
Genomatix GmbH

Frisch M, Klocke B, Haltmeier M, Frech K. LitInspector: literature and signal transduction pathway mining in PubMed abstracts. Nucleic Acids Research. 37(suppl 2), W135-W140 (2009).

Panels Popular gene panels 2016
This compilation of gene lists covers sequencing panels from external providers and vendors.
Genomatix GmbH

  • American College of Medical Genetics and Genomics (ACMG), 2013
  • Clinical Genome Resource (ClinGen), 2016
  • Catalogue Of Somatic Mutations In Cancer (COSMIC), Release 74, Wellcome Trust Sanger Institute
  • xGen Lockdown Panels, 2016, Integrated DNA Technologies, Inc.
  • TruSight Sequencing Panels, 2016, Illumina, Inc.
  • Ion AmpliSeq Panels (AmpliSeq), 2016, Thermo Fisher Scientific Inc.
  • GeneRead DNAseq Targeted Panels, 2016, QIAGEN GmbH
  • NimbleGen SeqCap EZ Designs (SeqCap), 2016, Roche Sequencing

Thesaurus Genomatix Thesaurus 2016
This reference is a combined thesaurus based on MeSH and NCIt.
Genomatix GmbH

  • Medical Subject Headings (MeSH), 2016, National Library of Medicine (NLM), National Institutes of Health (NIH)
  • Unified Medical Language System (UMLS), 2016AA, National Library of Medicine (NLM), National Institutes of Health (NIH)

GePS Genomatix Pathway System May 2013
The Genomatix Pathway System (GePS) uses information extracted from public and proprietary databases to display canonical pathways or to create and extend networks based on literature data.
Genomatix GmbH

  • BioCarta Proteomic Pathway Project, Jun 2004, BioCarta LLC
  • The Cancer Cell Map (CellMap), May 2006, Memorial Sloan-Kettering Cancer Center
  • INOH Pathway Database, Mar 2011
  • NCI-Nature Curated Pathways, Aug 2012, Nature Publishing Group
  • Pathway Interaction Database (PID), Aug 2012, National Cancer Institute (NCI)
  • Reactome, Dec 2007

External sources

Source Description Version
dbSNP Database of Single Nucleotide Polymorphisms Build 147
The NCBI Short Genetic Variations (SNV) database, also known as dbSNP, catalogs short variations in nucleotide sequences from a wide range of organisms.
National Center for Biotechnology Information (NCBI), National Library of Medicine

Sherry ST, Ward MH, Kholodov M, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Research. 29(1), 308-311 (2001).

1000 Genomes Project Aug 2015
The 1000 Genomes Project is an international collaboration to produce an extensive public catalog of human genetic variation, including SNPs and structural variants, and their haplotype contexts. This resource will support genome-wide association studies and other medical research studies.
1000 Genomes Project Consortium

Auton A, Brooks LD, Durbin RM, et al. A global reference for human genetic variation. Nature. 526(7571), 68-74 (2015).
Abecasis GR, Auton A, Brooks LD, et al. An integrated map of genetic variation from 1,092 human genomes. Nature. 491(7422), 56-65 (2012).

ESP6500 Grand Opportunity Exome Sequencing Project Aug 2014
The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of next-generation sequencing of the protein-coding regions of the human genome across diverse, richly phenotyped populations. The project shares these datasets and findings with the scientific community to extend and enrich the diagnosis, management and treatment of heart, lung and blood disorders.
National Heart, Lung, and Blood Institute (NHLBI), National Institutes of Health (NIH)

ExAC Exome Aggregation Consortium Release 0.3.1
The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a variety of large-scale sequencing projects, and to make summary data available for the wider scientific community.
Exome Aggregation Consortium (ExAC), Cambridge, MA

Lek M, Karczewski KJ, Minikel EV, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 536(7616), 285-291 (2016).

BLOSUM Blocks Substitution Matrix

Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences. 89(22), 10915-10919 (1992).

SIFT Sorting Intolerant From Tolerant Nov 2014
SIFT is a sequence homology-based tool that sorts intolerant from tolerant amino acid substitutions and predicts whether an amino acid substitution in a protein will have a phenotypic effect. Pre-calculated SIFT scores using SIFT version 5.2.2 were downloaded from Ensembl.
Ensembl, EMBL-EBI

Cunningham F, Amode MR, Barrell D, et al. Ensembl 2015. Nucleic Acids Research. 43(D1), D662-D669 (2014).
Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols. 4(8), 1073-1081 (2009).
Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Research. 31(13), 3812-3814 (2003).

PolyPhen Polymorphism Phenotyping 2.2.2
PolyPhen-2 (Polymorphism Phenotyping v2) is a tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. Pre-calculated PolyPhen scores using PolyPhen-2 version 2.2.2 were downloaded from Ensembl.
Ensembl, EMBL-EBI

Liu X, Jian X, Boerwinkle E. dbNSFP v2.0: A Database of Human Non-synonymous SNVs and Their Functional Predictions and Annotations. Human Mutation. Epub ahead of print (2013).
Adzhubei IA, Schmidt S, Peshkin L, et al. A method and server for predicting damaging missense mutations. Nature Methods. 7(4), 248-249 (2010).

PhyloP Nov 2009
Computation of p-values for conservation or acceleration, either lineage-specific or across all branches.
UC Santa Cruz

Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Research. 20(1), 110-121 (2010).
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. The Human Genome Browser at UCSC. Genome Research. 12(6), 996-1006 (2002).
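
To illustrate how a p-value for conservation or acceleration can be turned into a single signed score, the following minimal sketch assumes the convention used for phyloP tracks in the UCSC Genome Browser: the magnitude is -log10 of the p-value, positive for conservation and negative for acceleration. The function name is hypothetical and not part of any Genomatix or UCSC tool.

```python
import math


def phylop_style_score(p_value: float, conserved: bool) -> float:
    """Return a signed -log10(p) score (hypothetical helper).

    Positive values indicate conservation, negative values acceleration.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in the interval (0, 1]")
    magnitude = -math.log10(p_value)
    return magnitude if conserved else -magnitude


# A site with p = 0.001 for conservation scores about +3,
# a site with p = 0.01 for acceleration scores about -2.
conserved_score = phylop_style_score(0.001, conserved=True)
accelerated_score = phylop_style_score(0.01, conserved=False)
```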

GERP Genomic Evolutionary Rate Profiling Dec 2011
GERP identifies constrained elements in multiple alignments by quantifying substitution deficits.
Stanford University

Davydov EV, Goode DL, Sirota M, Cooper GM, Sidow A, Batzoglou S. Identifying a High Fraction of the Human Genome to be under Selective Constraint Using GERP++. PLoS Computational Biology. 6(12), e1001025 (2010).
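
The "substitution deficit" idea can be sketched as a per-column "rejected substitutions" (RS) value: the number of substitutions expected under neutral evolution minus the number actually observed. The sketch below is a simplified illustration, not the GERP++ implementation; the function name and the example values are made up for demonstration.

```python
def rejected_substitutions(expected: float, observed: float) -> float:
    """Substitution deficit for one alignment column (hypothetical helper).

    Positive values mean fewer substitutions than expected under
    neutrality, which suggests evolutionary constraint.
    """
    if expected < 0 or observed < 0:
        raise ValueError("substitution counts must be non-negative")
    return expected - observed


# Illustrative (expected, observed) pairs for three alignment columns.
columns = [(3.0, 0.5), (3.0, 3.2), (2.5, 0.0)]
rs_scores = [rejected_substitutions(e, o) for e, o in columns]
# Columns with a positive deficit are candidates for constraint.
constrained = [score > 0 for score in rs_scores]
```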

29 Mammals Project Oct 2011
The mammalian genome project is an NIH-funded effort to expand the current genome coverage of the mammals (human, chimpanzee, mouse, dog, opossum) by sequencing 24 additional mammals to low coverage (2x).
Broad Institute

Lindblad-Toh K, Garber M, Zuk O, et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature. 478(7370), 476-482 (2011).
Garber M, Guttman M, Clamp M, Zody MC, Friedman N, Xie X. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics. 25(12), i54-i62 (2009).

Ensembl Regulatory Build Release 80
The Ensembl Regulatory Build provides a genome-wide set of regions that are likely to be involved in gene regulation.
European Bioinformatics Institute (EBI)

Zerbino DR, Wilder SP, Johnson N, Juettemann T, Flicek PR. The Ensembl Regulatory Build. Genome Biology. 16(1), 56 (2015).

GTR Genetic Testing Registry Nov 2016
The Genetic Testing Registry (GTR) provides a central location for voluntary submission of genetic test information by providers.
National Center for Biotechnology Information (NCBI), National Library of Medicine

Rubinstein WS, Maglott DR, Lee JM, et al. The NIH genetic testing registry: a new, centralized database of genetic tests to enable access to comprehensive information and improve transparency. Nucleic Acids Research. 41(D1), D925 (2013).

ClinVar Nov 2016
ClinVar is designed to provide a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence.
National Center for Biotechnology Information (NCBI)

Landrum MJ, Lee JM, Riley GR, et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Research. gkt1113 (2013).

ClinVar and OMIM Comprehensive gene-based disease database from ClinVar and OMIM Nov 2016
This product contains information from the Online Mendelian Inheritance in Man® (OMIM®) database, which has been obtained under a license from the Johns Hopkins University. This product does not represent the entire, unmodified OMIM® database, which is available in its entirety at http://omim.org/downloads. OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes. Copyright © 1966 – 2015, Johns Hopkins University. All rights reserved.
National Center for Biotechnology Information (NCBI)

  • ClinVar, Nov 2016, National Center for Biotechnology Information (NCBI)
  • Online Mendelian Inheritance in Man (OMIM), Nov 2016, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (JHU)

COSMIC Catalogue Of Somatic Mutations In Cancer Release 77
COSMIC is designed to store and display somatic mutation information and related details and contains information relating to human cancers.
Wellcome Trust Sanger Institute

Forbes SA, Bindal N, Bamford S, et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Research. 39(suppl 1), D945-D950 (2011).

GO Gene Ontology Nov 2016
The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases.
GO Consortium

Ashburner M, Ball CA, Blake JA, et al. Gene Ontology: tool for the unification of biology. Nature Genetics. 25(1), 25-29 (2000).

HGVS Recommendations for the description of sequence variants 2.0
Human Genome Variation Society

Den Dunnen JT, Antonarakis SE. Mutation nomenclature extensions and suggestions to describe complex mutations: A discussion. Human Mutation. 15(1), 7-12 (2000).
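
As a small illustration of the nomenclature, the sketch below formats a simple single-nucleotide substitution in the HGVS pattern reference:type.positionRef>Alt. It is a minimal example covering substitutions only; the function name is hypothetical, and deletions, insertions and other variant types are not handled.

```python
def hgvs_substitution(reference: str, position: int, ref: str, alt: str,
                      coord_type: str = "c") -> str:
    """Format a single-nucleotide substitution in HGVS style (hypothetical helper).

    coord_type is the coordinate prefix, e.g. "c" for coding DNA
    or "g" for genomic positions.
    """
    bases = {"A", "C", "G", "T"}
    if ref not in bases or alt not in bases:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt must differ for a substitution")
    return f"{reference}:{coord_type}.{position}{ref}>{alt}"


# Example: a C-to-T change at coding position 4375 of a transcript.
description = hgvs_substitution("NM_004006.2", 4375, "C", "T")
print(description)  # NM_004006.2:c.4375C>T
```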

Ensembl Ensembl Human Genome Annotation Release 86
The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.
European Bioinformatics Institute (EBI)

Cunningham F, Amode MR, Barrell D, et al. Ensembl 2015. Nucleic Acids Research. 43(D1), D662-D669 (2014).

OMIM Online Mendelian Inheritance in Man Nov 2016
This product contains information from the Online Mendelian Inheritance in Man® (OMIM®) database, which has been obtained under a license from the Johns Hopkins University. This product does not represent the entire, unmodified OMIM® database, which is available in its entirety at http://omim.org/downloads. OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes. Copyright © 1966 – 2015, Johns Hopkins University. All rights reserved.
McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (JHU)

Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Research. 33(suppl 1), D514-D517 (2005).