Tutorial

Introduction

GeneGrid enables you to quickly reduce millions of variants to a few relevant ones, or even a single one. All known and novel SNVs and InDels in your results can be annotated using our extensive annotation. This tutorial describes step by step how to annotate the variants and filter the list for those of interest to you. You will perform a trio analysis and identify the most likely disease-causing variants. This tutorial should take less than 15 minutes.

Trio Analysis Workflow

This example will show you how GeneGrid can be applied to a trio analysis that looks for the most likely disease-causing variants. The data in this example comes from a whole exome sequencing study, originally published by Falk et al. [1]. The authors found a novel homozygous missense mutation (M1=NMNAT1 c.25G>A, p.Val9Met) in NMNAT1 that likely causes Leber congenital amaurosis (LCA), a form of inherited retinal degeneration characterized by severe vision loss or blindness.

The following figure shows a large consanguineous Pakistani pedigree, including five children affected with LCA. Exome sequencing also confirmed the presence of a second homozygous mutation (M2=GJB2 c.71G>A, p.Trp24*) in GJB2 in children who were affected with deafness.

Pedigree

Source: Falk et al. [1]

Exome sequence data for the following individuals is available at the NCBI Sequence Read Archive, accession SRP013517:

  • III-4 mother (M1/+, M2/+)
  • III-5 father (M1/+, M2/+)
  • IV-1 daughter (M1/M1, M2/M2)
  • IV-2 son (+/+, M2/M2)
  • IV-3 son (M1/M1, +/+)
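
The genotypes listed above are what we would expect for recessive alleles inherited from two carrier parents. As a minimal sketch (the function name and the encoding of genotypes as tuples of allele labels are assumptions made for illustration and are not part of GeneGrid), the possible offspring genotypes for one locus can be enumerated like this:

```python
from itertools import product

def possible_offspring(mother, father):
    """Return the set of genotypes a child can inherit, one allele from each parent.

    Genotypes are tuples of two allele labels, e.g. ("M1", "+").
    Each result is sorted so that ("M1", "+") and ("+", "M1") count as one genotype.
    """
    return {tuple(sorted(pair)) for pair in product(mother, father)}

# Both parents are heterozygous carriers of M1 (M1/+).
mother = ("M1", "+")
father = ("M1", "+")
children = possible_offspring(mother, father)
```

The resulting set contains the homozygous mutant, heterozygous and homozygous reference genotypes, which covers M1/M1 (IV-1, IV-3) and +/+ (IV-2) from the list.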

Sequence reads of the individuals IV-1, III-4 and III-5 were mapped to the human reference genome (NCBI build 37 / hg19) using the Genomatix Mining Station (GMS) [2]. SAMtools version 1.0 [3] was used to call SNVs and InDels jointly with all three samples. The output format of the variant calling step is VCF which stands for Variant Call Format. It contains all the genomic positions of the variants and the genotypes of the samples and is the required input format to get started with GeneGrid.
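
To give an idea of what a VCF data line holds, here is a small sketch that splits one tab-separated line into its fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT) and the per-sample genotype columns that follow. The function name, the sample order and the example record are assumptions for illustration. Only the position and the G>A change are taken from this tutorial; all other values in the line are invented.

```python
def parse_vcf_line(line, sample_names):
    """Split one VCF data line into its fixed fields and per-sample genotypes."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    # The FORMAT column (index 8) names the colon-separated keys of each sample column.
    format_keys = fields[8].split(":")
    genotypes = {}
    for name, column in zip(sample_names, fields[9:]):
        values = dict(zip(format_keys, column.split(":")))
        genotypes[name] = values.get("GT", "./.")
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "genotypes": genotypes}

# Illustrative record only: QUAL, INFO and depth values are invented.
samples = ["IV-1", "III-4", "III-5"]
line = "1\t10032156\t.\tG\tA\t100\t.\tDP=120\tGT:DP\t1/1:40\t0/1:38\t0/1:42"
record = parse_vcf_line(line, samples)
```

Here `record["genotypes"]` maps each sample name to its GT string, where `1/1` is a homozygous call and `0/1` a heterozygous call.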

1. Import samples

The first step in using GeneGrid is to import and annotate the predicted variants from the VCF file. After you log in, the samples view can be accessed by selecting Variant Annotation either from the Welcome page or from the main menu at the top of the page.

variant_annotation

Most accounts come preloaded with the tutorial samples, in which case you'll see the three samples IV-1, III-4 and III-5 in the right view. If so, you can skip this section and continue directly with Compare samples.

The final VCF file, containing the predicted variants (SNPs, small insertions and deletions) in all three individuals, can be downloaded here: LCA047_Trio_Demo.vcf.gz (right-click the link and save the file to your local computer).

VCF files can be imported from the left sidebar. In the first step you can define pre-filters if you want to skip certain variants completely. Pre-filters are optional; with the default settings, all variants with a coverage of at least one will be imported. The second step is to select and upload the VCF file from your local computer. Click on choose file or browse (the actual label depends on your browser) and select the example VCF file (LCA047_Trio_Demo.vcf.gz) you just downloaded. If the upload does not start automatically, please hit the submit button.

upload_vcf
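
Before uploading, it can be useful to check which samples a VCF file contains. The sketch below reads the sample names from the `#CHROM` header line, where the first nine columns are fixed and every further column is one sample. The function names are hypothetical, and the script runs on your local computer, not inside GeneGrid.

```python
import gzip

def sample_names_from_header(lines):
    """Return the sample names from the '#CHROM' header line of a VCF."""
    for line in lines:
        if line.startswith("#CHROM"):
            # Columns 1-9 are fixed; sample names start at column 10.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached the data lines without finding a header
    return []

def sample_names_from_file(path):
    """Open a gzip-compressed VCF in text mode and read its sample names."""
    with gzip.open(path, "rt") as handle:
        return sample_names_from_header(handle)
```

For the downloaded example you would call `sample_names_from_file("LCA047_Trio_Demo.vcf.gz")`, which should list three samples. The exact names depend on how the file was generated.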

Annotation of variants can take up to one hour, depending on the number of samples and variants in the input file. You will receive an email notification when the annotation step finishes.

annotation_running

2. View samples

Annotated samples are accessible from two different views. The samples view contains a table with a detailed listing of imported samples. From this view you can show details about a sample, open the variants view for a sample or start a comparison analysis. The second view is the Result Management, a separate page that also lists all samples. It is the interface for renaming samples, editing sample comments and, most importantly, removing samples. The Result Management is also directly accessible from the main menu.

For the next step we decide to use the first view and load the samples view by selecting Variant Annotation from the main menu. All imported samples will be displayed with information such as source file, sample name, number of non-ref variants, class and activation status.

list_of_samples

To continue with our analysis, we'll now activate all samples. If you hover your mouse over the rows, you'll notice a small Activate link that appears on the left side of the row. If you click this link, the purchase view is loaded, where the sample can be purchased with credits. Since this is a demo example, you don't have to pay credits for any of the three samples. Repeat this activation procedure for the rest of the samples. Once all samples are activated, you should see a check mark for each sample in the Activated column.

activated_all

When you now point your mouse cursor over one of the sample rows, the link on the left side has changed from Activate to Open. At this point we are not interested in viewing individual samples by themselves, so we will continue by comparing these samples with each other.

3. Compare samples

After importing and annotating variants, we're ready to perform a sample comparison analysis. If you are on the main page, you can either click on the Sample Comparison field or select it from the main menu. From the samples view, we instead use the Compare samples section in the left sidebar.

First, we select the type of comparison study we want to perform. Here we select Trio and assign our samples to the groups that appear below. Drag the sample IV-1 (this is the affected daughter) from the table on the right side and drop it in the Offspring (affected) group. Repeat this for the samples III-4 and III-5 (mother and father), but drop both in the Parents (not affected) group. Finally, give the analysis a name (My first trio analysis is used as the title in this example) and hit the Submit button.

trio_analysis

The sample comparison will start and you should see the following progress info:

job_status_comparison

You can see that the analysis has been submitted successfully. The analysis will take a couple of minutes. Once the comparative analysis is finished, you will be redirected automatically to the activation page.

job_status_comparison

Comparison analyses are always free to generate and will be stored for 30 days at no cost. After 30 days a small retention fee will be deducted unless the analysis is removed from the Result Management.

After proceeding, the result table with the variants is loaded. This table is our workspace and contains several general columns and an additional column for each sample at the very end. The sample column displays the genotype of each sample using a symbol. A homozygous variant call is a filled black square, a heterozygous variant call is a half-filled square and a reference call is a white square. If GeneGrid has no information or only low-quality information about the genotype of a sample, an empty cell is displayed to indicate a no call.

genegrid_view_trio
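
The four genotype states correspond to what can be read from the GT field of a VCF file. The following sketch classifies a diploid GT string into those states. It is an illustration only: the function name is hypothetical, and GeneGrid's own rules (for example, how low-quality calls become no calls) are not described in this tutorial and are not reproduced here.

```python
import re

def classify_genotype(gt):
    """Classify a diploid VCF GT string such as '0/1', '1|1' or './.'."""
    alleles = re.split(r"[/|]", gt)
    if len(alleles) != 2 or "." in alleles:
        return "no call"
    if alleles[0] == alleles[1]:
        # Allele index 0 is the reference allele; anything else is a variant allele.
        return "reference" if alleles[0] == "0" else "homozygous"
    return "heterozygous"
```

For example, `classify_genotype("1/1")` gives `"homozygous"` and `classify_genotype("./.")` gives `"no call"`.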

4. Filter variants

Let's see how we can filter for the most likely disease-causing variants. The main table of the comparative sample analysis contains 113,697 variants. You can find the total number of variants at the bottom right of the table.

filter_summary_total

Above the table there is a filter bar in which you can define filter criteria for each column. In some cases a drop-down box lists the available search terms; in other cases, typing free text and pressing enter activates the filter. If filter criteria are defined for multiple columns, a row has to match all of them, otherwise it is hidden.
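
The "match all criteria" behaviour of the filter bar can be pictured as a logical AND over the columns. The sketch below is a simplified model with hypothetical column names and invented rows. It does not reflect how GeneGrid is implemented.

```python
def row_matches(row, criteria):
    """Return True only if the row satisfies every column criterion."""
    return all(test(row.get(column)) for column, test in criteria.items())

def apply_filters(rows, criteria):
    """Keep the rows that match all criteria; all other rows are hidden."""
    return [row for row in rows if row_matches(row, criteria)]

# Invented example rows with hypothetical column names.
rows = [
    {"id": "A", "deleterious": "Yes", "chrom": "1"},
    {"id": "B", "deleterious": "No", "chrom": "1"},
    {"id": "C", "deleterious": "Yes", "chrom": "2"},
]
criteria = {
    "deleterious": lambda value: value == "Yes",  # drop-down style filter
    "chrom": lambda value: value == "1",          # free-text style filter
}
visible = apply_filters(rows, criteria)
```

Only row A passes both criteria. Rows B and C each fail one criterion and are hidden.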

First, we consider only variants that are deleterious (e.g. non-synonymous or frameshift mutations) and alter the protein sequence or hit a canonical splice-site. Please select Yes from the drop-down box in the Deleterious variant column. The number of filtered variants should go down to 22,104.

deleterious_filter

When searching for the cause of a rare disease, it is very helpful to compare the variants found in affected individuals against background populations. For each variant within a background population, a global allele frequency can be calculated. In this example, we use the allele frequencies from the 1000 Genomes Project [4]. Enter 0.01 in the filter field of the gAF column. This filter removes all variants with an allele frequency above 1% in the 1000 Genomes Project. The number of filtered variants should now be 5,039.

gmaf_filter
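
The logic of the frequency cutoff can be sketched as follows. Two details are assumptions, since this tutorial does not specify them: that variants without a known frequency (written here as `None`) are kept, and that a frequency exactly at the cutoff passes. Keeping variants without a frequency seems plausible because novel variants would otherwise be lost, but please check the manual for the actual behaviour.

```python
def passes_frequency_filter(gaf, max_frequency=0.01):
    """Keep a variant unless its global allele frequency exceeds the cutoff.

    gaf is None when no frequency is known for the variant (assumed to pass).
    """
    return gaf is None or gaf <= max_frequency

# Invented frequencies for illustration.
frequencies = {"var1": None, "var2": 0.004, "var3": 0.25}
rare = [name for name, gaf in frequencies.items() if passes_frequency_filter(gaf)]
```

In this invented example, var3 is removed as a common variant, while var1 and var2 remain.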

In our trio example, both parents are unaffected but the daughter is affected. In this case our genotype search strategy includes all filters for an autosomal recessive disease.

First, we will search for a homozygous mutation in the daughter. Select Homozygous in the drop-down box of the IV-1 column, and Heterozygous for both parental genotypes. This reduces the number of variants to 44.

genotype_filter
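
The genotype pattern we just selected (homozygous in the affected child, heterozygous in both unaffected parents) can be written as a simple test. This is a sketch with hypothetical names and invented variants, using the genotype labels from the sample columns.

```python
def is_recessive_candidate(genotypes, child, parents):
    """True if the child is homozygous and every parent is heterozygous."""
    return (genotypes.get(child) == "homozygous"
            and all(genotypes.get(parent) == "heterozygous" for parent in parents))

# Invented variants for illustration.
variants = [
    {"id": "var1", "genotypes": {"IV-1": "homozygous", "III-4": "heterozygous", "III-5": "heterozygous"}},
    {"id": "var2", "genotypes": {"IV-1": "homozygous", "III-4": "homozygous", "III-5": "heterozygous"}},
    {"id": "var3", "genotypes": {"IV-1": "heterozygous", "III-4": "heterozygous", "III-5": "reference"}},
]
candidates = [v["id"] for v in variants
              if is_recessive_candidate(v["genotypes"], "IV-1", ["III-4", "III-5"])]
```

Only var1 fits the pattern. In var2 a parent is also homozygous, which does not fit an unaffected parent under a recessive model, and in var3 the child is only heterozygous.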

We will use gene-disease associations to filter the variant list and link the remaining variants to the diagnosed disease Leber congenital amaurosis. Disease association columns are optional columns and are not visible by default. Click on the settings wheel above the table on the left. Additional annotations (e.g. SIFT, PhyloP, BLOSUM) will be displayed.

annotation_wheel

Select the annotation Literature diseases. A new column is now added to the main table. In the filter bar, start entering the disease term Leber congenital amaurosis; while you type, a suggestion will pop up.

disease_filter

Choose the very first term and hit enter to start the filter process. One variant remains. It is a homozygous missense mutation on chromosome 1 at position 10,032,156 in the gene NMNAT1.
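
To see how strongly each step narrowed the list, we can relate the variant counts reported in this tutorial to the starting total. The short sketch below does this arithmetic. The step labels and the function name are our own, and the counts are the ones quoted above.

```python
def filter_funnel(counts):
    """Return (label, remaining, fraction_of_total) for each filter step."""
    total = counts[0][1]
    return [(label, n, n / total) for label, n in counts]

# Variant counts as reported in the filter steps of this tutorial.
counts = [
    ("all variants", 113697),
    ("deleterious", 22104),
    ("gAF cutoff 0.01", 5039),
    ("homozygous in IV-1, heterozygous in parents", 44),
    ("Leber congenital amaurosis", 1),
]
funnel = filter_funnel(counts)
for label, remaining, fraction in funnel:
    print(f"{label}: {remaining} variants ({fraction:.4%} of total)")
```

The deleterious filter alone leaves roughly a fifth of the variants, and the genotype filter brings the list down to a size that can be inspected by hand.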

When you click on the variant row, additional details are shown at the bottom. You will find more detailed information on the variant, such as the coverage, SIFT, PhyloP and other conservation or protein effect scores. As you can see in the Sample details table, the position of the NMNAT1 mutation is well covered in all three samples (>30 reads), evolutionarily conserved (PhyloP > 0) and predicted as damaging (SIFT < 0.05).

NMNAT1_details
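
The three criteria mentioned above (coverage above 30 reads in every sample, PhyloP > 0, SIFT < 0.05) can be combined into one check. The function below is a sketch with a hypothetical name, and the numbers passed to it are invented and are not the actual values of the NMNAT1 variant.

```python
def supports_candidate(coverages, phylop, sift, min_coverage=30):
    """Check coverage in every sample, conservation and predicted damage."""
    well_covered = all(depth > min_coverage for depth in coverages.values())
    conserved = phylop > 0   # positive PhyloP: conserved position
    damaging = sift < 0.05   # SIFT below 0.05: predicted as damaging
    return well_covered and conserved and damaging

# Invented values for illustration.
good = supports_candidate({"IV-1": 40, "III-4": 38, "III-5": 42}, phylop=2.5, sift=0.01)
poor = supports_candidate({"IV-1": 40, "III-4": 12, "III-5": 42}, phylop=2.5, sift=0.01)
```

The second call fails because one sample has only 12 reads at the position, so its genotype is less reliable.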

Clicking on the Transcript effects tab, you get additional transcript information, including the position of the amino acid change in the protein sequence.

transcript_details

The tab Literature diseases lists all disease associations of that particular gene, including descriptions and links to Genomatix LitInspector [5,6] results. There you can find all publications where the gene disease association has been found.

literature_diseases_details

We can also get a list of all publications where the gene has been linked to a given disease. By moving the mouse over the row with Leber Congenital Amaurosis a small link called Review appears. From there you can jump directly to the evidence listing from LitInspector.

LCA_LitInspector_link

As of October 2014, there are 10 different publications containing both NMNAT1 and Leber congenital amaurosis in the same abstract. Scroll down to the last entry.

NMNAT1_paper

This is the paper Falk et al. [1] from our example describing the exact same missense mutation in NMNAT1 (c.25G>A, p.Val9Met).

Back in our variants view we can take a look at our current filter definition shown in the left sidebar.

filter_definition

Another convenient tool in the sidebar is the filter history. It lists, in reverse chronological order, the filter steps we have performed so far, up to the point where we filtered the variants down to a single one.

filter_history

On the left side of the main table you will find a Report generator tab. It allows for generating reports for up to 10 filtered variants. Type in a title and hit the Generate button and wait for the PDF file.

report_generator

In our example, the daughter is also affected with deafness. Enter the disease term Congenital deafness into the filter field of Literature diseases. From the popup list of suggested terms, we select the only term that comes up.

disease_filter_deafness

Again, one variant remains in the list. It is a homozygous nonsense mutation in the gene GJB2 which has also been described in the publication.

Advanced usage

Let's try a completely different strategy, in which we are mainly interested in whether any of our variants have already been discovered by other researchers. To reset our filters, use the Reset button in the left sidebar.

reset_filter

We are going to select different columns from the optional column settings. Please select all 4 columns from the section Clinical and diagnostic annotation and the column Diff. between groups from the section Comparison summary.

annotation_diagnostic_columns

The Clinical significance column can be used to check whether any annotation exists in ClinVar [7] at the genomic positions of our variants, regardless of whether the actual variants in our table are synonymous, missense or of any other effect category.

filter_pathogenic

After applying this filter, the list of variants shrinks to just 74. This filter depends heavily on the content of ClinVar, which is steadily growing: if one of our variants had not been reported in ClinVar, we would have missed it at this point. Nevertheless, it is a valid strategy for quickly finding overlaps with known variants in ClinVar.

As a second filter, we require that our affected sample has a different genotype than the unaffected parents. We set Diff. between groups to 1. This is a very general column for filtering by the number of samples that differ between the two groups.
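
The position-based lookup described above can be pictured as a set membership test on (chromosome, position) pairs. The sketch below uses invented positions and a hypothetical function name. It compares positions only and ignores the alternate allele and the effect category, which is how we read the description of this column.

```python
def at_known_positions(variants, annotated_positions):
    """Keep variants whose (chromosome, position) has any annotation.

    Only the position is compared, not the allele or the effect category.
    """
    return [v for v in variants if (v["chrom"], v["pos"]) in annotated_positions]

# Invented positions standing in for a ClinVar-like annotation set.
annotated_positions = {("1", 1000), ("13", 2000)}
variants = [
    {"chrom": "1", "pos": 1000, "effect": "missense"},
    {"chrom": "1", "pos": 1500, "effect": "missense"},
    {"chrom": "13", "pos": 2000, "effect": "synonymous"},
]
known = at_known_positions(variants, annotated_positions)
```

The variant at position 1500 is dropped because nothing is annotated there, which illustrates how a variant that is not yet reported would be missed by this strategy.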

filter_group

We are now at 14 variants which is already a feasible number of variants to go through individually. Adding the genotype filter Homozygous for the affected sample IV-1 reduces the list further to 4 variants. This list still includes both previously indicated variants p.Val9Met in NMNAT1 and p.Trp24* in GJB2.

Further inspection gives additional valuable information. GJB2 has more than 90 diagnostic tests available in Genetic Testing Registry (GTR) [8]. The majority of tests relate to some type of deafness.

It is common practice to examine variants in the context of the underlying alignments. A sample can be associated with a BAM file containing all the alignments. To access the Genome Browser directly from the variant list, move the mouse over a variant row and then hit the Browse button.

variant_browse

The associated BAM files are automatically loaded in the browser.

genome_browser_view

Manage results

Imported samples and generated comparison analyses can be administered in the Result Management. It can be opened through the main menu:

result_management_menu

There are 3 sections available in the result interface:

result_management_project

The section Variant Annotation contains all the imported samples, the section Sample Comparison lists all your generated comparison analyses and in the section BAM files you find associated BAM files in case you have uploaded any. You can also directly open either a sample or a comparison by clicking on the title. The Result Management makes it possible to rename the sample or comparison and to edit the comment or description for the result so that it's easier to find them later.

result_management_administration

The button with the X lets you delete any sample or comparison analysis from the server so that you won't be charged credits for the retention fee. Please remember that you can generate as many comparison analyses as you want at no cost, and that they are stored for free for 30 days. After 30 days the retention fee will be deducted automatically. If you don't want to keep the data, you can easily delete it here. The user menu right next to the main menu lets you check your current credit balance and review the transaction history.

Conclusion

That's it! You're done with the GeneGrid Tutorial. Thank you for following along all the way. Hopefully you now feel familiar enough to run your own analyses on the system. For a more detailed description of any of the columns, annotations or functions in GeneGrid, please refer to the manual, which is accessible via the main menu at the top of every page.

Was this tutorial helpful to you? Did you encounter any problems while going through the tutorial? Any topic for which you'd like to have another tutorial? For any questions or comments, you're most welcome to contact us.

References

  1. Falk MJ, Zhang Q, Nakamaru-Ogiso E, et al. NMNAT1 mutations cause Leber congenital amaurosis. Nat Genet. 2012;44(9):1040-5. doi: 10.1038/ng.2361
  2. Genomatix Mining Station (GMS). web: https://www.genomatix.de/solutions/genomatix-mining-station.html
  3. Li H, Handsaker B, Wysoker A, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078-9. doi: 10.1093/bioinformatics/btp352
  4. Abecasis GR, Altshuler D, Auton A, et al. A map of human genome variation from population-scale sequencing. Nature. 2010;467(7319):1061-73. doi: 10.1038/nature09534
  5. Frisch M, Klocke B, Haltmeier M, Frech K. LitInspector: literature and signal transduction pathway mining in PubMed abstracts. Nucleic Acids Res. 2009;37(suppl 2):W135-40. doi: 10.1093/nar/gkp303
  6. Genomatix Software Suite. web: http://www.genomatix.de/solutions/genomatix-software-suite.html
  7. Landrum MJ, Lee JM, Riley GR, et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014;42(1):D980-5. doi: 10.1093/nar/gkt1113
  8. Rubinstein WS, Maglott DR, Lee JM, et al. The NIH genetic testing registry: a new, centralized database of genetic tests to enable access to comprehensive information and improve transparency. Nucleic Acids Res. 2013;41(D1):D925-35. doi: 10.1093/nar/gks1173