GeneGrid lets you quickly reduce millions of variants to a handful, or even to the single relevant one. All known and novel SNVs and InDels in your results can be annotated using our extensive annotation. This tutorial describes step by step how to annotate the variant list and filter it for the variants of interest to you. You will perform a trio analysis and identify the most likely disease-causing variants. This tutorial should take less than 15 minutes.
This example shows how GeneGrid can be applied to a trio analysis that looks for the most likely disease-causing variants. The data in this example come from a whole exome sequencing study originally published by Falk et al. The authors found a novel homozygous missense mutation (M1 = NMNAT1 c.25G>A, p.Val9Met) in NMNAT1 that likely causes Leber congenital amaurosis (LCA), a form of inherited retinal degeneration characterized by severe vision loss or blindness.
The following figure shows a large consanguineous Pakistani pedigree, including five children affected with LCA. Exome sequencing also confirmed the presence of a second homozygous mutation (M2 = GJB2 c.71G>A, p.Trp24*) in GJB2 in children who were affected with deafness.
Exome sequence data for the following individuals is available at the NCBI Sequence Read Archive, accession SRP013517:
Sequence reads of the individuals IV-1, III-4 and III-5 were mapped to the human reference genome (NCBI build 37 / hg19) using the Genomatix Mining Station (GMS). SAMtools version 1.0 was used to call SNVs and InDels jointly across all three samples. The output of the variant calling step is a VCF (Variant Call Format) file. It contains the genomic positions of all variants and the genotypes of the samples, and it is the required input format for getting started with GeneGrid.
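To make the structure of a multi-sample VCF record more concrete, here is a minimal Python sketch that splits one tab-separated data line into its fixed fields and the per-sample genotypes. It is only an illustration and not part of GeneGrid; the function name parse_vcf_line and the example record (position, quality and depth values) are invented for demonstration.

```python
def parse_vcf_line(line, sample_names):
    """Split one VCF data line into fixed fields and per-sample genotypes."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _variant_id, ref, alt = fields[:5]
    # Column 9 (index 8) names the per-sample sub-fields, e.g. "GT:DP".
    format_keys = fields[8].split(":")
    genotypes = {}
    for name, raw in zip(sample_names, fields[9:]):
        values = dict(zip(format_keys, raw.split(":")))
        genotypes[name] = values.get("GT", "./.")
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt.split(","),  # several ALT alleles are comma-separated
        "genotypes": genotypes,
    }


# Invented record for illustration only (not taken from the tutorial data).
example = "1\t12345\t.\tG\tA\t50\tPASS\tDP=90\tGT:DP\t1/1:30\t0/1:28\t0/1:32"
record = parse_vcf_line(example, ["IV-1", "III-4", "III-5"])
print(record["genotypes"])
```

In a real file the sample names come from the header line that starts with #CHROM; they are passed in by hand here to keep the sketch short.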
The first step in using GeneGrid is to import and annotate the predicted variants from the VCF file. After you log in, the samples view can be accessed either by selecting Variant Annotation from the Welcome page or from the main menu at the top of the page.
Most accounts come preloaded with the tutorial samples, in which case you'll see the three samples IV-1, III-4 and III-5 in the right view. If so, you can skip this section and continue directly with Compare samples.
The final VCF file, containing the predicted variants (SNPs, small insertions and deletions) in all three individuals, can be downloaded here: LCA047_Trio_Demo.vcf.gz. (Right-click the link and save the file to your local computer.)
VCF files can be imported from the left sidebar. In the first step you can define pre-filters if you want to skip certain variants completely. Defining pre-filters is optional; with the default settings, all variants with a coverage of at least one are imported. The second step is to select and upload the VCF file from your local computer. Click on choose file or browse (the actual label depends on your browser) and select the example VCF file (LCA047_Trio_Demo.vcf.gz) you just downloaded. If the upload does not start automatically, please hit the submit button.
Annotation of variants can take up to one hour, depending on the number of samples and variants in the input file. You will receive an email notification when the annotation step finishes.
Annotated samples are accessible from two different views. The samples view contains a table with a detailed listing of imported samples. From this view it is possible to show details about the sample, open the variants view for the sample or start a comparison analysis. An additional separate page where all the samples can be found is the Result Management which is the interface to rename sample names, edit comments for samples and most importantly to remove samples. The Result Management is also directly accessible from the main menu.
For the next step we use the first view and load the samples view by selecting Variant Annotation from the main menu. All imported samples are displayed with information such as source file, sample name, number of non-ref variants, class and activation status.
In order to continue with our analysis, we'll now activate all samples. If you hover over the rows with your mouse, you'll notice a small Activate link that appears on the left side of the row. If you click this link, the purchase view is loaded so that you can purchase the sample with credits. Since this is a demo example, you don't have to pay credits for any of the three samples. Repeat this activation procedure for the rest of the samples. Once all single sample analyses are activated, you should see a check mark for each sample in the Activated column.
When you now point your mouse cursor over one of the sample rows, the link on the left side has changed from Activate to Open. At this point we are not interested in viewing individual samples by themselves, so we will continue by comparing these samples with each other.
After importing and annotating variants, we're ready to perform a sample comparison analysis. If you are on the main page, you can either click on the Sample Comparison field or use the main menu again. Since we are on the samples view, this time we use the Compare samples section in the left sidebar.
First, we select the type of comparison study we want to perform. Here we select Trio and assign our samples to the groups that appear below. Drag the sample IV-1 (the affected daughter) from the table on the right side and drop it in the Offspring (affected) group. Repeat this for the samples III-4 and III-5 (mother and father), but drop both samples in the Parents (not affected) group. Finally, give the analysis a name (My first trio analysis is used as the title in this example) and hit the Submit button.
The sample comparison will start and you should see the following progress info:
You can see that the analysis has been submitted successfully. The analysis will take a couple of minutes. Once the comparative analysis is finished, you will be redirected automatically to the activation page.
Comparison analyses are always free to generate and will be stored for 30 days at no cost. After 30 days a small retention fee will be deducted unless the analysis is removed from the Result Management.
After proceeding, the result table with the variants is loaded. This table is our workspace and contains several general columns and an additional column for each sample at the very end. The sample column displays the genotype of each sample using a symbol. A homozygous variant call is a filled black square, a heterozygous variant call is a half-filled square and a reference call is a white square. If GeneGrid has no information or only low-quality information about the genotype of a certain sample, an empty cell is displayed to indicate a no call.
Let's see how we can filter for the most likely disease-causing variants. The main table of the comparative sample analysis contains 113,697 variants. You can find the total number of variants at the bottom right of the table.
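The four genotype states described above correspond closely to the GT strings found in a VCF file. The following Python sketch shows one way to map a GT string to such a state. It is an illustration under stated assumptions and not GeneGrid's actual logic: the function name classify_genotype is hypothetical, and only missing alleles are treated as a no call, whereas GeneGrid also reports a no call for low-quality genotypes.

```python
def classify_genotype(gt):
    """Map a VCF GT string such as '0/1' to a display category."""
    # Phased ('|') and unphased ('/') genotypes are treated alike.
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "no call"        # shown as an empty cell
    if all(allele == "0" for allele in alleles):
        return "reference"      # white square
    if len(set(alleles)) == 1:
        return "homozygous"     # filled black square
    return "heterozygous"       # half-filled square


for gt in ("1/1", "0/1", "0/0", "./."):
    print(gt, classify_genotype(gt))
```

A genotype with two different non-reference alleles, such as 1/2, is classified as heterozygous in this sketch.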
Above the table there is a filter bar through which you can define filter criteria for each column. In some cases a drop-down box lists the available search terms; in other cases typing free text and pressing enter activates the filter. If filter criteria are defined for multiple columns, a row has to match all criteria in the filter bar; otherwise it is hidden.
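The rule that a row must match every active column filter is a logical AND across columns. A minimal Python sketch of that rule follows; the column keys, gene names and values are hypothetical and chosen only for illustration, and the code is not taken from GeneGrid.

```python
def row_matches(row, criteria):
    """Return True only if the row satisfies every active column filter."""
    return all(test(row.get(column)) for column, test in criteria.items())


# Invented rows for illustration only.
rows = [
    {"gene": "GENE_A", "deleterious": "Yes", "gAF": 0.002},
    {"gene": "GENE_B", "deleterious": "No", "gAF": 0.002},
    {"gene": "GENE_C", "deleterious": "Yes", "gAF": 0.2},
]

# One test per filtered column; columns without an entry are unfiltered.
criteria = {
    "deleterious": lambda value: value == "Yes",
    "gAF": lambda value: value is not None and value <= 0.01,
}

visible = [row for row in rows if row_matches(row, criteria)]
print([row["gene"] for row in visible])
```

Adding a further criterion can only keep the visible list the same or shrink it, which is why each filter step in this tutorial reduces the variant count.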
First, we consider only variants that are deleterious (e.g. non-synonymous or frameshift mutations) and alter the protein sequence or hit a canonical splice-site. Please select Yes from the drop-down box in the Deleterious variant column. The number of filtered variants should go down to 22,104.
When searching for the cause of a rare disease, it is very helpful to compare the variants found in affected individuals against background populations. For each variant within a background population, a global allele frequency can be calculated. In this example, we use the allele frequencies from the 1000 Genomes Project. Enter 0.01 in the filter field of the gAF column. This filter removes all variants whose allele frequency in the 1000 Genomes Project exceeds 1%. The number of filtered variants should now be 5,039.
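As a rough illustration of what an allele frequency threshold of 0.01 means, the sketch below computes an allele frequency from genotype counts in a diploid population and applies the 1% cutoff. The function names and the counts are invented, and treating variants that are absent from the background population as rare is an assumption of this sketch, not a documented GeneGrid behavior.

```python
def allele_frequency(n_hom_alt, n_het, n_individuals):
    """Alternate allele frequency for a diploid, autosomal site."""
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    # Each homozygous carrier contributes two alternate alleles,
    # each heterozygous carrier one; there are 2 alleles per individual.
    return (2 * n_hom_alt + n_het) / (2 * n_individuals)


def is_rare(gaf, threshold=0.01):
    """Keep variants at or below the threshold.

    Assumption: a variant without any frequency (None) is kept as rare.
    """
    return gaf is None or gaf <= threshold


# Invented counts for illustration only.
borderline = allele_frequency(n_hom_alt=1, n_het=8, n_individuals=500)
common = allele_frequency(n_hom_alt=5, n_het=40, n_individuals=500)
print(borderline, is_rare(borderline))
print(common, is_rare(common))
```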
In our trio example, both parents are unaffected but the daughter is affected. In this case our genotype search strategy includes all filters for an autosomal recessive disease.
First, we will search for a homozygous mutation in the daughter. Select Homozygous in the drop-down box of the IV-1 column, and Heterozygous for both parental genotypes. This reduces the number of variants to 44.
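The genotype pattern used in this step (homozygous in the affected child, heterozygous in both unaffected parents) can be written down as a small predicate. The Python sketch below is an illustration only; the helper names and the example genotypes are invented, and the code does not check which alternate allele each parent carries.

```python
def zygosity(gt):
    """Return 'hom', 'het', 'ref', or None (no call) for a VCF GT string."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    if set(alleles) == {"0"}:
        return "ref"
    return "hom" if len(set(alleles)) == 1 else "het"


def fits_recessive_trio(genotypes, child="IV-1", parents=("III-4", "III-5")):
    """True if the child is homozygous and both parents are heterozygous."""
    return zygosity(genotypes[child]) == "hom" and all(
        zygosity(genotypes[parent]) == "het" for parent in parents
    )


# Invented genotypes for illustration only.
candidate = {"IV-1": "1/1", "III-4": "0/1", "III-5": "0/1"}
rejected = {"IV-1": "1/1", "III-4": "1/1", "III-5": "0/1"}
print(fits_recessive_trio(candidate), fits_recessive_trio(rejected))
```

The second example is rejected because a homozygous parent would be expected to be affected as well under a recessive model.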
We will use gene-disease associations to filter the variant list and link the remaining variants to the diagnosed disease Leber congenital amaurosis. Disease association columns are optional columns and are not visible by default. Click on the settings wheel above the table on the left. Additional annotations (e.g. SIFT, PhyloP, BLOSUM) will be displayed.
Select the annotation Literature diseases. A new column is now added to the main table. In the filter bar, enter the disease term Leber congenital amaurosis; while you type, a suggestion will pop up.
Choose the very first term and hit enter to start the filter process. One variant remains. It is a homozygous missense mutation on chromosome 1 at position 10,032,156 in the gene NMNAT1.
When you click on the variant row, additional details are shown at the bottom. There you will find more detailed information on the variant, such as the coverage, SIFT, PhyloP and other conservation or protein effect scores. As you can see in the Sample details table, the position of the NMNAT1 mutation is well covered in all three samples (>30 reads), evolutionarily conserved (PhyloP > 0) and predicted to be damaging (SIFT < 0.05).
Click on the Transcript effects tab to get additional transcript information, including the position of the amino acid change in the protein sequence.
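The three criteria mentioned above (more than 30 reads in every sample, PhyloP > 0, SIFT < 0.05) can be checked together as shown in the following Python sketch. The function name and the numeric values are invented for illustration; the actual scores for the NMNAT1 variant should be read from the details panel.

```python
def supporting_evidence(coverages, phylop, sift, min_coverage=30):
    """Evaluate the three checks described in the text for one variant."""
    return {
        # Every sample must have more than min_coverage reads at the site.
        "well_covered": all(depth > min_coverage for depth in coverages.values()),
        # A positive PhyloP score indicates evolutionary conservation.
        "conserved": phylop > 0,
        # SIFT scores below 0.05 are predicted as damaging.
        "predicted_damaging": sift < 0.05,
    }


# Invented values for illustration only.
evidence = supporting_evidence(
    coverages={"IV-1": 45, "III-4": 38, "III-5": 52},
    phylop=2.1,
    sift=0.01,
)
print(evidence, all(evidence.values()))
```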
The tab Literature diseases lists all disease associations of that particular gene, including descriptions and links to Genomatix LitInspector [5,6] results. There you can find all publications where the gene disease association has been found.
We can also get a list of all publications where the gene has been linked to a given disease. By moving the mouse over the row with Leber Congenital Amaurosis a small link called Review appears. From there you can jump directly to the evidence listing from LitInspector.
As of October 2014, there are 10 different publications that mention both NMNAT1 and Leber congenital amaurosis in the same abstract. Scroll down to the last entry.
Back in our variants view we can take a look at our current filter definition shown in the left sidebar.
Another convenient tool in the sidebar is the filter history. It lists in reverse chronological order the filter steps we have performed so far up until we filtered down the variants to the single one.
On the left side of the main table you will find a Report generator tab. It allows for generating reports for up to 10 filtered variants. Type in a title and hit the Generate button and wait for the PDF file.
In our example, the daughter is also affected with deafness. Enter the disease term Congenital deafness into the filter field of Literature diseases. From the popup list of suggested terms, we select the only term that comes up.
Again, one variant remains in the list. It is a homozygous nonsense mutation in the gene GJB2 which has also been described in the publication.
Let's try a completely different strategy in which we are mainly interested in whether any of our variants have already been discovered by other researchers. To reset the filters, use the Reset button in the left sidebar.
We are going to select different columns from the optional column settings. Please select all 4 columns from the section Clinical and diagnostic annotation and the column Diff. between groups from the section Comparison summary.
The Clinical significance column can be used to check whether any annotation exists in ClinVar at the genomic positions of our variants, regardless of whether the actual variants in our table are synonymous, missense or of any other effect category.
After applying this filter, the list of variants shrinks to merely 74. This filter depends heavily on the content of ClinVar, which is steadily growing; if one of our variants had not been reported in ClinVar, we would have missed it at this point. Nevertheless, it is a valid strategy for quickly checking for overlap with known variants in the ClinVar set.
As a second filter setting, we make sure that our affected sample has a different genotype than the unaffected parents. We set 1 as the value of Diff. between groups, which is a very general column for filtering by the number of samples that differ between both groups.
We are now at 14 variants which is already a feasible number of variants to go through individually. Adding the genotype filter Homozygous for the affected sample IV-1 reduces the list further to 4 variants. This list still includes both previously indicated variants p.Val9Met in NMNAT1 and p.Trp24* in GJB2.
Further inspection gives additional valuable information. GJB2 has more than 90 diagnostic tests available in the Genetic Testing Registry (GTR). The majority of tests relate to some type of deafness.
Common practice is to examine variants in the context of the underlying alignments. A sample can be associated with a BAM file containing all the alignments. To access the Genome Browser directly from the variant list, just move the mouse over a variant row and then hit the Browse button.
The associated BAM files are automatically loaded in the browser.
Imported samples and generated comparison analyses can be administered in the Result Management. It can be opened through the main menu:
There are 3 sections available in the result interface:
The section Variant Annotation contains all the imported samples, the section Sample Comparison lists all your generated comparison analyses and in the section BAM files you find associated BAM files in case you have uploaded any. You can also directly open either a sample or a comparison by clicking on the title. The Result Management makes it possible to rename the sample or comparison and to edit the comment or description for the result so that it's easier to find them later.
The button with the X lets you delete any sample or comparison analysis from the server so that you won't be charged any credits for the retention fee. Please remember that you can generate as many comparison analyses as you want at no cost and that they are stored for free for 30 days. After 30 days the retention fee is deducted automatically. If you don't want to keep those data, you can easily delete them here. The user menu right next to the main menu lets you check your current credit balance and review the transaction history.
That's it! You're done with the GeneGrid Tutorial. Thank you for following along all the way. Hopefully you feel familiar enough now to run your own analyses on the system. For a more detailed description on any of the columns, annotation or functions in GeneGrid please refer to the manual which is accessible via the main menu on the top of every page.
Was this tutorial helpful to you? Did you encounter any problems while going through the tutorial? Any topic for which you'd like to have another tutorial? For any questions or comments, you're most welcome to contact us.