The Homo sapiens reference genomes GRCh37 and GRCh38 are supported.
GeneGrid annotates SNVs (single-nucleotide variants) and small indels (insertions and deletions).
The default input is VCF (Variant Call Format). This is the format used by the 1000 Genomes Project, and it is a standard format that can also be used by other tools and software packages. For details, see Requirements.
Please note that genotype information is required for all samples at every position. For each sample, at least the genotype (GT) must therefore be given in the genotype field. The genotype quality (GQ), the read depth per sample (DP), and the allelic read depths (AD) are also imported, but are not mandatory. Whether these fields are present depends on the variant caller. For details, see Requirements.
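The GT requirement can be checked before upload. The following is a minimal sketch, not GeneGrid's own validation: it assumes a tab-separated VCF data line with the FORMAT column at index 8 and the sample columns after it, and the function name `missing_gt_samples` is hypothetical.

```python
def missing_gt_samples(vcf_line, sample_names):
    """Return the samples that have no GT entry in one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")   # e.g. ["GT", "GQ", "DP"]
    if "GT" not in format_keys:
        # No GT key at all: every sample is missing its genotype.
        return list(sample_names)
    gt_index = format_keys.index("GT")
    missing = []
    for name, sample_field in zip(sample_names, fields[9:]):
        values = sample_field.split(":")
        if gt_index >= len(values) or values[gt_index] == "":
            missing.append(name)
    return missing


line_ok = "1\t12345\t.\tG\tA\t50\tPASS\t.\tGT:GQ:DP\t0/1:99:30\t1/1:80:22"
line_bad = "1\t12345\t.\tG\tA\t50\tPASS\t.\tDP\t30\t22"
print(missing_gt_samples(line_ok, ["child", "mother"]))
print(missing_gt_samples(line_bad, ["child", "mother"]))
```

GQ, DP, and AD are deliberately not checked here because they are optional.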
In principle, you can load as many samples as you like into GeneGrid. In terms of the variants, GeneGrid accepts VCF files with up to 10,000,000 variants and 20 samples per VCF file. For details, see Limitations.
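To see in advance whether a file stays within these limits, the variants and samples can be counted locally. This is a rough sketch under the assumption that every non-header line is one variant and that sample columns start after the ninth column of the `#CHROM` header line; the function name and parameters are hypothetical.

```python
def vcf_within_limits(lines, max_variants=10_000_000, max_samples=20):
    """Count variants and samples in VCF text lines and compare to the limits."""
    n_samples = 0
    n_variants = 0
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information line
        if line.startswith("#CHROM"):
            # Columns 1-9 are fixed; everything after them is a sample.
            n_samples = max(0, len(line.rstrip("\n").split("\t")) - 9)
        elif line.strip():
            n_variants += 1
    return n_variants <= max_variants and n_samples <= max_samples


example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n",
    "1\t100\t.\tG\tA\t50\tPASS\t.\tGT\t0/1\t0/0\n",
]
print(vcf_within_limits(example))
```

For a real file, pass an open file handle instead of the list so that the file is read line by line.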
This depends on the number of variants and samples in your VCF file. If you upload a WGS dataset with millions of variants, annotation and upload can take several hours. If you upload a targeted sequencing dataset, you should have your results within minutes.
Empty column entries occur in all cases when the source of the respective column does not provide information on the current variant position. For instance, an empty column entry in the gAF column indicates that for this position no frequencies are known, which in turn indicates that no variants have been called at this position in the 1000 Genomes Project population.
The quality scores depend on the variant caller used to generate the VCF files. With the SAMtools variant detection pipeline, the quality scores are Phred-scaled: the score is -10 * log10(p-value). These values are integers and are capped at 999, so a quality of 999 indicates a very high-confidence variant call.
This is the genotype quality score, which is Phred-scaled. It is given as -10 * log10(1 - Pr(called genotype)). These values are integers and are capped at 999.
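As a worked illustration of the Phred scale, the sketch below converts an error probability into a capped integer score. Rounding to the nearest integer is an assumption (a caller may truncate instead), and the function name is hypothetical.

```python
import math

MAX_QUALITY = 999  # cap described above


def phred_quality(error_probability):
    """Phred-scaled score: -10 * log10(p), as an integer capped at MAX_QUALITY."""
    if error_probability <= 0:
        return MAX_QUALITY  # log10 is undefined at 0; treat as maximal confidence
    score = round(-10 * math.log10(error_probability))  # rounding is an assumption
    return min(MAX_QUALITY, score)


print(phred_quality(0.001))      # variant quality for a p-value of 0.001
print(phred_quality(1 - 0.99))   # genotype quality for Pr(called genotype) = 0.99
print(phred_quality(1e-120))     # extremely small p-value, hits the cap
```

A score of 30 thus corresponds to an error probability of 1 in 1,000, and each additional 10 points reduces it by a factor of ten.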
Compound heterozygous variants are determined automatically when a comparative trio study is started. The prerequisite is that one sample (the offspring) is assigned to case and two samples (mother and father) to control. A compound heterozygous candidate is detected when two variants in the same gene are both deleterious and the combination of both variants does not occur in any of the control samples. One variant can be a partner in many compound heterozygous pairs, which can lead to a large number of compound heterozygous annotations within one gene (this is often the case for TTN, for example). Because we do not want to list all possible combinations, we annotate the individual variants as compound heterozygous. As a result, after filtering within one gene, only a single variant may appear with the compound heterozygous annotation. In all such cases, removing the filters will reveal at least one further variant with the annotation.
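The rule described above can be sketched as follows. This is a simplified illustration, not GeneGrid's implementation: it assumes each variant is a dictionary with the hypothetical keys `id`, `gene`, `deleterious`, and `carriers` (the set of samples carrying the variant), and it does not check zygosity.

```python
from itertools import combinations


def compound_het_variants(variants, case, controls):
    """Return ids of variants that belong to at least one candidate pair."""
    by_gene = {}
    for v in variants:
        # Only deleterious variants carried by the case sample are considered.
        if v["deleterious"] and case in v["carriers"]:
            by_gene.setdefault(v["gene"], []).append(v)

    flagged = set()
    for gene_variants in by_gene.values():
        for a, b in combinations(gene_variants, 2):
            carry_both = a["carriers"] & b["carriers"]
            # The pair counts only if no control sample carries both variants.
            if not any(control in carry_both for control in controls):
                flagged.update({a["id"], b["id"]})
    return flagged


trio_variants = [
    {"id": "v1", "gene": "TTN", "deleterious": True, "carriers": {"child", "mother"}},
    {"id": "v2", "gene": "TTN", "deleterious": True, "carriers": {"child", "father"}},
    {"id": "v3", "gene": "ABC", "deleterious": True, "carriers": {"child", "mother"}},
    {"id": "v4", "gene": "ABC", "deleterious": True, "carriers": {"child", "mother"}},
]
print(sorted(compound_het_variants(trio_variants, "child", ["mother", "father"])))
```

The function returns individual variants rather than pairs, mirroring the choice to annotate each variant instead of listing every combination. The gene name `ABC` is a placeholder.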
Population allele frequencies can be calculated from different background populations (e.g. HapMap, ESP), and MAF scores can deviate depending on which population is chosen. GeneGrid uses the alternative allele frequencies directly from the 1000 Genomes Project (gAF) and the NHLBI GO Exome Sequencing Project (espAF).
In GeneGrid, there is one row per variant and gene. If a variant affects different genes, it is displayed several times. The effect of a single variant that overlaps multiple genes depends strongly on the gene and its underlying transcripts. Therefore we list such variants separately for each gene.
Entries from a VCF file might be skipped for different reasons, e.g.:
In any case, a warning message is printed if a variant is skipped.
GeneGrid makes a strong distinction between a no-call (unknown) and a confident homozygous reference call. If GeneGrid has no information or only low-quality information on the genotype of a sample, a no-call is displayed. This mostly occurs when variants from different samples have been called independently. For variants that have been called on multiple samples at the same time (e.g. SAMtools and GATK have this option), a reference call (0/0 in the GT field of the VCF file) can be determined.
In humans, most genes have multiple alternative transcripts. A variant might affect different transcripts in different ways, for example as a result of different reading frames.
This happens if variants have been predicted by variant calling tools that allow multisample variant calling (such as SAMtools). For example, when run on multiple samples, earlier versions of SAMtools sometimes call a genotype as a heterozygous or homozygous variant even though the coverage is given as 0. Coverage in this case means the number of high-quality bases and can be lower than the actual read depth because low-quality bases have been filtered out. To avoid importing such calls, we recommend setting the minimum coverage in the pre-filter settings to at least 1, which is also the default.
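The effect of such a minimum-coverage pre-filter can be sketched like this. It assumes that coverage is taken from the per-sample DP value and that a call without a usable DP value is rejected; both are assumptions for illustration, and the function name is hypothetical.

```python
def passes_min_coverage(format_field, sample_field, min_coverage=1):
    """Return True if the sample's DP value is at least min_coverage."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "DP" not in keys:
        return False  # assumption: no DP key means the call is rejected
    dp_index = keys.index("DP")
    if dp_index >= len(values) or not values[dp_index].isdigit():
        return False  # missing value such as "."
    return int(values[dp_index]) >= min_coverage


print(passes_min_coverage("GT:GQ:DP", "0/1:12:0"))   # variant call with zero coverage
print(passes_min_coverage("GT:GQ:DP", "0/1:99:25"))  # well-covered call
```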
The c.C200T is a cDNA (actually, mRNA) level change. VCF input (G to A) has to be in the forward strand, and if the transcript is in the reverse strand, there will be a C to T change in the mRNA.
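The strand conversion can be illustrated with a short sketch. The function names are hypothetical; the only assumption is that the VCF alleles are given on the forward strand and the transcript strand is `"+"` or `"-"`.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence):
    """Reverse-complement a DNA sequence given in upper case."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def to_transcript_strand(ref, alt, transcript_strand):
    """Express forward-strand VCF alleles on the strand of the transcript."""
    if transcript_strand == "+":
        return ref, alt
    return reverse_complement(ref), reverse_complement(alt)


# A forward-strand G>A change read on a reverse-strand transcript.
print(to_transcript_strand("G", "A", "-"))
```

For the reverse-strand transcript, the G to A change in the VCF becomes C to T, which matches the c.C200T notation.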
There are multiple gene definition systems (such as RefSeq, Ensembl, GENCODE, and GenBank), and each of them has multiple versions. SIFT and other prediction scores are typically calculated for a specific gene definition system and a specific gene build version. For example, GeneGrid uses SIFT scores based on Ensembl transcripts. Additionally, some genes have no functional prediction scores simply because the gene does not have any homologues. It is therefore normal that some variants have no scores in the specific database build that GeneGrid provides.
It is possible to define multiple rules for a single filter column; see combined filtering. This allows filtering for up to 10 genes with a single filter. Additionally, some popular larger gene panels are pre-defined and included in GeneGrid under the filter column popular gene panels. These include, for instance, the list of 56 genes from the ACMG Incidental Findings recommendations.
The comparison analysis is limited to ten samples.
The samples are compared pairwise. In other words, the first sample in the affected group is compared to the first sample in the non-affected group, the second affected sample is compared to the second non-affected sample, and so on.
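The pairing by position can be pictured with a short sketch. The sample names are made up, and how GeneGrid handles groups of unequal size is not stated here; in this sketch, `zip` simply stops at the shorter group.

```python
def pair_samples(affected, non_affected):
    """Pair the i-th affected sample with the i-th non-affected sample."""
    return list(zip(affected, non_affected))


affected = ["tumor_1", "tumor_2"]        # hypothetical sample names
non_affected = ["normal_1", "normal_2"]
print(pair_samples(affected, non_affected))
```

The order in which samples are assigned to each group therefore determines which samples are compared with each other.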
At the moment, a trio study is limited to one affected child. To compare samples from two or more siblings, we recommend using other as the study type. Filtering is possible with the optional columns Diff. between groups and Diff. in case group. For details, see Sample Comparison.
Everything that does not fit into one of the two study types trio and cancer can be analyzed as other study which performs a general comparison without any assumptions about the samples. In this case it is still possible to get valuable information about how many samples differ between the two groups (Diff. between groups) and even in the case group itself (Diff. in case group). However, study-specific columns like somatic mutation or compound heterozygosity that are available for the other study types are omitted here. For details, see Sample Comparison.
Yes, the VCF file will be loaded automatically when you start the Genome Browser from the result table with the variants.
Yes, each activated sample can be associated with an additional BAM file for viewing the alignments in the genome browser.
The BAM file needs to be associated with the sample so that it can be loaded automatically when you switch to the Genome Browser. This can be done on the samples page, where you select the sample in the same way as when creating a comparison analysis. For details, see BAM upload.
Yes, we have a SOAP API that allows you to fetch all the data needed for reports from a filtered list of variants. In fact, it is the same API that we use internally for the variant reports that you can generate in GeneGrid. The feature is available to the public through an early access program. If you are interested in getting access, please get in contact with us. The WSDL files can be found here: Sample and Comparison.
GeneGrid follows a pay-per-sample model. The cost depends on the number of results you import (samples) or generate (comparisons) and on how long they are stored in the system. The Pricing overview shows how much the annotation of variants from a targeted, exome, or whole genome sequencing project costs.
In addition to the data storage of the samples and comparisons, the monthly storage fee also includes the reanalysis of variants when annotation sources are updated. This helps to apply the latest available annotation data at hand for the assessment of pathogenicity and also provides a consistent view across all your samples at any time. For details, see Continuous Annotation.
No, one BAM file is already included for every activated sample.
Absolutely, we love input from users. Please submit your requests or feedback using the contact form. The GeneGrid team regularly reviews the ideas and incorporates them into future product planning and discussions.
For a more detailed description of any of the columns, annotations, or functions in GeneGrid, please refer to the manual. Another great way to get started is the tutorial. For any questions or comments, you're most welcome to contact us here.