Frequently Asked Questions

General

Which genomes are supported?

The Homo sapiens reference genomes GRCh37 and GRCh38 are supported.

What variant types are supported?

GeneGrid annotates SNVs (single-nucleotide variants) and small indels (insertions and deletions).

Which input format is supported?

The default input is VCF (Variant Call Format). This is the format used by the 1000 Genomes Project, and it is a standard format that can also be used by other tools and software packages. For details, see Requirements.

Which VCF format is supported and what fields and meta-information will be imported?

Versions 4.1 and 4.2 of the VCF format are supported. The following fields are imported into GeneGrid: CHROM, POS, REF, ALT, QUAL and FILTER.

Please note that genotype information is required for all samples and for each position. Thus, for each sample at least the genotype (GT) must be given in the genotype field. The genotype quality (GQ), the read depth per sample (DP), and the allelic read depths (AD) will also be imported, but are not mandatory. The presence of these fields depends on the variant caller software. For details, see Requirements.
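
As an illustration of the fields listed above, here is a minimal Python sketch that splits one VCF data line into the imported fixed fields and the per-sample GT, GQ, DP and AD values. The function name parse_vcf_record and the returned dictionary layout are hypothetical and are not part of GeneGrid; the sketch only mirrors the rule that GT is mandatory while the other genotype keys are optional.

```python
def parse_vcf_record(line):
    """Split one tab-separated VCF data line into fixed fields and genotype data."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 10:
        raise ValueError("FORMAT column and at least one sample column are required")

    chrom, pos, _id, ref, alt, qual, flt = cols[:7]
    record = {
        "CHROM": chrom,
        "POS": int(pos),
        "REF": ref,
        "ALT": alt.split(","),   # multi-allelic sites list several ALT alleles
        "QUAL": qual,
        "FILTER": flt,
    }

    keys = cols[8].split(":")    # FORMAT column, e.g. GT:GQ:DP:AD
    if "GT" not in keys:
        raise ValueError("GT is missing from the FORMAT column")

    samples = []
    for sample_col in cols[9:]:
        values = dict(zip(keys, sample_col.split(":")))
        # GT is mandatory; GQ, DP and AD are optional and may be None.
        samples.append({k: values.get(k) for k in ("GT", "GQ", "DP", "AD")})
    record["samples"] = samples
    return record


line = "1\t12345\t.\tG\tA\t50\tPASS\t.\tGT:GQ:DP:AD\t0/1:99:30:14,16\t0/0:80:25:25,0"
rec = parse_vcf_record(line)
print(rec["CHROM"], rec["POS"], rec["samples"][0]["GT"])
```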

Are there any limitations in terms of samples and variants?

In principle, you can load as many samples as you like into GeneGrid. In terms of the variants, GeneGrid accepts VCF files with up to 10,000,000 variants and 20 samples per VCF file. For details, see Limitations.
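
To check a file against these limits before uploading, a rough pre-check could look like the following Python sketch. The function name check_vcf_limits is hypothetical, and the sketch assumes that every data line counts as one variant; how GeneGrid itself counts multi-allelic lines is not stated here.

```python
import io

MAX_VARIANTS = 10_000_000   # per-file limit quoted above
MAX_SAMPLES = 20            # per-file limit quoted above


def check_vcf_limits(handle, max_variants=MAX_VARIANTS, max_samples=MAX_SAMPLES):
    """Count samples and data lines in an open VCF and compare them to the limits."""
    n_samples = 0
    n_variants = 0
    for line in handle:
        if line.startswith("##"):
            continue                      # meta-information lines
        if line.startswith("#CHROM"):
            # Nine fixed columns precede the sample columns.
            n_samples = max(0, len(line.rstrip("\n").split("\t")) - 9)
        elif line.strip():
            n_variants += 1               # assumption: one variant per data line
    return {
        "samples": n_samples,
        "variants": n_variants,
        "within_limits": n_samples <= max_samples and n_variants <= max_variants,
    }


vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "1\t100\t.\tG\tA\t50\tPASS\t.\tGT\t0/1\t0/0\n"
    "1\t200\t.\tC\tT\t60\tPASS\t.\tGT\t1/1\t0/1\n"
)
summary = check_vcf_limits(vcf)
print(summary)
```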

How long does it take to import a VCF file?

This depends on the number of variants and samples in your VCF file. If you upload a whole-genome sequencing (WGS) dataset with millions of variants, the annotation and upload can take several hours. If you upload a targeted sequencing dataset, you should have your results within minutes.

Variant Annotation

What does an empty column entry mean?

Empty column entries occur in all cases when the source of the respective column does not provide information on the current variant position. For instance, an empty column entry in the gAF column indicates that for this position no frequencies are known, which in turn indicates that no variants have been called at this position in the 1000 Genomes Project population.

What does a Quality of 999 mean?

The quality scores depend on the variant caller used to generate the VCF file. If the SAMtools variant detection pipeline was used, the quality scores are encoded in Phred-like format: the score is given as -10 * log10(p-value). These values are integers and are limited to a maximum of 999. Thus, a quality of 999 means that the variant quality is very good.
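
The Phred-like encoding can be worked through with a few lines of Python. This is only a sketch of the formula quoted above; the function name phred_quality is hypothetical, and rounding to the nearest integer is an assumption about how the caller produces integer scores.

```python
import math


def phred_quality(p_value, cap=999):
    """Return -10 * log10(p_value) as an integer, limited to `cap`."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value == 0.0:
        return cap                       # log10(0) is undefined; report the maximum
    # Assumption: scores are rounded to the nearest integer.
    return min(cap, round(-10 * math.log10(p_value)))


for p in (0.1, 0.001, 1e-120):
    print(p, phred_quality(p))
```

A p-value of 0.001 corresponds to a score of 30, and any p-value below roughly 1e-100 hits the cap of 999.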

What does the GQ value mean?

This is the genotype quality score, which is encoded in Phred-like format. The score is given as -10 * log10(1 - Pr(called genotype)). These values are integers and are limited to a maximum of 999.
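
Read in the other direction, a GQ value tells you how likely the called genotype is to be correct. The following Python sketch simply inverts the formula above; the function name genotype_confidence is hypothetical.

```python
def genotype_confidence(gq):
    """Return Pr(called genotype) implied by a Phred-scaled GQ value."""
    if gq < 0:
        raise ValueError("GQ must not be negative")
    return 1.0 - 10 ** (-gq / 10)


for gq in (10, 20, 30):
    print(gq, genotype_confidence(gq))
```

For example, GQ 20 corresponds to a 1 in 100 chance that the genotype call is wrong, and GQ 30 to 1 in 1000.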

When is a compound heterozygous variant displayed?

Compound heterozygous variants are determined automatically when a comparative Trio study is started. The prerequisite for their calculation is that one sample (the offspring) is assigned to case and two samples (mother and father) to control. A compound heterozygous candidate is detected when there are two variants in the same gene that are both deleterious and the combination of both variants does not occur in any of the control samples. One variant can be a partner in many compound heterozygous pairs, which can lead to a large number of compound heterozygous annotations within one gene (this is often the case for TTN, for example). Since we do not want to list all possible combinations, we annotate the individual variants as compound heterozygous. As a result, after filtering for one gene, it can happen that only one variant is annotated as compound heterozygous. In all such cases, removing the filters will reveal at least a second variant with the compound heterozygous annotation.
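
The detection rule described above can be sketched in Python as follows. This is a simplified illustration, not GeneGrid's implementation: the data layout (dictionaries with the keys id, gene, deleterious and carriers) and the function name compound_het_variants are hypothetical, and zygosity checks that a production pipeline would apply are left out.

```python
from collections import defaultdict
from itertools import combinations


def compound_het_variants(variants, case, controls):
    """Return the ids of variants that take part in at least one candidate pair.

    Each variant is a dict with 'id', 'gene', 'deleterious' (bool) and
    'carriers' (set of sample names that carry the alternative allele).
    """
    by_gene = defaultdict(list)
    for v in variants:
        if v["deleterious"] and case in v["carriers"]:
            by_gene[v["gene"]].append(v)

    flagged = set()
    for gene_variants in by_gene.values():
        for a, b in combinations(gene_variants, 2):
            # The pair qualifies only if no control sample carries both variants.
            in_control = any(c in a["carriers"] and c in b["carriers"] for c in controls)
            if not in_control:
                flagged.update((a["id"], b["id"]))   # annotate variants, not pairs
    return flagged


variants = [
    {"id": "v1", "gene": "TTN", "deleterious": True, "carriers": {"child", "mother"}},
    {"id": "v2", "gene": "TTN", "deleterious": True, "carriers": {"child", "father"}},
    {"id": "v3", "gene": "TTN", "deleterious": True, "carriers": {"child", "mother", "father"}},
    {"id": "v4", "gene": "TTN", "deleterious": False, "carriers": {"child"}},
]
print(sorted(compound_het_variants(variants, "child", ["mother", "father"])))
```

Here v1 and v2 are flagged because neither parent carries both. v3 is not flagged because each of its possible pairs is already present in one parent, and v4 is ignored because it is not deleterious.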

Why do the population allele frequencies differ between GeneGrid and dbSNP?

Population allele frequencies can be calculated from different background populations (e.g. HapMap, ESP). Depending on the choice of these populations, MAF values can deviate. GeneGrid uses the alternative allele frequencies directly from the 1000 Genomes Project (gAF) and the NHLBI GO Exome Sequencing Project (espAF).

Why do I see the same variant multiple times in different rows?

In GeneGrid, there is one row per variant and gene. If a variant affects different genes, it is displayed several times. The effect of a single variant that overlaps multiple genes depends strongly on the gene and its underlying transcripts. Therefore we list such variants separately for each gene.

Why does the total number of variants in the GeneGrid table not match the number of variants in my VCF file?

Entries from a VCF file might be skipped for different reasons, e.g.:

  • The reference allele given in the VCF file does not comply with the reference sequence (from the NCBI build)
  • The variant reference and/or alternative alleles are too long (currently, the limit is 200 bp)
  • The coverage for a variant in a sample is given as 0 (zero)

In any case, a warning message is printed if a variant is skipped.
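
The three skip reasons listed above can be expressed as a small Python check. This is a sketch under assumptions: the function name skip_reason and its arguments are hypothetical, the expected reference allele is assumed to have been looked up elsewhere, and a variant is treated as skipped as soon as any sample reports zero coverage, which may be stricter than what GeneGrid actually does.

```python
MAX_ALLELE_LENGTH = 200   # bp, the limit quoted above


def skip_reason(ref, alts, depths, reference_allele):
    """Return a warning text if the entry would be skipped, otherwise None."""
    if ref.upper() != reference_allele.upper():
        return "reference allele does not match the reference sequence"
    if any(len(allele) > MAX_ALLELE_LENGTH for allele in [ref, *alts]):
        return "allele longer than 200 bp"
    # Assumption: one sample with coverage 0 is enough to skip the entry.
    if any(depth == 0 for depth in depths):
        return "coverage of 0 in a sample"
    return None


print(skip_reason("G", ["A"], [30, 25], reference_allele="G"))
print(skip_reason("G", ["A"], [30, 25], reference_allele="C"))
print(skip_reason("G", ["A" * 201], [30], reference_allele="G"))
```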

What exactly is a Reference call? How is this different from an Unknown call?

GeneGrid makes a strong distinction between a no call (unknown) and a confident homozygous reference call. If GeneGrid has no information or only low-quality information on the genotype of a certain sample, a no call will be displayed. This mostly occurs when variants from different samples have been called independently. For variants that have been called on multiple samples at the same time (e.g. SAMtools and GATK have this option), a reference call (0/0 in the GT field of the VCF file) can be determined.
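
In terms of the GT field, the distinction can be sketched in Python as shown below. The function name classify_genotype and the returned labels are hypothetical. The sketch looks only at the GT string and does not model the quality-based downgrade to a no call, because the thresholds involved are not given here.

```python
import re


def classify_genotype(gt):
    """Classify a VCF GT string such as '0/0', '0|1' or './.'."""
    alleles = re.split(r"[/|]", gt) if gt else ["."]
    if any(allele == "." for allele in alleles):
        return "no call"                  # genotype unknown for this sample
    if all(allele == "0" for allele in alleles):
        return "reference"                # confident homozygous reference call
    if len(set(alleles)) == 1:
        return "homozygous variant"
    return "heterozygous"


for gt in ("0/0", "./.", "0|1", "1/1"):
    print(gt, classify_genotype(gt))
```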

Why does GeneGrid report a synonymous and a non-synonymous effect on the same gene?

In humans, most genes have multiple alternative transcripts. A variant might affect different transcripts in different ways, for example as a result of different reading frames.

Why does GeneGrid report a variant with 0 coverage as a homozygous or heterozygous variant?

This happens if variants have been predicted by variant calling tools that allow for multisample variant calling (such as SAMtools). For example, when multiple samples are used, earlier versions of SAMtools sometimes call a genotype as a heterozygous or homozygous variant even though the coverage is given as 0. Coverage in this case means the number of high-quality bases and can be lower than the actual read depth because low-quality bases have been filtered out. To avoid importing such calls in general, we recommend setting the minimum coverage in the pre-filter settings to at least 1, which is also the default setting.

Why does GeneGrid report c.C200T when my input is a G to A change?

The c.C200T notation describes a change at the cDNA (actually, mRNA) level. The VCF input (G to A) has to be given on the forward strand. If the transcript is on the reverse strand, the same variant appears as a C to T change in the mRNA.
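
The strand conversion is a reverse complement of the alleles, which the following Python sketch illustrates. The function name to_transcript_strand is hypothetical; only the alleles are converted, and the mapping from the genomic position to the cDNA position 200 is not shown.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def to_transcript_strand(allele, strand):
    """Express a forward-strand allele on the strand of the transcript."""
    if strand == "-":
        return allele.translate(COMPLEMENT)[::-1]   # reverse complement
    return allele


ref, alt = "G", "A"                                  # alleles as given in the VCF
print(to_transcript_strand(ref, "-"), ">", to_transcript_strand(alt, "-"))
```

For a transcript on the reverse strand, the forward-strand change G to A is printed as C > T, matching the c.C200T notation.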

Why do some missense and nonsense variants not have SIFT or other functional prediction scores?

There are multiple gene definition systems (such as RefSeq, Ensembl, GENCODE, GenBank, etc.), and each of them has multiple versions. SIFT and other prediction scores are typically calculated for a specific gene definition system and a specific gene build version. For example, GeneGrid uses SIFT scores based on Ensembl transcripts. Additionally, some genes do not have these functional prediction scores simply because the gene does not have any homologues. It is therefore normal that some variants do not have scores in the specific build of the database that GeneGrid provides.

Which gene annotations are supported?

  • intergenic
  • promoter: sequence from 500 bp upstream to 100 bp downstream of the transcription start site (TSS)
  • CDS: coding sequence of a transcript
  • 5'UTR
  • 3'UTR
  • exon
  • intron
  • splice site: donor and acceptor
  • splice region: region surrounding the splice site (3 bp in the exon and 8 bp in the intron)
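
The window sizes given for the promoter and the splice region can be turned into simple coordinate checks. The Python sketch below is illustrative only: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive, the splice region check is written for an internal exon (both ends border an intron), and GeneGrid's exact boundary handling may differ.

```python
def in_promoter(pos, tss, strand, upstream=500, downstream=100):
    """True if pos lies 500 bp upstream to 100 bp downstream of the TSS."""
    if strand == "+":
        return tss - upstream <= pos <= tss + downstream
    # On the reverse strand, upstream means higher genomic coordinates.
    return tss - downstream <= pos <= tss + upstream


def in_splice_region(pos, exon_start, exon_end, exon_bp=3, intron_bp=8):
    """True if pos is within 3 exonic or 8 intronic bp of an exon boundary."""
    near_start = exon_start - intron_bp <= pos <= exon_start + exon_bp - 1
    near_end = exon_end - exon_bp + 1 <= pos <= exon_end + intron_bp
    return near_start or near_end


print(in_promoter(600, tss=1000, strand="+"))
print(in_promoter(600, tss=1000, strand="-"))
print(in_splice_region(1205, exon_start=1000, exon_end=1200))
```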

How can I filter for multiple genes at once?

It is possible to define multiple rules for a single filter column, see combined filtering. This allows you to filter for up to 10 genes with a single filter. Additionally, some popular larger gene panels are already pre-defined and included in GeneGrid under the filter column popular gene panels. This includes, for instance, the list of 56 genes from the ACMG Incidental Findings recommendations.

Sample Comparison

How many samples can be compared?

The comparison analysis is limited to ten samples.

How does GeneGrid compare samples in the Cancer mode?

The samples are compared pairwise. In other words, the first sample in the affected group is compared to the first sample in the non-affected group, the second affected sample is compared to the second non-affected sample, and so on.
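
The pairing follows the order of the samples in the two groups, as this short Python sketch shows. The function name pair_samples and the sample names are hypothetical, and requiring both groups to have the same size is an assumption made for the sketch.

```python
def pair_samples(affected, non_affected):
    """Pair the i-th affected sample with the i-th non-affected sample."""
    if len(affected) != len(non_affected):
        # Assumption: every affected sample needs exactly one counterpart.
        raise ValueError("both groups must contain the same number of samples")
    return list(zip(affected, non_affected))


pairs = pair_samples(["tumor_A", "tumor_B"], ["normal_A", "normal_B"])
print(pairs)
```

The order in which samples are assigned to the groups therefore determines which samples are compared to each other.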

How can I compare samples from two or more siblings?

At the moment, a trio study is limited to one affected child. To compare samples from two or more siblings, we recommend using other as the study type. Filtering is possible with the optional columns Diff. between groups and Diff. in case group. For details, see Sample Comparison.

What does the Other option mean?

Everything that does not fit into one of the two study types trio and cancer can be analyzed as an other study, which performs a general comparison without any assumptions about the samples. In this case it is still possible to get valuable information about how many samples differ between the two groups (Diff. between groups) and even within the case group itself (Diff. in case group). However, study-specific columns like somatic mutation or compound heterozygosity that are available for the other study types are omitted here. For details, see Sample Comparison.

Genome Browser

Can I view my VCF file in the Genome Browser?

Yes, the VCF file will be loaded automatically when you start the Genome Browser from the result table with the variants.

Can I add a BAM file to my samples?

Yes, each activated sample can be associated with an additional BAM file for viewing the alignments in the genome browser.

How do I upload my BAM file?

The BAM file needs to be associated with the sample so that it can be loaded automatically when you switch to the Genome Browser. This can be done on the samples page, where you select the sample in the same way as when creating a comparison analysis. For details, see BAM upload.

Report Integration

Do you offer an API?

Yes, we have a SOAP API that allows you to fetch all the data needed for reports from a filtered-down list of variants. In fact, it is the same API that we use internally for the variant reports that you can generate in GeneGrid. The feature is available to the public through an early access program. If you are interested in getting access, please get in contact with us. The WSDL files can be found here: Sample and Comparison.

Pricing

How much will my analysis cost me?

GeneGrid follows the pay-per-sample model. The cost depends on the number of results you import (samples) or generate (comparisons) and on how long they are stored in the system. The Pricing overview shows how much the annotation of variants from a targeted, exome or whole genome sequencing project costs.

What else is included in the storage fee?

In addition to the data storage of the samples and comparisons, the monthly storage fee also includes the reanalysis of variants when annotation sources are updated. This ensures that the latest available annotation data is applied when assessing pathogenicity, and it provides a consistent view across all your samples at any time. For details, see Continuous Annotation.

Does a BAM file add extra cost?

No, one BAM file is already included for every activated sample.

Support

Can I submit feature requests?

Absolutely, we love input from users. Please submit your requests or feedback using the contact form. The GeneGrid team regularly reviews the ideas and incorporates them into future product planning and discussions.

Where can I get more information on GeneGrid?

For a more detailed description of any of the columns, annotations or functions in GeneGrid, please refer to the manual. Another great way to get started is the tutorial. For any questions or comments, you're most welcome to contact us here.