How Deep Should I Sequence?
Kristine Angevine, PhD, Product Manager I on November 18, 2016
With the launch of ThruPLEX® Tag-seq, a common question is how deeply you need to sequence to detect variants at the minor allele frequency (MAF) of interest. Several factors determine the lowest detectable MAF, including input amount, sequencing depth, and capture panel size. ThruPLEX Tag-seq provides confident MAF detection by using 16 million unique molecular tags (UMTs) to label each DNA molecule. Using the UMTs, bioinformatics software groups duplicate reads into amplification families and constructs a consensus sequence for each family, which reduces false positives.
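To make the idea of amplification families concrete, here is a toy sketch in Python. It is not the ThruPLEX Tag-seq analysis software: the function name `consensus_by_umt`, the `(umt, sequence)` input format, and the simple per-base majority vote are all assumptions chosen for illustration. Real pipelines typically also use mapping position and base quality.

```python
from collections import Counter, defaultdict


def consensus_by_umt(reads, min_family_size=3):
    """Group reads by UMT and build a majority-vote consensus per family.

    reads: iterable of (umt, sequence) tuples (hypothetical input format).
    Sequences within a family are assumed to be aligned and of equal length.
    Families smaller than min_family_size are discarded.
    """
    families = defaultdict(list)
    for umt, sequence in reads:
        families[umt].append(sequence)

    consensus = {}
    for umt, sequences in families.items():
        if len(sequences) < min_family_size:
            continue  # too few duplicates to trust a consensus
        # zip(*sequences) yields one column of bases per position;
        # the most common base in each column becomes the consensus base.
        consensus[umt] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*sequences)
        )
    return consensus


example_reads = [
    ("AAT", "ACGT"),
    ("AAT", "ACGT"),
    ("AAT", "ACTT"),  # one read carries an isolated error at position 3
    ("GGC", "ACGA"),  # family of size 1, dropped by the size filter
]
print(consensus_by_umt(example_reads))
```

In this sketch, an error seen in only one read of a family is outvoted by the other duplicates, which is the mechanism by which consensus building reduces false positives.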
First, input amount plays a critical role in MAF detection. Select an input amount that ensures enough copies of the variant in question are present to be detected. The table below shows the input amount, the total haploid genome copies, and the total variant copies available for library preparation at various allele frequencies. Note that the number of copies actually available for detection will be lower than shown in the table because some molecules are lost during library preparation and enrichment.
|Estimated Genome Copies Available for Library Preparation|
|Input Amount||Total Haploid Genome Copies*||Total Variant Copies at Indicated Allele Frequency|
|*Calculated using 3 pg as the mass of a haploid genome. The genomic complexity of plasma samples is highly variable. All numbers rounded down to nearest whole number.|
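The arithmetic behind the table can be sketched in a few lines of Python. The 3 pg haploid genome mass and the rounding down come from the table footnote; the function names, the input amounts, and the allele frequencies in the loop are illustrative assumptions, as is the choice to compute variant copies from the unrounded genome copy number.

```python
from fractions import Fraction
from math import floor

HAPLOID_GENOME_MASS_PG = 3  # from the table footnote


def haploid_genome_copies(input_ng):
    """Haploid genome copies in input_ng nanograms of DNA, rounded down."""
    input_pg = Fraction(str(input_ng)) * 1000  # 1 ng = 1,000 pg
    return floor(input_pg / HAPLOID_GENOME_MASS_PG)


def variant_copies(input_ng, allele_frequency):
    """Variant copies at the given allele frequency, rounded down.

    Assumption: computed from the unrounded genome copy number.
    Fractions avoid floating-point surprises right at whole numbers.
    """
    input_pg = Fraction(str(input_ng)) * 1000
    copies = input_pg / HAPLOID_GENOME_MASS_PG
    return floor(copies * Fraction(str(allele_frequency)))


# Illustrative input amounts and allele frequencies (not the original table rows).
for ng in (1, 10, 50):
    row = [variant_copies(ng, af) for af in (0.05, 0.01, 0.001)]
    print(f"{ng} ng: {haploid_genome_copies(ng)} genome copies, variant copies {row}")
```

For instance, under these assumptions 1 ng of input contains no whole variant copy at an allele frequency of 0.1%, which is why input amount limits the detectable MAF regardless of sequencing depth.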
Another factor that affects detection sensitivity is sequencing depth. Generally, detecting lower MAFs requires deeper sequencing. ThruPLEX Tag-seq uses UMTs to bioinformatically group duplicates into amplification families. An amplification family size of 8–10 reads is recommended for maximum specificity, but this can be adjusted based on experimental needs (1). To estimate the amount of sequencing needed to detect the desired MAF, first determine the number of unique molecules required to make a variant call. For example, at an allele frequency of 1%, if 3 unique molecules are required to make a variant call and each amplification family has approximately 10 reads, then you would need to sequence to roughly 3,000x coverage. The equation and table below can be used as a reference.
Sequencing depth = (number of unique molecules required to make a variant call / allele frequency) × approximate number of reads in each amplification family
Example: (3 / 0.01) × 10 = 3,000x coverage required
|Estimated Mean Raw Sequencing Depth Required*|
|Minimum number of Unique Molecules to Make a Variant Call||Allele Frequency|
|*Raw sequencing depth includes all reads prior to removal of duplicates. This is calculated using a target peak amplification family size of 10 reads per unique molecule.|
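The depth estimate can be tabulated with a short Python sketch. The formula and the family size of 10 reads come from the text above; the function name `required_raw_depth` and the allele frequencies in the loop are illustrative assumptions, not values taken from the original table.

```python
from fractions import Fraction
from math import ceil


def required_raw_depth(min_unique_molecules, allele_frequency, family_size=10):
    """Estimated mean raw sequencing depth (before duplicate removal).

    depth = (unique molecules needed / allele frequency) * reads per family
    The allele frequency is converted to an exact Fraction so that
    results such as 3 / 0.01 come out as exactly 300.
    """
    af = Fraction(str(allele_frequency))
    if not 0 < af <= 1:
        raise ValueError("allele_frequency must be in the interval (0, 1]")
    return ceil(Fraction(min_unique_molecules) / af * family_size)


# Illustrative grid of allele frequencies and molecule thresholds.
for molecules in (1, 3, 5):
    depths = [required_raw_depth(molecules, af) for af in (0.05, 0.01, 0.001)]
    print(f"{molecules} unique molecule(s): {depths}")
```

Calling `required_raw_depth(3, 0.01)` reproduces the worked example of 3,000x. Lowering `family_size` reduces the required depth at the cost of weaker consensus support per molecule.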
One way to decrease the amount of sequencing needed is to perform target enrichment using hybrid capture. Targeted panels enrich the genes of interest, which decreases the total number of bases that need coverage. For example, if 600x coverage is required, the target panel size is 5 Mb, the on-target fraction is 0.50 (50%), and the total sequenced read length is 300 bp (2×150 bp), roughly 20 million reads would be required. To estimate the number of reads required for each sample, use the following equation:
Millions of reads required = (coverage × target panel size in Mb) / (read length in bp × on-target fraction)
Example: (600 × 5) / (300 × 0.50) = 20 million reads required
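The same calculation as a small Python sketch, which makes the units explicit. The function and parameter names are assumptions introduced here for illustration.

```python
def millions_of_reads_required(coverage, panel_size_mb, read_length_bp, on_target_fraction):
    """Estimate the reads needed per sample, in millions.

    coverage            desired mean raw depth (e.g. 600 for 600x)
    panel_size_mb       target panel size in megabases
    read_length_bp      total sequenced length per read in bp (300 for 2x150)
    on_target_fraction  fraction of reads that map on target, in (0, 1]
    """
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in the interval (0, 1]")
    # coverage * Mb gives the megabases of on-target sequence needed; dividing
    # by the on-target bases contributed per read gives millions of reads.
    return (coverage * panel_size_mb) / (read_length_bp * on_target_fraction)


print(millions_of_reads_required(600, 5, 300, 0.50))
```

Because the 300 bp figure is the combined length of a 2×150 bp pair, the result is probably best read as a count of read pairs. Swapping in the 3,000x depth from the earlier example would raise the estimate fivefold for the same panel.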
In summary, input amount, sequencing depth, and capture panel size are all factors to consider when determining the detectable MAF. These are only guidelines; other factors, such as sample quality and the data processing algorithms used, can also play a role. If you still have questions regarding sequencing depth, please reach out to customer support at firstname.lastname@example.org.
1 Scott R Kennedy et al. Nature Protocols 9, 2586–2606 (2014). doi:10.1038/nprot.2014.170 Published online 09 October 2014. Corrected online 22 October 2014.