Choosing your read depth

When planning your repertoire sequencing experiment, you’ll need to choose an appropriate read depth for your samples. Sequencing depth (coverage) depends on the read length, the sequencing platform, and whether you’re using single-end or paired-end sequencing.

More coverage generally improves sensitivity, but sequencing costs rise with coverage. Pooling samples reduces overall cost, but it also decreases the coverage of each individual sample. This page will help you choose the read-depth strategy best suited to your project.

For more background on next-generation sequencing (NGS) workflows, see our NGS overview post.

Paired-end versus single-end

We recommend paired-end sequencing for immune repertoire applications. In paired-end sequencing, each fragment is sequenced from both ends, usually with equal cycle lengths. For example, with a 500-cycle kit, each fragment is read for 250 cycles in read one and 250 cycles in read two.

Paired-end reads facilitate sequence assembly and error correction. This information is useful for extending sequencing reads to cover the entire CDR3 region and for identifying V and J germline segments reliably. We also use the paired-end information to verify whether a CDR3 fragment is authentic.
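A quick way to see whether the two reads of a pair can be assembled is to check how much they overlap. The sketch below is illustrative only: it assumes the kit's cycles are split equally between the two reads (as in the 500-cycle example) and uses hypothetical amplicon lengths; it ignores index cycles and primer/adapter trimming, and real read-merging tools typically require some minimum overlap rather than any positive value.

```python
def cycles_per_read(kit_cycles: int) -> int:
    """Split a paired-end kit's cycles equally between read one and read two."""
    return kit_cycles // 2


def pair_overlap(read_length: int, amplicon_length: int) -> int:
    """Bases covered by both read one and read two of a single amplicon.

    A negative value means the two reads leave a gap in the middle of the
    amplicon, so they cannot be assembled into one contiguous sequence.
    """
    return 2 * read_length - amplicon_length


# Hypothetical amplicon lengths, chosen for illustration only.
read_len = cycles_per_read(500)  # 250 cycles per read
for amplicon in (300, 450, 550):
    overlap = pair_overlap(read_len, amplicon)
    status = "reads overlap" if overlap > 0 else "gap between reads"
    print(f"{amplicon} bp amplicon: overlap {overlap} bp ({status})")
```

With 2 x 250 reads, the hypothetical 300 bp and 450 bp amplicons are fully spanned with room to spare, while the 550 bp amplicon would leave a 50 bp gap.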

Single-end sequencing may be necessary when the product length exceeds the number of cycles needed to read both directions equally. While libraries generated with iRepertoire’s primer systems are compatible with Illumina paired-end sequencing, some projects, such as custom phage display, may require single-end sequencing. You may also opt for single-end sequencing of your repertoire library to reduce sequencing cost, to pool more samples onto a flow cell, or both. See our Sequencing Page for details on which Illumina platforms can perform single-end reads (SER) and/or paired-end reads (PER).

Coverage and read estimation

We recommend planning your experiment to obtain 5 to 10 reads per cell so that, under a Poisson model of read sampling, nearly every cell is expected to be sequenced at least once. For instance, if your sample contains about 500,000 T cells, we recommend allocating about 2.5 million reads to this sample (500,000 cells x 5 reads per cell).
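The arithmetic behind this recommendation can be sketched as follows. This is a simplified model, not iRepertoire's method: it assumes reads fall on cells independently and uniformly at random, so the number of reads per cell is Poisson-distributed with mean λ (the reads-per-cell target), and the expected fraction of cells receiving no reads is e^(-λ). Real libraries deviate from this ideal (for example, through amplification bias and differing transcript levels between cells), so treat the output as a rough guide.

```python
import math


def reads_needed(num_cells: int, reads_per_cell: float) -> int:
    """Total reads to allocate so each cell receives `reads_per_cell` on average."""
    return math.ceil(num_cells * reads_per_cell)


def fraction_cells_missed(reads_per_cell: float) -> float:
    """Expected fraction of cells with zero reads under a Poisson model.

    Assumes the read count per cell is Poisson(lambda) with
    lambda = reads_per_cell, so P(0 reads) = exp(-lambda).
    """
    return math.exp(-reads_per_cell)


cells = 500_000  # T cells in the sample, as in the example above
for depth in (5, 10):
    total = reads_needed(cells, depth)
    missed = fraction_cells_missed(depth)
    print(f"{depth} reads/cell: {total:,} reads, "
          f"~{missed:.3%} of cells expected to receive no reads")
```

At 5 reads per cell, the model predicts that well under 1% of cells go unsampled; doubling to 10 reads per cell doubles the read budget but shrinks that fraction by more than two orders of magnitude.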

To calculate the estimated read output (ERO) per sample, divide the average number of sequencing reads expected after filtering by the number of samples pooled (S):

Illumina NextSeq Mid-throughput lane: ERO = 1 x 10^8 / S

Illumina MiSeq V2 500-cycle flow cell: ERO = 1 x 10^6 / S

For example, pooling 40 samples on an Illumina NextSeq Mid-throughput lane gives an ERO of 1 x 10^8 / 40 = 2.5 million reads per sample. Because sequencing run parameters vary in ways beyond the operator's control, actual output may be higher or lower than these estimates.
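The ERO formula can also be run in reverse to find how many samples fit on a run at a target depth. The sketch below uses the NextSeq lane figure given on this page; the function names are hypothetical, and the results inherit the same uncertainty as the underlying estimate.

```python
# Average filtered reads per NextSeq Mid-throughput lane, as given on this page.
NEXTSEQ_MID_LANE_READS = 1e8


def estimated_read_output(total_reads: float, samples_pooled: int) -> float:
    """ERO: expected filtered reads per sample when `samples_pooled` share a run."""
    if samples_pooled < 1:
        raise ValueError("at least one sample must be pooled")
    return total_reads / samples_pooled


def max_samples_to_pool(total_reads: float, reads_per_sample: float) -> int:
    """Largest pool size that still gives each sample `reads_per_sample` on average."""
    return int(total_reads // reads_per_sample)


print(estimated_read_output(NEXTSEQ_MID_LANE_READS, 40))   # 2500000.0
print(max_samples_to_pool(NEXTSEQ_MID_LANE_READS, 2.5e6))  # 40
```

Combined with the 5-reads-per-cell guideline, this shows that about 40 samples of roughly 500,000 T cells each could share one NextSeq Mid-throughput lane. To use a different platform, pass that platform's filtered-read estimate as `total_reads`.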

Read length

iRepertoire’s amplification systems are designed to produce 100/150 bp or 250 bp paired-end reads (PER), and all systems cover the CDR3 region. While long-read primers cover more of the B cell receptor or T cell receptor transcript, a short-read system may be acceptable for some applications and is generally more cost-effective.

If V-gene usage is of interest, our software can typically call the V gene with both long- and short-read formats; however, calls for closely related V alleles are more accurate with the long-read format. For researchers looking to recreate a receptor with tight binding specificity, the nearly complete V region provided by long reads is advantageous.

For more information about our specific amplification options, see our primer systems page.

Unique CDR3 discovery

If your goal is unique CDR3 discovery, factors other than read depth also matter, most notably exhaustion of the reaction chemistry. Sequencing depth is important, but RNA input and enzyme activity are upstream limiting factors; once these have been accounted for, deeper sequencing can improve discovery. In order of importance, the five major factors for unique CDR3 discovery are 1) the sample source; 2) RNA quality; 3) RNA concentration; 4) reaction size; and 5) sequencing depth.

If the sample or RNA source is expected to have a restricted repertoire, deep sequencing is unnecessary. For example, a hybridoma sample may contain 2 million cells, but they all belong to the same clone, so additional depth reveals no new CDR3s.

If unique CDR3 discovery is the goal and the input number of unique cells is high, we recommend doubling the reaction volume and thus the RNA input. For FFPE samples specifically, one curl typically yields only a few thousand (~5,000) unique CDR3s (uCDR3s). To enable rare clone discovery, we recommend 500,000 to 1 million reads per FFPE sample.
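To see what the FFPE recommendation means for run planning, the sketch below combines it with the NextSeq lane estimate from the read-estimation section. It is a rough illustration with hypothetical function names: clone sizes in a repertoire are highly skewed, so the average reads per uCDR3 it reports overstates the coverage that rare clones actually receive.

```python
NEXTSEQ_MID_LANE_READS = 1e8  # average filtered reads per lane, from this page
FFPE_UCDR3_PER_CURL = 5_000   # typical uCDR3s from one FFPE curl, from this page


def ffpe_plan(reads_per_sample: float) -> tuple[int, float]:
    """Return (FFPE samples per lane, average reads per expected uCDR3)."""
    samples_per_lane = int(NEXTSEQ_MID_LANE_READS // reads_per_sample)
    reads_per_ucdr3 = reads_per_sample / FFPE_UCDR3_PER_CURL
    return samples_per_lane, reads_per_ucdr3


for target in (500_000, 1_000_000):
    samples, per_clone = ffpe_plan(target)
    print(f"{target:,} reads/sample: up to {samples} samples per lane, "
          f"~{per_clone:.0f} reads per uCDR3 on average")
```

At the recommended range, one NextSeq Mid-throughput lane would hold roughly 100 to 200 FFPE samples, with an average of 100 to 200 reads per expected uCDR3.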