A common flanking variant is associated with enhanced stability of the FGF14-SCA27B repeat locus
The factors driving or preventing pathological expansion of tandem repeats remain largely unknown. Here, we assessed the FGF14 (GAA)·(TTC) repeat locus in 2,530 individuals by long-read and Sanger sequencing and identified a common 5'-flanking variant in 70.34% of alleles analyzed (3,463/4,923) that represents the phylogenetically ancestral allele and is present on all major haplotypes. This common sequence variation is present nearly exclusively on nonpathogenic alleles with fewer than 30 GAA-pure triplets and is associated with enhanced stability of the repeat locus upon intergenerational transmission and with increased Fiber-seq chromatin accessibility.
Detailed tandem repeat allele profiling in 1,027 long-read genomes reveals genome-wide patterns of pathogenicity
Tandem repeats are a highly polymorphic class of genomic variation that play causal roles in rare diseases but are notoriously difficult to sequence using short-read techniques [1,2]. Most previous studies profiling tandem repeats genome-wide have reduced the description of each locus to a single value, the length of the entire repetitive locus [3,4]. Here we introduce a comprehensive database of 3.6 billion tandem repeat allele sequences from over one thousand individuals using HiFi long-read sequencing. We show that the previously identified pathogenic loci are among the most variable tandem repeat loci in the genome when nucleotide-resolution sequence content is incorporated to measure the longest pure motif segment. More broadly, we introduce a novel measure, ‘tandem repeat constraint’, that assists in distinguishing potentially pathogenic from benign loci. Our approach of measuring variation as ‘the length of the longest pure segment’ successfully prioritizes pathogenic repeats within their previously published linkage regions. We also present evidence for two novel pathogenic repeat expansion candidates. In summary, this analysis significantly clarifies the potential for short tandem repeat pathogenicity at over 1.7 million tandem repeat loci and will aid the identification of disease-causing repeat expansions.
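To make the ‘length of the longest pure segment’ idea concrete, the following is a minimal sketch of how such a value could be computed for a single allele sequence. It is not the authors' implementation: the function name `longest_pure_segment`, the choice to count whole motif copies rather than base pairs, and the example sequence are all assumptions made for illustration.

```python
def longest_pure_segment(seq: str, motif: str) -> int:
    """Return the largest number of consecutive, uninterrupted motif copies in seq.

    Hypothetical helper for illustration only. The result is counted in
    motif units; multiply by len(motif) for a length in base pairs.
    """
    seq = seq.upper()
    motif = motif.upper()
    if not motif:
        raise ValueError("motif must be a non-empty string")

    step = len(motif)
    best = 0
    # Try every start position so that runs in any reading frame are found.
    for start in range(len(seq)):
        count = 0
        pos = start
        # Extend the run while the next window is an exact motif copy.
        while seq.startswith(motif, pos):
            count += 1
            pos += step
        best = max(best, count)
    return best


# Made-up allele: three pure GAA copies, one GCA interruption, two more GAA copies.
example_allele = "GAAGAAGAAGCAGAAGAA"
pure_units = longest_pure_segment(example_allele, "GAA")
```

In this made-up allele the total locus spans six triplets, but the longest pure GAA run is three, which illustrates how an interruption separates overall locus length from pure-segment length.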
