VCU Bioinformatics and Bioengineering Summer Institute
Virginia Commonwealth University

Research Simulation Scenario
Position-specific scoring matrices
to search for repeated sequences

Multiple insertions of a 24-bp repeat in the genome of Nostoc
... lots of 24-bp sequences!

At first glance, it looks like there are 25 nearly identical copies of the sequence in the genome. At second glance, however, you realize there may be many more. BLAST also pulled up many partial copies, for example copies that matched in 19 out of 24 positions. You wish you could see the remaining positions to judge how close they came to matching.
Query:       1 gtacagacgtgattaatcgcgtct
               ||||||||| ||||||||||||||
Sbjct: 4514834 gtacagacgcgattaatcgcgtct

Score = 40.1 bits (20), Expect = 7e-05
Identities = 23/24 (95%)
Strand = Plus / Minus

Query:       1 gtacagacgtgattaatcgcgtct
               ||||||||| ||||||||||||||
Sbjct: 5930050 gtacagacgcgattaatcgcgtct

(23 more)

What troubles you most is that all the partial matches are truncations. Why aren't there partial matches where the discrepancies are dispersed over the 24-bp sequence? With this in mind, you wonder whether the genome holds many more nearly identical matches than the 25 found by BLAST.
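One way to see the remaining positions is to skip local alignment and compare the query against every full-length 24-bp window of the genome, on both strands, counting the positions that agree. The sketch below illustrates that idea only; it is not how BLAST works and not necessarily the method this scenario goes on to use. The function names, the `min_matches` cutoff of 19, and the toy genome string are all assumptions made for illustration. The toy string is not real Nostoc sequence.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (non-acgt becomes 'n')."""
    comp = {"a": "t", "c": "g", "g": "c", "t": "a"}
    return "".join(comp.get(base, "n") for base in reversed(seq.lower()))


def count_matches(a, b):
    """Count positions at which two equal-length strings carry the same base."""
    return sum(1 for x, y in zip(a, b) if x == y)


def scan_full_length(genome, query, min_matches=19):
    """Compare the query with every full-length window of the genome.

    Returns a list of (position, strand, matches, window) tuples, where
    position is the 1-based start of the window on the given genome string.
    The min_matches cutoff is a hypothetical choice, not a BLAST parameter.
    """
    genome = genome.lower()
    query = query.lower()
    n = len(query)
    # A minus-strand copy appears on the plus strand as the reverse complement.
    targets = (("+", query), ("-", reverse_complement(query)))
    hits = []
    for start in range(len(genome) - n + 1):
        window = genome[start:start + n]
        for strand, target in targets:
            matches = count_matches(window, target)
            if matches >= min_matches:
                hits.append((start + 1, strand, matches, window))
    return hits


# Toy example: a made-up stretch of DNA containing one 23/24 copy.
query = "gtacagacgtgattaatcgcgtct"
toy_genome = "ccatg" + "gtacagacgcgattaatcgcgtct" + "ttgca"
for position, strand, matches, window in scan_full_length(toy_genome, query):
    print(position, strand, f"{matches}/{len(query)}", window)
```

Because every window is scored over all 24 positions, a copy whose mismatches are scattered along the sequence is reported in the same way as one whose mismatches cluster at an end. A scan like this is slow on a whole genome and treats every position as equally important.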

What's a better way of identifying families of similar sequences?
