find a sequence that may be gapped with other characters

2013-12-01

azim58 - find a sequence that may be gapped with other characters


Search for a regex version of the sequence you want, using a program
that supports regex find and replace.
For example, say you want to find the sequence
TTGATACCACTGCTT
in this subject sequence
GGGGTAC:GATGAGATTG:ATACCACTGCTTGGGATCCTCTAGAGTC

change this sequence
TTGATACCACTGCTT
into
T(:+)?T(:+)?G(:+)?A(:+)?T(:+)?A(:+)?C(:+)?C(:+)?A(:+)?C(:+)?T(:+)?G(:+)?C(:+)?T(:+)?T(:+)?

using find
(\w)

and replace
$1(:+)?
or, if the program needs the parentheses in the replacement escaped,
$1\(:+\)?
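
The same substitution can also be done in a script instead of an editor. This is only a minimal sketch in Python using the standard re module. Python writes the backreference as \1 rather than $1, and the variable names (query, gapped) are made up for illustration.

```python
import re

# The ungapped sequence to look for.
query = "TTGATACCACTGCTT"

# Find (\w) and replace with the same character followed by (:+)?
# so that every base may be followed by one or more ":" gap characters.
gapped = re.sub(r"(\w)", r"\1(:+)?", query)

print(gapped)
```

If the gap character is something other than ":", change it in the replacement string (and escape it if it has a special meaning in regex).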

Then search the subject sequence
GGGGTAC:GATGAGATTG:ATACCACTGCTTGGGATCCTCTAGAGTC
with the query sequence
T(:+)?T(:+)?G(:+)?A(:+)?T(:+)?A(:+)?C(:+)?C(:+)?A(:+)?C(:+)?T(:+)?G(:+)?C(:+)?T(:+)?T(:+)?
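
To check the search step outside an editor, here is a minimal Python sketch using the standard re module. It builds the gapped pattern directly from the query, which is assumed to be equivalent to the find/replace step; the variable names are hypothetical.

```python
import re

subject = "GGGGTAC:GATGAGATTG:ATACCACTGCTTGGGATCCTCTAGAGTC"
query = "TTGATACCACTGCTT"

# Allow an optional run of ":" after every base of the query.
pattern = "".join(base + "(:+)?" for base in query)

match = re.search(pattern, subject)
if match:
    # The matched text still contains the gap characters.
    print("found:", match.group(0))
    # Removing the gaps should give back the original query.
    print("ungapped:", match.group(0).replace(":", ""))
else:
    print("not found")
```

Because the last base is also followed by (:+)?, a match can include gap characters that come right after the end of the sequence.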