Alignment of Sequences

2014-08-29

azim58 - Alignment of Sequences


Some Sequence Alignment Programs

Some Sequence Alignment Visualization Programs

I would like to find some free sequence alignment visualization software that has a nice overview graphic.


A nice PowerPoint presentation with some information about scoring sequence alignments


Sequences can be aligned using Clustal Omega, found here:


Analyzing alignment results
When I align sequences, I often find it difficult to navigate to a specific
location in the alignment. There are two reasons for this. The first is
that, in the default output format of Clustal Omega, each sequence is
wrapped over many lines (separated by carriage returns). Therefore, if you
search for
"AATTTCCGGCA"
you may not find it, because the sequence may be split across lines like
this:
"AATTTC
CGGCA"

This issue can be resolved by setting the Clustal Omega output format to
Vienna. This format puts each whole sequence on one line, which I find much
easier to work with.
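
If you already have an alignment in a wrapped, FASTA-style layout, the lines can also be joined with a short script. The sketch below is a minimal illustration, assuming header lines start with ">" and every following line belongs to that header's sequence until the next ">". The helper name unwrap_fasta and the sample records are hypothetical. It shows that a plain search misses the wrapped sequence but finds it once each sequence sits on one line, which is the same benefit the Vienna format gives.

```python
def unwrap_fasta(text):
    """Join wrapped sequence lines so each record's sequence is one string.

    Assumes FASTA-style input: a header line starts with '>', and the lines
    after it (up to the next '>') are pieces of that record's sequence.
    """
    records = []  # list of (header, sequence) pairs
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical wrapped alignment used only for illustration.
wrapped = """>seq1
AATTTC
CGGCA
>seq2
AATTTC
C--GGCA
"""

# A plain substring search misses the sequence split across two lines...
print("AATTTCCGGCA" in wrapped)  # False
# ...but finds it once each sequence is on a single line.
for name, seq in unwrap_fasta(wrapped):
    print(name, seq, "AATTTCCGGCA" in seq)
```

Note that seq2 still does not match after unwrapping, because its gap characters interrupt the sequence.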

The second reason that navigating to specific locations in the alignment
can be difficult is that a sequence may be interrupted by any number of
"-" characters (gaps), which are inserted so that it lines up with the
other sequences. For example, the desired sequence
"AATTTCCGGCA"
could appear in a form like this:
"AATTTCC--------GGCA"
This problem can be circumvented by searching for the sequence with a
regular expression that matches regardless of the position or number of
"-" characters. To use regular expressions, use something like the website
regexr (http://www.gskinner.com/RegExr/) or Notepad 2. Note that Notepad 2
does not support "grouping", so something like regexr must be used to
create at least the initial regular expression from the original sequence
you want to search for.
For example, paste the sequence (without the quotation marks)
"AATTTCCGGCA"
into regexr, switch to find and replace mode, enter "(\w)" (without the
quotation marks) into the find box, and "$1\-*" into the replace box. The
"(\w)" captures each word character "\w" as a group, indicated by the
parentheses "()". In the replace box, "$1" refers back to that group (only
one group is defined, which is why "1" is used). The "\-" places a "-"
after the group. In the resulting expression the backslash escapes the
"-"; outside a character class such as "[A-Z]" a bare "-" is already
treated literally, so the escape is a harmless precaution rather than a
requirement. The "*" means the preceding token (the "-") may occur zero or
more times in a match. The output of the replace function is therefore a
string that can itself be used as a regular expression for searching
another string. The output looks like this:
A\-*A\-*T\-*T\-*T\-*C\-*C\-*G\-*G\-*C\-*A\-*
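
The same find and replace step can be reproduced in Python with the standard re module, in case regexr is not at hand. This is a minimal sketch: Python writes the group reference as \1 rather than $1, and "\\" in the replacement produces a single literal backslash. The second version builds the identical pattern without a find and replace step by escaping each character and appending "\-*".

```python
import re

seq = "AATTTCCGGCA"

# Mirror of the regexr find/replace: find "(\w)", replace with "$1\-*".
# In re.sub the group is referenced as \1, and "\\" yields one backslash.
pattern = re.sub(r"(\w)", r"\1\\-*", seq)
print(pattern)  # A\-*A\-*T\-*T\-*T\-*C\-*C\-*G\-*G\-*C\-*A\-*

# Equivalent construction: escape each residue (a no-op for letters) and
# allow any run of gap characters after it.
pattern2 = "".join(re.escape(base) + r"\-*" for base in seq)
print(pattern == pattern2)  # True
```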

Now the Vienna-format output from Clustal Omega can be pasted into
Notepad 2. I'll paste
"AATTTCC--------GGCA"
I then press Ctrl+H to bring up the find and replace dialog, check the
"Regular expression search" box to turn it on, and enter the regular
expression we created
"A\-*A\-*T\-*T\-*T\-*C\-*C\-*G\-*G\-*C\-*A\-*"
and click Find Next. Notepad 2 highlights the matching text, so I can see
where the sequence I was interested in lies in the alignment output.
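
The Notepad 2 search can also be scripted. The sketch below uses Python's standard re module and assumes the aligned sequence has already been copied out as a single Vienna-format line; the variable names are illustrative only. It shows that a plain substring search fails on the gapped text, while the gap-tolerant pattern reports where the sequence sits in the alignment.

```python
import re

pattern = r"A\-*A\-*T\-*T\-*T\-*C\-*C\-*G\-*G\-*C\-*A\-*"
aligned = "AATTTCC--------GGCA"  # one sequence line from the Vienna output

# A plain substring search fails because of the gap characters.
print("AATTTCCGGCA" in aligned)  # False

# The gap-tolerant pattern matches regardless of how many '-' appear.
match = re.search(pattern, aligned)
if match:
    # start/end are 0-based alignment columns; end is exclusive.
    print(match.start(), match.end(), match.group())  # 0 19 AATTTCC--------GGCA
```

In a real alignment line the match will usually start further in, and match.start() then gives the alignment column where the sequence of interest begins.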


===========================================================================
Email to Hojoon about an online program that can do shotgun-type alignment
7-30-13