Work 092612

2015-01-13

azim58 - Work 092612


Another thing I need to figure out is how to retrieve a sequence from a
BLAST database by its accession number from the command line.

Here is an example command that retrieves a sequence from a database by
accession number and saves it in FASTA format:
blastdbcmd -db nr.00 -entry ZP_01172638.1 -outfmt %f -out sequence.fsa
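From Java, the same command can be started with ProcessBuilder, giving each flag and value as its own list element. The sketch below is a minimal version under some assumptions. The class and method names are hypothetical and are not the Blastp_Handler code. It also assumes blastdbcmd is on the PATH and that nr.00 can be found from the working directory or through the BLASTDB environment variable.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the original notes): runs the blastdbcmd example above.
class BlastdbcmdRunner {

    // Builds the command with one list element per argument.
    static List<String> buildCommand(String db, String accession, String outFile) {
        return Arrays.asList(
                "blastdbcmd",
                "-db", db,
                "-entry", accession,
                "-outfmt", "%f",   // %f: sequence in FASTA format
                "-out", outFile);
    }

    // Runs the command, echoes its console output, and returns the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        // Read all output so the child process cannot block on a full pipe
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);
            }
        }
        return p.waitFor();
    }
}
```

Usage: `BlastdbcmdRunner.run(BlastdbcmdRunner.buildCommand("nr.00", "ZP_01172638.1", "sequence.fsa"))` should write the FASTA record to sequence.fsa. An exit code of 0 indicates success.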

Now that I have all of the capabilities I need, I can continue writing
the Java code. I think I'll start with the BLAST step rather than
BepiPred.

Created a Blastp_Handler class.

Copied code over from the glam2scan handler class and deleted much of it.
Deleted these lines:
console_output = system_command_string;
useful_tools.createTextFile(directory, glam2scan_output_filename, system_command_string);

I then added them back in and wrote the output to a console file that I
could check later.

There was a problem executing the command because the directory path
contained a space. It looks like the space has to be escaped with a
backslash:
http://forums.macrumors.com/showthread.php?t=291346

I tried escaping the space as well as putting quotes around the whole
path, but nothing seems to work. Maybe I will just move everything to a
directory without spaces.
Moved everything to this directory:
S:\Research\Cancer_Eradication\Users\kwhittem\DR\2012\9-26-12
It appears to be working now.
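The notes don't show how the handler launched BLAST, so the following is only one possible explanation for why neither escaping nor quoting helped. Suppose the whole command was passed as a single string to Runtime.exec(String). That method splits the string with a plain StringTokenizer, which breaks on every whitespace character and does not treat quotes or backslashes specially. The sketch below reproduces that splitting and compares it with a ProcessBuilder whose path is a separate argument. The class name, method names and the example blastp command are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration: why a quoted path can still be split by Runtime.exec(String).
class ExecArgs {

    // Splits a command the same way Runtime.exec(String) does:
    // new StringTokenizer(command) with default whitespace delimiters; quotes are not special.
    static List<String> tokenizeLikeRuntimeExec(String command) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(command);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Keeps a path containing spaces as one argument by passing each
    // argument separately instead of building a single command string.
    static ProcessBuilder blastpWithQuery(String queryPath) {
        return new ProcessBuilder("blastp", "-query", queryPath);
    }
}
```

With a single string, `"S:\My Dir\q.fsa"` becomes two tokens, and the quote characters are still attached to them. With ProcessBuilder, the path stays one element of `command()`. On Windows, Java generally adds the needed quotes itself when it builds the final command line.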

Blasting the most representative motif against the whole nr database now
appears to work.
Next I can check whether the BLAST result lists from the different motif
groups overlap. Proteins that appear in more than one list may be the
proteins that the antibodies binding to these random peptides originally
bound to.
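Here is a minimal sketch of the overlap check. It assumes each motif group's BLAST hits have already been reduced to a list of subject accessions, and it leaves out reading them from the result files. The class and method names are hypothetical; this is not the Blastp_Handler code. Each accession is counted once per group, and the method keeps those found in at least `minGroups` groups.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: finds subject accessions shared between motif groups.
class HitOverlap {

    // Maps each accession to the number of groups whose hit list contains it,
    // keeping only accessions that occur in at least minGroups groups.
    static Map<String, Integer> sharedHits(List<List<String>> hitsPerGroup, int minGroups) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (List<String> hits : hitsPerGroup) {
            // A HashSet ensures repeated hits within one group count only once
            for (String acc : new HashSet<>(hits)) {
                counts.merge(acc, 1, Integer::sum);
            }
        }
        counts.values().removeIf(c -> c < minGroups);
        return counts;
    }
}
```

Calling it with `minGroups = 2` gives the proteins that appear in more than one motif group's list.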

For testing purposes, I will have to create some artificial scenarios. In
my actual test data I only had two motifs, PMRE and HEE, and HEE does not
align with anything in the nr database. Therefore, for testing I will
probably compare the BLAST results for PMRE with those for PQREGS from a
search I did earlier.

Started writing the compareAllSequencesInFile1WithFile2 method in
Blastp_Handler.


===========================================================================
I found that if I blast PMRE on the BLAST website I get some SMC hits,
but I don't when I blast it locally against the nr database. I wonder
which database I should be using.

What is the most comprehensive protein BLAST database?
The BLAST website uses this database:
All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
excluding environmental samples from WGS projects
The database name was just nr, though.

I think I wasn't getting as many hits because my search was only against
nr.00 and not against the other nr volumes (up to nr.07) that can be
downloaded. Multiple databases can be specified on the command line
simply by separating them with spaces:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_node25.html
That site says that it is better to search multiple databases by using an
alias, though.
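The space-separated volume list can be generated instead of typed out by hand. The helper below is a minimal sketch with a hypothetical name. It assumes the volumes follow the nr.00 to nr.07 naming above. It also assumes a BLAST+ program that accepts several databases as one space-separated `-db` value; the linked page describes the older blastall, which used `-d`.

```java
import java.util.StringJoiner;

// Hypothetical helper: builds a space-separated list of database volume names.
class NrVolumes {

    // volumeList("nr", 7) -> "nr.00 nr.01 nr.02 nr.03 nr.04 nr.05 nr.06 nr.07"
    static String volumeList(String base, int lastVolume) {
        StringJoiner sj = new StringJoiner(" ");
        for (int i = 0; i <= lastVolume; i++) {
            sj.add(String.format("%s.%02d", base, i)); // two-digit volume suffix
        }
        return sj.toString();
    }
}
```

With ProcessBuilder, the whole list would go in as a single argument, e.g. `new ProcessBuilder("blastp", "-db", NrVolumes.volumeList("nr", 7), ...)`. For the alias approach, an alias file (.pal) made with BLAST+'s blastdb_aliastool would let one database name stand for all of the volumes instead.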

What E-value is considered a good match?
The default BLAST E-value (expect threshold) is 10.
http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=FAQexpect
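Whatever cutoff turns out to be reasonable, it can be applied either with blastp's `-evalue` option during the search or by filtering the hits afterwards. Below is a minimal sketch of the second route. It assumes the results were written in BLAST+ tabular format (`-outfmt 6`) with the default 12 columns, where the E-value is the 11th column. The class name and any cutoff you pass in are hypothetical choices, not recommendations from the linked FAQ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps hits at or below an E-value cutoff from -outfmt 6 output.
class EvalueFilter {

    // Default -outfmt 6 columns (0-based index in brackets):
    // qseqid[0] sseqid[1] pident length mismatch gapopen qstart qend sstart send evalue[10] bitscore
    static final int SSEQID_COLUMN = 1;
    static final int EVALUE_COLUMN = 10;

    // Returns the subject IDs of all hits whose E-value is <= maxEvalue.
    static List<String> subjectsBelow(List<String> tabularLines, double maxEvalue) {
        List<String> kept = new ArrayList<>();
        for (String line : tabularLines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            String[] cols = line.split("\t");
            // parseDouble handles forms such as "0.5", "12" and "1e-05"
            double evalue = Double.parseDouble(cols[EVALUE_COLUMN].trim());
            if (evalue <= maxEvalue) {
                kept.add(cols[SSEQID_COLUMN]);
            }
        }
        return kept;
    }
}
```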

How do I append a line to the end of a text file in Java without
rewriting the whole file?
This has some information:
http://stackoverflow.com/questions/4614227/how-to-add-a-new-line-of-text-to
It doesn't look totally easy, though.
Actually, maybe it's not that bad. Here's another site:
http://www.kodejava.org/examples/108.html
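In Java 7 and later, appending turns out to be short: opening the file with the APPEND option writes at the end without reading or rewriting the existing content. Below is a minimal sketch; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: appends one line to a text file.
class AppendLine {

    // CREATE makes the file if it doesn't exist; APPEND writes after the existing content.
    static void appendLine(Path file, String line) throws IOException {
        Files.write(file,
                (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

The older approach, `new FileWriter(file, true)`, has the same effect: the second constructor argument opens the file in append mode.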


Almost all of the compareAllSequencesInFile1WithFile2 method is written.
I haven't tested it yet, though, since my program is now blasting against
the whole nr database instead of just nr.00. I might want to write some
of the information to a text file by appending to the end of it. I'd like
to do that throughout the program as well, so that the program builds up
a running log as it goes. I haven't done this yet, though.