Work 092712
2015-01-13
azim58 - Work 092712

Continued working on the compareAllSequencesInFile1WithFile2 method.

Made 2 artificial comparison files:
"S:\Research\Cancer_Eradication\Users\kwhittem\DR\2012\9-26-12\test_comparison1.txt"
"S:\Research\Cancer_Eradication\Users\kwhittem\DR\2012\9-26-12\test_comparison2.txt"

Added a line from the 1st one to the 2nd one:
gb|EGH42662.1| RNA methyltransferase TrmH, group 2 [Pseudomonas ... 18.0 19006

Alright, the compareAllSequencesInFile1WithFile2 method seems to be working now. Now I just need to make sure that the blast result from each motif group gets compared with the others. Then I can see how many matches each input has.

===========================================================================

I would also like to get my HEE sequence to match something from the database. Called this command:
blastp -db S:\Research\Cancer_Eradication\Users\kwhittem\DR\2012\9-26-12_database\nr -query x -word_size 2 -seg no -evalue 200000000000000 -comp_based_stats no -matrix pam30 -threshold 4 -num_descriptions 20000 -num_alignments 0 -out test_for_hee.txt

x contains the sequence HEEX.

Searching this way did yield a list of matches, and some of them were SMC related. Searching the same way for PMRE also yielded a list of matches, some of them SMC related.

These test search results can be found here:
C:\kurt\storage\CIM Research Folder\DR\2012\9-27-12\test_blast

===========================================================================

When I search for matches lower than the evalue, I don't think numbers like 3e-2 are getting counted, so I'll need to fix this in the need_to_determine_number_of_matches section.

The match counting code appears to work. Now I can just add this match information to the table as well.

Now I would like to rank the items so that the items with the most matches and the lowest e-values are ranked the highest. Actually, I can basically do this simply by sorting the numbers in Excel.
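The 3e-2 counting problem and the ranking idea above can be sketched in Java as follows. This is only a minimal sketch under assumptions, not the actual need_to_determine_number_of_matches code: the class name EvalueMatchSketch, the Row type, and the sample values are all hypothetical. It relies on Double.parseDouble accepting scientific notation, and it assumes that a token with no mantissa (for example "e-105") should be read as 1e-105.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class EvalueMatchSketch {

    /** Parses an e-value token such as "3e-2", "18.0", or "e-105" (hypothetical inputs). */
    static double parseEvalue(String token) {
        String t = token.trim();
        // Assumption: a token that starts with "e" has no mantissa, so treat "e-105" as "1e-105".
        if (t.startsWith("e") || t.startsWith("E")) {
            t = "1" + t;
        }
        // Double.parseDouble understands scientific notation, so "3e-2" becomes 0.03.
        return Double.parseDouble(t);
    }

    /** Counts how many e-value tokens are strictly below the cutoff. */
    static int countBelow(List<String> tokens, double cutoff) {
        int count = 0;
        for (String token : tokens) {
            if (parseEvalue(token) < cutoff) {
                count++;
            }
        }
        return count;
    }

    /** Hypothetical table row: an input name, its match count, and its best e-value. */
    static class Row {
        final String name;
        final int matches;
        final double bestEvalue;

        Row(String name, int matches, double bestEvalue) {
            this.name = name;
            this.matches = matches;
            this.bestEvalue = bestEvalue;
        }
    }

    /** Returns a copy sorted by most matches first, then by lowest e-value. */
    static List<Row> rank(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingInt((Row r) -> r.matches).reversed()
                .thenComparingDouble(r -> r.bestEvalue));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("3e-2", "18.0", "1e-10", "e-105");
        System.out.println("matches below 1.0: " + countBelow(tokens, 1.0));

        List<Row> rows = Arrays.asList(
                new Row("a", 2, 1e-3), new Row("b", 5, 0.5), new Row("c", 5, 1e-8));
        for (Row r : rank(rows)) {
            System.out.println(r.name + " " + r.matches + " " + r.bestEvalue);
        }
    }
}
```

Sorting in Excel gives the same ordering; the comparator is only useful if the ranking should happen inside the program before the table is written.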
The only other features that I wanted to add to my program involve better logging and bepipred functionality. I don't think either of these things will be terribly difficult to implement.

Finished writing logging functions. Started writing the bepipred_handler class.

When I tried to use java to ssh I got the following message:
Pseudo-terminal will not be allocated because stdin is not a terminal.

ssh for java
http://stackoverflow.com/questions/2514439/how-to-run-ssh-commands-on-remote-system-through-java-program

I'm getting the following error:
cannot make a static reference to the non-static method exec(String) from the type Runtime
(exec is an instance method, so it has to be called on an instance, as in Runtime.getRuntime().exec(...), rather than as Runtime.exec(...).)

I'll forget the bepipred stuff for now.

Now I'll try to run the program from scratch from here:
S:\Research\Cancer_Eradication\Users\kwhittem\DR\2012\9-28-12\mpa

I should modify the log file for the comparison of the blast results so that it states which two are being compared out of how many. I added this feature.

For some reason the FSA files are not being created properly for some sequences. I suspect this is a sequence and regex problem.

What is the command to see how many files are in a directory?
ls -1 | wc -l

Now I'll look into the fsa file issue. Here's a file that was not found:
blast_res_motif_group_0_3i_b_blast_res_motif_group_1_15980i.txt

blast_res_motif_group_0_3i.fsa was created. The file for blast_res_motif_group_1_15980i was not made; instead this file was made:
console_blast_res_motif_group_1_15980i.fsa.txt

The text in this file shows that the blast did not work (the blast help commands are listed and everything). There is also this message:
Error: Too many positional arguments (1), the offending value: Chain

Now I need to find out what this line was in the original blast result document. I'm not sure which line in the blast result document 15980 refers to. I would think it refers to either line 15980 or line 15980 + 31 (the header part of the document) = 16011, give or take 1 for each of these possibilities.
None of these entries contain the word "Chain" though. Line 16021 does contain the word chain. Why would it be 10 off?

I see why it is off. The regex I used expects whitespace right after the "|...|", but the lines with the word chain don't have a space after the second | and so they were not added. I'll need to fix this. I think changing the regex from this
(.+?)\|(.+?)\|\s\s+?(.+?)\s\s+(.+?)\s\s+(.+)
to this
(.+?)\|(.+?)\|.*?\s\s+?(.+?)\s\s+(.+?)\s\s+(.+)
should work.

How long approximately will my program take to compare two 20,000-line blast result files?
On Saturday at about 11:39am there had been 78484 comparisons performed. The program started on Friday at 11:44am, so let's say that 24 hours passed.
20000*20000 = 400,000,000 comparisons need to be made. How many hours will this take?
24/78484 = y/400000000
This will take about 122,318 hours, which is 5097 days or about 14 years. A little long haha.

Okay, now I can take a look at getting ssh and bepipred working.

ssh with java

===========================================================================

Actually spent time trying to get the blast to work. Wanted to blast an accession against 20,000 accessions, but this doesn't seem to be working. It looks like I may need to retrieve the sequences. When I blasted one retrieved sequence against the other approximately 20,000 retrieved sequences, all in one FASTA file, this gave me the result I wanted.

It looks like the program will take approximately 5 days to compare all of the sequences one at a time in file 1 with all of the sequences at once in file 2. I think this is fairly reasonable. Much better than the 11-16 year time period it was going to take before.

I would like to start cleaning up my code a little bit. I have 3 versions of the comparison file method, but I think I can get rid of all but 1 and store the others somewhere else. I would also like to make it so that certain files are not created unless they need to be. I think I'll start on this another time.
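The regex change noted earlier in this entry can be checked with a small Java test before rerunning the whole program. This is only a sketch: the two patterns are copied from the notes above, but the class name BlastLineRegexSketch and both sample lines are hypothetical, and the samples assume that the columns in the blast description lines are separated by at least two spaces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlastLineRegexSketch {

    // Original pattern: requires whitespace immediately after the second '|'.
    static final Pattern OLD_PATTERN = Pattern.compile(
            "(.+?)\\|(.+?)\\|\\s\\s+?(.+?)\\s\\s+(.+?)\\s\\s+(.+)");

    // Proposed pattern: allows extra characters (such as a chain letter) after the second '|'.
    static final Pattern NEW_PATTERN = Pattern.compile(
            "(.+?)\\|(.+?)\\|.*?\\s\\s+?(.+?)\\s\\s+(.+?)\\s\\s+(.+)");

    /** Returns the five captured groups, or null when the whole line does not match. */
    static String[] parse(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)};
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; columns are assumed to be separated by two spaces.
        String plainLine = "gb|EGH42662.1|  RNA methyltransferase TrmH  18.0  19006";
        String chainLine = "pdb|1XYZ|A  Chain A, Hypothetical Protein  18.0  19006";

        System.out.println("old, plain line matches: " + (parse(OLD_PATTERN, plainLine) != null));
        System.out.println("old, chain line matches: " + (parse(OLD_PATTERN, chainLine) != null));
        System.out.println("new, plain line matches: " + (parse(NEW_PATTERN, plainLine) != null));
        System.out.println("new, chain line matches: " + (parse(NEW_PATTERN, chainLine) != null));
    }
}
```

If the real result file separates columns differently from these samples, the sample lines should be replaced with lines copied from the blast result document, such as the one at line 16021.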