work to finish integrating my fast code with the automation code to analyze many samples 10-9-13
2014-08-29++ work to finish integrating my fast code with the automation code to analyze many samples 10-9-13
^Notes
10-7-13
I need a faster way to extract data from large files.
I'll try using the java library opencsv.
Here's an initial test file:
"F:\kurt\storage\CIM Research Folder\DR\2013\10-7-13\csv testing\summary_numbers_1005131627.csv"
Here's a nice page describing how to use opencsv
http://viralpatel.net/blogs/java-read-write-csv-file/
- Here's another useful page
- -http://www.simplecodestuffs.com/read-write-csv-file-in-java-using-opencsv-library/
Here is a tab delimited text file for testing.
"F:\kurt\storage\CIM Research Folder\DR\2013\10-7-13\csv testing\summary_numbers_100513d1627.txt"
If I read all the values line by line, how long does it take to parse a whole 330k csv file?
- test 330k file
- It takes 4 seconds for opencsv to go through and read all of the lines of a 330k file.
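Here's a rough sketch of this kind of timing test. It is not the actual test code: it uses only the Java standard library, so a plain split stands in for opencsv and quoted fields that contain the delimiter are not handled. The names CsvReadTiming and countFields are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.regex.Pattern;

// Hypothetical helper: a stand-in for the opencsv read loop, standard library only.
class CsvReadTiming {
    // Reads every line, splits it on the delimiter, and returns the total number of fields seen.
    static long countFields(Reader source, String delimiter) {
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        long fields = 0;
        Iterator<String> lines = new BufferedReader(source).lines().iterator();
        while (lines.hasNext()) {
            // Limit -1 keeps trailing empty fields so every column is counted.
            fields += splitter.split(lines.next(), -1).length;
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        // With no argument, fall back to a tiny in-memory sample so the sketch still runs.
        Reader source = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0]))
                : new StringReader("a,b\n1,2\n3,4\n");
        long start = System.nanoTime();
        long fields = countFields(source, ",");
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(fields + " fields read in " + elapsedMs + " ms");
    }
}
```

Pass the csv path as the first argument to time a real file.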
How long would it take to retain all the values from one column and store them in an int array?
- First I want to make a new csv with no header and just data to test this.
- -"F:\kurt\storage\CIM Research Folder\DR\2013\10-1-13\330k_gpr_test\oneu330kutest_no_header.gpr"
- -"F532 Median" is in column 8
- -This takes about 5 seconds. That's reasonable.
How long does it take to find the F532 Median column, and then store all of the data in an int array? It should be fast, I think.
- This took 6 seconds. I'll just check the mean to make sure that everything is working, and then I can integrate this opencsv code into my normal program.
- The average I got was 2511.6 (which is correct), and the run took 6 seconds.
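The find-the-column, fill-an-int-array, check-the-mean steps above could look roughly like the sketch below. It is not the actual code: it uses the standard library instead of opencsv, and it assumes the file is delimited text where some header line contains the quoted column name and every line after it is data. ColumnMean, readIntColumn, stripQuotes and mean are made-up names.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: locate a named column, collect it as ints, and average it.
class ColumnMean {
    static int[] readIntColumn(Reader source, String columnName, String delimiter) {
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        int columnIndex = -1;
        List<Integer> values = new ArrayList<>();
        Iterator<String> lines = new BufferedReader(source).lines().iterator();
        while (lines.hasNext()) {
            String[] fields = splitter.split(lines.next(), -1);
            if (columnIndex < 0) {
                // Still in the header: look for the line that names the column.
                for (int i = 0; i < fields.length; i++) {
                    if (stripQuotes(fields[i]).equals(columnName)) {
                        columnIndex = i;
                        break;
                    }
                }
            } else if (columnIndex < fields.length) {
                // Assumption: everything after the column-name line is a data row.
                String field = stripQuotes(fields[columnIndex]);
                if (!field.isEmpty()) {
                    values.add(Integer.parseInt(field));
                }
            }
        }
        if (columnIndex < 0) {
            throw new IllegalArgumentException("column not found: " + columnName);
        }
        return values.stream().mapToInt(Integer::intValue).toArray();
    }

    // Removes one pair of surrounding double quotes, if present.
    static String stripQuotes(String field) {
        String trimmed = field.trim();
        if (trimmed.length() >= 2 && trimmed.startsWith("\"") && trimmed.endsWith("\"")) {
            return trimmed.substring(1, trimmed.length() - 1);
        }
        return trimmed;
    }

    // Arithmetic mean; NaN for an empty array.
    static double mean(int[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }
}
```

Comparing mean(readIntColumn(...)) against a value worked out by hand is the same sanity check as the 2511.6 comparison above.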
The code for this fast code experimentation can be found here:
"F:\kurt\storage\CIM Research Folder\DR\2013\10-7-13\SummaryNumberCodeLean"
Now I'm going to integrate this fast code into my previous code.
1557 Okay I've finished integrating the fast code with my previous code for automating the process of looking at many gpr files. Now I would like to do some testing to make sure that everything is working.
I'll start by testing with one gpr.
file here:
F:\kurt\storage\CIM Research Folder\DR\2013\10-7-13\code test 10-7-13\one gpr
^The program finishes in 7 seconds
- Are all the values what they should be?
- -I compare them in this file
- --"F:\kurt\storage\CIM Research Folder\DR\2013\10-7-13\code test 10-7-13\one gpr\summary\match of summary numbers with known summary numbers 10-7-13d1621.xls"
- --All of the values match except for 95th percentile normalized and 5th percentile normalized. I bet I just reported the values wrong.
Now I would like to test a folder of gprs.
location of folder:
F:\kurt\storage\CIM Research Folder\DR\2013\10-7-13\code test 10-7-13\folder of gprs
I would also like to test a table for non-normalized data.
- sample name from row 2 and column 1-2. data from row 3 and column 1-2.
- "F:\kurt\storage\CIM Research Folder\DR\2013\10-7-13\code test 10-7-13\table of data non-normalized test\table_of_data.txt"
I would also like to test a table for normalized data.
- sample name from row 2 and column 5-6. data from row 3 and column 5-6
- "F:\kurt\storage\CIM Research Folder\DR\2013\10-7-13\code test 10-7-13\table of data normalized test"
- Some values don't seem right while others do.
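For both table tests, the row and column selection described above (sample names from one row, data from the rows below it, over a range of columns) could be sketched like this. It is not the program's actual code. It assumes rows and columns are 1-based, the column range is inclusive, and the data are numeric and tab-delimited. TableColumns and extract are made-up names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: pull named columns out of a tab-delimited table.
class TableColumns {
    // Returns sample name -> values for columns firstCol..lastCol (1-based, inclusive).
    // Sample names come from nameRow; data come from dataStartRow to the last row.
    static Map<String, double[]> extract(List<String> lines, int nameRow, int dataStartRow,
                                         int firstCol, int lastCol) {
        // Split every row once up front.
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split("\t", -1));
        }
        Map<String, double[]> result = new LinkedHashMap<>();
        String[] names = rows.get(nameRow - 1);
        for (int col = firstCol; col <= lastCol; col++) {
            List<Double> values = new ArrayList<>();
            for (int row = dataStartRow; row <= rows.size(); row++) {
                String[] fields = rows.get(row - 1);
                // Skip short rows and blank cells rather than failing on them.
                if (col <= fields.length && !fields[col - 1].trim().isEmpty()) {
                    values.add(Double.parseDouble(fields[col - 1].trim()));
                }
            }
            result.put(names[col - 1].trim(),
                    values.stream().mapToDouble(Double::doubleValue).toArray());
        }
        return result;
    }
}
```

With this sketch, the non-normalized test corresponds to extract(lines, 2, 3, 1, 2) and the normalized test to extract(lines, 2, 3, 5, 6).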
I'll use Amazon Web Services to get a computer powerful enough to manually calculate the numbers in Excel to double-check my program.
- instance: i-a5ad1f91
- password: hQq!8M7BNp
- ip: 54.200.144.159
- Instead I just did the calculations on my laptop.
10-8-13
12:53pm: calculations started on Samsung ATIV BOOK 6
1:31 pm: 11%
by 4:34pm the calculations were finished
As of 100813d1505 it looks like everything is now working (I'm still waiting a little bit on some normalized entropy values, but from what I have seen so far normalized entropy is the same as entropy, which makes sense). I can now start using the program to look at some data. I should also make sure to back up the program and write some documentation for it.
Tiger has been asking me about his data so I'll get that to him first.
- -location of data: biofs.biodesign.asu.edu\CIM\Research\DocInABox\People\Tiger\Entroy Study\FVBN 2013 Time series study\20130501_FVBN time series
- -S:\Research\DocInABox\People\Tiger\Entroy Study\FVBN 2013 Time series study\20130501_FVBN time series
- -location of data: biofs.biodesign.asu.edu\CIM\Research\DocInABox\People\Tiger\Entroy Study\FVBN 2013 Time series study\20130517_FVBN time series_pilot2
- -S:\Research\DocInABox\People\Tiger\Entroy Study\FVBN 2013 Time series study\20130517_FVBN time series_pilot2
Output files here
"S:\Research\DocInABox\People\Tiger\Entroy Study\FVBN 2013 Time series study\20130501_FVBN time series\summary\table_of_summary_numbers.txt"
and
"S:\Research\DocInABox\People\Tiger\Entroy Study\FVBN 2013 Time series study\20130517_FVBN time series_pilot2\summary\table_of_summary_numbers.txt"
Okay I've emailed Tiger the location of those files. Now I can look at the 330k DTRA data as I have been planning to do.
Now I'd like to make sure that my program works with the commands in a jar file, and I can then back up the program and everything.
list of summary numbers calculated by entropy program as of 10-9-13
code was set up and stored here
F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013
Okay now I'll just test it and I'll be good to go. Just need to test 4 commands.
command 1
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_one_gpr "F:\\kurt\\storage\\CIM Research Folder\\DR\\2013\\10-9-13\\code test 10-7-13\\one gpr" "oneu330kutest.gpr" "F532 Median"
^This works!
command 2
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\code test 10-7-13\folder of gprs" "F532 Median"
^This works!
command 3
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\code test 10-7-13\table of data normalized test" "table_of_data.txt" 2 1 2 1 2 3
^This works!
command 4
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_normalized_data "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\code test 10-7-13\table of data normalized test" "table_of_data.txt" 2 5 6 5 6 3
^This works!
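A quick way to catch a mistyped command before a long run is to check the mode name and argument count first. The sketch below is only an illustration, not how the jar actually parses its arguments: the mode names and argument counts are copied from the four commands above, and CommandArgs and check are made-up names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: validate the mode name and the number of arguments that follow it.
class CommandArgs {
    // Arguments expected after the mode name, counted from the four commands in these notes.
    static final Map<String, Integer> EXPECTED = new LinkedHashMap<>();
    static {
        EXPECTED.put("find_summary_numbers_one_gpr", 3);
        EXPECTED.put("find_summary_numbers_from_folder_of_gprs", 2);
        EXPECTED.put("find_summary_numbers_from_tabdelimitedtext_raw_data", 8);
        EXPECTED.put("find_summary_numbers_from_tabdelimitedtext_normalized_data", 8);
    }

    // Returns null when the arguments look fine, otherwise a short description of the problem.
    static String check(String[] args) {
        if (args.length == 0) {
            return "missing mode";
        }
        Integer expected = EXPECTED.get(args[0]);
        if (expected == null) {
            return "unknown mode: " + args[0];
        }
        int given = args.length - 1;
        if (given != expected) {
            return args[0] + " expects " + expected + " arguments but got " + given;
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = check(args);
        System.out.println(problem == null ? "arguments look OK" : problem);
    }
}
```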