work to finish integrating my fast code with the automation code to analyze many samples 10-9-13

2014-08-29

++ work to finish integrating my fast code with the automation code to analyze many samples 10-9-13

^Notes


10-7-13

I need a faster way to extract data from large files.

I'll try using the java library opencsv.

Here's an initial test file:
"F:\kurt\storage\CIM Research Folder\DR\2013\10-7-13\csv testing\summary_numbers_1005131627.csv"

Here's a nice page describing how to use opencsv
http://viralpatel.net/blogs/java-read-write-csv-file/

Here is a tab delimited text file for testing.
"F:\kurt\storage\CIM Research Folder\DR\2013\10-7-13\csv testing\summary_numbers_100513d1627.txt"

If I read all the values line by line, how long does it take to parse a whole 330k csv file?
"F:\kurt\storage\CIM Research Folder\DR\2013\10-1-13\330k_gpr_test\oneu330kutest.gpr"


How long would it take to retain all the values from one column and store them in an int array?

How long does it take to find the F532 Median column and then store all of its data in an int array? Should be fast, I think.
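The timing questions above can be sketched roughly as follows. This is only a minimal sketch and not the code in SummaryNumberCodeLean: it uses the plain Java standard library instead of opencsv, and the class name ColumnExtractor, its method names, and the ISO-8859-1 charset are all my own assumptions. It also assumes the file is tab delimited, that the header row is the first line containing the column name as a whole field (quoted or not), and that every value in that column parses as an int.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: names and structure are illustrative only.
class ColumnExtractor {

    // Removes one pair of surrounding double quotes, if present.
    static String unquote(String field) {
        String f = field.trim();
        if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
            return f.substring(1, f.length() - 1);
        }
        return f;
    }

    // Returns the index of columnName in a tab-delimited line, or -1 if absent.
    static int indexOfColumn(String line, String columnName) {
        String[] fields = line.split("\t", -1);
        for (int i = 0; i < fields.length; i++) {
            if (unquote(fields[i]).equals(columnName)) {
                return i;
            }
        }
        return -1;
    }

    // Skips lines until the header row is found, then collects the named
    // column from every following non-empty line into an int array.
    static int[] readIntColumn(Path file, String columnName) throws IOException {
        int[] values = new int[1024];
        int count = 0;
        int column = -1;
        // Assumption: ISO-8859-1, so unexpected bytes never abort the read.
        try (BufferedReader reader =
                 Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (column < 0) {
                    // Still in the preamble: check whether this is the header row.
                    column = indexOfColumn(line, columnName);
                    continue;
                }
                if (line.isEmpty()) {
                    continue;
                }
                String[] fields = line.split("\t", -1);
                if (column >= fields.length) {
                    continue; // short line, no value for this column
                }
                if (count == values.length) {
                    values = Arrays.copyOf(values, count * 2); // grow the buffer
                }
                // Throws NumberFormatException if the value is not an int.
                values[count++] = Integer.parseInt(unquote(fields[column]));
            }
        }
        if (column < 0) {
            throw new IllegalArgumentException("Column not found: " + columnName);
        }
        return Arrays.copyOf(values, count);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ColumnExtractor <file> <column name>");
            return;
        }
        long start = System.nanoTime();
        int[] column = readIntColumn(Paths.get(args[0]), args[1]);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(column.length + " values read in " + elapsedMs + " ms");
    }
}
```

Running main with a gpr path and "F532 Median" prints how many values were stored and the elapsed wall-clock time, which is the number I'm after here. A single run includes JVM warm-up, so it is only a rough timing.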


The code for this fast code experimentation can be found here:
"F:\kurt\storage\CIM Research Folder\DR\2013\10-7-13\SummaryNumberCodeLean"

Now I'm going to integrate this fast code into my previous code.



1557: Okay, I've finished integrating the fast code with my previous code for automating the process of looking at many gpr files. Now I would like to do some testing to make sure that everything is working.

I'll start by testing with one gpr.
File here:
F:\kurt\storage\CIM Research Folder\DR\2013\10-7-13\code test 10-7-13\one gpr
^The program finishes in 7 seconds
^Okay, now all of the values match.

Now I would like to test a folder of gprs.
Location of folder:
F:\kurt\storage\CIM Research Folder\DR\2013\10-7-13\code test 10-7-13\folder of gprs
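For the folder test, the first step is just collecting every gpr file in the folder. A minimal sketch of that step is below, assuming the files all end in ".gpr" and sit directly in the folder (no subfolders). The class name GprFolderLister and its method are hypothetical and not taken from my program.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: names are illustrative only.
class GprFolderLister {

    // Returns the regular files in the folder whose names match *.gpr, sorted by path.
    static List<Path> listGprFiles(Path folder) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder, "*.gpr")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files); // directory order is unspecified, so sort for stable output
        return files;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: GprFolderLister <folder>");
            return;
        }
        for (Path p : listGprFiles(Paths.get(args[0]))) {
            System.out.println(p.getFileName());
        }
    }
}
```

Each returned path could then be handed to whatever routine analyzes a single gpr, so the folder test reuses the one-gpr code path that was already checked.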


I would also like to test a table for non-normalized data.


I would also like to test a table for normalized data.


I'll use an Amazon web server to get a computer powerful enough to manually calculate the numbers in Excel to double-check my program.
^This was still slow, so I just quit the job.

10-8-13
12:53 pm: calculations started on Samsung ATIV BOOK 6
1:31 pm: 11%
By 4:34 pm the calculations were finished.


As of 100813d1505 it looks like everything is now working. (I'm still waiting on some normalized entropy values, but from what I have seen so far, normalized entropy is the same as entropy, which makes sense.) I can now start using the program to look at some data. I should also make sure to back up the program and write some documentation for it.
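One way to see why the two entropy values could match is sketched below. This rests on two assumptions that these notes do not state: that the entropy is the Shannon entropy of the intensities treated as weights p_i = x_i / sum(x), and that normalization divides every value by the same positive constant. Under those assumptions the constant cancels in p_i, so the entropy cannot change. The class name EntropyScaleCheck and the sample numbers are made up for illustration.

```java
// Hypothetical check: not the entropy routine from the program itself.
class EntropyScaleCheck {

    // Shannon entropy in bits of non-negative values treated as weights
    // p_i = x_i / sum(x). Assumes at least one value is positive.
    static double shannonEntropy(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double entropy = 0.0;
        for (double v : values) {
            if (v > 0.0) { // a zero weight contributes nothing
                double p = v / sum;
                entropy -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return entropy;
    }

    // Divides every value by the same positive constant.
    static double[] scale(double[] values, double divisor) {
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            scaled[i] = values[i] / divisor;
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] raw = {100, 250, 400, 250}; // made-up intensities
        double[] normalized = scale(raw, 250.0);
        System.out.println("raw:        " + shannonEntropy(raw));
        System.out.println("normalized: " + shannonEntropy(normalized));
    }
}
```

The two printed values should agree up to floating-point rounding. If the normalization were not a single constant divisor (for example a per-spot correction), this argument would not apply.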


Tiger has been asking me about his data, so I'll get that to him first.


and here

Output files here
"S:\Research\DocInABox\People\Tiger\Entroy Study\FVBN 2013 Time series study\20130501_FVBN time series\summary\table_of_summary_numbers.txt"
and
"S:\Research\DocInABox\People\Tiger\Entroy Study\FVBN 2013 Time series study\20130517_FVBN time series_pilot2\summary\table_of_summary_numbers.txt"


Okay, I've emailed Tiger the location of those files. Now I can look at the 330k DTRA data as I have been planning to do.


Now I'd like to make sure that my program works with the commands in a jar file, and I can then back up the program and everything.

list of summary numbers calculated by entropy program as of 10-9-13

Code was set up and stored here:
F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013

Okay, now I'll just test it and I'll be good to go. Just need to test 4 commands.

command 1
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_one_gpr "F:\\kurt\\storage\\CIM Research Folder\\DR\\2013\\10-9-13\\code test 10-7-13\\one gpr" "oneu330kutest.gpr" "F532 Median"
^This works!

command 2
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\code test 10-7-13\folder of gprs" "F532 Median"
^This works!

command 3
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\code test 10-7-13\table of data normalized test" "table_of_data.txt" 2 1 2 1 2 3
^This works!

command 4
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_normalized_data "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\code test 10-7-13\table of data normalized test" "table_of_data.txt" 2 5 6 5 6 3
^This works!