work to make really lean fast code 10-7-13
2014-08-29
++ work to make really lean fast code 10-7-13
10k entropy calculations 10-3-13
{
Okay, I think I can start getting all of my 10k data calculated. The first data set I will start on is the FVBN_Time_Series data, which is in the format of a folder of gprs. All of the data uses the F647 median.
/home/lwang138/entropy/FVBN_Time_Series_1_9-26-13_1645 and /home/lwang138/entropy/FVBN_Time_Series_2_9-26-13_1645
I already started preparing this previously. The jar file used will be EntropyOfArray100113d1557.jar. I will not use the Xmx flag to request 6GB of RAM for these 10k files, and I will also not specify westmereEP as the node. Now I'll start the job.
job for /home/lwang138/entropy/FVBN_Time_Series_1_9-26-13_1645: 6205848.newmoab.local, started at 100313d1851
^expected time: 5 min * 24 = 2 hr. However, the job has now been running for 14 hr 55 min. I thought it was supposed to complete much faster than that.
Now I'll get the other time series going.
job for /home/lwang138/entropy/FVBN_Time_Series_2_9-26-13_1645: 6205854.newmoab, started at 100313d1700
^I added the Westmere line back to the pbs file: #PBS -l nodes=1:westmereEP
^I ran out of memory, so I guess I should have reserved more.
I'd like to test one single 10k gpr again.
/home/lwang138/entropy/one_10k_test_100413d0917
393308_Bot_WT2-3_37.4_20130501_SLOT10_S01_Red
-script files here
--/home/lwang138/kurt/one_10k_sample_100413d0919
job id 6205979.newmoab.local, started at 100413d0929
^Let's see how long this one 10k file takes.
^I can't get this started because I can't reserve enough memory.
.
I'd also like to see how long one 330k file takes.
6205980.newmoab.local
}
I want to see if I can analyze the data in Excel on Amazon Web Services. Made a new instance:
high_memory_2_100413d1510
i-a5ad1f91
keypair: azim58
keypair password: hQq!8M7BNp
Public IP: 54.200.40.69
Username: Administrator
Many of the jobs I had running on saguaro were ended by the computer administrator since my jobs were taking way too much RAM. Here's the email about this issue.
https://mail.google.com/mail/u/0/?ui=2&shva=1#label/Computer+Stuff/1418549880bdf5c6
test_10-1-13.pem keypair file
"F:\kurt\storage\CIM Research Folder\DR\2013\10-4-13\test_10-1-13.pem"
I think I'll let AWS keep working on the files until tomorrow morning. If nothing is done by then, I will stop the instance.
^Actually, I think the program finished (100413d2008), but for some reason I do not see an outputsummary file. That's strange. Well, I guess I can collect the files and stop the program now.
^Or maybe the program did not finish, and PowerShell just lets me close the program before the last one is done. The processes don't seem to be in use.
.
.
I'll go ahead and end the programs and collect my files since this is costing a lot of money.
Parallelizing a for loop: http://stackoverflow.com/questions/5686200/parallelizing-a-for-loop
Current AWS charges in this little experimental test as of 100413d2036 (everything is stopped now, so hopefully there aren't really any more charges): $215.83 for 53 hr
^It shouldn't increase too much more than that, I don't think.
100513d0922 Now the charges are $256.56 for 63 hr. Okay, now I don't think they should increase anymore.
I want to write some new code for calculating these summary numbers that is extremely lean and fast.
--------
how to specify multiple file types in windows search:
cloud storage AND (type:txt OR type:html OR type:wiki)
--------
My new code now correctly calculates the entropy in less than one second.
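Here is a minimal sketch of the kind of lean entropy calculation I have in mind, assuming Shannon entropy over fixed-width histogram bins of the F647 median intensities. The bin count (100) and the toy data are placeholders, not the exact scheme used inside EntropyOfArray100113d1557.jar; parsing the values out of the gpr file is left out.

// Sketch: Shannon entropy of an intensity array via a fixed-width histogram.
// Plain double[] and two passes over the data, no ArrayLists.
public class LeanEntropy {

    public static double entropy(double[] values, int numBins) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : values) {            // first pass: find the range
            if (v < min) min = v;
            if (v > max) max = v;
        }
        double width = (max - min) / numBins;
        if (width == 0) return 0.0;          // all values identical -> zero entropy

        int[] counts = new int[numBins];
        for (double v : values) {            // second pass: fill the histogram
            int bin = (int) ((v - min) / width);
            if (bin == numBins) bin--;       // put the maximum value in the last bin
            counts[bin]++;
        }

        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / values.length;
                h -= p * (Math.log(p) / Math.log(2));   // entropy in bits
            }
        }
        return h;
    }

    public static void main(String[] args) {
        double[] f647Median = {512.0, 530.5, 498.2, 10234.7, 955.1};  // toy data
        System.out.println("entropy = " + entropy(f647Median, 100));
    }
}

Both passes are simple array loops, so even a 330k-feature gpr should take well under a second once the values are in memory; the slow part is reading the file itself.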
Now I would like to calculate the other summary numbers as well. What are the summary numbers that I want? Some of the ones I found before are identical whether or not the data is normalized, and I also think it is unnecessary to find the mode (calculating it also seems more time-consuming since I was doing it with ArrayLists). Here's what I want to find: entropy, normalized_entropy, entropy_normalized_data, normalized_entropy_normalized_data, cv, stdev, stdev_normalized, mean, mean_normalized, median, min, min_normalized, max, max_normalized, kurtosis, skew, 95th_percentile, 95th_percentile_normalized, 5th_percentile, 5th_percentile_normalized, dynamic_range
^The code can get all of these numbers in less than 3 seconds.
How long does it take for the entropy program to extract the values from a 330k gpr file? A very long time with the current code.
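A sketch of the lean summary-number pass, under my own assumptions rather than the original jar's exact formulas: one loop over the raw array for the moment-based numbers (mean, stdev, cv, skew, excess kurtosis), one sort for median, percentiles, min, max, and dynamic range (taken here as max/min). Percentiles use a simple nearest-rank rule, and the normalized variants are omitted since they would just repeat the same pass on the normalized array.

import java.util.Arrays;

// Sketch: all non-entropy summary numbers from one loop plus one sort.
public class SummaryNumbers {

    public static void summarize(double[] values) {
        int n = values.length;
        double sum = 0, sumSq = 0, sumCu = 0, sumQu = 0;
        for (double v : values) {                       // single pass: power sums
            sum += v; sumSq += v * v; sumCu += v * v * v; sumQu += v * v * v * v;
        }
        double mean = sum / n;
        double var = sumSq / n - mean * mean;           // population variance
        double stdev = Math.sqrt(var);
        double cv = stdev / mean;

        // central third and fourth moments from the power sums
        double m3 = sumCu / n - 3 * mean * sumSq / n + 2 * mean * mean * mean;
        double m4 = sumQu / n - 4 * mean * sumCu / n + 6 * mean * mean * sumSq / n
                    - 3 * mean * mean * mean * mean;
        double skew = m3 / (stdev * stdev * stdev);
        double kurtosis = m4 / (var * var) - 3;         // excess kurtosis

        double[] sorted = values.clone();
        Arrays.sort(sorted);                             // one sort for the order statistics
        double median = sorted[n / 2];
        double p5  = sorted[(int) Math.floor(0.05 * (n - 1))];
        double p95 = sorted[(int) Math.floor(0.95 * (n - 1))];
        double min = sorted[0], max = sorted[n - 1];
        double dynamicRange = max / min;                 // assumed definition

        System.out.printf("mean=%.3f stdev=%.3f cv=%.3f median=%.3f%n", mean, stdev, cv, median);
        System.out.printf("min=%.3f max=%.3f p5=%.3f p95=%.3f dynRange=%.3f%n", min, max, p5, p95, dynamicRange);
        System.out.printf("skew=%.3f kurtosis=%.3f%n", skew, kurtosis);
    }

    public static void main(String[] args) {
        summarize(new double[] {512.0, 530.5, 498.2, 10234.7, 955.1});  // toy data
    }
}

The one-pass power sums trade a little numerical stability for speed; if that ever matters for large intensity values, a two-pass (subtract-the-mean) moment calculation is the safer choice. Either way, the sort dominates, and sorting 330k doubles is far under the 3-second budget.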