work to make really lean fast code 10-7-13

2014-08-29

++ work to make really lean fast code 10-7-13

10k entropy calculations 10-3-13

Okay I think I can start to get all of my 10k data calculated.

The first thing I will start up is the FVBN_Time_Series data. This is a folder of GPR files; all of the data uses the F647 median values.

/home/lwang138/entropy/FVBN_Time_Series_1_9-26-13_1645
and
/home/lwang138/entropy/FVBN_Time_Series_2_9-26-13_1645

I already started getting this prepared previously.

The name of the jar file used will be
EntropyOfArray100113d1557.jar
I will not use the -Xmx flag to specify 6 GB of RAM for these 10k files.

I will also not specify westmereEP as the node.

Now I'll start the job.
job for /home/lwang138/entropy/FVBN_Time_Series_1_9-26-13_1645
6205848.newmoab.local
started at 100313d1851
^Expected time: 5 min * 24 files = 2 hr. However, the job has now been running for 14 hr 55 min. I thought it was supposed to complete much faster than that.


Now I'll get the other time series going
job for /home/lwang138/entropy/FVBN_Time_Series_2_9-26-13_1645
6205854.newmoab
started at 100313d1700
^I added the westmereEP line back to the PBS file:
#PBS -l nodes=1:westmereEP
^I ran out of memory, so I guess I should have reserved more.


I'd like to test a single 10k GPR file again.
/home/lwang138/entropy/one_10k_test_100413d0917
393308_Bot_WT2-3_37.4_20130501_SLOT10_S01_Red
-script files here
--/home/lwang138/kurt/one_10k_sample_100413d0919
job id
6205979.newmoab.local
started at 100413d0929
^Let's see how long this one 10k file takes.
^I can't get this started because I can't reserve enough memory.


I'd also like to see how long one 330k file takes.
6205980.newmoab.local



I want to see if I can analyze the data in Excel on Amazon Web Services.
Made a new instance
high_memory_2_100413d1510
i-a5ad1f91
keypair: azim58 keypair
password: hQq!8M7BNp
Public IP: 54.200.40.69
Username: Administrator


Many of the jobs I had running on Saguaro were killed by the system administrator because they were using far too much RAM.
Here's the email about this issue.
https://mail.google.com/mail/u/0/?ui=2&shva=1#label/Computer+Stuff/1418549880bdf5c6


test_10-1-13.pem keypair file
"F:\kurt\storage\CIM Research Folder\DR\2013\10-4-13\test_10-1-13.pem"


I think I'll let AWS keep working on the files until tomorrow morning. If nothing is done by then, I will stop the instance.
^Actually, I think the program finished (100413d2008), but for some reason I do not see an outputsummary file. That's strange. Well, I guess I can collect the files and stop the program now.
^Or maybe the program did not finish, and PowerShell just lets me close the program before the last one is done. The processes don't seem to be in use... I'll go ahead and end the programs and collect my files, since this is costing a lot of money.

Parallelizing a for loop
http://stackoverflow.com/questions/5686200/parallelizing-a-for-loop
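
Since the summary numbers for each GPR file are independent of one another, the per-file loop is a natural place to apply this. Below is a minimal sketch in Java using an ExecutorService; the file list and the summarizeFile method are hypothetical placeholders, not the actual program's names.

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelLoopSketch {
    public static void main(String[] args) throws InterruptedException {
        // Hypothetical list of GPR files to process.
        List<String> gprFiles = Arrays.asList("file1.gpr", "file2.gpr", "file3.gpr");

        // One worker thread per available core.
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());

        for (final String file : gprFiles) {
            // Each iteration of the original for loop becomes an independent task.
            pool.submit(() -> summarizeFile(file));
        }

        pool.shutdown();                          // stop accepting new tasks
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for all tasks to finish
    }

    // Placeholder for the per-file summary-number calculation.
    static void summarizeFile(String path) {
        System.out.println("processed " + path);
    }
}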

Current AWS charges for this little experimental test as of 100413d2036 (everything is stopped now, so hopefully there aren't really any more charges):
$215.83
for 53 hr
^It shouldn't increase too much more than that, I don't think.
100513d0922 Now the charges are $256.56 for 63 hr. Okay, now I don't think they should increase any more.

I want to write some new code for calculating these summary numbers that is extremely lean and fast.


how to specify multiple file types in windows search
cloud storage AND (type:txt OR type:html OR type:wiki)


My new code now correctly calculates the entropy in less than one second.
Now I would like to calculate the other summary numbers as well. What are the summary numbers that I want? Some of the ones I found before are identical whether or not the data is normalized. I also think it is unnecessary to find the mode (it also seems more time-consuming to calculate, since I was calculating it with ArrayLists). Here's what I want to find:
entropy, normalized_entropy, entropy_normalized_data, normalized_entropy_normalized_data, cv, stdev, stdev_normalized, mean, mean_normalized, median, min, min_normalized, max, max_normalized, kurtosis, skew, 95th_percentile, 95th_percentile_normalized, 5th_percentile, 5th_percentile_normalized, dynamic_range
^The code can get all of these numbers in less than 3 seconds.
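
For reference, here is a minimal sketch in Java of the kind of lean calculation I mean: sort the array once, pull the order statistics straight from the sorted copy, and bin the values into a histogram for the entropy. The 256-bin histogram, the sample values, and the simple median/percentile indexing are assumptions for illustration, not necessarily what the jar does.

import java.util.Arrays;

public class SummarySketch {
    // Shannon entropy (in bits) of the values binned into a fixed number of bins.
    // The binning scheme is an assumption for illustration.
    static double entropy(double[] values, int bins) {
        double min = values[0], max = values[0];
        for (double v : values) { min = Math.min(min, v); max = Math.max(max, v); }
        double width = (max - min) / bins;
        if (width == 0) return 0.0;               // all values identical
        int[] counts = new int[bins];
        for (double v : values) {
            int b = (int) ((v - min) / width);
            if (b == bins) b = bins - 1;          // put the maximum value in the last bin
            counts[b]++;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / values.length;
            h -= p * (Math.log(p) / Math.log(2.0));
        }
        return h;
    }

    public static void main(String[] args) {
        double[] values = {120.0, 85.0, 97.0, 4300.0, 110.0, 95.0}; // hypothetical F647 median values
        double[] sorted = values.clone();
        Arrays.sort(sorted);                       // one sort gives median, min, max, and percentiles

        double sum = 0.0;
        for (double v : values) sum += v;
        double mean = sum / values.length;

        double sumSq = 0.0;
        for (double v : values) sumSq += (v - mean) * (v - mean);
        double stdev = Math.sqrt(sumSq / (values.length - 1));

        int n = sorted.length;
        double median = sorted[n / 2];                          // simple midpoint, no interpolation
        double p95 = sorted[(int) Math.floor(0.95 * (n - 1))];
        double p05 = sorted[(int) Math.floor(0.05 * (n - 1))];

        System.out.println("entropy = " + entropy(values, 256));
        System.out.println("mean = " + mean + ", stdev = " + stdev + ", cv = " + (stdev / mean));
        System.out.println("median = " + median + ", 95th = " + p95 + ", 5th = " + p05);
        System.out.println("min = " + sorted[0] + ", max = " + sorted[n - 1]);
    }
}

The normalized versions of each number would come from running the same routines on a rescaled copy of the array.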


How long does it take for the entropy program to extract the values from a 330k GPR file?
A very long time with the current code.
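
For comparison, a single pass over the file with a BufferedReader should handle even a 330k-row GPR file in a few seconds. Here is a minimal sketch; the column name "F647 Median" and the assumption that the column-header row is the first line containing that name are guesses about the file layout, not verified against my actual gprs.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class GprColumnReader {
    // Reads every value from the named column of a tab-delimited GPR file.
    // Lines above the column-header row (the GPR header records) are skipped.
    static double[] readColumn(String path, String columnName) throws IOException {
        List<Double> values = new ArrayList<Double>();
        BufferedReader in = new BufferedReader(new FileReader(path));
        try {
            String line;
            int col = -1;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split("\t");
                if (col < 0) {
                    // Still looking for the column-header row.
                    for (int i = 0; i < fields.length; i++) {
                        if (fields[i].replace("\"", "").equalsIgnoreCase(columnName)) {
                            col = i;
                            break;
                        }
                    }
                } else if (col < fields.length) {
                    try {
                        values.add(Double.parseDouble(fields[col]));
                    } catch (NumberFormatException e) {
                        // skip rows where the field is not numeric
                    }
                }
            }
        } finally {
            in.close();
        }
        double[] out = new double[values.size()];
        for (int i = 0; i < out.length; i++) out[i] = values.get(i);
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name.
        double[] medians = readColumn("example.gpr", "F647 Median");
        System.out.println("read " + medians.length + " values");
    }
}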