++ work to make really lean fast code
10-7-13

10k entropy calculations
10-3-13
{
Okay I think I can start to get all of my 10k data calculated. The first thing I will start up is the FVBN_Time_Series data. This is in the format of a folder of gprs. All of the data uses F647 median.
/home/lwang138/entropy/FVBN_Time_Series_1_9-26-13_1645
and
/home/lwang138/entropy/FVBN_Time_Series_2_9-26-13_1645
I already started getting this prepared previously.

The name of the jar file used will be
EntropyOfArray100113d1557.jar
I will not use the Xmx command to specify 6GB of RAM for these 10k files. I will also not specify westmereEP as the node.

Now I'll start the job.
job for /home/lwang138/entropy/FVBN_Time_Series_1_9-26-13_1645
6205848.newmoab.local
started at 100313d1851
^expected time: 5min*24=2hr. However, now the job has been going for 14hr 55 min. I thought it was supposed to complete much faster than that.

Now I'll get the other time series going
job for /home/lwang138/entropy/FVBN_Time_Series_2_9-26-13_1645
6205854.newmoab
started at 100313d1700
^I added the Westmere line to the pbs file back
#PBS -l nodes=1:westmereEP
^I ran out of memory so I guess I should have reserved more.

I'd like to test one single 10k gpr again.
/home/lwang138/entropy/one_10k_test_100413d0917
393308_Bot_WT2-3_37.4_20130501_SLOT10_S01_Red
-script files here
--/home/lwang138/kurt/one_10k_sample_100413d0919
job id
6205979.newmoab.local
started at 100413d0929
^Let's see how long this one 10k file takes
^I can't get this started because I can't reserve enough memory.
.
I'd also like to see how long one 330k file takes.
6205980.newmoab.local
}

I want to see if I can analyze the data in excel in amazon web services.
Made a new instance
high_memory_2_100413d1510
i-a5ad1f91
keypair: azim58
keypair password: hQq!8M7BNp
Public IP: 54.200.40.69
Username: Administrator

Many of the jobs I had running on saguaro were ended by the computer administrator since my jobs were taking way too much RAM.
Here's the email about this issue.
https://mail.google.com/mail/u/0/?ui=2&shva=1#label/Computer+Stuff/1418549880bdf5c6

test_10-1-13.pem keypair file
"F:\kurt\storage\CIM Research Folder\DR\2013\10-4-13\test_10-1-13.pem"

I think I'll let AWS keep working on the files until tomorrow morning. If nothing is done by then, I will stop the instance.
^Actually I think the program finished (100413d2008), but for some reason I do not see an outputsummary file. That's strange. Well, I guess I can collect the files and stop the program now.
^Or maybe the program did not finish, and "PowerShell" just lets me close the program before the last one is done. The processes don't seem to be in use.
.
.
I'll go ahead and end the programs and collect my files since this is costing a lot of money.

Parallelizing a for loop
http://stackoverflow.com/questions/5686200/parallelizing-a-for-loop

Current aws charges in this little experimental test as of 100413d2036 (everything is stopped now so hopefully there aren't really any more charges).
$215.83 for 53hr
^I don't think it should increase much more than that.
100513d0922 Now the charges are $256.56 for 63 hr. Okay, now I don't think it should increase any more.

I want to write some new code for calculating these summary numbers that is extremely lean and fast.

--------
how to specify multiple file types in windows search
cloud storage AND (type:txt OR type:html OR type:wiki)
--------

My new code now correctly calculates the entropy in less than one second. Now I would like to calculate the other summary numbers as well. What are the summary numbers that I want? Some of the ones I found before are identical whether the data is normalized or not. I also think it is unnecessary to find the mode (it also seems more time consuming to calculate, since I was calculating it with arraylists).
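For reference, a lean entropy calculation of the kind mentioned above could be sketched roughly as follows. This is not the actual code from the jar. It assumes Shannon entropy in bits over a fixed-width histogram of the intensities, and it assumes normalized entropy means entropy divided by log2 of the number of bins. The class name LeanEntropy, the method names, and the binning scheme are all hypothetical. It sticks to primitive arrays rather than arraylists to keep memory use and run time down.

```java
// Minimal sketch only: the binning scheme and all names here are assumptions,
// not taken from the original EntropyOfArray code.
final class LeanEntropy {

    /** Bins values into `bins` equal-width buckets between the observed min and max. */
    static long[] histogram(double[] values, int bins) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        long[] counts = new long[bins];
        double width = (max - min) / bins;
        for (double v : values) {
            // If all values are equal, width is 0 and everything goes in bin 0.
            int idx = width > 0 ? (int) ((v - min) / width) : 0;
            if (idx >= bins) idx = bins - 1; // the maximum value lands in the last bin
            counts[idx]++;
        }
        return counts;
    }

    /** Shannon entropy in bits of a histogram of counts. */
    static double entropyOfCounts(long[] counts) {
        long total = 0;
        for (long c : counts) total += c;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (long c : counts) {
            if (c == 0) continue; // empty bins contribute nothing
            double p = (double) c / total;
            h -= p * Math.log(p);
        }
        return h / Math.log(2.0); // convert from nats to bits
    }

    /** Assumed definition: entropy divided by the maximum possible, log2(number of bins). */
    static double normalizedEntropy(long[] counts) {
        if (counts.length < 2) return 0.0;
        return entropyOfCounts(counts) / (Math.log(counts.length) / Math.log(2.0));
    }

    public static void main(String[] args) {
        double[] intensities = {0, 1, 2, 3}; // made-up values, not real array data
        long[] counts = histogram(intensities, 4);
        System.out.println("entropy (bits): " + entropyOfCounts(counts));
        System.out.println("normalized entropy: " + normalizedEntropy(counts));
    }
}
```

Everything here is one or two passes over a primitive array, so run time grows linearly with the number of features, whether the array is 10k or 330k.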
Here's what I want to find:
entropy, normalized_entropy, entropy_normalized_data, normalized_entropy_normalized_data, cv, stdev, stdev_normalized, mean, mean_normalized, median, min, min_normalized, max, max_normalized, kurtosis, skew, 95th_percentile, 95th_percentile_normalized, 5th_percentile, 5th_percentile_normalized, dynamic_range
^The code can get all of these numbers in less than 3 seconds.

How long does it take for the entropy program to extract the values from a 330k gpr file? A very long time with the current code.
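As a rough illustration of how most of the non-entropy numbers in the list could be computed with one sort and two passes over a primitive array, here is a minimal sketch. It is not the actual code. The class name LeanSummary is hypothetical, and the definitions are assumptions: population (divide by n) standard deviation, moment-based skew, excess kurtosis, and nearest-rank percentiles. The normalized variants and dynamic_range are left out because these notes do not say how they are defined.

```java
// Minimal sketch only: names and statistical conventions are assumptions.
final class LeanSummary {
    final double mean, stdev, cv, min, max, median, p5, p95, skew, kurtosis;

    LeanSummary(double[] data) {
        if (data.length == 0) throw new IllegalArgumentException("no data");
        double[] x = data.clone();   // leave the caller's array untouched
        java.util.Arrays.sort(x);    // one sort serves min, max, median, percentiles
        int n = x.length;

        double sum = 0.0;
        for (double v : x) sum += v;
        mean = sum / n;

        // Central moments (population form, divided by n).
        double m2 = 0.0, m3 = 0.0, m4 = 0.0;
        for (double v : x) {
            double d = v - mean;
            double d2 = d * d;
            m2 += d2;
            m3 += d2 * d;
            m4 += d2 * d2;
        }
        m2 /= n;
        m3 /= n;
        m4 /= n;

        stdev = Math.sqrt(m2);
        cv = stdev / mean;
        min = x[0];
        max = x[n - 1];
        median = (n % 2 == 1) ? x[n / 2] : 0.5 * (x[n / 2 - 1] + x[n / 2]);
        p5 = percentile(x, 5);
        p95 = percentile(x, 95);
        skew = m3 / Math.pow(m2, 1.5);
        kurtosis = m4 / (m2 * m2) - 3.0; // excess kurtosis (0 for a normal distribution)
    }

    /** Nearest-rank percentile of an already sorted array. */
    static double percentile(double[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        LeanSummary s = new LeanSummary(new double[]{1, 2, 3, 4, 5}); // made-up values
        System.out.println("mean=" + s.mean + " stdev=" + s.stdev + " median=" + s.median);
    }
}
```

The sort dominates the cost at O(n log n), so this should scale fine for a single 10k or 330k array. Skew and kurtosis come out as NaN when every value is identical, since the variance is then zero.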