work organizing datasets, developing code, and running code to collect data 10-2-13

2014-08-29


I modified my entropy code so that it could calculate more summary numbers.

renamed this file:
"F:\Users\kwhittem\Desktop\temp\4-46 S8 G2 Hi R-P50 G-P20 03012013 EC 722078u0532 K.gpr"
to this for a test
test1 9-16-13.gpr

found here
"F:\kurt\storage\CIM Research Folder\DR\2013\9-16-13\entropy\test1\test1 9-16-13.gpr"

I deleted most of the rows to make it a small file.

The column to extract is "F532 Median"

Here is the output summary
name entropy entropy_normalized cv cv_normalized stdev stdev_normalized mean mean_normalized median median_normalized mode mode_normalized min min_normalized max max_normalized
test1 9-16-13 1.945910149055313 1.945910149055313 0.3463060072388747 0.34631240565783933 2051.4673145963457 2879.191867438468 5923.857142857143 8313.857142857143 7125.0 10000.0 7753.0, 7125.0, 7983.0, 4719.0, 7172.0, 3215.0, 3500.0 10881.0, 10000.0, 11204.0, 6623.0, 10065.0, 4512.0, 4912.0 3215.0 4512.0 7983.0 11204.0
^
All of these values match with the manual excel calculations!
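The non-entropy summary numbers above can be reproduced with a short sketch. It assumes the sample standard deviation (n - 1 denominator), which is what gives the stdev of 2051.467 and cv of 0.3463 for the seven test1 values, and cv = stdev / mean. The class and method names are hypothetical, not the ones in my EntropyOfArray code.

```java
import java.util.Arrays;

/** Hypothetical helper that reproduces the basic summary numbers for one column of values. */
final class SummaryStats {
    private SummaryStats() {}

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double stdev(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) ss += (v - m) * (v - m);
        return Math.sqrt(ss / (x.length - 1));
    }

    /** Coefficient of variation = stdev / mean. */
    static double cv(double[] x) {
        return stdev(x) / mean(x);
    }

    /** Median of a sorted copy; the input array is left unchanged. */
    static double median(double[] x) {
        double[] s = x.clone();
        Arrays.sort(s);
        int n = s.length;
        return (n % 2 == 1) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    static double min(double[] x) { return Arrays.stream(x).min().getAsDouble(); }

    static double max(double[] x) { return Arrays.stream(x).max().getAsDouble(); }

    public static void main(String[] args) {
        // "F532 Median" values from the test1 9-16-13 file
        double[] f532 = {7753, 7125, 7983, 4719, 7172, 3215, 3500};
        System.out.printf("mean=%f stdev=%f cv=%f median=%f min=%f max=%f%n",
                mean(f532), stdev(f532), cv(f532), median(f532), min(f532), max(f532));
    }
}
```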

String[] column_titles = {
	"name", "entropy", "entropy_normalized", "cv", "cv_normalized", "stdev", "stdev_normalized", "mean", "mean_normalized", "median", "median_normalized", "mode", "mode_normalized", "min", "min_normalized", "max", "max_normalized"
};





I'd like to double check that all of these numbers are correct, and then I will be good to go!





Here's the spreadsheet I will use to check everything.


"F:\kurt\storage\CIM Research Folder\DR\2013\9-17-13\entropy\entropy check 9-17-13.xlsx"





name	entropy	entropy_normalized	cv	cv_normalized	stdev	stdev_normalized	mean	mean_normalized	median	median_normalized	mode	mode_normalized	min	min_normalized	max	max_normalized











remaining results for 330k data 9-17-13


{


entropy	cv	normalized_entropy	normalized_cv


94	9.066523770190122	0.892885333613144	5.605292376537055	0.8955995917461678

7753
7125
7983
4719
7172
3215
3500

median
7125

7753/7125.0 = 1.088140351
7983/7125.0 = 1.120421053
4719/7125.0 = 0.662315789
7172/7125.0 = 1.006596491
3215/7125.0 = 0.45122807
3500/7125.0 = 0.49122807

Check all non-zero frequencies
The sum of the non-zero frequencies should be 7.

10881, 10000, 11204, 6623, 10065, 4512, 4912
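The normalization above divides each value by the median (7125) and multiplies by a factor of 10,000. The listed results (for example 10065 rather than 10066 for 7172) suggest the product is truncated to an integer rather than rounded, so this sketch assumes truncation. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of median normalization followed by multiplication by a factor. */
final class MedianNormalization {
    private MedianNormalization() {}

    /** Median of a sorted copy; the input array is left unchanged. */
    static double median(double[] x) {
        double[] s = x.clone();
        Arrays.sort(s);
        int n = s.length;
        return (n % 2 == 1) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    /**
     * Divides every value by the median, multiplies by factor, and truncates toward zero.
     * Truncation (not rounding) is an assumption that matches 7172 -> 10065.
     */
    static long[] normalize(double[] x, double factor) {
        double med = median(x);
        long[] out = new long[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = (long) (x[i] / med * factor);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] f532 = {7753, 7125, 7983, 4719, 7172, 3215, 3500};
        System.out.println(Arrays.toString(normalize(f532, 10000)));
    }
}
```

Running main prints [10881, 10000, 11204, 6623, 10065, 4512, 4912], the same list as above.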

The normalized entropy is not what I expected, and some of the output in the text file from the entropy calculations is not what I expected either.

I should try to make a smaller distribution with a smaller factor to help me debug these problems.

I made a new simplified gpr file.
"F:\kurt\storage\CIM Research Folder\DR\2013\9-16-13\entropy\test2\test2 9-18-13.gpr"

output was this
name entropy entropy_normalized cv cv_normalized stdev stdev_normalized mean mean_normalized median median_normalized mode mode_avg mode_normalized mode_normalized_avg min min_normalized max max_normalized
test1 9-16-13 1.945910149 1.945910149 0.346306007 0.346312406 2051.467315 2879.191867 5923.857143 8313.857143 7125 10000 7753.0, 7125.0, 7983.0, 4719.0, 7172.0, 3215.0, 3500.0 5923.857143 10881.0, 10000.0, 11204.0, 6623.0, 10065.0, 4512.0, 4912.0 8313.857143 3215 4512 7983 11204

All of these values were what they should be when I did the manual calculations in excel. Now I'll look at the first test gpr again.
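The mode and mode_avg columns can be sketched like this: every value that reaches the highest frequency is reported as a mode (so with all-distinct data, as in test1, every value is a mode), and mode_avg is the average of those modes. Keeping the modes in order of first appearance with a LinkedHashMap is my assumption about how the list is ordered; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the mode and mode_avg summary numbers. */
final class Modes {
    private Modes() {}

    /** All values that share the highest count, in order of first appearance. */
    static List<Double> modes(double[] x) {
        Map<Double, Integer> counts = new LinkedHashMap<>();
        for (double v : x) counts.merge(v, 1, Integer::sum);
        int best = 0;
        for (int c : counts.values()) best = Math.max(best, c);
        List<Double> result = new ArrayList<>();
        for (Map.Entry<Double, Integer> e : counts.entrySet()) {
            if (e.getValue() == best) result.add(e.getKey());
        }
        return result;
    }

    /** Average of all modes (the mode_avg column). */
    static double modeAverage(double[] x) {
        List<Double> m = modes(x);
        double sum = 0.0;
        for (double v : m) sum += v;
        return sum / m.size();
    }
}
```

When every value occurs once, mode_avg equals the mean, which is why mode_avg matches mean for the test1 file.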

I would now like to test one whole 330k gpr file.

referred to "test of one 330k gpr for cv and entropy normalized and non-normalized"
retrieved gpr from here
-F:\kurt\storage\CIM Research Folder\DR\2013\9-4-13\330k gpr test
pasted this file

4-22 S4 A1 Hi P60 092612 #8-15 5um K.gpr
here
/home/lwang138/entropy/one_330k_test

renamed to

one_330k_test.gpr

I'll export a jar file from my current code to run the file
-F:\kurt\storage\CIM Research Folder\DR\2013\9-19-13\EntropyOfArray091913d1043.jar
copied jar here

/home/lwang138/kurt/test one 330k 9-19-13

command entered in cmd_original.saguaro

java -Xmx6144m -jar "/home/lwang138/kurt/test one 330k 9-19-13/EntropyOfArray091913d1043.jar" /home/lwang138/entropy/one_330k_test "one_330k_test.gpr"

commands run in terminal
-cd "/home/lwang138/kurt/test one 330k 9-19-13"
-chmod +x jobScript.saguaro
-./jobScript.saguaro cmd_original.saguaro pbs.saguaro

I forgot about the filename problem. I will try to use this method

FilenameUtils.separatorsToSystem
from
Apache Commons
to fix my problem.

downloaded the apache commons jar files and included them in my project.

commons-lang3-3.1

Here's a page about how to make apache commons work with eclipse



http://www.java-forums.org/new-java/10601-how-make-apache-commons-stringutils-etc-work-eclipse.html


Download commons-lang-2.4.zip from Apache Commons - Lang Downloads


Expand the archive in a directory (e.g. H:\devel\commons-lang-2.4)


In eclipse:


A. Put the library in the Java build path





Window -> Preferences -> Java -> Build Path -> User Libraries: push the "New..." button, and in the "User library name:" field enter 'apache-commons-lang'.


Click to select "apache-commons-lang" in the "Defined user libraries" list, then click the "Add JARs..." button, browse for commons-lang-2.4.jar (in the folder you saved it to earlier), and select it.


Now in "Defined user libraries", under commons-lang-2.4.jar you should see "Source attachment" and "Javadoc location"; for each of them, select it, push "Edit...", then "External file", and browse for the corresponding jar (commons-lang-2.4-sources.jar for the source attachment and the Javadoc jar for the Javadoc location).





B. In your project add this user library


In the Package Explorer, right-click on the project name, go to Properties -> Java Build Path, and select the Libraries tab; click the "Add Library..." button, select "User Library" from the list, click "Next", check [x] apache-commons-lang, and click "Finish".

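^The method FilenameUtils.separatorsToSystem still was not recognized after this.

I suspected that Apache made some major changes in version 3 and that a previous version would let me use the method. I tried adding Apache Commons Lang 2.4, but I still had a problem. In hindsight, FilenameUtils is part of Apache Commons IO (package org.apache.commons.io), not Commons Lang, which is the likely reason neither Lang version worked.
I ended up using File.separator and Paths.get instead.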
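A minimal sketch of the File.separator / Paths.get workaround: convert whichever separator a path was written with to the separator of the machine the program runs on, then join the directory and file name with Paths.get instead of string concatenation. The helper names are hypothetical, and treating every backslash as a separator is an assumption (on Linux a backslash is technically a legal file-name character).

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helpers for building gpr paths that work on both Windows and Linux. */
final class PathHelper {
    private PathHelper() {}

    /** Replaces both '/' and '\\' with the separator of the current operating system. */
    static String toSystemSeparators(String path) {
        return path.replace('/', File.separatorChar).replace('\\', File.separatorChar);
    }

    /** Joins a directory and a file name without hard-coding a separator. */
    static Path gprPath(String directory, String fileName) {
        return Paths.get(toSystemSeparators(directory), fileName);
    }

    public static void main(String[] args) {
        // On Linux this prints /home/lwang138/entropy/one_330k_test/one_330k_test.gpr
        System.out.println(gprPath("/home/lwang138/entropy/one_330k_test", "one_330k_test.gpr"));
    }
}
```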

In order to get the program running on saguaro, I had to change the underscores in the filename to "u"s. Then it seemed to work. Actually, the program failed, and I cannot see the output file.

I'll continue fixing up the version of my program on windows.

I made a new folder test3 to test running multiple gpr files.
F:\kurt\storage\CIM Research Folder\DR\2013\9-16-13\entropy\test3

The output results were as follows:
name entropy entropy_normalized cv cv_normalized stdev stdev_normalized mean mean_normalized median median_normalized mode mode_avg mode_normalized mode_normalized_avg min min_normalized max max_normalized
test2 9-18-13.gpr 1.5498260458782016 1.5498260458782016 0.3333333333333333 0.3533431298606758 2.0 2.9277002188455996 6.0 8.285714285714286 7.0 10.0 8.0, 7.0 7.5 11.0, 10.0 10.5 3.0 4.0 8.0 11.0
test3 9-18-13.gpr 1.4750763110546947 1.4750763110546947 0.6862667749109684 0.7172191381865587 3.8234863173611098 6.454972243679028 5.571428571428571 9.0 6.0 10.0 2.0 2.0 3.0 3.0 2.0 3.0 12.0 20.0

I will compare this to some manual calculations.
All of the values match just as they should.

data in "output of both s distributions" in "F:\kurt\storage\CIM Research Folder\DR\2013\9-17-13\entropy\entropy check 9-17-13.xlsx"

Now I'll create another test folder (test4) to test a tab delimited text file.
^Everything worked as expected.

Now I'll do a test to make sure that the function for normalized data in a tab delimited text file works.
^Everything works as expected.

Now I just want to make sure that the program will work from the command line so that it can be run on a supercomputer.
^Oh actually, first I need to make it so that the program can collect the output from a bunch of single gpr runs.

First I need to make a folder with 2 gpr runs.
This is found here:
F:\kurt\storage\CIM Research Folder\DR\2013\9-16-13\entropy\test6\test3

Now I need to write a method to go through all the folders and collect the necessary data.

s: java function to search through all subfolders

This page looks like it has some good information.
-http://stackoverflow.com/questions/12656569/recursive-method-to-search-through-folder-tree-and-find-specific-file-types
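Rather than a hand-written recursive method like the one on that page, a sketch using the standard Files.walk stream (Java 8 and later) can collect every gpr file under a folder tree. The class name and the case-insensitive extension match are my assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: recursively list all files with a given extension under a folder. */
final class GprFinder {
    private GprFinder() {}

    /** Every regular file under root (including subfolders) whose name ends with extension, ignoring case. */
    static List<Path> findFiles(Path root, String extension) throws IOException {
        String ext = extension.toLowerCase();
        try (Stream<Path> paths = Files.walk(root)) {     // the stream must be closed to release directory handles
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(ext))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findFiles(root, ".gpr")) System.out.println(p);
    }
}
```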

/*
* matching a pattern that is not surrounded by another pattern turns out to be a complex task
* There's a webpage about it here
* http://stackoverflow.com/questions/1191397/regex-to-match-values-not-surrounded-by-another-char
* We only want there to be a match if there is an even number of surrounding_pattern after the pattern. If there is an odd number, then the pattern lies between a pair of surrounding_pattern occurrences
* I tried to accomplish this task with regular expressions. However, this was proving difficult. Instead, I think I will just write a function (possibly recursive)
*/

Here are a few of my failed obsolete attempts



//return_string = findAndReplacewithRegEx(return_string, pattern "(?=(?:(?:(?:[^" surrounding_pattern "]  |\\.)* " surrounding_pattern "){2

)* (?:^" surrounding_pattern " |\\.)* $)", "q");
//return_string = findAndReplacewithRegEx(return_string, pattern "(?=(?:(?:^" surrounding_pattern "* " surrounding_pattern ")

)* ^" surrounding_pattern "* )", "q");
//return_string = findAndReplacewithRegEx(return_string, "(, )(?=(?:(?:(?:^\\] |\\.)* \\])

)* (?:^\\] |\\.)* $", "q");
//return_string = findAndReplacewithRegEx(return_string, "pattern(?=(?:(?:^surrounding_pattern*surrounding_pattern)

)*^surrounding_pattern*", "q");
//return_string = findAndReplacewithRegEx(return_string, "(?<!\\)(. )(, )(. )(?!\\)", "$1\t$3" );

}

temp code I think I'll remove 9-23-13d2054



//the number of left brackets would be an upper bound on the number of possible odd number of brackets that are not allowed to occur after a comma


			for(int i=0; i<left_brackets.size(); i++)
			{
				return_string = findAndReplacewithRegEx(return_string, pattern + "(?!(.+" + surrounding_pattern + ")((.+" + surrounding_pattern + ")(.+" + surrounding_pattern + "))*+)", "q");
			}

I want to test my method to replace a pattern not surrounded by another pattern.

I'll use this regex on the original string to put it into a spreadsheet in excel (then I can get the numbered position of each character (including spaces)).
(\S\s)
$1\t

The spreadsheet I used to test with is found here
"F:\kurt\storage\CIM Research Folder\DR\2013\9-24-13\Test replace pattern not surrounded by surrounding pattern method 9-24-13.xlsx"

comma #15 is a comma within brackets
comma #12 is a comma within brackets
There are a total of 20 commas in the test string
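A non-regex sketch of the "replace a pattern not surrounded by another pattern" idea: scan the string once, track the bracket depth, and replace the pattern only at depth zero. This assumes distinct opening and closing characters such as '[' and ']', which is a simplification of the general surrounding_pattern case; the name replaceOutsideBrackets is hypothetical.

```java
/** Hypothetical sketch: replace a pattern only where it is not inside open/close delimiters. */
final class BracketAwareReplace {
    private BracketAwareReplace() {}

    static String replaceOutsideBrackets(String s, String pattern, String replacement, char open, char close) {
        if (pattern.isEmpty()) throw new IllegalArgumentException("pattern must not be empty");
        StringBuilder out = new StringBuilder();
        int depth = 0;                                   // how many unmatched open characters we are inside
        int i = 0;
        while (i < s.length()) {
            if (depth == 0 && s.startsWith(pattern, i)) {
                out.append(replacement);                 // pattern is outside all brackets: replace it
                i += pattern.length();
                continue;
            }
            char c = s.charAt(i);
            if (c == open) depth++;
            else if (c == close && depth > 0) depth--;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Commas inside the bracketed mode list are kept; the others become tabs.
        System.out.println(replaceOutsideBrackets("a, b, [7753.0, 7125.0], e", ", ", "\t", '[', ']'));
    }
}
```

Tracking depth also handles nested brackets, which the even/odd counting idea does not need for symmetric delimiters but does need for [ and ].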

9-24-13 1357
Great it looks like the whole program works now!

I'll make another slight change to the program so that all of the summary numbers from the normalized data are calculated before multiplying the data by a factor and calculating the entropy.

before I make this change, here is the current version of the ScenarioHandler class.
Okay I think that change should be fairly trouble free.

Now I can just make sure the program works from the command line, and then I can get these programs running on a supercomputer.

examples of commands that could be run from the command line

//usual numbers for the min, max, and factor for the entropy would be 1, 65535, 10000 (aka 10,000)
//examples of column titles to extract could be "F647 Median" or "F532 Median"
//-find_summary_numbers_one_gpr(String directory, String sample_name, String column_title_to_extract, boolean entropy, int min_for_entropy, int max_for_entropy, double factor)
//--command line version: find_summary_numbers_one_gpr directory sample_name column_title_to_extract entropy min_for_entropy max_for_entropy factor

//-find_summary_numbers_from_folder_of_gprs(String directory, String column_title_to_extract, boolean entropy, int min_for_entropy, int max_for_entropy, int factor)
//--command line version: find_summary_numbers_from_folder_of_gprs directory column_title_to_extract entropy min_for_entropy max_for_entropy factor

//-find_summary_numbers_from_tabdelimitedtext_raw_data(String directory, String filename, int min_for_entropy, int max_for_entropy, int factor, boolean entropy, int sample_name_row, int sample_name_column_start, int sample_name_column_end, int data_column_start, int data_column_end, int data_starting_row)
//--command line version: find_summary_numbers_from_tabdelimitedtext_raw_data directory filename min_for_entropy max_for_entropy factor entropy sample_name_row sample_name_column_start sample_name_column_end data_column_start data_column_end data_starting_row
//^note that the data rows and columns would start from count 0

//-find_summary_numbers_from_tabdelimitedtext_normalized_data(String directory, String filename, int min_for_entropy, int max_for_entropy, int factor, boolean entropy, int sample_name_row, int sample_name_column_start, int sample_name_column_end, int data_column_start, int data_column_end, int data_starting_row)
//--command line version: find_summary_numbers_from_tabdelimitedtext_normalized_data directory filename min_for_entropy max_for_entropy factor entropy sample_name_row sample_name_column_start sample_name_column_end data_column_start data_column_end data_starting_row
//^note that the data rows and columns would start from count 0

//-collectAllSummaryFilesIntoOneTable(String directory, String name_of_summary_file, String output_file_name)
//--command line version: collectAllSummaryFilesIntoOneTable directory name_of_summary_file output_file_name
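A sketch of how the command-line version of find_summary_numbers_one_gpr could map its positional arguments onto the Java parameter types listed above. The class OneGprArgs is hypothetical, and rejecting extra arguments is a design choice of this sketch. JVM options such as -Xmx6144m must come before -jar; anything after the jar file name is passed to main() as an ordinary argument, so a trailing -Xmx6144 does not change the heap size.

```java
/** Hypothetical parser for: find_summary_numbers_one_gpr directory sample_name column_title_to_extract entropy min max factor */
final class OneGprArgs {
    final String directory;
    final String sampleName;
    final String columnTitle;
    final boolean entropy;
    final int minForEntropy;
    final int maxForEntropy;
    final double factor;

    private OneGprArgs(String[] a) {
        directory = a[1];
        sampleName = a[2];
        columnTitle = a[3];
        entropy = Boolean.parseBoolean(a[4]);
        minForEntropy = Integer.parseInt(a[5]);
        maxForEntropy = Integer.parseInt(a[6]);
        factor = Double.parseDouble(a[7]);
    }

    /** Requires exactly 8 arguments so that a misplaced option such as -Xmx6144 is reported instead of ignored. */
    static OneGprArgs parse(String[] args) {
        if (args.length != 8 || !args[0].equals("find_summary_numbers_one_gpr")) {
            throw new IllegalArgumentException("usage: find_summary_numbers_one_gpr directory sample_name "
                    + "column_title_to_extract entropy min_for_entropy max_for_entropy factor");
        }
        return new OneGprArgs(args);
    }
}
```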

Okay now I should be able to run the program from the command line. I'll just do a little bit of testing, and then I should be good to go.

test7: I made the jar file and will test a single gpr file.
Here's the command I will run

for one gpr
-java -jar EntropyOfArray091913d1043.jar find_summary_numbers_one_gpr "F:\\kurt\\storage\\CIM Research Folder\\DR\\2013\\9-16-13\\entropy\\test7" "test2 9-18-13" "F532 Median" true 1 20 10

test8: now I'll test a folder of gprs

folder of gprs
-java -jar EntropyOfArray091913d1043.jar find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\9-16-13\entropy\test8" "F532 Median" true 1 20 10

test9:
now I'll test a table of non-normalized data

Here's the command I will run
java -jar EntropyOfArray091913d1043.jar find_summary_numbers_from_tabdelimitedtext_raw_data "F:\kurt\storage\CIM Research Folder\DR\2013\9-16-13\entropy\test9" "test_data_9-20-13d1124.txt" 1 20 10 true 0 1 2 1 2 1

test10:
now I'll test a table of normalized data

Here's the command I will run
java -jar EntropyOfArray091913d1043.jar find_summary_numbers_from_tabdelimitedtext_normalized_data "F:\kurt\storage\CIM Research Folder\DR\2013\9-16-13\entropy\test10" "test_data_9-20-13d1124.txt" 1 20 10 true 0 1 2 1 2 1

test11:
now I'll test the ability to combine the data from different runs of the program into one table

java -jar EntropyOfArray091913d1043.jar collectAllSummaryFilesIntoOneTable "F:\kurt\storage\CIM Research Folder\DR\2013\9-16-13\entropy\test11" "table_of_summary_numbers.txt" "complete_table_092513d1115.txt"

Now that the code is modified the way I would like it to be, I want to get it running on some supercomputers to analyze all of the datasets.

I started by running this code found here
/home/lwang138/kurt/testSPoneSP330kSP9-19-13
to just run one single gpr so I could analyze this.

Now I would like to arrange everything for all of my other datasets and get them running.

The first thing I want to get organized is the 330K dataset.

The next thing I want to get organized is the old and young mouse data.
Initial Experiment 1-18-12
there should be 8 gprs
I'll copy this data from here
F:\kurt\storage\CIM Research Folder\DR\2013\9-26-13\old and young mice gpr files
to here
/home/lwang138/entropy/oldSandSyoungSmouseSdataS9-26-13d1340

Here is the command I will use to run this data
java -jar "/home/lwang138/entropy/EntropyOfArray091913d1043.jar" find_summary_numbers_from_folder_of_gprs "/home/lwang138/entropy/oldSandSyoungSmouseSdataS9-26-13d1340" "F647 Median" true 1 65535 10000

I have the pbs, script, and command file here
/home/lwang138/kurt/oldSandSyoungSmouseSdataS9-26-13d1340

time to complete one 330K gpr with all types of measures:
21:54:27
^actually it looks like this job did not complete.
job id: 6174346.newmoab
error message
=>> PBS: job killed: walltime 75024 exceeded limit 75000
The program seemed to be running fine before the walltime was reached since files were being output as they should be.
I'll roughly double the walltime, to 40:00:00 (the old limit of 75000 s is about 20:50:00), and I'll try rerunning the program.

Now I can move onto the next set of data.
polyclonal data from Josh

-analysis of Josh's polyclonal experiment 7-19-QC array data
-entropy analysis of QC samples 7-20-12
-location of data: S:\Research\labdata\jaricher\QC Data\sera
-location of data: S:\Research\labdata\jaricher\QC Data\biotin

copied data to here on saguaro
/home/lwang138/entropy/QCData/QC Data

Now I'll get it running.
^Actually, before I start running all of these jobs, I better make sure I know how long it takes to run the program for one 330k file and one 10k file. I'll hold off on running everything. I suppose I can still get things organized and ready to run though.
These qc data scripts should be ready to go:

/home/lwang138/kurt/QC_Data_biotin_9-26-13d1507
/home/lwang138/kurt/QC_Data_sera_9-26-13d1504

time for QC biotin
80*7 = 560 min = 9 hr 20 min

Tiger's analysis of mouse tumor samples 7-23-12

location of data: I do not have the original location of this data

Entropy Calculations for Bart's samples 12-13-12

location of data: S:\Research\Cancer_Eradication\Users\kwhittem\kwhittem\Raw Data Often Originally on Research Drive\2012\12-13-12\gprs for entropy calculation for Bart\gprs
F649 Median
154 items (estimated calculation time 154*7min=17hr58min)
program started 9-26-13d1602

Entropy for FVBN Transgenic mice

location of data: S:\Research\Cancer_Eradication\Discovering tumor specific antigens\entropy\5-5-13
sample id starts at row 2 column 1
data id starts at row 14 column 1 (last column is column 111)

command will be
find_summary_numbers_from_tabdelimitedtext_normalized_data "/home/lwang138/entropy/FVBN_Transgenic_Mice" "FVBN_mouse_in_10K_Version_2_Batch_effect_corrected.txt" 1 65535 10000 true 2 1 111 1 111 14

FVBN Time Series Data

F647 Median

command I will run
find_summary_numbers_from_folder_of_gprs "/home/lwang138/entropy/FVBN_Time_Series_1_9-26-13_1645" "F647 Median" true 1 65535 10000
and
find_summary_numbers_from_folder_of_gprs "/home/lwang138/entropy/FVBN_Time_Series_2_9-26-13_1645" "F647 Median" true 1 65535 10000

"Entropy calculation for Bart's samples (some normalized) 3-1-13"

Bart mentioned some other measures of the distribution
Median, 95th percentile, 5th percentile, dynamic range(95th%/5th%), kurtosis and skewness.

kurtosis (measures how heavy the tails of a distribution are; often described as its peakedness)

algorithm for calculating kurtosis
-http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Higher-order_statistics

skewness

code for stdev, skewness, and kurtosis
http://www.iolite.net/?p=41

I guess I'll add these measures (kurtosis, skewness, 5th percentile, 95th percentile, dynamic range) to my code.
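A sketch of skewness and kurtosis from population central moments: skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 - 3, where mk is the k-th central moment with a 1/n denominator. This is one common convention, not necessarily the one on the linked page; Excel's SKEW and KURT apply small-sample corrections, so their values will differ somewhat for small arrays. The class name is hypothetical.

```java
/** Hypothetical sketch of skewness and excess kurtosis from population central moments. */
final class Moments {
    private Moments() {}

    /** k-th central moment with a 1/n denominator. */
    static double centralMoment(double[] x, int k) {
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= x.length;
        double sum = 0.0;
        for (double v : x) sum += Math.pow(v - mean, k);
        return sum / x.length;
    }

    /** Skewness: 0 for a symmetric distribution, positive for a long right tail. */
    static double skewness(double[] x) {
        double m2 = centralMoment(x, 2);
        return centralMoment(x, 3) / Math.pow(m2, 1.5);
    }

    /** Excess kurtosis: 0 for a normal distribution, negative for light tails. */
    static double excessKurtosis(double[] x) {
        double m2 = centralMoment(x, 2);
        return centralMoment(x, 4) / (m2 * m2) - 3.0;
    }
}
```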

code for percentile
http://stackoverflow.com/questions/8137391/percentile-calculation

9-27-13d1405: finished modifying code so it can calculate additional measures as well

Okay now I need to figure out how long the 9-27-13 code takes to run with one 10K gpr file and one 330k gpr file.

/home/lwang138/entropy/EntropyOfArray092713d1430.jar

/home/lwang138/entropy/one_330k_test_092713d1432
oneu330kutest.gpr

/home/lwang138/entropy/one_10k_test_092713d1434
406971_bot-GRP2_01182012_young.gpr

job id 6179503.newmoab.local : /home/lwang138/kurt/one_10k_sample_9-27-13

job id 6179508.newmoab.local : /home/lwang138/kurt/one_330k_sample_9-27-13

Now I can finish getting my other datasets ready, and then I'll probably wait to proceed until I get the times for the 10k and 330k sample test.

"Entropy calculation for Bart's samples (some normalized) 3-1-13"
"S:\Research\Cancer_Eradication\Discovering tumor specific antigens\entropy\3-1-13\sample_analysis\CIM10K Dogs\CIM10K Dogs.xlsx"

names at row 3 column 1 to column 25 and data at row 5 column 1 to column 25

copied the spreadsheet to here
/home/lwang138/entropy/lymphoma_data_2_9-27-13d1508
CIM10K Dogs.xlsx

Now for the other 330k samples
S:\Research\CIM-HealthTell\Experiments\20130717 HTChipV7P-108 Production Run - 100% Denisty (HT-108)\Good GPR from Slides 2 to 8 on 900 Correct Labels

I need to copy each gpr and put it into its own folder.

time for one 10k gpr with 9-27-13 entropy program: 5m8s

job id : 6179562.newmoab.local : /home/lwang138/kurt/one_330k_sample_9-27-13

Entropy calculation for all human normals as of 4-17-13

I need to go through and collect the human normals that I have age info for. I have the names of 63 gprs for this found in "sample info refined 3" sheet in this spreadsheet: "F:\kurt\storage\CIM Research Folder\DR\2013\4-24-13\entropy\some entropy age associations 4-26-13.xlsx"

copy "1009951_bot_N-19(152)_08132012.gpr" "C:\Users\kwhittem\Desktop\temp\human_normal_data_with_ages"
^I tried to use this type of command to copy the 63 files I needed to collect, but I was only able to get 4 files. Where are the others? Perhaps I will need to write a program to search the shared drive.
^I wrote this program, and it takes about 8 min per file to find the location in the PeptideArrayCore folder on the shared drive.

I should make a text file containing all the file information for the folders that I think the files might be in.

time and space required to calculate entropy of one gpr on saguaro

from 9-27-13 1708 to 9-29-13 1700
47.9 hr

Okay now I would like to get saguaro running on all of my 330k data which should take about 48 hr.

how many gprs are there total?
110+141+138+134 = 523
How many computation hours will this require?
25104 hr
However, we only have 18044 hr
What if I just did the old 108 wafer, and one of the new wafers?
110+141 = 251; 251*48 = 12,048 hr

How much would it cost me to get all of these samples run on amazon?

0.06*12048 = $722.88 (which is quite a bit of money!)

I think I'll just run the 251 gprs (some of them will have to wait in a queue since there are not enough cores to run them all at once)
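The planning arithmetic above, written as a small sketch so it can be re-run if the per-file time or the price changes. The 48 hr per 330k gpr comes from the saguaro timing, and the $0.06 per hour rate is the one implied by $722.88 / 12,048 hr; the class name is hypothetical.

```java
/** Hypothetical sketch of the compute-hour and cost estimate for the 330k gprs. */
final class ComputeBudget {
    private ComputeBudget() {}

    static final double HOURS_PER_330K_GPR = 48.0;     // measured on saguaro (47.9 hr)

    static double coreHours(int gprs) {
        return gprs * HOURS_PER_330K_GPR;
    }

    static double cost(double coreHours, double dollarsPerHour) {
        return coreHours * dollarsPerHour;
    }

    public static void main(String[] args) {
        int all = 110 + 141 + 138 + 134;                // every 330k gpr
        int subset = 110 + 141;                         // old wafer 108 plus one new wafer
        System.out.printf("all: %d gprs, %.0f hr (available: 18044 hr)%n", all, coreHours(all));
        System.out.printf("subset: %d gprs, %.0f hr, $%.2f%n", subset, coreHours(subset), cost(coreHours(subset), 0.06));
    }
}
```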

Then I'll get all of the 10k samples running.

First I need to make sure that my program produced the correct values for the 330k and 10k samples

Paste the numbers you want to calculate the entropy of into a column in excel.
Copy this column into another column and then choose data->remove duplicates
Use the countif function for each of the duplicate removed column values to count the number of times each item occurs in the original column like

=COUNTIF($A$1:$A$13,E1)

Calculate p(x)*ln(p(x)) for each count
sum all of the p(x)*ln(p(x)) values
take the negative value of the sum for the final entropy value
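The same Excel procedure as a Java sketch: count each distinct value (the COUNTIF step), turn counts into probabilities p(x) = count / n, and sum -p(x) ln p(x) with the natural log. For the seven distinct test1 values every p(x) is 1/7, so the entropy is ln 7, which agrees with the recorded test1 entropy of 1.945910149055313 to about seven significant figures. The class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of Shannon entropy (natural log) of the empirical distribution of values. */
final class ShannonEntropy {
    private ShannonEntropy() {}

    static double entropy(double[] x) {
        Map<Double, Integer> counts = new HashMap<>();
        for (double v : x) counts.merge(v, 1, Integer::sum);   // like COUNTIF on the de-duplicated column
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / x.length;
            h -= p * Math.log(p);                                 // accumulate -p(x) ln p(x)
        }
        return h;
    }
}
```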

It looks like my code has a problem right now with calculating the percentile numbers.

percentile code modification
// Another method: double n = (N + 1) * excelPercentile;
http://stackoverflow.com/questions/8137391/percentile-calculation

964.949999
vs
1811.05

Basic method for determining the percentile, as described in Apache Commons Math 2.2 (Percentile.java)
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.commons/commons-math/2.2/org/apache/commons/math/stat/descriptive/rank/Percentile.java#Percentile.insertionSort%28double[]%2Cint%2Cint%29
Let n be the length of the (sorted) array and 0 < p <= 100 be the desired percentile.
If n = 1 return the unique array element (regardless of the value of p); otherwise
Compute the estimated percentile position pos = p * (n + 1) / 100 and the difference, d between pos and floor(pos) (i.e. the fractional part of pos).
If pos < 1 return the smallest element in the array.
Else if pos >= n return the largest element in the array.
Else let lower be the element in position floor(pos) in the array and let upper be the next element in the array. Return lower + d * (upper - lower)
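The steps above as a Java sketch, with the dynamic range (95th percentile / 5th percentile) added on top. The values are sorted as doubles; sorting them as strings gives wrong percentiles because, for example, "10000" sorts before "7125" lexicographically. Note that Excel's PERCENTILE (PERCENTILE.INC) and R's default quantile type 7 use the position (n - 1)p + 1 rather than (n + 1)p, so small differences from Excel and R are expected. The class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of the Apache Commons Math 2.2 style percentile and the dynamic range. */
final class PercentileSketch {
    private PercentileSketch() {}

    /** Percentile for 0 < p <= 100 using pos = p * (n + 1) / 100 with linear interpolation. */
    static double percentile(double[] values, double p) {
        if (values.length == 0 || p <= 0 || p > 100) throw new IllegalArgumentException("need data and 0 < p <= 100");
        double[] s = values.clone();
        Arrays.sort(s);                           // numeric sort, not string sort
        int n = s.length;
        if (n == 1) return s[0];
        double pos = p * (n + 1) / 100.0;
        double fpos = Math.floor(pos);
        int intPos = (int) fpos;
        double d = pos - fpos;                    // fractional part of pos
        if (pos < 1) return s[0];
        if (pos >= n) return s[n - 1];
        double lower = s[intPos - 1];             // element at 1-based position floor(pos)
        double upper = s[intPos];                 // the next element
        return lower + d * (upper - lower);
    }

    /** Dynamic range = 95th percentile / 5th percentile. */
    static double dynamicRange(double[] values) {
        return percentile(values, 95) / percentile(values, 5);
    }
}
```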

I found the problem with my percentile code. I was trying to sort a list of strings, but these strings should have been converted to doubles first. When I fixed this problem, my code matched the percentiles that Excel and R return (or very, very closely anyway; I think there are a few minor rounding differences).

Now that the percentile is corrected, I would also like to add a normalized_shannon_entropy field as well.

Normalized Shannon Entropy (Sn) = S / ln(N)
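A sketch of Sn = S / ln(N). I am assuming N is the number of values in the array, so Sn = 1 when every value is distinct (ln N is the largest entropy N values can have); if N is meant to be the number of possible bins instead, only the denominator changes. Returning 0 for fewer than two values, where ln(N) = 0, is also an assumption. The class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of normalized Shannon entropy Sn = S / ln(N), with N = number of values. */
final class NormalizedEntropy {
    private NormalizedEntropy() {}

    static double shannon(double[] x) {
        Map<Double, Integer> counts = new HashMap<>();
        for (double v : x) counts.merge(v, 1, Integer::sum);
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / x.length;
            h -= p * Math.log(p);
        }
        return h;
    }

    static double normalized(double[] x) {
        if (x.length < 2) return 0.0;   // ln(1) = 0, so Sn is undefined; 0 by convention in this sketch
        return shannon(x) / Math.log(x.length);
    }
}
```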

Sheri needed some antibody
Thermo Scientific MS1295 Actin Pan Ab5 200 ug/mL

located in Box 2 in a small white tube around middle

Alright, I checked the program with a 10k gpr, and everything works! All of the calculated values match the manual calculations. I'll just do a little bit of cleaning up and recording, and then I'll start getting some of the 330k samples running.

Okay so now I can try to get some of the 330k data running.

/home/lwang138/entropy

Linux command to list all files in subdirectories
find . -type f -exec ls -l {} \; 2> /dev/null | sort -t' ' -k +6,6 -k +7,7


-http://stackoverflow.com/questions/9620050/list-all-files-with-full-paths-in-a-directory-and-subdirectories-order-by-a








first I'll process the data here


/home/lwang138/entropy/wafer_108_data_330kdata





I need to create all the commands I will run.


java -Xmx6144m -jar "/home/lwang138/entropy/EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "/home/lwang138/entropy/wafer_108_data_330kdata/" "" "F532 Median" true 1 65535 10000





wafer 46 is "F532 Median"





wafer 108 is "F635 Median"





I'm surprised that all of the gprs I've checked use F635 Median








Is all of the wafer 108 data F635 data?


635


635


1


1


1


20->1


40->1


60->1


100->1


120->1


140->1


^Yes this appears to be the case





Before I run all >100 samples, I'll run a test command first.





java -Xmx6144m -jar "/home/lwang138/entropy/EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "/home/lwang138/entropy/wafer_108_data_330kdata/135" "HT7-108 S8 F2 DUAL 091813 R13 Romeo_0241" "F635 Median" true 1 65535 10000





100213d0918 got wafer 108 job running


-[jobs associated with running good wafer 108 gprs from DTRA 2013 run]





Now I can get wafer 46 running as well.





I see that many of the jobs for wafer 108 failed right away because there was not enough memory.


I think I might try running the program on Amazon Web Services.





created a high memory instance


name: high_memory_1_10-2-13


assigned name: i-00cef434


keypair: azim58 keypair


password: qAsd)AYTKTE





F:\kurt\storage\CIM Research Folder\DR\2013\10-1-13\azim58.pem





When I remote into this server, Internet Explorer Enhanced Security Configuration causes problems.


-How to disable IE Enhanced Security in Windows Server 2012


--http://blog.blksthl.com/2012/11/28/how-to-disable-ie-enhanced-security-in-windows-server-2012/








Maybe I'll sign up for AT&T Locker to transfer the files.


email


[email protected]


id: azim58


password: Joseph without money





The Windows server has 244 GB of RAM, and I need at least 6 GB per file: 244/6 = about 40.7 files at once. There are only 32 virtual processors, so the processors are the limit, and I'll run 32 gprs per Windows server. There are a total of 110 gprs, so I will need 110/32.0 = 3.4, rounded up to 4 Windows servers, to accomplish the task quickly.
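The sizing reasoning above as a sketch: the number of simultaneous jobs per server is whichever runs out first, memory (244 GB / 6 GB per job) or virtual processors (32), and the server count is a ceiling division. The numbers come from the notes; the class and method names are hypothetical.

```java
/** Hypothetical sketch of how many servers are needed to run all gprs at the same time. */
final class ServerPlan {
    private ServerPlan() {}

    /** Jobs per server, limited by both memory and the number of virtual CPUs. */
    static int jobsPerServer(int ramGb, int gbPerJob, int vcpus) {
        return Math.min(ramGb / gbPerJob, vcpus);
    }

    /** Servers needed so that every job runs at once (integer ceiling division). */
    static int serversNeeded(int jobs, int jobsPerServer) {
        return (jobs + jobsPerServer - 1) / jobsPerServer;
    }

    public static void main(String[] args) {
        int perServer = jobsPerServer(244, 6, 32);
        System.out.println(perServer + " gprs per server, " + serversNeeded(110, perServer) + " servers");
    }
}
```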





It's going to take a while to get these files transferred.  In the meantime, I can get my last few datasets in order.








Analysis of Pre-run samples 2-27-12


Here's an e-mail from Muskan about his human naive samples (Muskan e-mail 2-15-12)





s33rst0n3$





Immunosignature Entropy 6 4-3-12





dir /s/b/o:gn > f.txt





The command I will run will look something like this.





java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\32" "4-46 S4 E1 Hi P20 111312 DTRA 4 S" "F532 Median" true 1 65535 10000





^see "F:\kurt\storage\CIM Research Folder\DR\2013\10-1-13\running program on gprs\commands for 46 wafer.xlsx"





Everything seems to be working well.





Now I'll get all the other commands together, and I will be good to go.








Some command line commands:


{


java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\32" "4-46 S4 E1 Hi P20 111312 DTRA 4 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\1" "4-46 S2 A2 Hi P20 DTRA4 110512 L" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\2" "4-46 S2 B1 Hi P20 110512 DTRA 2 L" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\3" "4-46 S2 B2 Hi P20 DTRA1 110512 L" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\4" "4-46 S2 B3 Hi P20 DTRA3 110512 L" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\5" "4-46 S2 C1 Hi P20 110512 DTRA 2 L" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\6" "4-46 S2 C2 Hi P20 DTRA2 110512 L" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\7" "4-46 S2 C3 Hi P20 buffer 110512 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\8" "4-46 S2 D1 Hi P20 110512 DTRA 4 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\9" "4-46 S2 D2 Hi P20 ND136 110512 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\10" "4-46 S2 D3 Hi P20 DTRA4 110512 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\10" "HT4-46 S1 B2 Hi P20 103112 #1-1 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\11" "4-46 S2 E1 Hi P20 110512 BC040 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\12" "4-46 S2 E2 Hi P20 BC009 110512 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\13" "4-46 S2 F1 Hi P20 110512 ND145 50k S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\14" "4-46 S2 F2 Hi P20 DTRA1 110512 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\15" "4-46 S2 F3 Hi P20 ND145 110512 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\16" "4-46 S2 G1 Hi P20 110512 p53Ab8 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\17" "4-46 S2 G3 Hi P20 DTRA3 110512 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\18" "4-46 S2 H1 Hi P20 110512 p53Ab1 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\19" "4-46 S2 H2 Hi P20 ND145 110512 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\20" "4-46 S2 H3 Hi P20 BC001 110512 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\21" "4-46 S4 A1 Hi P20 111312 DTRA 2 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\22" "4-46 S4 A2 Hi P20 111312 DTRA 4 S" "F532 Median" true 1 65535 10000

java -Xmx6144m -jar "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\23" "4-46 S4 A3 Hi P20 111312 DTRA 4 S" "F532 Median" true 1 65535 10000


java  -jar  "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\24" "4-46 S4 B1 Hi P20 111312 DTRA 1 S" "F532 Median" true 1 65535 10000 -Xmx6144


java  -jar  "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\25" "4-46 S4 B2 Hi P20 111312 DTRA 2 S" "F532 Median" true 1 65535 10000 -Xmx6144


java  -jar  "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\26" "4-46 S4 B3 Hi P20 111312 DTRA 4 S" "F532 Median" true 1 65535 10000 -Xmx6144


java  -jar  "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\27" "4-46 S4 C1 Hi P20 111312 DTRA 2 S" "F532 Median" true 1 65535 10000 -Xmx6144


java  -jar  "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\28" "4-46 S4 C2 Hi P20 111312 DTRA 4 S" "F532 Median" true 1 65535 10000 -Xmx6144


java  -jar  "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\29" "4-46 S4 C3 Hi P20 111312 DTRA 3 S" "F532 Median" true 1 65535 10000 -Xmx6144


java  -jar  "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\30" "4-46 S4 D1 Hi P20 111312 DTRA 1 S" "F532 Median" true 1 65535 10000 -Xmx6144


java  -jar  "C:\Users\Administrator\Downloads\EntropyOfArray100113d1557.jar" find_summary_numbers_one_gpr "C:\Users\Administrator\Downloads\group 1\group 1\31" "4-46 S4 D2 Hi P20 111312 DTRA 3 S]" "F532 Median" true 1 65535 10000 -Xmx6144

^ I could only run the program up to sample 22; I didn't have enough RAM for the remaining samples (as of 10-2-13).
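One thing worth checking for the RAM limit, offered as an assumption rather than a diagnosis: the java launcher only reads JVM options that appear before `-jar`. Anything after the jar path is handed to the program's `main` as an ordinary argument, and an `-Xmx` value with no unit suffix (`k`, `m`, `g`) is read as bytes. A minimal sketch, assuming the program itself only uses the eight arguments from `find_summary_numbers_one_gpr` through `10000`, that builds each command with `-Xmx6144m` in front of `-jar` and, when started with the argument `run`, launches the runs one at a time through `ProcessBuilder`. The class `EntropyRunCommand` and the method `buildCommand` are hypothetical names, and only two samples from the list above are filled in.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical batch runner: one child JVM per sample, heap flag placed where
// the java launcher reads it. Not part of the original entropy jar.
class EntropyRunCommand {

    // Jar location copied from the commands above.
    static final String JAR =
        "C:\\Users\\Administrator\\Downloads\\EntropyOfArray100113d1557.jar";

    // Hypothetical helper: returns the argument list for one sample.
    static List<String> buildCommand(String jarPath, String folder, String gprName,
                                     String column, String heap) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx" + heap);   // JVM options must come BEFORE -jar; "6144m" = 6144 MB
        cmd.add("-jar");
        cmd.add(jarPath);
        // Everything after the jar path reaches the program's main(String[] args).
        cmd.add("find_summary_numbers_one_gpr");
        cmd.add(folder);
        cmd.add(gprName);         // ProcessBuilder keeps this as one argument, spaces included
        cmd.add(column);
        // Remaining values copied verbatim from the notes above.
        cmd.add("true");
        cmd.add("1");
        cmd.add("65535");
        cmd.add("10000");
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String base = "C:\\Users\\Administrator\\Downloads\\group 1\\group 1\\";
        // Two samples from the list above; extend with the remaining folder/name pairs.
        String[][] samples = {
            {"7", "4-46 S2 C3 Hi P20 buffer 110512 S"},
            {"8", "4-46 S2 D1 Hi P20 110512 DTRA 4 S"},
        };
        boolean run = args.length > 0 && args[0].equals("run");
        for (String[] s : samples) {
            List<String> cmd = buildCommand(JAR, base + s[0], s[1], "F532 Median", "6144m");
            System.out.println(String.join(" ", cmd));   // dry run: just print
            if (run) {
                // Runs sequentially so only one 6 GB heap is reserved at a time.
                int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
                System.out.println("exit code " + exit);
            }
        }
    }
}
```

Started with no arguments it only prints the commands, which is a cheap way to catch a repeated folder number or a stray character in a file name before a long run. To confirm the heap setting actually took effect, the entropy program could print `Runtime.getRuntime().maxMemory()` at startup.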