work analyzing many different datasets with entropy and other measures 10-26-13

2014-08-29


10-9-13

I would now like to get the summary numbers for the 2013 DTRA data.

I'll copy the data I want to look at in case anything weird happens (I don't want to damage the original data).

Copied gpr files from here

S:\Research\CIM-HealthTell\Experiments\20130724 HTChipV7P-128 Production Run - 100% Density (HT-128)\Good GPRs Slides 3 to 8 on 900
to here
S:\Research\Cancer_Eradication\Users\kwhittem\temp

copied gpr files from here

S:\Research\CIM-HealthTell\Experiments\20130724 HTChipV7P-130 Production Run - 100% Density (HT-130)\Good GenePix GPR from Slides 3 to 8 from 900
to here
S:\Research\Cancer_Eradication\Users\kwhittem\temp2

It's taking a while to copy the files, so I can analyze some other data in the meantime.

I'll look at this data:
F:\kurt\storage\CIM Research Folder\DR\2013\9-26-13\old and young mice gpr files

I need to use the folder-of-GPRs mode (find_summary_numbers_from_folder_of_gprs).
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\9-26-13\old and young mice gpr files" "F647 Median"

The files finished copying, so I can now run the program (F532 Median).
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "S:\Research\Cancer_Eradication\Users\kwhittem\temp" "F532 Median"

and the other folder
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "S:\Research\Cancer_Eradication\Users\kwhittem\temp2" "F532 Median"

short names for summary measures
entropy norm_entropy max min cv stdev mean median 5th_perc 95th_perc max_norm min_norm stdev_norm mean_norm 5th_perc_norm 95th_perc_norm kurtosis skew dynamic_range
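The jar's exact definitions for these measures are not written down here, so the following is only a minimal Python sketch of how most of them could be computed for one array's intensities. It assumes histogram-based Shannon entropy over n_bins equal-width bins (n_bins is a hypothetical parameter, not one of the jar's), normalized by log2(n_bins), plus population-moment skew and excess kurtosis. The _norm variants and dynamic_range are left out because their definitions are not recorded here.

```python
import math
import statistics


def shannon_entropy(values, n_bins=100):
    """Histogram-based Shannon entropy in bits, and entropy / log2(n_bins).

    Assumption: equal-width bins between min and max; the jar may bin differently.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0, 0.0  # every value lands in one bin
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # the maximum value would fall just past the last bin, so clamp it
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    n = len(values)
    h = -sum((c / n) * math.log2(c / n) for c in counts if c > 0)
    return h, h / math.log2(n_bins)


def summary_numbers(values, n_bins=100):
    """A subset of the summary measures for one array (needs >= 2 distinct values)."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)    # sample standard deviation
    pstdev = statistics.pstdev(values)  # population SD, used for the moments
    # 19 cut points at 5%, 10%, ..., 95% with linear interpolation
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    n = len(values)
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    entropy, norm_entropy = shannon_entropy(values, n_bins)
    return {
        "entropy": entropy,
        "norm_entropy": norm_entropy,
        "max": max(values),
        "min": min(values),
        "cv": stdev / mean,
        "stdev": stdev,
        "mean": mean,
        "median": statistics.median(values),
        "5th_perc": cuts[0],
        "95th_perc": cuts[-1],
        "kurtosis": m4 / pstdev ** 4 - 3,  # excess kurtosis
        "skew": m3 / pstdev ** 3,
    }
```

Usage: summary_numbers(intensities) for the values in one GPR's "F532 Median" column. Histogram entropy depends heavily on n_bins, so any comparison between arrays would need a fixed bin count.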

summary numbers for wafer 128
summary numbers for wafer 130

table_of_summary_numbers_clean_100913d1421
at
F:\kurt\storage\CIM Research Folder\DR\2013\9-26-13\old and young mice gpr files\summary
"F:\kurt\storage\CIM Research Folder\DR\2013\9-26-13\old and young mice gpr files\summary\table_of_summary_numbers_clean_100913d1421.xlsx"

parameters I like for a boxplot combined with a dot plot (scatterplot or strip chart) as of 9-26-13

-set group and type
-alpha at 0.5
-binwidth at 0.01
-dotsize at 3.0

how to save a ggplot() from command line in R
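png("C:/Users/kwhittem/Desktop/temp/myplot.png")
....plot code here....
dev.off()
^use forward slashes (or doubled backslashes) in R file paths, because a single backslash starts an escape sequence in an R string. With a relative file name such as png("myplot.png"), the file is placed in the current working directory, which can be found with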
getwd()
The directory can be set with
setwd(dir)

library(ggplot2)
png("myplot.png", height = 800, width = 600)
# wrap the plot in print() so it is drawn even when this runs via source() or inside a function
print(ggplot() +
geom_boxplot(aes(y = entropy,x = Classification),data=table_of_summary_numbers_clean_100913d1421) +
coord_flip() +
geom_dotplot(aes(x = Classification,y = entropy),data=table_of_summary_numbers_clean_100913d1421,alpha = 0.5043,binaxis = 'y',binwidth = 0.01,stackdir = 'center'))
dev.off()

I'll continue copying files over.
copied this
S:\Research\CIM-HealthTell\Experiments\20130807 HTChipV7P-135 Production Run - 100% Density (HT-135)\Mapix GPR from 900 First Pass
to here
S:\Research\Cancer_Eradication\Users\kwhittem\temp

copied this
S:\Research\CIM-HealthTell\Experiments\20130814 HTChipV7P-136 Production Run - 100% Density (HT-136)\Good GenePix GPRs slides 3 to 8 on the 900
to here
S:\Research\Cancer_Eradication\Users\kwhittem\temp2

wafer 135 results
wafer 136 results

command1
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "S:\Research\Cancer_Eradication\Users\kwhittem\temp" "F532 Median"
command2
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "S:\Research\Cancer_Eradication\Users\kwhittem\temp2" "F532 Median"

I would like to automate making graphs of all of the measures. Then I can just copy and paste the commands for specific situations.
tiff("myplot.tiff")
ggplot() +
geom_boxplot(aes(y = entropy,x = Classification),data=table_of_summary_numbers_clean_100913d1421) +
coord_flip() +
geom_dotplot(aes(x = Classification,y = entropy),data=table_of_summary_numbers_clean_100913d1421,alpha = 0.5043,binaxis = 'y',binwidth = 0.01,stackdir = 'center')
dev.off()
tiff("myplot2.tiff")
ggplot() +
geom_boxplot(aes(y = entropy,x = Classification),data=table_of_summary_numbers_clean_100913d1421) +
coord_flip() +
geom_dotplot(aes(x = Classification,y = entropy),data=table_of_summary_numbers_clean_100913d1421,alpha = 0.5043,binaxis = 'y',binwidth = 0.01,stackdir = 'center')
dev.off()

I also need to set the binwidth from the range of the data (here 10% of the range), like this
binwidth <- (max(table_of_summary_numbers$entropy) - min(table_of_summary_numbers$entropy)) * 0.1
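One way to automate the copy-and-paste plotting is to generate the R commands for every measure from a single template. This is a minimal Python sketch, not the contents of the commands file below. It assumes the R data frame is called table_of_summary_numbers, that its column names match the short names exactly (read.table with the default check.names = TRUE would rename a column like 5th_perc to X5th_perc), and that binwidth is 10% of each measure's range. The template and function names are hypothetical.

```python
# Short names for the summary measures, as listed in this notebook.
MEASURES = ("entropy norm_entropy max min cv stdev mean median 5th_perc 95th_perc "
            "max_norm min_norm stdev_norm mean_norm 5th_perc_norm 95th_perc_norm "
            "kurtosis skew dynamic_range").split()

# Backticks let R accept names such as 5th_perc that start with a digit.
TEMPLATE = (
    'tiff("{m}.tiff")\n'
    'binwidth <- diff(range({df}$`{m}`)) * 0.1\n'
    'print(ggplot() +\n'
    '  geom_boxplot(aes(y = `{m}`, x = Classification), data = {df}) +\n'
    '  coord_flip() +\n'
    '  geom_dotplot(aes(x = Classification, y = `{m}`), data = {df}, alpha = 0.5,\n'
    "               binaxis = 'y', binwidth = binwidth, stackdir = 'center'))\n"
    'dev.off()\n'
)


def r_plot_commands(measures=MEASURES, df="table_of_summary_numbers"):
    """Return R code that writes one boxplot + dot plot TIFF per measure."""
    return "".join(TEMPLATE.format(m=m, df=df) for m in measures)


if __name__ == "__main__":
    print(r_plot_commands())
```

The printed text can be pasted into an R session after library(ggplot2) and loading the table.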

All the commands can be found here
"F:\kurt\storage\CIM Research Folder\DR\2013\10-10-13\commands for automatically plotting summary numbers in r 10-10-13.txt"

"F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\wafer 128 results\table_of_summary_numbers_with_classification.txt"

Copied this
S:\Research\CIM-HealthTell\Experiments\20130814 HTChipV7P-137 Production Run - 100% Density (HT-137)\Good GPRs GENEPIX on 900 from slides 3 to 8
to here
S:\Research\Cancer_Eradication\Users\kwhittem\temp2

Copied this
S:\Research\CIM-HealthTell\Experiments\20130717 HTChipV7P-108 Production Run - 100% Density (HT-108)\Good GPR from Slides 3 to 8 on 900 Correct Labels
to here
S:\Research\Cancer_Eradication\Users\kwhittem\temp3

I moved the large wafer 135 data to my home desktop computer, and I'll try to run it with more RAM, since these Mapix-aligned files are larger.

Command
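java -Xmx6144m -jar "C:\Users\Owner\Desktop\temp\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\Owner\BTSync\temp" "F532 Median"
^There was some type of error when I first ran this with -Xmx6144 at the end of the line. JVM options such as -Xmx must come before -jar (anything after the jar name is passed to the program as an argument) and need a unit suffix (m or g).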

I'll try a new command on my work desktop computer in the hope that I don't run out of RAM this time.
java -Xmx2000m -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "S:\Research\Cancer_Eradication\Users\kwhittem\temp" "F532 Median"
^started around 101113d1515 (the original run had -Xmx2000 at the end of the line, where it is passed to the program rather than to the JVM)

I'd like to analyze the 2013 DTRA data on the 10K arrays as well.

found in file locations that look like this

S:\Administration\PeptideArrayCore\2013 sample run\DTRA-go-no go\Set-L-good-gpr-files

java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "S:\Research\Cancer_Eradication\Users\kwhittem\temp2" "F539 Median"
and
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "S:\Research\Cancer_Eradication\Users\kwhittem\temp3" "F549 Median"
and
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "S:\Research\Cancer_Eradication\Users\kwhittem\temp2" "F649 Median"
^I won't do the F649 one since that is for IgM

I'll test the Java program on my laptop
java -jar "C:\Users\kurtw_000\Downloads\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\BTSync\temp" "F532 Median"

Now I'd like to analyze wafer 46 again.

java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\9-26-13\46\all_gprs_in_one_folder" "F532 Median"

I'll store the results next to the earlier entropy summary:
F:\kurt\storage\CIM Research Folder\DR\2013\9-10-13\entropy\summary of entropy values 9-10-13.xlsx
in this folder:
F:\kurt\storage\CIM Research Folder\DR\2013\9-10-13\entropy\new summary 10-12-13

Now I'll take a look at Josh's disease dataset.
/home/josh/CIM/Research/labdata/jaricher/newDecipher/Data for Database/Array Results/First Chip Disease Dataset/llnl.csv

I will add a row to this data with a unique number identifier.
-new file here

"F:\kurt\storage\CIM Research Folder\DR\2013\10-13-13\disease data from josh\llnl.csv"

-id from row 1 column 1 to 128. Sample data starts from row 2 column 1 to column 128
-this is normalized data

command
java -jar "C:\Users\kurtw_000\Downloads\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_normalized_data "C:\Users\kurtw_000\BTSync\temp" "llnl_tab.txt" 1 1 128 1 128 2

q:
what is the max and min value of normalized shannon entropy?
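A short derivation, assuming the normalized entropy is the Shannon entropy divided by its largest possible value, log N, where N is the number of bins or categories (whether the jar normalizes this way has not been checked). The base of the logarithm cancels in the ratio, and 0 log 0 is taken as 0.

```latex
H = -\sum_{i=1}^{N} p_i \log_2 p_i, \qquad
H_{\mathrm{norm}} = \frac{H}{\log_2 N}

% minimum: every value falls in one bin (p_k = 1, all other p_i = 0)
H = -1 \cdot \log_2 1 = 0 \;\Rightarrow\; H_{\mathrm{norm}} = 0

% maximum: values spread evenly over all N bins (p_i = 1/N)
H = -\sum_{i=1}^{N} \frac{1}{N} \log_2 \frac{1}{N} = \log_2 N \;\Rightarrow\; H_{\mathrm{norm}} = 1
```

So under that assumption normalized Shannon entropy ranges from 0 to 1, while the unnormalized entropy ranges from 0 to log2 N bits.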

10-13-13
I need to make a letter of recommendation draft for Yung Chang
saved here
"F:\kurt\storage\Documents\Career\Letters of Recommendation\Letter of Recommendation Yung Chang 10-13-13.doc"

10-14-13
worked more with disease dataset from Josh

aov.ex1= aov(entropy~name2,data=table_of_summary_numbers)

performed analysis of variance in R software

fit <- aov(entropy_normalized_data ~ Classification, data=table_of_summary_numbers)
str(summary(fit))

code to obtain the analysis of variance for all of the summary measures

fit <- aov(entropy ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(norm_entropy ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(min ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(cv ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(stdev ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(mean ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(median ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(fifth_percentile ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(ninety_fifth_percentile ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(entropy_normalized_data ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(normalized_entropy_normalized_data ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(max_normalized ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(min_normalized ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(stdev_normalized ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(mean_normalized ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(fifth_percentile_normalized ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(ninety_fifth_percentile_normalized ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(kurtosis ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(skew ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
fit <- aov(dynamic_range ~ Classification, data=table_of_summary_numbers)
str(summary(fit))
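The repeated aov() calls above run the same one-way ANOVA on each measure. This is a minimal Python sketch of that computation, written as a loop, for checking a few F values by hand. It assumes each sample is a dict with a "Classification" key plus one key per measure (a hypothetical layout). It returns only F and the degrees of freedom. A p-value needs the F distribution, for example scipy.stats.f.sf(F, df_between, df_within) in Python, or summary(fit)[[1]][["Pr(>F)"]][1] in R.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # between-group sum of squares: group size times squared offset of each group mean
    ss_between = sum(len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means))
    # within-group sum of squares: squared offset of each value from its group mean
    ss_within = sum((v - gm) ** 2 for g, gm in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def anova_all_measures(rows, measures, group_key="Classification"):
    """Run one_way_anova for each measure, grouping samples by group_key."""
    results = {}
    for m in measures:
        by_group = {}
        for row in rows:
            by_group.setdefault(row[group_key], []).append(row[m])
        results[m] = one_way_anova(list(by_group.values()))
    return results
```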

Now I'll look at Muskan's human data
"F:\kurt\storage\CIM Research Folder\kwhittem\Records in CIM Folder\Categorical Records\Biodesign\Entropy of Immunosignature\human naive 2-27-12\human naive raw.txt"
The command will look something like this

java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "F:\kurt\storage\CIM Research Folder\kwhittem\Records in CIM Folder\Categorical Records\Biodesign\Entropy of Immunosignature\human naive 2-27-12" "human naive raw 2.txt" 2 1 323 1 323 8

It looks like the first time I looked at this data, I randomly picked 5 kids and 5 adults.
5 kids
c1_kid
c2_kid
c3_kid
c4_kid
c10_kid
5 adults
nc01t0
nc01t3
nc16t0
nc16t6
nc45t0

/home/josh/CIM/Research/labdata/jaricher/GFOD/results_processed.csv

command
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "F:\kurt\storage\CIM Research Folder\DR\2013\10-14-13" "results_processed 10-14-13.txt" 0 1 14 1 14 2

"C:\Users\kwhittem\Desktop\temp\HT7-135 S3 B1 DUAL 09202013 S24 Sierra_b1.gpr"

cut -d : -f 5 /etc/passwd

10-15-13
Now I'll look at Krupa's valley fever data
S:\Administration\Biostatistics\Valley Fever\paper working directory\Random peptides paper\10K_v2\Original GPR's

command

java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\10-15-13\valley fever 10kv2" "F647 Median"

10-16-13
10Kv2 disease data 10-12 to 3-13

java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\10-16-13\10kv2 disease data 10-16-13" "F647 Median"

Questions:
What is PS202?

another command for more normals

java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\10-16-13\10kv2 disease data 10-16-13\extra normals" "F647 Median"

another command for dengue
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\10-16-13\10kv2 disease data 10-16-13\dengue" "F647 Median"

another command for Rebecca's monoclonal data
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\10-16-13\Rebecca Monoclonal Data\data" "F647 Median"

t-V5
t-P53Ab8
t-P53Ab1 <100 pM
t-none
t-LeuEnk 1.95 nM
t-cMyc
b-V5
b-p53Ab8
b-P53Ab1
b-none
b-LeuEnk
b-cMyc 80 nM

Why do some of the monoclonal antibody samples say t- and some say b-?

Valley fever longitudinal data.
In the Blinded set sheet, the study # next to each entry in the tube# column indicates which person that tube corresponds to. There are often multiple data points per person, along with titer information.

0_13-4-CNS00048.gpr
In file names like this, the first number is the CF titer, followed by the case number from the Training set sheet. The case number can be matched to an individual in the PTID column, along with a date difference.
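A minimal Python sketch for splitting these training file names so the titer and case number can be joined to the spreadsheet. It assumes the titer is everything before the first underscore and the case number is the rest of the name without the .gpr extension. That is inferred from the single example above, not from a documented convention.

```python
import os


def parse_training_filename(filename):
    """Split e.g. '0_13-4-CNS00048.gpr' into (cf_titer, case_number) strings."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    # assumption: the titer never contains an underscore
    titer, case_number = stem.split("_", 1)
    return titer, case_number
```

The titer is kept as a string in case some titers are written as dilutions rather than plain numbers.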

command
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\10-16-13\valley fever longitudinal data\blinded samples" "F649 Median"

and
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\10-16-13\valley fever longitudinal data\training samples" "F649 Median"

10-17-13

I tried using a support vector machine (SVM) in Weka to classify disease vs. normal samples from the summary numbers. The classification was actually quite good (about 80% correctly classified).

SVM Classification with entropy only: 77.2%
SVM Classification with all summary numbers: 79.7%
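To feed summary numbers into Weka (whose SVM classifier is SMO, weka.classifiers.functions.SMO), the table has to be in ARFF format. This is a minimal Python sketch that writes one, assuming each sample is a dict with a "Classification" label plus one numeric entry per measure (hypothetical layout), and that class labels contain no spaces or commas. Attribute names are quoted because ARFF names must start with a letter unless quoted (e.g. 5th_perc).

```python
def to_arff(rows, measures, class_key="Classification", relation="summary_numbers"):
    """Return ARFF text with one NUMERIC attribute per measure and a nominal class."""
    classes = sorted({row[class_key] for row in rows})
    lines = [f"@RELATION {relation}", ""]
    for m in measures:
        lines.append(f"@ATTRIBUTE '{m}' NUMERIC")
    # nominal attribute listing every class label, e.g. {disease,normal}
    lines.append(f"@ATTRIBUTE class {{{','.join(classes)}}}")
    lines += ["", "@DATA"]
    for row in rows:
        values = [str(row[m]) for m in measures] + [row[class_key]]
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"
```

Comparing "entropy only" with "all summary numbers" then just means calling to_arff with a different measures list.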

I should look at the SVM results from Weka more often.

Classification of summary number data with weka 10-17-13

Now I can finish taking a look at Krupa's data.

Now I'll take a look at the Alzheimer's data from Lucas.

command for analyzing the Alzheimer's data
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\10-17-13\alzheimers data" "F647 Median"

Now I'll look at some infectious diseases (some on both arrays).
10K, sets 4-5 (2013):
S:\Administration\PeptideArrayCore\2013 sample run\LLNL_set-5
10K, sets 1-3 (2012):
S:\Administration\PeptideArrayCore\2012 sample run\LLNL Samples

330k LLNL samples
S:\Research\CIM-HealthTell\Experiments\06252012 HTChipV4-22 Production Run 3 (HT-22)

some more infectious diseases
S:\Research\CIM-HealthTell\Experiments\06182012 HTchipV4-20 (HT-20)
S:\Research\CIM-HealthTell\Experiments\06262012 HTChipV4-25 Production run 4 (HT-25)

I'll start getting the data for these samples.

java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\10-17-13\id some on both arrays\wafer 22" "F532 Median"

wafer 4 has already been copied

10-18-13

java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "S:\Research\Cancer_Eradication\Users\kwhittem\temp2\wafer 20" "F532 Median"
start time: 0957
7 s per file × 316 files = 2212 s = 36 min 52 s
estimated end time around 1040
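The same estimate as a tiny Python helper, handy when the file count changes. It assumes a constant time per file (7 s here, as measured above) and an HHMM start time. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def estimate_end(start_hhmm, seconds_per_file, n_files):
    """Return (total run time, end time as HH:MM) for a constant per-file time."""
    total = timedelta(seconds=seconds_per_file * n_files)
    start = datetime.strptime(start_hhmm, "%H%M")
    # strftime drops the seconds, so 10:33:52 is shown as 10:33
    return total, (start + total).strftime("%H:%M")
```

For the run above, estimate_end("0957", 7, 316) gives 36 min 52 s and an end time of 10:33, so "around 1040" leaves a few minutes of slack.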

Started looking at wafer 25.
Some of the GPRs were scanned at 635 nm and some at 532 nm.
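A minimal Python sketch for sorting a mixed folder by wavelength before choosing "F532 Median" or "F635 Median". It assumes the wavelength appears in the file name as _0532 or _0635, as in the file names recorded in this notebook. Files without that token go into an "unknown" group to be checked by hand.

```python
import re
from collections import defaultdict

# assumption: wavelength encoded in the file name as _0532 or _0635
WAVELENGTH = re.compile(r"_0(532|635)\b")


def group_by_wavelength(filenames):
    """Return {'532': [...], '635': [...], 'unknown': [...]} (only non-empty groups)."""
    groups = defaultdict(list)
    for name in filenames:
        match = WAVELENGTH.search(name)
        groups[match.group(1) if match else "unknown"].append(name)
    return dict(groups)
```

Usage: group_by_wavelength(os.listdir(folder)), then copy each group to its own folder and run the jar once per channel.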

Unknown sample
4-22 S7 G3 Hi P60 092812 M4_0532 5um K.gpr
4-22 S7 G1 Hi P60 092812 I5_0532 5um K.gpr

SVM correctly classified 56.1% of the samples.

java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "S:\Research\Cancer_Eradication\Users\kwhittem\temp2\wafer 25" "F532 Median"

wafer 20
SVM correctly classified 81.46% of the samples

wafer 22
SVM correctly classified 97.1% of the samples

java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "S:\Research\Cancer_Eradication\Users\kwhittem\temp2\10k llnl" "F647 Median"

some sample location information can be found in these spreadsheets
"S:\Administration\PeptideArrayCore\2012 sample run\LLNL Samples\Sample Run LLNL Samples-set1.xlsx"
"S:\Administration\PeptideArrayCore\2012 sample run\LLNL Samples\Sample Run LLNL Samples-set2.xlsx"
"S:\Administration\PeptideArrayCore\2012 sample run\LLNL Samples\Sample Run LLNL Samples-set3.xlsx"
"S:\Administration\PeptideArrayCore\2013 sample run\LLNL-Set-4\LLNL-Sample Run-Set-4.xlsx"
"S:\Administration\PeptideArrayCore\2013 sample run\LLNL_set-5\LLNL-Sample Run-Set-5.xlsx"
"S:\Administration\PeptideArrayCore\2013 sample run\LLNL-Set-6_06202013\LLNL Set-6 Sample Run.xlsx"

I don't know what these samples are
10010979_bot_r1_07112012.gpr
10010981_top_rm5_07112012.gpr
10010976_top_m4_07112012.gpr

SVM Correctly classified 91.9% of the samples.

I'll try to run the remaining wafer 25 samples on my desktop computer.

java -Xmx6144m -jar "C:\Users\Owner\Desktop\temp\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\Owner\Desktop\temp\wafer 25" "F635 Median"

10-19-13

I would like to parse the large Mapix-aligned files.
I need to remove the header (rows 1 to 31)
and extract column 27 (F532 Median).

UNIX commands will be like this
awk 'NR >= 32' "test.gpr">temp.txt ; cut -f27 temp.txt>parsed/"test.gpr"

Okay, now that everything is working, I just need to put together the commands to parse all of the files.

Some of the files have headers that end at line 27 and some at 32. Maybe I should just integrate this code with my Java code so that I can first find where F532 Median is found.

I added Cygwin to my PATH environment variable. Now I should make sure everything works from the Windows command prompt.
"C:\cygwin\bin\gawk.exe" 'NR >= 32' "C:/cygwin/home/kwhittem/test.gpr">"C:/cygwin/home/kwhittem/temp.txt" & cut -f27 "C:/cygwin/home/kwhittem/temp.txt">"C:/cygwin/home/kwhittem/parsed/test.gpr"

"C:\cygwin\bin\gawk.exe" 'NR >= 32' "test.gpr">"temp.txt" & cut -f27 "C:/cygwin/home/kwhittem/temp.txt">"C:/cygwin/home/kwhittem/parsed/test.gpr"

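^I was not able to get the awk commands to work (or at least to work quickly) from the Windows command line. One likely problem is that cmd.exe does not treat single quotes as quoting characters, so the > in 'NR >= 32' is interpreted as an output redirection; the awk program would need double quotes there. I think I will just prepare the data on Linux and then use my Java program after that (or prepare the data manually if there is not too much of it). Definitely not the ideal situation, but I couldn't find a nice solution.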
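A cross-platform alternative to the awk/cut step is to find the column by its header name, so it does not matter whether the header ends at line 27 or line 32. This is a minimal Python sketch, not the Java code. It assumes the GPR file is tab-delimited with double-quoted column names (as GenePix/ATF files usually are), and that the first row containing the requested name is the column-header row. It reads the file line by line, so even the large Mapix-aligned files should not need much memory.

```python
import csv


def extract_column(lines, column_name="F532 Median"):
    """Return the values under column_name from GPR/ATF text lines.

    The header row is found by searching for column_name, so the number of
    header lines does not have to be known in advance.
    """
    rows = csv.reader(lines, delimiter="\t")  # csv strips the double quotes
    for row in rows:
        cleaned = [cell.strip() for cell in row]
        if column_name in cleaned:
            col = cleaned.index(column_name)
            # the remaining rows from the same reader are the data rows
            return [r[col] for r in rows if len(r) > col]
    raise ValueError(f"column {column_name!r} not found")


def parse_gpr_file(in_path, out_path, column_name="F532 Median"):
    """Write one value per line, like the awk/cut output in parsed/."""
    with open(in_path, newline="") as f:
        values = extract_column(f, column_name)
    with open(out_path, "w") as out:
        out.write("\n".join(values) + "\n")
```

Usage: parse_gpr_file("test.gpr", "parsed/test.gpr") works the same from the Windows prompt and from Linux.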

SVMs for idiots
http://www.cs.ucf.edu/courses/cap6412/fall2009/papers/Berwick2003.pdf

S:\Research\CIM-HealthTell\Experiments\06262012 HTChipV4-25 Production run 4 (HT-25)

In wafer 25 there are some samples whose identity I cannot find. While the computer runs some searches, I'll start analyzing some other datasets as well.

time course data
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "S:\Administration\PeptideArrayCore\2013 sample run\Normals-2013\Normals 1 month and 6 year 2 people" "F647 Median"

For the next dataset I would like to take a look at Bart's lymphoma samples.

java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "S:\Research\Cancer_Eradication\Users\kwhittem\temp2\bart_dog_lymphoma_1" "F649 Median"

Now I can take a look at Bart's other dog lymphoma dataset (the second one I have).

names at row 3 column 1 to column 25 and data at row 5 column 1 to column 25

java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "F:\kurt\storage\CIM Research Folder\DR\2013\10-20-13\bart_dog_lymphoma_2" "CIM10K Dogs.txt" 3 1 25 1 25 5

I finished extracting all of the auto-aligned wafer 135 data. Now I can run the program!

java -jar "C:\Users\kurtw_000\BTSync\temp\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\BTSync\temp\temp2" "F532 Median"
^The program was done in less than 1 min once the columns had been extracted from those huge auto-aligned files.

Monoclonal ab mix experiment
They tested 8 different antibodies on the 330k
S:\Research\labdata\jaricher\newDecipher\Data for Database\Array Results\Monoclonals
They also mixed the 8 different antibodies together. Found here:
"S:\Research\labdata\jaricher\newDecipher\Data for Database\Array Results\VF and mAb mix\all.csv"
The epitopes for these 8 different antibodies are found here:
"S:\Research\labdata\jaricher\newDecipher\Data for Database\Array Results\VF and mAb mix\epitopes.xls"

Process for making a heatmap in JMP

open some data
use graph->cell plot, or follow the directions below
analyze->multivariate methods->cluster
choose all of the columns you want to be clustered and click ok
click the arrow next to hierarchical clustering and choose two way clustering

I would like to analyze the monoclonal antibody data now.
I'll run a command for this spreadsheet.
"S:\Research\labdata\jaricher\newDecipher\Data for Database\Array Results\VF and mAb mix\all.csv"
I copied the spreadsheet to here
"S:\Research\Cancer_Eradication\Users\kwhittem\temp2\monoclonal\all.csv"
sample names go from column 1 to 285 and row 0. data goes from column 1 to 285 and row 1

command
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "S:\Research\Cancer_Eradication\Users\kwhittem\temp2\monoclonal" "all.txt" 0 1 284 1 284 1

While copying all my data from the F drive, there was an error for 2 files:

runtime_info_find_summary_numbers_from_tabd... (txt, 40 bytes, 10-14-13 2:42pm)
-found in human naive 2-27-12 folder

table of data non-normalized test (10-21-13 4:10pm)
-found in

10-22-13

10K_v1 valley fever, median normalized dataset
"F:\kurt\storage\CIM Research Folder\DR\2013\10-22-13\valley fever\10K_v1_Random_MedNorm(10,440).xlsx"

Info from Bart
T and B refer to the top and bottom arrays of the slide. N and named dogs are normal; L and LSA are lymphoma.

info for timecourse
Individual 84 caught a cold roughly 20 days into the time course.

10-23-13
Fulbright Scholarship

Would this be the right type of Fulbright scholarship for me?

The Fulbright U.S. Scholar Program sends American faculty members, scholars and professionals abroad to lecture and/or conduct research for up to a year.

or maybe this one
Fulbright-Hays Program

or maybe the Core Fulbright Scholar program

http://www.cies.org/us_scholars/us_awards/
eligibility info

http://www.cies.org/us_scholars/us_awards/Eligibility.htm

10-23-13

I want to extract the columns from the table containing 330k data including some monoclonal ab mix data.

I want to extract columns 1 to 284.

commands will be something like this
cut -f2 all.txt>out2.txt
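Instead of one cut command per column, all the wanted columns can be written in a single pass over the table. This is a minimal Python sketch that mimics cut -fN (1-based, tab-delimited). The out{N}.txt naming is a hypothetical choice, and a line with too few fields yields an empty value rather than cut's exact edge-case behavior.

```python
import contextlib
import os


def select_fields(line, columns):
    """Return the 1-based tab-delimited fields in columns, like `cut -fN`."""
    fields = line.rstrip("\r\n").split("\t")
    return [fields[n - 1] if n <= len(fields) else "" for n in columns]


def split_columns(in_path, out_dir, columns):
    """Write column n of in_path to out_dir/out{n}.txt, reading the file once."""
    os.makedirs(out_dir, exist_ok=True)
    with contextlib.ExitStack() as stack:
        # one open output file per requested column; all closed on exit
        handles = [stack.enter_context(open(os.path.join(out_dir, f"out{n}.txt"), "w"))
                   for n in columns]
        src = stack.enter_context(open(in_path))
        for line in src:
            for handle, value in zip(handles, select_fields(line, columns)):
                handle.write(value + "\n")
```

Usage: split_columns("all.txt", "columns", range(1, 285)) for columns 1 to 284, or split_columns("test.txt", "p53", [258, 259, 264, 265, 268, 269]) for the p53 columns.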

spreadsheet for commands

"F:\kurt\storage\CIM Research Folder\DR\2013\10-23-13\commands for extracting columns 10-23-13d1118.xlsx"

command for extracting summary numbers from the monoclonal antibody data

java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\10-23-13\Monoclonal ab data" "F532 Median"

p53 located at columns 258, 259, 264, 265, 268, 269
cut -f258 test.txt>out1.txt
cut -f259 test.txt>out2.txt
cut -f264 test.txt>out3.txt
cut -f265 test.txt>out4.txt
cut -f268 test.txt>out5.txt
cut -f269 test.txt>out6.txt

java -jar "C:\temp_sync\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Dropbox\ab mix" "F532 Median"

java -jar "C:\temp_sync\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data directory filename sample_name_row sample_name_column_start sample_name_column_end data_column_start data_column_end data_starting_row
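A minimal Python sketch of how the positional arguments above could map onto a tab-delimited table, useful for double-checking which rows and columns a command will read. It assumes 0-based row and column indices and inclusive column ranges (matching a call like 0 1 284 1 284 1 for a sheet with names in the first row). The jar's real indexing has not been confirmed, and the function name is hypothetical. Blank cells are skipped, and duplicate sample names would be merged.

```python
def read_table(lines, name_row, name_col_start, name_col_end,
               data_col_start, data_col_end, data_start_row):
    """Return {sample_name: [float values]} from tab-delimited text lines."""
    rows = [line.rstrip("\r\n").split("\t") for line in lines]
    names = rows[name_row][name_col_start:name_col_end + 1]
    samples = {name: [] for name in names}
    for row in rows[data_start_row:]:
        cells = row[data_col_start:data_col_end + 1]
        for name, cell in zip(names, cells):
            if cell.strip():  # skip blank cells
                samples[name].append(float(cell))
    return samples
```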

java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_one_gpr "F:\kurt\storage\CIM Research Folder\DR\2013\10-23-13\Monoclonal ab data\ab mix experiment" "HT4-25 S6 D2 GRN-P80 RED P50 062113 p53Ab1_0635 SM.gpr" "F532 Median"

p53Ab1 epitope RHSVV
p53Ab8 epitope DLWKLL

10-25-13
Want to reserve room for defense

November 18th 2-5pm
need to email
[email protected]
Two middle rooms on 2nd floor
B262
A204
large room at the end of A building on 2nd floor
A250

Large middle room on 3rd floor
B362

Large room at the end of the A building on 3rd floor
A350

Conference room in atrium towards building B (north)
AL1-10/14

Conference room in atrium towards south entrance (A building entrance)
AL1-50

Need to contact committee members.

Bert Jacobs, Stephen Johnston, Phillip Stafford, Valerie Stout, Kathryn Sykes
[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Hi committee members,

We can hold my oral defense on November 18th 2-5pm in the auditorium. Of course we will not need the full 3 hr time period.

Note that in addition to presenting about screening the tumor cDNA library, a larger proportion of my defense will be oriented around the entropy "immune temperature" idea than you have seen in the past. I have introduced this concept several times in previous committee meetings, and we now have some interesting new data in this area. I will send out a copy of my dissertation closer to the time of the oral defense. Thanks for all of your help and suggestions over the years!

Best regards,
Kurt Whittemore

Graduate Student
Arizona State University
BIODESIGN INSTITUTE
Center for Innovations in Medicine
1001 S McALLISTER AVE
TEMPE, AZ 85287

10-26-13
I want to give Lu a 330k disease sample and a 330k normal sample so he can determine the minimum number of random peptides necessary to distinguish between the groups.

A good chronic disease sample looks like sample 30, 28, 21, or 35. A good infectious disease sample looks like 39. A good normal sample looks like 81.

The infectious disease samples are actually DTRA samples, so maybe I will just give Lu a chronic disease and a normal.

81: 4-46 S7 E1 Hi P20 03262013 ND43 L.gpr
35: 4-46 S2 E2 Hi P20 BC009 110512 S.gpr

F:\kurt\storage\CIM Research Folder\DR\2013\8-10-13\azim58 wikispaces download as of 8-10-13

example command for Lu
java -jar "F:\some_path_here\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\some_path_here" "F532 Median"

location of time series
S:\Administration\PeptideArrayCore\2013 sample run\Normals-2013\Normals 1 month and 6 year 2 people