work analyzing most significant peptides 10-31-13

2014-08-29

++ work analyzing most significant peptides 10-31-13

Need to analyze the time-course data in more detail.  I'd like to look at the period when they analyzed the data over just one month.  This seems to have occurred around November 2011.

Cluster order for individual 84 for one month

entropy
normalized_shannon_entropy
ninety_fifth_percentile
mean
median
fifth_percentile
stdev
max
fifth_percentile_normalized
min
min_normalized
kurtosis
skew
dynamic_range
ninety_fifth_percentile_normalized
cv
stdev_normalized
mean_normalized
max_normalized

-----
copying info

blinded samples created 10-26-13 5:39 PM too long
training samples 10-26-13 5:40 PM
final point by point response to send to reviewers 10-26-13 7:12 PM

-------

cluster order for individual 43

entropy
normalized_shannon_entropy
mean
median
fifth_percentile
ninety_fifth_percentile
stdev
max
min
min_normalized
fifth_percentile_normalized
kurtosis
skew
cv
stdev_normalized
max_normalized
mean_normalized
ninety_fifth_percentile_normalized
dynamic_range

---------

10-27-13

Datasets I want to include in final analysis
-mouse young vs old experiment
-young vs old data from Muskan
-Tiger's mouse tumor samples (FVBN time series data)
-Bart's dog lymphoma samples
-human normals at different ages (possibly; I think this data is across many different wafers so I may not want to include it)
-First Chip Disease Dataset
-LLNL dataset
-Valley fever data from Krupa (10kv2 data)
-Alzheimer's data from Lucas
-same individual monitored over time
-antibody mix experiment from Josh
-antibody mix experiment from Heidi and Krupa
-Rebecca's monoclonal antibody data
-two antibodies mixed from Daniel

------
<notes on immune status without disease specific peptides>
{

I'd like to see how the immunosignature changes when I take away the disease-specific peptides.

I'll take a look at a chronic disease and an infectious disease on the 330k, and also a chronic and infectious disease on the 10k.

For the chronic disease 330k data I'll look at wafer 46 with breast cancer.
For the infectious disease 330k data I'll look at the First Chip Disease Dataset.

I'll just start with this for now.
}
</notes on immune status without disease specific peptides>

----------
Need to contact Juan Ramon Molina
jrmolina@cnio.es

---------
10-28-13

Investigation of significant peptides to immune temperature

I would like to remove the following numbers of peptides (both random and unique to immunosignature)
(there are a total of 330,173)
330173 - y = 100000, so y = 230173
1
5
50
100
500
1000
5000
10000
100000
230173
320173
325173
329173
329673
330073
330123
330153
330168

X Axis goes from 0 to 6 by 1

Y axis goes from 0 to 3.5 by 0.5

removed	log10(removed)	remaining
1	0	330172
5	0.698970004	330168
50	1.698970004	330123
100	2	330073
500	2.698970004	329673
1000	3	329173
5000	3.698970004	325173
10000	4	320173
100000	5	230173
230173	5.362054378	100000
320173	5.505384705	10000
325173	5.512114478	5000
329173	5.517424206	1000
329673	5.51808338	500
330073	5.51861	100
330123	5.518675783	50
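
The three columns above appear to be the number of peptides removed, its log10 (the x-axis position), and the number remaining out of 330,173. A minimal Python sketch that regenerates them under that assumption (the names TOTAL and removal_table are made up for illustration, not taken from any existing tool):

```python
import math

TOTAL = 330173  # total number of peptides, from the notes above


def removal_table(removed_counts, total=TOTAL):
    """Return (removed, log10(removed), remaining) rows for each count."""
    rows = []
    for removed in removed_counts:
        rows.append((removed, math.log10(removed), total - removed))
    return rows


counts = [1, 5, 50, 100, 500, 1000, 5000, 10000, 100000,
          230173, 320173, 325173, 329173, 329673, 330073, 330123]

# Print tab-delimited rows in the same layout as the table above.
for removed, log_removed, remaining in removal_table(counts):
    print(f"{removed}\t{log_removed:.9g}\t{remaining}")
```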

1
0.32
0.1
0.032
0.01
3.2E-03
1.0E-03
3.2E-04
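
The values above match the p-values at the y-axis tick positions (0 to 3.5 in steps of 0.5) if the y axis is read as -log10(p). That reading is an assumption. A small Python sketch of the conversion, with a hypothetical helper name:

```python
def p_value_ticks(y_min=0.0, y_max=3.5, step=0.5):
    """P-values at evenly spaced -log10(p) tick positions (assumed axis meaning)."""
    n_ticks = int(round((y_max - y_min) / step)) + 1
    return [10 ** -(y_min + i * step) for i in range(n_ticks)]


# Print each tick label rounded to two significant figures.
for p in p_value_ticks():
    print(f"{p:.2g}")
```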

What if I removed the 5,000 highest-intensity peptides, removed the 50 lowest-intensity peptides, kept the peptides ranked 100 to 5,000 by intensity, kept peptides with a p-value < 0.30, and then used the entropy to compare the two groups?
^This would take some time to do so I don't think I will do this for now.

need a scale for 5.511 to 5.52

325173 	5.512114 	5000
329173 	5.517424 	1000
329673 	5.518083 	500
330073 	5.51861 	100
330123 	5.518676 	50
330153 	5.518715 	20

0.001
0.000316228

1.00E-03
3.16E-04

Flu vaccination records
"S:\Administration\PeptideArrayCore\2011 sample run\30-Days-Normals\Tetanus samples.xlsx"

------------
pre 10-31-13 notes on analyzing most important peptides

1
5
50
100
500
1000
5000
10000
100000
230173
320173
325173
329173
329673
330073
330123
330153
330168

command for getting all summary numbers

java -jar "C:\temp_sync\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "C:\temp_sync\sig 330073r" "intensity values of all gprs 2 10-27-13d1318.txt" 0 8 36 8 36 4

new command for high intensities
java -jar "C:\temp_sync\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "C:\temp_sync\sig 330073r" "intensity values of all gprs 2 10-27-13d1318.txt" 0 2 30 2 30 4

=TTEST(B2:B6,B7:B30,2,3)
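
In Excel, TTEST with tails = 2 and type = 3 is a two-tailed, two-sample test assuming unequal variances (Welch's t-test), here comparing a group of 5 values against a group of 24. A rough standard-library Python sketch of the same calculation follows. The function names are hypothetical, and the p-value comes from a simple Simpson integration of the t density rather than Excel's own routine, so tiny differences in the last digits are possible.

```python
import math
from statistics import mean, variance


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value of Student's t by Simpson integration of the pdf."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


def welch_ttest(a, b):
    """Return (t, degrees of freedom, two-tailed p) for unequal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, t_two_tailed_p(t, df)


# Made-up example values, not data from these notes.
print(welch_ttest([1, 2, 3], [4, 5, 6]))
```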

What happens if we select the top 1000 peptides by p-value and then calculate the entropy?

for least significant peptides

java -jar "C:\temp_sync\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "C:\temp_sync\least significant removed" "intensity values of all gprs 2 10-27-13d1318.txt" 0 9 37 9 37 4

r r5 to r104
K 100-5000

KLI = Keep low intensity
LI = Low intensity
HI = High intensity
KP = Keep P-value

max-101,834 peptides have a p-value less than 0.3

remove if LI, HI, not KLI or not KP

excel function used
=IF(OR(AZ330178="HI",BA330178="LI"),"REMOVE",IF(OR(AY330178="KLI",BB330178="KP"),"KEEP","REMOVE"))

excel function 2
=IF(OR(AY330178="KLI",BB330178="KP"),"KEEP","REMOVE")
^Based on sorting from n1 this ends up being the same as keeping all peptides with p-value less than 0.3.
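
The first Excel formula can be read as: remove a peptide if it is flagged HI or LI, otherwise keep it if it is flagged KLI or KP, otherwise remove it. A small Python sketch of that logic, assuming the four flags live in columns AY (KLI), AZ (HI), BA (LI) and BB (KP) as in the formula; the function name is hypothetical:

```python
def classify_peptide(kli, hi, li, kp):
    """Mirror of the first Excel formula; each argument is the cell text."""
    # High- or low-intensity flags always win and force removal.
    if hi == "HI" or li == "LI":
        return "REMOVE"
    # Otherwise keep the peptide if either keep flag is set.
    if kli == "KLI" or kp == "KP":
        return "KEEP"
    return "REMOVE"


print(classify_peptide("KLI", "", "", ""))      # keep flag set, no removal flag
print(classify_peptide("KLI", "HI", "", "KP"))  # HI flag overrides keep flags
```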

103469 peptides removed

java -jar "C:\temp_sync\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "C:\temp_sync\custom selected peptides" "intensity values of all gprs 2 10-27-13d1318.txt" 0 11 39 11 39 4

java -jar "C:\temp_sync\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "C:\temp_sync\bot 1000 without bot 100" "intensity values of all gprs 2 10-27-13d1318.txt" 0 2 30 2 30 4

I was able to get the best p-value by selecting the bottom 1000 values for each sample by intensity and removing the bottom 100.
I obtained a p-value of 0.000409 (-log10 of that equals 3.39).
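
A minimal Python sketch of the per-sample selection described above (bottom 1000 by intensity, then drop the bottom 100). The entropy function is only an assumed definition (Shannon entropy of intensities normalized to sum to 1); the definition actually used by the jar is not given in these notes, and both function names are hypothetical.

```python
import math


def keep_rank_window(intensities, drop_lowest=100, keep_lowest=1000):
    """Keep the keep_lowest lowest values, minus the drop_lowest lowest."""
    return sorted(intensities)[drop_lowest:keep_lowest]


def shannon_entropy(values):
    """Shannon entropy in bits of values normalized to sum to 1 (assumed)."""
    total = sum(values)
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probs)


# Check of the -log10 conversion quoted above.
print(round(-math.log10(0.000409), 2))
```

For one sample, `shannon_entropy(keep_rank_window(sample_intensities))` would give a single entropy value, and the two groups of samples could then be compared on those values.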

What if I just chop off the top 5,000, and the bottom 100, and then remove least significant peptides?

java -jar "C:\temp_sync\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "C:\temp_sync\chop off top 5000 and bottom 100" "intensity values of all gprs 2 10-27-13d1318.txt" 0 11 39 11 39 4

p-value for entropy was

java -jar "C:\temp_sync\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "C:\temp_sync\chop off top 5000 bottom 100 and pvalue greater than 0p7" "intensity values of all gprs 2 10-27-13d1318.txt" 0 11 39 11 39 4
