work analyzing most significant peptides 10-31-13

2014-08-29

++ work analyzing most significant peptides 10-31-13



Need to analyze the timecourse data in more detail. I'd like to look at the period when they analyzed the data over just one month; this seems to have been around November 2011.

Cluster order for individual 84 for one month

entropy
normalized_shannon_entropy
ninety_fifth_percentile
mean
median
fifth_percentile
stdev
max
fifth_percentile_normalized
min
min_normalized
kurtosis
skew
dynamic_range
ninety_fifth_percentile_normalized
cv
stdev_normalized
mean_normalized
max_normalized
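The cluster orderings above are over per-array summary statistics. As a minimal Python sketch, here is how such statistics could be computed from one array's intensity vector; the jar's exact definitions aren't recorded in these notes, so the entropy normalization and nearest-rank percentiles below are assumptions, and skew/kurtosis are omitted:

```python
import math

def summary_numbers(intensities):
    """Per-array summary statistics. Assumes "entropy" means the Shannon
    entropy of the intensities normalized to a probability distribution;
    the jar's actual formulas may differ."""
    xs = sorted(intensities)
    n = len(xs)
    total = sum(xs)
    probs = [x / total for x in xs if x > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    mean = total / n
    stdev = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

    def percentile(q):  # nearest-rank percentile (one common convention)
        return xs[min(n - 1, int(q * n))]

    return {
        "entropy": entropy,
        "normalized_shannon_entropy": entropy / math.log2(n),
        "mean": mean,
        "median": percentile(0.5),
        "fifth_percentile": percentile(0.05),
        "ninety_fifth_percentile": percentile(0.95),
        "stdev": stdev,
        "max": xs[-1],
        "min": xs[0],
        "cv": stdev / mean,
        "dynamic_range": xs[-1] / xs[0] if xs[0] > 0 else float("inf"),
    }
```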


copying info

blinded samples created 10-26-13 5:39 PM too long
training samples 10-26-13 5:40 PM
final point by point response to send to reviewers 10-26-13 7:12 PM



cluster order for individual 43

entropy
normalized_shannon_entropy
mean
median
fifth_percentile
ninety_fifth_percentile
stdev
max
min
min_normalized
fifth_percentile_normalized
kurtosis
skew
cv
stdev_normalized
max_normalized
mean_normalized
ninety_fifth_percentile_normalized
dynamic_range



10-27-13

Datasets I want to include in final analysis


<notes on immune status without disease-specific peptides>


I'd like to see how the immunosignature changes when I take away the disease specific peptides.

I'll take a look at a chronic disease and an infectious disease on the 330k, and also a chronic and infectious disease on the 10k.

For the chronic disease 330k data I'll look at wafer 46 with breast cancer.
For the infectious disease 330k data I'll look at the First Chip Disease Dataset.

I'll just start with this for now.



Need to contact Juan Ramon Molina
[email protected]


10-28-13


Investigation of the relationship of significant peptides to immune temperature

I would like to remove the following numbers of peptides (both random and unique to immunosignature)
(there are a total of 330,173)
330173 - y = 100000 (i.e., removing y = 230173 leaves 100,000)
1
5
50
100
500
1000
5000
10000
100000
230173
320173
325173
329173
329673
330073
330123
330153
330168


X Axis goes from 0 to 6 by 1

Y axis goes from 0 to 3.5 by 0.5

columns: n, log10(n), 330173 - n

1 0 330172
5 0.698970004 330168
50 1.698970004 330123
100 2 330073
500 2.698970004 329673
1000 3 329173
5000 3.698970004 325173
10000 4 320173
100000 5 230173
230173 5.362054378 100000
320173 5.505384705 10000
325173 5.512114478 5000
329173 5.517424206 1000
329673 5.51808338 500
330073 5.51861 100
330123 5.518675783 50
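The table above pairs each count n with log10(n) and its complement out of the 330,173 total peptides; a quick sketch to regenerate it:

```python
import math

TOTAL = 330173  # total peptides on the 330k array

counts = [1, 5, 50, 100, 500, 1000, 5000, 10000, 100000,
          230173, 320173, 325173, 329173, 329673, 330073, 330123]

for n in counts:
    # columns: n, log10(n), peptides remaining after removing n
    print(n, math.log10(n), TOTAL - n)
```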


1
0.32
0.1
0.032
0.01
3.2E-03
1.0E-03
3.2E-04

What if I removed the 5,000 highest-intensity peptides, removed the 50 lowest-intensity peptides, kept the peptides ranked 100 to 5,000 by intensity, kept peptides with a p-value < 0.30, and then used the entropy to compare the two groups?
^This would take some time, so I don't think I'll do it for now.


need a scale for 5.511 to 5.52

columns: n, log10(n), 330173 - n

325173 5.512114 5000
329173 5.517424 1000
329673 5.518083 500
330073 5.51861 100
330123 5.518676 50
330153 5.518715 20


0.001
0.000316228

1.00E-03
3.16E-04


Flu vaccination records
"S:\Administration\PeptideArrayCore\2011 sample run\30-Days-Normals\Tetanus samples.xlsx"


pre 10-31-13 notes on analyzing most important peptides

1
5
50
100
500
1000
5000
10000
100000
230173
320173
325173
329173
329673
330073
330123
330153
330168

command for getting all summary numbers

java -jar "C:\temp_sync\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "C:\temp_sync\sig 330073r" "intensity values of all gprs 2 10-27-13d1318.txt" 0 8 36 8 36 4

new command for high intensities
java -jar "C:\temp_sync\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "C:\temp_sync\sig 330073r" "intensity values of all gprs 2 10-27-13d1318.txt" 0 2 30 2 30 4


=TTEST(B2:B6,B7:B30,2,3)
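=TTEST(B2:B6,B7:B30,2,3) is Excel's two-tailed (tails = 2), two-sample unequal-variance (type = 3) t-test, i.e. Welch's test. The t statistic can be sketched as below; the two-tailed p-value additionally needs a t-distribution CDF, which scipy.stats.ttest_ind(a, b, equal_var=False) provides along with the statistic:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic, the test behind Excel TTEST(..., 2, 3).
    Uses sample (n-1) variances, matching Excel's VAR."""
    va, vb = variance(a), variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))  # standard error of the difference
    return (mean(a) - mean(b)) / se
```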


What happens if we select the top 1,000 peptides by p-value and then calculate the entropy on just those?


for least significant peptides

java -jar "C:\temp_sync\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "C:\temp_sync\least significant removed\" "intensity values of all gprs 2 10-27-13d1318.txt" 0 9 37 9 37 4

r r5 to r104
K 100-5000

KLI = Keep low intensity
LI = Low intensity
HI = High intensity
KP = Keep P-value

max: 101,834 peptides have a p-value less than 0.3

remove if LI, HI, not KLI or not KP

excel function used
=IF(OR(AZ330178="HI",BA330178="LI"),"REMOVE",IF(OR(AY330178="KLI",BB330178="KP"),"KEEP","REMOVE"))

excel function 2
IF(OR(AY330178="KLI",BB330178="KP"),"KEEP","REMOVE")
^Based on sorting from n1 this ends up being the same as keeping all peptides with p-value less than 0.3.
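The two Excel formulas above encode: remove anything flagged HI or LI, then keep the survivors only if flagged KLI or KP. A sketch of the same rule (flag names are the notebook's abbreviations, passed here as booleans):

```python
def keep_or_remove(kli, hi, li, kp):
    """Mirror of the Excel rule =IF(OR(HI,LI),"REMOVE",
    IF(OR(KLI,KP),"KEEP","REMOVE")): HI/LI removal wins, then
    survivors are kept only if KLI (keep low intensity) or
    KP (keep p-value) is set."""
    if hi or li:
        return "REMOVE"
    return "KEEP" if (kli or kp) else "REMOVE"
```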

103469 peptides removed

java -jar "C:\temp_sync\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "C:\temp_sync\custom selected peptides" "intensity values of all gprs 2 10-27-13d1318.txt" 0 11 39 11 39 4

java -jar "C:\temp_sync\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "C:\temp_sync\bot 1000 without bot 100" "intensity values of all gprs 2 10-27-13d1318.txt" 0 2 30 2 30 4

I was able to get the best p-value by selecting the bottom 1,000 values for each sample by intensity and removing the bottom 100.
I obtained a p-value of 0.000409 (-log10 of that equals 3.39)
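As a sketch, that per-sample selection (bottom 1,000 by intensity, minus the bottom 100) looks like:

```python
def bottom_band(intensities, low=100, high=1000):
    """Take the `high` lowest-intensity values for a sample and drop the
    `low` lowest of those, i.e. keep ranks 100-999 from the bottom."""
    return sorted(intensities)[low:high]
```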

What if I just chop off the top 5,000, and the bottom 100, and then remove least significant peptides?

java -jar "C:\temp_sync\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "C:\temp_sync\chop off top 5000 and bottom 100" "intensity values of all gprs 2 10-27-13d1318.txt" 0 11 39 11 39 4

p-value for entropy was

java -jar "C:\temp_sync\EntropyOfArray100913d0921.jar" find_summary_numbers_from_tabdelimitedtext_raw_data "C:\temp_sync\chop off top 5000 bottom 100 and pvalue greater than 0p7" "intensity values of all gprs 2 10-27-13d1318.txt" 0 11 39 11 39 4