++ work refining and inserting figures for dissertation 11-17-13d1213
10-31-13
I would now like to analyze the human normals.
human normal data with ages located here
F:\kurt\storage\CIM Research Folder\DR\2013\10-1-13\human_normal_data_with_ages
-samples are F647 Median
I'll run my program on these files.
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\10-1-13\human_normal_data_with_ages" "F647 Median"
got directory info
Formatted with a regular expression
find:
(\d\d/\d\d/\d\d\d\d)\s+(\d\d:\d\d\s\w\w)\s+(.+)
replace:
$1\t$2\t$3
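Below is a minimal Java sketch of the same find/replace. It assumes the input lines come from a Windows `dir` listing. The sample line and the class name `DirListingFormatter` are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: splits one line of a "dir" listing into
// tab-separated date, time, and remainder (size and/or file name).
class DirListingFormatter {
    // find: (\d\d/\d\d/\d\d\d\d)\s+(\d\d:\d\d\s\w\w)\s+(.+)
    private static final Pattern LINE = Pattern.compile(
            "(\\d\\d/\\d\\d/\\d\\d\\d\\d)\\s+(\\d\\d:\\d\\d\\s\\w\\w)\\s+(.+)");

    // replace: $1\t$2\t$3 (lines that do not match are returned unchanged)
    static String formatLine(String line) {
        return LINE.matcher(line).replaceAll("$1\t$2\t$3");
    }

    public static void main(String[] args) {
        System.out.println(formatLine("10/01/2013  03:15 PM    sample_01.gpr"));
    }
}
```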
fifties,forties,sixties,teens,thirties,twenties
----------
from 9/26/11 to 11/1/11
I would like to see if I can get better age data on the early 330k wafers.
----------
11-3-13
all human normals svm
Correctly Classified Instances: 77.9%, Kappa: 0.345, ROC Area: 0.642
Random Assignment (all samples classified as young)
Correctly Classified Instances: 71.8%, Kappa: 0, ROC Area: 0.5
all human normals j48graft tree
Correctly Classified Instances: 72.5%, Kappa: 0.277, ROC Area: 0.602
Random Assignment (all but 1 classified as young)
Correctly Classified Instances: 71.0%, Kappa: -0.0151, ROC Area: 0.455
all 330k normals svm
Correctly Classified Instances: 70%, Kappa: 0.385, ROC Area: 0.693
Random Assignment
Correctly Classified Instances: 43.5%, Kappa: -0.264, ROC Area: 0.378
all 330k normals j48graft tree
Correctly Classified Instances: 77.1%, Kappa: 0.529, ROC Area: 0.806
Random Assignment
Correctly Classified Instances: 54.3%, Kappa: -0.0596, ROC Area: 0.444
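The kappa values are why the random assignment baselines matter. A classifier that labels every sample with the majority class can still score a high percent correct, but its kappa is 0. Below is a minimal Java sketch of Cohen's kappa computed from a confusion matrix (rows = actual class, columns = predicted class). The class name and the example counts in the tests are hypothetical and are not taken from these datasets.

```java
// Hypothetical helper: Cohen's kappa from a square confusion matrix.
class Kappa {
    static double cohensKappa(int[][] m) {
        int k = m.length;
        double n = 0, agree = 0;
        double[] rowSum = new double[k], colSum = new double[k];
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                n += m[i][j];
                rowSum[i] += m[i][j];   // actual class totals
                colSum[j] += m[i][j];   // predicted class totals
            }
            agree += m[i][i];           // correctly classified instances
        }
        double po = agree / n;          // observed agreement (accuracy)
        double pe = 0;                  // agreement expected by chance
        for (int i = 0; i < k; i++) {
            pe += (rowSum[i] / n) * (colSum[i] / n);
        }
        return (po - pe) / (1 - pe);
    }
}
```

For example, a matrix of {{30, 0}, {10, 0}} (everything predicted as class 0) has 75% accuracy but a kappa of 0.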
-----------
11-4-13
http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2696.html
Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing
"F:\kurt\storage\CIM Research Folder\DR\2013\11-4-13\Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing.pdf"
DTRA will be present Tuesday and Wednesday
Volunteer 38 hepatitis vaccine 1-20-13 (1st boost on 4-21-13)
Volunteer 84 flu vaccine on day 17 of 30 day timecourse
Where is the 1/hour for 1 day?
Where is the 1/year for 6 years?
There are a total of 7 volunteers
References for general assay conditions and analytical methods
Where is the data for the 6 volunteers from 2009 who donated blood prior to and immediately following the yearly trivalent influenza vaccine?
6 volunteers that were a part of vaccine trial
112
113
33
43
73
84
Many patients had blood drawn pre-vaccination and on days 1, 5, 7, 14, and 21
{
Hi Dr. Stafford,
Dr. Johnston told me he thinks we have some more datasets that would be interesting for me to look at. I have several questions pertaining to normal donor info, the daily 1 month trial of Dr. Sykes and Dr. Johnston, hourly data, vaccine info, and Alzheimer's patient control info.
Normal Donor Info
Do you have a spreadsheet of gender, ethnicity, and age info for the normal donors? Dr. Johnston was curious whether I could tell a difference between genders. I have found quite a bit of age information for the normal donors in the attached "Workbook3" that you gave me previously, but it does not have gender or ethnicity information. Also, are the names of the normal donors strictly confidential? I would be extremely interested to see whether individuals who qualitatively appear very young and healthy to me have a different "AbStat" than people who qualitatively appear very old and sick to me. I could look at this out of personal curiosity if I had the normal donor names.
Daily One Month Trial of Dr. Sykes and Dr. Johnston
Do you know where the daily one month trial of Dr. Sykes and Dr. Johnston is located? I do not think this data is here:
S:\Administration\PeptideArrayCore\2013 sample run\Normals-2013
Hourly Data
I saw that you, Zbig, Daniel, and Stephen are writing a paper titled "Immunosignaturing of humoral activity in healthy humans". In this paper you present data for individuals who were monitored every hour for a day. Do you know where I can find this data?
Vaccine Info
In the same paper ("Immunosignaturing of humoral activity in healthy humans") I saw that you have data before and after vaccine for 6 individuals (112, 113, 33, 43, 73, 84) around days 1, 5, 7, 14, 21. Do you know where I can find this data?
Alzheimer Patient Control Info
Dr. Johnston told me that we have age info for the old normal control donors in the Alzheimer's experiments. I have the gpr names here:
S:\Administration\PeptideArrayCore\2012 sample run\ADNI-project\Good -gpr-files-ADNI
Do you know where I can find out the age information for these older people?
Thanks for any help and information you can offer me.
Best,
Kurt
}
Telomerase gene therapy in adult and old mice delays aging and increases longevity without increasing cancer
----------
11-5-13
I'll analyze the antibody mix experiment from Bart and Heidi
These are F647 Median samples
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\11-5-13\Antibody mix experiment from heidi and bart\gpr" "F647 Median"
Mouse Infection Experiments
Time Course on Version 1
F555 Median
java -jar "F:\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "F:\kurt\storage\CIM Research Folder\DR\2013\11-5-13\Mouse Infection Experiment\Time Course on Version 1\gpr files" "F555 Median"
\\biofs.biodesign.asu.edu\CIM\Administration\PeptideArrayCore\2012 sample run\Murine Influenza Strains-Bart\Final GPRs
------------
11-6-13
Infected vs Mock on V3
java -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-5-13\Mouse Infection Experiment\Infected vs Mock on V3\gpr files" "F647 Median"
entropy
normalized_shannon_entropy
max
min
cv
stdev
mean
median
fifth_percentile
ninety_fifth_percentile
max_normalized
min_normalized
stdev_normalized
mean_normalized
fifth_percentile_normalized
ninety_fifth_percentile_normalized
kurtosis
skew
dynamic_range
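Below is a sketch of how a few of these measures could be computed from one array's intensities. This is not the AbStat code. The n - 1 (sample) standard deviation and the nearest-rank percentile are assumptions, and the class name `SummaryMeasures` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper (not the AbStat code) for a few of the listed measures,
// computed over one array's integer intensities.
class SummaryMeasures {
    static double mean(int[] x) {
        long sum = 0;                          // long accumulator avoids int overflow
        for (int v : x) sum += v;
        return (double) sum / x.length;
    }

    // Sample standard deviation with an n - 1 denominator (an assumption).
    static double stdev(int[] x) {
        double m = mean(x), ss = 0;
        for (int v : x) ss += (v - m) * (v - m);
        return Math.sqrt(ss / (x.length - 1));
    }

    // Coefficient of variation: standard deviation divided by the mean.
    static double cv(int[] x) {
        return stdev(x) / mean(x);
    }

    static double median(int[] x) {
        int[] s = x.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : ((double) s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Nearest-rank percentile for p in (0, 100]; many tools interpolate instead.
    static int percentile(int[] x, double p) {
        int[] s = x.clone();
        Arrays.sort(s);
        int rank = (int) Math.ceil(p / 100.0 * s.length);
        return s[Math.max(rank, 1) - 1];
    }
}
```

Under the nearest-rank assumption, percentile(x, 5) and percentile(x, 95) would correspond to fifth_percentile and ninety_fifth_percentile.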
-------------
11-7-13
Outlining write-up of AbStat section
-mouse young vs old experiment
-young vs old data from Muskan
-before and after immunization
-Tiger's mouse tumor samples (FVBN time series data)
-Bart's dog lymphoma samples
-human normals at different ages (possibly; I think this data is across many different wafers so I may not want to include it)
-First Chip Disease Dataset
-LLNL dataset
-Valley fever data from Krupa (10kv2 data)
-Alzheimer's data from Lucas
-same individual monitored over time
-antibody mix experiment from Josh
-antibody mix experiment from Heidi and Krupa
-Rebecca's monoclonal antibody data
-two antibodies mixed from Daniel
Methods for analyzing these datasets
-box and dotplots
-t test
-svm
-tree
-heatmap
some age affinity papers
lots of fascinating articles from the search:
high entropy but low kolmogorov complexity
mutual information
"Useful information" is defined in Shannon's framework using the rate-distortion function. "meaningful information" is defined in the Kolmogorov framework using the Kolmogorov structure function.
How to quantify how ordered/sophisticated/complex something is?
Coffee cup example: completely separated mixture, partially mixed, completely mixed
^I don't know the answer to this.
java -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-7-13\Mouse Infection Long Timecourse\gpr files 2" "F555 Median"
Bailey-Grossman Equation
network complexity index
measure of self-organization
Is the Primordial soup done yet?
http://vserver1.cscs.lsa.umich.edu/~crshalizi/Self-organization/soup-done/
------------
11-8-13
How to open a file with Excel from the right-click menu in Windows 8
1. Click the Orb {previously known as Start}.
2. Type: File Associations in the search box.
3. At the top of the search list click on: Change the file type associated with a file extension.
4. Click on the first item {.xla} listed as a Microsoft Office Excel ... file.
5. Click the Change program... button.
6. Click on Microsoft Office Excel, if presented, or click on Browse... and search until you find your Excel.exe file then click on it.
7. Repeat from step 4. until you've exhausted all the items listed as Microsoft Office Excel.
This should fix your problem unless things are seriously messed up!
The excel.exe and winword.exe are located in a directory like this:
C:\Program Files\Microsoft Office 15\root\office15
Dr. Cortese talk at Biodesign Auditorium November 13th noon
-------------
11-8-13
I'll start writing the AbStat section of my paper now.
246 day timecourse
?How many mice were there in the 246 day timecourse? What strain were they? What age were they when they were infected? What was the strain of influenza? Which arrays were these samples applied to? When were the weights collected?
6 day timecourse
?How many mice were there in the 6 day timecourse? What strain were they? What age were they when they were infected? What was the strain of influenza? What was the mock group injected with?
Heatmap of AbStat measures for 246 day mouse timecourse
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-7-13\Mouse Infection Long Timecourse\Mouse Infection Long Timecourse Cell Plot 2 11-8-13.png"
Linegraph of entropy over time for 246 day mouse timecourse
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-7-13\Mouse Infection Long Timecourse\Mouse Infection Long experiment table of summary numbers 11-7-13.xlsx"
-------------
11-9-13
I'll start working on writing my dissertation.
I should fix the heatmap so that it does not have normalized_entropy_normalized_data
The standard deviation and 95th percentile normalized start high and decrease with age, while the mean, median, and 95th percentile start low and then increase with age.
mice vaccinated with the 2006-2007 influenza vaccine (06_INF_vacc), mice vaccinated with the 2007-2008 influenza vaccine (07_INF_vacc), mice infected with influenza (infected), mice injected with killed PR8 (killed PR8), and a mock group of mice.
?which strain of influenza were the mice infected with? What is a more proper term for "killed PR8"? How was the mock group of mice treated? Was the sera taken on day 37 or 38? What was the exact composition of the vaccines?
figures
-Entropy for multiple mouse immunization experiment
-Dynamic range for multiple mouse immunization experiment
-Figure 6 Bar graph of SVM weight for measures from multiple mouse immunization experiment
A J48graft tree classification algorithm can only correctly classify 45% of the instances with a kappa statistic of -0.1 and an ROC area of 0.413.
62.5% of the instances with a kappa statistic of 0.25 and an ROC area of 0.625.
random assignment
55% with a kappa statistic of -0.0496 and an ROC area of 0.478.
None of the measures comparing the mock and infected on day 6 were statistically significant. The best p-value was 0.195 for the mean, and the min_norm, median, and entropy followed.
figures
Entropy for 6 day mouse timecourse
Mean for 6 day mouse timecourse
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-5-13\Mouse Infection Experiment\Time Course on Version 1\mouse infection timecourse v1 11-8-13.xlsx"
---Spiking antibody into sera (mouse and human)
---Monoclonal affinity data
---Two monoclonals mixed
The polyclonal antibody was added to a 1:500 dilution of normal mouse sera in order to obtain the following final antibody concentrations: 0.1 nM, 1 nM, 2.5 nM, 5 nM, 10 nM, and 40 nM. There were two technical replicates for each condition.
a polyclonal IgG Mouse antibody against the human GFOD1 (glucose fructose oxidoreductase) protein
The order of the measures with the most significant p-values between normal mouse sera and 5 nM antibody is 95th percentile, 5th percentile, median, mean, entropy, and cv. The concentration of 5 nM was chosen as the comparison point because this is the point at which the trend in the curves reverses.
-Entropy for increasing antibody concentrations in normal mouse sera
-Mean for increasing antibody concentrations in normal mouse sera
--"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-14-13\summary\table_of_summary_numbers with classification 10-14-13.xls"
-Entropy for increasing concentrations to 1 nM in normal human sera
-Entropy for increasing concentrations to 100 nM in normal human sera
-Mean for increasing concentrations to 1 nM in normal human sera
-Mean for increasing concentrations to 100 nM in normal human sera
-"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-5-13\Antibody mix experiment from heidi and bart\table_of_summary_numbers 11-5-13.xlsx"
For example, when comparing the 1 nM and 10 nM samples, the entropy measure
0.0253 compared to 0.140
the order should be entropy
min
AND(C3>C2,C4>C3)
=IF(AND(C3>C2,C4>C3),"increasing")
IF(AND(C3histogram options->choose count axis and set binwidth
0
6553.5
13107
19660.5
26214
32767.5
39321
45874.5
52428
58981.5
65535
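The values above are 11 evenly spaced edges for 10 bins spanning the 16-bit intensity range 0 to 65535, so the bin width is 65535/10 = 6553.5. Below is a small Java sketch that generates these edges and assigns a value to a bin. The class and method names are hypothetical, and closing the last bin on the right is an assumption.

```java
// Hypothetical helper: equal-width histogram bins over [min, max].
class BinEdges {
    static double[] edges(double min, double max, int bins) {
        double[] e = new double[bins + 1];
        double width = (max - min) / bins;     // 65535 / 10 = 6553.5
        for (int i = 0; i <= bins; i++) e[i] = min + i * width;
        return e;
    }

    // Index of the bin containing v; the last bin also includes max.
    static int binOf(double v, double min, double max, int bins) {
        int i = (int) ((v - min) / ((max - min) / bins));
        return Math.min(i, bins - 1);
    }
}
```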
(13192-4)/2=6594
column 6, column 9, and column 19
--------
11-11-13
java -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Downloads\power\entropy calc" "F647 Median"
java -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Downloads\power\random entropy calc" "F647 Median"
java -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Downloads\internet\entropy of internet network" "F647 Median"
How many nodes? 22963
java -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Downloads\internet\entropy of random internet network" "F647 Median"
internet
entropy normalized_shannon_entropy
8.396538087 0.731332764
random internet
entropy normalized_shannon_entropy
9.916903081 0.863754745
California road network: 1965206
java -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Downloads\california road network\entropy of california road network" "F647 Median"
entropy of california road network
entropy normalized_shannon_entropy
11.25258618 0.86222857
entropy of random california road network
entropy normalized_shannon_entropy
11.0176667 0.844227883
java -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Downloads\california road network\entropy of random california road network" "F647 Median"
java -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Downloads\protein interaction network\entropy of protein interaction network" "F647 Median"
Protein Interaction Network
entropy normalized_shannon_entropy
7.102698463 0.780471244
Random Protein Interaction Network
entropy normalized_shannon_entropy
11.0176667 0.844227883
java -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Downloads\protein interaction network\entropy of random protein interaction network" "F647 Median"
c elegans neural network
neg sum (entropy) normalized entropy
8.428979013 0.774394178
random c elegans network
neg sum (entropy) normalized entropy
8.435005328 0.795848508
java -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Downloads\airports\entropy of airport connections" "F647 Median"
Entropy of airport connections
entropy normalized_shannon_entropy
5.02502384 0.6014287
Entropy of random airport connections
entropy normalized_shannon_entropy
5.768033481 0.690357099
java -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Downloads\airports\entropy of random airport connections" "F647 Median"
types of entropy of complex networks: "degree distribution entropy, search information, target entropy, and road entropy."
from "Automation in proteomics and genomics: an engineering case-based approach"
http://books.google.com/books?id=OEYHLzTsEtwC&pg=PA163&lpg=PA163&dq=entropy+of+small-world+scale-free+networks&source=bl&ots=qfD49lyHqv&sig=4Tx2kWGnahxLPypQMTa8ocr-Stw&hl=en&sa=X&ei=N12BUrWzFaTBigLygYGQBQ&ved=0CDEQ6AEwAg#v=onepage&q=entropy%20of%20small-world%20scale-free%20networks&f=false
"We observe that the ensembles with fixed scale-free degree distribution have smaller entropy than the ensembles with homogeneous degree distribution indicating a higher level of order in scale-free networks."
from
The entropy of randomized network ensembles
Topological analysis and interactive visualization of biological networks and protein structures
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-11-13\The entropy of randomizaed network ensembles.pdf"
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-11-13\Scale-free and heirarchical structures in complex networks.pdf"
Title of document is Complex Networks: Small-world, scale-free and beyond
Personal analysis of some network datasets 11-11-13
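Of the network entropy types quoted earlier, degree distribution entropy is the simplest to compute: H = -Σ p(k) log2 p(k), where p(k) is the fraction of nodes with degree k. Below is a minimal Java sketch for an undirected edge list. It is not necessarily how the network entropies above were obtained, since those came from running the AbStat jar on the network files. The class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: Shannon entropy (bits) of a graph's degree distribution.
class DegreeEntropy {
    static double entropy(int[][] edges, int nodeCount) {
        int[] degree = new int[nodeCount];
        for (int[] e : edges) {          // undirected: each edge adds to both ends
            degree[e[0]]++;
            degree[e[1]]++;
        }
        Map<Integer, Integer> counts = new HashMap<>();  // degree k -> nodes with degree k
        for (int d : degree) counts.merge(d, 1, Integer::sum);
        double h = 0;
        for (int c : counts.values()) {
            double p = (double) c / nodeCount;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }
}
```

A cycle, where every node has the same degree, gives H = 0. A 4-node star (one node of degree 3, three of degree 1) gives about 0.811 bits.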
---------------
11-12-13
The first antibody is referred to as Ab1 and this antibody is a mouse IgG1 anti-h-p53 antibody for the "RHSVV" epitope from Millipore (Cat No CBL404). The second antibody is referred to as Ab2 and this antibody is a mouse IgG2a anti-h-p53 antibody for the "SDLWKL" epitope from Thermo Scientific (Cat no MA1-19055).
Data and graphs here
--- "F:\kurt\storage\CIM Research Folder\DR\2013\10-19-13\normal_timecourse\summary\table_of_summary_numbers with classification 10-19-13.xls"
--- "F:\kurt\storage\CIM Research Folder\DR\2013\10-19-13\normal_timecourse\summary\table of summary numbers with classification for one month 10-26-13.xlsx"
Which type of array was each sample run on? What was the name of the influenza vaccine they received?
Decrease in individual 43 for the 2006 vaccination 25 days later (Figure 24); decrease in individual 84 for the 2009 vaccination 4 days later (Figure 26); no decrease in individual 43 for the 2009 vaccination 3 days later (Figure 25).
Figure 27 and Figure 28
Individual 84 reported catching a cold on day 17, and there was a very dramatic drop in the entropy on day 27, 10 days later. The entropy stayed low for one more day before returning to the previous level. Note that no blood was drawn on days 23-26 due to the individual's illness. These observations suggest that these measures of the overall peptide intensity distribution can capture an event such as an individual catching a cold.
How to add a program to the Windows right-click menu?
-didn't quickly find the answer
(Are there changes in the AbStat measurement during the course of a lymphoma, which reduces the complexity of the antibody repertoire? During the course of a B cell lymphoma, one particular antibody in the repertoire becomes dominant as one B cell proliferates uncontrollably and produces large amounts of that antibody.)
-How many dogs were used? What type of dog? Male or female? What age?
-Why do I have two different dog lymphoma datasets?
The data for all of the measures for the two groups were input into an SVM algorithm, and the algorithm was able to correctly classify 78.8% of the instances with a Kappa statistic of 0.566 and an ROC area of 0.779. When the normal or LSA class was randomly assigned, the SVM could only correctly classify 52.3% of the instances with a Kappa statistic of 0.0348 and an ROC area of 0.517.
figure 29 for entropy dotplot
figure 3 for entropy line graph
78.8 52.3
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-19-13\bart_dog_lymphoma1\summary\classification vs entropy box and dotplot 11-12-13.png"
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-19-13\bart_dog_lymphoma1\summary\table_of_summary_numbers with classification 10-19-13d0856.xls"
FVBN Time Series Data
Mouse cancer progression data (Are there changes in the AbStat measurement as mice develop cancer?)
--Mouse cancer progression data (Are there changes in the AbStat measurement as mice develop cancer?)
--Human disease data (Can the AbStat measurements distinguish between humans with and without disease on different platforms?)
---330K wafer data (20, 22, 25, and 46)
---LLNL 10K dataset
wafer 20
C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-17-13\id some on both arrays\wafer 20\summary
I'll show box and dot plot, heatmap, p-value bar chart, svm bar chart.
First I need to clean up the data a little.
location of data
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-12-13\FVBN Time Series\Entropy analysis for 2013 pilot study.xlsx"
What are the details of the cancer mice? What strain are they? What type of cancer do they develop, and when does this occur?
Figure 31 and 32
-------------
11-13-13
wafer 46
box and dotplot
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-10-13\entropy\new summary 10-12-13\box and dotplot clean 11-12-13d1725.png"
heatmap
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-10-13\entropy\new summary 10-12-13\wafer 46 clean heatmap Hierarchical Cluster.png"
p-value bar chart
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-10-13\entropy\new summary 10-12-13\table_of_summary_numbers_clean 11-12-13.xlsx"
svm bar chart
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-10-13\entropy\new summary 10-12-13\table_of_summary_numbers_clean 11-12-13.xlsx"
Statistical significance of measures comparing normal with disease
CD: I14 to I28
A box and dotplot of the entropy for the groups is presented in figure 33, and a heatmap of all of the measures for each sample is presented in figure 34. The statistical significance of each measure when comparing disease and normal is presented in figure 35. Machine learning was also used to classify samples as disease or normal. The SVM weight of each measure is presented in figure 36, the J48graft tree is presented in figure 37, and machine learning statistics for actual and random class assignments are presented in Table 1.
normals to row 21
10K LLNL Dataset
boxplot
Box and dotplot of entropy for groups on 10K
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-17-13\id some on both arrays\10k llnl\summary\10k llnl box and dotplot clean 11-13-13d1301.png"
heatmap
Heatmap of Measures for samples on 10K
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-17-13\id some on both arrays\10k llnl\summary\10k llnl heatmap Hierarchical Cluster 11-13-13d1349.png"
p-value bar graph
Statistical significance of measures comparing normal with disease for 10K
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-17-13\id some on both arrays\10k llnl\summary\table_of_summary_numbers with classification clean 10-18-13.xlsx"
SVM bar graph
SVM weight of measures comparing normal with disease for 10K
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-17-13\id some on both arrays\10k llnl\summary\table_of_summary_numbers with classification clean 10-18-13.xlsx"
Tree
J48graft tree for 10K
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-17-13\id some on both arrays\10k llnl\summary\machine learning\machine learning clean\j48graft tree 20n 20d.PNG"
Table of statistics
Machine learning statistics for 10K
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-17-13\id some on both arrays\10k llnl\summary\table_of_summary_numbers with classification clean 10-18-13.xlsx"
I would like to combine the cleaned data from many different wafers into one dataset and then analyze that.
I think I'll use the first chip disease data.
@ATTRIBUTE cv NUMERIC
@ATTRIBUTE entropy NUMERIC
@ATTRIBUTE normalized_entropy NUMERIC
@ATTRIBUTE max_normalized NUMERIC
@ATTRIBUTE min_normalized NUMERIC
@ATTRIBUTE stdev_normalized NUMERIC
@ATTRIBUTE mean_normalized NUMERIC
@ATTRIBUTE fifth_percentile_normalized NUMERIC
@ATTRIBUTE ninety_fifth_percentile_normalized NUMERIC
@ATTRIBUTE kurtosis NUMERIC
@ATTRIBUTE skew NUMERIC
@ATTRIBUTE dynamic_range NUMERIC
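For Weka to use these attributes, an ARFF file also needs a @RELATION line, a class attribute, and a @DATA section. Below is a sketch that writes such a header. The relation name and the class attribute `class {normal,disease}` are hypothetical placeholders, and the class name `ArffHeader` is also hypothetical.

```java
import java.util.List;

// Hypothetical helper: builds an ARFF header of numeric measures plus a nominal class.
class ArffHeader {
    static String header(String relation, List<String> numericAttrs,
                         String className, List<String> classValues) {
        StringBuilder sb = new StringBuilder("@RELATION ").append(relation).append("\n\n");
        for (String a : numericAttrs) {
            sb.append("@ATTRIBUTE ").append(a).append(" NUMERIC\n");
        }
        sb.append("@ATTRIBUTE ").append(className)
          .append(" {").append(String.join(",", classValues)).append("}\n\n@DATA\n");
        return sb.toString();
    }
}
```

Each row after @DATA would then list the measure values in attribute order, followed by the class label.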
It seems strange to me that wafer 46 has some negative values. For example, this sample has a cv of -2.791345317
4-46 S5 C3 Hi P20 03222013 MM19 L.gpr
Are there negative intensity values in this file?
No there are not.
What happens when I run my program on this file?
java -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-13-13\test of program 11-13-13" "F532 Median"
The program output the same cv of -2.791345317.
I think there must be a glitch in the program. What value do I get when I do a manual calculation?
cv = standard deviation divided by the mean
Mean should be 7155.7
stdev should be 9882.618
My program did not calculate the mean or the stdev correctly.
program mean: -5852.54
program stdev: 16336.45
At least it divided stdev by mean correctly!
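A quick check of that remark. Dividing each standard deviation by its mean reproduces both cv values: the expected one and the negative one that the program reported (-2.791345317).

```latex
\mathrm{cv} = \frac{\sigma}{\mu} = \frac{9882.618}{7155.7} \approx 1.381
\qquad
\mathrm{cv}_{\text{program}} = \frac{16336.45}{-5852.54} \approx -2.7913
```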
The entropy is correct.
Here are the numbers which are not correct:
cv, stdev, mean, kurtosis, and skew
Would these numbers also be incorrect in another file in which there is not a negative cv?
I'll take a look at this sample:
4-46 S5 A2 Hi P20 03222013 MM19 L.gpr
java -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-13-13\test of program 2 11-13-13" "F532 Median"
Wow, for this file all of the values match the manual calculation values! Why would there be different results for two different gpr files? Did the program not have enough memory or something? (probably unlikely)
java -Xmx6144m -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-9-13\EntropyOfArray092013\EntropyOfArray100913d0921.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-13-13\further program testing 11-13-13\more ram test" "F532 Median"
Program runs okay from eclipse and gives the expected output.
Did all of the data get extracted?
5864
9064
Yes
As expected, the cv is still negative with more RAM.
It looks like I'll have to open the code up and do some debugging.
It looks like I found the problem! When I sum all of the intensities, the total can be greater than the maximum value of a Java int, so the sum overflows.
max value of an integer in java
2147483647
value of sum in non-working gpr
2362611367
value of sum in working gpr
1438479334
Wow, this problem did not even occur to me until now. I guess I can use a long instead of an int.
A long can store values up to
9223372036854775807
What is 330173*65535 (the largest possible sum if all 330173 features were at the 16-bit maximum of 65535)?
21637887555
It looks like a long could easily handle my situation.
Now the program is completely fixed!
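Below is a minimal Java sketch of the failure mode. It uses a hypothetical array of 40000 features, all at 65535, so the arithmetic is easy to check. The int accumulator wraps around past Integer.MAX_VALUE (2147483647), while a long accumulator gives the true total. The last line shows that the sum from the non-working gpr, 2362611367, becomes negative when forced into an int, which is consistent with the negative mean. The class name `OverflowDemo` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo: int accumulation silently wraps; long does not.
class OverflowDemo {
    static int intSum(int[] x) {
        int s = 0;
        for (int v : x) s += v;        // wraps modulo 2^32 on overflow
        return s;
    }

    static long longSum(int[] x) {
        long s = 0;
        for (int v : x) s += v;        // each int is widened to long before adding
        return s;
    }

    public static void main(String[] args) {
        int[] saturated = new int[40000];          // every feature at the 16-bit maximum
        Arrays.fill(saturated, 65535);
        System.out.println(intSum(saturated));     // -1673567296 (wrapped)
        System.out.println(longSum(saturated));    // 2621400000
        System.out.println((int) 2362611367L);     // -1932355929
    }
}
```

Another option is Math.addExact(int, int). It throws an ArithmeticException on overflow instead of wrapping silently, which would have surfaced this bug immediately.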
Where is this program stored?
C:\Users\kurtw_000\workspace
made a jar file
now I'll test it out
java -jar "C:\Users\kurtw_000\workspace\ImmunosignatureEntropyCode\AbStat111313d0951.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-13-13\further program testing 11-13-13\corrected program test 11-13-13d0953" "F532 Median"
The program works. Now I'll just clean up a bit.
Alright now that my program is working again, I can get back to working on my dissertation.
java -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-13-13\AbStat_Code_11-13-13\ImmunosignatureEntropyCode\AbStat111313d0951.jar" find_summary_numbers_from_folder_of_gprs "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-26-13\46\all_gprs_in_one_folder2" "F532 Median"
I also want to reanalyze the first chip disease dataset.
java -jar "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-13-13\AbStat_Code_11-13-13\ImmunosignatureEntropyCode\AbStat111313d0951.jar" find_summary_numbers_from_tabdelimitedtext_normalized_data "C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-13-13\disease data from josh 2" "llnl_tab.txt" 1 1 128 1 128 2
-----------
11-14-13
I'll check whether the maximum sum value in the LLNL dataset exceeds the Java int limit of
2147483647
No, the LLNL dataset does not have any samples exceeding the limit.
row 91
HT4-46 S1 B2 Hi P20 103112 #1-1 S.gpr
row 93
HT4-46 S1 D1 Hi P20 103112 #4-7 K.gpr
row 2
4-22 S4 A1 Hi P60 092612 #8-15 5um K.gpr
Okay, now that I've reanalyzed the First Chip Disease Dataset, I would like to repaste the figures into my dissertation.
First Chip Disease Dataset
Box and dotplot of entropy for groups in first chip disease dataset
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-13-13\disease data from josh\results\summary\first chip disease dataset box and dot plot no hnp 111413d1108.png"
Heatmap of Measures for samples in first chip disease dataset
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-13-13\disease data from josh\results\summary\first chip disease dataset no hnp heatmap heirarchical cluster with key 111413d1142.png"
Statistical significance of measures comparing normal with disease in first chip disease dataset
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-13-13\disease data from josh\results\summary\first chip disease dataset clean 11-13-13.xlsx"
SVM weight of measures comparing normal with disease in first chip disease dataset
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-13-13\disease data from josh\results\summary\first chip disease dataset clean 11-13-13.xlsx"
J48graft tree for first chip disease dataset
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-13-13\disease data from josh\results\summary\machine learning\j48graft tree no hnp equal d and n 111413d1419.PNG"
Machine learning statistics for first chip disease dataset
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-13-13\disease data from josh\results\summary\first chip disease dataset clean 11-13-13.xlsx"
-------
Now I'll get wafer 46 back into my dissertation, since it has the correct values.
Box and dotplot of entropy for groups for wafer 46
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-10-13\entropy\new summary 10-12-13\box and dotplot clean 11-12-13d1725.png"
Heatmap of Measures for samples for wafer 46
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-10-13\entropy\new summary 10-12-13\corrected_values_111313\wafer 46 heatmap heirarchical cluster 111413d1052.png"
Statistical significance of measures comparing normal with disease for wafer 46
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-10-13\entropy\new summary 10-12-13\corrected_values_111313\table_of_summary_numbers_with_classification 111413d0928.xls"
SVM weight of measures comparing normal with disease for wafer 46
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-10-13\entropy\new summary 10-12-13\corrected_values_111313\table_of_summary_numbers_with_classification 111413d0928.xls"
J48graft tree for wafer 46
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-10-13\entropy\new summary 10-12-13\corrected_values_111313\machine learning\j48graft tree output only bc and mm 111413d1024.png"
Machine learning statistics for wafer 46
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-10-13\entropy\new summary 10-12-13\corrected_values_111313\table_of_summary_numbers_with_classification 111413d0928.xls"
Okay, now I just need to make sure that none of the samples in the 10k llnl dataset surpassed the int limit.
Actually, I know that none of them could have: even if all of the peptides were maxed out and then some, their total would not surpass the Java int limit.
65535 * 30000 = 1,966,050,000
Java int max = 2,147,483,647
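The worst-case bound above can be checked mechanically. This is a minimal Java sketch, assuming 65535 (the largest 16-bit value) is the per-spot maximum and that the program accumulates intensities into a single running total; the class and method names are hypothetical and not taken from AbStat. It also shows how `Math.addExact` turns a silent int wraparound into an exception.

```java
// Hypothetical sketch (names are not from AbStat): does a worst-case total of
// `spots` intensities, each at most `maxIntensity`, fit in a Java int?
final class IntLimitCheck {

    // Widen to long before multiplying so the product itself cannot overflow.
    static long worstCaseSum(int maxIntensity, int spots) {
        return (long) maxIntensity * spots;
    }

    static boolean fitsInInt(int maxIntensity, int spots) {
        return worstCaseSum(maxIntensity, spots) <= Integer.MAX_VALUE;
    }

    // Largest number of full-scale spots whose total still fits in an int.
    static int maxSpotsAtFullScale(int maxIntensity) {
        return Integer.MAX_VALUE / maxIntensity;
    }

    // A plain int accumulator wraps around silently; Math.addExact throws
    // ArithmeticException instead, so an overflow cannot go unnoticed.
    static int sumWithOverflowCheck(int[] values) {
        int total = 0;
        for (int v : values) {
            total = Math.addExact(total, v);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseSum(65535, 30000));   // 1966050000
        System.out.println(fitsInInt(65535, 30000));      // true
        System.out.println(maxSpotsAtFullScale(65535));   // 32768
    }
}
```

By the same arithmetic, an int total stays safe up to Integer.MAX_VALUE / 65535 = 32,768 spots at full scale, so the 30,000 used above still leaves headroom.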
Now I can get the rank and ranges from the 10k llnl dataset.
------------
I would like to get the Alzheimer's data into the paper now.
Box and dotplot of entropy for groups for Alzheimer's disease
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-17-13\alzheimers data\summary\graphs\entropy.tiff"
Heatmap of Measures for Alzheimer's disease
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-17-13\alzheimers data\summary\alzheimer's heatmap 111413d2217.png"
Statistical significance of measures comparing normal with Alzheimer's disease
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-17-13\alzheimers data\summary\table_of_summary_numbers with classification.xls"
Machine learning statistics for Alzheimer's disease
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-17-13\alzheimers data\summary\table_of_summary_numbers with classification.xls"
---------
Changes in entropy measure with removal of peptides
P-value vs peptides removed
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-31-13\important peptide analysis pre 10-31-13\graphs and diagrams\Peptides removed log zero to six layers merged 2.png"
P-value vs peptides removed with 5,000 peptides remaining
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-31-13\important peptide analysis pre 10-31-13\graphs and diagrams\log10 5p51 to 5p52 peptides removed 2 10-30-13.png"
P-value of most significant peptide vs significant peptides removed
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-31-13\important peptide analysis pre 10-31-13\graphs and diagrams\P-value of MS Peptide vs S Peptide Removed 2 10-30-131048.png"
P-value vs least significant peptides removed
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-31-13\important peptide analysis pre 10-31-13\graphs and diagrams\log10 zero to six least significant removed 2.png"
P-value of least significant peptide vs peptides removed
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-31-13\important peptide analysis pre 10-31-13\graphs and diagrams\P-value of LS Peptide vs Peptides Removed 2.png"
-------
Now I'll put in the figures for the young and aged mice
Box and dotplot of entropy for young and aged mice
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-26-13\old and young mice gpr files\summary\table_of_summary_numbers_clean_graph_100913d1421.png"
Heatmap of Measures for young and aged mice
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-26-13\old and young mice gpr files\summary\young and aged mice heatmap Hierarchical Cluster 11-15-13d0919.png"
Statistical significance of measures comparing young and aged mice
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-26-13\old and young mice gpr files\summary\table_of_summary_numbers_100913d1409.xlsx"
I'd like to show the peptide intensity histograms for a young and an old sample as well.
For the old sample I'll choose the sample with an entropy of 7.335901881 (406971_top-GRP6_01182012_old.gpr)
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-26-13\old and young mice gpr files\summary\aged mouse Distribution 111513d0927.png"
For the young sample I'll choose the sample with an entropy of 7.075669091 (407167_bot-GRP2_01182012_young.gpr)
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-26-13\old and young mice gpr files\summary\young mouse Distribution 111513d0930.png"
axis settings: y to 1000 by 200, x to 20,000
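One common way to summarize a peptide intensity histogram with a single number is Shannon entropy. As an illustration only, here is a Java sketch that bins intensities into a fixed-width histogram and computes its entropy in bits. The bin width, the range, and the log base are assumptions; AbStat's own binning may differ, so this sketch is not expected to reproduce values such as 7.335901881, and the class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: fixed-width histogram of intensities and its Shannon entropy
// in bits. Bin width and range are illustrative assumptions, not AbStat settings.
final class IntensityEntropy {

    // Bins of width binWidth covering [0, maxValue); values at or above maxValue
    // are clamped into the last bin and negative values into the first bin.
    static int[] histogram(double[] values, double binWidth, double maxValue) {
        int nBins = (int) Math.ceil(maxValue / binWidth);
        int[] counts = new int[nBins];
        for (double v : values) {
            int bin = (int) Math.floor(v / binWidth);
            bin = Math.max(0, Math.min(nBins - 1, bin));
            counts[bin]++;
        }
        return counts;
    }

    // H = -sum(p_i * log2(p_i)) over non-empty bins.
    static double shannonEntropyBits(int[] counts) {
        long total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    public static void main(String[] args) {
        // Toy intensities for illustration, not values from any .gpr file.
        double[] intensities = {120, 450, 460, 3000, 3100, 3150, 12000, 19999};
        int[] counts = histogram(intensities, 100, 20000);
        System.out.println("spots counted: " + Arrays.stream(counts).sum());
        System.out.println("entropy (bits): " + shannonEntropyBits(counts));
    }
}
```

With a 20,000 upper range, every brighter spot lands in the last bin, mirroring the x-axis cutoff above; the width of 100 is an arbitrary choice, and a different width changes the entropy value.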
Young and aged mouse peptide intensity histograms
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-26-13\old and young mice gpr files\summary\young and aged mouse distributions 11-15-13.svg"
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\9-26-13\old and young mice gpr files\summary\young and aged mouse distributions 11-15-13.png"
-------------
Now I'll put in the human young and aged data
With my young and old data I had some normals from wafers 20, 22, 25, and 46.
I know I need to get the data for wafer 46 again, since the program previously had the maximum Java int glitch. What about wafers 20, 22, and 25?
I'll check them one at a time.
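To check a wafer's files one at a time outside of AbStat, a sketch like the following could total a single column of a GenePix .gpr file in long arithmetic and flag totals above Java's int maximum. It assumes the glitch came from summing intensities into an int, assumes the standard ATF layout (line 2 starts with the number of optional header records, which are followed by the quoted column-name row), and assumes integer-valued median columns; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical sketch: total one named column of a GenePix .gpr (ATF) file in
// long arithmetic and flag totals that an int accumulator could not hold.
final class GprColumnSum {

    static long sumColumn(List<String> lines, String column) {
        // ATF line 2 starts with the number of optional header records.
        int headerRecords = Integer.parseInt(lines.get(1).split("\t")[0].trim());
        int namesRow = 2 + headerRecords;                 // quoted column names
        String[] names = lines.get(namesRow).split("\t");
        int idx = -1;
        for (int i = 0; i < names.length; i++) {
            if (names[i].replace("\"", "").trim().equals(column)) {
                idx = i;
                break;
            }
        }
        if (idx < 0) {
            throw new IllegalArgumentException("column not found: " + column);
        }
        long total = 0;
        for (int r = namesRow + 1; r < lines.size(); r++) {
            String line = lines.get(r);
            if (line.trim().isEmpty()) continue;          // skip blank lines
            String[] fields = line.split("\t", -1);
            // Assumes the column holds integer values.
            total += Long.parseLong(fields[idx].replace("\"", "").trim());
        }
        return total;
    }

    static boolean exceedsIntMax(long total) {
        return total > Integer.MAX_VALUE;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java GprColumnSum <file.gpr> ["F532 Median"]
        String column = args.length > 1 ? args[1] : "F532 Median";
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        long total = sumColumn(lines, column);
        System.out.println(column + " total = " + total
                + (exceedsIntMax(total) ? " (exceeds Java int max)" : ""));
    }
}
```

Reading with ISO-8859-1 avoids decoding errors if a header record contains non-UTF-8 bytes; the numeric columns are plain ASCII either way.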
wafer 20
S:\Research\CIM-HealthTell\Experiments\06182012 HTchipV4-20 (HT-20)
1,2,3,4,5,6,7,8,9,10,11,12,
wafer 22
S:\Research\CIM-HealthTell\Experiments\06252012 HTChipV4-22 Production Run 3 (HT-22)
1,2,3,4,5,6,7,8,10
wafer 25 (F
S:\Research\CIM-HealthTell\Experiments\06262012 HTChipV4-25 Production run 4 (HT-25)
1,2,3,4,5,6,7(no gprs),8,9(no gprs),10
The commands will be something like this:
wafer 20
java -jar "S:\Administration\Biostatistics\Immunosignature Entropy\Code\updated code 11-14-13\AbStatCode\AbStat111413d1633.jar" find_summary_numbers_from_folder_of_gprs "S:\Research\Cancer_Eradication\Discovering tumor specific antigens\temp\wafer 20" "F532 Median"
-complete
--yes, this wafer has intensities that surpass the max.
--wafer 20 has been updated in the human age spreadsheet.
wafer 25
java -Xmx2000m -jar "S:\Administration\Biostatistics\Immunosignature Entropy\Code\updated code 11-14-13\AbStatCode\AbStat111413d1633.jar" find_summary_numbers_from_folder_of_gprs "S:\Research\Cancer_Eradication\Discovering tumor specific antigens\temp\wafer 25" "F532 Median"
^I'll just use F532 Median
java -jar "S:\Administration\Biostatistics\Immunosignature Entropy\Code\updated code 11-14-13\AbStatCode\AbStat111413d1633.jar" find_summary_numbers_from_folder_of_gprs "S:\Research\Cancer_Eradication\Discovering tumor specific antigens\temp\wafer 25 F532 median" "F532 Median"
wafer 22
java -jar "S:\Administration\Biostatistics\Immunosignature Entropy\Code\updated code 11-14-13\AbStatCode\AbStat111413d1633.jar" find_summary_numbers_from_folder_of_gprs "S:\Research\Cancer_Eradication\Discovering tumor specific antigens\temp\wafer 22" "F532 Median"
-complete
-----------
11-16-13
Now I'd like to insert the young and aged human data
Box and dotplot of entropy for young and aged humans
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-2-13\human normals of many ages 11-2-13\corrected values\young and aged humans box and dotplot 111513d1649.png"
Heatmap of Measures for young and aged humans
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-2-13\human normals of many ages 11-2-13\corrected values\young and aged human heatmap Hierarchical Cluster 111513d1700.png"
Statistical significance of measures comparing young and aged humans
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-2-13\human normals of many ages 11-2-13\corrected values\human normals of many ages 11-5-13d1510.xlsx"
SVM weight of measures comparing young and aged humans
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-2-13\human normals of many ages 11-2-13\corrected values\human normals of many ages 11-5-13d1510.xlsx"
J48graft tree for young and aged humans
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-2-13\human normals of many ages 11-2-13\corrected values\machine learning\j48graft tree for young and aged humans 111513d2023.PNG"
Machine learning statistics for young and aged humans
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-2-13\human normals of many ages 11-2-13\corrected values\human normals of many ages 11-5-13d1510.xlsx"
--------
I would like to do a quick test of ethnicity.
a useful search
search in a scale-free network
Interesting paper
Scale-Free Networks are Ultrasmall
https://www.google.com/#q=search+in+a+scale-free+network
Interesting paper
Search in spatial scale-free networks
-http://users.phys.psu.edu/~ralbert/pdf/Paper_NJP245866SPE.pdf
-"many real-world networks evolve to inherently facilitate decentralized search"
Scale-free networks in Scientific American
http://www.barabasilab.com/pubs/CCNR-ALB_Publications/200305-01_SciAmer-ScaleFree/200305-01_SciAmer-ScaleFree.pdf
----------
Chinese vs. Indian comparison
I'll just show the machine learning statistics.
8 Chinese
12 Indian
all under 40
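With 8 Chinese and 12 Indian samples the classes are unbalanced, so the machine learning statistics are easiest to judge against a majority-class baseline: always predicting "Indian" is already correct 12/20 = 60% of the time, with a kappa of 0. A minimal Java sketch of both numbers follows; the class and method names are hypothetical, and the kappa formula is the standard Cohen's kappa for a 2x2 confusion matrix.

```java
// Hypothetical sketch: majority-class baseline accuracy and Cohen's kappa
// for a 2x2 confusion matrix (rows = actual class, columns = predicted class).
final class BaselineStats {

    static double majorityBaselineAccuracy(int... classCounts) {
        int total = 0;
        int largest = 0;
        for (int c : classCounts) {
            total += c;
            largest = Math.max(largest, c);
        }
        return (double) largest / total;
    }

    // a = actual 1 predicted 1, b = actual 1 predicted 2,
    // c = actual 2 predicted 1, d = actual 2 predicted 2.
    // Undefined (NaN or infinite) when chance agreement equals 1.
    static double cohensKappa(int a, int b, int c, int d) {
        double n = a + b + c + d;
        double observed = (a + d) / n;
        double expected = ((double) (a + b) * (a + c) + (double) (c + d) * (b + d)) / (n * n);
        return (observed - expected) / (1 - expected);
    }

    public static void main(String[] args) {
        // 8 Chinese vs 12 Indian: predicting "Indian" for everyone.
        System.out.println(majorityBaselineAccuracy(8, 12));   // 0.6
        System.out.println(cohensKappa(0, 8, 0, 12));          // 0.0
    }
}
```

A classifier on this comparison is only informative if its accuracy clearly exceeds 0.6 and its kappa is clearly above 0.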
Machine learning statistics for Chinese and Indian nationality
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-2-13\human normals of many ages 11-2-13\corrected values\human normals of many ages 11-5-13d1510.xlsx"
Now I would like to get the rank and normal ranges of the measures.
I would like to go back and get the stdev, the mean +/- 1 stdev, and the mean +/- 2 stdev for each measure.
Actually, I think I'll just combine all the entropy values from multiple experiments and then determine where the mean +/- 1 stdev lie.
C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-13-13\disease data from josh\results\summary\table_of_summary_numbers with classification 10-14-13d0928.xlsx
C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\10-17-13\id some on both arrays\10k llnl\summary\table_of_summary_numbers with classification clean 10-18-13.xlsx
C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\DR\2013\11-2-13\human normals of many ages 11-2-13\corrected values\human normals of many ages 11-5-13d1510.xlsx
C306:C478
combined data from 4 experiments
             D (disease)    N (normal)
mean         8.08132668     7.789545795
STDEV        0.624594534    1.010608064
mean+STDEV   8.705921214    8.800153858
mean-STDEV   7.456732146    6.778937731
mean+2STDEV  9.330515748    9.810761922
mean-2STDEV  6.832137611    5.768329668
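The bands in this table are simple arithmetic on the pooled values (for example, 8.08132668 + 0.624594534 = 8.705921214). A minimal Java sketch of the same calculation follows, assuming the spreadsheet's STDEV is Excel's sample standard deviation (n - 1 denominator) rather than STDEVP; the class and method names are hypothetical.

```java
// Hypothetical sketch: mean +/- 1 and 2 standard deviations, using the sample
// standard deviation (n - 1 denominator), which is what Excel's STDEV returns.
final class EntropyBands {

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    static double sampleStdev(double[] x) {
        if (x.length < 2) throw new IllegalArgumentException("need at least 2 values");
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) ss += (v - m) * (v - m);
        return Math.sqrt(ss / (x.length - 1));
    }

    // Returns {mean - 2sd, mean - sd, mean, mean + sd, mean + 2sd}.
    static double[] bands(double[] x) {
        double m = mean(x);
        double sd = sampleStdev(x);
        return new double[] {m - 2 * sd, m - sd, m, m + sd, m + 2 * sd};
    }

    public static void main(String[] args) {
        // Rebuild the D bands from the reported mean and STDEV.
        double m = 8.08132668, sd = 0.624594534;
        System.out.printf("D-2SD %.9f  D-SD %.9f  D+SD %.9f  D+2SD %.9f%n",
                m - 2 * sd, m - sd, m + sd, m + 2 * sd);
    }
}
```

Recomputing from the rounded mean and STDEV can differ from the table in the last digit (for example D-2STDEV), since the spreadsheet works from unrounded values.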
Now I'd like to make this graph again without the young and aged data.
disease: B2:B268
normal: B269:B347
--Rank of Measures (Which metrics of the AbStat measures provide the most information about healthy and disease states?)
--Range of Entropy (What is the typical range of entropy in healthy and disease states?)
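For the rank question, one simple approach is to order measures by their p-value for the normal vs. disease comparison and break ties by the magnitude of their SVM weight, since both are already tabulated for each dataset. This is a hypothetical sketch of that ordering, not an established AbStat feature; the class name, the measure names, and the numbers in `main` are placeholders, not results.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: order measures by p-value (most significant first),
// breaking ties by larger absolute SVM weight.
final class MeasureRanking {

    static final class Measure {
        final String name;
        final double pValue;
        final double svmWeight;

        Measure(String name, double pValue, double svmWeight) {
            this.name = name;
            this.pValue = pValue;
            this.svmWeight = svmWeight;
        }
    }

    static List<Measure> rank(List<Measure> measures) {
        List<Measure> sorted = new ArrayList<>(measures);
        Comparator<Measure> byP = Comparator.comparingDouble(m -> m.pValue);
        Comparator<Measure> byWeightDesc =
                Comparator.comparingDouble((Measure m) -> Math.abs(m.svmWeight)).reversed();
        sorted.sort(byP.thenComparing(byWeightDesc));
        return sorted;
    }

    public static void main(String[] args) {
        // Placeholder inputs for illustration only, not measured results.
        List<Measure> ms = List.of(
                new Measure("measureA", 0.20, 0.1),
                new Measure("measureB", 0.01, -0.8),
                new Measure("measureC", 0.01, 0.3));
        for (Measure m : rank(ms)) {
            System.out.println(m.name);
        }
    }
}
```

Ranking by p-value alone treats each measure independently, while SVM weights reflect the measures jointly; using one as a tiebreaker for the other is a design choice, and either could be used as the primary key.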