Automatic Array Alignment

2014-08-29

azim58 - Automatic Array Alignment

Basic Idea
A program that automatically aligns the id grid of spots with the
visual spots on the peptide arrays in CIM would be very beneficial. I am
fairly confident of a method that could align these spots. First, all of
the spots on the array would be found using DBSCAN. Next, for each id
spot y, the program would go through each DBSCAN spot and ask "Is this
spot the spot with an id of y?" An SVM would be used to answer this
question. The SVM would be trained on pre-aligned slides, so it would be
given both the DBSCAN data and the aligned gal file. The feature set for
spot y would include the positions of all the spots on the array and all
of the gal spot positions. If all of this data turns out to be too
computationally intensive, then a subset of spots could be chosen, such
as the 100 spots surrounding spot y as well as the 100 spots in the four
corners of the array. If the SVM recognizes a spot as having an id of y,
then the spot is given this id. After all of the spots have been checked,
the program would use a regression SVM to figure out where to place the
id spots that were not assigned to any recognized DBSCAN spot. When
everything is done, a gal file and a gpr file could be produced with
everything lined up. One additional feature that could be added later is
a module that looks for uneven splotches in the image and marks all of
the spots within a splotch as "bad". At the end of the day, there would
be a complete automatic alignment program.
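A minimal sketch of the first step (finding spots with DBSCAN), assuming the image has already been thresholded into a list of bright pixel coordinates. The function names, the eps and min_pts values, and the toy pixel data are all illustrative assumptions, not taken from the actual Java program.

```python
from collections import deque

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps * eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, 2, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # points[i] is a core point: new cluster
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point of this cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels

def spot_centers(points, labels):
    """Centroid of each cluster, ordered by cluster label; noise is ignored."""
    sums = {}
    for (x, y), label in zip(points, labels):
        if label == -1:
            continue
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return [(sx / n, sy / n) for _, (sx, sy, n) in sorted(sums.items())]

# Toy data (assumed): two 3x3 pixel blobs standing in for spots, plus one stray pixel.
blob_a = [(x, y) for x in range(0, 3) for y in range(0, 3)]
blob_b = [(x, y) for x in range(10, 13) for y in range(0, 3)]
pixels = blob_a + blob_b + [(30, 30)]

labels = dbscan(pixels, eps=1.5, min_pts=4)
centers = spot_centers(pixels, labels)
print(centers)
```

The cluster centroids would then serve as the "DBSCAN spots" that the SVM step asks about; the stray pixel is labeled as noise and produces no spot.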
^Some updated thoughts about this as of 12-12-2013d1630
Note that an SVM cannot handle missing features, so if any spots are missing from the array then the approach above will not work as described. Therefore, some strategy to deal with this situation would be necessary. Perhaps the DBSCAN data could be fed to an SVM in other ways that always give it an exact, fixed number of features. Several SVMs could be trained on different feature sets, and they could take a vote to decide whether something is the spot in question. For example, one could give the SVM: the x,y coordinates of the 10 nearest spots; how many total spots are to the right of the spot, to the left of it, above it, and below it (actually, this may not do well at distinguishing spots that are close to each other); or the positions of the 4 cornermost spots (actually, this is probably also not very good since there would only be 4 points).
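The fixed-length feature sets described above could be built roughly as follows. This is only a sketch: the function names are hypothetical, spots are assumed to be (x, y) centers in image coordinates with y growing downward (so "above" means smaller y), and the 4x4 grid is toy data.

```python
import math

def nearest_spot_features(spots, index, k=10):
    """Flat [x1, y1, ..., xk, yk] for the k spots nearest to spots[index],
    ordered from nearest to farthest. Always returns exactly 2*k numbers."""
    if len(spots) - 1 < k:
        # An SVM needs every feature present, so refuse rather than pad.
        raise ValueError("need at least k other spots for a fixed-length vector")
    target = spots[index]
    others = [s for i, s in enumerate(spots) if i != index]
    others.sort(key=lambda s: math.dist(s, target))
    features = []
    for x, y in others[:k]:
        features.extend((x, y))
    return features

def directional_counts(spots, index):
    """(left, right, above, below) counts of the other spots relative to
    spots[index]. Assumes image coordinates: y grows downward."""
    tx, ty = spots[index]
    others = [s for i, s in enumerate(spots) if i != index]
    left = sum(1 for x, _ in others if x < tx)
    right = sum(1 for x, _ in others if x > tx)
    above = sum(1 for _, y in others if y < ty)
    below = sum(1 for _, y in others if y > ty)
    return (left, right, above, below)

# Toy data (assumed): a 4x4 grid of spot centers with a pitch of 10 pixels.
grid = [(10 * col, 10 * row) for row in range(4) for col in range(4)]
candidate = 5  # the spot at (10, 10)

features = nearest_spot_features(grid, candidate, k=10)
counts = directional_counts(grid, candidate)
print(len(features), counts)
```

Both functions return the same number of values no matter how many spots DBSCAN found, which is the property the SVM needs. The nearest-spot vector still depends on the ordering of the neighbors.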
Giving the SVM the 10 nearest spots is probably the best solution, but I think multiple SVMs would have to be trained with all possible orderings of the 10 nearest spots, so that the collective group of SVMs is not restricted to assigning an identity to a spot only if "feature 1" is in an approximate location. If "feature 2" is in that approximate location instead, then the correct assignment would still be made by one of the SVMs in the group. In other words, "feature 1" can be at any one of the 10 spot locations. This is necessary because there could be an extra stain on the array which shifts the assignment of all the subsequent features. For example, if the 4th point is a stain, then the real 4th point would be the 5th, the real 5th point would be the 6th, etc. Likewise, instead of a stain which adds an extra spot, there could be missing spots, which would also change the normal order of the real spots. Therefore, it would be necessary to train multiple SVMs for the identity of a feature, with each SVM trained on a different ordering of the closest 10 spots. To account for every possible ordering of the 10 closest spots, 10! = 3,628,800 SVMs would need to be trained in order to assign the identity of one spot. Although this is a very large number, this approach may still be computationally feasible, since evaluating a single SVM with only 10 features is virtually instantaneous on a standard PC.
One of the beautiful aspects of this method is that the assignment of an identity to one feature is completely independent of the assignment of an identity to the other features. Therefore, even if there is a very large bad region on the array which makes it impossible to assign an identity to a large group of spots, this would not interfere with the correct assignment of spots in a different good region of the array.
For the spots that did not receive an assignment, a human could even go in and inspect these unassigned spots manually if desired.
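To make the ordering argument concrete, the short sketch below checks the 10! count and shows how a single stain shifts every later neighbor by one position. The spot names are made up for illustration only.

```python
import math
from itertools import permutations

k = 10
orderings_needed = math.factorial(k)  # number of orderings of 10 neighbors
print(orderings_needed)

# The same counting on a small case that can be enumerated directly: 3! = 6.
small_case = list(permutations(["s1", "s2", "s3"]))
print(len(small_case))

# Hypothetical neighbor list, nearest first, when the array is clean.
real_neighbors = [f"spot{i}" for i in range(1, 11)]

# A stain detected as the 4th-nearest "spot" pushes the real spots back by one,
# and the real 10th neighbor falls out of the 10-nearest list entirely.
with_stain = real_neighbors[:3] + ["stain"] + real_neighbors[3:]
observed = with_stain[:k]

for position, name in enumerate(observed, start=1):
    print(position, name)
```

In the observed list, position 4 holds the stain and position 5 holds the real 4th neighbor, so an SVM trained only on the clean ordering would see the wrong spot in features 4 through 10.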

The version of the program from before I handed it off to Nate is found here (I'm not exactly sure what changes he made)
"C:\Users\kurtw_000\Documents\kurt\storage\CIM Research Folder\kwhittem\Array Analysis Project\Java code\AutomaticSlideAlignment3 where I left Program to Nate"

The java and jar files used are also copied here

\\biofs.biodesign.asu.edu\CIM\Administration\Biostatistics\Kurt\Automatic Slide Alignment

Example images of the Automatic Array Alignment Program Nate and I worked
on
"S:\Research\Cancer_Eradication\Users\kwhittem\kwhittem\kwhittem\Array Analysis Project\Java code\AutomaticSlideAlignment4 Nate's Work\MicroarrayAnalyzer5 Nates Work\AutomaticSlideAlignmentEclipse2-26-10\output.html"
This program tried to align to a red reflect image, a copy of which is
found here
"C:\kurt\storage\CIM Research Folder\kwhittem\Array Analysis Project\Java code\AutomaticSlideAlignment3 where I left Program to Nate\zbig_red_reflect_one_block_cropped2.jpg"
Note that this html file shows that the correlation coefficient with
human is 0.92, but as I remember, the program did not update this number
in the html file. I clearly remember that when things were aligned as
well as they are in this image, the correlation coefficient was 0.94.
Also see the automatic array alignment program diagram here
"C:\kurt\storage\CIM Research Folder\DR\2012\12-12-12\Automatic Array Alignment program diagram\Automatic Array Alignment Program Diagram 12-12-12.svg"

===========================================================================
Literature
Automatic DNA microarray gridding based on Support Vector Machines
http://www.researchgate.net/publication/224355444_Automatic_DNA_microarray_gridding_based_on_Support_Vector_Machines
Unsupervised SVM-based gridding for DNA microarray images.
http://www.researchgate.net/publication/38057390_Unsupervised_SVM-based_gridding_for_DNA_microarray_images

===========================================================================
Kevin did some work to automatically align the array, but he said that he
did not get that far. Nevertheless, his stuff can be found here:

S:\Administration\software\custom software\Slide Align
and here
C:\kurt\storage\CIM Research Folder\DR\2012\11-28-12\Kevin Slide Alignment\SlideAlign