azim58 - my annotations from paper 4-16-13 dot 0909
Bioinformatic requirements for protein database searching using predicted epitopes from disease-associated antibodies

Page: 1
Author: Kurt Whittemore
Subject: The E-MAP process first involves reconstruction of a predicted epitope using a peptide combinatorial library. We then search the protein database for closely matching amino acid sequences.
Date: 2013-04-14 14:38:51-07

Page: 1
Author: Kurt Whittemore
Subject: 1) short predicted epitopes yield too many irrelevant matches from a database search and 2) the epitopes may not accurately represent the native antigen with sufficient fidelity.
Date: 2013-04-14 14:39:24-07

Page: 1
Author: Kurt Whittemore
Subject: We find that epitopes generally need to have at least seven amino acids, with an overall accuracy of >70% to the native protein, in order to correctly identify the protein in a nonredundant protein database search.
Date: 2013-04-14 14:40:34-07

Page: 1
Author: Kurt Whittemore
Subject: we demonstrate the efficacy of paired epitope searches
Date: 2013-04-14 14:40:52-07

Page: 1
Author: Kurt Whittemore
Subject: It has not previously been possible, however, to use these epitope motifs to identify unknown target proteins from the entire protein database.
Date: 2013-04-14 14:45:33-07

Page: 1
Author: Kurt Whittemore
Subject: A short motif (4–6 amino acids) does not possess enough information content to uniquely identify a candidate antigen in broad bioinformatic searches. The list of search results retrieved from the nonredundant (nr) database is usually large, with hundreds or thousands of hits effectively burying the true matching protein in the noise of extraneous results.
Date: 2013-04-14 14:46:17-07

Page: 1
Author: Kurt Whittemore
Subject: E-MAP is for identifying proteins that are immunoreactive with an antibody without any prior information as to the identity of that protein.
Date: 2013-04-14 14:47:31-07

Page: 1
Author: Kurt Whittemore
Subject: PSSM, position-specific scoring matrix;
Date: 2013-04-14 15:05:10-07

Page: 2
Author: Kurt Whittemore
Subject: Sticky note
Date: 2013-04-14 14:49:54-07
etiology: the cause or origin of a disease

Page: 2
Author: Kurt Whittemore
Subject: Phage libraries contained rationally designed combinatorial libraries of peptide sequences inserted into the N terminus of the cpIII minor coat protein of the M13 bacteriophage. The libraries were supplied by Dyax Corp. (Cambridge, MA). The libraries termed TN6 and TN10 contained two conserved cysteine residues separated respectively by four or eight amino acids. The cysteines formed a disulfide bridge, creating a conformationally constrained ring (10).
Date: 2013-04-14 14:51:46-07

Page: 2
Author: Kurt Whittemore
Subject: This negative depletion step removes phage that may bind to constant regions of mouse IgG.
Date: 2013-04-14 14:52:57-07

Page: 2
Author: Kurt Whittemore
Subject: Phage particles that bound to the mAb-coated beads were eluted with 0.1 mol/L glycine-HCl (pH 2.2) containing 1 g/L bovine serum albumin
Date: 2013-04-14 14:54:02-07

Page: 2
Author: Kurt Whittemore
Subject: The variable regions of the inserts were transcribed into the FASTA form and submitted to MEME (Multiple EM for Motif Elicitation, available at meme.sdsc.edu/meme).
Date: 2013-04-14 14:58:51-07

Page: 2
Author: Kurt Whittemore
Subject: To carry out bioinformatic searches using a single motif, the PSSM was submitted to the MAST (Motif-Alignment and Search Tool) utility (meme.sdsc.edu/meme), to be searched against the nr protein database while allowing a maximal E value (expectation value). The first 500 hits were then screened for the presence of the known target.
Date: 2013-04-14 15:00:43-07

Page: 3
Author: Kurt Whittemore
Subject: The E value can then be thought to represent the expected number of sequences in a random database of equal size that would match the motif(s) at least as well.
Date: 2013-04-14 15:03:17-07

Page: 3
Author: Kurt Whittemore
Subject: short sequences of predefined length N were selected randomly from the NCBI nr protein sequence database.
Date: 2013-04-14 19:10:28-07

Page: 3
Author: Kurt Whittemore
Subject: These sequences were then used to construct a position specific probability matrix, with the degree of residue conservation at each position perturbed by a Gaussian function around the average conservation, C.
Date: 2013-04-14 19:10:59-07

Page: 3
Author: Kurt Whittemore
Subject: Sequence Generation
Date: 2013-04-14 19:12:09-07

Page: 3
Author: Kurt Whittemore
Subject: zoops model and user-defined restriction of the motif length.
Date: 2013-04-14 19:13:05-07

Page: 3
Author: Kurt Whittemore
Subject: For pairwise searches, we set the MAST threshold expectation value (E value) to 10 and the threshold value for motif display to p 0.0001.
Date: 2013-04-14 19:15:11-07

Page: 3
Author: Kurt Whittemore
Subject: Sticky note
Date: 2013-04-14 19:17:15-07
Does this represent all amino acids?

Page: 3
Author: Kurt Whittemore
Subject: MEME creates a consensus motif profile, capturing each phage clone’s sequence information in PSSM, a two-dimensional numeric array.
Date: 2013-04-14 19:19:12-07

Page: 3
Author: Kurt Whittemore
Subject: Using such a profile in a bioinformatic search offers distinct advantages. Instead of searching with a single “best-guess” query representing the dominant motif, the queried profile considers a larger number of combinatorially weighted sequences, averaging around the dominant motif.
Date: 2013-04-14 19:19:51-07

Page: 4
Author: Kurt Whittemore
Subject: The combined statistical power of a pairwise search, however, is sufficient to narrow the list to a small number of antigen candidates. We use the MAST (Motif Alignment and Search Tool) utility (16, 17) to perform single and pairwise motif searches against the nr protein database.
Date: 2013-04-14 19:21:28-07

Page: 4
Author: Kurt Whittemore
Subject: There are two important variables for identifying proteins from only short linear epitopes: 1) the length of the epitope and 2) the fidelity with which the predicted epitope matches the actual sequence in the protein database (average motif conservation).
Date: 2013-04-14 22:06:32-07

Page: 4
Author: Kurt Whittemore
Subject: For the process to yield accurate database matches, how long and accurate must the predicted epitope be?
Date: 2013-04-14 22:06:54-07

Page: 5
Author: kwhittem
Subject: Highlight
Date: 2013-04-15 10:26:32-07
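
Side note on the page 1 and page 4 highlights about epitope length and fidelity: a rough back-of-the-envelope sketch of why short motifs drown in chance hits. It assumes all 20 amino acids are equally likely and independent, and counts how many database positions would match a peptide with at most a given number of substitutions. The function name, the mismatch model, and the database size of 1e9 residues are illustrative assumptions of mine, not values or methods from the paper; the E value that MAST reports is computed differently.

```python
from math import comb

# Assumption: 20 standard amino acids, equally frequent and independent.
ALPHABET_SIZE = 20


def expected_chance_hits(length: int, max_mismatches: int, db_residues: float) -> float:
    """Expected number of database positions that match a peptide of
    `length` residues with at most `max_mismatches` substitutions,
    under a uniform random-sequence model."""
    # Number of distinct peptides within the allowed Hamming distance of the query.
    neighbours = sum(
        comb(length, i) * (ALPHABET_SIZE - 1) ** i
        for i in range(max_mismatches + 1)
    )
    # Probability that one random position matches any of those peptides.
    p_match = neighbours / ALPHABET_SIZE ** length
    return db_residues * p_match


if __name__ == "__main__":
    DB_RESIDUES = 1e9  # hypothetical database size, not a figure from the paper
    for length in range(4, 11):
        exact = expected_chance_hits(length, 0, DB_RESIDUES)
        one_off = expected_chance_hits(length, 1, DB_RESIDUES)
        print(f"{length:2d} residues: exact ~{exact:.3g}, <=1 mismatch ~{one_off:.3g}")
```

Under this simplified model each extra residue cuts the expected chance hits by a factor of 20, while each tolerated mismatch raises them again, which is the length versus fidelity trade-off the highlights describe.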