PCR to screen for selected frameshifts and gene fusions 4-11-13
2015-01-13azim58 - PCR to screen for selected frameshifts and gene fusions 4-11-13
Chimeric transcript primer order form 4-18-13
I would like to screen two pools (each pool consisting of 1/2 of the 41
pools from the PCR plates) for SMC1fs as well as the 13 validated mouse
chimeric transcripts from Hojoon.
SMC1 WT Size: 464 bp
SMC1fs Size: 162 bp
Primers
SMC1-mou-RV: GAGCTGTCCTCTCCTTG
SMC1-mou-FWD: CTGTCATGGGTTTCCTG
see Primers for amplifying SMC1fs
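For reading the gel later, the two expected product sizes can be turned into a small band-calling helper. This is only a sketch: the function name `call_smc1_band` and the 10% size tolerance are my own assumptions, not part of any protocol.

```python
# Expected SMC1 PCR product sizes from the notes above (bp).
SMC1_WT_BP = 464
SMC1_FS_BP = 162

def call_smc1_band(observed_bp, tolerance=0.10):
    """Label an estimated band size as WT, SMC1fs, or unexpected.

    tolerance is a hypothetical allowance for gel size-estimation error,
    expressed as a fraction of the expected size.
    """
    for label, expected in (("WT", SMC1_WT_BP), ("SMC1fs", SMC1_FS_BP)):
        if abs(observed_bp - expected) <= tolerance * expected:
            return label
    return "unexpected"

# The frameshift product should run this much shorter than the WT product.
size_difference = SMC1_WT_BP - SMC1_FS_BP

print(size_difference)       # 302
print(call_smc1_band(170))   # SMC1fs
print(call_smc1_band(450))   # WT
```

With a 302 bp difference the two products should be easy to tell apart on a gel, so a loose tolerance is probably fine.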
Need to find 13 validated mouse chimeric transcript primers
see List of Chimeric Transcripts from Hojoon
primers found here
C:\kurt\storage\CIM Research Folder\DR\2013\4-11-13\genefusion
info\Mouse_Human_primers_for_mouse.xls
"C:\kurt\storage\CIM Research
Folder\DR\2012\11-18-12\Dissertations\Hojoon Lee Dissertation\Table 3p3
from Hojoon's dissertation.xlsx"
First I need to find the ID that matches with the fusion in Table 3.3.
Then I need to find the primers that match with this ID.
Fusions in Table 3.3: "C:\kurt\storage\CIM Research
Folder\DR\2012\11-18-12\Dissertations\Hojoon Lee Dissertation\Table 3p3
from Hojoon's dissertation.xlsx"
IDs can be found here:
"C:\kurt\storage\CIM Research Folder\DR\2013\4-11-13\genefusion
info\Total_FG_peptide_annotation2-1.xlsx"
Not all of the IDs could be found in this file
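The two-step lookup (fusion name to ID, then ID to primers) could be sketched as below. The dictionaries stand in for the spreadsheets and only hold the one pairing that is spelled out later in these notes (Cnpy2-Cs, TR_E10150, and Hojoon's original primers for it); the function name is hypothetical.

```python
# Step 1: fusion name -> annotation ID (stand-in for the annotation spreadsheet).
fusion_to_id = {
    "Cnpy2-Cs": "TR_E10150",
}

# Step 2: annotation ID -> (forward, reverse) primers (stand-in for the primer file).
id_to_primers = {
    "TR_E10150": ("AGCGCTCTGGAGAATCCCG", "CCACCACCGTCTTGCCATG"),
}

def primers_for_fusion(fusion):
    """Return (forward, reverse) primers for a fusion, or None if either step fails."""
    tr_id = fusion_to_id.get(fusion)
    if tr_id is None:
        return None  # the ID could not be found for this fusion
    return id_to_primers.get(tr_id)

for fusion in ("Cnpy2-Cs", "Samd5 + Sash1"):
    print(fusion, primers_for_fusion(fusion))
```

Returning `None` rather than raising makes it easy to list every fusion whose ID or primers are still missing.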
These chimeric transcripts
Rnf139 + Ndufb9
Lats2 + Xpo4
Mia1 + Rab4b
were here
S:\Research\Cancer_Eradication\Candidate_List\96Pep_chip_0820_2009.txt
I need to find a few more transcripts
Tmem170 + Cfdp1
Slc35a3 + Hiat1
Nos1ap + EG665574
Samd5 + Sash1
I should search for these here
S:\Research\Cancer_Eradication\
but not here
S:\Research\Cancer_Eradication\file swap folder
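A search like this (everything under one folder except one subfolder) could be scripted roughly as follows. It is a sketch under assumptions: the function name is made up, and it does a plain byte search, so it will not see text inside compressed formats such as .xlsx.

```python
import os

def find_files_containing(root, term, exclude_dirs=()):
    """Yield paths under root whose raw contents contain term (case-insensitive)."""
    excluded = {os.path.normcase(os.path.abspath(d)) for d in exclude_dirs}
    needle = term.lower().encode("utf-8")
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded folders in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if os.path.normcase(os.path.abspath(os.path.join(dirpath, d))) not in excluded
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if needle in handle.read().lower():
                        yield path
            except OSError:
                continue  # unreadable file; skip it

if __name__ == "__main__":
    root = r"S:\Research\Cancer_Eradication"
    skip = [os.path.join(root, "file swap folder")]
    for term in ("Cfdp1", "Hiat1", "Sash1"):
        for hit in find_files_containing(root, term, skip):
            print(term, hit)
```

Reading each file whole is simple but slow on a large share; for big files a chunked read would be gentler on memory.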
file information for Sash1 search
"C:\kurt\storage\CIM Research Folder\DR\2013\4-12-13\searching
transcripts\file information for Sash1 search 4-12-13.txt"
file information for Cfdp1 search
"C:\kurt\storage\CIM Research Folder\DR\2013\4-12-13\searching
transcripts\Cfdp1 search.txt"
I'm kind of stuck for now.
I'll email Hojoon on 4-13-13 about this.
===========================================================================
Nos1ap + EG665574
seems to correlate with
NOS1AP_^Exon10_FLJ13137_^Exon1, TR_E10176
in
"C:\kurt\storage\CIM Research Folder\DR\2013\4-11-13\genefusion
info\Total_FG_peptide_annotation2-1.xlsx"
because the accession NM_001085375.1 matches the Homo sapiens C1orf226
nucleotide sequence, and a search for EG665574 also matches the same
C1orf226 in the HomoloGene database
Still unmatched:
Tmem170 + Cfdp1
Slc35a3 + Hiat1
Samd5 + Sash1
I could try to retrieve a batch of accessions using Batch Entrez
http://www.ncbi.nlm.nih.gov/sites/batchentrez
used these accessions
"C:\kurt\storage\CIM Research Folder\DR\2013\4-13-13\transcript
search\accessions_4-13-13.txt"
obtained these results
"C:\kurt\storage\CIM Research Folder\DR\2013\4-13-13\transcript
search\accession_search_nucleotide database.gb"
and
"C:\kurt\storage\CIM Research Folder\DR\2013\4-13-13\transcript
search\accession_search_gene_database.txt"
but neither of these files contained these search terms
Tmem170 + Cfdp1
Slc35a3 + Hiat1
Samd5 + Sash1
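Batch Entrez is a web form; the same kind of retrieval could probably be scripted through NCBI's E-utilities `efetch` endpoint. The sketch below only builds the request URL and checks a downloaded text for search terms. The function names are hypothetical, the accessions are just ones that appear in these notes (not the contents of accessions_4-13-13.txt), and `fetch_genbank` needs network access, so it is defined but not called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def build_efetch_url(accessions, db="nuccore", rettype="gb"):
    """Build an efetch URL that requests GenBank text for a list of accessions."""
    query = urlencode({
        "db": db,
        "id": ",".join(accessions),
        "rettype": rettype,
        "retmode": "text",
    })
    return EFETCH_URL + "?" + query

def fetch_genbank(accessions):
    """Download the records as text (requires network access)."""
    with urlopen(build_efetch_url(accessions)) as response:
        return response.read().decode("utf-8")

def terms_missing_from(text, terms):
    """Return the search terms that do not occur in text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in terms if term.lower() not in lowered]

url = build_efetch_url(["NM_001085375.1", "NM_014255.4", "NM_004077.2"])
print(url)
print(terms_missing_from("Homo sapiens C1orf226", ["Cfdp1", "C1orf226"]))
```

The `terms_missing_from` check mirrors what was done by hand with the two result files.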
===========================================================================
4-16-13
Alright, Hojoon sent me a spreadsheet that looks like it has what I will
need.
"C:\kurt\storage\CIM Research
Folder\DR\2013\4-16-13\64_primer_fusion_for_mouse.xlsx"
Now I can order the primers fresh for the 13 chimeric transcripts that I
would like to screen in my tumor cDNA library.
Since I can't find the Cnpy2-Cs annotation, I think I will just design
these primers myself.
First accession of transcript
NM_014255.4
Second accession of transcript
NM_004077.2
Cnpy2_section_protein_sequence
MKGWGWLALLLGALLGTAWARRSQDLHCGACRALVDELEWEIAQVDPKKTIQMGSFRINPDGSQSVVE
Cs_section_protein_sequence
NASCLVLAARHASASSTNLKDILADLIPKEQARIKTFRQQHGKTVVGQITVDMMYGGMRGMKGLVYETSVLDPDE
GIRFRGFSIPECQKLLPKAKGGEEPLPEGLFWLLVTGHIPTEEQVSWLSKEWAKRAALPSHVVTMLDNFPTNLHP
MSQLSAAVTALNSESNFARAYAQGISRTKYWELIYEDSMDLIAKLPCVAAKIYRNLYREGSGIGAIDSNLDWSHN
FTNMLGYTDHQFTELTRLYLTIHSDHEGGNVSAHTSHLVGSALSDPYLSFAAAMNGLAGPLHGLANQEVLVWLTQ
LQKEVGKDVSDEKLRDYIWNTLNSGRVVPGYGHAVLRKTDPRYTCQREFALKHLPNDPMFKLVAQLYKIVPNVLL
EQGKAKNPWPNVDAHSGVLLQYYGMTEMNYYTVLFGVSRALGVLAQLIWSRALGFPLERPKSMSTEGLMKFVDSK
SG
CNPY2_nucleotide_seq
ctggactgctcgctggccggcagcgcaccgttttgaaggtcctagcccacctgggctggctcacgcgcacgacta
gccgctcccatacagcacgcccggactctgtcgtcgcttaaggccactcctattctacggctgacccctggtggt
cacgtggatctgttcgccacgcaagtctgggtccttcggcgattgaccggggtccttgctgttcgggagcctctc
ctaagctgcctgttcgcgcgagagtttggaggggcgggtttggggtcggtgtctgattggggctcgcaccgcagc
acgctggagtcccgcttaggtaccagttagcgtcaggggagctgggtcaggcggtcgccgggacaccccgtgtgt
ggcaggcggcgaagcgctctggagaatcccggacagccctgctccctgcagccaggtgtagtttcgggagccact
ggggccaaagtgagagtccagcggtcttccagcgcttgggccacggcggcggccctgggagcagaggtggagcga
ccccattacgctaaagatgaaaggctggggttggctggccctgcttctgggggccctgctgggaaccgcctgggc
tcggaggagccaggatctccactgtggagcatgcagggctctggtggatgaactagaatgggaaattgcccaggt
ggaccccaagaagaccattcagatgggatctttccggatcaatccagatggcagccagtcagtggtggaggtgcc
ttatgcccgctcagaggcccacctcacagagctgctggaggagatatgtgaccggatgaaggagtatggggaaca
gattgatccttccacccatcgcaagaactacgtacgtgtagtgggccggaatggagaatccagtgaactggacct
acaaggcatccgaatcgactcagatattagcggcaccctcaagtttgcgtgtgagagcattgtggaggaatacga
ggatgaactcattgaattcttttcccgagaggctgacaatgttaaagacaaactttgcagtaagcgaacagatct
ttgtgaccatgccctgcacatatcgcatgatgagctatgaaccactggagcagcccacactggcttgatggatca
cccccaggaggggaaaatggtggcaatgccttttatatattatgtttttactgaaattaactgaaaaaatatgaa
accaaaagt
- chimeric transcript protein start 542 AAG^ATGA
- chimeric transcript protein end 746 GAG^GTG
- cnpy2 chimeric transcript portion
CTGGACTGCTCGCTGGCCGGCAGCGCACCGTTTTGAAGGTCCTAGCCCACCTGGGCTGGCTCACGCGCACGACTA
GCCGCTCCCATACAGCACGCCCGGACTCTGTCGTCGCTTAAGGCCACTCCTATTCTACGGCTGACCCCTGGTGGT
CACGTGGATCTGTTCGCCACGCAAGTCTGGGTCCTTCGGCGATTGACCGGGGTCCTTGCTGTTCGGGAGCCTCTC
CTAAGCTGCCTGTTCGCGCGAGAGTTTGGAGGGGCGGGTTTGGGGTCGGTGTCTGATTGGGGCTCGCACCGCAGC
ACGCTGGAGTCCCGCTTAGGTACCAGTTAGCGTCAGGGGAGCTGGGTCAGGCGGTCGCCGGGACACCCCGTGTGT
GGCAGGCGGCGAAGCGCTCTGGAGAATCCCGGACAGCCCTGCTCCCTGCAGCCAGGTGTAGTTTCGGGAGCCACT
GGGGCCAAAGTGAGAGTCCAGCGGTCTTCCAGCGCTTGGGCCACGGCGGCGGCCCTGGGAGCAGAGGTGGAGCGA
CCCCATTACGCTAAAGATGAAAGGCTGGGGTTGGCTGGCCCTGCTTCTGGGGGCCCTGCTGGGAACCGCCTGGGC
TCGGAGGAGCCAGGATCTCCACTGTGGAGCATGCAGGGCTCTGGTGGATGAACTAGAATGGGAAATTGCCCAGGT
GGACCCCAAGAAGACCATTCAGATGGGATCTTTCCGGATCAATCCAGATGGCAGCCAGTCAGTGGTGGAG
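One way to sanity-check the boundaries of this portion is to translate the nucleotides just after the start mark and just before the end mark and compare them with the Cnpy2 section protein sequence. A minimal sketch, assuming the standard genetic code; the two short nucleotide strings are copied from the sequence above, and the helper names are my own.

```python
# Standard genetic code, built from the usual compact TCAG-ordered table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate DNA in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# First 39 nt after the AAG^ATGA start mark, and last 15 nt before GAG^GTG.
coding_start = "ATGAAAGGCTGGGGTTGGCTGGCCCTGCTTCTGGGGGCC"
coding_end = "CAGTCAGTGGTGGAG"

cnpy2_section_protein = (
    "MKGWGWLALLLGALLGTAWARRSQDLHCGACRALVDELEWEIAQVDPKKTIQMGSFRINPDGSQSVVE"
)

print(translate(coding_start))  # MKGWGWLALLLGA
print(translate(coding_end))    # QSVVE
print(cnpy2_section_protein.startswith(translate(coding_start)))
print(cnpy2_section_protein.endswith(translate(coding_end)))
```

Both ends agree with the protein sequence, which supports the start and end marks for the Cnpy2 part.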
Cs_nucleotide_seq
agtgggcggggcctccttgaggaccccgggctgggcgccgccgccggttcgtctactctttccttcagccgcctc
ctttcaaccttgtcaacccgtcggcgcggcctctggtgcagcggcggcggctcctgttcctgccgcagctctctc
cctttcttacctccccaccagatcccggagatcgcccgccatggctttacttactgcggccgcccggctcttggg
aaccaagaatgcatcttgtcttgttcttgcagcccggcatgccagtgcttcctccacgaatttgaaagacatatt
ggctgacctgatacctaaggagcaggccagaattaagactttcaggcagcaacatggcaagacggtggtgggcca
aatcactgtggacatgatgtatggtggcatgagaggcatgaagggattggtctatgaaacatcagttcttgatcc
tgatgagggcatccgtttccgaggctttagtatccctgaatgccagaaactgctacccaaggctaagggtgggga
agaacccctgcctgagggcttattttggctgctggtaactggacatatcccaacagaggaacaggtatcttggct
ctcaaaagagtgggcaaagagggcagctctgccttcccatgtggtcaccatgctggacaactttcccaccaatct
acaccccatgtctcagctcagtgcagctgttacagccctcaacagtgaaagtaactttgcccgagcatatgcaca
gggtatcagccgaaccaagtactgggagttgatttatgaagactctatggatctaatcgcaaagctaccttgtgt
tgcagcaaagatctaccgaaatctctacagagaaggcagcggtattggggccattgactctaacctggactggtc
tcacaatttcaccaacatgttaggctatactgatcatcagttcactgagctcacgcgcctgtacctcaccatcca
cagtgaccatgagggtggcaatgtaagtgcccataccagccatttggtgggcagtgccctttccgacccttacct
gtcctttgcagcagccatgaacgggctggcagggcctctccatggactggcaaatcaggaagtgcttgtctggct
aacacagctgcagaaggaagttggcaaagatgtgtcagatgagaagttacgagactacatctggaacacactcaa
ctcaggacgggttgttccaggctatggccatgcagtactaaggaagactgatccgcgatatacctgtcagcgaga
gtttgctctgaaacacctgcctaatgaccccatgtttaagttggttgctcagctgtacaagattgtgcccaatgt
cctcttagagcagggtaaagccaagaatccttggcccaatgtagatgctcacagtggggtgctgctccagtatta
tggcatgacggagatgaattactacacggtcctgtttggggtgtcacgagcattgggtgtactggcacagctcat
ctggagccgagccttaggcttccctctagaaaggcccaagtccatgagcacagagggtctgatgaagtttgtgga
ctctaagtcagggtaaaactggagactgggtgaaagtgactaccagaaagtgaggaagcctaaataaaaagtata
cttttgtttcagggggcctttaaagacttaagattaaattatatctgaggcactgataatatgtttgaggttaaa
atataaattaagactttaaaagatgaaaaatggtcccttcttccctaatcagctcccttcccctgcctggtatga
gttgcccatcatacgcatggtcctggaggatgaccaggactaatgcatgtggtatgagtaggtttggccccctca
ctatctctagagtgagaatctggctcctgtttccatgggtcaaagccggttgcagagaatctgtagtcactttgg
agctttagcttctctgccaagccctcaataagccagcaaaccaggactctgccccttctgtttccataggaatca
tgttggatagtcagctgtaccaagccccttggccctctcccatgcacacaaacacctcctagcaagacctgttgg
ttagctggacatgctttggcaatttttttatactaccaagtgaccataaaggcatggcatttgttgtgactggca
cccaatgtttgattttttttttaaaactatccaattaaaattaaggtctgggagtgttctgtttcccattacttt
aatactcacctcctcccagactttctacacctgttgcacctcaggcagaggatgttctggacctccccctcttgg
tccctactagagacctctcaacagatctgtgggcccagtcattgggttttatcagtgcttaatgtgaactaagtt
ttttacttccacagaatacaagccactaccttctgacctccccaccccccaccaacccccatcttttaatatgct
gtggggcatagaactccggaatgaccagcatgatattttcagagtcttgtccccggggtattagcacctcttttt
gaacagggaattgattcaagattggacatggtctcctctgattatcaggtactggggctgagggcattaaaaata
gtaagcctccctcctcgtcccctgcctcaagaaattgcctccttatttatcaacatctttttcctccctttccct
gagagctcacagtacaatgtttcagaagccccatttgcacaggttttcagcaactcagaatgctctacttctttt
tctttgagaaaggattaagatacactcctgctgtgcccccatctttcctccaaactcctgcctgtgtttgtgtgg
atacccagtcccagaaccacactgttgagttggacacactgtaaacccctgggtaactgtcaagtcatgatggag
acttcaggttgttctgtataaaatgcaaaataaatgtttttattaacaatgaaaaaaaaaaaaaaaaaaaaa
- chimeric transcript protein start 233 CAAG^AAT
- chimeric transcript protein end 1589 GGG^TA
- cs chimeric transcript portion
AATGCATCTTGTCTTGTTCTTGCAGCCCGGCATGCCAGTGCTTCCTCCACGAATTTGAAAGACATATTGGCTGAC
CTGATACCTAAGGAGCAGGCCAGAATTAAGACTTTCAGGCAGCAACATGGCAAGACGGTGGTGGGCCAAATCACT
GTGGACATGATGTATGGTGGCATGAGAGGCATGAAGGGATTGGTCTATGAAACATCAGTTCTTGATCCTGATGAG
GGCATCCGTTTCCGAGGCTTTAGTATCCCTGAATGCCAGAAACTGCTACCCAAGGCTAAGGGTGGGGAAGAACCC
CTGCCTGAGGGCTTATTTTGGCTGCTGGTAACTGGACATATCCCAACAGAGGAACAGGTATCTTGGCTCTCAAAA
GAGTGGGCAAAGAGGGCAGCTCTGCCTTCCCATGTGGTCACCATGCTGGACAACTTTCCCACCAATCTACACCCC
ATGTCTCAGCTCAGTGCAGCTGTTACAGCCCTCAACAGTGAAAGTAACTTTGCCCGAGCATATGCACAGGGTATC
AGCCGAACCAAGTACTGGGAGTTGATTTATGAAGACTCTATGGATCTAATCGCAAAGCTACCTTGTGTTGCAGCA
AAGATCTACCGAAATCTCTACAGAGAAGGCAGCGGTATTGGGGCCATTGACTCTAACCTGGACTGGTCTCACAAT
TTCACCAACATGTTAGGCTATACTGATCATCAGTTCACTGAGCTCACGCGCCTGTACCTCACCATCCACAGTGAC
CATGAGGGTGGCAATGTAAGTGCCCATACCAGCCATTTGGTGGGCAGTGCCCTTTCCGACCCTTACCTGTCCTTT
GCAGCAGCCATGAACGGGCTGGCAGGGCCTCTCCATGGACTGGCAAATCAGGAAGTGCTTGTCTGGCTAACACAG
CTGCAGAAGGAAGTTGGCAAAGATGTGTCAGATGAGAAGTTACGAGACTACATCTGGAACACACTCAACTCAGGA
CGGGTTGTTCCAGGCTATGGCCATGCAGTACTAAGGAAGACTGATCCGCGATATACCTGTCAGCGAGAGTTTGCT
CTGAAACACCTGCCTAATGACCCCATGTTTAAGTTGGTTGCTCAGCTGTACAAGATTGTGCCCAATGTCCTCTTA
GAGCAGGGTAAAGCCAAGAATCCTTGGCCCAATGTAGATGCTCACAGTGGGGTGCTGCTCCAGTATTATGGCATG
ACGGAGATGAATTACTACACGGTCCTGTTTGGGGTGTCACGAGCATTGGGTGTACTGGCACAGCTCATCTGGAGC
CGAGCCTTAGGCTTCCCTCTAGAAAGGCCCAAGTCCATGAGCACAGAGGGTCTGATGAAGTTTGTGGACTCTAAG
TCAGGGTAAAACTGGAGACTGGGTGAAAGTGACTACCAGAAAGTGAGGAAGCCTAAATAAAAAGTATACTTTTGT
TTCAGGGGGCCTTTAAAGACTTAAGATTAAATTATATCTGAGGCACTGATAATATGTTTGAGGTTAAAATATAAA
TTAAGACTTTAAAAGATGAAAAATGGTCCCTTCTTCCCTAATCAGCTCCCTTCCCCTGCCTGGTATGAGTTGCCC
ATCATACGCATGGTCCTGGAGGATGACCAGGACTAATGCATGTGGTATGAGTAGGTTTGGCCCCCTCACTATCTC
TAGAGTGAGAATCTGGCTCCTGTTTCCATGGGTCAAAGCCGGTTGCAGAGAATCTGTAGTCACTTTGGAGCTTTA
GCTTCTCTGCCAAGCCCTCAATAAGCCAGCAAACCAGGACTCTGCCCCTTCTGTTTCCATAGGAATCATGTTGGA
TAGTCAGCTGTACCAAGCCCCTTGGCCCTCTCCCATGCACACAAACACCTCCTAGCAAGACCTGTTGGTTAGCTG
GACATGCTTTGGCAATTTTTTTATACTACCAAGTGACCATAAAGGCATGGCATTTGTTGTGACTGGCACCCAATG
TTTGATTTTTTTTTTAAAACTATCCAATTAAAATTAAGGTCTGGGAGTGTTCTGTTTCCCATTACTTTAATACTC
ACCTCCTCCCAGACTTTCTACACCTGTTGCACCTCAGGCAGAGGATGTTCTGGACCTCCCCCTCTTGGTCCCTAC
TAGAGACCTCTCAACAGATCTGTGGGCCCAGTCATTGGGTTTTATCAGTGCTTAATGTGAACTAAGTTTTTTACT
TCCACAGAATACAAGCCACTACCTTCTGACCTCCCCACCCCCCACCAACCCCCATCTTTTAATATGCTGTGGGGC
ATAGAACTCCGGAATGACCAGCATGATATTTTCAGAGTCTTGTCCCCGGGGTATTAGCACCTCTTTTTGAACAGG
GAATTGATTCAAGATTGGACATGGTCTCCTCTGATTATCAGGTACTGGGGCTGAGGGCATTAAAAATAGTAAGCC
TCCCTCCTCGTCCCCTGCCTCAAGAAATTGCCTCCTTATTTATCAACATCTTTTTCCTCCCTTTCCCTGAGAGCT
CACAGTACAATGTTTCAGAAGCCCCATTTGCACAGGTTTTCAGCAACTCAGAATGCTCTACTTCTTTTTCTTTGA
GAAAGGATTAAGATACACTCCTGCTGTGCCCCCATCTTTCCTCCAAACTCCTGCCTGTGTTTGTGTGGATACCCA
GTCCCAGAACCACACTGTTGAGTTGGACACACTGTAAACCCCTGGGTAACTGTCAAGTCATGATGGAGACTTCAG
GTTGTTCTGTATAAAATGCAAAATAAATGTTTTTATTAACAATGAAAAAAAAAAAAAAAAAAAAA
- the whole Cnpy2-Cs chimeric transcript (note that I am not completely
sure that the sequence I have here after the protein coding part is correct.
However, at least the protein coding part from Hojoon's table
("C:\kurt\storage\CIM Research Folder\DR\2013\4-11-13\genefusion
info\Total_FG_peptide_annotation2-1.xlsx") should be correct)
"C:\kurt\storage\CIM Research Folder\DR\2013\4-18-13\chimeric
transcript\Cnpy2_Cs chimeric transcript info 4-18-13.ape"
need to design forward primer for TR_E10150
ATGAAAGGCTGGGGTTGGCTG
Length: 21
Tm: 56.2
Hairpin Tm (none)
need to design reverse primer for TR_E10150
GTTTGTGGACTCTAAGTCAGGGT
Length: 23
Tm: 55.2
Hairpin Tm (none)
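A quick script can double-check the primer lengths and their orientation. As written, the reverse primer reads in the same direction as the Cs sequence listed above, so the oligo that anneals to the opposite strand would be its reverse complement; whether the order form already accounts for this is something to confirm. This is a sketch with helper names of my own, and the Cs excerpt is a short stretch copied from the end of the coding region.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

forward = "ATGAAAGGCTGGGGTTGGCTG"
reverse_as_listed = "GTTTGTGGACTCTAAGTCAGGGT"

# Short stretch of the Cs sequence around the stop codon, as listed above.
cs_excerpt = "gaagtttgtggactctaagtcagggtaaaactgg".upper()

print(len(forward), len(reverse_as_listed))   # 21 23
print(round(gc_percent(forward), 1), round(gc_percent(reverse_as_listed), 1))
print(reverse_as_listed in cs_excerpt)        # True: same strand as the listing
print(reverse_complement(reverse_as_listed))  # ACCCTGACTTAGAGTCCACAAAC
```

The melting temperatures are not recomputed here because different Tm formulas give different values.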
location of the Sequencher file to verify everything is facing the correct
direction
"C:\kurt\storage\CIM Research Folder\DR\2013\4-18-13\chimeric
transcript\Cnpy2-Cs chimeric transcript.SPF"
Expected length of amplified chimeric transcript
1561 bp
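The 1561 bp figure can be reproduced from the junction positions recorded above. This is a sketch under my reading of those positions: each coding stretch is taken as the difference between its end and start marks, and one extra base is added because the reverse primer ends in GGGT, one T past the final GGG codon.

```python
# Junction positions as recorded in the notes.
cnpy2_start, cnpy2_end = 542, 746   # AAG^ATGA ... GAG^GTG
cs_start, cs_end = 233, 1589        # CAAG^AAT ... GGG^TA

cnpy2_coding_nt = cnpy2_end - cnpy2_start   # 204 nt = 68 codons
cs_coding_nt = cs_end - cs_start            # 1356 nt = 452 codons

# The reverse primer (…GGGT) extends one base beyond the last Cs codon.
extra_primer_nt = 1

expected_amplicon_bp = cnpy2_coding_nt + cs_coding_nt + extra_primer_nt

print(cnpy2_coding_nt // 3, cs_coding_nt // 3)  # 68 452
print(expected_amplicon_bp)                     # 1561
```

Both stretches are whole numbers of codons, and 68 matches the length of the Cnpy2 section protein sequence.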
Chimeric transcript primer order form 4-18-13
===========================================================================
4-19-13
I found Hojoon's original primers for Cnpy2-Cs here
S:\Research\Cancer_Eradication\HoJoon\Primers\1st&2nd_plate_Invitrogen_
Oligo_Plate_Jan16_2009.xls
file information for
TR_E10150 search for Cnpy2-Cs primers 4-19-13
"C:\kurt\storage\CIM Research Folder\DR\2013\4-19-13\chimeric transcript
primer\TR_E10150 search for Cnpy2-Cs primers 4-19-13.txt"
TR_E10150Fo (the "o" is for original)
AGCGCTCTGGAGAATCCCG
TR_E10150Ro (the "o" is for original)
CCACCACCGTCTTGCCATG
order form for these primers here
"C:\kurt\storage\CIM Research Folder\DR\2013\4-19-13\chimeric transcript
primer\Kurt primer order form 4-19-13.xls"
===========================================================================
5-1-13
I don't have the size information for all of the chimeric transcripts I
would like to PCR screen.
Gene fusion   Size (bp)
TR_E10142     449
TR_E10150     884
TR_E10339     315
TR_E20081     (missing)
TR_E10028     400
TR_E20166     (missing)
TR_E20026     (missing)
TR_E20131     (missing)
TR_E10176     506
TR_E20151     (missing)
TR_E10002     293
TR_E10446     384
TR_E10324     499
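To keep track of which sizes are still needed, the list above can be held in a small table and filtered. A minimal sketch; `None` marks a size that is not filled in above, and the variable names are my own.

```python
# PCR product sizes (bp) from the list above; None = size not yet known.
sizes_bp = {
    "TR_E10142": 449,
    "TR_E10150": 884,
    "TR_E10339": 315,
    "TR_E20081": None,
    "TR_E10028": 400,
    "TR_E20166": None,
    "TR_E20026": None,
    "TR_E20131": None,
    "TR_E10176": 506,
    "TR_E20151": None,
    "TR_E10002": 293,
    "TR_E10446": 384,
    "TR_E10324": 499,
}

missing = [fusion_id for fusion_id, size in sizes_bp.items() if size is None]

print(len(sizes_bp), len(missing))  # 13 5
print(missing)
```

The IDs in `missing` are the names to use as search terms when looking for the remaining sizes.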
Maybe if I find files containing the name, I can find the missing sizes.
I'll do a file search for "TR_E20081"
search started 5-1-13
No files were found which contained "TR_E20081" in
S:\Research\Cancer_Eradication\
Shen sent me a file that has the mouse chimeric transcript peptide info
"F:\kurt\storage\CIM Research Folder\DR\2013\5-3-13\chim transcr\GF
antigens list from Hojoon.xlsx"
here's another file that may be somewhat related
"F:\kurt\storage\CIM Research Folder\DR\2013\5-3-13\chim transcr\summary
of Mus GF .xls"
I now have the size information for the PCR products for all 13 validated
mouse chimeric transcripts in Hojoon's Table 3.3 in his dissertation
"F:\kurt\storage\CIM Research
Folder\DR\2012\11-18-12\Dissertations\Hojoon Lee Dissertation\Table 3p3
from Hojoon's dissertation.xlsx"