No scoring matrices contained any information error 06-20-2014d1410

2015-01-13


copy of email
https://mail.google.com/mail/u/0/?ui=2&shva=1#label/Career/146b632cd68c3ac1
me

Hi MEME representative,

What would cause this type of error?

"FATAL: No scoring matrices contained any information."

I typed a command like this:

./mast "C:\Users\kurtw_000\Documents\kurt\storage\lgdata_DR\2014\01-16-2014d1422\selected young aged gprs\decipher for 100 peptides by age 3 criteria\analysis of josh\glam2 motifs of top 50 sequences from josh 06-17-2014d1955.meme" "C:\Users\kurtw_000\Documents\kurt\storage\pDR\lgdata_DR\06-18-2014d1642\all human proteins\human proteins from search\sequence.fasta"

Do you have any suggestions or advice? I have attached the MEME file. My FASTA file has 51.2 MB of sequences like this:

"
>gi|612149760|ref|NP_001013765.2| transmembrane protein 225 [Homo sapiens]
MVHVSNRSIQGMNILFSSWAVVLMVMGITLDKWVELISEDERAKMNHSPWMMCCPALWPEDDLKVVRIMM
TSSLGLSFLLNLILGMKFTYLIPQNKYIQLFTTILSFFSGISLLWALILYHNKLKQGQSMHFSSYRITWI
MYTAYLNVFFLSVCGVLSLLECKLSTSSCTCLNIHKSDNECKESENSIEDISLPECTAMPRSIVRAHTVN
SLNKKVQTRHVTWAL

>gi|575771947|ref|NP_001276538.1| synaptotagmin-like protein 2 isoform k [Homo sapiens]
MGKKKTLVVKKTLNPVYNEILRYKIEKQILKTQKLNLSIWHRDTFKRNSFLGEVELDLETWDWDNKQNKQ
LRWYPLKRKTAPVALEAENRGEMKLALQYVPEPVPGKKLPTTGEVHIWVKECLDLPLLRGSHLNSFVKCT
ILPDTSRKSRQKTRAVGKTTNPIFNHTMVYDGFRPEDLMEACVELTVWDHYKLTNQFLGGLRIGFGTGKS
YGTEVDWMDSTSEEVALWEKMVNSPNTWIEATLPLRMLLIAKISK

>gi|575771945|ref|NP_001276539.1| synaptotagmin-like protein 2 isoform h [Homo sapiens]
MSKSVPAFLQDEVSGSVMSVYSGDFGNLEVKGNIQFAIEYVESLKELHVFVAQCKDLAAADVKKQRSDPY
VKAYLLPDKGKMGKKKTLVVKKTLNPVYNEILRYKIEKQILKTQKLNLSIWHRDTFKRNSFLGEVELDLE
TWDWDNKQNKQLRWYPLKRKTAPVALEAENRGEMKLALQYVPEPVPGKKLPTTGEVHIWVKECLDLPLLR
GSHLNSFVKCTILPDTSRKSRQKTRAVGKTTNPIFNHTMVYDGFRPEDLMEACVELTVWDHYKLTNQFLG
GLRIGFGTGKSYGTEVDWMDSTSEEVALWEKMVNSPNTWIEATLPLRMLLIAKISK
"


Best regards,
Kurt Whittemore

Graduate Student
Arizona State University
BIODESIGN INSTITUTE
Center for Innovations in Medicine
1001 S McALLISTER AVE
TEMPE, AZ 85287


MEME representative

I'm afraid the error is a little misleading. The actual problem seems to be that you've defined a background model with zeros for some of the AA frequencies. Internally MAST has to compute log-odds ratios and having a frequency of exactly 0 causes problems when we try to divide by 0 or take the log of 0.
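(Note: a minimal Python sketch of why a zero background frequency breaks a log-odds calculation. This is not MAST's actual code; the function names `log_odds` and `try_log_odds` are hypothetical and the log base is an assumption made for illustration.)

```python
import math


def log_odds(motif_prob, background_freq):
    """Illustrative log-odds score: log2(motif probability / background frequency)."""
    return math.log2(motif_prob / background_freq)


def try_log_odds(motif_prob, background_freq):
    """Return the score, or a short description of why it is undefined."""
    try:
        return log_odds(motif_prob, background_freq)
    except (ZeroDivisionError, ValueError) as exc:
        # Dividing by a zero background frequency raises ZeroDivisionError;
        # taking the log of zero raises ValueError.
        return f"undefined ({type(exc).__name__})"


if __name__ == "__main__":
    print(try_log_odds(0.1, 0.05))    # ordinary case
    print(try_log_odds(0.1, 0.0))     # zero background frequency
    print(try_log_odds(0.1, 0.0001))  # zero replaced by a small value
```

With a small nonzero background value the score is large but finite, which is consistent with the suggestion to replace the zeros.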

Try replacing the 0 values in your background letter frequencies with a very small value such as 0.0001, like this:

Background letter frequencies (from dataset without add-one prior applied):
A 0.056 C 0.0001 D 0.007 E 0.007 F 0.086 G 0.265 H 0.004 I 0.0001 K 0.056 L 0.015 M 0.0001 N 0.056 P 0.049 Q 0.082 R 0.134 S 0.030 T 0.0001 V 0.037 W 0.007 Y 0.108


MAST then seems to run ok.

We'll look into creating a more informative error message for this problem.
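
Note to self: the fix described in the reply can be sketched in Python as below. This is only a sketch under assumptions: the helper name `floor_zero_frequencies` is hypothetical, it assumes the frequency line is a flat series of "letter value" pairs separated by whitespace (as in the line quoted above), and it does not renormalize the frequencies afterwards.

```python
def floor_zero_frequencies(line, floor="0.0001"):
    """Replace zero-valued frequencies in a 'A 0.056 C 0.000 ...' line.

    `floor` is kept as a string so it is written back exactly as given.
    """
    tokens = line.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating letter/value pairs")

    fixed = []
    # Walk the line as (letter, value) pairs.
    for letter, value in zip(tokens[0::2], tokens[1::2]):
        if float(value) == 0.0:
            value = floor  # swap an exact zero for the small floor value
        fixed.extend([letter, value])
    return " ".join(fixed)


if __name__ == "__main__":
    # Made-up three-letter example, not taken from the real motif file.
    print(floor_zero_frequencies("A 0.056 C 0.000 D 0.007"))
```

The function only handles the single frequency line; locating that line inside the .meme file (it follows the "Background letter frequencies" header) is left as a manual step.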