2015-05-11 67 views
-2

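HOMER de novo motif discovery: findMotifsGenome.pl cannot open the hg19 FASTA files

I have some ChIP-seq data in BAM format, and at some point I want to run de novo motif discovery with HOMER's findMotifsGenome.pl script.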

The problem seems to be that the application cannot open the reference genome FASTA files, even though they were installed by the application itself!

Has anyone run into this problem?

Linux command used:

$ perl /home/chipseq_project/homer/bin/findMotifsGenome.pl /home/chipseq_project/homer/findpeak_output/peaks.txt hg19 /home/chipseq_project/homer/motif_output/ -size given

Standard output:

Position file = /home/chipseq_project/homer/findpeak_output/peaks.txt 
    Genome = hg19 
    Output Directory = /home/chipseq_project/homer/motif_output/ 
    Using actual sizes of regions (-size given) 
    Fragment size set to given 
    Found mset for "human", will check against vertebrates motifs 
    Peak/BED file conversion summary: 
      BED/Header formatted lines: 0 
      peakfile formatted lines: 7662 

    Peak File Statistics: 
      Total Peaks: 7662 
      Redundant Peak IDs: 0 
      Peaks lacking information: 0 (need at least 5 columns per peak) 
      Peaks with misformatted coordinates: 0 (should be integer) 
      Peaks with misformatted strand: 0 (should be either +/- or 0/1) 

    Peak file looks good! 

    Background fragment size set to 81 (avg size of targets) 
    Background files for 81 bp fragments found. 

    Extracting sequences from directory: /home/chipseq_project/homer/.//data/genomes/hg19// 
    !!Could not open file for 1 (.fa or .fa.masked) 
    !!Could not open file for 10 (.fa or .fa.masked) 
    !!Could not open file for 11 (.fa or .fa.masked) 
    !!Could not open file for 12 (.fa or .fa.masked) 
    !!Could not open file for 13 (.fa or .fa.masked) 
    !!Could not open file for 14 (.fa or .fa.masked) 
    !!Could not open file for 15 (.fa or .fa.masked) 
    !!Could not open file for 16 (.fa or .fa.masked) 
    !!Could not open file for 17 (.fa or .fa.masked) 
    !!Could not open file for 18 (.fa or .fa.masked) 
    !!Could not open file for 19 (.fa or .fa.masked) 
    !!Could not open file for 2 (.fa or .fa.masked) 
    !!Could not open file for 20 (.fa or .fa.masked) 
    !!Could not open file for 21 (.fa or .fa.masked) 
    !!Could not open file for 22 (.fa or .fa.masked) 
    !!Could not open file for 3 (.fa or .fa.masked) 
    !!Could not open file for 4 (.fa or .fa.masked) 
    !!Could not open file for 5 (.fa or .fa.masked) 
    !!Could not open file for 6 (.fa or .fa.masked) 
    !!Could not open file for 7 (.fa or .fa.masked) 
    !!Could not open file for 8 (.fa or .fa.masked) 
    !!Could not open file for 9 (.fa or .fa.masked) 
    !!Could not open file for X (.fa or .fa.masked) 
    !!Could not open file for Y (.fa or .fa.masked) 

    Not removing redundant sequences 


    Sequences processed: 
      0 total 

    Frequency Bins: 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.6 0.7 0.8 
    Freq Bin  Count 

    Total sequences set to 50000 

    Choosing background that matches in CpG/GC Content... 

Illegal division by zero at /home/chipseq_project/homer/bin/assignGeneWeights.pl line 63.
    Assembling sequence file...
    Normalizing lower order oligos using homer2

Reading input files... 
    0 total sequences read 
    Autonormalization: 1-mers (4 total) 
      A  inf% inf% -nan 
      C  inf% inf% -nan 
      G  inf% inf% -nan 
      T  inf% inf% -nan 
    Autonormalization: 2-mers (16 total) 
      AA  inf% inf% -nan 
      CA  inf% inf% -nan 
      GA  inf% inf% -nan 
      TA  inf% inf% -nan 
      AC  inf% inf% -nan 
      CC  inf% inf% -nan 
      GC  inf% inf% -nan 
      TC  inf% inf% -nan 
      AG  inf% inf% -nan 
      CG  inf% inf% -nan 
      GG  inf% inf% -nan 
      TG  inf% inf% -nan 
      AT  inf% inf% -nan 
      CT  inf% inf% -nan 
      GT  inf% inf% -nan 
      TT  inf% inf% -nan 
    Autonormalization: 3-mers (64 total) 
    Normalization weights can be found in file: /home/chipseq_project/homer/motif_output//seq.autonorm.tsv 
    Converging on autonormalization solution: 
    ............................................................................... 
    Final normalization: Autonormalization: 1-mers (4 total) 
      A  inf% inf% -nan 
      C  inf% inf% -nan 
      G  inf% inf% -nan 
      T  inf% inf% -nan 
    Autonormalization: 2-mers (16 total) 
      AA  inf% inf% -nan 
      CA  inf% inf% -nan 
      GA  inf% inf% -nan 
      TA  inf% inf% -nan 
      AC  inf% inf% -nan 
      CC  inf% inf% -nan 
      GC  inf% inf% -nan 
      TC  inf% inf% -nan 
      AG  inf% inf% -nan 
      CG  inf% inf% -nan 
      GG  inf% inf% -nan 
      TG  inf% inf% -nan 
      AT  inf% inf% -nan 
      CT  inf% inf% -nan 
      GT  inf% inf% -nan 
      TT  inf% inf% -nan 
    Autonormalization: 3-mers (64 total) 
    Finished preparing sequence/group files 

    ---------------------------------------------------------- 
    Known motif enrichment 

    Reading input files... 
    0 total sequences read 
    264 motifs loaded 
    Cache length = 11180 
    Using binomial scoring 
    Checking enrichment of 264 motif(s) 
    |0%         50%         100%| 

Illegal division by zero at /home/chipseq_project/homer/bin/findKnownMotifs.pl line 142.
    ----------------------------------------------------------
    De novo motif finding (HOMER)

Scanning input files... 

!!! Something is wrong... are you sure you chose the right length for motif finding?
!!! i.e. also check your sequence file!!!

Scanning input files... 

!!! Something is wrong... are you sure you chose the right length for motif finding?
!!! i.e. also check your sequence file!!!

-blen automatically set to 2 
    Scanning input files... 

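!!! Something is wrong... are you sure you chose the right length for motif finding?
!!! i.e. also check your sequence file!!!
Use of uninitialized value in numeric gt (>) at /home/chipseq_project/homer/bin/compareMotifs.pl line 1289.
!!! Filtered out all motifs!!!
    Job finished - if results look good, please send beer ..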

Cleaning up tmp files... 
+2

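So you are asking Stack Overflow to debug someone else's script without actually providing the script? Sorry, that is somewhat beyond the scope of SO. If you can provide the code snippet needed to reproduce the problem, we might have a chance. – Sobrique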

Answers

1

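One thing to check: whether the chromosome names in your BED file are consistent with the chromosome names in the genome you are using. For example, you should not have chromosome '12' in your BED file while the genome you are interested in calls it 'chr12'.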
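A quick way to run that check is to list the distinct chromosome names in the peak file and compare them by eye with the FASTA file names in the genome directory. The following is only a sketch: `list_chroms` is a hypothetical helper, the demo file is made up, and it assumes a tab-separated HOMER-style peak file with the chromosome in column 2 (pass 1 as the second argument for a BED file).

```shell
#!/bin/sh
# Hypothetical helper: print the distinct values of the chromosome column.
# Assumption: tab-separated input, chromosome in column 2 (HOMER peak layout).
list_chroms() {
    file=$1
    col=${2:-2}
    # Skip comment/header lines and lines that are too short to have the column.
    awk -F '\t' -v c="$col" '!/^#/ && NF >= c { print $c }' "$file" | sort -u
}

# Tiny made-up peak file standing in for peaks.txt.
demo=$(mktemp)
printf '#PeakID\tchr\tstart\tend\tstrand\n' > "$demo"
printf 'p1\t1\t100\t181\t+\np2\t12\t500\t581\t-\np3\tX\t900\t981\t+\n' >> "$demo"

list_chroms "$demo"

# On a real install, compare the names printed above with the files in the
# genome directory reported in the log, for example:
#   ls /home/chipseq_project/homer/data/genomes/hg19/
```

If the peak file prints bare names such as 1, 12 and X while the genome files carry a chr prefix, that mismatch would explain the "Could not open file for 1" lines in the log.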

+0

That was exactly the problem. A simple script that adds "chr" in front of the chromosome numbers fixed everything. Thanks – RickyGambon

0

For the "chr" problem, a simple awk command is your friend. Simply running awk '{print "chr"$0}' your.bed > your_new.bed will do the job.
hkoohy
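Note that prefixing the whole line only works when the chromosome is the first field, as in a BED file. For a HOMER-style peak file, where the first column is the peak ID and the chromosome is in column 2, a column-aware variant may be safer. This is a minimal sketch, not a HOMER tool: `add_chr_prefix` is a hypothetical helper, the demo data is made up, and it assumes tab-separated input.

```shell
#!/bin/sh
# Hypothetical helper: add "chr" to one column of a tab-separated file.
# Usage: add_chr_prefix FILE [COLUMN]   (COLUMN defaults to 2, the assumed
# chromosome column of a HOMER peak file; use 1 for BED)
add_chr_prefix() {
    file=$1
    col=${2:-2}
    awk -F '\t' -v OFS='\t' -v c="$col" '
        /^#/ || NF < c { print; next }      # pass comments and short lines through
        $c !~ /^chr/   { $c = "chr" $c }    # prefix only if not already present
        { print }
    ' "$file"
}

# Tiny made-up peak file: one bare name, one name that already has the prefix.
demo=$(mktemp)
printf '#PeakID\tchr\tstart\tend\tstrand\n' > "$demo"
printf 'p1\t1\t100\t181\t+\np2\tchr12\t500\t581\t-\n' >> "$demo"

add_chr_prefix "$demo"
```

Redirect the output to a new file (for example `add_chr_prefix peaks.txt > peaks_chr.txt`) rather than overwriting the original, so the input can still be inspected if the names remain inconsistent.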