How to save multiple outputs in multiple files, each named with a different title taken from an object in Python?

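I'm scraping an RSS feed from a website (http://www.gfrvitale.altervista.org/index.php/autismo-in?format=feed&type=rss). I've written a script that extracts and cleans the text of each feed item. My main problem is saving the text of each item in a separate file, and I also need to name each file with the proper title extracted from that item. My code is: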

from datetime import datetime
from time import mktime
import codecs

import html2text
import requests
from readability import Document  # readability-lxml

for item in myFeed["items"]:
    time_structure = item["published_parsed"]
    dt = datetime.fromtimestamp(mktime(time_structure))

    if dt > t:
        link = item["link"]
        response = requests.get(link)
        doc = Document(response.text)
        doc.summary(html_partial=False)

        # extracting text
        h = html2text.HTML2Text()

        # converting
        h.ignore_links = True  # ignore links
        h.skip_internal_links = True  # skip internal links
        h.inline_links = True
        h.ignore_images = True  # ignore links to images
        h.ignore_emphasis = True
        h.ignore_anchors = True
        h.ignore_tables = True

        testo = h.handle(doc.summary())  # extracted text

        s = doc.title() + "." + " " + testo  # content to write to the final file

        tit = item["title"]

        # save each file with its proper title
        with codecs.open("testo_%s", %tit "w", encoding="utf-8") as f:
            f.write(s)
            f.close()

The error is:

  File "<ipython-input-57-cd683dec157f>", line 34
    with codecs.open("testo_%s", %tit "w", encoding="utf-8") as f:
                                 ^
SyntaxError: invalid syntax


2016-10-02 CosimoCD

You need to put the comma after `%tit`.

It should be:

# save each file with its proper title
with codecs.open("testo_%s" % tit, "w", encoding="utf-8") as f:
    f.write(s)  # the with block closes the file, so f.close() is not needed
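To see why the comma belongs after `%tit`: `"testo_%s" % tit` is one expression, which Python evaluates to a single string before passing it to `codecs.open` as the file name. A comma before `%` leaves `%` with no left operand, hence the `SyntaxError`. The small sketch below (the function name and sample title are only illustrations) also shows that `str.format` and plain concatenation build exactly the same name:

```python
def build_names(title):
    # "%" is the string-formatting operator: format string, then "%", then the value.
    # The whole expression is evaluated first and becomes ONE argument to open().
    with_percent = "testo_%s" % title
    # Equivalent alternatives that produce the same string.
    with_format = "testo_{}".format(title)
    with_concat = "testo_" + title
    return with_percent, with_format, with_concat


names = build_names("Autismo")
print(names)  # ('testo_Autismo', 'testo_Autismo', 'testo_Autismo')
```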

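However, if the title contains characters that are not valid in a file name, opening the file will raise an error. A quick fix is to strip them out first (nltk is another option, shown further down):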

... 
tit = item["title"]
# Not the best way, but it could help for now (it would be better to create a list of stop characters)
tit = tit.replace(' ', '').replace("'", "").replace('?', '')

with codecs.open("testo_%s" % tit, "w", encoding="utf-8") as f:
    f.write(s)  # the with block closes the file, so f.close() is not needed
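Following the "list of stop characters" idea in the comment above, here is a sketch that removes, in one pass with the standard `re` module, every character Windows forbids in file names. The character set (`< > : " / \ | ? *` plus control characters) and the `untitled` fallback are assumptions of this sketch. Unlike the `replace` chain, it keeps spaces and apostrophes, which are legal in file names.

```python
# -*- coding: utf-8 -*-
import re

# Characters Windows does not allow in file names, plus ASCII control
# characters. Assumed to cover the common cases; not every platform rule.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title, replacement=""):
    # Replace each forbidden character (e.g. the "?" in a title).
    cleaned = _INVALID_CHARS.sub(replacement, title)
    # Windows also rejects names ending in a dot or a space.
    cleaned = cleaned.strip().rstrip(". ")
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or "untitled"


tit = safe_filename("La Comunicazione Facilitata? Parliamone.")
print("testo_%s" % tit)  # testo_La Comunicazione Facilitata Parliamone
```

Reserved device names such as `CON` or `NUL` are not handled here.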

Another way :)

You can try this code:

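from nltk.tokenize import RegexpTokenizer

tokenizer = RegexpTokenizer(r'\w+')  # keep only runs of word characters
tit = item["title"]
tit = tokenizer.tokenize(tit)  # list of words, punctuation dropped
tit = ''.join(tit)
with codecs.open("testo_%s" % tit, "w", encoding="utf-8") as f:
    f.write(s)  # the with block closes the file, so f.close() is not needed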
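The nltk tokenizer above is used only to keep the word characters of the title. If you would rather not depend on nltk, a sketch with the standard `re` module does essentially the same thing, since `RegexpTokenizer(r'\w+')` returns the matches of its pattern much like `re.findall`. The `sep` parameter is an addition of this sketch: the nltk answer joins with `''`, which glues the words together.

```python
import re


def words_only(title, sep=""):
    # Keep only runs of word characters (letters, digits, underscore).
    # In Python 3, \w on a str also matches accented letters such as "à" or "é".
    words = re.findall(r"\w+", title)
    return sep.join(words)


print(words_only("La Comunicazione Facilitata? Parliamone."))
# LaComunicazioneFacilitataParliamone
print(words_only("La Comunicazione Facilitata? Parliamone.", sep="_"))
# La_Comunicazione_Facilitata_Parliamone
```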


2016-10-02 15:26:58 estebanpdl

I did that, but it doesn't work. I get this error:

C:\Anaconda2\lib\codecs.pyc in open(filename, mode, encoding, errors, buffering)
    894         # Force opening of the file in binary mode
    895         mode = mode + 'b'
--> 896     file = __builtin__.open(filename, mode, buffering)
    897     if encoding is None:
    898         return file

IOError: [Errno 22] invalid mode ('wb') or filename: u'testo_La Comunicazione Facilitata? Parliamone. – CosimoCD

The code is correct. The comma goes after `%tit`, not before it. That is a different error. I'll check. – estebanpdl

What is the desired output (i.e. '.csv', '.txt')? – estebanpdl

First, you put the comma in the wrong place: it should go after `%tit`, not before it.

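Second, you don't need to close the file, because the `with` statement you are using does that for you automatically. Where does `codecs` come from? I don't see it anywhere else... Anyway, the correct `with` statement should be: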

with open("testo_%s" % tit, "w", encoding="utf-8") as f:  # encoding= needs Python 3 (use io.open on Python 2)
    f.write(s)
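The `encoding` argument of the built-in `open` exists only in Python 3; the `C:\Anaconda2` path in the traceback suggests Python 2, where `io.open` is the portable choice. A minimal sketch (the helper name `write_text` and the temporary directory are illustrative), which also checks that the `with` block has already closed the file:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_text(path, text):
    # io.open takes encoding= on both Python 2 and Python 3.
    # On Python 2, `text` must be a unicode string in text mode.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
    # Leaving the with block has already closed the file; no f.close() needed.
    return f.closed


out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "testo_esempio.txt")
print(write_text(path, u"Comunicazione facilitata: novità"))  # True
```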


2016-10-02 19:34:03 geo1230

I ran the code above, but it gave an error. Now I'm working on: `with io.open("testo_" + tit, "w", encoding="utf-8") as f: f.write(s)` – CosimoCD

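What error did it give? You should give us something to work with... And for the naming you should stick with `"testo_%s" % tit`, because I don't think `"testo_" + tit` will work (but I may just be wrong). – geo1230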

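I ran the code above, but it gives me an error. It says the function doesn't accept an argument like %tit. Now I'm working on: `with io.open("testo_" + tit, "w", encoding="utf-8") as f: f.write(s)`. It partially works: it saves the first item with its correct title, but then it stops. I get this new error: IOError: [Errno 22] Invalid argument: u'testo_La Comunicazione Facilitata? Parliamone......' – CosimoCD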
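The last error is most likely the same `[Errno 22]` as before: Windows rejects `?` in file names, so the loop fails at the title `La Comunicazione Facilitata? Parliamone...`. Putting the pieces from the answers together, the saving step could be sketched as follows. Assumptions: each item has already been turned into a dict with hypothetical keys `"title"` and `"text"` (in the question, `"text"` would hold the `s` string), the files get a `.txt` extension, and items sharing a title get a numeric suffix instead of overwriting each other.

```python
# -*- coding: utf-8 -*-
import io
import os
import re
import tempfile

# Characters Windows forbids in file names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title):
    cleaned = _INVALID_CHARS.sub("", title).strip().rstrip(". ")
    return cleaned or "untitled"


def save_items(items, out_dir, prefix="testo_", ext=".txt"):
    """Write each item's text to its own file named after its title.

    `items` is assumed to be a list of dicts with hypothetical "title" and
    "text" keys; returns the list of paths written.
    """
    used = set()
    paths = []
    for item in items:
        base = prefix + safe_filename(item["title"])
        name = base + ext
        n = 2
        # Two items can share a title: add a counter instead of overwriting.
        while name in used:
            name = "%s_%d%s" % (base, n, ext)
            n += 1
        used.add(name)
        path = os.path.join(out_dir, name)
        # io.open accepts encoding= on Python 2 and 3; `with` closes the file.
        with io.open(path, "w", encoding="utf-8") as f:
            f.write(item["text"])
        paths.append(path)
    return paths


items = [
    {"title": u"La Comunicazione Facilitata? Parliamone.", "text": u"uno"},
    {"title": u"Autismo: novità", "text": u"due"},
    {"title": u"Autismo: novità", "text": u"tre"},
]
for p in save_items(items, tempfile.mkdtemp()):
    print(os.path.basename(p))
# testo_La Comunicazione Facilitata Parliamone.txt
# testo_Autismo novità.txt
# testo_Autismo novità_2.txt
```

The `used` set only guards against clashes within one run (and compares names case-sensitively), so files left over from an earlier run with the same name are still overwritten.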
