BeautifulSoup adds unwanted newlines to strings (Python 3.5)

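I've been having a problem with hidden newline characters in the strings I get from BeautifulSoup's .find function. I scan the code of an HTML document and extract the name, title, company, and country as strings. I checked their types and they are strings, and when I print them and check their lengths, everything looks like a normal string. But when I use them in print("%s is a %s at %s in %s" % (name,title,company,country)) or in outputWriter.writerow([name,title,company,country]) to write to a CSV file, I get extra newlines in the strings.

What is going on? Or can anyone point me in the right direction?

I'm new to Python and not sure where to look up all the things I don't know, so I'm asking here after spending a whole day trying to fix this. I've searched Google and several other Stack Overflow posts about stripping hidden characters, but nothing seems to work.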

import csv 
from bs4 import BeautifulSoup 

# Open/create csvfile and prep for writing 
csvFile = open("attendees.csv", 'w+', encoding='utf-8') 
outputWriter = csv.writer(csvFile) 

# Open HTML and Prep BeautifulSoup 
html = open('WEB SUMMIT _ LISBON 2016 _ Web Summit Featured Attendees.html', 'r', encoding='utf-8') 
bsObj = BeautifulSoup(html.read(), 'html.parser') 
itemList = bsObj.find_all("li", {"class":"item"}) 

outputWriter.writerow(['Name','Title','Company','Country']) 

for item in itemList: 
    name = item.find("h4").get_text() 
    print(type(name)) 
    title = item.find("strong").get_text() 
    print(type(title)) 
    company = item.find_all("span")[1].get_text() 
    print(type(company)) 
    country = item.find_all("span")[2].get_text() 
    print(type(country)) 
    print("%s is a %s at %s in %s" % (name,title,company,country)) 
    outputWriter.writerow([name,title,company,country])

Source

2016-08-30 gsears

I solved my problem by trying one more filter: def filter_non_printable(str): return ''.join([c for c in str if ord(c) > 31 or ord(c) == 9]) – gsears
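For readers who want to try the filter from the comment above, here is a minimal runnable sketch of it. The parameter is renamed to text (my choice, to avoid shadowing the built-in str), and the sample value is hypothetical. It drops every character with a code point of 31 or below except the tab, so both "\n" and "\r" are removed.

```python
def filter_non_printable(text):
    """Drop ASCII control characters (code points 0-31), keeping only the tab."""
    return "".join(c for c in text if ord(c) > 31 or ord(c) == 9)


# Hypothetical value standing in for what get_text() might return.
cleaned = filter_non_printable("Jane Doe\r\n")
print(repr(cleaned))
```

Note that this removes newlines anywhere in the string, not just at the ends, and joins the surrounding text without inserting a space.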

Most likely you just need to strip the whitespace. Nothing in your code adds it, so it must already be in the strings:

outputWriter.writerow([name.strip(),title.strip(),company.strip(),country.strip()])
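To see why an unstripped newline shows up as a line break in the CSV file, here is a small sketch that writes the same row with and without strip(). It uses io.StringIO in place of a real file, and the row values are made up for illustration; they are not taken from the asker's HTML.

```python
import csv
import io

# Hypothetical values standing in for what get_text() might return.
row = ["Jane Doe\n", "  CEO", "Example Corp\n", "Portugal"]

# Written as-is: the csv module quotes fields containing a newline,
# so the line break is preserved inside the record.
raw = io.StringIO()
csv.writer(raw).writerow(row)

# Written after stripping leading and trailing whitespace from every field.
clean = io.StringIO()
csv.writer(clean).writerow([field.strip() for field in row])

print(repr(raw.getvalue()))
print(repr(clean.getvalue()))
```

Printing with repr makes the difference visible: the first record contains quoted fields with a "\n" inside them, while the second is a single plain line.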

You can verify that the whitespace is there by looking at the repr output:

print("%r is a %r at %r in %r" % (name,title,company,country))

When you print, you see the str output, so if there is a newline you may not realize it is there:

In [8]: s = "string with newline\n" 

In [9]: print(s) 
string with newline 


In [10]: print("%r" % s) 
'string with newline\n'

See also: difference-between-str-and-repr-in-python

If the newlines are actually embedded in the body of the strings, you will need to replace them, i.e. name.replace("\n", " ")
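If a field may contain both embedded and trailing whitespace, one option is to collapse every run of whitespace into a single space. The sketch below is one way to do that with plain string methods. The helper name normalize_whitespace and the sample value are my own, not from the question.

```python
def normalize_whitespace(value):
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces.

    str.split() with no arguments splits on any whitespace and discards
    empty pieces, so leading and trailing whitespace disappears as well.
    """
    return " ".join(value.split())


# Hypothetical value with a newline in the middle and at the end.
title = "Head of\n   Product \n"
print(repr(normalize_whitespace(title)))
```

Unlike replace("\n", " "), this also handles "\r", tabs, and repeated spaces, which may or may not be what you want for your data.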

Source

2016-08-30 21:37:17

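Thanks! As I said in my last comment, I tried one more solution and found that it works. I'm still not sure about the ins and outs of everything, but I'm slowly learning. Thanks again! – gsears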
