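Printing a tuple created with zip to a text file
Today is my first time using Python, so please forgive me if this is a bit rubbish. This is the basic code I have so far.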
from lxml import html
import lxml
import requests
sourcepage = requests.get('http://www.bbc.co.uk/sport/football/championship/table')
tree = html.fromstring(sourcepage.content)
teamname = tree.xpath('descendant::table[1][@class = "table-stats"]/tbody/tr/td[@class = "team-name"]/a/text()')
position = tree.xpath('descendant::table[1][@class = "table-stats"]/tbody/tr/td[@class = "position"]/span[@class = "position-number"]/text()')
movement = tree.xpath('descendant::table[1][@class = "table-stats"]/tbody/tr/td[@class = "position"]/span[@class="moving-down" or @class="no-movement" or @class="moving-up"]/text()')
goaldiff = tree.xpath('descendant::table[1][@class = "table-stats"]/tbody/tr/td[@class = "goal-difference"]/text()')
points = tree.xpath('descendant::table[1][@class = "table-stats"]/tbody/tr/td[@class = "points"]/text()')
combined = zip(teamname,position,movement,goaldiff,points)
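What it does is scrape the website, store the page as a tree, pull out the columns with XPath, and then combine them into tuples (I think).
It prints the table to the command line just fine: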
print("Pos. | Team | P | GD | Pts\n:--:|:--|:--:|:--:|:--:")
for var1,var2,var3,var4,var5 in combined:
    print(var1,"|",var2,var3,"|",var4,"|",var5)
But I have been having serious trouble getting it to print to a file.
I have tried the following:
outfile = open('output.txt', 'w')
print>>outfile("Pos. | Team | P | GD | Pts\n:--:|:--|:--:|:--:|:--:")
for var1,var2,var3,var4,var5 in combined:
    print>>outfile(var1,"|",var2,var3,"|",var4,"|",var5)
outfile.close()
This one outputs an error: TypeError: '_io.TextIOWrapper' object is not callable
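For context on that error: `print >> outfile` is Python 2 syntax. In Python 3, `print` is a function, so `outfile(...)` is evaluated first as a call on the file object, which is what raises the TypeError. A minimal sketch of the Python 3 form, using the `file=` keyword argument of `print`, might look like the following. The sample row and the `io.StringIO` stand-in for the real file are assumptions made only so the snippet runs without the scraped data.

```python
import io

# Hypothetical stand-in row; the real values come from the scraped table.
rows = [("1", "Team A", "No movement", "10", "30")]

# In-memory stand-in for open('output.txt', 'w'), used here for illustration.
outfile = io.StringIO()

# Python 3: pass the target file through the file= keyword argument.
print("Pos. | Team | P | GD | Pts", file=outfile)
for pos, team, move, gd, pts in rows:
    print(pos, "|", team, move, "|", gd, "|", pts, file=outfile)

text = outfile.getvalue()
```

With a real file, the same `print(..., file=fp)` calls would go inside a `with open('output.txt', 'w') as fp:` block.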
with open('output.txt', 'w') as fp:
    fp.write("Pos. | Team | P | GD | Pts\n:--:|:--|:--:|:--:|:--:\n")
    for var1,var2,var3,var4,var5 in combined:
        var1s = str(var1)
        print("debug: var1/var1s set to: ",var1,var1s) #prints nothing (?)
        var2s = str(var2)
        var3s = str(var3)
        var4s = str(var4)
        var5s = str(var5)
        fp.write(var1s+"|"+var2s+var3s+"|"+var4s+"|"+var5s+"\n")
This only outputs the header row.
(var1a, var2a, var3a, var4a, var5a) = combined
var1a, var2a, var3a, var4a, var5a = combined
print(var1a)
These two surprised me, because they both raise a ValueError: not enough values to unpack (expected 5, got 0)
with open('output.txt', 'w') as fp:
    fp.write('\n'.join('{} {} {} {} {}' % x for x in combined))
This outputs a blank file, as do
outfile = open('outfile.txt', 'w')
for t in combined:
    line = ' '.join(str(x) for x in t)
    outfile.write(line + '\n')
outfile.close()
and
with open('output.txt', 'w') as f:
    for stuff in combined:
        f.write('%s %s %s %s %s\n' % stuff)
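I have spent several hours searching and trying to dig my way through Stack Overflow questions, but I am still stuck, and I am a bit out of my depth here.
The reason I want to output it to a file is that the next step after this is to run some regular expressions over the output to convert the truncated names back to their proper names, and to wrap some reddit markup around them so they become links.
I should have known that after trying 4 different methods, they couldn't all be wrong. I kept the print lines in for debugging; who would have thought the debugging lines would be the cause of the error. Thanks. – Paine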
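As a small illustration of why an earlier print loop can leave later loops with nothing to write: in Python 3, `zip` returns a one-shot iterator, so whichever loop runs first consumes it. The sketch below uses made-up sample lists (not the scraped data) and shows that wrapping the result in `list(...)` gives something that can be looped over more than once.

```python
# Hypothetical sample data standing in for the scraped columns.
names = ["Team A", "Team B"]
points = [30, 28]

# zip() gives an iterator; the first full pass uses it up.
combined = zip(names, points)
first_pass = list(combined)    # consumes every item
second_pass = list(combined)   # the iterator is now empty

# Materialising the zip into a list makes it reusable.
reusable = list(zip(names, points))
pass_one = [row for row in reusable]
pass_two = [row for row in reusable]
```

Applied to the code in the question, this would presumably mean writing `combined = list(zip(teamname, position, movement, goaldiff, points))`, or simply removing the earlier loop that prints to the command line.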