Using for loops in Python

I wrote a script that extracts the href link and the text from anchor tags.

Here is my Python code:

import re 
import cssselect 
from lxml import html 

mainTree = html.fromstring('<a href="https://www.example.com/laptops/" title="Laptops"><div class="subCategoryItem">Laptops <span class="cnv-items">(229)</span></div></a>') 

for links in mainTree.cssselect('a'): 
    urls = [links.get('href')] 
    texts = re.findall(re.compile(u'[A-z- &]+'), links.text_content()) 

    for text in texts:
        print(text)

    for url in urls:
        print(url)

Output:

Laptops 
https://www.example.com/laptops/

Instead of using two for loops, can I do something like this?

for text, url in texts, urls: 
    print (text) 
    print (url)

Source

2015-10-14 Mansoor Akram

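What happened when you tried it? –

@NathanielFord I got this: "ValueError: need more than 1 value to unpack". –

I think this is an XY problem. The question you asked about combining the loops is indeed answered by 'zip', as @kmad1729 describes. However, I don't see why you are looping at all. Each '<a>' tag will only ever have one URL, so if the 're.findall' search returns multiple matches, I don't think 'zip' will do what you want (every result except the first will be ignored). Maybe you just want to filter the inappropriate characters out of the string returned by the 'text_content' call? – Blckknght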

Let's look at what you are trying to do here:

for text, url in texts, urls: 
    print (text) 
    print (url)

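The text, url part right after for means "unpack each item of the tuple given after in into two parts". If an item does not have exactly two parts, you get a ValueError.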

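Both of the lists you are iterating over hold a single value, and simply placing a comma between them will not do what you want. As another answer suggests, you can zip them into a single sequence: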

for text, url in zip(texts, urls): 
    print (text) 
    print (url)
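As a minimal sketch of the difference between the two loops above: the list contents below are assumed from the output shown in the question, and the names error and pairs are introduced only for illustration. Writing texts, urls builds a two-item tuple whose items are the lists themselves, so the loop tries to unpack a one-element list into two names.

```python
# Values assumed from the output shown in the question.
texts = ['Laptops ']
urls = ['https://www.example.com/laptops/']

# "texts, urls" is the tuple (texts, urls). The first item the loop sees is
# the list texts, which has one element and cannot be unpacked into two names.
error = None
try:
    for text, url in texts, urls:
        print(text, url)
except ValueError as exc:
    error = exc

print(type(error).__name__)

# zip pairs the elements position by position instead.
pairs = list(zip(texts, urls))
for text, url in pairs:
    print(text)
    print(url)
```

The exact wording of the ValueError message differs between Python 2 and Python 3, so the sketch only checks the exception type.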

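What zip does is produce tuples, each containing one element from each of the supplied lists (as a list in Python 2, as a lazy iterator in Python 3). That works, but it does not really solve the problem of going over the lists twice: in Python 2 you still do, once to build the zip and once to unpack it. Your deeper problem is how you are getting your values.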

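You appear to be stepping through each link, and for each link you fetch the URL and the text and put them into lists. Then you print everything in those lists. Will those lists ever be longer than one element?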

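The get function only ever returns one value:

urls = [links.get('href')]  # Gets one value and puts it in a list of length one

There is no point in putting it in a list. As for your regex search, it could in theory return multiple values, but if you use re.search() you only get the first match and do not need to worry about additional values. This is what you are currently doing: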

for each link in the document 
    put the url into a list 
    put all the matching text into a list 
    for each url in the list print it 
    for each text in the list print it

When you could really simplify it to:

for each link in the document 
    print the url 
    find the first text and print it

Then you don't have to worry about extra for loops or zipping. This refactors to:

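for links in mainTree.cssselect('a'):
    print(links.get('href'))
    # re.search returns a match object (or None), so print the matched text via .group()
    match = re.search(re.compile(u'[A-z- &]+'), links.text_content())
    if match:
        print(match.group())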
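To make the difference between re.findall and re.search concrete, here is a small standard-library sketch. The string content is assumed to be what text_content() returns for the sample anchor in the question, and the names all_matches, first_match and label are illustrative. The character class is written as [A-Za-z &-] here rather than the question's [A-z- &], because the range A-z also covers the punctuation characters that sit between Z and a in ASCII.

```python
import re

# Assumed result of links.text_content() for the sample anchor.
content = 'Laptops (229)'
pattern = re.compile(r'[A-Za-z &-]+')

# findall returns a list with every matching string.
all_matches = pattern.findall(content)

# search returns a match object for the first hit only, or None if nothing matches.
first_match = pattern.search(content)

# .group() gives the matched text; strip() drops the trailing space.
label = first_match.group().strip() if first_match else ''

print(all_matches)
print(label)
```

Because search can return None, the guard on first_match avoids an AttributeError for anchors whose text contains no matching characters.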

Source

2015-10-14 17:23:17

You can use the zip function:

for text, url in zip(texts, urls): 
    print (text) 
    print (url)

What it does is zip together two or more iterables. They do not need to be the same length; zip stops at the shortest one.

>>> l1 = range(5) 
>>> l2 = range(6) 
>>> list(zip(l1,l2)) #produces 
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)] 
>>>
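If the extra element of the longer iterable should not be dropped, the standard library's itertools.zip_longest (Python 3) pads the shorter input instead. This is only a sketch of that alternative using the same ranges as above; the names truncated and padded are illustrative.

```python
from itertools import zip_longest

l1 = range(5)
l2 = range(6)

# zip stops as soon as the shorter iterable is exhausted.
truncated = list(zip(l1, l2))

# zip_longest keeps going and fills the missing positions with fillvalue.
padded = list(zip_longest(l1, l2, fillvalue=None))

print(len(truncated), truncated[-1])
print(len(padded), padded[-1])
```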

Source

2015-10-14 17:16:42 kmad1729

'zip' is a great function! Note that it is overkill here and does not actually reduce the computational complexity. –
