Writing to a file and getting strange indentation

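I have the following code snippet, which takes a URL, opens it, parses out just the text, and then searches for widgets. It detects a widget by looking for the word widget1 followed later by endwidget, which marks the end of the widget.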

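Basically, as soon as the code finds the word widget1 it writes every line of text to a file, and it stops when it reads endwidget. However, my code indents every line after the first widget1 line.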

This is my output:

widget1 this is a really cool widget 
     it does x, y and z 
     and also a, b and c 
     endwidget

What I want is:

widget1 this is a really cool widget 
it does x, y and z 
and also a, b and c 
endwidget

Why is this indentation happening? Here is my code...

for url in urls:
    page = mech.open(url)
    html = page.read()
    soup = BeautifulSoup(html)
    text = soup.prettify()
    texts = soup.findAll(text=True)

    def visible(element):
        if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
            # If the parent of your element is any of those ignore it
            return False
        elif re.match('<!--.*-->', str(element)):
            # If the element matches an html tag, ignore it
            return False
        else:
            # Otherwise, return True as these are the elements we need
            return True

    visible_texts = filter(visible, texts)

    inwidget = 0
    # open a file for write
    for line in visible_texts:
        # if line doesn't contain .widget1 then ignore it
        if ".widget1" in line and inwidget == 0:
            match = re.search(r'\.widget1 (\w+)', line)
            line = line.split(".widget1")[1]
            # make the next word after .widget1 the name of the file
            filename = "%s" % match.group(1) + ".txt"
            textfile = open(filename, 'w+b')
            textfile.write("source:" + url + "\n\n")
            textfile.write(".widget1" + line)
            inwidget = 1
        elif inwidget == 1 and ".endwidget" not in line:
            print line
            textfile.write(line)
        elif ".endwidget" in line and inwidget == 1:
            textfile.write(line)
            inwidget = 0
        else:
            pass

2012-11-22 user1328021

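The reason every line except the first one is indented is that the first line is rebuilt by textfile.write(".widget1" + line), whereas the remaining lines are taken straight from the HTML file, indentation included. You can remove the unwanted whitespace by calling str.strip() on each line, that is, by changing textfile.write(line) to textfile.write(line.strip()).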
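A minimal sketch of that change, using hypothetical sample lines in place of the text parsed from the page. One caveat: strip() also removes a trailing newline, so the sketch adds "\n" back after stripping. Whether the real text nodes end in a newline is an assumption here, and the helper name clean and the in-memory buffer are only for illustration.

```python
import io

# Hypothetical sample of what the loop might see after the first widget line.
# The leading spaces and trailing newlines are assumptions for illustration.
lines = [
    "     it does x, y and z \n",
    "     and also a, b and c \n",
    "     .endwidget\n",
]

def clean(line):
    # strip() drops leading and trailing whitespace, including the newline,
    # so add a single newline back to keep one item per output line.
    return line.strip() + "\n"

out = io.StringIO()  # stands in for the opened text file
for line in lines:
    out.write(clean(line))

result = out.getvalue()
print(result)
```

In the question's loop this would correspond to writing line.strip() + "\n" instead of line in both the middle branch and the .endwidget branch.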

2012-11-22 14:45:16 user1767344

To get from your output to the output you want, do this:

# a is your output
a = '\n'.join(map(lambda x: x.strip(), a.split('\n')))

2012-11-22 14:34:42 LtWorf

Thanks. Is 'a' supposed to be the variable 'texts' or 'visible_texts'? – user1328021

Also, what exactly does it do to each line? It strips the carriage returns, and what else? – user1328021

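It splits on \n, creating a list of strings with one string per line, then strips each line (meaning it removes the whitespace at the start and the end... but you can change it to strip only at the start with lstrip), and then joins the strings back together using \n as the separator. – LtWorf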
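The three steps described in that comment can be spelled out one at a time. This is only a sketch: the sample text is copied from the output shown in the question, and the intermediate variable names (parts, stripped, left_only) are made up for illustration.

```python
# Sample text copied from the output shown in the question.
a = (
    "widget1 this is a really cool widget \n"
    "     it does x, y and z \n"
    "     and also a, b and c \n"
    "     endwidget"
)

# 1. Split on "\n" to get one string per line.
parts = a.split("\n")

# 2. Strip each line: strip() trims both ends, lstrip() trims only the start.
stripped = [p.strip() for p in parts]
left_only = [p.lstrip() for p in parts]

# 3. Join the pieces back together with "\n" as the separator.
joined = "\n".join(stripped)
joined_left = "\n".join(left_only)

print(joined)
```

With strip() the trailing space on each line is removed as well; with lstrip() only the leading indentation goes and any trailing whitespace is kept.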
