Convert tabbed text to an HTML unordered list?

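I am a beginner programmer, so this question may sound simple: I have some text files containing tab-indented text, with items such as A, B, C, D and E at different tab depths.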

Now I want to generate an unordered HTML list from this, with the structure:

<ul> 
<li>A 
<ul><li>B</li> 
<li>C 
<ul><li>D</li> 
<li>E</li></ul></li></ul></li> 
</ul>

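My idea was to write a Python script, but if there is an easier (automatic) way, that is fine too. To identify the indentation level and the item name, I would try to use this code: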

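import sys

indent = 0  # current indentation level
last = []   # stack of items, one per open level
for line in sys.stdin:
    # count and strip the leading tabs
    count = 0
    while line.startswith("\t"):
        count += 1
        line = line[1:]
    if count > indent:
        # one level deeper
        indent += 1
        last.append(last[-1])
    elif count < indent:
        # one level back
        indent -= 1
        last = last[:-1]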

Source

2012-08-28 Elip

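The tokenize module understands your input format: the lines contain valid Python identifiers, and the indentation level of the statements is significant. The ElementTree module lets you manipulate tree structures in memory, so it may be more flexible to separate tree creation from rendering it as HTML: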

from tokenize import NAME, INDENT, DEDENT, ENDMARKER, NEWLINE, generate_tokens
from xml.etree import ElementTree as etree

def parse(file, TreeBuilder=etree.TreeBuilder):
    tb = TreeBuilder()
    tb.start('ul', {})
    for type_, text, start, end, line in generate_tokens(file.readline):
        if type_ == NAME: # convert name to <li> item
            tb.start('li', {})
            tb.data(text)
            tb.end('li')
        elif type_ == NEWLINE:
            continue
        elif type_ == INDENT: # start <ul>
            tb.start('ul', {})
        elif type_ == DEDENT: # end </ul>
            tb.end('ul')
        elif type_ == ENDMARKER: # done
            tb.end('ul') # end parent list
            break
        else: # unexpected token
            assert 0, (type_, text, start, end, line)
    return tb.close() # return root element
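To see why the token types above are enough, it may help to look at what tokenize emits for tabbed input. The following is only an illustrative sketch; the sample text is an assumed input with items A to E, not a file from the question.

```python
import io
from token import tok_name
from tokenize import generate_tokens

# Assumed sample input: one item per line, nesting expressed with tabs.
text = "A\n\tB\n\tC\n\t\tD\n\t\tE\n"

# Collect the symbolic name of every token the tokenizer produces.
names = [tok_name[tok.type] for tok in generate_tokens(io.StringIO(text).readline)]
print(" ".join(names))
```

Each item shows up as a NAME token, each deeper level as an INDENT, and each return to a shallower level as a DEDENT, which is what parse() maps to <li>, <ul> and </ul>.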

Any class that provides .start(), .end(), .data(), and .close() methods can be used as the TreeBuilder; for example, you could write HTML directly instead of building a tree.
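As a rough illustration of that idea, here is a minimal sketch of such a class. The name HTMLWriter and its out parameter are hypothetical and not part of ElementTree; the only assumption is that the caller uses the four methods named above.

```python
import io
from html import escape

class HTMLWriter:
    """Hypothetical TreeBuilder stand-in that writes tags as they arrive."""

    def __init__(self, out):
        self.out = out  # any object with a write() method

    def start(self, tag, attrs):
        # Render attributes as key="value" pairs, escaping the values.
        attr_text = "".join(
            ' %s="%s"' % (key, escape(value, quote=True))
            for key, value in attrs.items()
        )
        self.out.write("<%s%s>" % (tag, attr_text))

    def end(self, tag):
        self.out.write("</%s>" % tag)

    def data(self, text):
        self.out.write(escape(text))  # escape &, < and > in item text

    def close(self):
        return self.out  # nothing to build, so hand back the stream

# Drive the writer by hand to show the calls a parser would make.
buf = io.StringIO()
writer = HTMLWriter(buf)
writer.start('ul', {})
writer.start('li', {})
writer.data('A & B')
writer.end('li')
writer.end('ul')
print(writer.close().getvalue())
```

Because the parser calls TreeBuilder() without arguments, a class like this would probably be passed as something like TreeBuilder=lambda: HTMLWriter(sys.stdout).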

To parse standard input and write HTML to standard output, you can use ElementTree.write():

import sys 

etree.ElementTree(parse(sys.stdin)).write(sys.stdout, method='html')

Output:

<ul><li>A</li><ul><li>B</li><li>C</li><ul><li>D</li><li>E</li></ul></ul></ul>

You can use any file, not just sys.stdin/sys.stdout.

Note: to write to stdout on Python 3, use sys.stdout.buffer or encoding="unicode", because of the bytes/Unicode distinction.
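The two options in the note can be sketched as follows. The in-memory streams stand in for sys.stdout and sys.stdout.buffer so the result can be inspected, and the tiny tree is an assumed example rather than the output of the parser.

```python
import io
from xml.etree import ElementTree as etree

# Assumed example tree: <ul><li>A</li></ul>
root = etree.Element('ul')
etree.SubElement(root, 'li').text = 'A'
tree = etree.ElementTree(root)

# Option 1: a text stream (like sys.stdout) needs encoding="unicode".
text_out = io.StringIO()
tree.write(text_out, encoding='unicode', method='html')

# Option 2: a binary stream (like sys.stdout.buffer) takes encoded bytes.
bytes_out = io.BytesIO()
tree.write(bytes_out, method='html')

print(text_out.getvalue())
print(bytes_out.getvalue())
```

In a script, the equivalent calls would be tree.write(sys.stdout, encoding='unicode', method='html') or tree.write(sys.stdout.buffer, method='html').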

Source

2012-08-28 19:56:42 jfs

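I think the algorithm goes like this:

keep track of the current indentation level (by counting the number of tabs on each line)
if the indentation level increases: emit <ul> <li>current item</li>
if the indentation level decreases: emit <li>current item</li></ul>
if the indentation level stays the same: emit <li>current item</li>

Putting this into code is left as an exercise for the OP.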
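One way the steps above might look in code is sketched below. It is not the answerer's implementation: the function name tabs_to_ul is hypothetical, and unlike the listed steps it keeps each parent <li> open so that a child <ul> is nested inside it, which matches the structure shown in the question. It also assumes a line goes at most one level deeper than the line before it.

```python
from html import escape

def tabs_to_ul(lines):
    """Turn tab-indented lines into a nested <ul> string (sketch)."""
    out = ["<ul>"]
    depth = 0          # nesting level of the most recent item
    open_item = False  # True while an <li> is waiting to be closed
    for raw in lines:
        text = raw.rstrip("\r\n")
        if not text.strip():
            continue  # skip blank lines
        level = len(text) - len(text.lstrip("\t"))  # count leading tabs
        # Assumption: at most one level deeper than the previous item,
        # and the very first item always sits at level 0.
        level = min(level, depth + 1) if open_item else 0
        if level > depth:
            out.append("<ul>")  # child list opens inside the parent <li>
        elif open_item:
            # Close the previous item, plus one list and parent item
            # for every level we move back up.
            out.append("</li>" + "</ul></li>" * (depth - level))
        out.append("<li>" + escape(text.strip()))
        depth, open_item = level, True
    if open_item:
        out.append("</li>" + "</ul></li>" * depth)
    out.append("</ul>")
    return "".join(out)

sample = ["A\n", "\tB\n", "\tC\n", "\t\tD\n", "\t\tE\n"]
print(tabs_to_ul(sample))
```

For files, something like tabs_to_ul(open(path, encoding="utf-8")) should work, since the function only iterates over lines.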

Source

2012-08-28 17:17:03

Try this (it works for your test case):

import itertools

def listify(filepath):
    depth = 0
    print("<ul>" * (depth + 1))
    with open(filepath) as f:
        for line in f:
            line = line.rstrip()
            # depth of this line = number of leading tabs
            newDepth = sum(1 for _ in itertools.takewhile(lambda c: c == '\t', line))
            if newDepth > depth:
                print("<ul>" * (newDepth - depth))
            elif depth > newDepth:
                print("</ul>" * (depth - newDepth))
            print("<li>%s</li>" % line.strip())
            depth = newDepth
    print("</ul>" * (depth + 1))

Hope this helps.

Source

2012-08-28 17:28:13 inspectorG4dget


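The algorithm is simple. You take the depth level of each line, given by its number of leading tabs (\t), and move the next bullet to the right (more tabs), to the left (fewer tabs), or keep it at the same level (same number of tabs).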

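Make sure your "in.txt" contains tabs, or replace the indentation with tabs if you copy it from here. If the indentation consists of spaces, nothing will work. Finally, the separator between lists is a blank line. You can change this in the code if needed.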
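If the copied text ended up with spaces, the indentation could be converted back to tabs before running the script. This is only a sketch: the helper spaces_to_tabs is hypothetical, and it assumes every level is indented by exactly width spaces (4 here), which may not hold for your file.

```python
import re

def spaces_to_tabs(text, width=4):
    """Replace leading runs of spaces with one tab per `width` spaces."""
    def repl(match):
        # Leftover spaces that do not fill a whole level are dropped.
        return "\t" * (len(match.group(0)) // width)
    return re.sub(r"^ +", repl, text, flags=re.MULTILINE)

print(repr(spaces_to_tabs("qqq\n    www\n        xxx\n")))
```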

J.F. Sebastian's solution is fine, but it does not handle Unicode.

Create a text file "in.txt" in UTF-8 encoding:

qqq
	www
	www
		яяя
		яяя
	ыыы
	ыыы
qqq
qqq

and run the script "ul.py". The script will create "out.html" and open it in Firefox.

#!/usr/bin/python 
# -*- coding: utf-8 -*- 

# The script exports a tabbed list from string into a HTML unordered list. 

import io, subprocess, sys 

f=io.open('in.txt', 'r', encoding='utf8') 
s=f.read() 
f.close() 

#--------------------------------------------- 

def ul(s):

    L=s.split('\n\n')  # blank lines separate the blocks

    s=('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">\n'
       '<html><head><meta content="text/html; charset=UTF-8" http-equiv="content-type"><title>List Out</title></head><body>')

    for p in L:
        e=''
        if p.find('\t') != -1:  # only blocks containing tabs become lists

            l=p.split('\n')
            depth=0
            e='<ul>'
            i=0  # number of nested <ul> still open

            for line in l:
                if len(line) >0:
                    a=line.split('\t')
                    d=len(a)-1  # depth = number of tabs in the line

                    if depth==d:
                        e=e+'<li>'+line+'</li>'

                    elif depth < d:
                        i=i+1
                        e=e+'<ul><li>'+line+'</li>'
                        depth=d

                    elif depth > d:
                        e=e+'</ul>'*(depth-d)+'<li>'+line+'</li>'
                        depth=d
                        i=depth

            e=e+'</ul>'*i+'</ul>'
            p=e.replace('\t','')

            # sanity check: opening and closing tags must balance
            l=e.split('<ul>')
            n1= len(l)-1

            l=e.split('</ul>')
            n2= len(l)-1

            if n1 != n2:
                msg='<div style="color: red;">Wrong bullets position.<br>&lt;ul&gt;: '+str(n1)+'<br>&lt;&frasl;ul&gt;: '+str(n2)+'<br> Correct your source.</div>'
                p=p+msg

        s=s+p+'\n\n'

    return s

#-------------------------------------  

def detach(cmd): 
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True) 
    sys.exit() 

s=ul(s) 

f=io.open('out.html', 'w', encoding='utf8')
f.write(s)
f.close()

cmd='firefox out.html' 
detach(cmd)

The HTML will be:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> 
<html><head><meta content="text/html; charset=UTF-8" http-equiv="content-type"><title>List Out</title></head><body><ul><li>qqq</li><ul><li>www</li><li>www</li><ul><li>яяя</li><li>яяя</li></ul><li>ыыы</li><li>ыыы</li></ul><li>qqq</li><li>qqq</li></ul>

Source

2015-06-03 20:58:03
