2014-05-15 37 views
0

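What I'm trying to do is open a text file containing some paragraphs and give each line a maximum width of X characters. However, my algorithm has a flaw: it cuts words apart, so it doesn't work. I'm not really sure how to go about this. I'm also not sure how to make it break to a new line. How do I limit the number of characters per line without splitting words?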

I looked at textwrap, but since I want to improve my algorithmic skills, I don't want to use it at this point.

So my algorithm starts by opening the file:

f = open("file.txt", "r", encoding="utf-8")
lines = f.readlines() 
f.close() 

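Now I have a list of all the lines. This is where I'm stuck. How do I limit the length of each line when I print it?

I really don't know how to go about this and would really appreciate some help.

Thanks.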

+3

You should take a look at: https://docs.python.org/3.4/library/textwrap.html –

+0

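Since you asked about the algorithm rather than code or a language: basically, look at the character at start + n; if it is whitespace, break the line there; if it is not, look at start + n - 1; if that is not whitespace either, look at start + n - 2 ... There are ways to optimize this in code. –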

回答

4

You can use the standard textwrap module:

import textwrap 
txt = "Lorem ipsum dolor sit amet, consectetur adipiscing elit." 
print('\n'.join(textwrap.wrap(txt, 20, break_long_words=False)))

First, to read the file, you should use the with construct:

with open(filename, 'r') as f: 
    lines = f.readlines() 

def wrap(line): 
    broken = textwrap.wrap(line, 20, break_long_words=False) 
    return '\n'.join(broken) 

wrapped = [wrap(line) for line in lines] 

But you said you don't want to use the built-in textwrap and would rather do it yourself, so here is a solution without imports:

lorem = """Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
Phasellus ac commodo libero, at dictum leo. Nunc convallis est id purus porta, 
malesuada erat volutpat. Cras commodo odio nulla. Nam vehicula risus id lacus 
vestibulum. Maecenas aliquet iaculis dignissim. Phasellus aliquam facilisis 
pellentesque ultricies. Vestibulum dapibus quam leo, sed massa ornare eget. 
Praesent euismod ac nulla in lobortis. 
Sed sodales tellus non semper feugiat.""" 

def wrapped_lines(line, width=80):
    whitespace = set(" \n\t\r")
    length = len(line)
    start = 0

    while start < (length - width):
        # take the next 'width' + 1 characters
        chunk = line[start:start+width+1]
        # if there is a newline in the chunk, return the part before it
        if '\n' in chunk:
            end = start + chunk.find('\n')
            yield line[start:end]
            start = end + 1  # continue right after the newline
            continue

        # no newline in the chunk: find the first whitespace from the end
        for i, ch in enumerate(reversed(chunk)):
            if ch in whitespace:
                end = start + width - i
                yield line[start:end]
                start = end + 1
                break
        else:
            # no whitespace at all (a word longer than 'width'): cut it at
            # 'width' so that the loop always makes progress
            end = start + width
            yield line[start:end]
            start = end
    yield line[start:]

for line in wrapped_lines(lorem, 30):
    print(line)

Edit: I don't like the version above; it is a bit ugly and un-Pythonic for my taste. Here is another one:

def wrapped_lines(line, width=80):
    whitespace = set(" \n\t\r")
    length = len(line)
    start = 0

    while start < (length - width):
        end = start + width + 1
        chunk = line[start:end]
        skip = 1  # the whitespace character we break on is dropped
        try:
            end = start + chunk.index('\n')
        except ValueError:  # no newline in chunk
            # iterate over the characters from the end:
            for i, ch in enumerate(reversed(chunk)):
                if ch in whitespace:
                    end -= i + 1  # 'end' is now the index of that whitespace
                    break
            else:
                # no whitespace at all: cut the long word at 'width'
                end, skip = start + width, 0
        yield line[start:end]
        start = end + skip
    yield line[start:]
+1

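Note: the OP specifically stated that he knows about the 'textwrap' option and doesn't want to use it (I understand that using 'textwrap' is the best option, but the OP wants to learn). – Ffisegydd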

+0

Thanks for pointing that out, @Ffisegydd, I completely missed it. I'll write another answer. –

+0

"From scratch" solution added. –

-2

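To get the approach right, you first need to decide what you want to do with anything longer than the defined length. Assuming you want fairly traditional wrapping, where the excess words flow onto the next line, your logic should look something like this (note: this is pseudocode):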

for(int lineCount=0; lineCount<totalLines; lineCount++){
    currentLine=lines[lineCount];
    // only lines that are too long need to be snipped
    if(currentLine.length > targetLength){
        int snipStart=currentLine.find_whitespace_before_targetLength;
        snip = currentLine.snip(snipStart, currentLine.length);
        if(lineCount<totalLines-1){
            lines[lineCount+1].prepend(snip);
        }else{
            //Add snip to line array, since the last line is too long
        }
    }
}
+0

Wrong language, dude. –

+0

Yes. I realized that after I had finished. – Drux

0

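There are several ways to approach this problem. One is to find the last space before the right margin, split the string there, print the first part, and repeat the search and split on the second part.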
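The first approach could be sketched roughly as follows. This is only an illustration under some assumptions: the name wrap_by_last_space is made up, all whitespace runs (including newlines) are collapsed to single spaces first, and a word longer than the width is kept whole on its own line rather than cut.

```python
def wrap_by_last_space(text, width=72):
    """Return a list of lines, breaking at the last space before the margin."""
    lines = []
    # Assumption: treat the whole text as one paragraph with single spaces.
    rest = " ".join(text.split())
    while len(rest) > width:
        # Last space at an index <= width, so the first part fits the line.
        cut = rest.rfind(" ", 0, width + 1)
        if cut == -1:
            # The first word is longer than the line: keep it whole and
            # break at the next space instead (if there is one).
            cut = rest.find(" ", width)
            if cut == -1:
                break
        lines.append(rest[:cut])
        rest = rest[cut + 1:]  # drop the space we broke on
    lines.append(rest)
    return lines


for out_line in wrap_by_last_space(
        "What I'm trying to do is open up a text file with some paragraphs", 25):
    print(out_line)
```

Returning the lines instead of printing them inside the loop keeps the splitting logic separate from the output.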

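Here is another approach: split the text into words and add the words to a line buffer one at a time. If the next word would overflow the line, print the line first and reset it. (As an extra, this code lets you specify a left margin.)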

def par(s, wrap=72, margin=0):
    """Print a word-wrapped paragraph with given width and left margin"""

    left = margin * " "
    line = ""

    for w in s.split():
        # flush the buffer if adding " " + w would exceed the width
        if line and len(line) + len(w) >= wrap:
            print(left + line)
            line = ""

        if line:
            line += " "
        line += w

    print(left + line)
    print()



par("""What I'm trying to do is open up a text file with some 
    paragraphs and give each line a maximum width of X number of 
    characters.""", 36) 

par("""However, I'm having a flaw in my algorithm as this 
    will cut out words and it's not going to work. I'm not really 
    sure how to go about this. Also I'm not sure how to make it 
    change line.""", 36, 44) 

par("""I checked textwrap and I don't really want to use it at 
    this point since I want to improve my algorithmic skills.""", 
     64, 8) 

Instead of printing, you can of course return a multi-line string with newlines or, probably better, a list of lines.
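A variant that returns a list of lines might look like the sketch below. The name wrap_words is hypothetical, and the function follows the same word-buffer idea: a word that is longer than the width is placed on a line of its own and is not cut.

```python
def wrap_words(s, wrap=72, margin=0):
    """Return the word-wrapped paragraph as a list of lines."""
    left = margin * " "
    lines = []
    line = ""
    for w in s.split():
        # Start a new line if appending " " + w would exceed the width.
        if line and len(line) + 1 + len(w) > wrap:
            lines.append(left + line)
            line = ""
        if line:
            line += " "
        line += w
    if line:  # do not emit an empty line for empty input
        lines.append(left + line)
    return lines


text = """What I'm trying to do is open up a text file with some
    paragraphs and give each line a maximum width of X number of
    characters."""
print("\n".join(wrap_words(text, 36, 4)))
```

The caller can then decide whether to print the lines, join them with newlines, or process them further.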

+0

Your idea is good; the use of globals is bad. – Davidmh

+0

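@Davidmh: Maybe, but the OP "really doesn't know how to go about this", so this is more about the idea than the implementation. The approach I showed differs from the other solutions here in that the text can be printed in successive, independent calls, so it needs state. A global may not be the most Pythonic solution, but it works, and I'm a bit reluctant to create a class for a few lines that only outline an idea. –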

+0

Precisely because the OP is a novice, he should not be given bad advice. It can easily be reimplemented without classes or global variables. – Davidmh

0

test.txt contains:

""" 
What I'm trying to do is open up a text file with some paragraphs and give each line a maximum width of X number of characters. 
However, I'm having a flaw in my algorithm as this will cut out words and it's not going to work. 
I'm not really sure how to go about this. Also I'm not sure how to make it change line. 
""" 
with open("test.txt") as f:
    lines = f.readlines()

max_width = 25
result = ""
col = 0
for line in lines:
    for word in line.split():
        end_col = col + len(word)
        if col != 0:
            end_col += 1  # account for the separating space
        if end_col > max_width:
            result += '\n'
            col = 0
        if col != 0:
            result += ' '
            col += 1
        result += word
        col += len(word)
print(result)  # print once, after all lines have been processed

Output:

What I'm trying to do is
open up a text file with
some paragraphs and give
each line a maximum width
of X number of
characters. However, I'm
having a flaw in my
algorithm as this will
cut out words and it's
not going to work. I'm
not really sure how to go
about this. Also I'm not
sure how to make it
change line.
1

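Part of a programmer's skill set should be the ability to read and understand source code written by others. I understand that you don't want to use the textwrap module. However, you can learn from its source code. The reason is that, to understand it, you have to reverse-engineer part of the mental picture of the problem that its author had in mind. That way you can also learn how to write things better yourself.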

You can find the textwrap implementation at c:\Python34\Lib\textwrap.py. You can copy it into your working directory under a different name to experiment with it.
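The path above applies to one particular Windows installation; on other systems the file lives elsewhere. A small sketch using the standard inspect module can ask the running interpreter where its copy is and show part of the source (the variable names are only illustrative):

```python
import inspect
import textwrap

# Path of the textwrap module that the running interpreter actually uses.
source_path = inspect.getsourcefile(textwrap)
print(source_path)

# Source code of the class that does the wrapping work.
wrapper_source = inspect.getsource(textwrap.TextWrapper)
print(wrapper_source[:300])  # only the beginning, to keep the output short
```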
