2017-05-04 58 views
1

How to extract a specific part from a text file in Python 3

This is my Python file:

path = '/my/file/list.txt' 
with open(path,'rt') as file: 
    print("step 1") 
    collected_lines = [] 
    started = False 
    for line in file: 
     for n in range(1, 10): 
      if line.startswith('PLAY NO.{}'.format(n)): 
       started = True 
       print("started at line {}".format(line[0])) 
       continue 
      if started: 
       collected_lines.append(line)   
      if started and line == 'PLAY NO.{}'.format(n+1): 
       print("end at line {}".format(line[0])) 
       break   
      print(collected_lines.append(line)) 

This is my code. Output:

None 
None 
None 
None 
None 
None 

Now I want to collect the lines from PLAY NO. 2 up to PLAY NO. 3, and so on, but I get None. Any suggestions, please? I am using Python 3.5.

Sorry, this is my first time asking a question on this site. My file looks like this:

TextFile.txt

Hello and Welcome This is the list of plays being performed here 
       PLAY NO. 1 
1. adknjkd 
2. skdi 
3. ljdij 

       PLAY NO. 2 
1. hsnfhkjdnckj 
2. sjndkjhnd and so on 
+0

Which lines? You don't show any lines. –

+0

We need a sample of the file. –

+0

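I would recommend filtering for the lines you want. Have a look at http://stackoverflow.com/questions/2401785/in-python-can-i-single-line-a-for-loop-over-iterator-with-an-if-filter, which may give you a hint on how to do this conveniently. I would also suggest using a regular expression for the filter, something like 'None != re.match("^No[0-9][^0-9]", line)' – TobiSH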
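
The filtering idea from the comment above could be sketched roughly as follows. This is only an illustration: the sample lines and the pattern are assumptions modeled on the file shown in the question, not the commenter's exact code.

```python
import re

# Sample lines modeled on the file in the question (assumed contents).
lines = [
    "Hello and Welcome This is the list of plays being performed here\n",
    "       PLAY NO. 1\n",
    "1. adknjkd\n",
    "       PLAY NO. 2\n",
    "1. hsnfhkjdnckj\n",
]

# Hypothetical pattern: optional leading whitespace, then 'PLAY NO. <digits>'.
header = re.compile(r"\s*PLAY NO\. \d+")

# A list comprehension with an 'if' filter keeps only the header lines.
headers = [line.strip() for line in lines if header.match(line) is not None]
print(headers)
```

The same filter can be inverted (`is None`) to keep everything except the header lines.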

Answers

0
path = 'list.txt'
collected_lines = []
with open(path, 'rt') as file:
    print("step 1")
    started = False
    lineNo = 0
    for line in file:
        lineNo += 1
        for n in range(1, 10):
            # A 'PLAY NO.' header seen while collecting marks the end.
            if started and line.lstrip().startswith('PLAY NO. {}'.format(n)):
                print("### end  at line {}".format(lineNo))
                started = False
                break
            # The first 'PLAY NO.' header starts the collection.
            if line.lstrip().startswith('PLAY NO. {}'.format(n)):
                started = True
                print("### started at line {}".format(lineNo))
                break
        if started:
            collected_lines.append(line)

print("collected_lines: \n\n", *collected_lines)

gives:

step 1 
### started at line 2 
### end  at line 7 
collected_lines: 

       PLAY NO. 1 
    1. adknjkd 
    2. skdi 
    3. ljdij 

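NOTES on the problems that were fixed:

  1. Used .lstrip() so that .startswith() works as expected.
  2. In startswith('PLAY NO. {}'.format(n)), added a space between NO. and {} so that the if condition can find the line.
  3. Rearranged the order of the if statements so that the ending line is not treated as a new starting line.
  4. Added started = False to the loop to stop collecting lines.

The leading whitespace alone was already enough to prevent the code from finding the line. Fixing only that does not solve the problem, because the space was also missing in the format string, so both issues must be fixed for the code to work as expected. And so on; see the notes above.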
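
The first two notes can be checked in isolation with a short sketch. The header line below is an assumption copied from the sample file in the question. The last part shows where the repeated None in the question's output comes from: list.append() always returns None.

```python
line = "       PLAY NO. 1\n"  # a header line as it appears in the sample file

# Original check: no .lstrip(), and no space between 'NO.' and the number.
original_check = line.startswith('PLAY NO.{}'.format(1))

# Stripping the leading whitespace alone is still not enough...
only_lstrip = line.lstrip().startswith('PLAY NO.{}'.format(1))

# ...only both fixes together make the check succeed.
both_fixes = line.lstrip().startswith('PLAY NO. {}'.format(1))

print(original_check, only_lstrip, both_fixes)  # False False True

# list.append() modifies the list in place and returns None, so
# print(collected_lines.append(line)) prints 'None' on every call.
collected = []
result = collected.append(line)
print(result)  # None
```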

+0

Oh.. thank you very much... it works! :) :) :) –

+0

You can replace 'for line in file: lineNo += 1' with 'for lineNo, line in enumerate(file):'. –

+0

How does this distinguish between play 1 and 2? –
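
Both comments can be addressed in one sketch: enumerate() replaces the manual counter, and comparing the header against a wanted play number selects a single play. The helper collect_play and the StringIO stand-in are hypothetical names introduced for illustration; the sample text follows the file shown in the question.

```python
from io import StringIO

# Stand-in for the real file; contents follow the sample in the question.
sample = StringIO(
    "Hello and Welcome This is the list of plays being performed here\n"
    "       PLAY NO. 1\n"
    "1. adknjkd\n"
    "2. skdi\n"
    "3. ljdij\n"
    "\n"
    "       PLAY NO. 2\n"
    "1. hsnfhkjdnckj\n"
    "2. sjndkjhnd and so on\n"
)

def collect_play(file, wanted):
    """Return (line_no, text) pairs belonging to play number `wanted`."""
    collected = []
    started = False
    # start=1 keeps the line numbers 1-based, like the manual counter.
    for line_no, line in enumerate(file, start=1):
        stripped = line.strip()
        if stripped.startswith('PLAY NO. '):
            # Every header switches collecting on or off, depending on
            # whether it is the header of the wanted play.
            started = stripped == 'PLAY NO. {}'.format(wanted)
            continue
        if started and stripped:
            collected.append((line_no, stripped))
    return collected

play_2 = collect_play(sample, 2)
print(play_2)  # [(8, '1. hsnfhkjdnckj'), (9, '2. sjndkjhnd and so on')]
```

Comparing the whole stripped header, instead of using startswith() with the number, also avoids confusing 'PLAY NO. 1' with 'PLAY NO. 10'.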

0

If you want a dictionary with the play number as the key and the list of lines belonging to that play as the value, you can use defaultdict.

Define the text

text = """Hello and Welcome This is the list of plays being performed here 
       PLAY NO. 1 
1. adknjkd 
2. skdi 
3. ljdij 

       PLAY NO. 2 
1. hsnfhkjdnckj 
2. sjndkjhnd and so on""" 

Define the regular expression

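import re

# Raw string with an escaped dot; the trailing \s* tolerates spaces after the number
regex = re.compile(r'^\s*PLAY NO\. (\d+)\s*$')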

Parse the lines

from collections import defaultdict
from io import StringIO

label = None  # no play to start with
recorded_lines = defaultdict(list)

# In the real code replace 'StringIO(text)' with 'file'
for line_no, line in enumerate(StringIO(text)):
    try:
        # If the regex does not match, findall() returns an empty list
        # and indexing it raises an IndexError
        play_no = int(regex.findall(line)[0])
        # The code underneath is only executed when a new play starts
        if label:  # if there is no play underway, there can be no ending
            print('PLAY NO. %i ended at line number %i' % (label, line_no - 1))
        label = play_no
        print('PLAY NO. %i started at line number %i' % (play_no, line_no))
    except IndexError:
        # No new play started: record non-empty lines under the current play
        if label and line.strip():
            recorded_lines[label].append((line_no, line.strip()))
    print(line_no, line)
print(recorded_lines)

This yields

defaultdict(list, 
      {1: [(2, '1. adknjkd'), (3, '2. skdi'), (4, '3. ljdij')], 
      2: [(7, '1. hsnfhkjdnckj'), (8, '2. sjndkjhnd and so on')]}) 

and this output on stdout

0 Hello and Welcome This is the list of plays being performed here 

PLAY NO. 1 started at line number 1 
1    PLAY NO. 1 

2 1. adknjkd 

3 2. skdi 

4 3. ljdij 

5 

PLAY NO. 1 ended at line number 5 
PLAY NO. 2 started at line number 6 
6    PLAY NO. 2 

7 1. hsnfhkjdnckj 

8 2. sjndkjhnd and so on 
+0

In Python, try/except is an accepted [approach](https://docs.python.org/3/glossary.html#term-eafp) and should not be avoided at all costs. Used properly, it can give some very elegant code. –
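
For comparison, the two styles can be placed side by side. This is a sketch only: the function names are hypothetical, and the pattern is an assumption modeled on the one used in the answer above. Both functions return the play number for a header line and None for any other line.

```python
import re

# Assumed pattern, modeled on the regular expression in the answer.
regex = re.compile(r'^\s*PLAY NO\. (\d+)\s*$')

def play_number_eafp(line):
    """EAFP: try the operation and handle the failure."""
    try:
        return int(regex.findall(line)[0])
    except IndexError:
        # findall() returned an empty list: not a header line.
        return None

def play_number_lbyl(line):
    """LBYL: check for a match before using it."""
    match = regex.match(line)
    if match is None:
        return None
    return int(match.group(1))

print(play_number_eafp("       PLAY NO. 2\n"), play_number_lbyl("1. skdi\n"))  # 2 None
```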

+0

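This regular expression is a very simple one and is probably a good introduction for a newcomer. In this case, where you want to get the play number, a regular expression is easier, more correct and more readable than the for loop in the original code. –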

+0

Thanks for the feedback (y) :). Now that I know you have read my comment, I will delete it, since it is of no use to future visitors. – Claudio