2016-12-07 107 views
-1

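Split a dynamic list into sublists based on a special character in Python

I have a basic Python problem that I have been trying to solve for a long time, but I cannot get the correct output.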

textvalues=[['1 of 2 DOCUMENTS', 'The New York Times', 'March 17, 2016 Thursday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section A; Column 0; Classified; Pg. 19', 'LENGTH: 176 words', 'LOAD-DATE: March 17, 2016', 'Copyright 2016 The New York Times Company', '', '2 of 2 DOCUMENTS', 'The New York Times', 'March 16, 2016 Wednesday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section B; Column 0; Classified; Pg. 16', 'LENGTH: 176 words', 'LOAD-DATE: March 16, 2016', 'Copyright 2016 The New York Times Company']] 

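Here I need to split the list above into sublists based on a "special character". The list above is only a sample; the main list is dynamic and its length may vary. In every case, the list needs to be split at the '' (empty string) element.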

The solution I tried:

MainText = str(textvalues) 
split_index = MainText.index('',) 
l2 = MainText[:split_index] 
print(l2) 

Expected result:

[['1 of 2 DOCUMENTS', 'The New York Times', 'March 17, 2016 Thursday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section A; Column 0; Classified; Pg. 19', 'LENGTH: 176 words', 'LOAD-DATE: March 17, 2016', 'Copyright 2016 The New York Times Company'] ,['2 of 2 DOCUMENTS', 'The New York Times', 'March 16, 2016 Wednesday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section B; Column 0; Classified; Pg. 16', 'LENGTH: 176 words', 'LOAD-DATE: March 16, 2016', 'Copyright 2016 The New York Times Company']] 

Please help me solve this problem. Thanks.

+0

Check Right leg's solution. It works with some modifications. See my code in the comments on his answer. – MYGz

+0

Check my solution and see whether it works for you. – MYGz
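
A note on the attempt in the question, as a minimal sketch: str(textvalues) turns the whole nested list into one string, and searching any string for the empty string '' succeeds at position 0, so the slice taken up to that index is empty. The shortened textvalues below is a stand-in for the sample data, and the lowercase variable names are only illustrative.

```python
# Shortened stand-in for the sample data in the question.
textvalues = [['1 of 2 DOCUMENTS', 'LENGTH: 176 words', '',
               '2 of 2 DOCUMENTS', 'LENGTH: 176 words']]

main_text = str(textvalues)          # one flat string; the list structure is lost
split_index = main_text.index('')    # the empty string matches at position 0
prefix = main_text[:split_index]     # therefore an empty slice

print(split_index)   # 0
print(repr(prefix))  # ''

# Searching the inner list itself finds the '' element instead.
print(textvalues[0].index(''))  # 2
```

Working on the inner list textvalues[0] rather than on its string form is what the answers below do.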

Answers

1
import itertools 

textvalues=[['1 of 2 DOCUMENTS', 'The New York Times', 'March 17, 2016 Thursday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section A; Column 0; Classified; Pg. 19', 'LENGTH: 176 words', 'LOAD-DATE: March 17, 2016', 'Copyright 2016 The New York Times Company', '', '2 of 2 DOCUMENTS', 'The New York Times', 'March 16, 2016 Wednesday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section B; Column 0; Classified; Pg. 16', 'LENGTH: 176 words', 'LOAD-DATE: March 16, 2016', 'Copyright 2016 The New York Times Company']] 
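groups = [] 
# groupby yields runs of consecutive items that share the same key:
# True for non-empty strings, False for the '' separator.
for is_content, run in itertools.groupby(textvalues[0], lambda x: x != ''): 
    if is_content: 
        groups.append(list(run)) 
print(groups) 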

Output:

[['1 of 2 DOCUMENTS', 'The New York Times', 'March 17, 2016 Thursday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section A; Column 0; Classified; Pg. 19', 'LENGTH: 176 words', 'LOAD-DATE: March 17, 2016', 'Copyright 2016 The New York Times Company'], ['2 of 2 DOCUMENTS', 'The New York Times', 'March 16, 2016 Wednesday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section B; Column 0; Classified; Pg. 16', 'LENGTH: 176 words', 'LOAD-DATE: March 16, 2016', 'Copyright 2016 The New York Times Company']] 
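
Since the main list is dynamic, the same groupby idea can be wrapped in a small helper. This is only a sketch: the name split_on_separator and its sep parameter are hypothetical, not part of the original answer.

```python
import itertools

def split_on_separator(items, sep=''):
    """Split a flat list into sublists at every element equal to sep."""
    return [list(run)
            for is_content, run in itertools.groupby(items, lambda x: x != sep)
            if is_content]

sample = ['a', 'b', '', 'c', '', '', 'd']
print(split_on_separator(sample))  # [['a', 'b'], ['c'], ['d']]
```

Because groupby merges consecutive separators into a single run, repeated '' elements do not produce empty sublists.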
+0

Nice solution. Very tricky. Thanks for sharing it. –

0

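Basically, you can iterate over the contents, store the strings in a buffer, and dump the buffer into the main list whenever the '' separator comes up: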

result = list() 
line = list() 
for element in textvalues[0]: 
    if element != '': 
        line.append(element) 
    else: 
        result.append(line) 
        line = list() 
+0

Fix your solution. Check and edit your answer:

textvalues = [['asd','','asd d','','c as d','','asd f','','lskd']] 
result = [] 
line = [] 
for element in textvalues[0]: 
    if element != '': 
        line.append(element) 
    else: 
        result.append(line) 
        line = [] 
else: 
    result.append(line) 
print(result) 

– MYGz

+0

Output of the above code: '[['asd'],['asd d'],['c as d'],['asd f'],['lskd']]' – MYGz

+0

It raises an error because there are multiple else clauses. – Mho
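
The loop in this answer never appends the last group, because no '' follows it; that is the point the comments above raise. A minimal sketch of the buffer approach with a final flush after the loop (the function name split_with_buffer is hypothetical):

```python
def split_with_buffer(items, sep=''):
    """Collect elements in a buffer and flush it at every separator."""
    result = []
    line = []
    for element in items:
        if element != sep:
            line.append(element)
        else:
            result.append(line)
            line = []
    # Flush whatever is left after the last separator.
    result.append(line)
    return result

sample = ['asd', '', 'asd d', '', 'c as d', '', 'asd f', '', 'lskd']
print(split_with_buffer(sample))
# [['asd'], ['asd d'], ['c as d'], ['asd f'], ['lskd']]
```

Unlike the groupby version, this variant produces an empty sublist for consecutive or trailing separators, which may or may not be what is wanted.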

0
textvalues=[['1 of 2 DOCUMENTS', 'The New York Times', 'March 17, 2016 Thursday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section A; Column 0; Classified; Pg. 19', 'LENGTH: 176 words', 'LOAD-DATE: March 17, 2016', 'Copyright 2016 The New York Times Company', '', '2 of 2 DOCUMENTS', 'The New York Times', 'March 16, 2016 Wednesday\xa0\xa0Late Edition - Final', 'Paid Notice: Deaths THORNTON, ROBERT', 'SECTION: Section B; Column 0; Classified; Pg. 16', 'LENGTH: 176 words', 'LOAD-DATE: March 16, 2016', 'Copyright 2016 The New York Times Company']] 

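textvalues2 = [] 

# Join with a separator assumed not to occur in the data; a plain ','
# would also split elements such as 'March 17, 2016' that contain commas.
sep = '\x00' 
for chunk in sep.join(textvalues[0]).split(sep + sep): 
    textvalues2.append(chunk.split(sep)) 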
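
A note on the choice of separator for the join-and-split approach: any separator that also occurs inside an element cuts that element apart. The short list below is a made-up stand-in for the sample data, and '\x00' is only assumed to be absent from the real input.

```python
items = ['March 17, 2016', 'LENGTH: 176 words', '', 'March 16, 2016']

# Joining on ',' and re-splitting also cuts inside 'March 17, 2016'.
with_comma = [chunk.split(',') for chunk in ','.join(items).split(',,')]
print(with_comma)
# [['March 17', ' 2016', 'LENGTH: 176 words'], ['March 16', ' 2016']]

# A separator assumed absent from the data keeps the elements intact.
sep = '\x00'
with_nul = [chunk.split(sep) for chunk in sep.join(items).split(sep + sep)]
print(with_nul)
# [['March 17, 2016', 'LENGTH: 176 words'], ['March 16, 2016']]
```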