2017-10-20

Pattern table to Pandas DataFrame

I am using the Python "Pattern.en" package, which gives me the subject, object, and other details of a particular sentence.

However, I want to store this output in another variable or a DataFrame for further processing, and I have not been able to do so.

Any input on this would be helpful.

Sample code is given below for reference.

from pattern.en import parse 
from pattern.en import pprint 
import pandas as pd 

input = parse('I want to go to the Restaurant as I am hungry very much') 
print(input)  
I/PRP/B-NP/O want/VBP/B-VP/O to/TO/I-VP/O go/VB/I-VP/O to/TO/O/O the/DT/B-NP/O Restaurant/NNP/I-NP/O as/IN/B-PP/B-PNP I/PRP/B-NP/I-PNP am/VBP/B-VP/O hungry/JJ/B-ADJP/O very/RB/I-ADJP/O much/JJ/I-ADJP/O 

pprint(input) 

      WORD   TAG   CHUNK   ROLE   ID     PNP    LEMMA
         I   PRP   NP      -      -      -      -
      want   VBP   VP      -      -      -      -
        to   TO    VP^     -      -      -      -
        go   VB    VP^     -      -      -      -
        to   TO    -       -      -      -      -
       the   DT    NP      -      -      -      -
Restaurant   NNP   NP^     -      -      -      -
        as   IN    PP      -      -      PNP    -
         I   PRP   NP      -      -      PNP    -
        am   VBP   VP      -      -      -      -
    hungry   JJ    ADJP    -      -      -      -
      very   RB    ADJP^   -      -      -      -
      much   JJ    ADJP^   -      -      -      -

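Note the output of the print and pprint statements. I am trying to store one of them in a variable. It would be even better if I could store the output of the pprint statement in a DataFrame, since it is printed in tabular format.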

But when I try to do that, I get the following error:

df = pd.DataFrame(input) 

ValueError: DataFrame constructor not properly called!
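The error most likely arises because parse() returns a single tagged string rather than tabular data, and pd.DataFrame cannot build a frame from one string. One way to keep the print output in a variable is to split that string into rows first. This is a minimal sketch, assuming the format shown above: tokens separated by spaces, one sentence per line, and each token written as word/POS/chunk/PNP (parse() options such as relations=True or lemmata=True add more slash-separated fields, so the column list would need to change). The helper tagged_string_to_rows and the column names are hypothetical, not part of Pattern.

```python
# Sketch: split the tagged string returned by pattern.en.parse() into rows.
# Assumption: default parse() output, i.e. "word/POS/CHUNK/PNP" tokens
# separated by spaces, with one sentence per line.

COLUMNS = ["word", "tag", "chunk", "pnp"]  # illustrative names, not Pattern's


def tagged_string_to_rows(tagged, columns=COLUMNS):
    """Return one dict per token, including the index of its sentence."""
    rows = []
    for sentence_id, line in enumerate(str(tagged).splitlines()):
        for token in line.split():
            fields = token.split("/")
            if len(fields) != len(columns):
                raise ValueError(f"unexpected token format: {token!r}")
            row = dict(zip(columns, fields))
            row["sentence"] = sentence_id
            rows.append(row)
    return rows


# The string printed by print(input) in the question.
tagged = ("I/PRP/B-NP/O want/VBP/B-VP/O to/TO/I-VP/O go/VB/I-VP/O "
          "to/TO/O/O the/DT/B-NP/O Restaurant/NNP/I-NP/O as/IN/B-PP/B-PNP "
          "I/PRP/B-NP/I-PNP am/VBP/B-VP/O hungry/JJ/B-ADJP/O "
          "very/RB/I-ADJP/O much/JJ/I-ADJP/O")
rows = tagged_string_to_rows(tagged)
print(len(rows), rows[6])
```

With pandas installed, pd.DataFrame(rows) then gives one column per field. Splitting the string by hand depends on the exact output format; building the table from Pattern's own Sentence objects avoids that.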

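Seems basic. Have you read the pandas documentation? https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html Your error says that you are not calling the constructor properly, and that does seem to be the case. – Jacob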

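Thanks @Jacob. But my question is not how to fix the error I get; it is how to store the output of the pattern.en package in a variable or a DataFrame. Please let me know if you have any ideas. I hope this is not that basic a question; please reconsider if you agree. – JKC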

Answer

Adapting the source of Pattern's table function, which pprint uses to lay out its output, I came up with this:

from pattern.en import parse
from pattern.text.tree import WORD, POS, CHUNK, PNP, REL, ANCHOR, LEMMA, IOB, ROLE, MBSP, Text
import pandas as pd

def sentence2df(sentence, placeholder="-"):
    # Same tag order as Pattern's table(); any extra tags of the sentence go last.
    tags = [WORD, POS, IOB, CHUNK, ROLE, REL, PNP, ANCHOR, LEMMA]
    tags += [tag for tag in sentence.token if tag not in tags]

    def format_tag(token, tag):
        # Returns the token tag as a string (or the placeholder if it is empty).
        if tag == WORD:     s = token.string
        elif tag == POS:    s = token.type
        elif tag == IOB:    s = token.chunk and (token.index == token.chunk.start and "B" or "I")
        elif tag == CHUNK:  s = token.chunk and token.chunk.type
        elif tag == ROLE:   s = token.chunk and token.chunk.role
        elif tag == REL:    s = token.chunk and token.chunk.relation and str(token.chunk.relation)
        elif tag == PNP:    s = token.chunk and token.chunk.pnp and token.chunk.pnp.type
        elif tag == ANCHOR: s = token.chunk and token.chunk.anchor_id
        elif tag == LEMMA:  s = token.lemma
        else:               s = token.custom_tags.get(tag)
        return s or placeholder

    # One list per tag, holding that tag's value for every token.
    columns = [[format_tag(token, tag) for token in sentence] for tag in tags]
    # Append "^" to the chunk of tokens inside (not starting) a chunk, then drop the IOB column.
    columns[3] = [chunk + (" ^" if iob == "I" else "") for iob, chunk in zip(columns[2], columns[3])]
    del columns[2]
    header = ['word', 'tag', 'chunk', 'role', 'id', 'pnp', 'anchor', 'lemma'] + tags[9:]

    # Drop the anchor column unless the MBSP parser is in use.
    if not MBSP:
        del columns[6]
        del header[6]

    # Transpose the per-tag columns into one row per token.
    return pd.DataFrame(list(zip(*columns)), columns=header)
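The two less obvious steps in sentence2df can be replayed on plain lists: every token whose IOB tag is I (inside a chunk but not at its start) gets a ^ appended to its chunk tag, after which the IOB column is dropped, and the per-tag columns are then transposed into per-token rows for the DataFrame constructor. This sketch mirrors those steps without Pattern; fold_iob and columns_to_rows are hypothetical names, and the sample values are the first five tokens of the example sentence.

```python
# Sketch of the column handling in sentence2df, on plain lists (no Pattern needed).

def fold_iob(iob_column, chunk_column, marker=" ^"):
    """Append `marker` to the chunk tag of every token whose IOB tag is "I"."""
    return [chunk + (marker if iob == "I" else "")
            for iob, chunk in zip(iob_column, chunk_column)]


def columns_to_rows(columns):
    """Transpose a list of per-tag columns into a list of per-token rows."""
    return [list(row) for row in zip(*columns)]


words = ["I", "want", "to", "go", "to"]
tags = ["PRP", "VBP", "TO", "VB", "TO"]
iob = ["B", "B", "I", "I", "-"]        # "-" is the placeholder: no chunk
chunks = ["NP", "VP", "VP", "VP", "-"]

chunk_col = fold_iob(iob, chunks)      # the IOB column is not kept afterwards
rows = columns_to_rows([words, tags, chunk_col])
for row in rows:
    print(row)
```

Transposing with zip(*columns) is equivalent to indexing each column by row number, as long as all columns have the same length, which holds here because every column has one entry per token.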

Usage:

>>> string = parse('I want to go to the Restaurant as I am hungry very much') 
>>> sentence = Text(string, token=[WORD, POS, CHUNK, PNP])[0] 
>>> df = sentence2df(sentence) 
>>> print(df) 
          word  tag  chunk  role  id  pnp  lemma
0            I  PRP     NP     -   -    -      -
1         want  VBP     VP     -   -    -      -
2           to   TO    VP^     -   -    -      -
3           go   VB    VP^     -   -    -      -
4           to   TO      -     -   -    -      -
5          the   DT     NP     -   -    -      -
6   Restaurant  NNP    NP^     -   -    -      -
7           as   IN     PP     -   -  PNP      -
8            I  PRP     NP     -   -  PNP      -
9           am  VBP     VP     -   -    -      -
10      hungry   JJ   ADJP     -   -    -      -
11        very   RB  ADJP^     -   -    -      -
12        much   JJ  ADJP^     -   -    -      -
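Once the table is stored in df, it can be processed like any other data. As one example, a trailing ^ in the chunk column marks a token that continues the previous chunk, so the words can be regrouped into phrases. This sketch works on plain (word, chunk) pairs, such as list(zip(df["word"], df["chunk"])), so it runs without Pattern or pandas; group_chunks is a hypothetical helper, the pairs are copied from the output above, and the marker is accepted with or without a leading space.

```python
# Sketch: regroup (word, chunk) pairs into phrases using the "^" continuation marker.

def group_chunks(pairs):
    """Return (chunk_type, phrase) tuples.

    A chunk tag ending in "^" continues the previous chunk of the same type;
    any other tag (including the "-" placeholder) starts a new group.
    """
    groups = []
    for word, chunk in pairs:
        continues = chunk.endswith("^")
        chunk_type = chunk.rstrip("^").strip()
        if continues and groups and groups[-1][0] == chunk_type:
            groups[-1][1].append(word)
        else:
            groups.append((chunk_type, [word]))
    return [(chunk_type, " ".join(words)) for chunk_type, words in groups]


pairs = [("I", "NP"), ("want", "VP"), ("to", "VP^"), ("go", "VP^"),
         ("to", "-"), ("the", "NP"), ("Restaurant", "NP^"), ("as", "PP"),
         ("I", "NP"), ("am", "VP"), ("hungry", "ADJP"), ("very", "ADJP^"),
         ("much", "ADJP^")]

noun_phrases = [phrase for chunk_type, phrase in group_chunks(pairs)
                if chunk_type == "NP"]
print(noun_phrases)  # ['I', 'the Restaurant', 'I']
```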

Wow, please remove the downvote. Awesome. You are great, @pacholik – JKC