Rearranging data for a pandas DataFrame?

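I receive a tab-delimited file from a server that outputs the answers to each question, grouped by respondent. I want to import the data into a pandas DataFrame in which the columns are the questions and the rows are each respondent's answers. Here is what the file looks like for one respondent: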

[2072] Anonymous 
Q-0 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.14 Student (Graduate/ Undergraduate) 
Q-1 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.00  
Q-1 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.00 1|1|1|1|4| 
Q-2 [01] Sat 25 May 2013 7:43 PM UTC +0000 1.00 1-3 
Q-3 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.50 Male 
Q-4 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.33 18-24 
Q-5 [01] Sat 25 May 2013 7:43 PM UTC +0000 1.00  
Q-6 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.00 Prefer not to answer 
Q-7 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.50 Yes 
Q-8 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.13 Bachelor's Degree 
Q-9 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.00 Other 
Q-10 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.00 Mathematics 
Q-11 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.33 High school 
Q-11 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.33 College (introductory courses) 
Q-12 [01] Sat 25 May 2013 7:43 PM UTC +0000 1.00 Professional 
Q-13 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.50 Mac OS X 
Q-14 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.25 Every week 
Q-15 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.00 A test that proves or disproves of some abstract theory about the world 
Q-16 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.00  
Q-17 [01] Sat 25 May 2013 7:43 PM UTC +0000 2.00 Yes 
Q-18 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.00  
Q-19 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.20 Timely feedback from the instructor 
Q-20 [01] Sat 25 May 2013 7:43 PM UTC +0000 0.00

There is a carriage return between each respondent's block of answers. Thanks for your help!

Source

2013-05-30 dannycab

Hmm... why the downvote, gang? This seems like a good use case that might apply to others.

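The nontrivial step is delimiting each respondent's block. How about rewriting the file so that every line is prefixed with the respondent's ID? For example, for "Anonymous" I see "2072".

import re

# Copy the file, prefixing every answer line with its respondent's ID.
respondent_id = None
with open('filename') as src, open('new_file', 'w') as dst:
    for line in src:
        # A line is either a header like "[####] Student_Name" or an answer like "Q-...".
        m = re.match(r'\[(\d+)\] .*', line)
        if m:
            # Header line: remember the ID and write nothing.
            respondent_id = m.group(1)
            continue
        if not line.strip():
            # Blank separator between respondents.
            continue
        # Answer line: write "####<TAB>Q-..." (assumes the line does not already start with a tab).
        dst.write(respondent_id + '\t' + line)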

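Then load the modified file with pandas read_csv, assigning the first two columns to the index. (They will form a MultiIndex.) Then use unstack to turn the question level of the index into columns.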
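The read_csv and unstack steps might look roughly like the sketch below. The column names, the column layout and the second respondent's rows are assumptions made up for illustration; the real file may differ. Because the sample contains repeated questions (Q-1 and Q-11 appear twice), and unstack raises an error on duplicate index entries, the sketch first joins repeated answers into one string.

```python
import io
import pandas as pd

# Hypothetical rewritten file: respondent ID prepended, tab-separated.
# The layout (question, code, timestamp, score, answer) is assumed.
data = (
    "2072\tQ-0\t[01]\tSat 25 May 2013 7:43 PM UTC +0000\t0.14\tStudent\n"
    "2072\tQ-11\t[01]\tSat 25 May 2013 7:43 PM UTC +0000\t0.33\tHigh school\n"
    "2072\tQ-11\t[01]\tSat 25 May 2013 7:43 PM UTC +0000\t0.33\tCollege (introductory courses)\n"
    "2072\tQ-3\t[01]\tSat 25 May 2013 7:43 PM UTC +0000\t0.50\tMale\n"
    "2073\tQ-0\t[01]\tSun 26 May 2013 8:10 AM UTC +0000\t0.14\tFaculty\n"
    "2073\tQ-11\t[01]\tSun 26 May 2013 8:10 AM UTC +0000\t0.33\tHigh school\n"
    "2073\tQ-3\t[01]\tSun 26 May 2013 8:10 AM UTC +0000\t0.50\tFemale\n"
)

# The first two columns become a (respondent, question) MultiIndex.
df = pd.read_csv(
    io.StringIO(data),
    sep="\t",
    header=None,
    names=["respondent", "question", "code", "timestamp", "score", "answer"],
    index_col=["respondent", "question"],
)

# Collapse repeated questions into a single string per respondent,
# since unstack cannot handle duplicate index entries.
answers = df.groupby(level=["respondent", "question"])["answer"].agg(
    lambda s: "; ".join(s.dropna().astype(str))
)

# One row per respondent, one column per question.
wide = answers.unstack("question")
print(wide)
```

With a real file, io.StringIO(data) would be replaced by the path of the rewritten file. Joining duplicates with "; " is only one option; keeping the first answer or numbering the repeats would work as well.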

(Full disclosure: I tested the regular expression, but I have not tested the whole thing.)

Source

2013-05-30 15:28:01

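Actually, if the blocks have a fixed size (e.g. 10 rows each), you could just read the file in and then use BinGrouper, I think – Jeff

Cool. I didn't know that was a thing.

Actually, it's easier to do this: ``df.groupby(df.index.to_series() // 3).sum()`` (every 3 rows); with ``BinGrouper`` you have to specify the labels directly – Jeff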
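As a small illustration of the fixed-size grouping idea in the comment above, the sketch below groups a toy frame into blocks of three rows. The frame and its "value" column are invented for the example, and it assumes a default 0..n-1 integer index. Integer division (//) is what assigns consecutive rows to the same block.

```python
import pandas as pd

# Toy data with a default integer index 0..6.
df = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6, 7]})

# Rows 0-2 get label 0, rows 3-5 get label 1, row 6 gets label 2.
block = df.index.to_series() // 3

# Aggregate each block of (up to) three rows.
sums = df.groupby(block).sum()
print(sums)
```

This only applies when every respondent has the same number of lines, which may not hold for the file in the question.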

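Here is what worked for me:

import re

with open('filename') as src, open('new_file', 'w') as dst:
    for line in src:
        # Header lines start with a bracketed ID such as "[2072]".
        m = re.match(r'\[\d+\]*', line)
        if m:
            # m.group() keeps the brackets, e.g. "[2072]".
            respondent_id = m.group()
            continue
        # Prefix every other line with the current respondent's ID.
        dst.write(str(respondent_id) + line)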
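
The two regular expressions used in this thread capture slightly different things, which matters when the ID is later used as an index. A minimal comparison on the header line from the question:

```python
import re

header = "[2072] Anonymous"

# Pattern with a capture group: group(1) is the digits only.
digits_only = re.match(r"\[(\d+)\] .*", header).group(1)

# Pattern without a capture group: group() is the whole match, brackets included.
with_brackets = re.match(r"\[\d+\]*", header).group()

print(digits_only, with_brackets)
```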

Source

2013-05-30 16:21:40 dannycab
