2016-11-07 68 views
-4

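Python - Reading a txt file without formatted columns

I have a .txt file that I need to read into Python, but its formatting prevents me from using a simple pandas.read_csv call. The .txt file looks like this: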

"giver_username_if_known", "N/A" 

"in_test_set", false 

"number_of_downvotes_of_request_at_retrieval", 2 

"number_of_upvotes_of_request_at_retrieval", 6 

"post_was_edited", false 

"request_id", "t3_w5491" 

"request_number_of_comments_at_retrieval", 7 

"request_text", "I'm not in College, or a starving artist or anything like that. I've just been a bit unlucky lately." 

"request_text_edit_aware", "I'm not in College, or a starving artist or anything like that. I've just been a bit unlucky lately. I'm a 36 year old single guy with a job. But rent, and other bills killed me this month." 

"request_title", "[Request] Ontario, Canada - On my 3rd of 5 days without food, and it's getting unbearable. Can anyone help?" 

"requester_account_age_in_days_at_request", 14.416875 

"requester_account_age_in_days_at_retrieval", 531.9697222222222 

"requester_days_since_first_post_on_raop_at_request", 0.0 

"requester_days_since_first_post_on_raop_at_retrieval", 517.5111805555556 

"requester_number_of_comments_at_request", 8 

"requester_number_of_comments_at_retrieval", 93 

"requester_number_of_comments_in_raop_at_request", 0 

"requester_number_of_comments_in_raop_at_retrieval", 4 

"requester_number_of_posts_at_request", 1 

"requester_number_of_posts_at_retrieval", 6 

"requester_number_of_posts_on_raop_at_request", 0 

"requester_number_of_posts_on_raop_at_retrieval", 2 

"requester_number_of_subreddits_at_request", 8 

"requester_received_pizza", true 

"requester_subreddits_at_request", { 
    "AdviceAnimals" 
    "WTF" 
    "funny" 
    "gaming" 
    "movies" 
    "technology" 
    "todayilearned" 
    "videos" 
    } 

%%%%%%%%%% 

%%%%%%%%%% 

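After each group of "%" characters comes another entry in the same format (5671 entries in total). The first string on each line is the column name, and the string/number that follows is the data entry. How can I extract the data that follows each column name?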

+0

You could use regular expressions – Olian04

+1

Where is the code you have tried? – user2728397

+0

@Rakesh_K There's no code, just common sense. That's why I left a comment rather than an answer.... – Olian04

Answers

0

I have two suggestions:

1) In your call to read_csv, add sep=','. This tells the parser what separates the columns/data on each line.

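2) Also in your call to read_csv, add comment='%'. pandas accepts only a single comment character, so one % stands in for the "%%%%%" separator; any line beginning with % is then treated as a comment and ignored.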
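
If read_csv has trouble with the multi-line { ... } block, a plain-Python parser is another option. The following is only a minimal sketch, not a tested solution for the full file. It assumes every value is a valid JSON literal (a double-quoted string, a number, true or false), that column names contain no commas, and that a "{" always ends its line. The function name parse_records, the shortened sample and the ID "t3_example" are made up for illustration.

```python
import json

# Shortened sample in the question's format; the second record is made up.
SAMPLE = '''"in_test_set", false
"request_id", "t3_w5491"
"request_title", "[Request] Ontario, Canada - Can anyone help?"
"requester_account_age_in_days_at_request", 14.416875
"requester_subreddits_at_request", {
    "AdviceAnimals"
    "WTF"
    }

%%%%%%%%%%

%%%%%%%%%%

"in_test_set", true
"request_id", "t3_example"
'''


def parse_records(text):
    """Return one dict per entry; entries are separated by lines of '%'."""
    records, current, list_key = [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if set(line) == {"%"}:
            # Separator line: close the current entry if it has any data.
            if current:
                records.append(current)
            current, list_key = {}, None
        elif list_key is not None:
            # Inside a { ... } block: collect items until the closing brace.
            if line == "}":
                list_key = None
            else:
                current[list_key].append(json.loads(line))
        else:
            # Split on the first comma only, so commas inside text survive.
            key, _, value = line.partition(",")
            key, value = json.loads(key), value.strip()
            if value == "{":
                current[key] = []
                list_key = key
            else:
                current[key] = json.loads(value)
    if current:
        records.append(current)
    return records


records = parse_records(SAMPLE)
print(records[0]["requester_subreddits_at_request"])
```

The result is a list of dicts, one per entry, which can be passed to pandas.DataFrame(records) to get one row per entry. If the real file contains values that are not valid JSON (for example, unescaped quotes inside the text), json.loads will raise an error and those lines would need special handling.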