2012-11-30

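Reading the first four columns from multiple text files into R (rows may have an unequal number of columns)

I have several text (CSV) files like the one below to read. The first four columns are the ones I am interested in, but there is a lot of junk after them. I just want to read the first four columns into R.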

enter image description here

I want the first four columns, so that the output (a CSV opened in Excel) looks like:

enter image description here

Because of SO's limits I can neither paste the whole file nor attach it. Here is a small example to work with:

type,latitude,longitude,name,link1, 
W,43.075319,-89.386145,Mirch Masala,"<just link, jjksskkls hskks > ","<just link, jjksskkls hskks > " 
W,43.07488,-89.390698,Himal Chuli Restaurant,"<just link, jjksskkls hskks > ","<just link, hskks , hsksks > " 
W,43.074887,-89.391011,Chautara Restaurant,"<just link, hskks , hsksks > ","<just link, jjksskkls hskks > " 
W,43.092866,-89.351587,Dobhan Restaurant,"<just link, jjksskkls hskks > ","<just link, jjksskkls , ssjjs hskks > " 
W,43.074746,-89.393137,State Street Cash Mart,"<just link, jjksskkls hskks > ","<just link, jjksskkls , ssjjs hskks > " 
W,43.072801,-89.395718,Dotty Dumplings Dowry,"<just link, jjksskkls , hskks > ","<just link, jjksskkls , ssjjs hskks > " 
W,43.074744,-89.393046,Dobra Tea,"<just link, jjksskkls hskks > ","<just link, jjksskkls , ssjjs hskks > " 
W,43.076372,-89.380231,Hi-Madison,"<just link, jjksskkls hskks > ","<just link, jjksskkls , ssjjs hskks > " 
W,43.019624,-89.421822,Candlewood Suites Fitchburg,"<just link, jjksskkls , ssjjs hskks > ","<just link, jjksskkls , ssjjs hskks > " 
W,43.08154,-89.524094,Holiday Inn Hotel & Suites Madison West,"<just link, jjksskkls 100 hskks > ","<just link, jjksskkls , ssjjs hskks > " 

Any idea how to read only the first four columns when importing into R?

+0

Do you want 4 rows, 4 columns, or both? –

+0

4 columns, any number of rows – jon

+0

Please clarify your question (because its last sentence says "rows") –

Answers

2

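Based on your comments on the question, your title is a bit misleading. As I understand it, the problem is that you don't know the exact number of columns your final data.frame will have.

From the ?read.table help page:

count.fields can be useful to determine problems with reading files which result in reports of incorrect record lengths.

So, let's try a different answer.

First, let's say this represents your data: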

"W",43.075319,-89.386145,"Mirch Masala","<J, K>" 
"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>" 
"W",43.074887,-89.391011,"Chautara Restaurant","<J, K>","<J, K>" 
"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>" 
"W",43.074746,-89.393137,"State Street Cash Mart","<J, K>" 
"W",43.072801,-89.395718,"Dotty Dumplings Dowry" 

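(You won't need this step if your data is already saved as a text or CSV file, but for the sake of a minimal reproducible example...)

Write these lines to a text file to simulate the read.table process: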

writeLines('"W",43.075319,-89.386145,"Mirch Masala","<J, K>" 
"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>" 
"W",43.074887,-89.391011,"Chautara Restaurant","<J, K>","<J, K>" 
"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>" 
"W",43.074746,-89.393137,"State Street Cash Mart","<J, K>" 
"W",43.072801,-89.395718,"Dotty Dumplings Dowry"', "myRaggedFile.txt") 

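This creates a "ragged" file that is problematic to read with read.table or read.csv. The trick, though, is to use count.fields to find out how many columns the file should have.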
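To make the role of count.fields concrete, here is a minimal sketch of what it reports for the sample rows. It assumes the same six rows written to a temporary file; the names path and n_fields are made up for illustration. quote is set to a double quote only so that it matches the default quoting of read.csv.

```r
# Same sample rows as above, written to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c(
  '"W",43.075319,-89.386145,"Mirch Masala","<J, K>"',
  '"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"',
  '"W",43.074887,-89.391011,"Chautara Restaurant","<J, K>","<J, K>"',
  '"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>"',
  '"W",43.074746,-89.393137,"State Street Cash Mart","<J, K>"',
  '"W",43.072801,-89.395718,"Dotty Dumplings Dowry"'
), path)

# One count per line; the comma inside a quoted "<J, K>" does not split the field
n_fields <- count.fields(path, sep = ",", quote = "\"")
n_fields
# [1] 5 7 6 8 5 4

# The widest row decides how many column names read.csv needs
max(n_fields)
# [1] 8
```

The largest of these counts is the value passed to col.names in the call that follows.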

dat <- read.csv("myRaggedFile.txt", header=FALSE, 
       col.names=1:max(count.fields("myRaggedFile.txt", sep=","))) 
dat 
#   X1       X2        X3                     X4     X5     X6     X7     X8
# 1  W 43.07532 -89.38614           Mirch Masala <J, K>
# 2  W 43.07488 -89.39070 Himal Chuli Restaurant <J, K> <J, K> <J, K>
# 3  W 43.07489 -89.39101    Chautara Restaurant <J, K> <J, K>
# 4  W 43.09287 -89.35159      Dobhan Restaurant <J, K> <J, K> <J, K> <J, K>
# 5  W 43.07475 -89.39314 State Street Cash Mart <J, K>
# 6  W 43.07280 -89.39572  Dotty Dumplings Dowry
dat <- dat[1:4] # To keep just the first four columns 
## Or, continuing with my original answer: 
## read.csv("myRaggedFile.txt", header=FALSE, 
##   col.names=1:max(count.fields("myRaggedFile.txt", sep=",")))[1:4] 
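If the extra columns should not end up in the data.frame at all, one alternative (not part of the answer above) is the "NULL" entry of colClasses described in ?read.table, which tells read.csv to skip a column while reading. A minimal sketch, assuming the same sample rows in a temporary file; path, n_fields and first4 are illustrative names, and whether this is actually faster than subsetting afterwards is not measured here.

```r
# Same sample rows, written to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c(
  '"W",43.075319,-89.386145,"Mirch Masala","<J, K>"',
  '"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"',
  '"W",43.074887,-89.391011,"Chautara Restaurant","<J, K>","<J, K>"',
  '"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>"',
  '"W",43.074746,-89.393137,"State Street Cash Mart","<J, K>"',
  '"W",43.072801,-89.395718,"Dotty Dumplings Dowry"'
), path)

# Widest row in the file
n_fields <- max(count.fields(path, sep = ",", quote = "\""))

# NA = let read.csv guess the type; "NULL" = skip the column entirely
first4 <- read.csv(path, header = FALSE,
                   col.names  = paste0("V", seq_len(n_fields)),
                   colClasses = c(rep(NA, 4), rep("NULL", n_fields - 4)))
dim(first4)
# [1] 6 4
```

The file still has to be scanned in full, and col.names must still cover the widest row, so count.fields is needed either way.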
+0

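By the way, I honestly don't know whether this is any more efficient than first reading the file and then dropping the unwanted columns within R. – A5C1D2H2I1M1N2O1R2T1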

+0

That is exactly what you are doing! If the file is not too big, this is perfectly acceptable. – flodel

+0

@flodel, that's what I thought I was doing, hence my comment. But I'm also about to go make dinner, so I'm being lazy and don't want to look into it ;) – A5C1D2H2I1M1N2O1R2T1

0
When you read in your file, use something like:

first4columns <- read.table("/file/path/filename.csv", header=TRUE, sep=",")[, c(1:4)]
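Note that read.table defaults to fill = FALSE, so the one-liner above works when every row has the same number of fields and stops with an error on ragged rows. Since the question is about several files, here is a minimal sketch that wraps the count.fields idea in a function and applies it to each file. The helper read_first_cols, the temporary directory and the two tiny files are hypothetical stand-ins, and the files are assumed to have no header line.

```r
# Hypothetical helper: keep the first n_keep columns of one (possibly ragged) file
read_first_cols <- function(path, n_keep = 4) {
  # Widest row, so that read.csv is given enough column names
  n_fields <- max(count.fields(path, sep = ",", quote = "\""), na.rm = TRUE)
  dat <- read.csv(path, header = FALSE,
                  col.names = paste0("V", seq_len(n_fields)))
  dat[seq_len(n_keep)]
}

# Two small made-up files standing in for the real ones
dir_path <- tempfile("ragged")
dir.create(dir_path)
writeLines(c(
  '"W",43.075319,-89.386145,"Mirch Masala","<J, K>"',
  '"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>"'
), file.path(dir_path, "a.csv"))
writeLines('"W",43.072801,-89.395718,"Dotty Dumplings Dowry"',
           file.path(dir_path, "b.csv"))

# Read every file and stack the results into one data.frame
files <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read_first_cols))
dim(combined)
# [1] 3 4
```

For files that do start with a header row, header = TRUE can be passed to read.csv instead; the names given in col.names then replace the ones in the file.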