2016-01-24

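How to read a CSV word by word in R?

I want to import CSV data into R. The file is a single line of data with comma-separated entries. Dummy data is given below: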

Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD 

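This is a single-line CSV. How do I read it correctly? This is only a dummy dataset; the dataset given to me has 35 variables and 10,000 observations. Can anyone provide the correct logic and the relevant code?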

Edit: The desired output is:

Id   SecoId TertioID CreateDate    Lat    Long   Duration Istrue JournalDate             Post
3232 123    345      30/04/14 2:00 11.726 11.728 5        FALSE  02/04/2014 05:02 +01:00 ABC
3233 124    346      30/04/14 3:00 11.789 11.779 6        TRUE   03/04/2014 06:00 +01:00 BCD

The logic I had in mind:

1. Count the number of variables in the dataset.
2. Read the file word by word.
3. Store each value found between two commas in a cell of the table, without altering the spaces inside a value, so that a CreateDate value such as "30/04/14 2:00" is kept as a single value.
4. Run the loop until the last variable is reached; when the loop ends, start a new row and store the next observation there.

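However, I have not managed to write working code for this.

If the file has to be read word by word in R, can anyone help me with the relevant code?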
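
The four-step logic described in the question can be sketched roughly as follows. This is only an illustration, not code from the thread: `split_one_line_csv` is a hypothetical helper name, and the sketch assumes that the header names contain no spaces, that the first field of every row (here `Id`) contains no spaces, and that the last field of a row is never empty.

```r
# Hypothetical helper (not from the original post): rebuild the rows of a
# one-line CSV by counting fields instead of looking at the field contents.
split_one_line_csv <- function(inp) {
  inp <- trimws(inp)

  # Step 1: count the variables. Assumption: the header ends at the first space.
  header <- sub(" .*$", "", inp)
  k <- length(strsplit(header, ",", fixed = TRUE)[[1]])

  # Steps 2-3: split on commas only, so spaces inside a value are preserved.
  tokens <- strsplit(inp, ",", fixed = TRUE)[[1]]
  stopifnot((length(tokens) - 1) %% (k - 1) == 0)
  n_rows <- (length(tokens) - 1) / (k - 1)  # includes the header row

  # Step 4: every (k - 1)-th token after the first holds
  # "<last field> <first field of next row>"; break it at its last space.
  boundary <- seq_len(n_rows - 1) * (k - 1) + 1
  tokens[boundary] <- sub(" ([^ ]*)$", "\n\\1", tokens[boundary])

  # Re-join the tokens and let read.csv parse the multi-line text.
  read.csv(text = paste(tokens, collapse = ","), strip.white = TRUE, as.is = TRUE)
}

inp <- "Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD"
df <- split_one_line_csv(inp)
str(df)
```

Because the row boundaries are found by position rather than by what the values look like, this variant does not care whether a row ends in a letter or a digit.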

+0

You should be able to get what you want by reading the file line by line, which is the default behaviour of 'read.csv()'. –

+0

Please read the edit, @TimBiegeleisen –

Answers

1

Try this:

# Read in the data (here the one-line CSV is given as a string)
vec <- "Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD"

# Put a ";" where each line break should have been (a letter, a space, then a digit) and split into rows.
# This relies on every row ending in a letter and starting with a digit, and on the data containing no ";".
vec <- unlist(strsplit(gsub("([A-Za-z]) (\\d)", "\\1;\\2", vec), ";"))

# Split each row into its fields
list.split <- strsplit(vec, ",")

# Write the data out to a character matrix, one row per record
mat.out <- matrix(unlist(list.split), ncol = length(list.split[[1]]), nrow = length(list.split), byrow = TRUE)
# Use the first row as the column names, then drop it
colnames(mat.out) <- mat.out[1, ]
mat.out <- mat.out[-1, , drop = FALSE]

+0

Thanks @JackeJR, but I have a CSV file with 10,000 observations. –

+2

You can use 'readLines' to read the whole file into a vector. – JackeJR

+0

@JackeJR That is my feeling too; I don't think the OP really needs to read one word at a time. –
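
The `readLines` suggestion from the comments could look roughly like the sketch below. The temporary file is only a stand-in for the real 10,000-row file, whose path is not given in the thread, and the letter-space-digit rule is the same data-specific heuristic used in the answer above, so it is an assumption about the data rather than a general solution.

```r
# Stand-in for the real file (assumption): write the dummy line to a temp file.
path <- tempfile(fileext = ".csv")
writeLines("Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD", path)

# Read the whole file; collapse in case it spans more than one physical line.
vec <- paste(readLines(path, warn = FALSE), collapse = " ")

# Insert a newline wherever a letter is followed by a space and a digit,
# i.e. where a row ends in a letter and the next row starts with a digit.
txt <- gsub("([A-Za-z]) (\\d)", "\\1\n\\2", vec)

# Let read.csv parse the multi-line text and guess the column types.
df <- read.csv(text = txt, strip.white = TRUE, as.is = TRUE)
str(df)
```

Unlike the matrix version, this returns a data frame with typed columns. If the last column of the real data is not alphabetic, or the first column is not numeric, the pattern would need to be adapted.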

2

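If inp is the single input line, count the number of fields, k, and from that build a pattern pat that matches k fields followed by whitespace. Use gsub to insert a newline after each match of the pattern, and finally read the result with read.csv: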

k <- length(read.table(text = inp, comment = " ", sep = ",")) # no of fields 
pat <- sprintf("((.*?,){%d}.*? +)", k-1) # pattern to match k fields 
read.csv(text = gsub(pat, "\\1\n", inp), strip.white = TRUE, as.is = TRUE) 

If inp is the input line given at the end of the question, the code above outputs this data frame:

  PulseId JourneyId TransmissionId    CreateDate      Lat      Long Speed
1  367515      3237              1 30/04/14 4:02 51.53749 -3.590589     7
2 3657521      3237              1 30/04/14 4:02 51.53704 -3.589859    11
3 3657522      3237              1 30/04/14 4:02 51.53695 -3.589748    12
  Heading HAccuracy Altitude VAccuracy DDuration DDistance DHeading  RSL
1     129        15       98         0         1  8.639347     1292 22.4
2     141        10       99         0         1 11.811534        1 22.4
3     144        10      100         0         1 12.805132        3 22.4
  RSLRoadTypeId RSLValidation RSLCountryId PulseTypeId IsNightTime Congestion
1             2             1          826           2       FALSE          0
2             5             1          826           2       FALSE          0
3             2             1          826           2       FALSE          0
  Idle AccelBrake  Cornering IsNearRailway IsSpeedValid Familiar IntLat3
1    0  0.2038734 1.60655912         FALSE         TRUE        1   51537
2    0  0.0000000 0.01957049         FALSE         TRUE        1   51537
3    0  0.1019367 0.06404887         FALSE         TRUE        1   51537
  IntLong3              LocalDateTime Smoothness PhoneId       PolicyId
1    -3591 30/04/2014 05:02:45 +01:00          2      43 4663627m000010
2    -3590 30/04/2014 05:02:51 +01:00          0      43 4663627m000010
3    -3590 30/04/2014 05:02:52 +01:00          1      43 4663627m000010
                          DevideId        DNA
1 829eba198fa483a49f14b66b8f1dadb5 0.04444444
2 829eba198fa483a49f14b66b8f1dadb5 0.04444444
3 829eba198fa483a49f14b66b8f1dadb5 0.04444444
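
To make the pattern concrete, here is the same approach applied to the shorter dummy line from the top of the question, with the intermediate values shown. This is a sketch under the assumption that the header names contain no spaces, since the field count is taken from everything before the first space.

```r
inp <- "Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD"

# Treating " " as the comment character makes read.table stop at the first
# space, so only the header fields are counted.
k <- length(read.table(text = inp, comment.char = " ", sep = ","))
k    # 10

# k - 1 comma-terminated fields, then one more field ending in whitespace.
pat <- sprintf("((.*?,){%d}.*? +)", k - 1)
pat  # "((.*?,){9}.*? +)"

# Append a newline to every match, then parse the result as an ordinary CSV.
df <- read.csv(text = gsub(pat, "\\1\n", inp), strip.white = TRUE, as.is = TRUE)
str(df)
```

The lazy quantifiers keep each repetition to a single field, so a value such as "30/04/14 2:00" stays intact even though it contains a space.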