How do I read a CSV word by word in R?

I want to import CSV data into R. The data is a single line of comma-separated entries. A dummy version of it is given below:

Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD

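This is a single-line CSV. How can I read it correctly? This is only a dummy dataset; the dataset I was given has 35 variables and 10000 observations. Can anyone provide the right logic and the corresponding code?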

Edit: the desired output is:

Id SecoId TertioID CreateDate Lat  Long Duration Istrue  JournalDate   Post 
3232 123  345 30/04/14 2:00 11.726 11.728 5  FALSE 02/04/2014 05:02 +01:00 ABC 
3233 124  346 30/04/14 3:00 11.789 11.779 6  TRUE 03/04/2014 06:00 +01:00 BCD 

The logic I have in mind:

1. Count the number of variables in the dataset.
2. Read the file word by word.
3. Store each value between "," in a cell of the table without altering the spaces inside a value, i.e. the CreateDate value "30/04/14 2:00" is kept as a single value.
4. The loop runs until the last variable is encountered; when the loop ends, a new row is created and the next observation is stored from there.
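The four steps above can be sketched in vectorised form rather than as an explicit loop. This is only an illustration under stated assumptions, not a definitive implementation: it assumes header names and first-column values contain no spaces, and that no value contains a comma. The names `inp`, `tokens`, `fused` and `split_last_space` are made up for this example.

```r
# Dummy single-line input from the question.
inp <- "Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD"

# Steps 2-3: split on commas only, so spaces inside a value are preserved.
tokens <- strsplit(inp, ",", fixed = TRUE)[[1]]

# Step 1: count the variables. Assumption: header names contain no spaces,
# so the first token with a space is the last header fused with the first
# value of the next row ("Post 3232").
k <- which(grepl(" ", tokens, fixed = TRUE))[1]

# Every (k - 1)-th token from position k onward is such a fused token.
fused <- seq(k, length(tokens) - 1, by = k - 1)

# Split a fused token at its last space (hypothetical helper).
split_last_space <- function(tok) {
  c(sub(" +[^ ]*$", "", tok), sub("^.* ", "", tok))
}
pieces <- as.list(tokens)
pieces[fused] <- lapply(tokens[fused], split_last_space)
fields <- unlist(pieces)

# Step 4: start a new row after every k fields.
mat <- matrix(fields, ncol = k, byrow = TRUE)
df <- as.data.frame(mat[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- mat[1, ]

# Convert column types where possible (numbers, logicals).
df[] <- lapply(df, type.convert, as.is = TRUE)
```

Splitting on commas alone is not enough here, because the row boundary is a space rather than a comma. That is why the fused tokens have to be split separately.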

However, I have not managed to write working code for this.

Can anyone help me with reading the file word by word in R?

Source

2016-01-24 desmond.carros

You should be able to get what you want by reading line by line, which is the default behavior of 'read.csv()'.

Please read the edit, @TimBiegeleisen.

Try this:

# Read in data
vec <- "Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD"

# Put in delimiters where the line breaks should have been and split the data into one string per line.
# Heuristic: a line break belongs wherever a letter is followed by a space and a digit (e.g. "Post 3232").
vec <- unlist(strsplit(gsub("([A-Za-z]) (\\d)", "\\1;\\2", vec), ";"))

# Split each line into its columns
list.split <- strsplit(vec, ",")

# Write out the data to a character matrix, using the first line as column names
mat.out <- matrix(unlist(list.split), ncol = length(list.split[[1]]), nrow = length(list.split), byrow = TRUE)
colnames(mat.out) <- mat.out[1, ]
mat.out <- mat.out[-1, , drop = FALSE]

Source

2016-01-24 12:01:09 JackeJR

Thanks @JackeJR, but I have a CSV file with 10000 observations.

You can use 'readLines' to read the whole file into a vector. – JackeJR

@JackeJR That is my feeling too; I don't think the OP really needs to read one word at a time.
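To make the readLines suggestion concrete, here is a minimal sketch that reads the single-line file from disk and inserts the missing line breaks before parsing. It is not a general solution. The temporary file stands in for the real CSV, and the names `path`, `inp` and `with_breaks` are made up for this example. It relies on the same heuristic as the answer above: the last column ends in a letter and the first column starts with a digit.

```r
# Stand-in for the real file: write the dummy line to a temporary CSV.
path <- tempfile(fileext = ".csv")
writeLines("Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD", path)

# Read the whole file; collapse in case it spans more than one physical line.
inp <- paste(readLines(path, warn = FALSE), collapse = " ")

# Heuristic (assumption): a record boundary is a letter, a space, then a digit.
with_breaks <- gsub("([A-Za-z]) (\\d)", "\\1\n\\2", inp)

# Let read.csv do the field splitting and type conversion.
df <- read.csv(text = with_breaks, strip.white = TRUE, as.is = TRUE)

unlink(path)  # remove the temporary file
```

With 10000 observations the same calls apply unchanged, since gsub and read.csv process the whole string at once. If the real columns do not fit the letter-then-digit pattern, the regular expression has to be adapted.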

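If inp is the single input line, calculate the number of fields, k, and from that build a pattern, pat, that matches one record of k fields. Use gsub to insert a newline after each pattern match, and finally read the result with read.csv: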

k <- length(read.table(text = inp, comment.char = " ", sep = ",")) # number of fields (everything after the first space is ignored)
pat <- sprintf("((.*?,){%d}.*? +)", k - 1) # pattern to match one record of k fields plus trailing spaces
read.csv(text = gsub(pat, "\\1\n", inp), strip.white = TRUE, as.is = TRUE)

If inp is the input line at the end of the question, the code above outputs this data frame:

PulseId JourneyId TransmissionId CreateDate  Lat  Long Speed 
1 367515  3237    1 30/04/14 4:02 51.53749 -3.590589  7 
2 3657521  3237    1 30/04/14 4:02 51.53704 -3.589859 11 
3 3657522  3237    1 30/04/14 4:02 51.53695 -3.589748 12 
    Heading HAccuracy Altitude VAccuracy DDuration DDistance DHeading RSL 
1  129  15  98   0   1 8.639347  1292 22.4 
2  141  10  99   0   1 11.811534  1 22.4 
3  144  10  100   0   1 12.805132  3 22.4 
    RSLRoadTypeId RSLValidation RSLCountryId PulseTypeId IsNightTime Congestion 
1    2    1   826   2  FALSE   0 
2    5    1   826   2  FALSE   0 
3    2    1   826   2  FALSE   0 
    Idle AccelBrake Cornering IsNearRailway IsSpeedValid Familiar IntLat3 
1 0 0.2038734 1.60655912   FALSE   TRUE  1 51537 
2 0 0.0000000 0.01957049   FALSE   TRUE  1 51537 
3 0 0.1019367 0.06404887   FALSE   TRUE  1 51537 
    IntLong3    LocalDateTime Smoothness PhoneId  PolicyId 
1 -3591 30/04/2014 05:02:45 +01:00   2  43 4663627m000010 
2 -3590 30/04/2014 05:02:51 +01:00   0  43 4663627m000010 
3 -3590 30/04/2014 05:02:52 +01:00   1  43 4663627m000010 
          DevideId  DNA 
1 829eba198fa483a49f14b66b8f1dadb5 0.04444444 
2 829eba198fa483a49f14b66b8f1dadb5 0.04444444 
3 829eba198fa483a49f14b66b8f1dadb5 0.04444444
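
For comparison, the same pattern-based approach can be applied to the shorter dummy line from the question. This sketch assumes the header names contain no spaces, so that everything after the first space can be treated as a comment when counting fields. The variable `broken` is introduced here only to make the intermediate text visible.

```r
# Dummy single-line input from the question.
inp <- "Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD"

# Count the fields in the header: with comment.char = " ", read.table
# ignores everything after the first space, leaving only the header row.
k <- length(read.table(text = inp, comment.char = " ", sep = ","))

# One record = (k - 1) comma-terminated fields, a last field, then spaces.
pat <- sprintf("((.*?,){%d}.*? +)", k - 1)

# Insert a newline after every complete record.
broken <- gsub(pat, "\\1\n", inp)

# Parse the repaired text; strip.white drops the trailing spaces.
df <- read.csv(text = broken, strip.white = TRUE, as.is = TRUE)
str(df)
```

Because the pattern counts commas instead of looking at characters, it does not depend on what the first and last columns contain. It does assume that no value contains a comma and that the first value of each record contains no space.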

Source

2016-01-24 20:14:17
