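Data import delimiter problem in R

I am trying to import a text file into R and put it into a data frame along with other data.

My delimiter is "|", and a sample of my data is here: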

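|Painless check-in. Two legs of 3 on AC: AC105, YYZ-YVR. Roomy and clean A321 with fantastic crew. AC33: YVR-SYD, very light load and had 3 seats to myself. As usual, a very enthusiastic and friendly crew on this transpacific route, which I take several times a year. Arrived 20 minutes early. The expected high level of service from our flag carrier, Air Canada. Altitude Elite member. |We recently returned from Dublin to Toronto, then on to Winnipeg. Other than cutting it close due to limited staffing in Toronto, our flight was excellent. Because of the rush in Toronto, one of our carry-ons was put in the cargo hold. When we arrived in Winnipeg, it had stayed in Toronto. The staff at Winnipeg airport were most helpful and kind; we received 3 phone calls the next day about the misplaced bag, and it was delivered to our home. We are very thankful and appreciative of the service we received; it was a perfect end to a wonderful holiday. |Flew Toronto to Heathrow. Much worse flight than on the way out. We paid a hefty extra fee for exit seats which had no storage at all, and not even any room under the seats. Ridiculous. The crew were poor and unfriendly. One older male staff member had quite an attitude, as though he was doing everyone a favour by serving them. A reasonable dinner, but breakfast was a piece of banana bread. That was it! The worst airline breakfast I have had.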

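As you can see, there are many "|" characters, but as the screenshot shows, when I import the data into R it is split only once instead of about 152 times.

How can I get each separate piece of text into a different column of the data frame? I want a data frame of length 152, not 2.

Edit: the lines of code are: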

myData <- read.table("C:/Users/Norbert/Desktop/research/Important files/Airline Reviews/Reviews/air_can_Review.txt", sep="|",quote=NULL, comment='',fill = TRUE, header=FALSE) 

length(myData) 
[1] 2 
class(myData) 
[1] "data.frame" 
str(myData) 
'data.frame': 1244 obs. of 2 variables: 
$ V1: Factor w/ 1093 levels "","'delayed' on departure (I reference flights between March 2014 and January 2015 in this regard: Denver, SFO,",..: 210 367 698 853 1 344 483 87 757 52 ... 
$ V2: Factor w/ 154 levels ""," hotel","5/9/2014, LHR to Vancouver, AC855. 23/9/2014, Vancouver to LHR, AC854. For Economy the leg room was OK compared to",..: 1 1 1 1 78 1 1 1 1 1 ... 

myDataFrame <- data.frame(text = myData, otherVar2 = 1, otherVar2 = "blue", stringsAsFactors = FALSE) 
str(myDataFrame) 
'data.frame': 531 obs. of 3 variables: 
$ text  : chr "BRU-YUL, May 26th, A330-300. Departed on-time, landed 30 minutes late due to strong winds, nice flight, food" "excellent, cabin-crew smiling and attentive except for one old lady throwing meal trays like boomerangs. Seat-" "pitch was very generous, comfortable seat, IFE a bit outdated but selection was Okay. Air Canadas problem is\nthat the new pro"| __truncated__ "" ...
$ otherVar2 : num 1 1 1 1 1 1 1 1 1 1 ... 
$ otherVar2.1: chr "blue" "blue" "blue" "blue" ... 

length(myDataFrame) 
[1] 3

Source

2015-06-01 Uther Pendragon

Take a look [here](http://stackoverflow.com/questions/24679042/problems-with-reading-a-txt-file-eof-within-quoted-string). You may need to add two arguments to read.table(): `quote = NULL, comment = ''` – Parfait

@Parfait It worked in that the warning message disappeared, but the length of the data frame is still 2 when it should be 152 –

What does `str(myData)` output? – Parfait

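A better way to read in the text is to use scan(), and then put it into a data frame with your other variables (here I just made some up). Note that I pasted your text above into a file called sample.txt, after removing the initial "|".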

myData <- scan("sample.txt", what = "character", sep = "|") 
myDataFrame <- data.frame(text = myData, otherVar2 = 1, otherVar2 = "blue", 
          stringsAsFactors = FALSE) 
str(myDataFrame) 
## 'data.frame': 3 obs. of 3 variables: 
## $ text  : chr "Painless check-in. Two legs of 3 on AC: AC105, YYZ-YVR. Roomy and clean A321 with fantastic crew. AC33: YVR-SYD, very light loa"| __truncated__ "We recently returned from Dublin to Toronto, then on to Winnipeg. Other than cutting it close due to limited staffing in Toront"| __truncated__ "Flew Toronto to Heathrow. Much worse flight than on the way out. We paid a hefty extra fee for exit seats which had no storage "| __truncated__ 
## $ otherVar2 : num 1 1 1 
## $ otherVar2.1: Factor w/ 1 level "blue": 1 1 1

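otherVar1 and otherVar2 are just placeholders for your own variables, since you said you wanted a data.frame with other variables. (In the code above both are named otherVar2, so data.frame() renames the second one otherVar2.1.) I chose a numeric variable and a text variable, and because each is given as a single value, it is recycled for all observations in the dataset (in this case, 3).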
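One caveat worth sketching: scan() ends a field at every line break as well as at sep, so a review that wraps over several lines in the file would come back as several entries. The str() output in the question suggests the real file is wrapped this way. A minimal sketch that joins the lines first and then splits on the literal "|" follows; the temporary file, its shortened contents, and the names reviews_file, raw_text, reviews and reviews_df are assumptions made up for illustration.

```r
# Hypothetical sample file: reviews separated by "|" and wrapped over lines
reviews_file <- tempfile(fileext = ".txt")
writeLines(c("|Painless check-in. Roomy and clean A321",
             "with fantastic crew. |Flew Toronto to Heathrow.",
             "Much worse flight than on the way out. |That's it!"),
           reviews_file)

# Join the physical lines into one string, then split on the literal "|"
raw_text <- paste(readLines(reviews_file), collapse = " ")
reviews <- strsplit(raw_text, "|", fixed = TRUE)[[1]]

# Trim surrounding whitespace and drop the empty piece before the leading "|"
reviews <- trimws(reviews)
reviews <- reviews[nzchar(reviews)]

# One review per row; the single values 1 and "blue" are recycled
reviews_df <- data.frame(text = reviews, otherVar1 = 1, otherVar2 = "blue",
                         stringsAsFactors = FALSE)
nrow(reviews_df)
```

Because strsplit() is called with fixed = TRUE, the "|" is treated as a plain character and not as a regular-expression alternation, and apostrophes in the text need no special quote handling.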

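I realize your question asks how to get each text into a separate column, but that is not a good way to use a data.frame, because data.frames are designed to hold variables in columns. (With one text per column, you cannot add other variables.)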
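To illustrate the point about keeping variables in columns, here is a minimal sketch with one review per row. The three short strings and the column names id, n_chars and airline are made up for illustration and are not part of the original data.

```r
# Placeholder reviews standing in for the scanned text
reviews <- c("Painless check-in.",
             "Our flight was excellent.",
             "Crew were poor, not friendly.")

# One row per review, with an identifier column
reviews_long <- data.frame(id = seq_along(reviews), text = reviews,
                           stringsAsFactors = FALSE)

# Per-review variables are then plain column assignments
reviews_long$n_chars <- nchar(reviews_long$text)
reviews_long$airline <- "Air Canada"  # a single value is recycled to every row

str(reviews_long)
```

With this layout, 152 reviews would give 152 rows, and each new variable adds one column.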

If you really want to do this, you have to coerce the data back to a data.frame after transposing it, as follows:

myDataFrame <- as.data.frame(t(data.frame(text = myData, stringsAsFactors = FALSE)), stringsAsFactors = FALSE) 
str(myDataFrame) 
## 'data.frame': 1 obs. of 3 variables: 
## $ V1: chr "Painless check-in. Two legs of 3 on AC: AC105, YYZ-YVR. Roomy and clean A321 with fantastic crew. AC33: YVR-SYD, very light loa"| __truncated__ 
## $ V2: chr "We recently returned from Dublin to Toronto, then on to Winnipeg. Other than cutting it close due to limited staffing in Toront"| __truncated__ 
## $ V3: chr "Flew Toronto to Heathrow. Much worse flight than on the way out. We paid a hefty extra fee for exit seats which had no storage "| __truncated__ 
length(myDataFrame) 
## [1] 3
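If one column per review really is the goal, a possible alternative to transposing is to build a one-row character matrix first and convert that. This is only a sketch; the three strings are placeholders for the scanned reviews, and the name wide is made up.

```r
# Placeholder reviews standing in for the scanned text
reviews <- c("first review", "second review", "third review")

# A 1 x n character matrix becomes a data frame with columns V1, V2, ...
wide <- as.data.frame(matrix(reviews, nrow = 1), stringsAsFactors = FALSE)

length(wide)  # number of columns, one per review
names(wide)
```

The same limitation applies as before: with one review per column there is no natural place for other per-review variables.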

"Pitiful banana bread"? Definitely economy class.

Source

2015-06-01 16:11:58

What does otherVar2 represent in the line you coded? What is it supposed to represent? @Ken Benoit –

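I think you misunderstood what I am really trying to accomplish. I am trying to put each review in a different column, but your code puts all the text in one column... I already know how to do that, but I want to split the text at the delimiter and put the next review into a new column... I have edited the code and output in the question. @Ken Benoit –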

No, I understood; I was trying to gently suggest that you should not use a data.frame like this. See the edit. –
