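I don't think I asked the right question. Am I not reading the data in correctly?
New question: I have a 1.5 GB TSV file. It has 6 lines of junk at the top and one line of junk at the bottom, and I want to remove all of them without opening the file. Line 7 is the header. There are 13 column headers. The number of rows is unknown.
How do I read the file into a data frame so that I can do basic descriptive statistics, box plots, etc.?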
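A minimal sketch, assuming the layout described above (6 junk lines, the header on line 7, one junk line at the end) and a period as the decimal mark. The small demo file written to a temporary path and the helper name `read_trimmed_tsv` are hypothetical stand-ins for the real 1.5 GB file. The idea is to read every column as text, drop the junk bottom row, and only then let `type.convert()` guess the column types, so the junk row cannot turn a numeric column into a factor. Note that this still loads the whole file into memory.

```r
# Hypothetical stand-in for the real file: 6 junk lines, a header on
# line 7, two data rows, and one junk line at the bottom.
demo_file <- tempfile(fileext = ".tsv")
writeLines(c(
  paste("junk line", 1:6),
  paste("Keyword", "Impressions", "Avg.Position", sep = "\t"),
  paste("chagall pro", "4", "3", sep = "\t"),
  paste("matisse", "16", "3.1", sep = "\t"),
  "Total junk line"
), demo_file)

# Hypothetical helper for the layout described in the question.
read_trimmed_tsv <- function(path) {
  # skip = 6 jumps over the junk, so line 7 is used as the header.
  # colClasses = "character" keeps every field as text for now.
  raw <- read.delim(path, skip = 6, header = TRUE,
                    colClasses = "character")
  # Drop the junk line at the bottom.
  raw <- raw[-nrow(raw), , drop = FALSE]
  # Re-guess each column's type now that the junk row is gone.
  raw[] <- lapply(raw, type.convert, as.is = TRUE)
  raw
}

df <- read_trimmed_tsv(demo_file)
str(df)
summary(df$Avg.Position)
```

With the numeric columns converted, calls such as `summary(df)` or `boxplot(df$Avg.Position)` operate on numbers rather than factor levels.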
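Original question:
Hi,
I have a feeling this is really easy and I'm just missing something.
I have a tab-separated txt file with 6 lines of junk at the top and a junk line at the bottom as well. Between the junk lines the data has this form:
Label1 Label2 Label3 Label4 .... Label13
text ID number percent .... number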
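Here is my R input:
datadump <- read.delim2("truncate.txt", header=TRUE, skip=6)  # skip the 6 junk lines; line 7 becomes the header (read.delim2 assumes ',' as the decimal mark)
cleandata <- datadump[-nrow(datadump), ]  # drop the junk line at the bottom
avgposition <- cleandata$Avg.Position
hist(avgposition)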
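Avg.Position is Label13 and contains numbers of the form ##.
However, I get this error: Error in hist.default(avgposition) : 'x' must be numeric
Why isn't it seeing the data as numeric?
Thanks!
By request, here is some of the data: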
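A likely explanation, going by the dput() output below: Avg.Position was read as a factor whose levels include "", a leftover from the junk bottom line (factor levels survive after the row is dropped). read.delim2 also assumes a decimal comma, so a value like "3.1" cannot be parsed as a number with it. hist() needs a numeric vector, and calling as.numeric() directly on a factor returns the internal level codes, not the labels. A minimal sketch that rebuilds that factor and converts it through as.character() first:

```r
# Rebuild the Avg.Position column as it appears in the dput() output:
# a factor whose levels include "" left over from the junk bottom line.
avgposition <- factor(c("3", "3.1"), levels = c("", "3", "3.1"))

# Wrong: as.numeric() on a factor gives the level codes 2 and 3,
# not the values 3 and 3.1.
codes <- as.numeric(avgposition)

# Right: convert the labels to character first, then to numeric.
avgposition_num <- as.numeric(as.character(avgposition))
avgposition_num
```

After this conversion, hist(avgposition_num) receives a numeric vector.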
> dput(cleandata)
structure(list(Account = structure(c(2L, 2L), .Label = c("Crap1",
"XXS"), class = "factor"), Campaign = structure(c(1L, 1L), .Label = c("3098012",
"Crap2"), class = "factor"), Customer.Id = structure(c(2L, 2L
), .Label = c("", "nontech broad (7)"), class = "factor"), Ad.Group = structure(c(2L,
2L), .Label = c("", "RR 236 (300)"), class = "factor"), Keyword = structure(2:3, .Label = c("",
"chagall pro", "matisse"), class = "factor"), Keyword.Matching = structure(c(2L,
2L), .Label = c("", "Broad"), class = "factor"), Impressions = c(4L,
16L), Clicks = c(1L, 1L), CTR = structure(2:3, .Label = c("",
"25.00%", "6.25%"), class = "factor"), Avg.CPC = structure(2:3, .Label = c("",
"$0.05 ", "$0.11 "), class = "factor"), Avg.CPM = structure(2:3, .Label = c("",
"$12.50 ", "$6.88 "), class = "factor"), Cost = structure(2:3, .Label = c("",
"$0.05 ", "$0.11 "), class = "factor"), Avg.Position = structure(2:3, .Label = c("",
"3", "3.1"), class = "factor")), .Names = c("Account", "Campaign",
"Customer.Id", "Ad.Group", "Keyword", "Keyword.Matching", "Impressions",
"Clicks", "CTR", "Avg.CPC", "Avg.CPM", "Cost", "Avg.Position"
), row.names = 1:2, class = "data.frame")
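The same dput() output shows that CTR, Avg.CPC, Avg.CPM and Cost are also factors, holding labels such as "25.00%" and "$0.05 " with a trailing space. A minimal sketch, assuming '.' is the decimal mark and that ',' only ever appears as a thousands separator; the helper name `to_number` is hypothetical. It strips the currency and percent symbols before converting, and rebuilds only three of the columns from the dput() output to stay short:

```r
# Hypothetical helper: turn labels such as "$0.05 " or "25.00%" into numbers.
to_number <- function(x) {
  x <- trimws(as.character(x))  # factor -> text, drop surrounding spaces
  x <- gsub("[$%,]", "", x)     # strip '$', '%' and thousands separators
  as.numeric(x)                 # an empty string becomes NA
}

# Three columns rebuilt from the dput() output above.
cleandata <- data.frame(
  CTR = factor(c("25.00%", "6.25%"), levels = c("", "25.00%", "6.25%")),
  Avg.CPC = factor(c("$0.05 ", "$0.11 "), levels = c("", "$0.05 ", "$0.11 ")),
  Avg.Position = factor(c("3", "3.1"), levels = c("", "3", "3.1"))
)

num_cols <- c("CTR", "Avg.CPC", "Avg.Position")
cleandata[num_cols] <- lapply(cleandata[num_cols], to_number)
str(cleandata)
```

Note that CTR comes out as a percentage (25 rather than 0.25); divide by 100 if a proportion is wanted.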
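Any chance you could post the exact contents of a few lines of the text file? – 2010-09-27 23:05:25
I modified the data to keep it anonymous, but essentially I have 1 gig of it in this form: – datayoda 2010-09-27 23:12:14
Try using head(x, 5) and then copy and paste a dput(x); it makes it easier for people to look at your example. – 2010-09-27 23:22:37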
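A short sketch of how the requested samples could be produced. The demo file written to a temporary path is hypothetical and stands in for truncate.txt. readLines() with n set reads only the first lines as they are stored, junk included, without loading the whole file, and dput(head(x, 5)) prints a small reproducible slice of the parsed data:

```r
# Hypothetical demo file standing in for truncate.txt.
path <- tempfile(fileext = ".txt")
writeLines(c(paste("junk", 1:6), "A\tB", "1\t2", "3\t4"), path)

# Print the first 8 raw lines exactly as stored in the file.
first_lines <- readLines(path, n = 8)
first_lines

# Parse the data and print a small reproducible slice to paste into a post.
x <- read.delim(path, skip = 6, header = TRUE)
dput(head(x, 5))
```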