2013-10-14 54 views

R data.table: column length error when setting key columns

I have a Weibo-related dataset like this:

uid mid annotations bmiddle_pic created_at favorited geo in_reply_to_screen_name in_reply_to_status_id in_reply_to_user_id original_pic reTweetId reUserId source thumbnail_pic truncated dateTime year month date 
2025135630 3431909076450860   Fri Apr 06 20:12:27 +0800 2012 FALSE None NA NA NA  3.42867E+15 1292317643 <a href=http://localhost/web/cellphone.php#android rel=nofollow>Android客户端</a>  FALSE 6/4/12 20:12 2012 4 6 
1707427294 3439478742005300   Fri Apr 27 17:31:36 +0800 2012 FALSE None NA NA NA  3.43689E+15 1717022775 <a href=http://localhost/proc/productintro.php rel=nofollow>新浪微博企业版</a>  FALSE 27/4/12 17:31 2012 4 27 
1707427294 3449202430032250   Thu May 24 13:30:06 +0800 2012 FALSE None NA NA NA  3.44822E+15 1717022775 <a href=http://localhost/proc/productintro.php rel=nofollow>新浪微博企业版</a>  FALSE 24/5/12 13:30 2012 5 24 
1444865141 3432162475292600   Sat Apr 07 12:59:23 +0800 2012 FALSE None NA NA NA  3.43215E+15 1406200033 <a href=http://localhost/web/cellphone.php#iphone rel=nofollow>iPhone客户端</a>  FALSE 7/4/12 12:59 2012 4 7 
1444865141 3451309551846890   Wed May 30 09:03:02 +0800 2012 FALSE None NA NA NA  3.45109E+15 1406200033 <a href=http://localhost rel=nofollow>新浪微博</a>  FALSE 30/5/12 9:03 2012 5 30 
1422308692 3449219618915960 [{'name': u'\u827a\u672f\u4e0e\u751f\u6d3b\u7684\u5e73\u884c\u5bf9\u8bdd', 'title': u'\u827a\u672f\u4e0e\u751f\u6d3b\u7684\u5e73\u884c\u5bf9...', 'url': u'http://localhost/ft/201205215587', 'detailid': u'201205215587', 'appid': 47, 'id': u''}]  Thu May 24 14:38:21 +0800 2012 FALSE None NA NA NA  3.44922E+15 1438620052 <a href=http://localhost rel=nofollow>微访谈</a>  FALSE 24/5/12 14:38 2012 5 24 

I converted it to a data.table, but I can't set the key:

DT <- data.table(df) 
keycols = c("reUserId", "year","month") 
setkeyv(DT, keycols) 

It says:

Error in setkeyv(eihun.im60k, keycols) : 
Column 17 is length 9 which differs from length of column 1 (143). Invalid data.table. Check NEWS link at top of ?data.table for latest bug fixes. If not already reported and fixed, please report to datatable-help. 

So I ran the check that the data.table author, Matt Dowle, suggested in `data.table` error: "reorder received irregular lengthed list" in setkey:

sapply(DT, length) 

and it returns:

                    uid                     mid             annotations             bmiddle_pic
                    143                     143                     143                     143
             created_at               favorited                     geo in_reply_to_screen_name
                    143                     143                     143                     143
  in_reply_to_status_id     in_reply_to_user_id            original_pic               reTweetId
                    143                     143                     143                     143
               reUserId                  source           thumbnail_pic               truncated
                    143                     143                     143                     143
               dateTime                    year                   month                    date
                    143                     143                     143                     143

So if every column has length 143, why do I still get an error saying column 17 has length 9? Thanks in advance!

P.S.

dput(head(df)) 

which returns:

structure(list(uid.mid.annotations.bmiddle_pic.created_at.favorited.geo.in_reply_to_screen_name.in_reply_to_status_id.in_reply_to_user_id.original_pic.reTweetId.reUserId.source.thumbnail_pic.truncated.dateTime.year.month = structure(c(2L, 
3L, 4L, 5L, 6L, 1L), .Label = c("105411 1422308692 3449219618915963 Thu May 24 14:38:21 +0800 2012 False None NA NA NA 3449215332521999 1438620052 <a href=http://localhost rel=nofollow>\345\276\256\350\256\277\350\260\210</a> False 2012-05-24 14:38:21 2012 5", 
"22527 2025135630 3431909076450865 Fri Apr 06 20:12:27 +0800 2012 False None NA NA NA 3428667298503554 1292317643 <a href=http://localhost/web/cellphone.php#android rel=nofollow>Android\345\256\242\346\210\267\347\253\257</a> False 2012-04-06 20:12:27 2012 4", 
"90933 1707427294 3439478742005300 Fri Apr 27 17:31:36 +0800 2012 False None NA NA NA 3436888868360479 1717022775 <a href=http://localhost/proc/productintro.php rel=nofollow>\346\226\260\346\265\252\345\276\256\345\215\232\344\274\201\344\270\232\347\211\210</a> False 2012-04-27 17:31:36 2012 4", 
"91994 1707427294 3449202430032258 Thu May 24 13:30:06 +0800 2012 False None NA NA NA 3448224780547857 1717022775 <a href=http://localhost/proc/productintro.php rel=nofollow>\346\226\260\346\265\252\345\276\256\345\215\232\344\274\201\344\270\232\347\211\210</a> False 2012-05-24 13:30:06 2012 5", 
"93408 1444865141 3432162475292602 Sat Apr 07 12:59:23 +0800 2012 False None NA NA NA 3432146591339391 1406200033 <a href=http://localhost/web/cellphone.php#iphone rel=nofollow>iPhone\345\256\242\346\210\267\347\253\257</a> False 2012-04-07 12:59:23 2012 4", 
"93772 1444865141 3451309551846895 Wed May 30 09:03:02 +0800 2012 False None NA NA NA 3451094757864706 1406200033 <a href=http://localhost rel=nofollow>\346\226\260\346\265\252\345\276\256\345\215\232</a> False 2012-05-30 09:03:02 2012 5" 
), class = "factor")), .Names = "uid.mid.annotations.bmiddle_pic.created_at.favorited.geo.in_reply_to_screen_name.in_reply_to_status_id.in_reply_to_user_id.original_pic.reTweetId.reUserId.source.thumbnail_pic.truncated.dateTime.year.month", row.names = c(NA, 
6L), class = "data.frame") 

The sample you gave there looks really strange for a data.frame... Please post the output of 'dput(head(df))'. – Roland


Thanks @Roland! I don't see a nicer format here, though. – leoce


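There is only one column (variable) in your data.frame (you can check this with 'str(df)'). That is probably not what you expected. You need to change the way you import the data. – Roland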
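A minimal sketch of what Roland describes, assuming the source file is tab-separated (the dput output, where all the column names are glued together with dots into one name, hints at this but does not prove it). The sample text and the names `txt`, `bad` and `good` are hypothetical, and only three of the real columns are used.

```r
# Hypothetical two-row sample with three of the real columns; assumes the
# original file is tab-separated ("\t" below is a literal tab character).
txt <- paste(
  "uid\tmid\treUserId",
  "2025135630\t3431909076450860\t1292317643",
  "1707427294\t3439478742005300\t1717022775",
  sep = "\n"
)

# Reading a tab-separated file as CSV (sep = ",") keeps each whole line in a
# single column, like the one-column df in the question.
bad <- read.csv(text = txt)
str(bad)   # should report a data.frame with 1 variable

# Reading it with the tab separator gives one column per field; colClasses =
# "character" keeps the long IDs exactly as written instead of as doubles.
good <- read.delim(text = txt, colClasses = "character")
str(good)  # should report a data.frame with 3 variables
```

Checking `ncol()` or `str()` right after import catches this before any data.table step is reached.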

Answer


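The 'data.table' package cannot hold a POSIXlt variable as part of a valid table. Convert it (for example with as.POSIXct()) before calling data.table(mydf). Column 17 in your table is dateTime, and a POSIXlt value is internally a list of components (sec, min, hour, mday, mon, year, wday, yday, isdst: 9 of them in your R version), which is where the "length 9" in the error comes from.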
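A minimal sketch of that fix, using a hypothetical three-row `df` in place of the real data: it finds any POSIXlt columns, converts them with `as.POSIXct()`, and only then builds the data.table and sets the key. The values, the `Asia/Shanghai` time zone (chosen to match the +0800 offset) and the helper name `lt_cols` are illustrative assumptions.

```r
library(data.table)

# Hypothetical three-row stand-in for the Weibo data: the key columns plus a
# date-time column.
df <- data.frame(
  reUserId = c(1292317643, 1717022775, 1717022775),
  year     = c(2012L, 2012L, 2012L),
  month    = c(4L, 4L, 5L)
)

# strptime() returns POSIXlt; assigning it with $ can leave a POSIXlt column in df.
df$dateTime <- strptime(
  c("2012-04-06 20:12:27", "2012-04-27 17:31:36", "2012-05-24 13:30:06"),
  format = "%Y-%m-%d %H:%M:%S", tz = "Asia/Shanghai"
)

# Find every column that inherits from POSIXlt and convert it to POSIXct.
lt_cols <- names(df)[vapply(df, inherits, logical(1), what = "POSIXlt")]
for (col in lt_cols) df[[col]] <- as.POSIXct(df[[col]])

# Now every column has one element per row, so the key can be set.
DT <- data.table(df)
keycols <- c("reUserId", "year", "month")
setkeyv(DT, keycols)
key(DT)
```

POSIXct stores each timestamp as a single number of seconds, so data.table treats it like any other numeric column; converting before `data.table()` is what keeps `setkeyv()` from seeing an irregular column.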


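This is documented in '?data.table': *POSIXlt is not supported as a column type because it uses 40 bytes to store a single datetime. Unexpected errors may occur if you manage to create a column of type POSIXlt. See NEWS for 1.6.3, and IDateTime. IDateTime has methods to convert to and from POSIXlt.* – eddi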
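If separate date and time-of-day columns suit the analysis better than a single POSIXct, the IDateTime route eddi mentions could look like this minimal sketch. The two timestamps come from the sample rows, but treating them as UTC is a simplifying assumption (the real data carries a +0800 offset); `lt`, `parts` and `back` are hypothetical names.

```r
library(data.table)

# Two POSIXlt values taken from the sample rows (UTC assumed for simplicity).
lt <- as.POSIXlt(c("2012-04-06 20:12:27", "2012-05-30 09:03:02"), tz = "UTC")

# IDateTime() returns a data.table with an integer date column (idate) and an
# integer time-of-day column (itime), both safe to use as data.table columns.
parts <- IDateTime(lt)

# The IDate method of as.POSIXct() takes the time of day as its second
# argument, which rebuilds a single date-time when one is needed.
back <- as.POSIXct(parts$idate, parts$itime, tz = "UTC")
```

Splitting into IDate and ITime also makes keys on the date alone (for example per day) straightforward, since both are stored as integers.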