I have a large JSON file (8 GB, 8 million cases), but I only need a small sample of it. A simple stream_in does not work because the file is too big. To get around this, I tried the following code:
library(jsonlite)
library(purrr)
library(dplyr)

books <- list("Books_5.json")
books <- map(books, ~ stream_in(file(.x)) %>% sample_n(385))
books <- as.data.frame(books)
The problem is that after about 30,000 pages R stops reading the file because it is so large. Any idea how to get a sample of 385 cases from this file?
A glimpse of a smaller file; the variables are the same:
Variables: 9
$ reviewerID <chr> "AF50PEUSO9MSV", "A1L0TVAJ1TYE06", "A64NRL5OSR3KB", ...
$ asin <chr> "B0000A1G05", "B009SQQF9C", "B005HRT88G", "B00D5T3QK...
$ reviewerName <chr> "Matthew J. Hodgkins", "TCG", "Me", "J. Lee", "A. Bu...
$ helpful <list> [<1, 1>, <0, 1>, <1, 1>, <0, 0>, <0, 0>, <0, 0>, <0...
$ reviewText <chr> "This is the lens that I always keep on my camera by...
$ overall <dbl> 5, 5, 5, 5, 5, 5, 5, 4, 5, 2, 5, 4, 5, 4, 5, 5, 3, 4...
$ summary <chr> "Great lens!", "I love them! What else can I say", "...
$ unixReviewTime <int> 1370736000, 1404518400, 1387411200, 1385769600, 1379...
$ reviewTime <chr> "06 9, 2013", "07 5, 2014", "12 19, 2013", "11 30, 2...
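One way to avoid loading the whole file (a sketch, not the code from the question): since stream_in expects NDJSON (one JSON object per line), you can reservoir-sample 385 raw lines while reading the file in chunks, and only parse those 385 lines at the end. Memory use then depends on the chunk size, not the file size. The chunk size of 10000 here is an arbitrary choice.

```r
library(jsonlite)

n_keep <- 385
reservoir <- character(0)  # holds the current sample of raw JSON lines
seen <- 0                  # lines seen so far

con <- file("Books_5.json", "r")
repeat {
  chunk <- readLines(con, n = 10000)
  if (length(chunk) == 0) break
  for (line in chunk) {
    seen <- seen + 1
    if (length(reservoir) < n_keep) {
      # Fill the reservoir with the first n_keep lines
      reservoir <- c(reservoir, line)
    } else {
      # Replace a random slot with probability n_keep / seen,
      # which keeps the sample uniform over all lines seen
      j <- sample.int(seen, 1)
      if (j <= n_keep) reservoir[j] <- line
    }
  }
}
close(con)

# Parse only the 385 sampled lines
books <- stream_in(textConnection(reservoir))
```

Each line ends up in the final sample with equal probability, so this should give an unbiased random sample of 385 cases without ever holding more than one chunk of the file in memory.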
Could you provide some sample data? –