2016-02-03 33 views
0

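How to parse a big TSV file with Node and streams

I have obtained a dump of IMDB data (thanks to http://www.omdbapi.com/ and a small donation) as a TSV file containing 1,111,073 lines. Each line represents one movie, and they look like this: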

ID imdbID Title Year Rating Runtime Genre Released Director Writer Cast Metacritic imdbRating imdbVotes Poster Plot FullPlot Language Country Awards lastUpdated 
1 tt0000001 Carmencita 1894 NOT RATED 1 min Documentary, Short  William K.L. Dickson  Carmencita  5.8 1100 http://ia.media-imdb.com/images/M/[email protected]@._V1_SX300.jpg Performing on what looks like a small wooden stage, wearing a dress with a hoop skirt and white high-heeled pumps, Carmencita does a dance with kicks and twirls, a smile always on her face. Performing on what looks like a small wooden stage, wearing a dress with a hoop skirt and white high-heeled pumps, Carmencita does a dance with kicks and twirls, a smile always on her face.  USA  2015-12-10 01:09:33.043000000 

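My goal is to visualize how movie length changes over time. For that I want to create two arrays, one holding the min/max and one holding the average for each year (because that is the format the Highcharts chart type "Area range and line" expects). So I wrote a script that works fine for a small subset of the data but, not unexpectedly, throws an error when it tries to read the whole file.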

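I am well aware that streams should be able to solve this, but my expertise is limited, and this little project is actually meant to help me grok streams better...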

Here is the script as it currently stands:

https://gist.github.com/jfix/f79f011ce99d2049613c

If it is preferable to have the whole script shown directly in my question, I can of course add it.

Here is the error that is thrown:

$ node each.js 
buffer.js:382 
    throw new Error('toString failed'); 
    ^

Error: toString failed 
    at Buffer.toString (buffer.js:382:11) 
    at StringDecoder.write (string_decoder.js:129:21) 
    at Parser._transform (/Users/jakob/Projects/imdb-film-length/node_modules/csv-parse/lib/index.js:154:26) 
    at Transform._read (_stream_transform.js:167:10) 
    at Transform._write (_stream_transform.js:155:12) 
    at doWrite (_stream_writable.js:292:12) 
    at writeOrBuffer (_stream_writable.js:278:5) 
    at Writable.write (_stream_writable.js:207:11) 
    at /Users/jakob/Projects/imdb-film-length/node_modules/csv-parse/lib/index.js:46:14 
    at doNTCallback0 (node.js:419:9) 

Thanks for any pointers in the right direction...

Answer

0

I tried to recreate your situation, and I got the same error just by running:

csv(file, {delimiter: tab, relax: true, columns: true}, (err, out) => { }); 

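So it seems the csv-parse module makes the process run out of memory, because the callback variant allocates a lot of arrays. You probably need to use the stream API of the csv-parse module instead. Examples are given here: http://csv.adaltas.com/parse/examples/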

+0

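Thanks @Mathias, I had a look at them, but they didn't really seem to fit my problem. Then again, not understanding them well enough may well be my problem. I'll give it another go. – jfix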