2015-02-23 27 views
1

How do I populate a related table in Cassandra using CQL?

I am trying to implement Cassandra using this example (under the Composite Columns section).

So I created the tweets table, and it looks as follows:

cqlsh:twitter> SELECT * from tweets; 

tweet_id        | author  | body 
--------------------------------------+-------------+-------------- 
73954b90-baf7-11e4-a7d0-27983e9e7f51 | gwashington | I chopped... 

(1 rows) 

Now I want to populate timeline, which is a related table, using CQL, and I am not sure how to do it. I tried the SQL approach, but it did not work:

cqlsh:twitter> INSERT INTO timeline (user_id, tweet_id, author, body) SELECT 'gmason', 73954b90-baf7-11e4-a7d0-27983e9e7f51, author, body FROM tweets WHERE tweet_id = 73954b90-baf7-11e4-a7d0-27983e9e7f51; 
Bad Request: line 1:55 mismatched input 'select' expecting K_VALUES 

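So I have two questions:

  1. How do I populate the timeline table with CQL so that it relates to tweets?
  2. How do I make sure that the physical layout of timeline is created as shown in that example?

Thanks.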

EDIT:

Here is an illustration of my question #2 above (picture taken from here):

Answers

3

tldr;

  1. Use cqlsh COPY to export tweets, modify the file, and use COPY to import it into timeline.

  2. Use cassandra-cli to verify the physical structure.

Long version...

  1. I would go about this a different way: I think it would probably be easier to use the native COPY command in cqlsh.

I followed the similar examples found here. After creating the tweets and timeline tables in cqlsh, I inserted rows into tweets as directed. My tweets table then looked like this:

[email protected]:stackoverflow> SELECT * FROM tweets; 

tweet_id        | author  | body 
--------------------------------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------- 
05a5f177-f070-486d-b64d-4e2bb28eaecc |  gmason | Those gentlemen, who will be elected senators, will fix themselves in the federal town, and become citizens of that town more than of your state. 
b67fe644-4dbe-489b-bc71-90f809f88636 | jmadison |                     All men having power ought to be distrusted to a certain degree. 
819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1 | gwashington |                 To be prepared for war is one of the most effectual means of preserving peace. 

I then exported them like this:

[email protected]:stackoverflow> COPY tweets TO '/home/aploetz/tweets_20150223.txt' 
WITH DELIMITER='|' AND HEADER=true; 

3 rows exported in 0.052 seconds. 

Then I edited the tweets_20150223.txt file, adding a user_id column at the front and duplicating a few rows, like this:

userid|tweet_id|author|body 
gmason|05a5f177-f070-486d-b64d-4e2bb28eaecc|gmason|Those gentlemen, who will be elected senators, will fix themselves in the federal town, and become citizens of that town more than of your state. 
jmadison|b67fe644-4dbe-489b-bc71-90f809f88636|jmadison|All men having power ought to be distrusted to a certain degree. 
gwashington|819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1|gwashington|To be prepared for war is one of the most effectual means of preserving peace. 
jmadison|819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1|gwashington|To be prepared for war is one of the most effectual means of preserving peace. 
ahamilton|819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1|gwashington|To be prepared for war is one of the most effectual means of preserving peace. 
ahamilton|05a5f177-f070-486d-b64d-4e2bb28eaecc|gmason|Those gentlemen, who will be elected senators, will fix themselves in the federal town, and become citizens of that town more than of your state. 

I saved the file as timeline_20150223.txt and imported it into the timeline table, like this:

[email protected]:stackoverflow> COPY timeline FROM '/home/aploetz/timeline_20150223.txt' 
WITH DELIMITER='|' AND HEADER=true; 

6 rows imported in 0.016 seconds. 
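
The manual file edit above could also be scripted. What follows is only a minimal sketch, not part of the original answer: the function name `build_timeline_rows` and the `followers` mapping are hypothetical, and it assumes the export is pipe-delimited with a header line, as in the COPY command shown earlier. It writes one timeline row per tweet for the author and for each follower.

```python
import csv
import io


def build_timeline_rows(tweets_file, followers, out_file):
    """Write one timeline row per (user, tweet) pair; return the row count.

    tweets_file: pipe-delimited export with header tweet_id|author|body
    followers:   hypothetical mapping of author -> list of follower user ids
    """
    reader = csv.reader(tweets_file, delimiter="|")
    writer = csv.writer(out_file, delimiter="|", lineterminator="\n")

    header = next(reader)                  # tweet_id, author, body
    writer.writerow(["user_id"] + header)  # prepend the partition key column

    count = 0
    for tweet_id, author, body in reader:
        # The author sees their own tweet, and so does every follower.
        for user_id in [author] + followers.get(author, []):
            writer.writerow([user_id, tweet_id, author, body])
            count += 1
    return count


# Demo using the three tweets from the export (in-memory instead of real files).
exported = io.StringIO(
    "tweet_id|author|body\n"
    "05a5f177-f070-486d-b64d-4e2bb28eaecc|gmason|Those gentlemen, who will be elected senators, will fix themselves in the federal town, and become citizens of that town more than of your state.\n"
    "b67fe644-4dbe-489b-bc71-90f809f88636|jmadison|All men having power ought to be distrusted to a certain degree.\n"
    "819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1|gwashington|To be prepared for war is one of the most effectual means of preserving peace.\n"
)
followers = {"gwashington": ["jmadison", "ahamilton"], "gmason": ["ahamilton"]}
timeline = io.StringIO()
rows_written = build_timeline_rows(exported, followers, timeline)
print(rows_written, "timeline rows written")
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `newline=""`, and the output file would then be passed to `COPY timeline FROM ...` as above.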
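  2. Yes, timeline will be a wide-row table, partitioned on user_id and then clustered on tweet_id. I verified the "under the hood" structure by running the cassandra-cli tool and listing the timeline column family (table). Here you can see how the rows are partitioned by user_id, and how each column has the tweet_id UUID as part of its name: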

    [[email protected]] list timeline; 
    Using default limit of 100 
    Using default cell limit of 100 
    ------------------- 
    RowKey: ahamilton 
    => (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:, value=, timestamp=1424707827585904) 
    => (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:author, value=676d61736f6e, timestamp=1424707827585904) 
    => (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:body, value=54686f73652067656e746c656d656e2c2077686f2077696c6c20626520656c65637465642073656e61746f72732c2077696c6c20666978207468656d73656c76657320696e20746865206665646572616c20746f776e2c20616e64206265636f6d6520636974697a656e73206f66207468617420746f776e206d6f7265207468616e206f6620796f75722073746174652e, timestamp=1424707827585904) 
    => (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:, value=, timestamp=1424707827585715) 
    => (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:author, value=6777617368696e67746f6e, timestamp=1424707827585715) 
    => (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:body, value=546f20626520707265706172656420666f7220776172206973206f6e65206f6620746865206d6f73742065666665637475616c206d65616e73206f662070726573657276696e672070656163652e, timestamp=1424707827585715) 
    ------------------- 
    RowKey: gmason 
    => (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:, value=, timestamp=1424707827585150) 
    => (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:author, value=676d61736f6e, timestamp=1424707827585150) 
    => (name=05a5f177-f070-486d-b64d-4e2bb28eaecc:body, value=54686f73652067656e746c656d656e2c2077686f2077696c6c20626520656c65637465642073656e61746f72732c2077696c6c20666978207468656d73656c76657320696e20746865206665646572616c20746f776e2c20616e64206265636f6d6520636974697a656e73206f66207468617420746f776e206d6f7265207468616e206f6620796f75722073746174652e, timestamp=1424707827585150) 
    ------------------- 
    RowKey: gwashington 
    => (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:, value=, timestamp=1424707827585475) 
    => (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:author, value=6777617368696e67746f6e, timestamp=1424707827585475) 
    => (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:body, value=546f20626520707265706172656420666f7220776172206973206f6e65206f6620746865206d6f73742065666665637475616c206d65616e73206f662070726573657276696e672070656163652e, timestamp=1424707827585475) 
    ------------------- 
    RowKey: jmadison 
    => (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:, value=, timestamp=1424707827585597) 
    => (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:author, value=6777617368696e67746f6e, timestamp=1424707827585597) 
    => (name=819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1:body, value=546f20626520707265706172656420666f7220776172206973206f6e65206f6620746865206d6f73742065666665637475616c206d65616e73206f662070726573657276696e672070656163652e, timestamp=1424707827585597) 
    => (name=b67fe644-4dbe-489b-bc71-90f809f88636:, value=, timestamp=1424707827585348) 
    => (name=b67fe644-4dbe-489b-bc71-90f809f88636:author, value=6a6d616469736f6e, timestamp=1424707827585348) 
    => (name=b67fe644-4dbe-489b-bc71-90f809f88636:body, value=416c6c206d656e20686176696e6720706f776572206f7567687420746f206265206469737472757374656420746f2061206365727461696e206465677265652e, timestamp=1424707827585348) 
    
    4 Rows Returned. 
    Elapsed time: 35 msec(s). 
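
    The values in the cassandra-cli listing are hex-encoded bytes, and each column name is a composite of the clustering key (tweet_id) and the CQL column name. As a small illustrative sketch (the helper name `decode_cli_column` is hypothetical, and it assumes the values are UTF-8 text), one cell can be decoded like this:

```python
def decode_cli_column(name, hex_value):
    """Split a composite column name and decode its hex-encoded value.

    name:      e.g. "<tweet_id>:author" as printed by cassandra-cli
    hex_value: the value= field from the same line (hex digits)
    """
    clustering_key, _, column = name.partition(":")
    value = bytes.fromhex(hex_value).decode("utf-8")
    return clustering_key, column, value


# One cell from the "ahamilton" row in the listing above.
print(decode_cli_column(
    "05a5f177-f070-486d-b64d-4e2bb28eaecc:author",
    "676d61736f6e",
))
# -> ('05a5f177-f070-486d-b64d-4e2bb28eaecc', 'author', 'gmason')
```

    The cells whose name ends in a bare colon with an empty value have no column name after the tweet_id; the sketch returns an empty column name and an empty value for those.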
    
    +1

    Very well written! +1 – 2015-02-24 05:58:44

    +1

    Very detailed, thank you! – jazzblue 2015-02-24 16:37:42

    2
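    1. To do this in bulk you need an ETL tool, such as Hadoop or Spark. CQL has no INSERT/SELECT, and that is for a reason. In the real world, you would perform two inserts from your application, one into each table.

    2. You will have to trust that when you use a primary key made up of a partition key and a clustering key, the data will be stored in the wide-row format.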
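
    The "two inserts from the application" idea in point 1 might look roughly like the sketch below. It is an illustration under assumptions, not the answerer's code: `post_tweet` and `RecordingSession` are hypothetical names, the column lists are taken from the question, and `session` is assumed to be any object with an `execute(query, parameters)` method, such as a session from the DataStax Python driver (where `%s` is the placeholder for non-prepared statements). The stand-in session only records calls so the sketch runs without a cluster.

```python
import uuid

INSERT_TWEET = "INSERT INTO tweets (tweet_id, author, body) VALUES (%s, %s, %s)"
INSERT_TIMELINE = (
    "INSERT INTO timeline (user_id, tweet_id, author, body) "
    "VALUES (%s, %s, %s, %s)"
)


def post_tweet(session, tweet_id, author, body, follower_ids):
    """Write the tweet once to tweets, then once per timeline owner."""
    session.execute(INSERT_TWEET, (tweet_id, author, body))
    # The author's own timeline plus each follower's timeline.
    for user_id in [author] + list(follower_ids):
        session.execute(INSERT_TIMELINE, (user_id, tweet_id, author, body))


class RecordingSession:
    """Stand-in for a real driver session; it only records what was executed."""

    def __init__(self):
        self.calls = []

    def execute(self, query, parameters=None):
        self.calls.append((query, parameters))


session = RecordingSession()
post_tweet(
    session,
    uuid.UUID("819d95e9-356c-4bd5-9ad0-8cd36a7aa5e1"),
    "gwashington",
    "To be prepared for war is one of the most effectual means of preserving peace.",
    ["jmadison", "ahamilton"],
)
print(len(session.calls), "statements executed")
```

    The trade-off is that the application, not the database, keeps the two tables consistent, which is the usual price of denormalizing in Cassandra.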

    +0

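    Thanks, Roman. Also, regarding my question #2, I edited the question above to add a picture of the expected physical layout of timeline. Do you know how the timeline table gets organized as wide rows automatically? Thanks. – jazzblue 2015-02-23 15:41:44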

    +0

    Bryce did a great job on this answer while I was too busy with my first day at a new job :) – 2015-02-24 05:57:59
