2017-05-15

How do I import a CSV file into Cassandra?

I want to import a CSV file into Cassandra, so I first created a keyspace and a column family like this:

CREATE COLUMNFAMILY Consumer_complaints(
    Date_received varchar, 
    Product varchar, 
    Sub_product varchar, 
    Issue varchar, 
    Sub_issue varchar, 
    Consumer_complaint_narrative varchar, 
    Company_public_response varchar, 
    Company varchar, 
    State varchar, 
    ZIP_code varint, 
    Tags varchar, 
    Consumer_consent_provided varchar, 
    Submitted_via varchar, 
    Date_sent_to_company varchar, 
    Company_response_to_consumer varchar, 
    Timely_response varchar, 
    Consumer_disputed varchar, 
    Complaint_ID varint, 
    PRIMARY KEY(Complaint_ID) 
); 

I got a CSV file called Consumer Complaints from www.data.gov. Then I typed this at the cqlsh command line:

COPY consumer_complaints (Date_received,Product,Sub_product, Issue, Sub_issue, Consumer_complaint_narrative, Company_public_response, Company, State, ZIP_code, Tags, Consumer_consent_provided, Submitted_via, Date_sent_to_company, Company_response_to_consumer, Timely_response, Consumer_disputed, Complaint_ID) FROM 'consumer_complaints.csv'; 

Sample input:

3/21/2017,Credit reporting,,Incorrect information on credit report,Information is not mine,,Company has responded to the consumer and the CFPB and chooses not to provide a public response,EXPERIAN DELAWARE GP,TX,77075,Older American,N/A,Phone,03/21/2017,Closed with non-monetary relief,Yes,No,2397100 
04/19/2017,Debt collection,"Other (i.e. phone, health club, etc.)",Disclosure verification of debt,Not disclosed as an attempt to collect,,,"Security Credit Services, LLC",IL,60643,,,Web,04/20/2017,Closed with explanation,Yes,No,2441777 

Error:

Failed to import 1 rows: ParseError - Failed to parse 797XX : invalid literal for int() with base 10: '797XX', given up without retries
Failed to import 1 rows: ParseError - Failed to parse 354XX : invalid literal for int() with base 10: '354XX', given up without retries
Failed to import 2 rows: ParseError - Failed to parse 313XX : invalid literal for int() with base 10: '313XX', given up without retries
Failed to import 2 rows: ParseError - Failed to parse 054XX : invalid literal for int() with base 10: '054XX', given up without retries

How can I fix this?

Please show a sample of the CSV file.

Hi @ImJa, which Cassandra version's cqlsh are you using? I used the cqlsh from Cassandra 2.2.4 and it works fine.

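I checked the data. Some ZIP codes have non-integer values, such as '797XX', '354XX', '313XX' and '054XX', which clearly cannot be parsed as integers. You can either change these values to integers, or change your table so that ZIP_code has type 'varchar'.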

Answers

1

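Cassandra does not preserve the order in which the columns were declared when the table was created, so you need to specify the column names when importing data.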

Try this command:

COPY consumer_complaints (Date_received,Product,Sub_product, Issue, Sub_issue, Consumer_complaint_narrative, Company_public_response, Company, State, ZIP_code, Tags, Consumer_consent_provided, Submitted_via, Date_sent_to_company, Company_response_to_consumer, Timely_response, Consumer_disputed, Complaint_ID) FROM 'c.csv' WITH HEADER = true; 

Sample input:

Date_received,Product,Sub_product, Issue, Sub_issue, Consumer_complaint_narrative, Company_public_response, Company, State, ZIP_code, Tags, Consumer_consent_provided, Submitted_via, Date_sent_to_company, Company_response_to_consumer, Timely_response, Consumer_disputed, Complaint_ID 
07/26/2013,Mortgage,FHA mortgage,"Loan servicing, payments, escrow account",,,,"CITIBANK, N.A.",NC,28056,,N/A,Web,07/29/2013,Closed with explanation,Yes,No,467750 
09/26/2014,Consumer Loan,Vehicle loan,Managing the loan or lease,,,,HSBC NORTH AMERICA HOLDINGS INC.,NY,12572,,N/A,Web,09/26/2014,Closed with explanation,Yes,No,1046323 

Output:

 complaint_id | company                          | company_public_response | company_response_to_consumer | consumer_complaint_narrative | consumer_consent_provided | consumer_disputed | date_received | date_sent_to_company | issue                                    | product       | state | sub_issue | sub_product  | submitted_via | tags | timely_response | zip_code
--------------+----------------------------------+-------------------------+------------------------------+------------------------------+---------------------------+-------------------+---------------+----------------------+------------------------------------------+---------------+-------+-----------+--------------+---------------+------+-----------------+----------
      1046323 | HSBC NORTH AMERICA HOLDINGS INC. |                    null |      Closed with explanation |                         null |                       N/A |                No |    09/26/2014 |           09/26/2014 |               Managing the loan or lease | Consumer Loan |    NY |      null | Vehicle loan |           Web | null |             Yes |    12572
       467750 |                   CITIBANK, N.A. |                    null |      Closed with explanation |                         null |                       N/A |                No |    07/26/2013 |           07/29/2013 | Loan servicing, payments, escrow account |      Mortgage |    NC |      null | FHA mortgage |           Web | null |             Yes |    28056

Edit

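I checked the data from https://catalog.data.gov/dataset/consumer-complaint-database. Some ZIP codes have non-integer values, such as 797XX, 354XX, 313XX and 054XX, which clearly cannot be parsed as integers. You can either change these values to integers, or change your table so that the ZIP_code field has type varchar.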
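
To see which rows would trip the integer parser before running COPY, a short script can scan the ZIP code column. This is only a sketch: the function name `find_non_integer_values` and the constant `ZIP_CODE_INDEX` are made up for illustration, and the index assumes the column order of the COPY statement in the question (ZIP_code is the tenth field).

```python
import csv

# Zero-based position of ZIP_code in the COPY column list from the question
# (an assumption; adjust it if your file uses a different column order).
ZIP_CODE_INDEX = 9


def find_non_integer_values(lines, column_index=ZIP_CODE_INDEX, has_header=False):
    """Return (row_number, value) pairs whose field cannot be parsed by int().

    `lines` is any iterable of CSV lines, for example an open file.
    Empty fields are skipped; only values that are present but are not
    integers are reported.
    """
    problems = []
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if has_header and row_number == 1:
            continue
        if column_index >= len(row):
            continue  # short or blank line, nothing to check
        value = row[column_index].strip()
        if not value:
            continue
        try:
            int(value)
        except ValueError:
            problems.append((row_number, value))
    return problems


if __name__ == "__main__":
    # Hypothetical shortened rows; only the tenth field matters here.
    sample = [
        "d,p,sp,i,si,n,r,c,TX,77075,t\n",
        "d,p,sp,i,si,n,r,c,TX,797XX,t\n",
    ]
    for row_number, value in find_non_integer_values(sample):
        print(f"row {row_number}: {value!r} is not an integer")
```

With the real file, passing `open('consumer_complaints.csv', newline='')` instead of the list should list every masked ZIP code, which helps in deciding between cleaning the values and storing ZIP_code as varchar.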

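I tried your command, but the error is still there: Failed to import 1 rows: ParseError - Failed to parse 797XX : invalid literal for int() with base 10: '797XX', given up without retries – ImJa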

@ImJa It should work. Check my input and output, and show the input that does not work for you.

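This is part of the original CSV file: Date received,Product,Sub-product,Issue,Sub-issue,Consumer complaint narrative,Company public response,Company,State,ZIP code,Tags,Consumer consent provided?,Submitted via,Date sent to company,Company response to consumer,Timely response?,Consumer disputed?,Complaint ID. As you can see, the column names contain spaces, so I edited them. I think this is what causes the error. If so, what should I do? – ImJa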

1

Your error message states:

Failed to import 1 rows: ParseError - Failed to parse 11/08/2013 : invalid literal for int() with base 10: '11/08/2013', given up without retries

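It looks like Cassandra is trying to insert a date string as an integer. Without seeing the CSV, my guess is that the columns are in the wrong order and a date is being parsed into one of your varint fields. If you share a sample of the CSV, it may be easier to debug.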
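
One way to test the column-order guess is to compare the CSV header with the column list passed to COPY. The sketch below is an illustration only: `normalize` and `column_order_mismatches` are hypothetical helper names, and the normalization rule (lower-case, any run of non-alphanumeric characters becomes one underscore) is an assumption chosen so that a header such as "ZIP code" or "Timely response?" matches ZIP_code or Timely_response.

```python
import csv
import re

# Column order used in the COPY statement from the question.
COPY_COLUMNS = [
    "Date_received", "Product", "Sub_product", "Issue", "Sub_issue",
    "Consumer_complaint_narrative", "Company_public_response", "Company",
    "State", "ZIP_code", "Tags", "Consumer_consent_provided", "Submitted_via",
    "Date_sent_to_company", "Company_response_to_consumer", "Timely_response",
    "Consumer_disputed", "Complaint_ID",
]


def normalize(name):
    """Lower-case a name and collapse runs of non-alphanumerics to '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def column_order_mismatches(header_line, copy_columns=COPY_COLUMNS):
    """Return (index, expected, actual) for every position that differs.

    `header_line` is the first line of the CSV file. A missing column on
    either side is reported as None.
    """
    rows = list(csv.reader([header_line]))
    header = rows[0] if rows else []
    expected = [normalize(c) for c in copy_columns]
    actual = [normalize(c) for c in header]
    mismatches = []
    for i in range(max(len(expected), len(actual))):
        exp = expected[i] if i < len(expected) else None
        act = actual[i] if i < len(actual) else None
        if exp != act:
            mismatches.append((i, exp, act))
    return mismatches


if __name__ == "__main__":
    # Read only the header line of the file named in the question.
    with open("consumer_complaints.csv", newline="") as f:
        for index, exp, act in column_order_mismatches(f.readline()):
            print(f"column {index}: COPY expects {exp!r}, file has {act!r}")
```

An empty result means the header order matches the COPY column list, in which case the parse errors more likely come from the values themselves than from the column order.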

You can download the CSV file from https://catalog.data.gov/dataset/consumer-complaint-database. I don't know why Cassandra is trying to insert the date string as an integer. – ImJa

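Here is a sample: 3/21/2017,Credit reporting,,Incorrect information on credit report,Information is not mine,,Company has responded to the consumer and the CFPB and chooses not to provide a public response,EXPERIAN DELAWARE GP,TX,77075,Older American,N/A,Phone,03/21/2017,Closed with non-monetary relief,Yes,No,2397100 – ImJa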