2011-05-09 134 views
5

C#: parsing a string

I have a whole bunch of strings that look like this:

mc_gross=22.99invoice=ff1ca57d9fa80cf93e6b300dd7f063e1protection_eligibility=Ineligibleaddress_status=confirmedpayer_id=SGA8X3TX9HCVYtax=0.00address_street=155 5th ave sepayment_date=16:08:28 Nov 15, 2010 PSTpayment_status=Completedcharset=windows-1252address_zip=98045first_name=jackobmc_fee=1.08address_country_code=USaddress_name=john martinnotify_version=3.0custom=ff1ca5asdf7d[email protected]hotmail.comaddress_country=United Statesaddress_city=north bendquantity=1verify_sign=AZussRXZRkuk7frhfirfxxTkj0BDJGA2dJF3eF263eEsjLixS.xRxCzfaYLpayer_email=me@gmail.comtxn_id=4DU53818WJ271531Mpayment_type=instantlast_name=Martinaddress_state=WAreceiver_email=cravbill@hotmail.compayment_fee=1.08receiver_id=QG8JPB4RZJGG4txn_type=web_acceptitem_name=Some item of consequenceSpecifiemc_currency=USDitem_number=G10W151residence_country=UShandling_amount=0.00transaction_subject=ff1ca57d9fad80cf93e6b300dd7f063e1payment_gross=22.99shipping=0.00

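What is the best way to parse this? You would think whoever created it would have put some kind of break in it...

Anyway, any help would be greatly appreciated.

Edit:

I appreciate everyone's posts. I am wondering whether I could do this:

  1. Create a list of the tags, e.g. mc_gross=, first_name=, ...
  2. Do a replace on the string: thestring.Replace("first_name", "\r\nfirst_name"). I am thinking this would give me the breaks I need to parse it further.

What do you think?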
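
A minimal sketch of the replace idea from step 2, assuming the full tag list is known up front. The names `TagBreaker` and `InsertBreaks` are made up for illustration. It replaces each tag together with its `=` so that `address_country=` does not also hit `address_country_code=`; it would still misfire if one tag is a suffix of another or if a value happens to contain `tag=`.

```csharp
using System;

static class TagBreaker
{
    // Hypothetical helper: puts a line break in front of every known "tag=".
    public static string InsertBreaks(string raw, string[] tags)
    {
        string result = raw;
        foreach (string tag in tags)
        {
            // Include the '=' in the search text to avoid prefix collisions
            // such as address_country / address_country_code.
            result = result.Replace(tag + "=", "\r\n" + tag + "=");
        }
        // The very first tag also gets a break in front of it; drop it.
        return result.TrimStart('\r', '\n');
    }
}
```

Calling `TagBreaker.InsertBreaks(thestring, tags)` should then give one `key=value` pair per line, which can be split on `=` afterwards.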

+1

Wow. What were they thinking? – BoltClock 2011-05-09 04:51:12

+2

Check with whoever created this; something is definitely wrong. Are you sure there is no CR/LF between each key/value pair? – 2011-05-09 04:52:24

+3

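So this is a list of name/value pairs, but without any kind of delimiter between them? Do you have the option of going back to the person who gave you this and asking 1) whether they can provide a delimiter and 2) what they were smoking when they created it? – DXM 2011-05-09 04:53:36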

Answers

3

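Unless this is fixed width (which I highly doubt), I would say you need to get a list of the keywords that denote the fields. Put them in a data store (SQL, XML, CSV, etc. - it doesn't really matter where) and then use them to parse the file. Hopefully the fields always come in the same order and no tag is ever left out. If so, take a Substring that starts after the equals sign following a tag and ends where the next tag in line begins. That gives you the value that corresponds to that tag.

For example, if we take just the first part, mc_gross=22.99invoice=ff1ca57d9fa80cf93e6b300dd7f063e1protection_eligibility=Ineligibleaddress_status=confirmed, our tags would be mc_gross, invoice, protection_eligibility, and address_status. We would start with mc_gross= and find it in the string. To work out the length to pass to Substring, we would find our next tag, invoice. The Substring line will be complicated, but it should do the job. Loop through each tag. When you get to the last tag, you need to find the end of the string instead of another tag.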
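
The loop described above could look roughly like this. It is only a sketch under the same assumptions as the answer: the tags always appear in the listed order and none is missing. `TagParser` and `Parse` are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;

static class TagParser
{
    // Hypothetical helper; assumes tags occur in the same order as in `tags`.
    public static Dictionary<string, string> Parse(string line, string[] tags)
    {
        var result = new Dictionary<string, string>();
        int pos = 0;
        for (int i = 0; i < tags.Length; i++)
        {
            string marker = tags[i] + "=";
            int start = line.IndexOf(marker, pos, StringComparison.Ordinal);
            if (start < 0)
                throw new FormatException("Tag not found: " + tags[i]);
            int valueStart = start + marker.Length;

            // The value runs up to the next tag, or to the end of the
            // string when this is the last tag.
            int valueEnd = line.Length;
            if (i + 1 < tags.Length)
            {
                int next = line.IndexOf(tags[i + 1] + "=", valueStart, StringComparison.Ordinal);
                if (next < 0)
                    throw new FormatException("Tag not found: " + tags[i + 1]);
                valueEnd = next;
            }

            result[tags[i]] = line.Substring(valueStart, valueEnd - valueStart);
            pos = valueEnd;
        }
        return result;
    }
}
```

Throwing on a missing tag is a design choice for the sketch; if tags can be optional, the search for the next tag would have to skip ahead instead.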

+0

This is what gave me the idea of inserting breaks into the string; not sure whether it will still work – ErocM 2011-05-09 12:00:16

+0

This gave me a great start. I was able to break it into separate lines and then split on '='. Thanks! – ErocM 2011-05-09 16:23:34

-1

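The equals sign is a good indicator. For the text between the equals signs, I would suggest a lexical tool with some kind of inference engine.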

2

Take a look at System.Text.RegularExpressions; regular expressions can be very useful here.
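
For what it's worth, here is one way the regex route might look, assuming the list of key names is already known. The pattern is built from the keys, and a lookahead stops each value at the next `key=` or at the end of the string. `RegexTagParser` is a hypothetical name for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexTagParser
{
    // Hypothetical helper: extracts key/value pairs using a pattern built from known keys.
    public static Dictionary<string, string> Parse(string line, IEnumerable<string> keys)
    {
        // Escape each key and join them into one alternation: k1|k2|...
        string alternation = string.Join("|", keys.Select(k => Regex.Escape(k)));

        // key, '=', then a lazy value that ends right before the next "key=" or at end of input.
        string pattern = "(?<key>" + alternation + ")=(?<value>.*?)(?=(?:" + alternation + ")=|$)";

        var result = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(line, pattern))
        {
            result[m.Groups["key"].Value] = m.Groups["value"].Value;
        }
        return result;
    }
}
```

Unlike a fixed-order scan, this does not care about the order of the keys, but it has the same weakness as every approach here: a value that itself contains `key=` will be cut short.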

However, a simple approach is to use the Split function of the string class.

string head = "mc_gross=22.99invoice=ff1ca57d9fa80cf93e6b300dd7f063e1protection_eligibility=Ineligibleaddress_status=confirmedpayer_id=SGA8X3TX9HCVYtax=0.00address_street=155 5th ave sepayment_date=16:08:28 Nov 15, 2010 PSTpayment_status=Completedcharset=windows-1252address_zip=98045first_name=jackobmc_fee=1.08address_country_code=USaddress_name=john martinnotify_version=3.0custom=ff1ca5asdf7d[email protected]hotmail.comaddress_country=United Statesaddress_city=north bendquantity=1verify_sign=AZussRXZRk[email protected]gmail.comtxn_id=4DU53818WJ271531Mpayment_type[email protected]hotmail.compayment_fee=1.08receiver_id=QG8JPB4RZJGG4txn_type=web_acceptitem_name=Some item of consequenceSpecifiemc_currency=USDitem_number=G10W151residence_country=UShandling_amount=0.00transaction_subject=ff1ca57d9fad80cf93e6b300dd7f063e1payment_gross=22.99shipping=0.00"; 

// The separators are the known key names (only two shown here).
string[] splitStrings = new string[2];
splitStrings[0] = "mc_gross";
splitStrings[1] = "invoice";
string[] headArray = head.Split(splitStrings, StringSplitOptions.RemoveEmptyEntries);

You get the idea; it breaks everything up into pieces.

+3

But that is no use if there is no well-defined pattern in the string. – 2011-05-09 04:58:20

3

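As others have said, unless you can get the original data to include line breaks in the appropriate places, the next best thing is to get a list of the key names.

I assume the other 60K lines have the same key names as the one sample line you provided? If so, and nobody can give you the list, then identifying the key names by hand (not programmatically) seems to be the only way.

I gave it a try myself. It doesn't seem too hard to do (a few minutes at most), but you may still need someone who knows the data to confirm that the key list is correct.

Once you have the list, you can split on the keys and then recombine them with the values into a new list: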

mc_gross=22.99 
invoice=ff1ca57d9fa80cf93e6b300dd7f063e1 
protection_eligibility=Ineligible 
address_status=confirmed 
payer_id=SGA8X3TX9HCVY 
tax=0.00 
address_street=155 5th ave se 
payment_date=16:08:28 Nov 15, 2010 PST 
payment_status=Completed 
charset=windows-1252 
address_zip=98045 
first_name=jackob 
mc_fee=1.08 
address_country_code=US 
address_name=john martin 
notify_version=3.0 
custom=ff1ca5asdf7d9fa80cf93e6b300dd7f063e1 
payer_status=unverified 
business=[email protected]
address_country=United States 
address_city=north bend 
quantity=1 
verify_sign=AZussRXZRkuk7frhfirfxxTkj0BDJGA2dJF3eF263eEsjLixS.xRxCzfaYL 
payer_email=[email protected]
txn_id=4DU53818WJ271531M 
payment_type=instant 
last_name=Martin 
address_state=WA 
receiver_email=[email protected]
payment_fee=1.08 
receiver_id=QG8JPB4RZJGG4 
txn_type=web_accept 
item_name=Some item of consequenceSpecifie 
mc_currency=USD 
item_number=G10W151 
residence_country=US 
handling_amount=0.00 
transaction_subject=ff1ca57d9fad80cf93e6b300dd7f063e1 
payment_gross=22.99 
shipping=0.00  

Something like this:

string rawData = 
    "mc_gross=22.99invoice=ff1ca57d9fa80cf93e6b300dd7f063e1protection_eligibility=Ineligibleaddress_status=confirmedpayer_id=SGA8X3TX9HCVYtax=0.00address_street=155 5th ave sepayment_date=16:08:28 Nov 15, 2010 PSTpayment_status=Completedcharset=windows-1252address_zip=98045first_name=jackobmc_fee=1.08address_country_code=USaddress_name=john martinnotify_version=3.0custom=ff1ca5asdf7d[email protected]hotmail.comaddress_country=United Statesaddress_city=north bendquantity=1verify_sign=AZussRXZRk[email protected]gmail.comtxn_id=4DU53818WJ271531Mpayment_type[email protected]hotmail.compayment_fee=1.08receiver_id=QG8JPB4RZJGG4txn_type=web_acceptitem_name=Some item of consequenceSpecifiemc_currency=USDitem_number=G10W151residence_country=UShandling_amount=0.00transaction_subject=ff1ca57d9fad80cf93e6b300dd7f063e1payment_gross=22.99shipping=0.00"; 

string[] keys = { 
        "mc_gross", "invoice", "protection_eligibility", "address_status", "payer_id", "tax", 
        "address_street", "payment_date", "payment_status", "charset", "address_zip", 
        "first_name", "mc_fee", "address_country_code", "address_name", "notify_version", 
        "custom", "payer_status", "business", "address_country", "address_city", "quantity", 
        "verify_sign", "payer_email", "txn_id", "payment_type", "last_name", "address_state", 
        "receiver_email", "payment_fee", "receiver_id", "txn_type", "item_name", 
        "mc_currency", "item_number", "residence_country", "handling_amount", 
        "transaction_subject", "payment_gross", "shipping" 
       }; 

string[] values = rawData.Split(keys, StringSplitOptions.RemoveEmptyEntries); 

IEnumerable<string> parsedList = keys.Zip(values, (key, value) => key + value); 

foreach (string item in parsedList) 
{ 
    Console.WriteLine(item); 
} 

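This outputs the data in the format shown above. You can parse the list further by splitting each item on the equals sign ("="), or replace the raw data string with one that now contains the missing line breaks: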

string newData = parsedList.Aggregate((data, next) => data + Environment.NewLine + next); 
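
For the first option mentioned above, splitting each item on the equals sign, a small sketch might look like this. `PairSplitter` is a hypothetical name; it takes any sequence of `key=value` strings, such as `parsedList`, and splits on the first `=` only so that values containing `=` stay intact.

```csharp
using System;
using System.Collections.Generic;

static class PairSplitter
{
    // Hypothetical helper: turns "key=value" strings into a dictionary.
    public static Dictionary<string, string> ToDictionary(IEnumerable<string> items)
    {
        var result = new Dictionary<string, string>();
        foreach (string item in items)
        {
            // Limit the split to 2 parts: everything after the first '=' is the value.
            string[] parts = item.Split(new[] { '=' }, 2);
            string value = parts.Length > 1 ? parts[1] : string.Empty;
            result[parts[0]] = value;
        }
        return result;
    }
}
```

With that in place, `PairSplitter.ToDictionary(parsedList)["mc_gross"]` would be the way to look up a single field.
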
+1

I had already done it, but your approach is more elegant. Thanks for the tip! – ErocM 2011-05-20 02:15:06