2013-05-08
2

Regex pattern correctness across languages

I found this regex pattern at http://gskinner.com/RegExr/:

,(?=(?:[^"]*"[^"]*")*(?![^"]*")) 

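It matches the delimiters in CSV data (more specifically, the separating commas, which can then be split on), and on that site it works very well with my test data. While testing, you can see what I believe is a JavaScript implementation in the bottom panel of the site.

However, when I try to implement this in C#/.NET, the matching does not work properly. My implementation: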

Regex r = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))", RegexOptions.ECMAScript); 
//get data... 
foreach (string match in r.Split(sr.ReadLine())) 
{ 
    //lblDev.Text = lblDev.Text + match + "<br><br><br><p>column:</p><br>"; 
    dtF.Columns.Add(match); 
} 

//more of the same to get rows 

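For some rows of data the results are identical to those produced on the website above, but for others the first 6 or so rows either fail to split or are not present in the split array at all.

Can anyone tell me why the pattern does not seem to behave the same way?

My test data: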

CategoryName,SubCategoryName,SupplierName,SupplierCode,ProductTitle,Product Company ,ProductCode,Product_Index,ProductDescription,Product BestSeller,ProductDimensions,ProductExpressDays,ProductBrandName,ProductAdditionalText ,ProductPrintArea,ProductPictureRef,ProductThumnailRef,ProductQuantityBreak1 (QB1),ProductQuantityBreak2 (QB2),ProductQuantityBreak3 (QB3),ProductQuantityBreak4 (QB4),ProductPlainPrice1,ProductPlainPrice2,ProductPlainPrice3,ProductPlainPrice4,ProductColourPrice1,ProductColourPrice2,ProductColourPrice3,ProductColourPrice4,ProductExtraColour1,ProductExtraColour2,ProductExtraColour3,ProductExtraColour4,SellingPrice1,SellingPrice2,SellingPrice3,SellingPrice4,ProductCarriageCost1,ProductCarriageCost2,ProductCarriageCost3,ProductCarriageCost4,BLACK,BLUE,WHITE,SILVER,GOLD,RED,YELLOW,GREEN,ProductOtherColors,ProductOrigination,ProductOrganizationCost,ProductCatalogEntry,ProductPageNumber,ProductPersonalisationType1 (PM1),ProductPrintPosition,ProductCartonQuantity,ProductCartonWeight,ProductPricingExpering,NewProduct,ProductSpecialOffer,ProductSpecialOfferEnd,ProductIsActive,ProductRepeatOrigination,ProductCartonDimession,ProductSpecialOffer1,ProductIsExpress,ProductIsEco,ProductIsBiodegradable,ProductIsRecycled,ProductIsSustainable,ProductIsNatural 
Audio,Speakers and Headphones,The Prime Time Company,CM5064:In-ear headphones,Silly Buds,,10058,372,"Small, trendy ear buds with excellent sound quality and printing area actually on each ear- piece. Plastic storage box, with room for cables be wrapped around can also be printed.",FALSE,70 x 70 x 20mm,,,,10mm dia,10058.jpg,10058.jpg,100,250,500,1000,2.19,2.13,2.06,1.99,0.1,0.1,0.05,0.05,0.1,0.1,0.05,0.05,3.81,3.71,3.42,3.17,0,0,0,0,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,,30,,TRUE,24,Screen Printed,Earpiece,200,11,,TRUE,,,TRUE,15,,,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE 
Audio,Speakers and Headphones,The Prime Time Company,CM5058:Headstart,Head Start,,10060,372,"Lightweight, slimline, foldable and patented headphones ideal for the gym or exercise. These 
headphones uniquely hang from the ears giving security, comfort and an excellent sound quality. There is also a secret cable winding facility.",FALSE,130 x 85 x 45mm,,,,30mm dia,10060.jpg,10060.jpg,100,250,500,1000,5.6,5.43,5.26,5.09,0.1,0.1,0.05,0.05,0.1,0.1,0.05,0.05,9.47,8.96,8.24,7.97,0,0,0,0,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,,30,,TRUE,24,Screen Printed,print plate on ear (s),100,11,,TRUE,,,TRUE,15,,,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE 
+2

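Do you mean the first six rows or the first six columns? If rows, then you need to look at `sr.ReadLine()` and the loop around it to make sure you are reading the data correctly. Also, I notice that your test data contains a line break in the middle of the product description column of the second data row. That line break will affect your results. – 2013-05-08 10:36:06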

Answers

3

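Use the right tool for the job. A regex is not well suited to parsing CSV, where fields can contain any number of quoted and escaped quote characters.

Use this instead:

A Fast CSV Reader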

http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader

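We use it in production code. It works well, and it makes clear how complex the parsing really is. For more on that complexity, look at the 800+ unit tests included in the solution.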
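To give a feel for why a dedicated parser is the safer choice, here is a minimal character-by-character sketch. It is not the code of the library linked above; the `MiniCsv` class and its `Parse` method are hypothetical names made up for this illustration. It assumes comma delimiters, double-quote quoting with `""` as an escaped quote, and it skips blank lines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of any library.
public static class MiniCsv
{
    // Reads every record from the reader. Quoted fields may contain
    // commas, line breaks, and doubled quotes ("" stands for one ").
    public static List<string[]> Parse(TextReader reader)
    {
        var records = new List<string[]>();
        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        bool recordHasData = false;
        int ch;

        while ((ch = reader.Read()) != -1)
        {
            char c = (char)ch;
            if (inQuotes)
            {
                if (c == '"')
                {
                    // A doubled quote is an escaped quote; a single one closes the field.
                    if (reader.Peek() == '"') { reader.Read(); field.Append('"'); }
                    else inQuotes = false;
                }
                else field.Append(c); // commas and line breaks are literal here
            }
            else if (c == '"') { inQuotes = true; recordHasData = true; }
            else if (c == ',')
            {
                fields.Add(field.ToString());
                field.Clear();
                recordHasData = true;
            }
            else if (c == '\r') { /* dropped outside quotes; '\n' ends the record */ }
            else if (c == '\n')
            {
                if (recordHasData)
                {
                    fields.Add(field.ToString());
                    records.Add(fields.ToArray());
                }
                fields.Clear();
                field.Clear();
                recordHasData = false;
            }
            else { field.Append(c); recordHasData = true; }
        }

        // Flush a final record that has no trailing line break.
        if (recordHasData)
        {
            fields.Add(field.ToString());
            records.Add(fields.ToArray());
        }
        return records;
    }
}
```

Calling `MiniCsv.Parse(new StreamReader(path))` would return one `string[]` per record, with the surrounding quotes already removed. Even this small sketch ignores malformed input, other delimiters, and encodings, which is the kind of detail a tested library handles.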

0

Your C# regex works fine for me in LinqPad, but your data contains a line break inside the last row of data, so you cannot simply use sr.ReadLine() to read it.
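If you want to keep the regex, one possible workaround is to join physical lines until the double quotes are balanced and only then split. The sketch below is an assumption about how that could look, not a tested production reader; `CsvRecordReader`, `ReadRecords`, and `SplitRecord` are hypothetical names. It uses the pattern from the question without `RegexOptions.ECMAScript`, which the pattern does not rely on, and it assumes quotes only appear as field delimiters or as doubled escapes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class CsvRecordReader
{
    // The pattern from the question: a comma followed by an even number of quotes.
    private static readonly Regex Separator =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    // Yields one logical record at a time. Physical lines are joined with '\n'
    // until the number of double quotes seen so far is even.
    public static IEnumerable<string> ReadRecords(TextReader reader)
    {
        var buffer = new StringBuilder();
        int quoteCount = 0;
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            if (buffer.Length > 0) buffer.Append('\n');
            buffer.Append(line);
            quoteCount += line.Count(c => c == '"');

            if (quoteCount % 2 == 0)
            {
                yield return buffer.ToString();
                buffer.Clear();
                quoteCount = 0;
            }
        }

        // An unterminated quoted field at end of input is returned as-is.
        if (buffer.Length > 0) yield return buffer.ToString();
    }

    // Splits one logical record on commas that are outside quotes.
    // The returned fields still carry their surrounding quotes.
    public static string[] SplitRecord(string record)
    {
        return Separator.Split(record);
    }
}
```

With this in place, the loop from the question would iterate over `CsvRecordReader.ReadRecords(sr)` and call `SplitRecord` on each record instead of splitting the result of `sr.ReadLine()` directly.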