2017-06-13

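Generate a sales report from XML data using R? Convert to a data frame?

I have some online order data as XML. I want a report of total orders, sales, returns, and so on.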

<ArrayOfItem> 
<Item> 
<total>333.3</total> 
<terminalid>1</terminalid> 
<subtotal>330</subtotal> 
<storeid>1000</storeid> 
<itemlist> 
<TransactionLine><LineNumber>1</LineNumber><Name>Moto G Turbo Edition Black</Name><ItemUPC>5479892348535</ItemUPC><Quantity>1</Quantity><SalePrice>330</SalePrice><IndividualPrice>330</IndividualPrice><CreatedDate>2017-06-13T09:42:52.1411148Z</CreatedDate><Status>0</Status><ShippingCost>0</ShippingCost><TotalTax>3.3</TotalTax><AppliedTaxes><LineTax><TaxId>0</TaxId><Amount>0</Amount><CreatedDate>0001-01-01T00:00:00</CreatedDate></LineTax></AppliedTaxes><AppliedDiscounts /><ItemCondition>SellableAsNew</ItemCondition><ReturnReason>PoorQuality</ReturnReason></TransactionLine> 
</itemlist> 
<transactiontenders>1</transactiontenders> 
<transactiontenders>2</transactiontenders> 
<transactiontenders>4</transactiontenders> 
<transactiontype>1</transactiontype> 
<transdate>2017-06-13T09:52:54Z</transdate> 
<transtime>09:52</transtime> 
</Item> 
<Item> 
<total>343.59</total> 
<terminalid>1</terminalid> 
<subtotal>340.29</subtotal> 
<storeid>1000</storeid> 
<itemlist> 
<TransactionLine><LineNumber>1</LineNumber><Name>Moto G Turbo Edition Black</Name><ItemUPC>5479892348535</ItemUPC><Quantity>1</Quantity><SalePrice>330</SalePrice><IndividualPrice>330</IndividualPrice><CreatedDate>2017-06-13T09:53:00.8548823Z</CreatedDate><Status>0</Status><ShippingCost>0</ShippingCost><TotalTax>3.3</TotalTax><AppliedTaxes><LineTax><TaxId>0</TaxId><Amount>0</Amount><CreatedDate>0001-01-01T00:00:00</CreatedDate></LineTax></AppliedTaxes><AppliedDiscounts /><ItemCondition>SellableAsNew</ItemCondition><ReturnReason>PoorQuality</ReturnReason></TransactionLine> 
<TransactionLine><LineNumber>2</LineNumber><Name>This Was A Man</Name><ItemUPC>777221028297</ItemUPC><Quantity>1</Quantity><SalePrice>4.99</SalePrice><IndividualPrice>4.99</IndividualPrice><CreatedDate>2017-06-13T09:53:07.8263895Z</CreatedDate><Status>0</Status><ShippingCost>0</ShippingCost><TotalTax>0</TotalTax><AppliedTaxes /><AppliedDiscounts /><ItemCondition>SellableAsNew</ItemCondition><ReturnReason>PoorQuality</ReturnReason></TransactionLine> 
<TransactionLine><LineNumber>3</LineNumber><Name>A Prisoner of Birth</Name><ItemUPC>4000111222302</ItemUPC><Quantity>1</Quantity><SalePrice>5.3</SalePrice><IndividualPrice>5.3</IndividualPrice><CreatedDate>2017-06-13T09:53:11.124866Z</CreatedDate><Status>0</Status><ShippingCost>0</ShippingCost><TotalTax>0</TotalTax><AppliedTaxes /><AppliedDiscounts /><ItemCondition>SellableAsNew</ItemCondition><ReturnReason>PoorQuality</ReturnReason></TransactionLine> 
</itemlist> 
<transactiontenders>1</transactiontenders><transactiontenders>2</transactiontenders> 
<transactiontype>1</transactiontype> 
<transdate>2017-06-13T09:53:29Z</transdate> 
<transtime>09:53</transtime> 
</Item> 
</ArrayOfItem> 

I did something like this:

library(XML) 
y <- xmlToDataFrame('C:\\App\\06122017.XML') 
nrow(y) # total number of orders (one row per Item)
doc = xmlInternalTreeParse('C:\\App\\06122017.XML') 
transactionlineItems <- xpathSApply(doc, '//TransactionLine') # list 
transactionlineItems 

I tried to get the sum like this, but it does not work.

colSums(y[,c("total")]) # not working 

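transactionlineItems is a list of XML elements. From it I want to build a data frame, apply some logic (check whether a particular item is a sale or a return), and produce separate totals for sales and returns. I also want the quantity of each product, to see which product sells more. Right now I do this on the browser side by applying the logic to the same data in JSON format. I want to move it to the server side and have chosen R.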

Answer

0

If you really have your heart set on data frame conversion:

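You are on the right track. This answer combines your ideas of xmlToDataFrame and xpathSApply. Be careful to make sure that numeric values are not handled as characters or even factors.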
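As a small illustration of that last point, here is a base R sketch that does not touch the XML at all. The vector `total` is a stand-in I made up, holding the two `total` values from the question as a factor, which is how such a column may arrive when strings are converted to factors. It shows why `as.numeric()` alone can mislead, and why a single column is summed with `sum()` rather than `colSums()`, which expects a two-dimensional numeric object.

```r
# Stand-in for one text column of the converted data frame
# (hypothetical object; the values are the two <total> elements above)
total <- factor(c("333.3", "343.59"))

# On a factor, as.numeric() returns the internal level codes, not the amounts
codes <- as.numeric(total)

# Going through character first recovers the real numbers
amounts <- as.numeric(as.character(total))

# A single column is a plain vector, so sum() is the right tool here
grand.total <- sum(amounts)
print(grand.total)
```

With `stringsAsFactors = FALSE`, as used further down, the column is character instead, and `as.numeric()` can be applied directly.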

library(XML) 

order.xml.string <- '<?xml version="1.0" encoding="UTF-8"?> 
<ArrayOfItem> 
<Item> 
<total>333.3</total> 
<terminalid>1</terminalid> 
<subtotal>330</subtotal> 
<storeid>1000</storeid> 
<itemlist> 
<TransactionLine> 
<LineNumber>1</LineNumber> 
<Name>Moto G Turbo Edition Black</Name> 
<ItemUPC>5479892348535</ItemUPC> 
<Quantity>1</Quantity> 
<SalePrice>330</SalePrice> 
<IndividualPrice>330</IndividualPrice> 
<CreatedDate>2017-06-13T09:42:52.1411148Z</CreatedDate> 
<Status>0</Status> 
<ShippingCost>0</ShippingCost> 
<TotalTax>3.3</TotalTax> 
<AppliedTaxes> 
<LineTax> 
<TaxId>0</TaxId> 
<Amount>0</Amount> 
<CreatedDate>0001-01-01T00:00:00</CreatedDate> 
</LineTax> 
</AppliedTaxes> 
<AppliedDiscounts/> 
<ItemCondition>SellableAsNew</ItemCondition> 
<ReturnReason>PoorQuality</ReturnReason> 
</TransactionLine> 
</itemlist> 
<transactiontenders>1</transactiontenders> 
<transactiontenders>2</transactiontenders> 
<transactiontenders>4</transactiontenders> 
<transactiontype>1</transactiontype> 
<transdate>2017-06-13T09:52:54Z</transdate> 
<transtime>09:52</transtime> 
</Item> 
<Item> 
<total>343.59</total> 
<terminalid>1</terminalid> 
<subtotal>340.29</subtotal> 
<storeid>1000</storeid> 
<itemlist> 
<TransactionLine> 
<LineNumber>1</LineNumber> 
<Name>Moto G Turbo Edition Black</Name> 
<ItemUPC>5479892348535</ItemUPC> 
<Quantity>1</Quantity> 
<SalePrice>330</SalePrice> 
<IndividualPrice>330</IndividualPrice> 
<CreatedDate>2017-06-13T09:53:00.8548823Z</CreatedDate> 
<Status>0</Status> 
<ShippingCost>0</ShippingCost> 
<TotalTax>3.3</TotalTax> 
<AppliedTaxes> 
<LineTax> 
<TaxId>0</TaxId> 
<Amount>0</Amount> 
<CreatedDate>0001-01-01T00:00:00</CreatedDate> 
</LineTax> 
</AppliedTaxes> 
<AppliedDiscounts/> 
<ItemCondition>SellableAsNew</ItemCondition> 
<ReturnReason>PoorQuality</ReturnReason> 
</TransactionLine> 
<TransactionLine> 
<LineNumber>2</LineNumber> 
<Name>This Was A Man</Name> 
<ItemUPC>777221028297</ItemUPC> 
<Quantity>1</Quantity> 
<SalePrice>4.99</SalePrice> 
<IndividualPrice>4.99</IndividualPrice> 
<CreatedDate>2017-06-13T09:53:07.8263895Z</CreatedDate> 
<Status>0</Status> 
<ShippingCost>0</ShippingCost> 
<TotalTax>0</TotalTax> 
<AppliedTaxes/> 
<AppliedDiscounts/> 
<ItemCondition>SellableAsNew</ItemCondition> 
<ReturnReason>PoorQuality</ReturnReason> 
</TransactionLine> 
<TransactionLine> 
<LineNumber>3</LineNumber> 
<Name>A Prisoner of Birth</Name> 
<ItemUPC>4000111222302</ItemUPC> 
<Quantity>1</Quantity> 
<SalePrice>5.3</SalePrice> 
<IndividualPrice>5.3</IndividualPrice> 
<CreatedDate>2017-06-13T09:53:11.124866Z</CreatedDate> 
<Status>0</Status> 
<ShippingCost>0</ShippingCost> 
<TotalTax>0</TotalTax> 
<AppliedTaxes/> 
<AppliedDiscounts/> 
<ItemCondition>SellableAsNew</ItemCondition> 
<ReturnReason>PoorQuality</ReturnReason> 
</TransactionLine> 
</itemlist> 
<transactiontenders>1</transactiontenders> 
<transactiontenders>2</transactiontenders> 
<transactiontype>1</transactiontype> 
<transdate>2017-06-13T09:53:29Z</transdate> 
<transtime>09:53</transtime> 
</Item> 
</ArrayOfItem>' 

Then:

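# Parse the XML held in the string above
doc <- xmlParse(order.xml.string, asText = TRUE)

# One data frame row per TransactionLine; keep text as character, not factor
y <- xmlToDataFrame(nodes = getNodeSet(doc, "//TransactionLine"),
                    stringsAsFactors = FALSE)
nrow(y) # number of transaction lines (the orders are the Item nodes)

numeric.cols <- c("Quantity",
                  "SalePrice",
                  "IndividualPrice",
                  "ShippingCost",
                  "TotalTax")

# The columns arrive as text; convert them before doing arithmetic
y[, numeric.cols] <- lapply(y[, numeric.cols], as.numeric)

# Column sums over the lines that match both conditions
colSums(y[y$ItemCondition == "SellableAsNew" &
            y$ReturnReason == "PoorQuality", numeric.cols])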

       Quantity       SalePrice IndividualPrice    ShippingCost        TotalTax 
           4.00          670.29          670.29            0.00            6.60 
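Once the transaction lines are in a data frame, the separate sales and returns totals and the per-product quantities asked about in the question can be sketched in base R. To keep the example runnable on its own, the data frame `lines` below is typed in by hand from the four transaction lines in the sample; in practice `y` from above would take its place. The meaning of `Status` is not stated anywhere in the question, so treating 0 as a sale and anything else as a return is purely my assumption, and the names `Kind`, `totals.by.kind` and `qty.by.product` are made up for illustration.

```r
# Hand-typed copy of the relevant columns of the four sample lines
lines <- data.frame(
  Name = c("Moto G Turbo Edition Black", "Moto G Turbo Edition Black",
           "This Was A Man", "A Prisoner of Birth"),
  Quantity = c(1, 1, 1, 1),
  SalePrice = c(330, 330, 4.99, 5.3),
  Status = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# ASSUMPTION: Status 0 means sale, any other value means return
lines$Kind <- factor(ifelse(lines$Status == 0, "sale", "return"),
                     levels = c("sale", "return"))

# Amount per kind; a kind with no lines comes back as NA
totals.by.kind <- tapply(lines$SalePrice * lines$Quantity, lines$Kind, sum)

# Quantity per product, largest first
qty.by.product <- tapply(lines$Quantity, lines$Name, sum)
qty.by.product <- qty.by.product[order(qty.by.product, decreasing = TRUE)]

print(totals.by.kind)
print(qty.by.product)
```

Every line in the sample has Status 0, so under this assumption the return total is empty; the real status coding should be checked against the system that produces the XML.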

The xmlToList approach:

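I love data frames as much as anyone, but I do not often find xmlToDataFrame to be a good solution. I do not think this XML content really has a strictly rectangular shape. For example, even within the TransactionLine path, the tax and discount paths look nested rather than flat. Even if the current format suits data frame conversion, it may change in the future, and then you would need to parse data structures out of data frame cells. Consider xmlToList instead? Or even keep the data as XML and apply all of the logic in XPath expressions with the xmlApply functions.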

order.xml <- 
    xmlTreeParse(order.xml.string, 
       asText = TRUE, 
       useInternalNodes = TRUE) 
orders <- xmlRoot(order.xml) 
y <- xmlToList(orders) 

my.totals <- sapply(y, function(one.item) { 
    return(as.numeric(one.item$total)) 
}) 

total.total <- sum(my.totals) 
print(total.total) 

[1] 676.89 
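The other option mentioned above, leaving the data as XML and querying it with XPath, might look roughly like the following. This is a sketch using `xmlParse`, `xpathSApply` and `getNodeSet` from the XML package on a cut-down copy of the sample (only the `total`, `Name` and `Quantity` elements are kept so the block runs on its own). The variable names are my own, and the XPath expressions assume the element names stay as they are in the sample.

```r
library(XML)

# Cut-down copy of the sample document
orders.xml <- '<ArrayOfItem>
  <Item><total>333.3</total><itemlist>
    <TransactionLine><Name>Moto G Turbo Edition Black</Name><Quantity>1</Quantity></TransactionLine>
  </itemlist></Item>
  <Item><total>343.59</total><itemlist>
    <TransactionLine><Name>Moto G Turbo Edition Black</Name><Quantity>1</Quantity></TransactionLine>
    <TransactionLine><Name>This Was A Man</Name><Quantity>1</Quantity></TransactionLine>
    <TransactionLine><Name>A Prisoner of Birth</Name><Quantity>1</Quantity></TransactionLine>
  </itemlist></Item>
</ArrayOfItem>'

doc <- xmlParse(orders.xml, asText = TRUE)

# Order totals pulled straight from the tree
order.totals <- as.numeric(xpathSApply(doc, "//Item/total", xmlValue))
print(sum(order.totals))

# Number of transaction lines inside each order
lines.per.order <- sapply(getNodeSet(doc, "//Item"), function(item) {
  length(getNodeSet(item, ".//TransactionLine"))
})

# Quantity for one product, selected with an XPath predicate
moto.qty <- sum(as.numeric(xpathSApply(
  doc, "//TransactionLine[Name='Moto G Turbo Edition Black']/Quantity",
  xmlValue)))
```

The trade-off is that each question becomes its own XPath expression, but nothing has to be flattened first, so nested parts such as AppliedTaxes remain addressable.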
+0

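Thanks. This XML holds one day of online orders. On a shopping site there are many orders in a day. Each order is represented by an Item tag. Within an order the customer may have bought many items, each represented by a "TransactionLine". Every transaction line has a quantity, a status (sale or return), and a price. If I can convert the list of all transaction lines into a single data frame, the remaining steps become easier, such as finding which item is purchased most and how many sales or returns were completed. In the code above I see one element being pulled out for the sum. – user3327953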

+0

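Thanks, I will try both the data frame and the list approach. As of now, on the server side, I convert the XML to JSON. All of the logic is done in the browser with JavaScript, looping over each order to pull out the required properties. I am worried that the browser may crash if the response gets too large. My colleague asked me to try R. – user3327953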