2012-09-25 94 views

How do I convert this XML to CSV?

For the past four days I have been trying to convert this XML file to CSV with all of its fields included.

Part of the XML file:

<!-- language: lang-xml --> 

<ponudba podjetje="SO d.o.o." velja_od="23.09.2012 @ 12:30:48"> 
    <artikel koda="LS593EAR" naziv="HP ENVY 17-2199e" kategorija="Prenosniki" podkategorija="Hewlett Packard (HP)" v_akciji="ne" kosovnost="več"> 
    <opis> 
    HP ENVY 17-2199el, Intel Core i7-2630QM (2.0 GHz), 17.3'' FHD AG LED 3D, 8 GB DDR3 (2x 4 GB), 1 TB, BluRay, ATI Radeon HD6850 1024 MB, WiFi, Bluetooth, Webcam, 3D glasses, Microsoft Windows 7 Home Premium (64 bit) 
    </opis> 
    <opis_detail> 
    HP ENVY 17-2199el, Intel Core i7-2630QM (2.0 GHz), 17.3'' FHD AG LED 3D, 8 GB DDR3 (2x 4 GB), 1 TB, BluRay, ATI Radeon HD6850 1024 MB, WiFi, Bluetooth, Webcam, 3D glasses, Microsoft Windows 7 Home Premium (64 bit)<br/><table> <col width="25%" /> <col /> <tbody> <tr> <th>Procesor</th> <td>Intel® Core™ i7-2630QM/2.00 GHz/Quad-Core</td> </tr> <tr> <th>Delovni pomnilnik</th> <td>8 GB DDR3</td> </tr> <tr> <th>Trdi disk</th> <td>1 TB (1000 GB)/5400/SATA</td> </tr> <tr> <th>LCD zaslon</th> <td>43,9 cm (17,3'') Full HD HP Ultra BrightView Infinity Display (1920x1080)</td> </tr> <tr> <th>Grafična kartica</th> <td>AMD Radeon™ HD 6850 Graphics</td> </tr> <tr> <th>Optična enota</th> <td>SuperMulti DVD-RW Double Layer</td> </tr> <tr> <th>USB 2.0</th> <td>2x</td> </tr> <tr> <th>USB 3.0</th> <td>1x</td> </tr> <tr> <th>eSATA</th> <td>da</td> </tr> <tr> <th>HDMI</th> <td>da</td> </tr> <tr> <th>WiFi</th> <td>da</td> </tr> <tr> <th>Bluetooth</th> <td>da</td> </tr> <tr> <th>WWAN</th> <td>ne</td> </tr> <tr> <th>Spletna kamera</th> <td>da</td> </tr> <tr> <th>Card Reader</th> <td>da</td> </tr> <tr> <th>Express Card</th> <td>ne</td> </tr> <tr> <th>TV kartica</th> <td>ne</td> </tr> <tr> <th>Finger Print</th> <td>ne</td> </tr> <tr> <th>Vhodne naprave</th> <td>brez</td> </tr>  <tr> <th>Operacijski sistem</th> <td>Microsoft Windows 7 Home Premium (64 bit)</td> </tr> <tr> <th>Država uvoza</th> <td>Italijanska tipkovnica (priložene SLO nalepke)</td> </tr> <tr> <th>Stanje modela</th> <td>HP Renew</td> </tr>  </tbody> </table> 
    </opis_detail> 
    <garancija_v_mesecih>12</garancija_v_mesecih> 
    <cena_v_EUR>1.049,00</cena_v_EUR> 
    <proizvajalec>HP</proizvajalec> 
    <stanje>na zalogi</stanje> 
    <url_foto_artikla> 
    http://www.so-doo.si/media/catalog/product/cache/1/image/265x/9df78eab33525d08d6e5fb8d27136e95/c/0/c02034964.jpg.hri_4.jpg 
    </url_foto_artikla> 
    <vec_fotk_artikla> 
    <slika href="http://www.so-doo.si/media/catalog/product/c/0/c02034982.jpg.hri_4.jpg"/> 
    <slika href="http://www.so-doo.si/media/catalog/product/c/0/c02034991.jpg.hri_4.jpg"/> 
    </vec_fotk_artikla> 
    <teza_artikla_v_kg>2.9000</teza_artikla_v_kg> 
    </artikel> 

This is the CSV file I would like to have, with a header for every field in the XML and not just some of the data :(

<!-- language: lang-csv --> 

koda naziv kategorija podkategorija v_akciji kosovnost opis opis_detail garancija_v_mesecih cena_v_EUR proizvajalec stanje password url_foto_artikla vec_fotk_artikla 

To get all of the data, I tried this:

// The order here determines the order in the output CSV file 
$columns = array(
    'koda', 
    'naziv', 
    'kategorija', 
    'podkategorija', 
    'v_akciji', 
    'kosovnost' 
); 

// This will be used later on to correctly sort in the attribute values 
// Note: the third parameter of "array_fill" determines what value to use 
// in case a node lacks an attribute 
$csv_blueprint = array_combine(
    $columns, 
    array_fill(0, count($columns), '') 
); 

$data = array($columns); 
$filexml = 'so_feed.xml'; 

if (!file_exists($filexml)) { 
    // Do some error routine 
} else { 
    $xml = simplexml_load_file($filexml); 
    $artikel = $xml->artikel; 

    if (!count($artikel)) { 
     // Stop processing 'cause there's nothing to do 
    } else { 
     foreach ($artikel as $item) { 
      // Clone the row blueprint to leave the original unspoiled 
      $row = $csv_blueprint; 

And I also tried this:

$xml = simplexml_load_file($filexml); 
//$artikel = $xml->artikel; 
$ponudbas = $xml->ponudba; 
... 
    foreach ($ponudbas as $ponudba) { 
     // Clone the row blueprint to leave the original unspoiled 
     $row = $csv_blueprint; 

But in both cases not all of the data is parsed from the XML. I don't know what to do :(

Does it have to be PHP? – JMK

Answers

If the XML is exactly as you copied it, it is not a well-formed XML document: the closing </ponudba> tag is missing at the end.
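A quick way to confirm whether the feed parses at all is to ask libxml for its error list. The snippet below is only a sketch: the `$broken` string is a made-up, shortened stand-in for the real feed, with the closing tag left out on purpose.

```php
<?php
// Hypothetical, shortened feed: the closing </ponudba> tag is missing on purpose.
$broken = '<ponudba podjetje="SO d.o.o."><artikel koda="LS593EAR"></artikel>';

// Collect libxml errors instead of letting them surface as PHP warnings.
libxml_use_internal_errors(true);
$xml = simplexml_load_string($broken);

$messages = array();
if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        // Each LibXMLError object carries the line number and a message.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
}

echo implode(PHP_EOL, $messages), PHP_EOL;
```

With a file on disk, `simplexml_load_file()` can be checked in the same way.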

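Another thing to consider is the data inside the XML elements. In your case two elements contain the inch mark '' (as in 17''). Quote characters are legal in element text, so a conforming parser should accept them, but they do need escaping inside an attribute value delimited by the same quote character. If you want to keep them and stay on the safe side, wrap the data in a CDATA block.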

Edit: I just noticed that your XML contains HTML elements inside XML elements. I would encourage you to use CDATA blocks for that kind of element.

If it is easier, you can simply convert the XML to JSON and decode it straight into a PHP array:

$json = json_encode($xml); 
$data = json_decode($json, TRUE); 
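To make the JSON round trip concrete, here is a small sketch. The inline feed is a hypothetical, trimmed-down version of the one in the question. As far as I know, attributes end up under an `@attributes` key, and repeated `<artikel>` elements become a numerically indexed list (a feed with a single `<artikel>` would give an associative array there instead, so check for that case).

```php
<?php
// Hypothetical two-item feed shaped like the one in the question.
$xml = simplexml_load_string(
    '<ponudba podjetje="SO d.o.o.">'
    . '<artikel koda="LS593EAR" naziv="HP ENVY 17-2199e">'
    . '<proizvajalec>HP</proizvajalec>'
    . '</artikel>'
    . '<artikel koda="XX000000" naziv="Second item">'
    . '<proizvajalec>HP</proizvajalec>'
    . '</artikel>'
    . '</ponudba>'
);

// Encode the SimpleXMLElement as JSON, then decode it into nested arrays.
$data = json_decode(json_encode($xml), true);

foreach ($data['artikel'] as $artikel) {
    // Attributes live under '@attributes', child elements under their tag name.
    echo $artikel['@attributes']['koda'], ' / ', $artikel['proizvajalec'], PHP_EOL;
}
```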

If you want to write the result back out to a CSV file, you should consider using fputcsv (http://php.net/manual/fr/function.fputcsv.php).
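Putting the pieces together, a conversion could look roughly like the sketch below. It is not a drop-in solution: the inline feed, the example.com photo URLs, the shortened column lists and the `|` separator for photo links are all assumptions made for illustration. Note that `$xml` already is the root `<ponudba>` element, so the items are reached with `$xml->artikel`; `$xml->ponudba` matches nothing.

```php
<?php
// Hypothetical, shortened feed with the same shape as the question's XML.
$feed = <<<XML
<ponudba podjetje="SO d.o.o.">
  <artikel koda="LS593EAR" naziv="HP ENVY 17-2199e" kategorija="Prenosniki">
    <opis> HP ENVY 17-2199el </opis>
    <cena_v_EUR>1.049,00</cena_v_EUR>
    <vec_fotk_artikla>
      <slika href="http://example.com/a.jpg"/>
      <slika href="http://example.com/b.jpg"/>
    </vec_fotk_artikla>
  </artikel>
</ponudba>
XML;

// Columns read from attributes and from child elements (shortened lists).
$attributeColumns = array('koda', 'naziv', 'kategorija');
$elementColumns   = array('opis', 'cena_v_EUR');
$header = array_merge($attributeColumns, $elementColumns, array('vec_fotk_artikla'));

$xml = simplexml_load_string($feed);

// In-memory stream for the demo; a real script would fopen() a file path.
$out = fopen('php://memory', 'w+');
fputcsv($out, $header, ',', '"', '\\');

foreach ($xml->artikel as $item) {
    $row = array();
    foreach ($attributeColumns as $name) {
        $row[] = (string) $item[$name];        // attribute value, '' if absent
    }
    foreach ($elementColumns as $name) {
        $row[] = trim((string) $item->$name);  // element text, '' if absent
    }
    // Collect every photo link into a single cell.
    $photos = array();
    foreach ($item->xpath('vec_fotk_artikla/slika') as $slika) {
        $photos[] = (string) $slika['href'];
    }
    $row[] = implode('|', $photos);
    fputcsv($out, $row, ',', '"', '\\');
}

rewind($out);
$csv = stream_get_contents($out);
fclose($out);
echo $csv;
```

Extending `$attributeColumns` and `$elementColumns` with the remaining field names should give the full header from the question.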

Edit 2: try a simple test, using:

$file='file.xml'; 
$xml = simplexml_load_file($file); 

foreach ($xml->artikel as $art) 
{  
    echo $art->opis_detail; 
} 

This only outputs:

HP ENVY 17-2199el, Intel Core i7-2630QM (2.0 GHz), 17.3'' FHD AG LED 3D, 8 GB DDR3 (2x 4 GB), 1 TB, BluRay, ATI Radeon HD6850 1024 MB, WiFi, Bluetooth, Webcam, 3D glasses, Microsoft Windows 7 Home Premium (64 bit) 

Now, if you wrap the content of that node in a CDATA section in your XML:

<opis_detail><![CDATA[HP ENVY 17-2199el, Intel Core i7-2630QM (2.0 GHz), 17.3'' FHD AG LED 3D, 8 GB DDR3 (2x 4 GB), 1 TB, BluRay, ATI Radeon HD6850 1024 MB, WiFi, Bluetooth, Webcam, 3D glasses, Microsoft Windows 7 Home Premium (64 bit)<br/><table> <col width="25%" /> <col /> <tbody> <tr> <th>Procesor</th> <td>Intel® Core™ i7-2630QM/2.00 GHz/Quad-Core</td> </tr> <tr> <th>Delovni pomnilnik</th> <td>8 GB DDR3</td> </tr> <tr> <th>Trdi disk</th> <td>1 TB (1000 GB)/5400/SATA</td> </tr> <tr> <th>LCD zaslon</th> <td>43,9 cm (17,3'') Full HD HP Ultra BrightView Infinity Display (1920x1080)</td> </tr> <tr> <th>Grafična kartica</th> <td>AMD Radeon™ HD 6850 Graphics</td> </tr> <tr> <th>Optična enota</th> <td>SuperMulti DVD-RW Double Layer</td> </tr> <tr> <th>USB 2.0</th> <td>2x</td> </tr> <tr> <th>USB 3.0</th> <td>1x</td> </tr> <tr> <th>eSATA</th> <td>da</td> </tr> <tr> <th>HDMI</th> <td>da</td> </tr> <tr> <th>WiFi</th> <td>da</td> </tr> <tr> <th>Bluetooth</th> <td>da</td> </tr> <tr> <th>WWAN</th> <td>ne</td> </tr> <tr> <th>Spletna kamera</th> <td>da</td> </tr> <tr> <th>Card Reader</th> <td>da</td> </tr> <tr> <th>Express Card</th> <td>ne</td> </tr> <tr> <th>TV kartica</th> <td>ne</td> </tr> <tr> <th>Finger Print</th> <td>ne</td> </tr> <tr> <th>Vhodne naprave</th> <td>brez</td> </tr>  <tr> <th>Operacijski sistem</th> <td>Microsoft Windows 7 Home Premium (64 bit)</td> </tr> <tr> <th>Država uvoza</th> <td>Italijanska tipkovnica (priložene SLO nalepke)</td> </tr> <tr> <th>Stanje modela</th> <td>HP Renew</td> </tr>  </tbody> </table>]]> 
    </opis_detail> 

It will now output:

HP ENVY 17-2199el, Intel Core i7-2630QM (2.0 GHz), 17.3'' FHD AG LED 3D, 8 GB DDR3 (2x 4 GB), 1 TB, BluRay, ATI Radeon HD6850 1024 MB, WiFi, Bluetooth, Webcam, 3D glasses, Microsoft Windows 7 Home Premium (64 bit) 
Procesor Intel® Core™ i7-2630QM/2.00 GHz/Quad-Core 
Delovni pomnilnik 8 GB DDR3 
Trdi disk 1 TB (1000 GB)/5400/SATA 
LCD zaslon 43,9 cm (17,3'') Full HD HP Ultra BrightView Infinity Display (1920x1080) 
Grafična kartica AMD Radeon™ HD 6850 Graphics 
Optična enota SuperMulti DVD-RW Double Layer 
USB 2.0 2x 
USB 3.0 1x 
eSATA da 
HDMI da 
WiFi da 
Bluetooth da 
WWAN ne 
Spletna kamera da 
Card Reader da 
Express Card ne 
TV kartica ne 
Finger Print ne 
Vhodne naprave brez 
Operacijski sistem Microsoft Windows 7 Home Premium (64 bit) 
Država uvoza Italijanska tipkovnica (priložene SLO nalepke) 
Stanje modela HP Renew 

I think this is the data that was missing, isn't it?
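The difference comes from how SimpleXML casts an element to a string: it returns only the element's own text, not the text of nested child elements. The sketch below shows that on a hypothetical, heavily shortened `<opis_detail>`, along with one way to get the full text when the feed cannot be edited to add CDATA (serialize with `asXML()`, then strip the markup).

```php
<?php
// Hypothetical, shortened <opis_detail>, once with plain markup and once with CDATA.
$markup = 'HP ENVY<br/><table><tr><th>WiFi</th><td>da</td></tr></table>';
$plain  = simplexml_load_string('<artikel><opis_detail>' . $markup . '</opis_detail></artikel>');
$cdata  = simplexml_load_string('<artikel><opis_detail><![CDATA[' . $markup . ']]></opis_detail></artikel>');

// Casting to string returns only the element's own text nodes.
$plainText = (string) $plain->opis_detail;  // nested <table> content is skipped
$cdataText = (string) $cdata->opis_detail;  // the whole CDATA payload, tags included

// Without touching the feed: serialize the element, then drop the markup.
$serialized = $plain->opis_detail->asXML();
$spaced     = str_replace('<', ' <', $serialized);  // keep cell texts apart
$fullText   = trim(preg_replace('/\s+/', ' ', strip_tags($spaced)));

echo $plainText, PHP_EOL;
echo $cdataText, PHP_EOL;
echo $fullText, PHP_EOL;
```

Whether to keep the HTML in the CSV cell (the CDATA variant) or flatten it to text (the `strip_tags()` variant) depends on what will read the CSV afterwards.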

I forgot to include it, my mistake, sorry. Could you give me a JSON example, just so I can see what you mean? Sorry, I'm new to PHP – Eager2Learn