2008-08-09 76 views
2

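Before reading anything else, please take the time to read the original thread, "How to encode an XML file as xfdl (base64-gzip)?"

Overview: an .xfdl file is a gzipped .xml file that has then been encoded in base64. I want to decode the .xfdl into XML, modify it, and then re-encode it back into an .xfdl file.

xfdl > xml.gz > xml > xml.gz > xfdl

I have been able to take an .xfdl file and decode it from base64 using uudeview: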

uudeview -i yourform.xfdl 

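Then decompress it using gunzip:

gunzip -S "" < UNKNOWN.001 > yourform-unpacked.xml

The resulting XML is 100% readable and looks wonderful once decompressed. Without modifying the XML at all, I should be able to recompress it using gzip: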

gzip yourform-unpacked.xml 

Then re-encode it in base64:

base64 -e yourform-unpacked.xml.gz yourform_reencoded.xfdl 

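If my thinking is correct, the original file and the re-encoded file should be equal. However, when I compare yourform.xfdl and yourform_reencoded.xfdl side by side, they do not match. Also, the original file can be viewed in an .xfdl viewer (http://www.grants.gov/help/download_software.jsp#pureedge). The viewer says the re-encoded xfdl is unreadable.

I have also tried uuenview to re-encode in base64, and it produces the same result. Any help would be appreciated.

Answers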

0

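Different implementations of the gzip algorithm will always produce slightly different but still correct files. The compression level used for the original file may also differ from the one you are running with.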
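To make the point above concrete, here is a minimal Python sketch using the standard zlib module. The sample XML string is made up for illustration and is not real XFDL content; the helper name gzip_at_level is also just an assumption for this example.

```python
import zlib

def gzip_at_level(data: bytes, level: int) -> bytes:
    # wbits=31 asks zlib for a gzip container (header and trailer)
    # instead of a bare zlib stream.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    return compressor.compress(data) + compressor.flush()

# Hypothetical stand-in for a decoded form.
sample = b"<page><field sid='NAME'><value>example</value></field></page>" * 200

fast = gzip_at_level(sample, 1)
best = gzip_at_level(sample, 9)

# The two files are not byte-identical...
print(len(fast), len(best), fast == best)
# ...but both decompress to exactly the same data.
print(zlib.decompress(fast, 31) == zlib.decompress(best, 31) == sample)
```

Both outputs are valid gzip files for the same input, which is why a byte-for-byte comparison of the compressed forms says little about whether the round trip worked.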

2

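As far as I know, you cannot find out the compression level of an already compressed file. When you compress a file you can specify the compression level with -#, where # ranges from 1 to 9 (1 is the fastest compression, 9 produces the most compressed file). In practice you should never compare a compressed file with one that has been extracted and recompressed; slight variations can easily crop up. In your case I would compare the base64-encoded versions instead of the gzip'd versions.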
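One way to get a comparison that is not thrown off by compressor differences is to compare what the two files decompress to. The following Python sketch assumes the layout described elsewhere in this thread (one header line, then base64 text); the function names are hypothetical.

```python
import base64
import gzip

def xfdl_payload(xfdl_bytes: bytes) -> bytes:
    """Return the decompressed XML carried by a base64-gzip .xfdl file."""
    # Assumption: the first line is the MIME-style header and
    # everything after it is base64 text.
    _header, _, body = xfdl_bytes.partition(b"\n")
    # b64decode ignores the line breaks inside the base64 block by default.
    return gzip.decompress(base64.b64decode(body))

def same_form(a: bytes, b: bytes) -> bool:
    # Two files are treated as equal when their XML payloads are identical,
    # regardless of how each one was compressed.
    return xfdl_payload(a) == xfdl_payload(b)
```

Typical use would be same_form(open("yourform.xfdl", "rb").read(), open("yourform_reencoded.xfdl", "rb").read()).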

0

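Interesting, I'll give it a shot. The variations, however, are not slight. The newly encoded file is longer, and when comparing the before and after binary files, the data hardly matches at all.

Before (first three lines):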

H4sIAAAAAAAAC+19eZOiyNb3/34K3r4RT/WEU40ssvTtrhuIuKK44Bo3YoJdFAFZ3D79C6hVVhUq 
dsnUVN/qmIkSOLlwlt/JPCfJ/PGf9dwAlorj6pb58wv0LfcFUEzJknVT+/ml2uXuCSJP3kNf/vOQ 
+TEsFVkgoDfdn18mnmd/B8HVavWt5TsKI2vKN8magyENiH3Lf9kRfpd817PmF+jpiOhQRFZcXTMV 

After (first three lines):

H4sICJ/YnEgAAzEyNDQ2LTExNjk2NzUueGZkbC54bWwA7D1pU+JK19/9FV2+H5wpByEhJMRH 
uRUgCMom4DBYt2oqkAZyDQlmQZ1f/3YSNqGzKT3oDH6RdE4vOXuf08vFP88TFcygYSq6dnlM 
naWOAdQGuqxoo8vjSruRyGYzfII6/id3dPGjVKwCBK+Zl8djy5qeJ5NPT09nTduAojyCZwN9 

As you can see, the H4sI matches up; after that, it's pandemonium.
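The start of each sample above can be decoded far enough to see where they diverge. This Python sketch reads the fixed 10-byte gzip header defined in RFC 1952 from the first few base64 characters of each sample; the function name is hypothetical, and it assumes the FEXTRA flag is not set (which holds for both samples).

```python
import base64
import struct

def gzip_header_fields(b64_prefix: str) -> dict:
    raw = base64.b64decode(b64_prefix)
    # Fixed header: magic (2 bytes), method, flags, mtime (4 bytes), XFL, OS.
    magic, method, flags, mtime, xfl, os_code = struct.unpack("<2sBBIBB", raw[:10])
    name = None
    # FNAME (0x08): a NUL-terminated file name follows the fixed header.
    # Assumption: FEXTRA (0x04) is not set, so the name starts at offset 10.
    if flags & 0x08 and not flags & 0x04:
        end = raw.index(b"\x00", 10)
        name = raw[10:end].decode("latin-1")
    return {"magic": magic, "flags": flags, "mtime": mtime, "os": os_code, "name": name}

# Prefixes copied from the first line of each sample above.
before = gzip_header_fields("H4sIAAAAAAAAC+19")
after = gzip_header_fields("H4sICJ/YnEgAAzEyNDQ2LTExNjk2NzUueGZkbC54bWwA")
print(before)
print(after)
```

Decoded this way, the original has no flags set and a zero timestamp, while the re-encoded file has the FNAME flag set, a non-zero timestamp, and an embedded file name. That alone is enough to make the base64 text diverge right after H4sI.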

+0

But unless you use exactly the same gzip implementation, H4sI is the only part you can count on being the same. "Pandemonium" is normal :-) – 2011-03-29 08:04:11

1

You need to put the following line at the beginning of the XFDL file:

application/vnd.xfdl; content-encoding="base64-gzip"

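After you have generated the base64-encoded file, open it in a text editor and paste the line above as the first line. Make sure the base64 block starts at the beginning of the second line.

Save it and try it in the Viewer! If it still does not work, the changes made to the XML may have made it non-compliant in some way. In that case, after modifying the XML but before gzipping and base64-encoding it, save it with an .xfdl file extension and try opening it with the Viewer tool. The Viewer should be able to parse and display the uncompressed, unencoded file if it is in valid XFDL format.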
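The manual steps above (gzip, base64, header on the first line) can be sketched in Python as follows. This is only an illustration: the function name and the sample XML are hypothetical, gzip.compress with mtime requires Python 3.8 or later, and whether the Viewer cares about the 76-character line wrapping is an assumption based on how the original file looks.

```python
import base64
import gzip

XFDL_HEADER = b'application/vnd.xfdl; content-encoding="base64-gzip"\n'

def encode_xfdl(xml_bytes: bytes) -> bytes:
    # mtime=0 keeps the output reproducible; gzip.compress stores no file name.
    compressed = gzip.compress(xml_bytes, mtime=0)
    # encodebytes wraps the base64 text at 76 characters per line,
    # so the base64 block starts on the second line of the result.
    return XFDL_HEADER + base64.encodebytes(compressed)

# Hypothetical form content, for illustration only.
xml = b"<XFDL>hypothetical form content</XFDL>"
xfdl = encode_xfdl(xml)
print(xfdl.decode("ascii"))
```

Writing the result with open("yourform_reencoded.xfdl", "wb") would give a file with the header on line one and the base64 block starting on line two.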

0

gzip puts the filename in the header of the file, so the length of a gzipped file varies with the filename of the uncompressed file.

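If gzip acts on a stream, the filename is omitted and the file is a bit shorter, so the following should work:

gzip < yourform-unpacked.xml > yourform-unpacked.xml.gz

Then re-encode in base64:

base64 -e yourform-unpacked.xml.gz yourform_reencoded.xfdl

Maybe this will produce a file of the same length.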
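The effect of the stored filename can be reproduced with Python's gzip module. In this sketch the helper name and the XML content are hypothetical; passing an empty stored name imitates gzip reading from a stream.

```python
import gzip
import io

def gzip_bytes(data: bytes, stored_name: str) -> bytes:
    buf = io.BytesIO()
    # An empty stored_name means no FNAME field is written to the header.
    with gzip.GzipFile(filename=stored_name, mode="wb", fileobj=buf, mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()

xml = b"<form>hypothetical content</form>"
name = "yourform-unpacked.xml"

named = gzip_bytes(xml, name)
streamed = gzip_bytes(xml, "")

# The difference is the stored name plus its terminating NUL byte.
print(len(named) - len(streamed), len(name) + 1)
```

Both variants decompress to the same XML; only the header differs.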

1

Check these out:

http://www.ourada.org/blog/archives/375

http://www.ourada.org/blog/archives/390

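They're in Python, not Ruby, but they should get you fairly close.

The algorithm there is actually for files with the header 'application/x-xfdl; content-encoding="asc-gzip"' rather than 'application/vnd.xfdl; content-encoding="base64-gzip"'. The good news is that PureEdge (aka IBM Lotus Forms) will open that format with no problem.

To top it off, here's a base64-gzip decode (in Python) so you can make the full round trip: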

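import base64 
import zlib 

# filename is the path to the .xfdl file 
with open(filename, 'r') as f: 
    header = f.readline() 
    if header == 'application/vnd.xfdl; content-encoding="base64-gzip"\n': 
        decoded = b'' 
        # decode line by line; assumes each line holds whole 4-character base64 groups 
        for line in f: 
            decoded += base64.b64decode(line.encode("ISO-8859-1")) 
        # MAX_WBITS + 16 makes zlib expect a gzip header and trailer 
        xml = zlib.decompress(decoded, zlib.MAX_WBITS + 16) 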
+0

(This isn't my blog, by the way.) And credit for the MAX_WBITS magic goes to: http://stackoverflow.com/questions/1838699/how-can-i-decompress-a-gzip-stream-with-zlib – CrazyPyro 2011-02-16 21:49:59
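For anyone wondering what the MAX_WBITS + 16 argument changes, here is a small Python sketch. The payload is a made-up stand-in rather than a real form.

```python
import gzip
import zlib

# Hypothetical stand-in for the base64-decoded body of an .xfdl file.
payload = gzip.compress(b"<xfdl>hypothetical</xfdl>")

# MAX_WBITS alone makes zlib expect a zlib header, so gzip data is rejected.
try:
    zlib.decompress(payload, zlib.MAX_WBITS)
    accepted_as_zlib = True
except zlib.error:
    accepted_as_zlib = False

# Adding 16 tells zlib to expect a gzip header and trailer instead.
xml = zlib.decompress(payload, zlib.MAX_WBITS + 16)
print(accepted_as_zlib, xml)
```

On Python 3, gzip.decompress(payload) is an equivalent and arguably more readable way to get the same bytes.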

1

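I did this in Java with the help of the Base64 class from http://iharder.net/base64

I've been working on an application to do form manipulation in Java. I decode the file, create a DOM document from the XML, then write it back to a file.

My Java code to read the file looks like this: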

public XFDLDocument(String inputFile) 
     throws IOException, 
      ParserConfigurationException, 
      SAXException 

{ 
    fileLocation = inputFile; 

    try{ 

     //create file object 
     File f = new File(inputFile); 
     if(!f.exists()) { 
      throw new IOException("Specified File could not be found!"); 
     } 

     //open file stream from file 
     FileInputStream fis = new FileInputStream(inputFile); 

     //Skip past the MIME header 
     fis.skip(FILE_HEADER_BLOCK.length()); 

     //Decode from base 64     
     Base64.InputStream bis = new Base64.InputStream(fis, 
       Base64.DECODE); 

     //UnZIP the resulting stream 
     GZIPInputStream gis = new GZIPInputStream(bis); 

     DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); 
     DocumentBuilder db = dbf.newDocumentBuilder(); 
     doc = db.parse(gis); 

     gis.close(); 
     bis.close(); 
     fis.close(); 

    } 
    catch (ParserConfigurationException pce) { 
     throw new ParserConfigurationException("Error parsing XFDL from file."); 
    } 
    catch (SAXException saxe) { 
     throw new SAXException("Error parsing XFDL into XML Document.", saxe); 
    } 
} 

My Java code to write the file to disk looks like this:

/** 
    * Saves the current document to the specified location 
    * @param destination Desired destination for the file. 
    * @param asXML True if the output should be un-encoded XML rather than Base64/GZIP 
    * @throws IOException File cannot be created at specified location 
    * @throws TransformerConfigurationException 
    * @throws TransformerException 
    */ 
    public void saveFile(String destination, boolean asXML) 
     throws IOException, 
      TransformerConfigurationException, 
      TransformerException 
     { 

     BufferedWriter bf = new BufferedWriter(new FileWriter(destination)); 
     bf.write(FILE_HEADER_BLOCK); 
     bf.newLine(); 
     bf.flush(); 
     bf.close(); 

     OutputStream outStream; 
     if(!asXML) { 
      outStream = new GZIPOutputStream(
       new Base64.OutputStream(
         new FileOutputStream(destination, true))); 
     } else { 
      outStream = new FileOutputStream(destination, true); 
     } 

     Transformer t = TransformerFactory.newInstance().newTransformer(); 
     t.transform(new DOMSource(doc), new StreamResult(outStream)); 

     outStream.flush(); 
     outStream.close();  
    } 

Hope that helps.
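For comparison, the DOM step in the middle (parse the decoded XML, change something, serialize it again) might look like this in Python with the standard xml.dom.minidom module. The element names and the sample document are hypothetical and only loosely modeled on a form; a real XFDL document will have its own structure.

```python
import xml.dom.minidom

def set_first_value(xml_bytes: bytes, new_text: str) -> bytes:
    doc = xml.dom.minidom.parseString(xml_bytes)
    # Assumption: the document contains at least one <value> element.
    value = doc.getElementsByTagName("value")[0]
    # Remove whatever the element held before and insert the new text.
    while value.firstChild:
        value.removeChild(value.firstChild)
    value.appendChild(doc.createTextNode(new_text))
    # Serialize back to bytes, ready to be gzipped and base64-encoded.
    return doc.toxml(encoding="UTF-8")

# Hypothetical decoded form content.
sample = b"<XFDL><field sid='NAME'><value>old</value></field></XFDL>"
print(set_first_value(sample, "new").decode("utf-8"))
```

The returned bytes are what would then go through gzip and base64 on the way back to an .xfdl file.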

1

I've been working on something like this, and it should work for PHP. You must have a writable tmp folder, and the PHP file must be named example.php!

<?php 
    // Note: PHP 5.4+ ships a built-in gzdecode(); rename this function on those versions. 
    function gzdecode($data) { 
     $len = strlen($data); 
     if ($len < 18 || strcmp(substr($data,0,2),"\x1f\x8b")) { 
      echo "FILE NOT GZIP FORMAT"; 
      return null; // Not GZIP format (See RFC 1952) 
     } 
     $method = ord(substr($data,2,1)); // Compression method 
     $flags = ord(substr($data,3,1)); // Flags 
     if (($flags & 31) != $flags) { 
      // Reserved bits are set -- NOT ALLOWED by RFC 1952 
      echo "RESERVED BITS ARE SET. VERY BAD"; 
      return null; 
     } 
     // NOTE: $mtime may be negative (PHP integer limitations) 
     $mtime = unpack("V", substr($data,4,4)); 
     $mtime = $mtime[1]; 
     $xfl = substr($data,8,1); 
     $os = substr($data,9,1); 
     $headerlen = 10; 
     $extralen = 0; 
     $extra  = ""; 
     if ($flags & 4) { 
      // 2-byte length prefixed EXTRA data in header 
      if ($len - $headerlen - 2 < 8) { 
       echo "INVALID FORMAT"; 
       return false; // Invalid format 
      } 
      // XLEN follows directly after the 10-byte fixed header 
      $extralen = unpack("v",substr($data,$headerlen,2)); 
      $extralen = $extralen[1]; 
      if ($len - $headerlen - 2 - $extralen < 8) { 
       echo "INVALID FORMAT"; 
       return false; // Invalid format 
      } 
      $extra = substr($data,$headerlen+2,$extralen); 
      $headerlen += 2 + $extralen; 
     } 

     $filenamelen = 0; 
     $filename = ""; 
     if ($flags & 8) { 
      // C-style string file NAME data in header 
      if ($len - $headerlen - 1 < 8) { 
       echo "INVALID FORMAT"; 
       return false; // Invalid format 
      } 
      // The NUL-terminated name starts right after the header parsed so far 
      $filenamelen = strpos(substr($data,$headerlen),chr(0)); 
      if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) { 
       echo "INVALID FORMAT"; 
       return false; // Invalid format 
      } 
      $filename = substr($data,$headerlen,$filenamelen); 
      $headerlen += $filenamelen + 1; 
     } 

     $commentlen = 0; 
     $comment = ""; 
     if ($flags & 16) { 
      // C-style string COMMENT data in header 
      if ($len - $headerlen - 1 < 8) { 
       echo "INVALID FORMAT"; 
       return false; // Invalid format 
      } 
      // The NUL-terminated comment starts right after the header parsed so far 
      $commentlen = strpos(substr($data,$headerlen),chr(0)); 
      if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) { 
       echo "INVALID FORMAT"; 
       return false; // Invalid header format 
      } 
      $comment = substr($data,$headerlen,$commentlen); 
      $headerlen += $commentlen + 1; 
     } 

     $headercrc = ""; 
     if ($flags & 1) { 
      // 2-bytes (lowest order) of CRC32 on header present 
      if ($len - $headerlen - 2 < 8) { 
       echo "INVALID FORMAT"; 
       return false; // Invalid format 
      } 
      $calccrc = crc32(substr($data,0,$headerlen)) & 0xffff; 
      $headercrc = unpack("v", substr($data,$headerlen,2)); 
      $headercrc = $headercrc[1]; 
      if ($headercrc != $calccrc) { 
       echo "BAD CRC"; 
       return false; // Bad header CRC 
      } 
      $headerlen += 2; 
     } 

     // GZIP FOOTER - These may be negative due to PHP's integer limitations 
     $datacrc = unpack("V",substr($data,-8,4)); 
     $datacrc = $datacrc[1]; 
     $isize = unpack("V",substr($data,-4)); 
     $isize = $isize[1]; 

     // Perform the decompression: 
     $bodylen = $len-$headerlen-8; 
     if ($bodylen < 1) { 
      // This should never happen - IMPLEMENTATION BUG! 
      echo "BIG OOPS"; 
      return null; 
     } 
     $body = substr($data,$headerlen,$bodylen); 
     $data = ""; 
     if ($bodylen > 0) { 
      switch ($method) { 
       case 8: 
        // Currently the only supported compression method: 
        $data = gzinflate($body); 
        break; 
       default: 
        // Unknown compression method 
        echo "UNKNOWN COMPRESSION METHOD"; 
        return false; 
      } 
     } else { 
      // I'm not sure if zero-byte body content is allowed. 
      // Allow it for now... Do nothing... 
      echo "ITS EMPTY"; 
     } 

     // Verify decompressed size and CRC32: 
     // NOTE: This may fail with large data sizes depending on how 
     //  PHP's integer limitations affect strlen() since $isize 
     //  may be negative for large sizes. 
     if ($isize != strlen($data) || crc32($data) != $datacrc) { 
      // Bad format! Length or CRC doesn't match! 
      echo "LENGTH OR CRC DO NOT MATCH"; 
      return false; 
     } 
     return $data; 
    } 
    echo "<html><head></head><body>"; 
    if (empty($_REQUEST['upload'])) { 
     echo <<<_END 
    <form enctype="multipart/form-data" action="example.php" method="POST"> 
    <input type="hidden" name="MAX_FILE_SIZE" value="100000" /> 
    <table> 
    <th> 
    <input name="uploadedfile" type="file" /> 
    </th> 
    <tr> 
    <td><input type="submit" name="upload" value="Convert File" /></td> 
    </tr> 
    </table> 
    </form> 
    _END; 

    } 
    if (!empty($_REQUEST['upload'])) { 
     $file   = "tmp/" . $_FILES['uploadedfile']['name']; 
     $orgfile  = $_FILES['uploadedfile']['name']; 
     $name   = str_replace(".xfdl", "", $orgfile); 
     $convertedfile = "tmp/" . $name . ".xml"; 
     $compressedfile = "tmp/" . $name . ".gz"; 
     $finalfile  = "tmp/" . $name . "new.xfdl"; 
     $target_path = "tmp/"; 
     $target_path = $target_path . basename($_FILES['uploadedfile']['name']); 
     if (move_uploaded_file($_FILES['uploadedfile']['tmp_name'], $target_path)) { 
     } else { 
      echo "There was an error uploading the file, please try again!"; 
     } 
     $firstline  = "application/vnd.xfdl; content-encoding=\"base64-gzip\"\n"; 
     $data   = file($file); 
     $data   = array_slice($data, 1); 
     $raw   = implode($data); 
     $decoded  = base64_decode($raw); 
     $decompressed = gzdecode($decoded); 
     $compressed  = gzencode($decompressed); 
     $encoded  = base64_encode($compressed); 
     $decoded2  = base64_decode($encoded); 
     $decompressed2 = gzdecode($decoded2); 
     $header   = bin2hex(substr($decoded, 0, 10)); 
     $tail   = bin2hex(substr($decoded, -8)); 
     $header2  = bin2hex(substr($compressed, 0, 10)); 
     $tail2   = bin2hex(substr($compressed, -8)); 
     $header3  = bin2hex(substr($decoded2, 0, 10)); 
     $tail3   = bin2hex(substr($decoded2, -8)); 
     $filehandle  = fopen($compressedfile, 'w'); 
     fwrite($filehandle, $decoded); 
     fclose($filehandle); 
     $filehandle  = fopen($convertedfile, 'w'); 
     fwrite($filehandle, $decompressed); 
     fclose($filehandle); 
     $filehandle  = fopen($finalfile, 'w'); 
     fwrite($filehandle, $firstline); 
     fwrite($filehandle, $encoded); 
     fclose($filehandle); 
     echo "<center>"; 
     echo "<table style='text-align:center' >"; 
     echo "<tr><th>Stage 1</th>"; 
     echo "<th>Stage 2</th>"; 
     echo "<th>Stage 3</th></tr>"; 
     echo "<tr><td>RAW DATA -></td><td>DECODED DATA -></td><td>UNCOMPRESSED DATA -></td></tr>"; 
     echo "<tr><td>LENGTH: ".strlen($raw)."</td>"; 
     echo "<td>LENGTH: ".strlen($decoded)."</td>"; 
     echo "<td>LENGTH: ".strlen($decompressed)."</td></tr>"; 
     echo "<tr><td><a href='tmp/".$orgfile."'/>ORIGINAL</a></td><td>GZIP HEADER:".$header."</td><td><a href='".$convertedfile."'/>XML CONVERTED</a></td></tr>"; 
     echo "<tr><td></td><td>GZIP TAIL:".$tail."</td><td></td></tr>"; 
     echo "<tr><td><textarea cols='30' rows='20'>" . $raw . "</textarea></td>"; 
     echo "<td><textarea cols='30' rows='20'>" . $decoded . "</textarea></td>"; 
     echo "<td><textarea cols='30' rows='20'>" . $decompressed . "</textarea></td></tr>"; 
     echo "<tr><th>Stage 6</th>"; 
     echo "<th>Stage 5</th>"; 
     echo "<th>Stage 4</th></tr>"; 
     echo "<tr><td>ENCODED DATA <-</td><td>COMPRESSED DATA <-</td><td>UNCOMPRESSED DATA <-</td></tr>"; 
     echo "<tr><td>LENGTH: ".strlen($encoded)."</td>"; 
     echo "<td>LENGTH: ".strlen($compressed)."</td>"; 
     echo "<td>LENGTH: ".strlen($decompressed)."</td></tr>"; 
     echo "<tr><td></td><td>GZIP HEADER:".$header2."</td><td></td></tr>"; 
     echo "<tr><td></td><td>GZIP TAIL:".$tail2."</td><td></td></tr>"; 
     echo "<tr><td><a href='".$finalfile."'/>FINAL FILE</a></td><td><a href='".$compressedfile."'/>RE-COMPRESSED FILE</a></td><td></td></tr>"; 
     echo "<tr><td><textarea cols='30' rows='20'>" . $encoded . "</textarea></td>"; 
     echo "<td><textarea cols='30' rows='20'>" . $compressed . "</textarea></td>"; 
     echo "<td><textarea cols='30' rows='20'>" . $decompressed . "</textarea></td></tr>"; 
     echo "</table>"; 
     echo "</center>"; 
    } 
    echo "</body></html>"; 
    ?>
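The footer check at the end of gzdecode() above can be illustrated in a few lines of Python. Per RFC 1952, the last 8 bytes of a gzip file hold the CRC32 of the uncompressed data followed by its length modulo 2^32, both little-endian. The function name and sample data below are hypothetical.

```python
import gzip
import struct
import zlib

def read_trailer(gz_bytes: bytes) -> tuple:
    # Last 8 bytes: CRC32 of the uncompressed data, then its size mod 2**32,
    # both stored as little-endian unsigned 32-bit integers.
    crc, isize = struct.unpack("<II", gz_bytes[-8:])
    return crc, isize

# Hypothetical form content.
data = b"<XFDL>hypothetical form content</XFDL>" * 10
gz = gzip.compress(data)

crc, isize = read_trailer(gz)
print(crc == zlib.crc32(data), isize == len(data) % 2**32)
```

Python integers are unbounded and zlib.crc32 returns an unsigned value on Python 3, so the sign caveats noted in the PHP comments do not come up here.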