2016-03-04 72 views
1

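How to parse NVD CVE XML and import it into a database

I want to parse a large XML file (about 200 MB) and store its contents in a MySQL database. How should I go about parsing a file of this size, and how do I get at the child elements? Each entry has two parts, the 'vuln' elements and 'vulnerable-configuration'. Thanks! The XML looks like this: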

<entry id="CVE-2015-0002"> 
    <vuln:vulnerable-configuration id="http://www.nist.gov/"> 
     <cpe-lang:logical-test operator="OR" negate="false"> 
     <cpe-lang:fact-ref name="cpe:/o:microsoft:windows_7:-:sp1"/> 
     <cpe-lang:fact-ref name="cpe:/o:microsoft:windows_server_2008:r2:sp1"/> 
     <cpe-lang:fact-ref name="cpe:/o:microsoft:windows_8:-"/> 
     <cpe-lang:fact-ref name="cpe:/o:microsoft:windows_8.1:-"/> 
     <cpe-lang:fact-ref name="cpe:/o:microsoft:windows_server_2012:-:gold"/> 
     <cpe-lang:fact-ref name="cpe:/o:microsoft:windows_server_2012:r2::~~~x64~~"/> 
     <cpe-lang:fact-ref name="cpe:/o:microsoft:windows_rt:-:gold"/> 
     <cpe-lang:fact-ref name="cpe:/o:microsoft:windows_rt_8.1:-"/> 
     </cpe-lang:logical-test> 
    </vuln:vulnerable-configuration> 
    <vuln:vulnerable-software-list> 
     <vuln:product>cpe:/o:microsoft:windows_server_2012:-:gold</vuln:product> 
     <vuln:product>cpe:/o:microsoft:windows_rt:-:gold</vuln:product> 
     <vuln:product>cpe:/o:microsoft:windows_7:-:sp1</vuln:product> 
     <vuln:product>cpe:/o:microsoft:windows_rt_8.1:-</vuln:product> 
     <vuln:product>cpe:/o:microsoft:windows_server_2012:r2::~~~x64~~</vuln:product> 
     <vuln:product>cpe:/o:microsoft:windows_8:-</vuln:product> 
     <vuln:product>cpe:/o:microsoft:windows_8.1:-</vuln:product> 
     <vuln:product>cpe:/o:microsoft:windows_server_2008:r2:sp1</vuln:product> 
    </vuln:vulnerable-software-list> 
    <vuln:cve-id>CVE-2015-0002</vuln:cve-id> 
    <vuln:published-datetime>2015-01-13T17:59:01.253-05:00</vuln:published-datetime> 
    <vuln:last-modified-datetime>2015-01-14T16:51:14.253-05:00</vuln:last-modified-datetime> 
    <vuln:cvss> 
     <cvss:base_metrics> 
     <cvss:score>7.2</cvss:score> 
     <cvss:access-vector>LOCAL</cvss:access-vector> 
     <cvss:access-complexity>LOW</cvss:access-complexity> 
     <cvss:authentication>NONE</cvss:authentication> 
     <cvss:confidentiality-impact>COMPLETE</cvss:confidentiality-impact> 
     <cvss:integrity-impact>COMPLETE</cvss:integrity-impact> 
     <cvss:availability-impact>COMPLETE</cvss:availability-impact> 
     <cvss:source>http://nvd.nist.gov</cvss:source> 
     <cvss:generated-on-datetime>2015-01-14T16:20:33.273-05:00</cvss:generated-on-datetime> 
     </cvss:base_metrics> 
    </vuln:cvss> 
    <vuln:cwe id="CWE-264"/> 
    <vuln:references xml:lang="en" reference_type="VENDOR_ADVISORY"> 
     <vuln:source>MS</vuln:source> 
     <vuln:reference href="http://technet.microsoft.com/security/bulletin/MS15-001" xml:lang="en">MS15-001</vuln:reference> 
    </vuln:references> 
    <vuln:references xml:lang="en" reference_type="UNKNOWN"> 
     <vuln:source>MISC</vuln:source> 
     <vuln:reference href="https://code.google.com/p/google-security-research/issues/detail?id=118" xml:lang="en">https://code.google.com/p/google-security-research/issues/detail?id=118</vuln:reference> 
    </vuln:references> 
    <vuln:references xml:lang="en" reference_type="UNKNOWN"> 
     <vuln:source>MISC</vuln:source> 
     <vuln:reference href="http://www.zdnet.com/article/google-discloses-unpatched-windows-vulnerability/" xml:lang="en">http://www.zdnet.com/article/google-discloses-unpatched-windows-vulnerability/</vuln:reference> 
    </vuln:references> 
    <vuln:references xml:lang="en" reference_type="UNKNOWN"> 
     <vuln:source>MISC</vuln:source> 
     <vuln:reference href="http://twitter.com/sambowne/statuses/550384131683520512" xml:lang="en">http://twitter.com/sambowne/statuses/550384131683520512</vuln:reference> 
    </vuln:references> 
    <vuln:summary>The AhcVerifyAdminContext function in ahcache.sys in the Application Compatibility component in Microsoft Windows 7 SP1, Windows Server 2008 R2 SP1, Windows 8, Windows 8.1, Windows Server 2012 Gold and R2, and Windows RT Gold and 8.1 does not verify that an impersonation token is associated with an administrative account, which allows local users to gain privileges by running AppCompatCache.exe with a crafted DLL file, aka MSRC ID 20544 or "Microsoft Application Compatibility Infrastructure Elevation of Privilege Vulnerability."</vuln:summary> 
    </entry> 

Answers

2

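A partial answer.

First, have a look at this question, which answers most of what you are asking: How to import XML with nested nodes (parent/child relationships) into Access?

Import the XML into Access, and use an XSLT file to transform the XML so that each child table receives the key vuln:cve-id, which links it back to the main table.

The code below works for some of the child tables but not all of them. If anyone can point out why it does not work for every child table, please do. It does, however, give you the main table with vuln:cve-id, vuln:published-datetime, vuln:last-modified-datetime and vuln:summary, plus cvss:base_metrics with cvss:score, cvss:access-vector, cvss:access-complexity, cvss:source and cvss:generated-on-datetime.

Put the code below into a file named transform.xslt and use it when importing into Access. You will need to add the appropriate XSL headers yourself; I could not include them in this post because "you need at least 10 reputation to post more than 2 links" :-(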

<xsl:template match="/"> 
    <dataroot> 
     <xsl:apply-templates select="@*|node()"/> 
    </dataroot> 
</xsl:template> 

<xsl:template match="@*|node()"> 
    <xsl:copy> 
     <xsl:apply-templates select="@*|node()"/> 
    </xsl:copy> 
</xsl:template> 

<xsl:template match="entry"> 
    <xsl:apply-templates select="@*|node()"/> 
</xsl:template> 

<xsl:template match="cpe-lang:logical-test"> 
    <cpe-lang:logical-test> 
     <vuln:cve-id><xsl:value-of select="../../vuln:cve-id"/></vuln:cve-id> 
     <xsl:apply-templates select="@*|node()"/> 
    </cpe-lang:logical-test> 
</xsl:template> 

<xsl:template match="vuln:vulnerable-configuration"> 
    <vuln:vulnerable-configuration> 
     <!-- vulnerable-configuration is a direct child of entry, so the id is one level up -->
     <vuln:cve-id><xsl:value-of select="../vuln:cve-id"/></vuln:cve-id> 
     <xsl:apply-templates select="@*|node()"/> 
    </vuln:vulnerable-configuration> 
</xsl:template> 

<xsl:template match="vuln:vulnerable-software-list"> 
    <vuln:vulnerable-software-list> 
     <vuln:cve-id><xsl:value-of select="../vuln:cve-id"/></vuln:cve-id> 
     <xsl:apply-templates select="@*|node()"/> 
    </vuln:vulnerable-software-list> 
</xsl:template> 

<xsl:template match="cvss:base_metrics"> 
    <cvss:base_metrics> 
     <vuln:cve-id><xsl:value-of select="../../vuln:cve-id"/></vuln:cve-id> 
     <xsl:apply-templates select="@*|node()"/> 
    </cvss:base_metrics> 
</xsl:template> 

<xsl:template match="vuln:references"> 
    <vuln:references> 
     <vuln:cve-id><xsl:value-of select="../vuln:cve-id"/></vuln:cve-id> 
     <xsl:apply-templates select="@*|node()"/> 
    </vuln:references> 
</xsl:template> 

<xsl:template match="vuln:scanner"> 
    <vuln:scanner> 
     <vuln:cve-id><xsl:value-of select="../vuln:cve-id"/></vuln:cve-id> 
     <xsl:apply-templates select="@*|node()"/> 
    </vuln:scanner> 
</xsl:template> 
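
The parent/child keying that the stylesheet above aims for can also be sketched directly in Python. This is only an illustration, not the answerer's method: it uses the standard-library sqlite3 module as a stand-in for MySQL, the table names `cve` and `product` are made up for the example, and the namespace URIs are assumptions that should be checked against the root element of the actual feed.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed namespace URIs; copy the real ones from the root element of your feed.
NS = {
    "feed": "http://scap.nist.gov/schema/feed/vulnerability/2.0",
    "vuln": "http://scap.nist.gov/schema/vulnerability/0.4",
}

# Cut-down version of the entry shown in the question.
SAMPLE = """<nvd xmlns="http://scap.nist.gov/schema/feed/vulnerability/2.0"
     xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4">
  <entry id="CVE-2015-0002">
    <vuln:vulnerable-software-list>
      <vuln:product>cpe:/o:microsoft:windows_7:-:sp1</vuln:product>
      <vuln:product>cpe:/o:microsoft:windows_8:-</vuln:product>
    </vuln:vulnerable-software-list>
    <vuln:cve-id>CVE-2015-0002</vuln:cve-id>
    <vuln:published-datetime>2015-01-13T17:59:01.253-05:00</vuln:published-datetime>
  </entry>
</nvd>"""


def load_entries(conn, root):
    """Insert one parent row per entry and one child row per product."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cve (cve_id TEXT PRIMARY KEY, published TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product "
        "(cve_id TEXT REFERENCES cve(cve_id), cpe TEXT)"
    )
    for entry in root.findall("feed:entry", NS):
        cve_id = entry.findtext("vuln:cve-id", namespaces=NS)
        published = entry.findtext("vuln:published-datetime", namespaces=NS)
        conn.execute("INSERT INTO cve VALUES (?, ?)", (cve_id, published))
        # Every child row carries the parent's cve_id as its link back.
        products = entry.findall("vuln:vulnerable-software-list/vuln:product", NS)
        conn.executemany(
            "INSERT INTO product VALUES (?, ?)",
            [(cve_id, p.text) for p in products],
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_entries(conn, ET.fromstring(SAMPLE))
rows = conn.execute("SELECT cve_id, cpe FROM product ORDER BY cpe").fetchall()
```

The same pattern should carry over to the other child elements (references, CVSS metrics, configurations): read the id once per entry and write it into every row derived from that entry.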

0

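I know this is a bit old, but I once did a small job similar to what you are asking. It is some really ugly code (written in a few hours), but I think it does most of what you want, except for exporting to a database. It takes the huge CVE XML files, parses specific key/value pairs out of them, and matches them against network scans.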

https://github.com/bhealy/netScan

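import xml.etree.ElementTree as ET

XMLfile = "nvdcve.xml"  # placeholder path; point this at your downloaded CVE feed

tree = ET.parse(XMLfile)  # reads the whole document into memory
root = tree.getroot()
# Positional lookup: first child of the root, then its second child, and so on
stuffYouCareAbout = root[0][1][2][3].text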

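I was able to parse the XML files with etree, which makes it much easier to look up specific items. Obviously the sample looks up one very specific index, but it should be a good starting point (if this post is not too late!).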
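
For a file of around 200 MB, a streaming variant of the etree approach may be easier on memory, and looking elements up by name is less brittle than a fixed index. The following is a minimal sketch under stated assumptions: the namespace URIs are assumed and should be copied from the root element of the real feed, and `iter_entries` is a hypothetical helper name, not something from the linked repository.

```python
import io
import xml.etree.ElementTree as ET

# Assumed namespace URIs; copy the real ones from the root element of your feed.
FEED = "http://scap.nist.gov/schema/feed/vulnerability/2.0"
NS = {
    "vuln": "http://scap.nist.gov/schema/vulnerability/0.4",
    "cvss": "http://scap.nist.gov/schema/cvss-v2/0.2",
}


def iter_entries(source):
    """Yield (cve_id, score) for each entry, reading the file incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Only act once a complete <entry> element has been read.
        if elem.tag != f"{{{FEED}}}entry":
            continue
        cve_id = elem.findtext("vuln:cve-id", namespaces=NS)
        score = elem.findtext(
            "vuln:cvss/cvss:base_metrics/cvss:score", namespaces=NS
        )
        yield cve_id, score
        elem.clear()  # drop the entry's children once it has been read


# Cut-down version of the entry shown in the question.
SAMPLE = b"""<nvd xmlns="http://scap.nist.gov/schema/feed/vulnerability/2.0"
     xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4"
     xmlns:cvss="http://scap.nist.gov/schema/cvss-v2/0.2">
  <entry id="CVE-2015-0002">
    <vuln:cve-id>CVE-2015-0002</vuln:cve-id>
    <vuln:cvss>
      <cvss:base_metrics>
        <cvss:score>7.2</cvss:score>
      </cvss:base_metrics>
    </vuln:cvss>
  </entry>
</nvd>"""

results = list(iter_entries(io.BytesIO(SAMPLE)))
```

`ET.iterparse` also accepts a file name, so for a real feed the `io.BytesIO(SAMPLE)` argument can be replaced with the path to the downloaded file.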