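I am researching how to generate documentation from finished Informatica PowerCenter mappings, and the sheer number of options has made it hard to settle on an initial approach. The approach used here is to open every box in the mapping as many times as needed, copy the information into a Word document, and format it, which adds up to thousands of times per week. How can I parse the XML to extract the fields for the documentation?
I now have a fallback idea that I think could be a solution: export the mapping to XML, parse the XML with a script (or a program; I have already tried several times with Excel, without success) into something easier to copy and paste, and improve my life that way.
The XML looks like this (reduced to as few lines as possible to serve as an example; the snippet may not be 100% valid, but the original XML is. "ValueAssigned" is a placeholder I made up that is unrelated to the actual values; the real files do not contain that string every time):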
Type 1 Document:
<!DOCTYPE POWERMART SYSTEM "ValueAssigned">
<POWERMART CREATION_DATE="ValueAssigned" REPOSITORY_VERSION="ValueAssigned">
<REPOSITORY NAME="ValueAssigned" VERSION="ValueAssigned" CODEPAGE="ValueAssigned" DATABASETYPE="ValueAssigned">
<FOLDER NAME="ValueAssigned" GROUP="" OWNER="ValueAssigned" SHARED="ValueAssigned" DESCRIPTION="ValueAssigned" PERMISSIONS="ValueAssigned" UUID="ValueAssigned">
<CONFIG DESCRIPTION ="ValueAssigned" ISDEFAULT ="YES" NAME ="ValueAssigned" VERSIONNUMBER ="ValueAssigned">
<ATTRIBUTE NAME ="Field1" VALUE =""/>
<ATTRIBUTE NAME ="Field2" VALUE ="NO"/>
<WORKFLOW DESCRIPTION ="" ISENABLED ="ValueAssigned" ISRUNNABLESERVICE ="ValueAssigned" ISSERVICE ="ValueAssigned" ISVALID ="ValueAssigned" NAME ="ValueAssigned" REUSABLE_SCHEDULER ="ValueAssigned" SCHEDULERNAME ="ValueAssigned" SERVERNAME ="ValueAssigned" SERVER_DOMAINNAME ="ValueAssigned" SUSPEND_ON_ERROR ="ValueAssigned" TASKS_MUST_RUN_ON_SERVER ="ValueAssigned" VERSIONNUMBER ="ValueAssigned">
<SCHEDULER DESCRIPTION ="" NAME ="SchedulerName" REUSABLE ="ValueAssigned" VERSIONNUMBER ="ValueAssigned">
<SCHEDULEINFO SCHEDULETYPE ="ONDEMAND"/>
</SCHEDULER>
<TASK DESCRIPTION ="ValueAssigned" NAME ="Start" REUSABLE ="NO" TYPE ="Start" VERSIONNUMBER ="1"/>
<SESSION DESCRIPTION ="ValueAssigned" ISVALID ="ValueAssigned" MAPPINGNAME ="ValueAssigned" NAME ="ValueAssigned" REUSABLE ="ValueAssigned" SORTORDER ="ValueAssigned" VERSIONNUMBER ="ValueAssigned">
<SESSTRANSFORMATIONINST ISREPARTITIONPOINT ="ValueAssigned" PARTITIONTYPE ="ValueAssigned" PIPELINE ="ValueAssigned" SINSTANCENAME ="ValueAssigned" STAGE ="ValueAssigned" TRANSFORMATIONNAME ="ValueAssigned" TRANSFORMATIONTYPE ="Target Definition">
<ATTRIBUTE NAME ="ValueAssigned" VALUE ="ValueAssigned"/>
<ATTRIBUTE NAME ="ValueAssigned" VALUE ="ValueAssigned"/>
</SESSTRANSFORMATIONINST>
So, if we focus on any one tag, such as
<CONFIG DESCRIPTION ="Default session configuration object" ISDEFAULT ="YES" NAME ="default_session_config" VERSIONNUMBER ="29">
<ATTRIBUTE NAME ="Field1" VALUE =""/>
<ATTRIBUTE NAME ="Field2" VALUE ="NO"/>
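we can see there is a tag, CONFIG, with its description, followed by several ATTRIBUTE names. One of the options I thought of is a bit naive: if I could convert this into columns, using Excel or something similar, I would see one row with the root tag, then the different categories, down to the point where I can say: OK, this is the tag, this is a column with all its values, I copy it into my Word document and call it a day. Each XML file has anywhere between 300 and 900 lines, and because of the quotes, the constant tags, and the columns not lining up (the lines do not have the same length), it is neither easy to read nor easy to work with, so I cannot use column mode...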
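The column layout described above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: the real export is well-formed XML (the trimmed samples in this post are not), the inline `SAMPLE` is a closed-up version of the CONFIG snippet, and the helper names `flatten` and `to_csv` are made up for this example. It uses only the standard library (`xml.etree.ElementTree` and `csv`) and emits one row per attribute, with the path of tags leading to it, so the result can be opened in Excel and filtered by tag.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a real export (closing tags added).
SAMPLE = """<POWERMART>
  <REPOSITORY NAME="repo">
    <FOLDER NAME="folder">
      <CONFIG DESCRIPTION ="Default session configuration object" ISDEFAULT ="YES" NAME ="default_session_config" VERSIONNUMBER ="29">
        <ATTRIBUTE NAME ="Field1" VALUE =""/>
        <ATTRIBUTE NAME ="Field2" VALUE ="NO"/>
      </CONFIG>
    </FOLDER>
  </REPOSITORY>
</POWERMART>"""


def flatten(element, path=""):
    """Yield (path, attribute, value) for every attribute of every element."""
    here = f"{path}/{element.tag}"
    for name, value in element.attrib.items():
        yield here, name, value
    # Recurse into children so nested tags keep their full path.
    for child in element:
        yield from flatten(child, here)


def to_csv(xml_text):
    """Return the flattened rows as CSV text with a header line."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["path", "attribute", "value"])
    writer.writerows(flatten(root))
    return out.getvalue()


rows = list(flatten(ET.fromstring(SAMPLE)))
print(to_csv(SAMPLE))
```

Because the code walks whatever tags it finds instead of expecting a fixed schema, the same sketch should cope with exports whose tags differ from one file to the next; for a file on disk, `ET.parse(path).getroot()` would replace `ET.fromstring`.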
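I am including the other type of document in case it gives a clearer idea of how different the information can be, and of why I have not jumped straight into writing my own parser: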
<?xml version="ValueAssigned" encoding="ValueAssigned"?>
<!DOCTYPE POWERMART SYSTEM "ValueAssigned">
<POWERMART CREATION_DATE="ValueAssigned" REPOSITORY_VERSION="ValueAssigned">
<REPOSITORY NAME="ValueAssigned" VERSION="ValueAssigned" CODEPAGE="ValueAssigned" DATABASETYPE="ValueAssigned">
<FOLDER NAME="ValueAssigned" GROUP="ValueAssigned" OWNER="ValueAssigned" SHARED="ValueAssigned" DESCRIPTION="ValueAssigned" PERMISSIONS="ValueAssigned" UUID="ValueAssigned">
<SOURCE BUSINESSNAME ="ValueAssigned" DATABASETYPE ="ValueAssigned" DBDNAME ="ValueAssigned" DESCRIPTION ="ValueAssigned" NAME ="ValueAssigned" OBJECTVERSION ="ValueAssigned" OWNERNAME ="ValueAssigned" VERSIONNUMBER ="ValueAssigned">
<SOURCEFIELD BUSINESSNAME ="ValueAssigned" DATATYPE ="ValueAssigned" DESCRIPTION ="ValueAssigned" FIELDNUMBER ="ValueAssigned" FIELDPROPERTY ="ValueAssigned" FIELDTYPE ="ValueAssigned" HIDDEN ="ValueAssigned" KEYTYPE ="ValueAssigned" LENGTH ="ValueAssigned" LEVEL ="ValueAssigned" NAME ="ValueAssigned" NULLABLE ="ValueAssigned" OCCURS ="ValueAssigned" OFFSET ="ValueAssigned" PHYSICALLENGTH ="ValueAssigned" PHYSICALOFFSET ="ValueAssigned" PICTURETEXT ="ValueAssigned" PRECISION ="ValueAssigned" SCALE ="ValueAssigned" USAGE_FLAGS ="ValueAssigned"/>
<FOLDER NAME="ValueAssigned" GROUP="ValueAssigned" OWNER="ValueAssigned" SHARED="ValueAssigned" DESCRIPTION="ValueAssigned" PERMISSIONS="ValueAssigned" UUID="ValueAssigned">
<SOURCE BUSINESSNAME ="ValueAssigned" CRCVALUE ="ValueAssigned" DATABASETYPE ="ValueAssigned" DBDNAME ="ValueAssigned" DESCRIPTION ="ValueAssigned" IBMCOMP ="ValueAssigned" NAME ="ValueAssigned" OBJECTVERSION ="ValueAssigned" OWNERNAME ="ValueAssigned" VERSIONNUMBER ="ValueAssigned">
<FLATFILE CODEPAGE ="ValueAssigned" CONSECDELIMITERSASONE ="ValueAssigned" DELIMITED ="ValueAssigned" DELIMITERS ="ValueAssigned" ESCAPE_CHARACTER ="ValueAssigned" KEEPESCAPECHAR ="ValueAssigned" LINESEQUENTIAL ="ValueAssigned" MULTIDELIMITERSASAND ="ValueAssigned" NULLCHARTYPE ="ValueAssigned" NULL_CHARACTER ="ValueAssigned" PADBYTES ="ValueAssigned" QUOTE_CHARACTER ="ValueAssigned" REPEATABLE ="ValueAssigned" ROWDELIMITER ="ValueAssigned" SHIFTSENSITIVEDATA ="ValueAssigned" SKIPROWS ="ValueAssigned" STRIPTRAILINGBLANKS ="ValueAssigned"/>
<SOURCEFIELD BUSINESSNAME ="ValueAssigned" DESCRIPTION ="ValueAssigned" FIELDNUMBER ="ValueAssigned" FIELDPROPERTY ="ValueAssigned" FIELDTYPE ="ValueAssigned" HIDDEN ="ValueAssigned" LENGTH ="ValueAssigned" LEVEL ="ValueAssigned" NAME ="ValueAssigned" OCCURS ="ValueAssigned" OFFSET ="ValueAssigned" PHYSICALLENGTH ="ValueAssigned" PHYSICALOFFSET ="ValueAssigned">
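It is not very clear what you are trying to achieve. What specific data are you interested in extracting from the XML? What should the information look like in your Word document?
Sorry about the lack of clarity :S. I will try to explain it as if it were already solved. I want to parse as much of the XML as possible, cleanly enough that if I want to copy the Attributes of a Session, I can go to where they are and copy all of them (which is why I thought of Excel). So I would like a process that lets me easily copy very different kinds of data out of XML that looks like what I wrote above, keeping in mind that this XML is not always the same and does not always have the same tags.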
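As a rough sketch of "go to where the Attributes are and copy all of them", the following Python collects the ATTRIBUTE NAME/VALUE pairs that sit directly under every element with a given tag. It assumes a well-formed export; the `SAMPLE` fragment, its values (`s_load`, `tgt_orders`, `AttrA`, `AttrB`), and the function name `attributes_of` are invented for illustration and do not come from a real PowerCenter file.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like the SESSION block in the question.
SAMPLE = """<SESSION NAME="s_load" MAPPINGNAME="m_load">
  <SESSTRANSFORMATIONINST SINSTANCENAME="tgt_orders" TRANSFORMATIONTYPE="Target Definition">
    <ATTRIBUTE NAME="AttrA" VALUE="NO"/>
    <ATTRIBUTE NAME="AttrB" VALUE="x"/>
  </SESSTRANSFORMATIONINST>
</SESSION>"""


def attributes_of(root, parent_tag):
    """Return [(label, {name: value})] for each parent_tag element found."""
    result = []
    # iter() searches the whole tree, including the root element itself.
    for parent in root.iter(parent_tag):
        # Use NAME or SINSTANCENAME as a label, falling back to the tag.
        label = parent.get("NAME") or parent.get("SINSTANCENAME") or parent_tag
        # findall() without a path prefix looks at direct children only.
        pairs = {a.get("NAME"): a.get("VALUE") for a in parent.findall("ATTRIBUTE")}
        result.append((label, pairs))
    return result


root = ET.fromstring(SAMPLE)
for label, pairs in attributes_of(root, "SESSTRANSFORMATIONINST"):
    print(label)
    for name, value in pairs.items():
        print(f"  {name}\t{value}")
```

The tab-separated output pastes into a Word or Excel table as two columns. A tag that does not occur in a given file simply yields an empty list, which matters when the exports do not all share the same tags.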