BASH script to iterate through a list of IDs in an XML file and print/output the names to the shell or an output file?

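I am looking to iterate through a list of ID numbers, match each one against the ID numbers in an XML data file, and print the line below each match to the shell or redirect it to a third output file (output.txt), using BASH (and AWK).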

Here is the breakdown:

ID_list.txt (shortened for this example - it has 100 IDs)

XML_example.txt (thousands of entries)

<book> 
    <ID>4414</ID> 
    <name>Name of first book</name> 
</book> 
<book> 
    <ID>4561</ID> 
    <name>Name of second book</name> 
</book>

I want the output of my script to be the names for the 100 IDs from the first file:

Name of first book 
Name of second book 
etc

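I believe this can be done with a for loop in bash and AWK (for each ID in file 1, find the corresponding name in file 2). I think you can grep for the ID number and then use AWK to print the line below it. Even if the output looks like this, I can remove the XML tags afterwards: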

<name>Name of first book</name> 
<name>Name of second book</name>

This is on a Linux server, but I could port it to PowerShell on Windows. I think BASH/GREP and AWK are the way to go.

Can someone help me script this?

Source

2014-01-21 Mike J

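Show us what you have tried and what problem you are having - otherwise it looks like you want us to write it for you. – 2014-01-21 17:54:31

Shell and/or awk are not the right choice for parsing XML. – chepner

@user2062950, you are right, my apologies for not posting my own version before asking. I was using a "while read; do" loop over ID_list.txt that did work for me, but Dogbane's solution below is cleaner. –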

Here is one way:

while IFS= read -r id 
do 
    grep -A1 "<ID>$id</ID>" XML_example.txt | grep "<name>" 
done < ID_list.txt
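
If the names are wanted without the tags and in a file, the loop can be extended roughly as follows. This is only a sketch: the function name names_for_ids is made up for illustration, and it assumes that each <name> line sits directly after its <ID> line, as in the sample.

```shell
#!/bin/bash
# names_for_ids ID_FILE XML_FILE
# Hypothetical helper: prints the bare name for each ID listed in ID_FILE.
# Assumes the <name> line directly follows the matching <ID> line.
names_for_ids() {
    local id_file=$1 xml_file=$2 id
    while IFS= read -r id; do
        [ -n "$id" ] || continue    # skip blank lines in the ID list
        # grep prints the ID line plus the next line; sed keeps only
        # the text between <name> and </name>
        grep -A1 "<ID>$id</ID>" "$xml_file" |
            sed -n 's/.*<name>\(.*\)<\/name>.*/\1/p'
    done < "$id_file"
}
```

It could then be called as: names_for_ids ID_list.txt XML_example.txt > output.txt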

Here is another way (a one-liner). It is more efficient because it uses a single grep to extract all the IDs instead of looping:

grep -E -A1 "$(sed -e 's/^/<ID>/g' -e 's/$/<\/ID>/g' ID_list.txt | sed -e :a -e '$!N;s/\n/|/;ta')" XML_example.txt | grep "<name>"

Output:

<name>Name of first book</name> 
<name>Name of second book</name>
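
A variation on the one-liner, offered as a sketch: instead of joining the IDs into one long alternation, the wrapped IDs can be handed to grep as a list of fixed strings with -F -f -. This assumes GNU grep (for -A and for reading patterns from standard input); the function name is illustrative only. Note that the names come out in the order they appear in the XML file, not in the order of the ID list.

```shell
#!/bin/bash
# names_single_pass ID_FILE XML_FILE
# Hypothetical helper: one grep pass over the XML, matching fixed strings.
names_single_pass() {
    local id_file=$1 xml_file=$2
    # Turn each non-empty ID such as 4414 into the literal string <ID>4414</ID>
    sed -n '/./s/.*/<ID>&<\/ID>/p' "$id_file" |
        grep -F -A1 -f - "$xml_file" |   # matched ID lines plus the line after each
        grep '<name>'                    # keep only the name lines
}
```

Usage would look like: names_single_pass ID_list.txt XML_example.txt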

Source

2014-01-21 17:56:45 dogbane

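Thanks @dogbane. Both work as expected. I find the first one easier to read, but both do exactly what I wanted. –

Every response on this platform should be like this. – intumwa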

$ awk ' 
NR==FNR{ ids["<ID>" $0 "</ID>"]; next } 
found { gsub(/^.*<name>|<[/]name>.*$/,""); print; found=0 } 
$1 in ids { found=1 } 
' ID_list.txt XML_example.txt 
Name of first book 
Name of second book
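
The script relies on the common two-file awk idiom: NR==FNR is true only while the first file is being read, so the IDs are loaded before the XML is scanned. Below is a sketch of the same idea that also writes the ID next to each name and reports IDs that were never found. The function name and the tab-separated output format are assumptions made for illustration, not part of the answer, and the sketch assumes ID_list.txt is not empty.

```shell
#!/bin/bash
# id_name_pairs ID_FILE XML_FILE
# Hypothetical helper: prints "ID<TAB>name" per match, missing IDs go to stderr.
id_name_pairs() {
    awk '
    # First file: remember each non-empty ID, keyed the way it appears in the XML
    NR == FNR { if ($0 != "") ids["<ID>" $0 "</ID>"] = $0; next }
    # Line after a wanted ID: strip the tags and print the ID and the name
    found != "" {
        gsub(/^.*<name>|<[/]name>.*$/, "")
        print ids[found] "\t" $0
        seen[found]
        found = ""
    }
    # An ID line from the list: flag it so the next line gets printed
    $1 in ids { found = $1 }
    # Report IDs from the list that never matched
    END { for (k in ids) if (!(k in seen)) print "not found: " ids[k] > "/dev/stderr" }
    ' "$1" "$2"
}
```

For example: id_name_pairs ID_list.txt XML_example.txt > output.txt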

Source

2014-01-21 17:56:22

Given an ID, you can get the name using XPath expressions and the xmllint command, like this:

id=4414 
name=$(xmllint --xpath "string(//book[ID[text()='$id']]/name)" books.xml)

So you could write something like this:

while read id; do 
    name=$(xmllint --xpath "string(//book[ID[text()='$id']]/name)" books.xml) 
    echo "$name" 
done < id_list.txt

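Unlike the solutions involving awk, grep, and friends, this uses an actual XML parsing tool. This means that while most of the other solutions may break if they encounter: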

<book><ID>4561</ID><name>Name of second book</name></book>

...this one will work just fine.

xmllint is part of the libxml2 package, which is available for most distributions.
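
One caveat when trying this on the sample from the question: XML_example.txt as shown is a sequence of <book> elements with no enclosing root, which an XML parser will reject as not well-formed. Below is a sketch of one workaround, assuming the file really has no root element and no XML declaration. The <books> wrapper and both function names are invented for illustration.

```shell
#!/bin/bash
# wrap_in_root XML_FRAGMENTS
# Hypothetical helper: emits the fragments inside a single <books> root element.
wrap_in_root() {
    printf '<books>\n'
    cat "$1"
    printf '</books>\n'
}

# lookup_name ID XML_FILE
# Prints the name of the book whose ID matches, using xmllint's --xpath option.
# The ID is interpolated into the XPath, so it must not contain quotes.
lookup_name() {
    xmllint --xpath "string(//book[ID='$1']/name)" "$2"
}
```

A possible use: wrap_in_root XML_example.txt > books.xml, followed by lookup_name 4414 books.xml.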

Also note that recent versions of awk (GNU awk, through its XML extension) have native XML parsing.

Source

2014-01-21 18:00:27 larsks

I would go the BASH_REMATCH route if I had to do this in bash

BASH_REMATCH 
      An array variable whose members are assigned by the =~ binary 
      operator to the [[ conditional command. The element with index 
      0 is the portion of the string matching the entire regular 
      expression. The element with index n is the portion of the 
      string matching the nth parenthesized subexpression. This
      variable is read-only.

So, something like the following

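#!/bin/bash

# Note: this prints the name that follows every <ID> line in the XML file
while read -r line; do
    # The previous line was an <ID> line: print the text between the name tags
    [[ $print ]] && [[ $line =~ "<name>"(.*)"</name>" ]] && echo "${BASH_REMATCH[1]}"

    if [[ $line == "<ID>"*"</ID>" ]]; then
        print=:
    else
        print=
    fi
done < "XML_example.txt"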

Example output

> abovescript 
Name of first book 
Name of second book
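
The BASH_REMATCH approach can also be restricted to the IDs in ID_list.txt without calling any external tool. This is a sketch, assuming bash 4 or later (for associative arrays) and the same line layout as the sample. The function and variable names are illustrative.

```shell
#!/bin/bash
# names_pure_bash ID_FILE XML_FILE   (requires bash 4+ for associative arrays)
names_pure_bash() {
    local id line print=
    local -A wanted=()
    # Load the wanted IDs as keys of an associative array
    while read -r id; do
        [[ -n $id ]] && wanted[$id]=1
    done < "$1"

    while read -r line; do
        # Previous line was a wanted ID: print the text between the name tags
        if [[ $print ]] && [[ $line =~ "<name>"(.*)"</name>" ]]; then
            echo "${BASH_REMATCH[1]}"
        fi
        # Remember whether this line is an <ID> line for one of the wanted IDs
        if [[ $line =~ ^"<ID>"(.+)"</ID>"$ ]] && [[ -n ${wanted[${BASH_REMATCH[1]}]:-} ]]; then
            print=1
        else
            print=
        fi
    done < "$2"
}
```

It could be invoked as: names_pure_bash ID_list.txt XML_example.txt > output.txt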

Source

2014-01-22 10:29:17 BroSlow
