2014-05-18 26 views
0

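Convert XML to a pipe-delimited output file using awk or sed

How can I use awk or sed to convert the following XML into a pipe-delimited text file? I tried the awk command below, but it does not return the full text of the <content type="text"> tag. Any help would be great.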

Input_file.dat

    <entry>
      <updated>2014-05-17T16:34:00-07:00</updated>
      <id>994568497</id>
      <title>No longer usable</title>
      <content type="text">I happen to like the new look, but it crashes with each attempt to use it to perform any real action. Fix it quickly please!.</content>
      <im:contentType term="Application" label="Application"/>
      <im:voteSum>0</im:voteSum>
      <im:voteCount>0</im:voteCount>
      <im:rating>1</im:rating>
      <im:version>4.2.0.165</im:version>
      <author><name>Arcdouble</name><uri>https://test.com/us/reviews/id199894255</uri></author>
    </entry>

Expected output_file.csv format

|2014-05-17T16:34:00-07:00|994568497|No longer usable|I happen to like the new look, but it crashes with each attempt to use it to perform any real action. Fix it quickly please!.|1|Arcdouble|https://test.com/us/reviews/id199894255| 
+1

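You would have better luck with something like XSLT, or at least an XML parser such as the ElementTree module that ships with Python, than with awk or sed. Those tools are designed to process records (information organized into fields) and lines respectively, not hierarchical structures like XML.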
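As a rough illustration of the parser-based approach suggested in this comment, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree` module. Several things in it are assumptions and not part of the question: the `<feed>` wrapper element, the placeholder namespace URI bound to the `im` prefix (the snippet never shows where that prefix is declared), the absence of a default namespace, and the helper names `entry_to_row` and `xml_to_rows`. The field list is inferred from the expected output line in the question.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption): the question's snippet does not show
# the declaration of the "im" prefix, so substitute the real URI from your feed.
IM_NS = "http://example.com/im"

# Sample document: the entry from the question wrapped in a hypothetical <feed>
# root that declares the prefix. The <content> text is split over two lines on
# purpose, since line-oriented tools tend to truncate exactly this case.
SAMPLE = f"""<feed xmlns:im="{IM_NS}">
  <entry>
    <updated>2014-05-17T16:34:00-07:00</updated>
    <id>994568497</id>
    <title>No longer usable</title>
    <content type="text">I happen to like the new look, but it crashes with each attempt to use it to perform any real action.
Fix it quickly please!.</content>
    <im:contentType term="Application" label="Application"/>
    <im:voteSum>0</im:voteSum>
    <im:voteCount>0</im:voteCount>
    <im:rating>1</im:rating>
    <im:version>4.2.0.165</im:version>
    <author><name>Arcdouble</name><uri>https://test.com/us/reviews/id199894255</uri></author>
  </entry>
</feed>"""

# Element paths, in the column order of the expected output in the question.
FIELDS = [
    "updated",
    "id",
    "title",
    "content",
    f"{{{IM_NS}}}rating",  # namespaced tag written as {uri}localname
    "author/name",
    "author/uri",
]


def entry_to_row(entry):
    """Build one pipe-delimited line from a single <entry> element."""
    values = []
    for path in FIELDS:
        text = entry.findtext(path, default="")
        # Collapse newlines and repeated whitespace so each record stays on one line.
        values.append(" ".join(text.split()))
    return "|" + "|".join(values) + "|"


def xml_to_rows(xml_text):
    """Parse the XML text and return one output line per <entry>."""
    root = ET.fromstring(xml_text)
    return [entry_to_row(entry) for entry in root.iter("entry")]


if __name__ == "__main__":
    for row in xml_to_rows(SAMPLE):
        print(row)
```

Because the parser works on the element tree, not on lines, text that spans several lines inside `<content>` is read in full. Two caveats with this sketch: a literal `|` inside a field would still need escaping or replacing, and a real Atom feed that declares a default namespace would need namespace-qualified names for `entry`, `updated`, and the other unprefixed tags.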

+0

Yes, that's right, but I am trying to do this in a bash script. I tried the following command to return the values, but sometimes it truncates the text message. awk -F'[<>]' '{ORS="|"}; \ /"output_file.csv"}; \ /"output_file.csv"}; \ /"output_file.csv"}; \ /<content type="text"/ {split($3,d); print d[1] "\n" >> "output_file.csv"}' Input_file.dat – user3347931

+2

Please use a proper XML parser; there are many good parsers available in any language.

Answer

1 (accepted)

The following code should work for you:

    perl -ne '/<\/entry>/ && print "\n"; />(.*?)</ && !/<name>/ && print $1."|"; /<name>/ && /name>?(.*?)<\/.*?(uri>?)(.*)?<\/uri/ && print $1."|".$3'

Input:

    tiago@dell:~$ cat file
    <entry>
    <updated>2014-05-17T16:34:00-07:00</updated>
    <id>994568497</id>
    <title>No longer usable</title>
    <content type="text">I happen to like the new look, but it crashes with each attempt to use it to perform any real action. Fix it quickly please!.</content>
    <im:contentType term="Application" label="Application"/>
    <im:voteSum>0</im:voteSum>
    <im:voteCount>0</im:voteCount>
    <im:rating>1</im:rating>
    <im:version>4.2.0.165</im:version>
    <author><name>Arcdouble</name><uri>https://test.com/us/reviews/id199894255</uri></author>
    </entry>
    <entry>
    <updated>2014-05-17T16:34:00-07:00</updated>
    <id>994568497</id>
    <title>No longer usable</title>
    <content type="text">I happen to like the new look, but it crashes with each attempt to use it to perform any real action. Fix it quickly please!.</content>
    <im:contentType term="Application" label="Application"/>
    <im:voteSum>0</im:voteSum>
    <im:voteCount>0</im:voteCount>
    <im:rating>1</im:rating>
    <im:version>4.2.0.165</im:version>
    <author><name>Arcdouble</name><uri>https://test.com/us/reviews/id199894255</uri></author>
    </entry>

Execution:

    tiago@dell:~$ cat file | perl -ne '/<\/entry>/ && print "\n"; />(.*?)</ && !/<name>/ && print $1."|"; /<name>/ && /name>?(.*?)<\/.*?(uri>?)(.*)?<\/uri/ && print $1."|".$3'
    2014-05-17T16:34:00-07:00|994568497|No longer usable|I happen to like the new look, but it crashes with each attempt to use it to perform any real action. Fix it quickly please!.|0|0|1|4.2.0.165|Arcdouble|https://test.com/us/reviews/id199894255
    2014-05-17T16:34:00-07:00|994568497|No longer usable|I happen to like the new look, but it crashes with each attempt to use it to perform any real action. Fix it quickly please!.|0|0|1|4.2.0.165|Arcdouble|https://test.com/us/reviews/id199894255

Source: https://stackoverflow.com/q/23726614

2014-05-18 20:38:01 Tiago

+2

Don't parse XML with regular expressions. Please.

+0

Sometimes we just need to get the job done with a one-liner, but thanks for the advice :) – Tiago

+3

No, don't parse XML with regular expressions. Please. Don't. Don't even argue that you need to get the job done, because this approach is badly broken from the start. Just don't parse XML with regular expressions. Trust me. And since you are using Perl, use a proper parser such as LibXML.