2016-02-03 218 views
1

我試着去維基百科打印出來的信息和它的作用:http://i.imgur.com/1vMj3df.jpg打印信息

這是我使用的代碼:

use strict; 
use warnings; 

use WWW::Wikipedia; 
use HTML::Restrict; 

my $wiki = WWW::Wikipedia->new(clean_html => 1); 
my $hr = HTML::Restrict->new; 
my $entry = $wiki->search('Perl'); 
my $processed = $hr->process($entry->text_basic); 
print $processed; 

1; 

IM希望去除上面的東西「Perl是一個家庭「,只打印這些段落。

例如,它打印了這一點:

 {{Infobox programming language 
| name     = Perl 
| logo     = 
| paradigm    = multi-paradigm: functional, imperative, object-oriented (class-based), reflective, procedural, event-driven, generic 
| year     = 
| designer    = Larry Wall 
| developer    = Larry Wall 
| latest_release_version = 5.22.1 
| latest_release_date = 
| latest_preview_version = 5.23.7 
| latest_preview_date = 
| turing-complete  = Yes 
| typing     = Dynamic 
| influenced_by   = AWK, Smalltalk 80, Lisp, C, C++, sed, Unix shell, Pascal 
| influenced    = Chapel, Coffeescript, ECMAScript/JavaScript, Falcon, Julia, LPC, Perl 6, PHP, Python, Qore, Ruby, Windows PowerShell 
| programming_language = C 
| operating_system  = Cross-platform 
| license    = GNU General Public License or Artistic License 
| website    = 
| file_ext    = .pl .pm .t .pod 
| wikibooks    = Perl Programming 
}} 

'Perl' is a family of high-level, general-purpose, interpreted, dynamic programming languages. The languages in this family include Perl 5 and Perl 6. 

Though Perl is not officially an acronym, there are various backronyms in use, the most well-known being "Practical Extraction and Reporting Language". Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions. Perl 6, which began as a redesign of Perl 5 in 2000, eventually evolved into a separate language. Both languages continue to be developed independently by different development teams and liberally borrow ideas from one another. 

The Perl languages borrow features from other programming languages including C, shell script (sh), AWK, and sed. They provide powerful text processing facilities without the arbitrary data-length limits of many contemporary Unix commandline tools, facilitating easy manipulation of text files. Perl 5 gained widespread popularity in the late 1990s as a CGI scripting language, in part due to its unsurpassed regular expression and string parsing abilities. 

In addition to CGI, Perl 5 is used for graphics programming, system administration, network programming, finance, bioinformatics, and other applications. It has been nicknamed "the Swiss Army chainsaw of scripting languages" because of its flexibility and power, and possibly also because of its "ugliness". In 1998, it was also referred to as the "duct tape that holds the Internet together", in reference to both its ubiquitous use as a glue language and its perceived inelegance. 

我想刪除:

{{Infobox programming language 
| name     = Perl 
| logo     = 
| paradigm    = multi-paradigm: functional, imperative, object-oriented (class-based), reflective, procedural, event-driven, generic 
| year     = 
| designer    = Larry Wall 
| developer    = Larry Wall 
| latest_release_version = 5.22.1 
| latest_release_date = 
| latest_preview_version = 5.23.7 
| latest_preview_date = 
| turing-complete  = Yes 
| typing     = Dynamic 
| influenced_by   = AWK, Smalltalk 80, Lisp, C, C++, sed, Unix shell, Pascal 
| influenced    = Chapel, Coffeescript, ECMAScript/JavaScript, Falcon, Julia, LPC, Perl 6, PHP, Python, Qore, Ruby, Windows PowerShell 
| programming_language = C 
| operating_system  = Cross-platform 
| license    = GNU General Public License or Artistic License 
| website    = 
| file_ext    = .pl .pm .t .pod 
| wikibooks    = Perl Programming 
}} 

我只希望每個搜索結果顯示的段落。

+0

爲什麼不說你從wiki返回的字符串樣本,並指出你想要刪除的部分。然後我們可以給你一個正則表達式來爲你做這項工作。 –

+0

完成了。對不起,我在手機上花了一段時間,這實際上是給朋友的。 – user2524169

+0

這更好,謝謝。請在下面找到我的答案。我希望它有幫助。 –

回答

2

匹配由維基對正則表達式返回的字符串來替換它的第一部分{{}}和空的新的生產線後馬上,由空字符串:

$processed =~ s/\{\{.*\}\}\R\R//s;

正如你可以看到我使用了/s選項來匹配多行,並且我沒有使用g選項,因此我們只替換第一個貪婪匹配。

+0

完美地工作,就在我嘗試其他搜索詞時,它顯示的內容高於第一段的起始位置 – user2524169

+0

您的意思是您想要移除的{{}}之外的額外文本? –

+0

當我嘗試例如,「水泥」一詞。這是它打印出來的:護目鏡和防護手套,與此處所示的工作人員不同,http://www.hse.gov.uk/pubns/cis26.pdfhttp://www.cementaustralia.com.au/wps/wcm/connect /website/packaged-products/home/hints-and-tips/FAQ-Safety/#FAQ-safety-PPE.htmlhttp://www.tdi.texas.gov/pubs/videoresource/stpcement.pdf]] A'水泥「是一種粘合劑,是一種固化並硬化並可將其他材料粘合在一起的物質。 「水泥」這個詞可以追溯到羅馬時期[opus caementicium]],用來描述類似於現代混凝土的砌體,它是 – user2524169