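I have this PowerShell script that strips the HTML tags from a file, keeps only the text, and displays the word count of that HTML file when it runs. My question is: how can I export the result to an Excel spreadsheet using PowerShell?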
function Html-ToText {
    param([System.String] $html)

    # remove line breaks, replace with spaces
    $html = $html -replace "(`r|`n|`t)", " "
    # write-verbose "removed line breaks: `n`n$html`n"

    # remove invisible content
    @('head', 'style', 'script', 'object', 'embed', 'applet', 'noframes', 'noscript', 'noembed') | % {
        $html = $html -replace "<$_[^>]*?>.*?</$_>", ""
    }
    # write-verbose "removed invisible blocks: `n`n$html`n"

    # Condense extra whitespace (runs of spaces become a single space)
    $html = $html -replace "( )+", " "
    # write-verbose "condensed whitespace: `n`n$html`n"

    # Add line breaks
    @('div','p','blockquote','h[1-9]') | % { $html = $html -replace "</?$_[^>]*?>.*?</$_>", ("`n" + '$0') }

    # Add line breaks for self-closing tags
    @('div','p','blockquote','h[1-9]','br') | % { $html = $html -replace "<$_[^>]*?/>", ('$0' + "`n") }
    # write-verbose "added line breaks: `n`n$html`n"

    # strip tags
    $html = $html -replace "<[^>]*?>", ""
    # write-verbose "removed tags: `n`n$html`n"

    # replace common entities
    @(
        @("&bull;", " * "),
        @("&lsaquo;", "<"),
        @("&rsaquo;", ">"),
        @("&(rsquo|lsquo);", "'"),
        @("&(quot|ldquo|rdquo);", '"'),
        @("&trade;", "(tm)"),
        @("&frasl;", "/"),
        @("&(quot|#34|#034|#x22);", '"'),
        @('&(amp|#38|#038|#x26);', "&"),
        @("&(lt|#60|#060|#x3c);", "<"),
        @("&(gt|#62|#062|#x3e);", ">"),
        @('&(copy|#169);', "(c)"),
        @("&(reg|#174);", "(r)"),
        @("&nbsp;", " "),
        @("&(.{2,6});", "")
    ) | % { $html = $html -replace $_[0], $_[1] }
    # write-verbose "replaced entities: `n`n$html`n"

    # return the word count of the cleaned text (the original appended an undefined $a here)
    return $html | Measure-Object -Word
}
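Then I run:
Html-ToText (New-Object net.webclient).DownloadString("test.html")
It reports 4 as the word count in the PowerShell output. How do I export the output from the PowerShell window into an Excel spreadsheet with the columns Words and Count?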
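I tried using `$x | Select-Object Words | Export-Csv out.csv -NoTypeInformation`, and the only thing it writes out is the word Words, not the number. Not sure why it isn't doing that. –
That makes two of us. `'word1 word2 word3 word4' | Measure-Object -Word | Select-Object Words | ConvertTo-Csv -NoTypeInformation` ... – TessellatingHeckler
That prints to the console window, not to a .csv file. I still get only the word, but not the integer beneath it –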
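For reference, the difference discussed in the comments can be sketched as follows: `ConvertTo-Csv` only emits CSV text to the pipeline (so it shows up in the console), while `Export-Csv` writes a file that Excel can open. This is a minimal sketch, not a confirmed fix. It assumes the wanted columns are the text and its word count; the sample string, the `$row` object, and the temp-folder path are hypothetical stand-ins for the real `Html-ToText` output and `out.csv`.

```powershell
# Hypothetical sample text standing in for the cleaned output of Html-ToText
$text = 'word1 word2 word3 word4'

# Measure-Object -Word returns an object; its Words property holds the count
$count = ($text | Measure-Object -Word).Words

# Build a one-row object with the two columns (column names are an assumption)
$row = [PSCustomObject]@{
    Words = $text
    Count = $count
}

# ConvertTo-Csv would only print CSV text; Export-Csv writes it to a file.
# The temp-folder path is a hypothetical location for out.csv.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'out.csv'
$row | Export-Csv -Path $outFile -NoTypeInformation

# Show what was written to the file
Get-Content $outFile
```

If the exported file contains only a header, it may be worth checking that the variable being piped (`$x` in the comment above) was actually assigned the result of `Measure-Object` before the export.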