Split an HTML file using Perl
I am new to Perl, but I am trying to write a program that splits a single HTML file into multiple HTML files.
#!/usr/bin/env perl
use strict;
#use warnings;
my @file_names;
## Read the list of file names
open(my $fh, "$ARGV[0]");
while (<$fh>) {
    chomp; #remove new line character from the end of the line
    push @file_names, $_;
}
my $counter = 0;
my ($file_name, $fn);
## Read the input file
open($fh, "$ARGV[1]");
while (<$fh>) {
    ## If this is an opening class, open the next output file,
    ## and set $counter to 1.
    if (/ class="bch_ha"/) {
        $counter = 1;
        $file_name = shift(@file_names);
        open($fn, ">", "$file_name");
        #print "<html>\n<body>";
    }
    ## If this is a closing class, print the line and set $counter back to 0
    if (/\n<p sourcepage="(\d+)" class="bch_ha"/) {
        $counter = 0;
        print $fn $_;
        close($fn);
    }
    if (/ class="bcesu_tt"/) {
        $counter = 1;
        $file_name = shift(@file_names);
        open($fn, ">", "$file_name");
        #print "<html>\n<body>";
    }
    if (/\n<p sourcepage="(\d+)" class="bcekt_tt"/) {
        $counter = 0;
        print $fn $_;
        close($fn);
    }
    if (/ class="bcekt_tt"/) {
        $counter = 1;
        $file_name = shift(@file_names);
        open($fn, ">", "$file_name");
        #print "<html>\n<body>";
    }
    if (/\n<p sourcepage="(\d+)" class="bcepq_tt"/) {
        $counter = 0;
        print $fn $_;
        close($fn);
    }
    if (/ class="bcepq_tt"/) {
        $counter = 1;
        $file_name = shift(@file_names);
        open($fn, ">", "$file_name");
        #print "<html>\n<body>";
    }
    if (/\n<p sourcepage="(\d+)" class="bcecs_tt"/) {
        $counter = 0;
        print $fn $_;
        close($fn);
    }
    if (/ class="bcecs_tt"/) {
        $counter = 1;
        $file_name = shift(@file_names);
        open($fn, ">", "$file_name");
        #print "<html>\n<body>";
    }
    if (/\n<p sourcepage="(\d+)" class="bceex_tt"/) {
        $counter = 0;
        print $fn $_;
        close($fn);
    }
    if (/ class="bceex_tt"/) {
        $counter = 1;
        $file_name = shift(@file_names);
        open($fn, ">", "$file_name");
        #print "<html>\n<body>";
    }
    if (/\n<\/body>\n<\/html>/) {
        $counter = 0;
        print $fn $_;
        close($fn);
    }
    ## Print into the corresponding file handle if $counter is 1
    print $fn $_ if $counter == 1
}
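I need to add more options. The code should prompt for the delimiter to be entered manually, and the split files should go into a folder named chapterxx. Please help me with this.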
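The two requested options could be sketched roughly as follows. This is only a minimal illustration, not a finished solution: the helper names `read_delimiter` and `make_chapter_dir` are hypothetical, and it assumes that the "xx" in chapterxx is a chapter number padded to two digits.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Read one line from the given handle (STDIN by default) and return it
# without the trailing line ending. Dies if nothing was entered.
sub read_delimiter {
    my ($in) = @_;
    $in //= \*STDIN;
    print STDERR 'Delimiter (for example class="bch_ha"): ';
    my $line = <$in>;
    die "No delimiter given\n" unless defined $line;
    $line =~ s/\R\z//;    # strip \n or \r\n (needs perl 5.10 or later)
    die "No delimiter given\n" unless length $line;
    return $line;
}

# Create the output folder and return its path. The "chapter" plus
# two-digit number naming is an assumption about what "chapterxx" means.
sub make_chapter_dir {
    my ($base, $number) = @_;
    my $dir = File::Spec->catdir($base, sprintf('chapter%02d', $number));
    make_path($dir);      # does nothing if the directory already exists
    return $dir;
}

# Example run, only when the script is started with a chapter number.
if (@ARGV) {
    my $delimiter = read_delimiter();
    my $dir       = make_chapter_dir('.', $ARGV[0]);
    print "Splitting on '$delimiter' into $dir\n";
}
```

If the typed delimiter is later used inside a regular expression, wrapping it as `\Q$delimiter\E` keeps characters such as `.` or `(` from being treated as pattern syntax.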
Yes, please find a sample of the HTML below.
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="UTF-8" />
</head>
<body>
<p sourcepage="27" class="bch_ha"></p>
<p sourcepage="26" class="bopob_ct">XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</p>
<p sourcepage="26" class="bopob_cr">Xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx% <i>Xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</i></p>
<p sourcepage="26" class="bch_nmword">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</b></p>
<p sourcepage="26" class="bch_nm">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</b></p>
<p sourcepage="26" class="bch_tt">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</p>
<p sourcepage="26" class="bopob_tt">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx% <b>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX% </b>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</p>
<p sourcepage="26" class="bopob_tx">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</i></p>
<p sourcepage="26" class="bopob_lbfirst">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</b></p>
<p sourcepage="26" class="bopob_lb">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</b></p>
<p sourcepage="26" class="bopob_lb">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</b></p>
<p sourcepage="26" class="bopob_lb">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</b></p>
<p sourcepage="26" class="bch_ha">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</b></p>
<p sourcepage="26" class="bopob_lblast">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</b> </p>
<p sourcepage="26" class="bopcs_txfirst">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</p>
<p sourcepage="26" class="bopcs_tx">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</p>
<p sourcepage="26" class="bopcs_tx">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</p>
<p sourcepage="27" class="bopcs_tx">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</p>
<p sourcepage="27" class="bopcs_tx">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%<span class="sup">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</sup>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</p>
<p sourcepage="27" class="bch_txfirst">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx%</p>
</body>
</html>
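I only need to split the HTML based on the class class="bch_ha": everything from one class="bch_ha" up to the next class="bch_ha" should be written as new HTML content to a file named reader_0.html. The file names should increment, for example reader_1.html.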
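A line-based sketch of that splitting rule might look like the following. It assumes, as in the sample above, that every `<p>` element sits on its own line; the sub name `split_html` and its three arguments are made up for illustration, and a real HTML parser would be more robust than matching text.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;

# Copy the lines of $in_path into numbered files inside $out_dir.
# A new file (reader_0.html, reader_1.html, ...) starts at every line
# that contains $delimiter. Lines before the first match, such as the
# <head> section, are skipped. Returns the list of files written.
sub split_html {
    my ($in_path, $delimiter, $out_dir) = @_;
    open(my $in, '<', $in_path) or die "Cannot open $in_path: $!\n";

    my ($out, @written);
    while (my $line = <$in>) {
        if (index($line, $delimiter) >= 0) {    # literal match, no regex
            if ($out) {
                close($out) or die "Cannot close output file: $!\n";
            }
            my $path = File::Spec->catfile($out_dir,
                sprintf('reader_%d.html', scalar @written));
            open($out, '>', $path) or die "Cannot write $path: $!\n";
            push @written, $path;
        }
        print {$out} $line if $out;
    }
    if ($out) {
        close($out) or die "Cannot close output file: $!\n";
    }
    close($in);
    return @written;
}

# Example: perl split.pl input.html 'class="bch_ha"' outdir
if (@ARGV == 3) {
    print "$_\n" for split_html(@ARGV);
}
```

Run against the sample with class="bch_ha" as the delimiter, this would produce two files, since that class appears on two lines. The closing `</body>` and `</html>` lines end up in the last file, and each piece is a fragment rather than a complete document.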
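You must not comment out 'use warnings'. Those messages indicate that something in the code is not quite right, and turning them off does not fix the problem! – Borodin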
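To illustrate the kind of message that gets hidden, here is a small sketch. It reproduces one situation that can arise in the script from the question: the list of file names runs out, so `shift` returns undef and the interpolated name is empty. The helper `collect_warnings` is hypothetical and only exists to capture the message instead of letting it go to the terminal.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Run a code block and return every warning it emitted.
sub collect_warnings {
    my ($code) = @_;
    my @seen;
    local $SIG{__WARN__} = sub { push @seen, $_[0] };
    $code->();
    return @seen;
}

my @seen = collect_warnings(sub {
    my @file_names;                       # the list of names has run out
    my $file_name = shift @file_names;    # so shift returns undef
    my $path      = "$file_name";         # same interpolation as in the question
});

# With warnings enabled this reports a "Use of uninitialized value" message.
print @seen;
```

With `use warnings` commented out the same code runs silently and the script goes on to call `open` with an empty file name.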
This should be done with a proper HTML parser. Please show the original HTML so that we can help you. If it is online, a link is fine. – Borodin
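I cannot share the HTML because it is confidential official material. I just need to split the HTML file into multiple files by class name, as you can see in the code above. But this should be dynamic, and I need to create a directory named after the input file, with all the split HTML files placed in that folder. –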
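Deriving the folder from the input file name could be done along these lines. This is a sketch under assumptions: the sub name `output_dir_for` is hypothetical, and it assumes the folder should sit next to the input file and drop the file extension.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Path qw(make_path);
use File::Spec;

# Return a directory named after the input file, created next to it if
# it does not exist yet: "books/input.html" gives "books/input".
sub output_dir_for {
    my ($input_path) = @_;
    # fileparse returns (name without suffix, directory part, suffix)
    my ($name, $parent) = fileparse($input_path, qr/\.[^.]*/);
    my $dir = File::Spec->catdir($parent, $name);
    # An input file without an extension would collide with its own folder.
    die "$dir exists and is not a directory\n" if -e $dir && !-d $dir;
    make_path($dir);
    return $dir;
}

# Example: perl outdir.pl books/input.html
if (@ARGV) {
    print output_dir_for($ARGV[0]), "\n";
}
```

The returned path can then be used as the directory part of each output file name, so the pieces are written straight into the folder and nothing has to be moved afterwards.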