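PHP preg_match pattern to remove time from subtitle srt file
I need a preg_match expression to remove all the timings from a .srt subtitle file (imported as a string), but I can never get my head around regex patterns. So, for example, it would change: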
5
00:05:50,141 --> 00:05:54,771
This is what was said
to
This is what was said
Not sure where you are stuck; it is just \d+ plus colons and commas.
// With the s flag, each . also matches a single newline character
$re = '/\d+.\d+:\d+:\d+,\d+\s-->\s\d+:\d+:\d+,\d+./s';
//$re = '/\d+.[0-9:,]+\s-->\s[\d:,]+./s'; // slightly more compact version of the regex
$str = '5
00:05:50,141 --> 00:05:54,771
This is what was said';
$subst = '';
$result = preg_replace($re, $subst, $str);
echo $result;
Working demo here.
With the slightly more compact pattern, it looks like this: https://regex101.com/r/QY9QXG/2
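The patterns above use `.` with the `s` flag to consume exactly one line-break character, so they assume `\n` line endings. Below is a minimal sketch of a stricter variant, assuming the file may use `\r\n` and that every timing line is directly preceded by a numeric counter line. The function name `stripSrtTimings` is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): removes the counter
// line and the timing line of every SRT block in a string.
function stripSrtTimings(string $srt): string
{
    // ^\d+\r?\n          counter line at the start of a line (m flag)
    // \d+:\d+:\d+,\d+    a timestamp such as 00:05:50,141
    // [ \t]*-->[ \t]*    the arrow, with optional spaces or tabs
    // (?:\r?\n)?         the line break after the timing line, if present
    $re = '/^\d+\r?\n\d+:\d+:\d+,\d+[ \t]*-->[ \t]*\d+:\d+:\d+,\d+[ \t]*(?:\r?\n)?/m';

    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace($re, '', $srt) ?? $srt;
}

$srt = "1\r\n00:05:50,141 --> 00:05:54,771\r\nThis is what was said\r\n\r\n"
     . "2\r\n00:05:55,000 --> 00:05:57,500\r\nSecond line\r\n";
echo stripSrtTimings($srt);
```

Anchoring with `^` and the `m` flag keeps the pattern from starting in the middle of a text line, and spelling out `\r?\n` avoids depending on how `.` treats line breaks.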
$str = "1
00:05:50,141 --> 00:05:54,771
This is what was said1
2
00:05:50,141 --> 00:05:54,771
This is what was said2
3
00:05:50,141 --> 00:05:54,771
This is what was said3
4
00:05:50,141 --> 00:05:54,771
This is what was said4
LLLL
5
00:05:50,141 --> 00:05:54,771
This is what was said5";
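// Split into subtitle blocks on blank lines
$count = explode(PHP_EOL.PHP_EOL, $str);
foreach($count as &$line){
    // Drop the counter and timing lines, keep the remaining text lines
    $line = implode(PHP_EOL, array_slice(explode(PHP_EOL, $line), 2));
}
unset($line); // break the reference left over from the loop
echo implode(PHP_EOL.PHP_EOL, $count);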
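The non-regex version first splits on double newlines, so each subtitle group becomes its own item in the array.
It then loops through the items and explodes each one again on single newlines.
The first two lines are not needed, so array_slice removes them.
If a subtitle spans more than one line, the lines have to be merged again; imploding with a newline takes care of that.
As a last step, the string is rebuilt by imploding with two newlines.
As Casimir wrote in the comments below, I have used PHP_EOL as the newline, and that works in this example.
On a real srt file, however, the newline may be different.
If the code does not work as expected, try replacing PHP_EOL with another newline sequence.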
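One way to sidestep the PHP_EOL question is to normalize the line endings before splitting. The following is a minimal sketch of that idea, assuming the input uses `\n`, `\r\n` or lone `\r` line breaks; the function name `srtTextOnly` is hypothetical.

```php
<?php
// Hypothetical helper (not from the original answer): returns only the
// text lines of each subtitle block, whatever the line-ending style.
function srtTextOnly(string $srt): string
{
    // Normalize \r\n and lone \r to \n so the splits below are predictable.
    $srt = str_replace(["\r\n", "\r"], "\n", trim($srt));

    // Split into blocks on one or more blank lines.
    $blocks = preg_split('/\n{2,}/', $srt);

    foreach ($blocks as &$block) {
        // Drop the counter line and the timing line, keep the rest.
        $block = implode("\n", array_slice(explode("\n", $block), 2));
    }
    unset($block); // break the reference left over from the loop

    return implode("\n\n", $blocks);
}

echo srtTextOnly("1\r\n00:00:01,000 --> 00:00:02,000\r\nA\r\nB\r\n\r\n2\r\n00:00:03,000 --> 00:00:04,000\r\nC\r\n");
```

The result always uses `\n`; if the original line endings must be preserved, they would need to be detected first and restored afterwards.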
So, considering that "This is what was said" starts with a capital letter and may be text containing punctuation, I propose the following:
$re = '/.*([A-Z]{1}[A-Za-z0-9 _.,?!"\/\'$]*)/';
$str = '5
00:05:50,141 --> 00:05:54,771
This is what was said.';
preg_match_all($re, $str, $matches, PREG_OFFSET_CAPTURE, 0);
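// Print the entire match result; group 1 holds the subtitle text.
// Caveat: the greedy .* makes group 1 start at the last capital letter on a line.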
var_dump($matches);
PHP code:
$str = '5
00:05:50,141 --> 00:05:54,771
This is what was said';
$reg = '/(.{0,}[0,1]{0,}\s{0,}[0-9]{0,}.{0,}[0-9]+[0-9]+:[0-9]{0,}.{0,})/';
echo(trim(preg_replace($reg, '', $str)));
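Since SRT files always have the same format, you can skip the first two lines of each block and return the result once you reach an empty line. To do this without loading the whole file into memory, you can read the file line by line and use a generator: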
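function getSubtitleLine($handle) {
    $flag = 0;       // header lines (counter, timings) skipped so far in this block
    $subtitle = '';
    while (false !== $line = stream_get_line($handle, 1024, "\n")) {
        $line = rtrim($line);
        if ($line === '') {
            // Blank line: the block is complete.
            // Strict comparison is used because empty('0') is true.
            if ($subtitle !== '') {
                yield $subtitle;
            }
            $subtitle = '';
            $flag = 0;
        } elseif ($flag == 2) {
            $subtitle .= $subtitle === '' ? $line : "\n$line";
        } else {
            $flag++;
        }
    }
    // Last block, when the file does not end with a blank line
    if ($subtitle !== '') {
        yield $subtitle;
    }
}

if (false !== $handle = fopen('./test.srt', 'r')) {
    foreach (getSubtitleLine($handle) as $line) {
        echo $line, PHP_EOL;
    }
    fclose($handle);
}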
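Do you have a few more examples, so we can clearly see how they vary? – Doug
Like this? https://regex101.com/r/QY9QXG/1 – Andreas
@Doug They really don't vary. The first number is the subtitle counter, then a new line, then the start time and end time. Then a new line and the text. – Andreas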
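The block layout described in the last comment can be illustrated with a short sketch. It assumes exactly that layout (counter line, timing line, then one or more text lines); the function name `parseSrtBlock` and the array keys are hypothetical.

```php
<?php
// Hypothetical helper (not from the thread): splits one SRT block into
// its counter, start time, end time and text.
function parseSrtBlock(string $block): ?array
{
    // Line 1 is the counter, line 2 the timings, everything after is text.
    $lines = preg_split('/\r?\n|\r/', trim($block));

    if (count($lines) < 3
        || !preg_match('/^(\S+)\s*-->\s*(\S+)/', $lines[1], $m)) {
        return null; // not a well-formed block under this assumption
    }

    return [
        'index' => (int) $lines[0],
        'start' => $m[1],
        'end'   => $m[2],
        'text'  => implode("\n", array_slice($lines, 2)),
    ];
}

var_dump(parseSrtBlock("5\n00:05:50,141 --> 00:05:54,771\nThis is what was said"));
```

Removing the timings then amounts to keeping only the `text` entry of each block.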