PHP preg_match pattern to remove timings from an SRT subtitle file

I need a preg_match expression to remove all the timings from an .srt subtitle file (imported as a string), but I can never get my head around regex patterns. So, for example, it would change:

5 
00:05:50,141 --> 00:05:54,771 
This is what was said

to

This is what was said


2017-07-15 Hasen

Can you give a few more examples, so we can clearly see how they vary? – Doug

Something like this? https://regex101.com/r/QY9QXG/1 – Andreas

@Doug They really don't. The first number is the subtitle's counter, then a newline, then the start time and end time. Then a newline and the text. – Andreas

Not sure where you're stuck; it's just \d+ and the colons/commas.

$re = '/\d+.\d+:\d+:\d+,\d+\s-->\s\d+:\d+:\d+,\d+./s'; // the s flag lets "." also match the newlines
//$re = '/\d+.[0-9:,]+\s-->\s[\d:,]+./s'; // slightly more compact version of the regex
$str = '5
00:05:50,141 --> 00:05:54,771
This is what was said';
$subst = '';

$result = preg_replace($re, $subst, $str);

echo $result;

Working demo here.
With the slightly more compact pattern, it looks like this: https://regex101.com/r/QY9QXG/2
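For a whole file rather than a single block, a possible variant is sketched below. It assumes every timing line uses the two-digit HH:MM:SS,mmm --> HH:MM:SS,mmm layout. The ^ anchor with the m flag ties each match to the start of a line, and \R accepts both \n and \r\n, which matters because many .srt files use Windows line endings, where a single "." would not match the two-character newline.

```php
<?php
// Sketch: strip the index line and the timing line of every SRT block at once.
// Assumption: timings always look like "HH:MM:SS,mmm --> HH:MM:SS,mmm".
$re = '/^\d+[ \t]*\R\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}[ \t]*\R/m';

// Two blocks with Windows (\r\n) line endings; the second has two text lines.
$str = "1\r\n00:05:50,141 --> 00:05:54,771\r\nThis is what was said1\r\n\r\n"
     . "2\r\n00:05:55,000 --> 00:05:58,500\r\nThis is what was said2\r\nSecond line";

// Remove each index + timing pair, keeping the text and the blank separators.
$result = preg_replace($re, '', $str);

echo $result;
```

The blank line between blocks is left in place, so the output still separates one subtitle from the next.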

And just for fun and the challenge, here is a non-regex answer: https://3v4l.org/r7hbO

$str = "1
00:05:50,141 --> 00:05:54,771
This is what was said1

2
00:05:50,141 --> 00:05:54,771
This is what was said2

3
00:05:50,141 --> 00:05:54,771
This is what was said3

4
00:05:50,141 --> 00:05:54,771
This is what was said4
LLLL

5
00:05:50,141 --> 00:05:54,771
This is what was said5";

// Split into subtitle blocks on the blank line between them.
$blocks = explode(PHP_EOL.PHP_EOL, $str);

foreach($blocks as &$block){
    // Drop the first two lines of each block (index and timing), keep the text lines.
    $block = implode(PHP_EOL, array_slice(explode(PHP_EOL, $block), 2));
}
unset($block); // break the reference left behind by foreach

echo implode(PHP_EOL.PHP_EOL, $blocks);

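The non-regex version first splits on double newlines, which means each subtitle group becomes a new item in the array.
Then it loops through them and explodes each one again on newlines.
The first two lines are not needed, so array_slice removes them.
If a subtitle has more than one line, those lines need to be joined again; implode with a newline takes care of that.

Then, as the last step, the string is rebuilt with implode using two newlines.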

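As Casimir wrote in the comments below, I have used PHP_EOL as the newline, and it works in that example.
However, on real srt files the newlines may be different.
If the code doesn't work as expected, try replacing PHP_EOL with a different newline, such as "\r\n" or "\n".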
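A minimal sketch of the same non-regex idea that sidesteps the PHP_EOL problem by normalising every line ending to \n before splitting. The function name stripSrtTimings is hypothetical, and the sketch still assumes exactly one blank line between blocks.

```php
<?php
// Sketch: the non-regex approach, with line endings normalised first so it
// no longer depends on PHP_EOL matching the file's newlines.
function stripSrtTimings(string $srt): string
{
    // Convert Windows (\r\n) and old Mac (\r) line endings to \n.
    $srt = str_replace(["\r\n", "\r"], "\n", $srt);

    // Each subtitle block is separated by one blank line.
    $blocks = explode("\n\n", trim($srt));

    foreach ($blocks as $i => $block) {
        // Drop the first two lines (index and timing), keep the text lines.
        $lines = explode("\n", $block);
        $blocks[$i] = implode("\n", array_slice($lines, 2));
    }

    return implode("\n\n", $blocks);
}

$str = "1\r\n00:05:50,141 --> 00:05:54,771\r\nThis is what was said1\r\n\r\n"
     . "2\r\n00:05:55,000 --> 00:05:58,500\r\nThis is what was said2\r\nLLLL\r\n";

echo stripSrtTimings($str);
```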


2017-07-15 13:25:22 Andreas

Thanks, Aravindh. I forgot the link. – Andreas

Can anyone comment on the downvote? What did I do wrong? – Andreas

This is definitely the answer... Can't understand that one... – funilrys

So, considering that "This is what was said" starts with a capital letter and may contain punctuation, I suggest the following:

$re = '/.*?([A-Z][A-Za-z0-9 _.,?!"\/\'$]*)/'; // lazy .*? so the capture starts at the first capital letter, not the last

$str = '5
00:05:50,141 --> 00:05:54,771
This is what was said.';

preg_match_all($re, $str, $matches, PREG_OFFSET_CAPTURE, 0);

// Print the entire match result
var_dump($matches);


2017-07-15 13:22:06 funilrys

Remember it's a subtitle file, like the subtitles you see in movies and TV shows. So I think it needs more than A-Z. – Andreas

Right @Andreas ... – funilrys

What do you think of my update, @Andreas? – funilrys
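Picking up Andreas's point that subtitle text needs more than A-Z: one option, sketched here as an alternative to describing the text, is to invert the logic and drop only the lines that look like an index or a timing. It assumes an index line contains nothing but digits and a timing line uses the HH:MM:SS,mmm --> HH:MM:SS,mmm layout; as a side effect, a subtitle line consisting only of a number would also be dropped.

```php
<?php
// Sketch: keep every line that is NOT an index line or a timing line,
// so accented letters and arbitrary punctuation in the text survive.
$str = "5\n00:05:50,141 --> 00:05:54,771\nÉl dijo: ¿qué pasa?\n\n"
     . "6\n00:05:55,000 --> 00:05:58,500\n...and then 42 people left.";

// Split on any line ending (\n, \r\n, ...).
$lines = preg_split('/\R/', $str);

// Assumption: a digits-only line is the block index; the other alternative is the timing line.
$header = '/^(\d+|\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3})\s*$/';

// PREG_GREP_INVERT returns the lines that do NOT match $header.
$text = preg_grep($header, $lines, PREG_GREP_INVERT);

echo implode("\n", $text);
```

Blank lines do not match the header pattern, so they are kept and still separate the subtitles.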

PHP code:

$str = '5 
00:05:50,141 --> 00:05:54,771 
This is what was said'; 
$reg = '/(.{0,}[0,1]{0,}\s{0,}[0-9]{0,}.{0,}[0-9]+[0-9]+:[0-9]{0,}.{0,})/'; 
echo(trim(preg_replace($reg, '', $str)));


2017-07-15 13:35:30

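Since SRT files always have the same format, you can skip the first two lines of each block and return the result once you reach an empty line. To do this without loading the whole file into memory, you can read the file line by line and use a generator: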

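function getSubtitleLine($handle) {
    $flag = 0;        // header lines (index, timing) already skipped in the current block
    $subtitle = '';
    while (false !== $line = stream_get_line($handle, 1024, "\n")) {
        $line = rtrim($line);   // also strips the "\r" of Windows line endings
        if ($line === '') {     // not empty(): a subtitle line "0" would count as blank
            // Blank line: the current block is finished.
            if ($subtitle !== '') {
                yield $subtitle;
            }
            $subtitle = '';
            $flag = 0;
        } elseif ($flag == 2) {
            // Past the index and timing lines: this is subtitle text.
            $subtitle .= $subtitle === '' ? $line : "\n$line";
        } else {
            // Skip the index line, then the timing line.
            $flag++;
        }
    }

    // The last block may not be followed by a blank line.
    if ($subtitle !== '') {
        yield $subtitle;
    }
}

if (false !== $handle = fopen('./test.srt', 'r')) {
    foreach (getSubtitleLine($handle) as $line) {
        echo $line, PHP_EOL;
    }
    fclose($handle);
}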
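stream_get_line() with a length of 1024 reads at most 1024 bytes, so a longer line would be cut into pieces. A variant of the same generator idea, sketched with fgets() (which reads a whole line when the length is omitted) and an in-memory php://memory stream so it runs without a test.srt file. The name subtitleTexts is hypothetical.

```php
<?php
// Sketch: same generator idea, reading with fgets() and collecting the
// text lines of each block in an array before joining them.
function subtitleTexts($handle): Generator
{
    $header = 0;  // header lines (index, timing) seen in the current block
    $text = [];   // text lines collected for the current block
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line);  // strips "\n", "\r\n" and trailing spaces
        if ($line === '') {
            if ($text) {
                yield implode("\n", $text);
            }
            $header = 0;
            $text = [];
        } elseif ($header < 2) {
            $header++;  // skip the index line, then the timing line
        } else {
            $text[] = $line;
        }
    }
    if ($text) {
        yield implode("\n", $text);  // last block may not end with a blank line
    }
}

// Demo on an in-memory stream instead of a real file.
$handle = fopen('php://memory', 'r+');
fwrite($handle, "1\r\n00:00:01,000 --> 00:00:02,000\r\nHello\r\n\r\n"
              . "2\r\n00:00:03,000 --> 00:00:04,000\r\nTwo\r\nlines\r\n");
rewind($handle);

$subtitles = iterator_to_array(subtitleTexts($handle), false);
fclose($handle);

print_r($subtitles);
```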


2017-07-15 14:41:46
