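Split a large string into an array, but the split point must not break a tag
I wrote a script that sends large chunks of text to Google for translation. Sometimes the text (which is HTML source) ends up being split in the middle of an HTML tag, and Google then returns broken code.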
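I already know how to split a string into an array, but is there a better way to do it that guarantees each output string is no longer than 5000 characters and is never split inside a tag?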
UPDATE: Thanks to the answer, this is the code I ended up using in my project, and it works great:
    function handleTextHtmlSplit($text, $maxSize) {
        //our collection array, starting with one empty group
        $niceHtml = array('');
        // Splits on tags, but also includes each tag as an item in the result
        $pieces = preg_split('/(<[^>]*>)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
        //the current position of the index
        $currentPiece = 0;
        //start assembling a group until it gets to max size
        foreach ($pieces as $piece) {
            //make sure string length of this piece will not exceed max size when inserted
            if (strlen($niceHtml[$currentPiece] . $piece) > $maxSize) {
                //advance current piece
                //will put overflow into next group
                $currentPiece += 1;
                //create empty string as value for next piece in the index
                $niceHtml[$currentPiece] = '';
            }
            //insert piece into our master array
            $niceHtml[$currentPiece] .= $piece;
        }
        //return array of nicely handled html
        return $niceHtml;
    }
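
One thing to watch: `strlen` counts bytes, not characters, so with multibyte text (UTF-8, for example) a group can be cut off well before the 5000-character limit. Below is a minimal sketch of a variant that counts characters instead. The function name `splitHtmlByCharCount` is hypothetical, and it assumes the input is UTF-8 and that the mbstring extension is available. It also skips empty pieces, so it never emits an empty group.

```php
<?php
// Hypothetical variant of handleTextHtmlSplit(); the name, the UTF-8
// assumption and the mb_strlen-based counting are not from the original post.
function splitHtmlByCharCount(string $html, int $maxChars): array
{
    // Split on tags, keep each tag as its own piece, drop empty pieces.
    $pieces = preg_split(
        '/(<[^>]*>)/',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $chunks = [];
    $current = '';
    foreach ($pieces as $piece) {
        // Count characters rather than bytes; start a new group on overflow.
        if ($current !== '' && mb_strlen($current . $piece, 'UTF-8') > $maxChars) {
            $chunks[] = $current;
            $current = '';
        }
        $current .= $piece;
    }
    // Flush the last group if it holds anything.
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}

$html = '<p>Hello <b>world</b></p><p>Second paragraph</p>';
$chunks = splitHtmlByCharCount($html, 20);
// Joining the groups back together reproduces the input.
var_dump(implode('', $chunks) === $html);
```

As with the original function, a single tag or text run that is longer than the limit on its own still ends up in one oversized group; splitting such a run on whitespace would need an extra step.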
Wow, Amber, thank you. That should really get my wheels turning. I'll give it a try. – james 2010-07-21 01:57:49